Game-Theoretic Learning: Regret Minimization vs. Utility Maximization


1 Game-Theoretic Learning: Regret Minimization vs. Utility Maximization
Amy Greenwald, with David Gondek, Amir Jafari, and Casey Marks
Brown University / University of Pennsylvania
November 17, 2004

2 Background
No-external-regret learning converges to the set of minimax equilibria in zero-sum games. [e.g., Freund and Schapire 1996]
No-internal-regret learning converges to the set of correlated equilibria in general-sum games. [e.g., Foster and Vohra 1997]

3 Foreground
1. Definitions: a continuum of no-regret properties, called no-φ-regret, and a continuum of game-theoretic equilibria, called Φ-equilibria.
2. Existence Theorem (constructive proof): no-Φ-regret learning algorithms exist, for all Φ.
3. Convergence Theorem: no-Φ-regret learning converges to the set of Φ-equilibria, for all Φ.
4. Surprising Result: no-internal-regret is the strongest form of no-φ-regret learning. Therefore, no no-φ-regret algorithm learns Nash equilibria.

4 Outline
Game Theory
Single Agent Learning Model
Multiagent Learning & Game-Theoretic Equilibria

5 Game Theory: A Crash Course
1. General-Sum Games: Nash Equilibrium, Correlated Equilibrium
2. Zero-Sum Games: Minimax Equilibrium

6 An Example: Prisoners' Dilemma

         C      D
  C     4,4    0,5
  D     5,0    1,1

C: Cooperate, D: Defect

7-8 One-Shot Games
A one-shot game is a 3-tuple Γ = (I, (A_i, r_i)_{i∈I}), where
- I is a set of players
- for all players i ∈ I, a set of pure actions A_i, with a_i ∈ A_i
- a reward function r_i : A → ℝ, where A = ∏_{i∈I} A_i, with a ∈ A

The players can also employ randomized, or mixed, actions:
- for all players i ∈ I, a set of mixed actions Q_i = {q_i ∈ ℝ^{|A_i|} : Σ_j q_{ij} = 1 and q_{ij} ≥ 0, ∀j}, with q_i ∈ Q_i
- an expected reward function r_i : Q → ℝ, where Q = ∏_{i∈I} Q_i, with q ∈ Q, such that r_i(q) = Σ_{a∈A} q(a) r_i(a)
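To make the expected-reward formula concrete, here is a minimal Python sketch, assuming a two-player game encoded as a numpy payoff array and a product distribution; the names are illustrative, not from the talk:

```python
import numpy as np

# Payoff array for player i in a two-player game: R[a1, a2] = r_i(a1, a2).
# Example: the row player's rewards in the Prisoners' Dilemma above.
R1 = np.array([[4.0, 0.0],
               [5.0, 1.0]])

def expected_reward(R, q1, q2):
    """r_i(q) = sum over joint pure actions a of q(a) * r_i(a),
    for a product distribution q = q1 x q2."""
    return q1 @ R @ q2

q1 = np.array([0.5, 0.5])   # row player mixes uniformly over C, D
q2 = np.array([0.5, 0.5])   # column player mixes uniformly
print(expected_reward(R1, q1, q2))  # (4 + 0 + 5 + 1) / 4 = 2.5
```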

9 Nash Equilibrium
Notation: Write a = (a_i, a_{-i}) ∈ A for a_i ∈ A_i and a_{-i} ∈ A_{-i} = ∏_{j≠i} A_j. Write q = (q_i, q_{-i}) ∈ Q for q_i ∈ Q_i and q_{-i} ∈ Q_{-i} = ∏_{j≠i} Q_j.

Definition: A Nash equilibrium is a mixed action profile q* such that r_i(q*) ≥ r_i(q_i, q*_{-i}), for all players i and for all mixed actions q_i ∈ Q_i.

Theorem [Nash 51]: Every finite strategic form game has a mixed strategy Nash equilibrium.

10-11 Correlated Equilibrium
Chicken:
         L      R
  T     6,6    2,7
  B     7,2    0,0

A correlated equilibrium (maximizing the sum of rewards):
         L      R
  T     1/2    1/4
  B     1/4     0

This CE solves the linear program:
maximize 12π_TL + 9π_TR + 9π_BL + 0π_BR
subject to
  π_TL + π_TR + π_BL + π_BR = 1
  π_TL, π_TR, π_BL, π_BR ≥ 0
  6π_TL + 2π_TR ≥ 7π_TL + 0π_TR   (row player, told T)
  7π_BL + 0π_BR ≥ 6π_BL + 2π_BR   (row player, told B)
  6π_TL + 2π_BL ≥ 7π_TL + 0π_BL   (column player, told L)
  7π_TR + 0π_BR ≥ 6π_TR + 2π_BR   (column player, told R)
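As a sanity check, the LP above can be solved directly. A minimal sketch with scipy; the variable ordering π = (π_TL, π_TR, π_BL, π_BR) is my own encoding, not notation from the slides:

```python
import numpy as np
from scipy.optimize import linprog

# Variable order: pi = (pi_TL, pi_TR, pi_BL, pi_BR).
# Objective: maximize 12 pi_TL + 9 pi_TR + 9 pi_BL + 0 pi_BR,
# i.e., minimize the negated coefficient vector.
c = -np.array([12.0, 9.0, 9.0, 0.0])

# Incentive constraints rewritten in A_ub @ pi <= 0 form:
#   told T:  7 pi_TL <= 6 pi_TL + 2 pi_TR   ->   pi_TL - 2 pi_TR <= 0
#   told B:  6 pi_BL + 2 pi_BR <= 7 pi_BL   ->  -pi_BL + 2 pi_BR <= 0
#   told L:  7 pi_TL <= 6 pi_TL + 2 pi_BL   ->   pi_TL - 2 pi_BL <= 0
#   told R:  6 pi_TR + 2 pi_BR <= 7 pi_TR   ->  -pi_TR + 2 pi_BR <= 0
A_ub = np.array([[ 1.0, -2.0,  0.0,  0.0],
                 [ 0.0,  0.0, -1.0,  2.0],
                 [ 1.0,  0.0, -2.0,  0.0],
                 [ 0.0, -1.0,  0.0,  2.0]])
b_ub = np.zeros(4)

# Probabilities sum to one.
A_eq = np.ones((1, 4))
b_eq = np.array([1.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1)] * 4)
print(res.x)  # expected: [0.5, 0.25, 0.25, 0.0], objective value 10.5
```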

12 Correlated Equilibrium
Definition: A mixed action profile q ∈ Q is a correlated equilibrium iff for all players i and for all pure actions j, k ∈ A_i,

  Σ_{a_{-i} ∈ A_{-i}} q(j, a_{-i}) (r_i(j, a_{-i}) − r_i(k, a_{-i})) ≥ 0    (1)

Observe:
- Every Nash equilibrium is a correlated equilibrium.
- Every finite normal form game has a correlated equilibrium.
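Condition (1) is easy to check mechanically. A small sketch for two-player games; the joint-distribution array Q and payoff arrays R1, R2 are my own illustrative encodings:

```python
import numpy as np

def is_correlated_eq(Q, R1, R2, tol=1e-9):
    """Check condition (1) for both players of a two-player game.
    Q[a1, a2] : joint action distribution
    R1, R2    : payoff arrays for players 1 and 2."""
    n1, n2 = Q.shape
    for j in range(n1):            # recommended action for player 1
        for k in range(n1):        # candidate deviation
            if np.sum(Q[j, :] * (R1[j, :] - R1[k, :])) < -tol:
                return False
    for j in range(n2):            # recommended action for player 2
        for k in range(n2):
            if np.sum(Q[:, j] * (R2[:, j] - R2[:, k])) < -tol:
                return False
    return True

# The Chicken CE from the previous slide:
Q  = np.array([[0.5, 0.25], [0.25, 0.0]])
R1 = np.array([[6.0, 2.0], [7.0, 0.0]])
R2 = np.array([[6.0, 7.0], [2.0, 0.0]])
print(is_correlated_eq(Q, R1, R2))  # True
```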

13 Zero-Sum Games
Matching Pennies:
         H        T
  H    1,-1    -1,1
  T   -1,1     1,-1

Rock-Paper-Scissors:
         R        P        S
  R     0,0    -1,1     1,-1
  P    1,-1     0,0    -1,1
  S   -1,1     1,-1     0,0

Zero-sum: Σ_{i∈I} r_i(a) = 0, for all a ∈ A
Constant-sum: Σ_{i∈I} r_i(a) = c, for all a ∈ A, for some c ∈ ℝ

14 Minimax Equilibrium
Example (entries are rewards to the row player):
         L   R
  T      1   2
  B      4   3

Definition: A mixed action profile (q₁*, q₂*) ∈ Q is a minimax equilibrium in a two-player, zero-sum game iff
  r₁(q₁*, q₂*) ≥ r₁(j, q₂*), ∀j ∈ A₁
  l₂(q₁*, q₂*) ≤ l₂(q₁*, k), ∀k ∈ A₂
where l₂ denotes player 2's loss.
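The row player's minimax strategy can be computed by linear programming. A minimal sketch with scipy; the encoding of the game value v as an extra variable is the standard LP trick, not something from the slides:

```python
import numpy as np
from scipy.optimize import linprog

# Row-player payoff matrix of the example game above.
M = np.array([[1.0, 2.0],
              [4.0, 3.0]])

def minimax_row_strategy(M):
    """maximize v  s.t.  (q^T M)_k >= v for every column k, q a distribution.
    Variables: x = (q_1, ..., q_m, v); linprog minimizes, so use -v."""
    m, n = M.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # minimize -v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])    # v - (q^T M)_k <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]       # v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds)
    return res.x[:m], res.x[-1]

q, v = minimax_row_strategy(M)
print(q, v)  # here B dominates T, so q = (0, 1) and the value is 3
```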

15 Single Agent Learning Model
- a set of actions N = {1, ..., n}
- for all times t:
  - a mixed action vector q^t ∈ Q = {q ∈ ℝ^n : Σ_i q_i = 1 and q_i ≥ 0, ∀i}
  - a pure action vector a^t = e_i, for some pure action i
  - a reward vector r^t = (r_1, ..., r_n) ∈ [0,1]^n

A learning algorithm A is a sequence of functions q^t : History^{t-1} → Q, where a History is a sequence of action-reward pairs (a^1, r^1), (a^2, r^2), ...

16 Transformations
Mixed Transformations:
  Φ_LINEAR = {φ : Q → Q linear} = the set of all linear transformations = the set of all row stochastic matrices
  Φ_SWAP = {φ : Q → Q : φ deterministic} ⊆ Φ_LINEAR

Pure Transformations:
  F_SWAP = {F : N → N} = the set of all pure transformations

17 Isomorphism
The operation of elements of F_SWAP on N = the operation of elements of Φ_SWAP on Q:

  φ_ij = δ_{F(i)=j}    (2)
  e_k φ = e_{F(k)}, ∀k    (3)

Example: If n = 4 and F = {1→2, 2→3, 3→4, 4→1}, then

  φ = [ 0 1 0 0
        0 0 1 0
        0 0 0 1
        1 0 0 0 ]

Thus, ⟨q₁, q₂, q₃, q₄⟩φ = ⟨q₄, q₁, q₂, q₃⟩, for all q = ⟨q₁, q₂, q₃, q₄⟩ ∈ Q.
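A quick sketch of the isomorphism in code: build φ from F via equation (2), then check equation (3) and the rotation example. Indices are 0-based in code, an implementation choice of mine:

```python
import numpy as np

def swap_matrix(F, n):
    """phi_ij = 1 if F(i) = j else 0, per equation (2).
    F maps {0, ..., n-1} to itself."""
    phi = np.zeros((n, n))
    for i in range(n):
        phi[i, F[i]] = 1.0
    return phi

n = 4
F = [1, 2, 3, 0]            # the rotation 1->2, 2->3, 3->4, 4->1, 0-indexed
phi = swap_matrix(F, n)

# Equation (3): e_k phi = e_{F(k)}
for k in range(n):
    assert np.array_equal(np.eye(n)[k] @ phi, np.eye(n)[F[k]])

q = np.array([0.1, 0.2, 0.3, 0.4])
print(q @ phi)              # [0.4, 0.1, 0.2, 0.3] = <q4, q1, q2, q3>
```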

18 External Regret Matrices
  F_EXT = {F_j ∈ F_SWAP : j ∈ N}, where F_j(k) = j
  Φ_EXT = {φ_j ∈ Φ_SWAP : j ∈ N}, where e_k φ_j = e_j

Example: If n = 4, then

  φ₂ = [ 0 1 0 0
         0 1 0 0
         0 1 0 0
         0 1 0 0 ]

Thus, ⟨q₁, q₂, q₃, q₄⟩φ₂ = ⟨0, 1, 0, 0⟩, for all q ∈ Q.

19 Internal Regret Matrices
  F_INT = {F_ij ∈ F_SWAP : i, j ∈ N}, where F_ij(k) = j if k = i, and k otherwise
  Φ_INT = {φ_ij ∈ Φ_SWAP : i, j ∈ N}, where e_k φ_ij = e_j if k = i, and e_k otherwise

Example: If n = 4, then

  φ₂₃ = [ 1 0 0 0
          0 0 1 0
          0 0 1 0
          0 0 0 1 ]

Thus, ⟨q₁, q₂, q₃, q₄⟩φ₂₃ = ⟨q₁, 0, q₂ + q₃, q₄⟩, for all q ∈ Q.
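Both families are one-liners on top of the swap-matrix idea above. A sketch, again with 0-based indices:

```python
import numpy as np

def external_matrix(j, n):
    """phi_j: every row is e_j ("always play j instead")."""
    phi = np.zeros((n, n))
    phi[:, j] = 1.0
    return phi

def internal_matrix(i, j, n):
    """phi_ij: the identity, except action i is replaced by action j."""
    phi = np.eye(n)
    phi[i, i] = 0.0
    phi[i, j] = 1.0
    return phi

q = np.array([0.1, 0.2, 0.3, 0.4])
print(q @ external_matrix(1, 4))     # [0, 1, 0, 0]
print(q @ internal_matrix(1, 2, 4))  # [0.1, 0, 0.5, 0.4] = <q1, 0, q2+q3, q4>
```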

20 Regret Vector ρ ∈ ℝ^Φ

Observed Regret Vector: ρ_φ(r, a) = r·(aφ) − r·a
Expected Regret Vector: ρ̂_φ(r, q) = E[ρ_φ(r, a) | a ∼ q] = ρ_φ(r, E[a | a ∼ q]) = r·(qφ) − r·q

No Observed Φ-Regret: limsup_{t→∞} (1/t) Σ_{τ=1}^t ρ_φ(r^τ, a^τ) ≤ 0, for all φ ∈ Φ, a.s.
No Expected Φ-Regret: limsup_{t→∞} (1/t) Σ_{τ=1}^t ρ̂_φ(r^τ, q^τ) ≤ 0, for all φ ∈ Φ
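In code, a single regret coordinate is one dot product. A sketch reusing the swap matrices built above:

```python
import numpy as np

def regret(r, x, phi):
    """rho_phi(r, x) = r . (x phi) - r . x.
    With x a pure action e_i this is the observed regret;
    with x a mixed action q it is the expected regret."""
    return r @ (x @ phi) - r @ x

r = np.array([0.0, 1.0, 0.2])     # rewards for 3 actions
e0 = np.array([1.0, 0.0, 0.0])    # we played action 0
phi_1 = np.zeros((3, 3))          # external transformation "always play 1"
phi_1[:, 1] = 1.0
print(regret(r, e0, phi_1))       # 1.0 - 0.0 = 1.0: we regret not playing 1
```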

21 Approachability
A set U ⊆ V is said to be approachable iff there exists a learning algorithm A = (q^1, q^2, ...) s.t. for any sequence of rewards r^1, r^2, ...,

  lim_{t→∞} d(U, ρ̄^t) = lim_{t→∞} inf_{u∈U} d(u, ρ̄^t) = 0, a.s.,

where ρ̄^t denotes the average value of ρ through time t.

A no-Φ-regret learning algorithm is one whose observed regret approaches the negative orthant ℝ₋^Φ.

22 Blackwell's Theorem
The negative orthant ℝ₋^Φ is approachable iff there exists a learning algorithm A = (q^1, q^2, ...) s.t. for any sequence of rewards r^1, r^2, ...,

  ρ(r^{t+1}, q^{t+1}) · (ρ̄^t)^+ ≤ 0    (4)

for all times t, where x^+ = max{x, 0}, componentwise. Moreover, this procedure can be used to approach the negative orthant ℝ₋^Φ: if ρ̄^t ∈ ℝ₋^Φ, play arbitrarily; if ρ̄^t ∈ V \ ℝ₋^Φ, play according to A.

23 Regret Matching Algorithm
Given Φ and Y ∈ ℝ₊^Φ:
- If Σ_{φ∈Φ} Y_φ = 0, play arbitrarily.
- If Σ_{φ∈Φ} Y_φ > 0, define the stochastic matrix

    A = A(Φ, Y) = (Σ_{φ∈Φ} φ Y_φ) / (Σ_{φ∈Φ} Y_φ)    (5)

  and play a mixed strategy q such that q = qA.

Regret Matching Theorem: Regret matching satisfies the generalized Blackwell condition: ρ(r, q) · Y = 0.
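The fixed point q = qA is a stationary distribution of the row-stochastic matrix A, so it can be found by solving a small linear system. A sketch, assuming the φ's are given as row-stochastic numpy arrays as built above:

```python
import numpy as np

def regret_matching_strategy(phis, Y):
    """Equation (5): A = (sum_phi Y_phi * phi) / (sum_phi Y_phi), then return
    a fixed point q = qA, i.e., a stationary distribution of A."""
    n = phis[0].shape[0]
    total = float(np.sum(Y))
    if total <= 0.0:
        return np.full(n, 1.0 / n)   # "play arbitrarily"; uniform is one choice
    A = sum(y * phi for y, phi in zip(Y, phis)) / total
    # q = qA  <=>  (A - I)^T q^T = 0, together with sum(q) = 1.
    M = np.vstack([(A - np.eye(n)).T, np.ones((1, n))])
    b = np.zeros(n + 1); b[-1] = 1.0
    q, *_ = np.linalg.lstsq(M, b, rcond=None)
    q = np.clip(q, 0.0, None)        # guard against tiny numerical negatives
    return q / q.sum()
```

A stationary distribution always exists because A is row stochastic; lstsq returns one of them.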

24 Proof
ρ(r, q) · Y
  = Σ_{φ∈Φ} ρ_φ(r, q) Y_φ    (6)
  = Σ_{φ∈Φ} (r·(qφ) − r·q) Y_φ    (7)
  = Σ_{φ∈Φ} r·(qφY_φ − qY_φ)    (8)
  = r·( q (Σ_{φ∈Φ} φY_φ) − q (Σ_{φ∈Φ} Y_φ) )    (9)
  = (Σ_{φ∈Φ} Y_φ) r·( q (Σ_{φ∈Φ} φY_φ)/(Σ_{φ∈Φ} Y_φ) − q )    (10)
  = (Σ_{φ∈Φ} Y_φ) r·(qA − q)    (11)
  = (Σ_{φ∈Φ} Y_φ) r·(q − q)    (12)
  = 0    (13)

where (12) holds because q is a fixed point of A.

25 Generic Regret Matching Algorithm (Φ, g)
for t = 1, ..., T:
1. play mixed strategy q^t
2. realize pure action i
3. observe rewards r^t
4. for all φ ∈ Φ, compute the instantaneous regret,
     observed: ρ^t_φ = ρ_φ(r^t, e_i) = r^t·(e_i φ) − r^t·e_i
     expected: ρ^t_φ = ρ_φ(r^t, q^t) = r^t·(q^t φ) − r^t·q^t
   and update the cumulative regret vector X^t_φ = X^{t-1}_φ + ρ^t_φ
5. compute Y = g(X^t)
6. compute A = (Σ_{φ∈Φ} φ Y_φ) / (Σ_{φ∈Φ} Y_φ)
7. solve for the fixed point q^{t+1} = q^{t+1} A
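Putting the pieces together, here is a compact self-play sketch of the generic algorithm with Φ = Φ_INT and the polynomial link function g_k(X) = X_k^+. The game (Rock-Paper-Scissors rescaled into [0,1]), the horizon, and the use of the expected-regret variant are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Rock-Paper-Scissors with rewards rescaled from {-1, 0, 1} into [0, 1].
R1 = np.array([[0.5, 0.0, 1.0],
               [1.0, 0.5, 0.0],
               [0.0, 1.0, 0.5]])
n = 3

# Phi_INT: one swap matrix per ordered pair i != j.
phis = []
for i in range(n):
    for j in range(n):
        if i != j:
            phi = np.eye(n)
            phi[i, i], phi[i, j] = 0.0, 1.0
            phis.append(phi)

def fixed_point(A):
    """Solve q = qA and normalize to a distribution."""
    M = np.vstack([(A - np.eye(n)).T, np.ones((1, n))])
    b = np.zeros(n + 1); b[-1] = 1.0
    q, *_ = np.linalg.lstsq(M, b, rcond=None)
    q = np.clip(q, 0.0, None)
    return q / q.sum()

X = [np.zeros(len(phis)), np.zeros(len(phis))]  # cumulative regrets
q = [np.full(n, 1/3), np.full(n, 1/3)]          # current mixed strategies
counts = np.zeros((n, n))

for t in range(20000):
    a = [rng.choice(n, p=q[0]), rng.choice(n, p=q[1])]   # steps 1-2
    counts[a[0], a[1]] += 1
    r = [R1[:, a[1]], 1.0 - R1[a[0], :]]                 # step 3 (constant-sum)
    for p in range(2):                                   # steps 4-7 per player
        for k, phi in enumerate(phis):
            X[p][k] += r[p] @ (q[p] @ phi) - r[p] @ q[p]
        Y = np.maximum(X[p], 0.0)                        # g_k(X) = X_k^+
        if Y.sum() > 0:
            q[p] = fixed_point(sum(y * phi for y, phi in zip(Y, phis)) / Y.sum())

print(counts / counts.sum())  # joint empirical play approaches the set of CE
```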

26 Special Cases of Regret Matching
Foster and Vohra 97 (Φ_INT), Hart and Mas-Colell 00 (Φ_EXT):
  choose G(X) = (1/2) Σ_k (X_k^+)², so that g_k(X) = X_k^+
Freund and Schapire 95 (Φ_EXT), Cesa-Bianchi and Lugosi 03 (Φ_INT):
  choose G(X) = (1/η) ln(Σ_k e^{ηX_k}), so that g_k(X) = e^{ηX_k} / Σ_k e^{ηX_k}
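The two link functions differ by one line in code. A sketch, where η is a tunable parameter and the max-shift is a standard numerical-stability trick of mine, not from the slides:

```python
import numpy as np

def g_polynomial(X):
    """g_k(X) = X_k^+ (the Foster-Vohra / Hart-Mas-Colell style weights)."""
    return np.maximum(X, 0.0)

def g_exponential(X, eta=0.1):
    """g_k(X) = exp(eta X_k) / sum_k exp(eta X_k)
    (the Freund-Schapire / Cesa-Bianchi-Lugosi style weights)."""
    w = np.exp(eta * (X - X.max()))   # shift by max(X) for stability
    return w / w.sum()
```

Either can be plugged in as g in step 5 of the generic algorithm above; since A in step 6 normalizes by Σ_φ Y_φ, only the ratios of the weights matter.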

27 Multiagent Model
- a set of players I (i ∈ I)
- for all players i:
  - a set of pure actions A_i, with a_i ∈ A_i
  - a set of mixed actions Q_i, with q_i ∈ Q_i
  - a reward function r_i : A → [0,1], where A = ∏_i A_i, with a ∈ A
  - an expected reward function r_i : Q → [0,1], where Q = ∏_i Q_i, with q ∈ Q, s.t. r_i(q) = Σ_{a∈A} q(a) r_i(a)
  - a set of transformations Φ_i (φ_i ∈ Φ_i)

28 Φ-Equilibrium
A mixed action profile q ∈ Q is a Φ-equilibrium iff r_i(φ_i(q)) ≤ r_i(q), for all players i and for all φ_i ∈ Φ_i.

Examples:
- Correlated Equilibrium: Φ_i = Φ_INT, for all players i
- Generalized Minimax Equilibrium: Φ_i = Φ_EXT, for all players i
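A generic check is possible for two players and finite Φ_i sets. In the sketch below, which is my interpretation rather than code from the talk, φ_i(q) is realized by applying the swap matrix to player i's coordinate of the joint distribution; with Φ_i = Φ_INT this reduces exactly to condition (1):

```python
import numpy as np

def is_phi_equilibrium(Q, R1, R2, Phi1, Phi2, tol=1e-9):
    """Q: joint distribution over A1 x A2; R1, R2: payoff arrays;
    Phi1, Phi2: lists of swap matrices, one set per player.
    Checks r_i(phi_i(q)) <= r_i(q) for all players i and phi_i."""
    v1, v2 = np.sum(Q * R1), np.sum(Q * R2)
    for phi in Phi1:
        if np.sum((phi.T @ Q) * R1) > v1 + tol:   # remap player 1's actions
            return False
    for phi in Phi2:
        if np.sum((Q @ phi) * R2) > v2 + tol:     # remap player 2's actions
            return False
    return True
```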

29 Convergence Theorem
If all players i play via some no-Φ_i-regret algorithm, then the joint empirical distribution of play converges to the set of Φ-equilibria, almost surely.

Proof: For all players i and for all φ_i ∈ Φ_i, almost surely,

  limsup_{t→∞} [ r_i(φ_i(z^t)) − r_i(z^t) ]    (14)
    = limsup_{t→∞} [ (1/t) Σ_{τ=1}^t r_i(φ_i(a_i^τ), a_{-i}^τ) − (1/t) Σ_{τ=1}^t r_i(a_i^τ, a_{-i}^τ) ]    (15)
    ≤ 0    (16)

where z^t denotes the joint empirical distribution of play through time t.

30 Zero-Sum Games
Matching Pennies:
         H        T
  H    1,-1    -1,1
  T   -1,1     1,-1

Rock-Paper-Scissors:
         R        P        S
  R     0,0    -1,1     1,-1
  P    1,-1     0,0    -1,1
  S   -1,1     1,-1     0,0

31 Matching Pennies
[Figure: NER exponential [η = ]: Matching Pennies. Two panels vs. time: each player's weights on H and T (left) and the empirical frequencies of H and T (right).]

32 Rock-Paper-Scissors
[Figure: NER polynomial [p = 2]: Rock, Paper, Scissors. Two panels vs. time: each player's weights on R, P, and S (left) and the empirical frequencies (right).]

33 General-Sum Games
Shapley Game:
         L      C      R
  T    0,0    1,0    0,1
  M    0,1    0,0    1,0
  B    1,0    0,1    0,0

Correlated Equilibrium:
         L      C      R
  T     0     1/6    1/6
  M    1/6     0     1/6
  B    1/6    1/6     0

34 Shapley Game: No-Internal-Regret Learning Frequencies
[Figure: empirical frequencies of each player's actions (T, M, B and L, C, R) vs. time under NIR learning, polynomial [p = 2] (left) and exponential [η = ] (right).]

35 Shapley Game: No-Internal-Regret Learning Joint Frequencies
[Figure: empirical frequencies of the nine joint actions (T,L) through (B,R) vs. time under NIR learning, polynomial [p = 2] (left) and exponential [η = ] (right).]

36 Shapley Game: No-External-Regret Learning Frequencies
[Figure: empirical frequencies of each player's actions (T, M, B and L, C, R) vs. time under NER learning, polynomial [p = 2] (left) and exponential [η = ] (right).]

37 Summary
- No-external- and no-internal-regret can be defined along one continuum, no-φ-regret.
- No-Φ-regret learning algorithms exist, for all Φ.
- No-Φ-regret learning converges to the set of Φ-equilibria, for all Φ.
- No-internal-regret learning is the strongest form of no-φ-regret learning. Therefore, Nash equilibrium cannot be learned via no-φ-regret learning.

38 "A little rationality goes a long way" [Hart 03]
Regret Minimization vs. Utility Maximization:
- RM is easy to implement.
- RM justifies randomness in actions.
- Can RM be used to explain human behavior?
