Game-Theoretic Learning: Regret Minimization vs. Utility Maximization
1 Game-Theoretic Learning: Regret Minimization vs. Utility Maximization
Amy Greenwald, with David Gondek, Amir Jafari, and Casey Marks
Brown University and University of Pennsylvania
November 17, 2004
2 Background
No-external-regret learning converges to the set of minimax equilibria in zero-sum games [e.g., Freund and Schapire 1996].
No-internal-regret learning converges to the set of correlated equilibria in general-sum games [e.g., Foster and Vohra 1997].
3 Foreground
1. Definitions: a continuum of no-regret properties, called no-φ-regret, and a continuum of game-theoretic equilibria, called Φ-equilibria.
2. Existence Theorem (constructive proof): no-Φ-regret learning algorithms exist, for all Φ.
3. Convergence Theorem: no-Φ-regret learning converges to the set of Φ-equilibria, for all Φ.
4. Surprising Result: no-internal-regret is the strongest form of no-φ-regret learning. Therefore, no no-φ-regret algorithm learns Nash equilibria.
4 Outline
Game Theory
Single Agent Learning Model
Multiagent Learning & Game-Theoretic Equilibria
5 Game Theory: A Crash Course
1. General-Sum Games: Nash Equilibrium, Correlated Equilibrium
2. Zero-Sum Games: Minimax Equilibrium
6 An Example: Prisoners' Dilemma
      C     D
C    4,4   0,5
D    5,0   1,1
C: Cooperate, D: Defect
7 One-Shot Games
A one-shot game is a 3-tuple Γ = (I, (A_i, r_i)_{i∈I}), where
- I is a set of players
- for all players i ∈ I, a set of pure actions A_i, with a_i ∈ A_i
- a reward function r_i : A → ℝ, where A = ∏_{i∈I} A_i, with a ∈ A
8 One-Shot Games (continued)
The players can employ randomized, or mixed, actions:
- for all players i ∈ I, a set of mixed actions Q_i = {q_i ∈ ℝ^{A_i} : Σ_j q_{ij} = 1 and q_{ij} ≥ 0, for all j}, with q_i ∈ Q_i
- an expected reward function r_i : Q → ℝ, where Q = ∏_{i∈I} Q_i, with q ∈ Q, s.t. r_i(q) = Σ_{a∈A} q(a) r_i(a)
9 Nash Equilibrium
Notation: Write a = (a_i, a_{−i}) ∈ A for a_i ∈ A_i and a_{−i} ∈ A_{−i} = ∏_{j≠i} A_j.
Write q = (q_i, q_{−i}) ∈ Q for q_i ∈ Q_i and q_{−i} ∈ Q_{−i} = ∏_{j≠i} Q_j.
Definition: A Nash equilibrium is a mixed action profile q* s.t. r_i(q*) ≥ r_i(q_i, q*_{−i}), for all players i and for all mixed actions q_i ∈ Q_i.
Theorem [Nash 51]: Every finite strategic-form game has a mixed-strategy Nash equilibrium.
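The Nash condition on this slide can be checked by brute force on a small game. A minimal sketch (ours, not from the deck), using the Prisoners' Dilemma payoffs from the earlier slide; the helper names are our own:

```python
# Brute-force check of pure Nash equilibria in the Prisoners' Dilemma.
A = ["C", "D"]
# r1[(a1, a2)] is the row player's reward; the game is symmetric.
r1 = {("C", "C"): 4, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
r2 = {(a, b): r1[(b, a)] for (a, b) in r1}

def is_pure_nash(a1, a2):
    # No player gains by a unilateral deviation.
    return (all(r1[(a1, a2)] >= r1[(d, a2)] for d in A) and
            all(r2[(a1, a2)] >= r2[(a1, d)] for d in A))

equilibria = [(a1, a2) for a1 in A for a2 in A if is_pure_nash(a1, a2)]
print(equilibria)  # -> [('D', 'D')]
```

Mutual defection is the unique pure equilibrium, even though (C, C) gives both players more, which is exactly the dilemma.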
10 Correlated Equilibrium
Chicken:                 CE:
      L     R                  L     R
T    6,6   2,7           T   1/2   1/4
B    7,2   0,0           B   1/4    0

max 12π_TL + 9π_TR + 9π_BL + 0π_BR
subject to
π_TL + π_TR + π_BL + π_BR = 1
π_TL, π_TR, π_BL, π_BR ≥ 0
6π_TL + 2π_TR ≥ 7π_TL + 0π_TR
7π_BL + 0π_BR ≥ 6π_BL + 2π_BR
6π_TL + 2π_BL ≥ 7π_TL + 0π_BL
7π_TR + 0π_BR ≥ 6π_TR + 2π_BR
12 Correlated Equilibrium
Definition: A mixed action profile q ∈ Q is a correlated equilibrium iff for all pure actions j, k ∈ A_i,
Σ_{a_{−i} ∈ A_{−i}} q(j, a_{−i}) (r_i(j, a_{−i}) − r_i(k, a_{−i})) ≥ 0   (1)
Observe:
- Every Nash equilibrium is a correlated equilibrium.
- Every finite normal-form game has a correlated equilibrium.
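Inequality (1) can be verified directly for the Chicken distribution on the earlier slide. A hedged sketch (our own code; the payoff and distribution values come from the deck, the function names are ours), using exact rationals to avoid rounding:

```python
from fractions import Fraction as F

# Chicken payoffs: r1 is the row player's reward, r2 the column player's.
r1 = {("T", "L"): 6, ("T", "R"): 2, ("B", "L"): 7, ("B", "R"): 0}
r2 = {("T", "L"): 6, ("T", "R"): 7, ("B", "L"): 2, ("B", "R"): 0}
# Candidate correlated equilibrium from the slide.
pi = {("T", "L"): F(1, 2), ("T", "R"): F(1, 4),
      ("B", "L"): F(1, 4), ("B", "R"): F(0)}

def is_ce(pi):
    rows, cols = ["T", "B"], ["L", "R"]
    # Row player: recommended j, considering deviation k.
    for j in rows:
        for k in rows:
            if sum(pi[(j, c)] * (r1[(j, c)] - r1[(k, c)]) for c in cols) < 0:
                return False
    # Column player, symmetrically.
    for j in cols:
        for k in cols:
            if sum(pi[(r, j)] * (r2[(r, j)] - r2[(r, k)]) for r in rows) < 0:
                return False
    return True

print(is_ce(pi))  # -> True
```

A point mass on (T, L) fails the same test, since a row player told T would rather play B; the correlation is what makes the distribution self-enforcing.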
13 Zero-Sum Games
Matching Pennies:
       H      T
H    1,−1   −1,1
T   −1,1    1,−1

Rock-Paper-Scissors:
       R      P      S
R    0,0   −1,1    1,−1
P    1,−1   0,0   −1,1
S   −1,1    1,−1   0,0

Zero-sum: Σ_{i∈I} r_i(a) = 0, for all a ∈ A
Constant-sum: Σ_{i∈I} r_i(a) = c, for all a ∈ A, for some c ∈ ℝ
14 Minimax Equilibrium
Example:
      L   R
T    1   2
B    4   3
Definition: A mixed action profile (q*_1, q*_2) ∈ Q is a minimax equilibrium in a two-player, zero-sum game iff
r_1(q*_1, q*_2) ≥ r_1(j, q*_2), for all j ∈ A_1
ℓ_2(q*_1, q*_2) ≤ ℓ_2(q*_1, k), for all k ∈ A_2
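For the 2x2 example on this slide the equilibrium can be found by comparing security levels. A minimal sketch (ours, not from the deck); this game happens to have a pure saddle point, so no mixing is needed:

```python
# Row player's rewards (equivalently, column player's losses).
M = [[1, 2],
     [4, 3]]

# The row player can guarantee max over rows of the row's minimum;
# the column player can cap the loss at min over columns of the column's maximum.
maximin = max(min(row) for row in M)
minimax = min(max(M[i][j] for i in range(2)) for j in range(2))
print(maximin, minimax)  # -> 3 3  (pure saddle point at (B, R), value 3)
```

When the two values coincide, as here, the corresponding pure actions form a minimax equilibrium; when they differ, the equilibrium requires mixed actions.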
15 Single Agent Learning Model
- set of actions N = {1, ..., n}
- for all times t:
  - mixed action vector q^t ∈ Q = {q ∈ ℝ^n : Σ_i q_i = 1 and q_i ≥ 0, for all i}
  - pure action vector a^t = e_i, for some pure action i
  - reward vector r^t = (r_1, ..., r_n) ∈ [0,1]^n
A learning algorithm A is a sequence of functions q^t : History^{t−1} → Q, where a History is a sequence of action-reward pairs (a^1, r^1), (a^2, r^2), ...
16 Transformations
Mixed Transformations:
Φ_LINEAR = {φ : Q → Q linear} = the set of all row-stochastic matrices
Φ_SWAP = {φ : Q → Q : φ deterministic} ⊆ Φ_LINEAR
Pure Transformations:
F_SWAP = {F : N → N} = the set of all pure transformations
17 Isomorphism
The operation of elements of F_SWAP on N = the operation of elements of Φ_SWAP on Q:
φ_ij = δ_{F(i)=j}   (2)
e_k φ = e_{F(k)}, for all k   (3)
Example: If n = 4 and F = {1→2, 2→3, 3→4, 4→1}, then
φ =
 0 1 0 0
 0 0 1 0
 0 0 0 1
 1 0 0 0
Thus ⟨q_1, q_2, q_3, q_4⟩ φ = ⟨q_4, q_1, q_2, q_3⟩, for all q ∈ Q.
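Equation (3) says a swap acts on a mixed action by right-multiplication with a 0/1 matrix. A hedged illustration (ours), using the cyclic F from the example, shifted to 0-based indices:

```python
# F = {1->2, 2->3, 3->4, 4->1} from the slide, 0-indexed.
F = {0: 1, 1: 2, 2: 3, 3: 0}
# phi[i][j] = 1 iff F(i) = j  (equation (2)).
phi = [[1 if F[i] == j else 0 for j in range(4)] for i in range(4)]

def apply(q, phi):
    # Row vector times matrix: (q phi)_j = sum_k q_k * phi[k][j].
    return [sum(q[k] * phi[k][j] for k in range(4)) for j in range(4)]

q = [0.1, 0.2, 0.3, 0.4]
print(apply(q, phi))  # -> [0.4, 0.1, 0.2, 0.3]
```

The output is the cyclic shift promised by the slide: each coordinate's probability mass lands on its image under F.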
18 External Regret Matrices
F_EXT = {F_j ∈ F_SWAP : j ∈ N}, where F_j(k) = j
Φ_EXT = {φ_j ∈ Φ_SWAP : j ∈ N}, where e_k φ_j = e_j
Example: If n = 4, then
φ_2 =
 0 1 0 0
 0 1 0 0
 0 1 0 0
 0 1 0 0
Thus ⟨q_1, q_2, q_3, q_4⟩ φ_2 = ⟨0, 1, 0, 0⟩, for all q ∈ Q.
19 Internal Regret Matrices
F_INT = {F_ij ∈ F_SWAP : i, j ∈ N}, where F_ij(k) = j if k = i, and k otherwise
Φ_INT = {φ_ij ∈ Φ_SWAP : i, j ∈ N}, where e_k φ_ij = e_j if k = i, and e_k otherwise
Example: If n = 4, then
φ_23 =
 1 0 0 0
 0 0 1 0
 0 0 1 0
 0 0 0 1
Thus ⟨q_1, q_2, q_3, q_4⟩ φ_23 = ⟨q_1, 0, q_2 + q_3, q_4⟩, for all q ∈ Q.
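An internal transformation moves all probability from one action onto another and leaves everything else alone. A hedged sketch (ours; the constructor name is our own) reproducing the φ_23 example:

```python
def phi_internal(i, j, n):
    # Row k of phi_ij is e_j if k == i, else e_k.
    rows = []
    for k in range(n):
        target = j if k == i else k
        rows.append([1 if c == target else 0 for c in range(n)])
    return rows

def apply(q, phi):
    n = len(q)
    return [sum(q[k] * phi[k][c] for k in range(n)) for c in range(n)]

phi23 = phi_internal(1, 2, 4)   # actions 2 and 3 from the slide, 0-indexed
q = [0.1, 0.2, 0.3, 0.4]
print(apply(q, phi23))  # -> [0.1, 0.0, 0.5, 0.4]
```

The mass on action 2 has been rerouted onto action 3, matching ⟨q_1, 0, q_2 + q_3, q_4⟩.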
20 Regret Vector ρ ∈ ℝ^Φ
Observed Regret Vector: ρ_φ(r, a) = r · aφ − r · a
Expected Regret Vector: ρ̂_φ(r, q) = E[ρ_φ(r, a) | a ∼ q] = ρ_φ(r, E[a | a ∼ q]) = r · qφ − r · q
No Observed Φ-Regret: limsup_t (1/t) Σ_{τ=1}^t ρ_φ(r^τ, a^τ) ≤ 0, for all φ ∈ Φ, a.s.
No Expected Φ-Regret: limsup_t (1/t) Σ_{τ=1}^t ρ̂_φ(r^τ, q^τ) ≤ 0, for all φ ∈ Φ
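For Φ_EXT these formulas are especially simple: coordinate j of the expected regret vector is just r_j − r · q. A hedged sketch (ours; the numbers are an arbitrary made-up instance, not from the deck):

```python
def expected_regret_ext(r, q):
    # rho_hat_{phi_j}(r, q) = r . (q phi_j) - r . q = r_j - r . q,
    # since q phi_j = e_j for the constant transformation phi_j.
    n = len(q)
    rq = sum(r[k] * q[k] for k in range(n))
    return [r[j] - rq for j in range(n)]

r = [0.0, 0.5, 1.0]
q = [0.2, 0.3, 0.5]
# r . q = 0.65, so the regret vector is approximately [-0.65, -0.15, 0.35].
print(expected_regret_ext(r, q))
```

Positive coordinates flag transformations that would have improved the reward; the no-regret conditions above require their time averages to vanish.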
21 Approachability
A set U ⊆ V is said to be approachable iff there exists a learning algorithm A = q^1, q^2, ... s.t. for any sequence of rewards r^1, r^2, ...,
lim_t d(U, ρ̄^t) = lim_t inf_{u∈U} d(u, ρ̄^t) = 0 a.s.,
where ρ̄^t denotes the average value of ρ through time t.
A no-Φ-regret learning algorithm is one whose observed regret approaches the negative orthant ℝ^Φ_−.
22 Blackwell's Theorem
The negative orthant ℝ^Φ_− is approachable iff there exists a learning algorithm A = q^1, q^2, ... s.t. for any sequence of rewards r^1, r^2, ...,
ρ(r^{t+1}, q^{t+1}) · (ρ̄^t)^+ ≤ 0   (4)
for all times t, where x^+ = max{x, 0}.
Moreover, this procedure can be used to approach the negative orthant ℝ^Φ_−: if ρ̄^t ∈ ℝ^Φ_−, play arbitrarily; if ρ̄^t ∈ V \ ℝ^Φ_−, play according to A.
23 Regret Matching Algorithm
Given Φ. Given Y ∈ ℝ^Φ_+.
If Σ_{φ∈Φ} Y_φ = 0, play arbitrarily.
If Σ_{φ∈Φ} Y_φ > 0, define the stochastic matrix
A = A(Φ, Y) = (Σ_{φ∈Φ} φ Y_φ) / (Σ_{φ∈Φ} Y_φ)   (5)
and play a mixed strategy q s.t. q = qA.
Regret Matching Theorem: Regret matching satisfies the generalized Blackwell condition: ρ(r, q) · Y = 0.
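For Φ_EXT the fixed point q = qA has a closed form: every row of each φ_j is e_j, so every row of A equals Y normalized, and that common row is itself the fixed point. A hedged sketch of this special case (ours; the function name is our own):

```python
def regret_matching_ext(Y):
    # Y holds one nonnegative weight per constant transformation phi_j.
    # For Phi_EXT every row of A = (sum_j phi_j Y_j) / (sum_j Y_j) equals
    # Y / sum(Y), so q A = Y / sum(Y) for every q: that row is the fixed point.
    s = sum(Y)
    if s == 0:
        n = len(Y)
        return [1.0 / n] * n   # play arbitrarily; uniform is one choice
    return [y / s for y in Y]

Y = [2.0, 1.0, 1.0]   # e.g., positive parts of cumulative external regrets
print(regret_matching_ext(Y))  # -> [0.5, 0.25, 0.25]
```

For richer sets such as Φ_INT the rows of A differ, and the fixed point must be computed as a stationary distribution of A rather than read off directly.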
24 Proof
ρ(r, q) · Y = Σ_{φ∈Φ} ρ_φ(r, q) Y_φ   (6)
= Σ_{φ∈Φ} (r · qφ − r · q) Y_φ   (7)
= Σ_{φ∈Φ} r · (qφ Y_φ − q Y_φ)   (8)
= r · (q Σ_{φ∈Φ} φ Y_φ − q Σ_{φ∈Φ} Y_φ)   (9)
= (Σ_{φ∈Φ} Y_φ) r · (q (Σ_{φ∈Φ} φ Y_φ) / (Σ_{φ∈Φ} Y_φ) − q)   (10)
= (Σ_{φ∈Φ} Y_φ) r · (qA − q)   (11)
= (Σ_{φ∈Φ} Y_φ) r · (q − q)   (12)
= 0   (13)
where (12) uses the fixed point q = qA.
25 Generic Regret Matching Algorithm (Φ, g)
for t = 1, ..., T:
1. play mixed strategy q^t
2. realize pure action i
3. observe rewards r^t
4. for all φ ∈ Φ, compute the instantaneous regret
   observed: ρ^t_φ = ρ_φ(r^t, e_i) = r^t · e_i φ − r^t · e_i
   expected: ρ^t_φ = ρ_φ(r^t, q^t) = r^t · q^t φ − r^t · q^t
   and update the cumulative regret vector X^t_φ = X^{t−1}_φ + ρ^t_φ
5. compute Y = g(X^t)
6. compute A = (Σ_{φ∈Φ} φ Y_φ) / (Σ_{φ∈Φ} Y_φ)
7. solve for the fixed point q^{t+1} = q^{t+1} A
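The loop above can be sketched for the external-regret case with the polynomial link g_k(X) = X_k^+. A hedged sketch (ours, not the authors' code); the fixed reward vector is an arbitrary made-up instance just to exercise the update:

```python
import random

def step(q, X, r, n):
    # Steps 2-3: sample a pure action from q and observe rewards.
    i = random.choices(range(n), weights=q)[0]
    # Step 4 (observed): regret toward phi_j is r_j - r_i.
    for j in range(n):
        X[j] += r[j] - r[i]
    # Steps 5-7: Y = g(X) = X^+; for Phi_EXT the fixed point of q = qA
    # is Y normalized (all rows of A coincide).
    Y = [max(x, 0.0) for x in X]
    s = sum(Y)
    q = [y / s for y in Y] if s > 0 else [1.0 / n] * n
    return q, X

random.seed(0)
n, T = 3, 2000
q, X = [1.0 / n] * n, [0.0] * n
r = [0.2, 0.5, 0.9]   # fixed rewards, best action is index 2
for _ in range(T):
    q, X = step(q, X, r, n)
print(q)  # q concentrates (nearly) all weight on the best action
```

Against a stationary reward vector the cumulative regrets toward the inferior actions turn negative and the play locks onto the best action; against an adaptive opponent the same loop keeps the average positive regret shrinking instead.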
26 Special Cases of Regret Matching
Foster and Vohra 97 (Φ_INT); Hart and Mas-Colell 00 (Φ_EXT):
Choose G(X) = (1/2) Σ_k (X^+_k)^2, so that g_k(X) = X^+_k.
Freund and Schapire 95 (Φ_EXT); Cesa-Bianchi and Lugosi 03 (Φ_INT):
Choose G(X) = (1/η) ln(Σ_k e^{ηX_k}), so that g_k(X) = e^{ηX_k} / Σ_k e^{ηX_k}.
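The two link functions on this slide can be written out directly. A hedged sketch (ours; the input vector is a made-up example):

```python
import math

def g_polynomial(X):
    # Gradient of G(X) = 1/2 * sum_k (X_k^+)^2.
    return [max(x, 0.0) for x in X]

def g_exponential(X, eta=1.0):
    # Gradient of G(X) = (1/eta) * ln(sum_k exp(eta * X_k)).
    Z = sum(math.exp(eta * x) for x in X)
    return [math.exp(eta * x) / Z for x in X]

X = [1.0, -2.0, 0.5]
print(g_polynomial(X))    # -> [1.0, 0.0, 0.5]
print(g_exponential(X))   # a distribution favoring the largest regret
```

The polynomial link ignores negative regrets entirely (Hart and Mas-Colell style), while the exponential link always assigns some weight to every transformation, with η controlling how sharply it favors the largest cumulative regret (Freund and Schapire style).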
27 Multiagent Model
- a set of players I (i ∈ I)
- for all players i:
  - a set of pure actions A_i, with a_i ∈ A_i
  - a set of mixed actions Q_i, with q_i ∈ Q_i
  - a reward function r_i : A → [0,1], where A = ∏_i A_i, with a ∈ A
  - an expected reward function r_i : Q → [0,1], where Q = ∏_i Q_i, with q ∈ Q, s.t. r_i(q) = Σ_{a∈A} q(a) r_i(a)
  - a set Φ_i (φ_i ∈ Φ_i)
28 Φ-Equilibrium
A mixed action profile q ∈ Q is a Φ-equilibrium iff r_i(φ_i(q)) ≤ r_i(q), for all players i and for all φ_i ∈ Φ_i.
Examples:
- Correlated Equilibrium: Φ_i = Φ_INT, for all players i
- Generalized Minimax Equilibrium: Φ_i = Φ_EXT, for all players i
29 Convergence Theorem
If all players i play via some no-Φ_i-regret algorithm, then the joint empirical distribution of play converges to the set of Φ-equilibria, almost surely.
Proof: For all players i and for all φ_i ∈ Φ_i, almost surely,
limsup_t [ r_i(φ_i(z^t)) − r_i(z^t) ]   (14)
= limsup_t [ (1/t) Σ_{τ=1}^t r_i(φ_i(a^τ_i), a^τ_{−i}) − (1/t) Σ_{τ=1}^t r_i(a^τ_i, a^τ_{−i}) ]   (15)
≤ 0   (16)
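The theorem can be exercised numerically. A hedged simulation (ours, not the deck's experiments): two players run external-regret matching in Matching Pennies with rewards rescaled to [0, 1], and the average positive regret of both players shrinks over time, as the proof requires:

```python
import random

# Rewards in [0, 1]: player 1 wants to match, player 2 to mismatch.
R1 = [[1, 0], [0, 1]]
R2 = [[0, 1], [1, 0]]

def mix(X):
    # Regret matching for Phi_EXT: play proportional to positive regrets.
    Y = [max(x, 0.0) for x in X]
    s = sum(Y)
    return [y / s for y in Y] if s > 0 else [0.5, 0.5]

def play(q):
    return random.choices([0, 1], weights=q)[0]

random.seed(1)
T = 5000
X1, X2 = [0.0, 0.0], [0.0, 0.0]
for t in range(T):
    a1, a2 = play(mix(X1)), play(mix(X2))
    for j in range(2):   # external regret: r(j, other) - r(played, other)
        X1[j] += R1[j][a2] - R1[a1][a2]
        X2[j] += R2[a1][j] - R2[a1][a2]

avg1 = max(max(x, 0.0) for x in X1) / T
avg2 = max(max(x, 0.0) for x in X2) / T
print(avg1 < 0.1 and avg2 < 0.1)  # -> True
```

Blackwell's theorem bounds the average positive regret by O(1/sqrt(t)), so at T = 5000 both averages sit well below the 0.1 threshold; the empirical frequencies correspondingly hover around the uniform minimax mixture.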
30 Zero-Sum Games
Matching Pennies:
       H      T
H    1,−1   −1,1
T   −1,1    1,−1
Rock-Paper-Scissors:
       R      P      S
R    0,0   −1,1    1,−1
P    1,−1   0,0   −1,1
S   −1,1    1,−1   0,0
31 Matching Pennies
[Figures: action weights and empirical frequencies over time for no-external-regret (NER) learning with exponential updates in Matching Pennies, showing both players' H and T actions.]
32 Rock-Paper-Scissors
[Figures: action weights and empirical frequencies over time for NER learning with polynomial updates (p = 2) in Rock-Paper-Scissors, showing both players' R, P, and S actions.]
33 General-Sum Games
Shapley Game:            Correlated Equilibrium:
      L    C    R              L     C     R
T   0,0  1,0  0,1        T    0   1/6   1/6
M   0,1  0,0  1,0        M  1/6     0   1/6
B   1,0  0,1  0,0        B  1/6   1/6     0
34 Shapley Game: No Internal Regret Learning, Frequencies
[Figures: empirical action frequencies over time under no-internal-regret (NIR) learning with polynomial (p = 2) and exponential updates, for both players' actions T, M, B and L, C, R.]
35 Shapley Game: No Internal Regret Learning, Joint Frequencies
[Figures: empirical joint-action frequencies over time under NIR learning with polynomial (p = 2) and exponential updates, for all nine action pairs (T,L) through (B,R).]
36 Shapley Game: No External Regret Learning, Frequencies
[Figures: empirical action frequencies over time under NER learning with polynomial (p = 2) and exponential updates, for both players' actions.]
37 Summary
- No-external- and no-internal-regret can be defined along one continuum, no-φ-regret.
- No-Φ-regret learning algorithms exist, for all Φ.
- No-Φ-regret learning converges to the set of Φ-equilibria, for all Φ.
- No-internal-regret learning is the strongest form of no-φ-regret learning. Therefore, Nash equilibrium cannot be learned via no-φ-regret learning.
38 A little rationality goes a long way [Hart 03]
Regret Minimization vs. Utility Maximization:
- RM is easy to implement.
- RM justifies randomness in actions.
- Can RM be used to explain human behavior?
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationCorrelated Q-Learning
Journal of Machine Learning Research 1 (27) 1 1 Submitted /; Published / Correlated Q-Learning Amy Greenwald Department of Computer Science Brown University Providence, RI 2912 Keith Hall Department of
More informationGame Theory Lecture 7 Refinements: Perfect equilibrium
Game Theory Lecture 7 Refinements: Perfect equilibrium Christoph Schottmüller University of Copenhagen October 23, 2014 1 / 18 Outline 1 Perfect equilibrium in strategic form games 2 / 18 Refinements I
More informationMultiagent Systems Motivation. Multiagent Systems Terminology Basics Shapley value Representation. 10.
Multiagent Systems July 2, 2014 10. Coalition Formation Multiagent Systems 10. Coalition Formation B. Nebel, C. Becker-Asano, S. Wöl Albert-Ludwigs-Universität Freiburg July 2, 2014 10.1 Motivation 10.2
More informationWriting Game Theory in L A TEX
Writing Game Theory in L A TEX Thiago Silva First Version: November 22, 2015 This Version: November 13, 2017 List of Figures and Tables 1 2x2 Matrix: Prisoner s ilemma Normal-Form Game............. 3 2
More informationDistributed Learning based on Entropy-Driven Game Dynamics
Distributed Learning based on Entropy-Driven Game Dynamics Bruno Gaujal joint work with Pierre Coucheney and Panayotis Mertikopoulos Inria Aug., 2014 Model Shared resource systems (network, processors)
More informationEvolution & Learning in Games
1 / 27 Evolution & Learning in Games Econ 243B Jean-Paul Carvalho Lecture 2. Foundations of Evolution & Learning in Games II 2 / 27 Outline In this lecture, we shall: Take a first look at local stability.
More informationREPEATED GAMES. Jörgen Weibull. April 13, 2010
REPEATED GAMES Jörgen Weibull April 13, 2010 Q1: Can repetition induce cooperation? Peace and war Oligopolistic collusion Cooperation in the tragedy of the commons Q2: Can a game be repeated? Game protocols
More informationMulti-Agent Learning with Policy Prediction
Multi-Agent Learning with Policy Prediction Chongjie Zhang Computer Science Department University of Massachusetts Amherst, MA 3 USA chongjie@cs.umass.edu Victor Lesser Computer Science Department University
More informationIntroduction to Bandit Algorithms. Introduction to Bandit Algorithms
Stochastic K-Arm Bandit Problem Formulation Consider K arms (actions) each correspond to an unknown distribution {ν k } K k=1 with values bounded in [0, 1]. At each time t, the agent pulls an arm I t {1,...,
More informationCS 798: Multiagent Systems
CS 798: Multiagent Systems and Utility Kate Larson Cheriton School of Computer Science University of Waterloo January 6, 2010 Outline 1 Self-Interested Agents 2 3 4 5 Self-Interested Agents We are interested
More informationMicroeconomics. 2. Game Theory
Microeconomics 2. Game Theory Alex Gershkov http://www.econ2.uni-bonn.de/gershkov/gershkov.htm 18. November 2008 1 / 36 Dynamic games Time permitting we will cover 2.a Describing a game in extensive form
More informationEvolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *
Evolutionary Dynamics and Extensive Form Games by Ross Cressman Reviewed by William H. Sandholm * Noncooperative game theory is one of a handful of fundamental frameworks used for economic modeling. It
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Noga Alon (Tel-Aviv University) Ofer Dekel (Microsoft Research) Tomer Koren (Technion and Microsoft
More informationPerformance and Convergence of Multi-user Online Learning
Performance and Convergence of Multi-user Online Learning Cem Tekin, Mingyan Liu Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor, Michigan, 4809-222 Email: {cmtkn,
More informationMixed Nash Equilibria
lgorithmic Game Theory, Summer 2017 Mixed Nash Equilibria Lecture 2 (5 pages) Instructor: Thomas Kesselheim In this lecture, we introduce the general framework of games. Congestion games, as introduced
More informationFirst Prev Next Last Go Back Full Screen Close Quit. Game Theory. Giorgio Fagiolo
Game Theory Giorgio Fagiolo giorgio.fagiolo@univr.it https://mail.sssup.it/ fagiolo/welcome.html Academic Year 2005-2006 University of Verona Summary 1. Why Game Theory? 2. Cooperative vs. Noncooperative
More informationLearning Equilibrium as a Generalization of Learning to Optimize
Learning Equilibrium as a Generalization of Learning to Optimize Dov Monderer and Moshe Tennenholtz Faculty of Industrial Engineering and Management Technion Israel Institute of Technology Haifa 32000,
More information