Prediction and Playing Games
|
|
- Beryl Short
- 5 years ago
- Views:
Transcription
1 Prediction and Playing Games Vineel Pratap February 20, 204 Chapter 7 : Prediction, Learning and Games - Cesa Binachi & Lugosi
2 K-Person Normal Form Games Each player k (k =,..., K) has N k possible actions - i k,..., N K K-tuple of all the players actions i = (i,..., Ii K ) K k= {,..., N k} Loss suffered by player k is l (k), where l (k) : i [0, ]
3 Mixed strategy Mixed strategy for player k is a probability distribution p k = (p (k),..., p(k) N k ) Action played by player k, I (k) is a random variable taking values in the set {,..., N k } according to the distribution p (k) K-tuple of actions played by all players, I = (I (),..., I (K) ) Mixed strategy profile, π(i) = P[I = i] = p () i for all i = (i,...i k ) K k= {,..., N k} x... x p (K) i K
4 Nash Equilibrium The expected loss of player is πl (k) E l (k) (I) N =... i = N K i K = p () i x... x p (K) i K l (k) (i,..., i K ) A mixed strategy profile π = p () x... x p (K) is called a Nash Equilibrium if π l k π k lk for all k =,..., K and mixed strategies q (k), π k = p() x... x q (k) x... x p (K) Nash Theorem : Every finite game has a mixed strategy Nash equilibrium
5 Two-Person Zero-Sum Games For each pair of actions i = (i, i 2 ), where i {,..., N } and i 2 {,..., N 2 }, the losses of the two players satisfy l () (i) = l (2) (i) Simplifying notation - replace l (), N, N 2 by l, N, M Mixed strategy profile π = p x q, where p = (p,..., p N ) and q = (q,..., q M ), is a Nash equilibrium if and only if for all p = (p,..., p N ) and q = (,..., q M ), N M p i q j l(i, j) N i= j= M p i q j l(i, j) i= j= N i= j= M p i q j l(i, j)
6 Two-Person Zero-Sum Games Introduce notation l(p, q) = N max p max max M i= j= p i q j l(i, j) l(p, ) = l(p, q) = l(p, q) p l(p, ) max l(p, ) p l(p, ) max Also, for all p and, l(p, ) p l(p, ) max p p p l(p, ) () l(p, ) max l(p, ) for all p max l(p, ) max p l(p, ) (2)
7 von Neumann s imax theorem From () & (2), the existence of Nash equilibrium p x q implies that max l(p, ) = max l(p, ) p p The common value is called the value of the game, V Nash equilibrium, p x q l(p, q) = V As far as I can see, there could be no theory of games... without that theorem... I thought there was nothing worth publishing until the Minimax Theorem was proved - John von Neumann, 928
8 Correlated Equilibrium A probability distribution P over the set K k= {,..., N k} of all possible K-tuples of actions is called a correlated equilibrium if for all k =,..., K, E l (k) (I) E l (k) (I, Ĩ (k) ), where the r.v. I = (I (),..., I (K) ) is distributed according to P and (I, Ĩ (k) ) = (I (),..., I (k ), Ĩ (k), I (k+),..., I (K) ), where Ĩ (k) is an arbitrary {,..., N k }-valued r.v. that is a function of I (k)
9 Correlated Equilibrium Lemma: A probability distribution P over the set of all K-tuples i = (i,..., i K ) of actions is a correlated equilibrium if and only if, for every player k {,..., K} and actions j, j {,..., N k }, we have P(i) (l (k) (i) l (k) (i, j )) 0 i:i k =j where (i, j ) = (i,..., i k, j, i k+,..., i K ).
10 Repeated Two-Player Zero-Sum Games At each time instant t =, 2,... player k(k =,..., K) selects a mixed strategy p (k) t = (p (k),t,..., p(k) N k,t ) over the set,..., N k of his actions and draws an action I (k) t according to the distribution. Mixed strategy p (k) t may depend on the sequence of random variables I,..., I t
11 Repeated Two-Player Zero-Sum Games If all players play to keep their internal regret small, then the joint empirical frequencies of play converge to the set of correlated equilibria If every player uses a well-calibrated forecasting strategy to predict the K-tuple of actions I t and chooses an action that is the best reply to the forecasted distibution, the same convergence is also achieved.
12 Regret based strategies At each round t, based on the past plays of both players, the row player chooses an action I t {,..., N} according to the mixed strategy p t and the column player chooses an action J t {,..., M} according to the mixed strategy q t. If the row player knew the column player s actions J,..., J n in advance, he would choose I t = arg i=,...,n l(i, J t ) invoking a total loss n i=,...,nl(i, J t ). A meaningful objective is to imize the difference between the row player s cumulative loss and the cumulative loss of the best constant strategy, imize n l(i t, J t ) l(i, J t ) i=,...,n
13 Hannan consistent strategy A strategy is Hannan consistent if the regret is o(), regardless of how the column player behaves. Assug row player chooses his actions I t, regardless of what the column player does, ( lim sup n n l(i t, J t ) i=,...,n n ) l(i, J t ) 0 almost surely. For example, this may be achieved by the exponentially weighted average mixed strategy ( p i,t = exp η ) t s= l(i,js) N k= ( η exp ) i = {,.., N}, η > 0 t s= l(k,js)
14 Hannan consistent strategy Notation : l(p, j) = N M p i l(i, j) and l(i, q) = q j l(i, j) i= Theorem: Assume that in a two-person zero-sum game the row player plays according to a Hannan-consistent strategy. Then lim sup n n l(i t, J t ) V i= almost surely. Assug Hannan consistency, it suffices to show that i=,...,n n i=,...,n n l(i, J t ) V l(i, J t ) = p n l(p, J t )
15 Hannan consistent strategy Then, letting ˆq j,n = n n I J t=j be the emperical probability of the row player s action being j, p n l(p, J t ) = p M ˆq j,n l(p, j) j= = p l(p, ˆq n ) max q p l(p, q) = V. Corollary : Assume that in a two-person zero-sum game, both players play according to some Hannan consistent strategy. Then lim n n l(i t, J t ) = V almost surely.
Learning, Games, and Networks
Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to
More informationSmooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics
Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Sergiu Hart August 2016 Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics Sergiu Hart Center for the Study of Rationality
More informationGame-Theoretic Learning:
Game-Theoretic Learning: Regret Minimization vs. Utility Maximization Amy Greenwald with David Gondek, Amir Jafari, and Casey Marks Brown University University of Pennsylvania November 17, 2004 Background
More informationAdvanced Machine Learning
Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationApplications of on-line prediction. in telecommunication problems
Applications of on-line prediction in telecommunication problems Gábor Lugosi Pompeu Fabra University, Barcelona based on joint work with András György and Tamás Linder 1 Outline On-line prediction; Some
More informationZero-Sum Games Public Strategies Minimax Theorem and Nash Equilibria Appendix. Zero-Sum Games. Algorithmic Game Theory.
Public Strategies Minimax Theorem and Nash Equilibria Appendix 2013 Public Strategies Minimax Theorem and Nash Equilibria Appendix Definition Definition A zero-sum game is a strategic game, in which for
More informationLecture 14: Approachability and regret minimization Ramesh Johari May 23, 2007
MS&E 336 Lecture 4: Approachability and regret minimization Ramesh Johari May 23, 2007 In this lecture we use Blackwell s approachability theorem to formulate both external and internal regret minimizing
More informationTheory and Applications of A Repeated Game Playing Algorithm. Rob Schapire Princeton University [currently visiting Yahoo!
Theory and Applications of A Repeated Game Playing Algorithm Rob Schapire Princeton University [currently visiting Yahoo! Research] Learning Is (Often) Just a Game some learning problems: learn from training
More information4. Opponent Forecasting in Repeated Games
4. Opponent Forecasting in Repeated Games Julian and Mohamed / 2 Learning in Games Literature examines limiting behavior of interacting players. One approach is to have players compute forecasts for opponents
More informationFrom Bandits to Experts: A Tale of Domination and Independence
From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A
More informationGames A game is a tuple = (I; (S i ;r i ) i2i) where ffl I is a set of players (i 2 I) ffl S i is a set of (pure) strategies (s i 2 S i ) Q ffl r i :
On the Connection between No-Regret Learning, Fictitious Play, & Nash Equilibrium Amy Greenwald Brown University Gunes Ercal, David Gondek, Amir Jafari July, Games A game is a tuple = (I; (S i ;r i ) i2i)
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationLearning correlated equilibria in games with compact sets of strategies
Learning correlated equilibria in games with compact sets of strategies Gilles Stoltz a,, Gábor Lugosi b a cnrs and Département de Mathématiques et Applications, Ecole Normale Supérieure, 45 rue d Ulm,
More informationAlgorithmic Game Theory and Applications. Lecture 4: 2-player zero-sum games, and the Minimax Theorem
Algorithmic Game Theory and Applications Lecture 4: 2-player zero-sum games, and the Minimax Theorem Kousha Etessami 2-person zero-sum games A finite 2-person zero-sum (2p-zs) strategic game Γ, is a strategic
More informationGame Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin
Game Theory Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin Bimatrix Games We are given two real m n matrices A = (a ij ), B = (b ij
More informationGame Theory: Lecture 3
Game Theory: Lecture 3 Lecturer: Pingzhong Tang Topic: Mixed strategy Scribe: Yuan Deng March 16, 2015 Definition 1 (Mixed strategy). A mixed strategy S i : A i [0, 1] assigns a probability S i ( ) 0 to
More informationFull-information Online Learning
Introduction Expert Advice OCO LM A DA NANJING UNIVERSITY Full-information Lijun Zhang Nanjing University, China June 2, 2017 Outline Introduction Expert Advice OCO 1 Introduction Definitions Regret 2
More informationKAKUTANI S FIXED POINT THEOREM AND THE MINIMAX THEOREM IN GAME THEORY
KAKUTANI S FIXED POINT THEOREM AND THE MINIMAX THEOREM IN GAME THEORY YOUNGGEUN YOO Abstract. The imax theorem is one of the most important results in game theory. It was first introduced by John von Neumann
More informationCalibration and Nash Equilibrium
Calibration and Nash Equilibrium Dean Foster University of Pennsylvania and Sham Kakade TTI September 22, 2005 What is a Nash equilibrium? Cartoon definition of NE: Leroy Lockhorn: I m drinking because
More informationChapter 9. Mixed Extensions. 9.1 Mixed strategies
Chapter 9 Mixed Extensions We now study a special case of infinite strategic games that are obtained in a canonic way from the finite games, by allowing mixed strategies. Below [0, 1] stands for the real
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationSelecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden
1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized
More informationIntroduction to Bandit Algorithms. Introduction to Bandit Algorithms
Stochastic K-Arm Bandit Problem Formulation Consider K arms (actions) each correspond to an unknown distribution {ν k } K k=1 with values bounded in [0, 1]. At each time t, the agent pulls an arm I t {1,...,
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 203 Review of Zero-Sum Games At the end of last lecture, we discussed a model for two player games (call
More informationNotes on Coursera s Game Theory
Notes on Coursera s Game Theory Manoel Horta Ribeiro Week 01: Introduction and Overview Game theory is about self interested agents interacting within a specific set of rules. Self-Interested Agents have
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning
More informationPart IB Optimisation
Part IB Optimisation Theorems Based on lectures by F. A. Fischer Notes taken by Dexter Chua Easter 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after
More informationCS260: Machine Learning Theory Lecture 12: No Regret and the Minimax Theorem of Game Theory November 2, 2011
CS260: Machine Learning heory Lecture 2: No Regret and the Minimax heorem of Game heory November 2, 20 Lecturer: Jennifer Wortman Vaughan Regret Bound for Randomized Weighted Majority In the last class,
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Nicolò Cesa-Bianchi Università degli Studi di Milano Joint work with: Noga Alon (Tel-Aviv University) Ofer Dekel (Microsoft Research) Tomer Koren (Technion and Microsoft
More informationA Second-order Bound with Excess Losses
A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands
More informationBandit View on Continuous Stochastic Optimization
Bandit View on Continuous Stochastic Optimization Sébastien Bubeck 1 joint work with Rémi Munos 1 & Gilles Stoltz 2 & Csaba Szepesvari 3 1 INRIA Lille, SequeL team 2 CNRS/ENS/HEC 3 University of Alberta
More informationarxiv: v1 [cs.gt] 1 Sep 2015
HC selection for MCTS in Simultaneous Move Games Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games arxiv:1509.00149v1 [cs.gt] 1 Sep 2015 Vojtěch Kovařík vojta.kovarik@gmail.com
More information1 Equilibrium Comparisons
CS/SS 241a Assignment 3 Guru: Jason Marden Assigned: 1/31/08 Due: 2/14/08 2:30pm We encourage you to discuss these problems with others, but you need to write up the actual homework alone. At the top of
More informationLecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem
Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:
More informationMechanism Design: Implementation. Game Theory Course: Jackson, Leyton-Brown & Shoham
Game Theory Course: Jackson, Leyton-Brown & Shoham Bayesian Game Setting Extend the social choice setting to a new setting where agents can t be relied upon to disclose their preferences honestly Start
More informationCSC304 Lecture 5. Game Theory : Zero-Sum Games, The Minimax Theorem. CSC304 - Nisarg Shah 1
CSC304 Lecture 5 Game Theory : Zero-Sum Games, The Minimax Theorem CSC304 - Nisarg Shah 1 Recap Last lecture Cost-sharing games o Price of anarchy (PoA) can be n o Price of stability (PoS) is O(log n)
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca
More informationOnline learning CMPUT 654. October 20, 2011
Online learning CMPUT 654 Gábor Bartók Dávid Pál Csaba Szepesvári István Szita October 20, 2011 Contents 1 Shooting Game 4 1.1 Exercises...................................... 6 2 Weighted Majority Algorithm
More information6.254 : Game Theory with Engineering Applications Lecture 7: Supermodular Games
6.254 : Game Theory with Engineering Applications Lecture 7: Asu Ozdaglar MIT February 25, 2010 1 Introduction Outline Uniqueness of a Pure Nash Equilibrium for Continuous Games Reading: Rosen J.B., Existence
More informationToday: Linear Programming (con t.)
Today: Linear Programming (con t.) COSC 581, Algorithms April 10, 2014 Many of these slides are adapted from several online sources Reading Assignments Today s class: Chapter 29.4 Reading assignment for
More informationInternal Regret in On-line Portfolio Selection
Internal Regret in On-line Portfolio Selection Gilles Stoltz (gilles.stoltz@ens.fr) Département de Mathématiques et Applications, Ecole Normale Supérieure, 75005 Paris, France Gábor Lugosi (lugosi@upf.es)
More informationNear-Potential Games: Geometry and Dynamics
Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics
More informationNear-Potential Games: Geometry and Dynamics
Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics
More informationEASINESS IN BANDITS. Gergely Neu. Pompeu Fabra University
EASINESS IN BANDITS Gergely Neu Pompeu Fabra University EASINESS IN BANDITS Gergely Neu Pompeu Fabra University THE BANDIT PROBLEM Play for T rounds attempting to maximize rewards THE BANDIT PROBLEM Play
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationNotes for Week 3: Learning in non-zero sum games
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes for Week 3: Learning in non-zero sum games Instructor: Robert Kleinberg February 05 09, 2007 Notes by: Yogi Sharma 1 Announcements The material
More informationDeterministic Calibration and Nash Equilibrium
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2-2008 Deterministic Calibration and Nash Equilibrium Sham M. Kakade Dean P. Foster University of Pennsylvania Follow
More informationLearning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley
Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. Converting online to batch. Online convex optimization.
More informationCOMPARISON OF INFORMATION STRUCTURES IN ZERO-SUM GAMES. 1. Introduction
COMPARISON OF INFORMATION STRUCTURES IN ZERO-SUM GAMES MARCIN PESKI* Abstract. This note provides simple necessary and su cient conditions for the comparison of information structures in zero-sum games.
More informationDefinition Existence CG vs Potential Games. Congestion Games. Algorithmic Game Theory
Algorithmic Game Theory Definitions and Preliminaries Existence of Pure Nash Equilibria vs. Potential Games (Rosenthal 1973) A congestion game is a tuple Γ = (N,R,(Σ i ) i N,(d r) r R) with N = {1,...,n},
More informationLower Semicontinuity of the Solution Set Mapping in Some Optimization Problems
Lower Semicontinuity of the Solution Set Mapping in Some Optimization Problems Roberto Lucchetti coauthors: A. Daniilidis, M.A. Goberna, M.A. López, Y. Viossat Dipartimento di Matematica, Politecnico di
More information6.254 : Game Theory with Engineering Applications Lecture 8: Supermodular and Potential Games
6.254 : Game Theory with Engineering Applications Lecture 8: Supermodular and Asu Ozdaglar MIT March 2, 2010 1 Introduction Outline Review of Supermodular Games Reading: Fudenberg and Tirole, Section 12.3.
More informationThe No-Regret Framework for Online Learning
The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,
More informationTijmen Daniëls Universiteit van Amsterdam. Abstract
Pure strategy dominance with quasiconcave utility functions Tijmen Daniëls Universiteit van Amsterdam Abstract By a result of Pearce (1984), in a finite strategic form game, the set of a player's serially
More informationThe Folk Theorem for Finitely Repeated Games with Mixed Strategies
The Folk Theorem for Finitely Repeated Games with Mixed Strategies Olivier Gossner February 1994 Revised Version Abstract This paper proves a Folk Theorem for finitely repeated games with mixed strategies.
More informationCorrelated Equilibria: Rationality and Dynamics
Correlated Equilibria: Rationality and Dynamics Sergiu Hart June 2010 AUMANN 80 SERGIU HART c 2010 p. 1 CORRELATED EQUILIBRIA: RATIONALITY AND DYNAMICS Sergiu Hart Center for the Study of Rationality Dept
More information6.891 Games, Decision, and Computation February 5, Lecture 2
6.891 Games, Decision, and Computation February 5, 2015 Lecture 2 Lecturer: Constantinos Daskalakis Scribe: Constantinos Daskalakis We formally define games and the solution concepts overviewed in Lecture
More informationWalras-Bowley Lecture 2003
Walras-Bowley Lecture 2003 Sergiu Hart This version: September 2004 SERGIU HART c 2004 p. 1 ADAPTIVE HEURISTICS A Little Rationality Goes a Long Way Sergiu Hart Center for Rationality, Dept. of Economics,
More informationAgnostic Online learnability
Technical Report TTIC-TR-2008-2 October 2008 Agnostic Online learnability Shai Shalev-Shwartz Toyota Technological Institute Chicago shai@tti-c.org ABSTRACT We study a fundamental question. What classes
More informationCMU Noncooperative games 4: Stackelberg games. Teacher: Ariel Procaccia
CMU 15-896 Noncooperative games 4: Stackelberg games Teacher: Ariel Procaccia A curious game Playing up is a dominant strategy for row player So column player would play left Therefore, (1,1) is the only
More informationComputing Solution Concepts of Normal-Form Games. Song Chong EE, KAIST
Computing Solution Concepts of Normal-Form Games Song Chong EE, KAIST songchong@kaist.edu Computing Nash Equilibria of Two-Player, Zero-Sum Games Can be expressed as a linear program (LP), which means
More informationLINEAR PROGRAMMING III
LINEAR PROGRAMMING III ellipsoid algorithm combinatorial optimization matrix games open problems Lecture slides by Kevin Wayne Last updated on 7/25/17 11:09 AM LINEAR PROGRAMMING III ellipsoid algorithm
More informationCS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm
CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the
More informationGambling in a rigged casino: The adversarial multi-armed bandit problem
Gambling in a rigged casino: The adversarial multi-armed bandit problem Peter Auer Institute for Theoretical Computer Science University of Technology Graz A-8010 Graz (Austria) pauer@igi.tu-graz.ac.at
More informationIntroduction to Information Entropy Adapted from Papoulis (1991)
Introduction to Information Entropy Adapted from Papoulis (1991) Federico Lombardo Papoulis, A., Probability, Random Variables and Stochastic Processes, 3rd edition, McGraw ill, 1991. 1 1. INTRODUCTION
More informationWeak Dominance and Never Best Responses
Chapter 4 Weak Dominance and Never Best Responses Let us return now to our analysis of an arbitrary strategic game G := (S 1,...,S n, p 1,...,p n ). Let s i, s i be strategies of player i. We say that
More informationA Multiplayer Generalization of the MinMax Theorem
A Multiplayer Generalization of the MinMax Theorem Yang Cai Ozan Candogan Constantinos Daskalakis Christos Papadimitriou Abstract We show that in zero-sum polymatrix games, a multiplayer generalization
More informationApproximate Equilibria Lower Bound Asymmetric. Michael Erjemenko
Approximate Equilibria Lower Bound Asymmetric Topic of the Paper First hardness result about approximibility of PNE Computing an equilibrium from an initial state is PSPACE-complete PLS-hard to compute
More informationThe FTRL Algorithm with Strongly Convex Regularizers
CSE599s, Spring 202, Online Learning Lecture 8-04/9/202 The FTRL Algorithm with Strongly Convex Regularizers Lecturer: Brandan McMahan Scribe: Tamara Bonaci Introduction In the last lecture, we talked
More informationBlackwell s Approachability Theorem: A Generalization in a Special Case. Amy Greenwald, Amir Jafari and Casey Marks
Blackwell s Approachability Theorem: A Generalization in a Special Case Amy Greenwald, Amir Jafari and Casey Marks Department of Computer Science Brown University Providence, Rhode Island 02912 CS-06-01
More informationInternal Regret in On-line Portfolio Selection
Internal Regret in On-line Portfolio Selection Gilles Stoltz gilles.stoltz@ens.fr) Département de Mathématiques et Applications, Ecole Normale Supérieure, 75005 Paris, France Gábor Lugosi lugosi@upf.es)
More informationLecture 8 Inequality Testing and Moment Inequality Models
Lecture 8 Inequality Testing and Moment Inequality Models Inequality Testing In the previous lecture, we discussed how to test the nonlinear hypothesis H 0 : h(θ 0 ) 0 when the sample information comes
More informationarxiv: v1 [cs.lg] 8 Nov 2010
Blackwell Approachability and Low-Regret Learning are Equivalent arxiv:0.936v [cs.lg] 8 Nov 200 Jacob Abernethy Computer Science Division University of California, Berkeley jake@cs.berkeley.edu Peter L.
More informationAdaptive Online Gradient Descent
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow
More informationThis manuscript is for review purposes only.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ROBUSTNESS PROPERTIES IN FICTITIOUS-PLAY-TYPE ALGORITHMS BRIAN SWENSON, SOUMMYA KAR, JOAO XAVIER, AND DAVID
More informationEfficient Sensor Network Planning Method. Using Approximate Potential Game
Efficient Sensor Network Planning Method 1 Using Approximate Potential Game Su-Jin Lee, Young-Jin Park, and Han-Lim Choi, Member, IEEE arxiv:1707.00796v1 [cs.gt] 4 Jul 2017 Abstract This paper addresses
More informationBelief-based Learning
Belief-based Learning Algorithmic Game Theory Marcello Restelli Lecture Outline Introdutcion to multi-agent learning Belief-based learning Cournot adjustment Fictitious play Bayesian learning Equilibrium
More informationPrisoner s Dilemma. Veronica Ciocanel. February 25, 2013
n-person February 25, 2013 n-person Table of contents 1 Equations 5.4, 5.6 2 3 Types of dilemmas 4 n-person n-person GRIM, GRIM, ALLD Useful to think of equations 5.4 and 5.6 in terms of cooperation and
More informationAn adaptive algorithm for finite stochastic partial monitoring
An adaptive algorithm for finite stochastic partial monitoring Gábor Bartók Navid Zolghadr Csaba Szepesvári Department of Computing Science, University of Alberta, AB, Canada, T6G E8 bartok@ualberta.ca
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationMonotonic ɛ-equilibria in strongly symmetric games
Monotonic ɛ-equilibria in strongly symmetric games Shiran Rachmilevitch April 22, 2016 Abstract ɛ-equilibrium allows for worse actions to be played with higher probability than better actions. I introduce
More informationMIT Spring 2016
MIT 18.655 Dr. Kempthorne Spring 2016 1 MIT 18.655 Outline 1 2 MIT 18.655 3 Decision Problem: Basic Components P = {P θ : θ Θ} : parametric model. Θ = {θ}: Parameter space. A{a} : Action space. L(θ, a)
More informationLecture notes for Analysis of Algorithms : Markov decision processes
Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with
More informationZero sum games Proving the vn theorem. Zero sum games. Roberto Lucchetti. Politecnico di Milano
Politecnico di Milano General form Definition A two player zero sum game in strategic form is the triplet (X, Y, f : X Y R) f (x, y) is what Pl1 gets from Pl2, when they play x, y respectively. Thus g
More informationGame Theory and its Applications to Networks - Part I: Strict Competition
Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis
More informationConditional Swap Regret and Conditional Correlated Equilibrium
Conditional Swap Regret and Conditional Correlated quilibrium Mehryar Mohri Courant Institute and Google 251 Mercer Street New York, NY 10012 mohri@cims.nyu.edu Scott Yang Courant Institute 251 Mercer
More informationGame theory Lecture 19. Dynamic games. Game theory
Lecture 9. Dynamic games . Introduction Definition. A dynamic game is a game Γ =< N, x, {U i } n i=, {H i } n i= >, where N = {, 2,..., n} denotes the set of players, x (t) = f (x, u,..., u n, t), x(0)
More informationGame Theory Fall 2003
Game Theory Fall 2003 Problem Set 1 [1] In this problem (see FT Ex. 1.1) you are asked to play with arbitrary 2 2 games just to get used to the idea of equilibrium computation. Specifically, consider the
More informationDuality. Peter Bro Mitersen (University of Aarhus) Optimization, Lecture 9 February 28, / 49
Duality Maximize c T x for x F = {x (R + ) n Ax b} If we guess x F, we can say that c T x is a lower bound for the optimal value without executing the simplex algorithm. Can we make similar easy guesses
More informationNormal-form games. Vincent Conitzer
Normal-form games Vincent Conitzer conitzer@cs.duke.edu 2/3 of the average game Everyone writes down a number between 0 and 100 Person closest to 2/3 of the average wins Example: A says 50 B says 10 C
More informationThe Distribution of Optimal Strategies in Symmetric Zero-sum Games
The Distribution of Optimal Strategies in Symmetric Zero-sum Games Florian Brandl Technische Universität München brandlfl@in.tum.de Given a skew-symmetric matrix, the corresponding two-player symmetric
More informationLecture 12. Applications to Algorithmic Game Theory & Computational Social Choice. CSC Allan Borodin & Nisarg Shah 1
Lecture 12 Applications to Algorithmic Game Theory & Computational Social Choice CSC2420 - Allan Borodin & Nisarg Shah 1 A little bit of game theory CSC2420 - Allan Borodin & Nisarg Shah 2 Recap: Yao s
More informationMS&E 246: Lecture 12 Static games of incomplete information. Ramesh Johari
MS&E 246: Lecture 12 Static games of incomplete information Ramesh Johari Incomplete information Complete information means the entire structure of the game is common knowledge Incomplete information means
More informationExponential Weights on the Hypercube in Polynomial Time
European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Adversarial bandits Definition: sequential game. Lower bounds on regret from the stochastic case. Exp3: exponential weights
More informationAlgorithmic Game Theory and Applications. Lecture 7: The LP Duality Theorem
Algorithmic Game Theory and Applications Lecture 7: The LP Duality Theorem Kousha Etessami recall LP s in Primal Form 1 Maximize c 1 x 1 + c 2 x 2 +... + c n x n a 1,1 x 1 + a 1,2 x 2 +... + a 1,n x n
More information