Game Theory, Evolutionary Dynamics, and Multi-Agent Learning. Prof. Nicola Gatti

Game Theory, Evolutionary Dynamics, and Multi-Agent Learning Prof. Nicola Gatti (nicola.gatti@polimi.it)

Game theory

Game theory: basics: normal form, players, actions, outcomes, utilities, strategies, solutions.

Game theory: basics. Example: Rock-Paper-Scissors, a two-player game in which each player's actions are Rock, Paper, and Scissors.

Game theory: basics. Outcomes of Rock-Paper-Scissors (row = player 1's action, column = player 2's action):

                Rock        Paper       Scissors
    Rock        Tie         2 wins      1 wins
    Paper       1 wins      Tie         2 wins
    Scissors    2 wins      1 wins      Tie

Game theory: basics. Utilities of Rock-Paper-Scissors (each cell lists the row player's and the column player's payoff):

                Rock        Paper       Scissors
    Rock        0,0         -1,1        1,-1
    Paper       1,-1        0,0         -1,1
    Scissors    -1,1        1,-1        0,0

Game theory: basics. A (mixed) strategy assigns a probability to each action: π₁(Rock), π₁(Paper), π₁(Scissors) for player 1 and π₂(Rock), π₂(Paper), π₂(Scissors) for player 2.

Game theory: basics. Example: the uniform strategies π₁ = π₂ = (1/3, 1/3, 1/3), i.e., each player plays every action with probability 1/3.

Nash Equilibrium. A strategy profile (π₁*, π₂*) is a Nash equilibrium if and only if:

    π₁* ∈ arg max over π₁ of U₁(π₁, π₂*)
    π₂* ∈ arg max over π₂ of U₂(π₁*, π₂)
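
As a quick check (not part of the slides), the condition can be verified numerically for the uniform Rock-Paper-Scissors profile; the Python sketch below uses illustrative names such as U1 and pi1:

    import numpy as np

    # Verify that the uniform profile (1/3, 1/3, 1/3) of Rock-Paper-Scissors is a
    # Nash equilibrium: no pure deviation gives either player more than the profile.
    U1 = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])       # player 1's payoffs (rows: 1's action)
    U2 = -U1                            # zero-sum game: player 2's payoffs
    pi1 = pi2 = np.array([1/3, 1/3, 1/3])

    u1 = pi1 @ U1 @ pi2                 # player 1's expected utility (= 0)
    u2 = pi1 @ U2 @ pi2                 # player 2's expected utility (= 0)
    print(max(U1 @ pi2) <= u1 + 1e-12)  # True: player 1 has no profitable deviation
    print(max(pi1 @ U2) <= u2 + 1e-12)  # True: player 2 has no profitable deviation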

Evolutionary dynamics

An evolutionary interpretation. Consider an infinite population of individuals in which 33% play Rock, 33% play Paper, and 33% play Scissors.

Bacteria

An evolutionary interpretation. Consider a second, opponent population in which 30% play Rock, 50% play Paper, and 20% play Scissors. At each round, each individual of a population plays Rock-Paper-Scissors (with the payoff matrix above) against each individual of the opponent's population. The first population then obtains: Utility(Rock) = -0.3, Utility(Paper) = 0.1, Utility(Scissors) = 0.2.

An evolutionary interpretation. For the first population (33% Rock, 33% Paper, 33% Scissors) the action utilities are Utility(Rock) = -0.3, Utility(Paper) = 0.1, Utility(Scissors) = 0.2, so the average utility is 0. A positive (utility - average utility) leads to an increase of the share of the population playing that action; a negative (utility - average utility) leads to a decrease.
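
A tiny Python check (illustrative, not from the slides) reproduces these utilities:

    # First population (1/3 each) against the opponent population (30%, 50%, 20%).
    U1 = [[0, -1, 1],       # rows: own action (Rock, Paper, Scissors)
          [1, 0, -1],       # columns: opponent action
          [-1, 1, 0]]
    opponent = [0.3, 0.5, 0.2]
    utilities = [sum(u * p for u, p in zip(row, opponent)) for row in U1]
    average = sum(u / 3 for u in utilities)
    print(utilities, average)   # [-0.3, 0.1, 0.2] and (about) 0.0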

An evolutionary interpretation. New population after the replication: 17% Rock, 37% Paper, 46% Scissors.

Revision protocol. Question: how do the populations change? Replicator dynamics:

    dπ₁(a, t)/dt = π₁(a, t) [ e_a U₁ π₂(t) - π₁(t) U₁ π₂(t) ]

where e_a U₁ π₂(t) is the utility given by playing a with a probability of 1, and π₁(t) U₁ π₂(t) is the average population utility.
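
An Euler-integration sketch of this equation for Rock-Paper-Scissors, starting from the two populations above (the step size and horizon are illustrative choices, not from the slides):

    import numpy as np

    # Two-population replicator dynamics for Rock-Paper-Scissors.
    U1 = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])
    U2 = -U1
    x = np.array([1/3, 1/3, 1/3])        # population 1: Rock, Paper, Scissors
    y = np.array([0.3, 0.5, 0.2])        # population 2
    dt = 0.01
    for _ in range(20_000):
        fx, fy = U1 @ y, x @ U2          # per-action utilities of the two populations
        x = x + dt * x * (fx - x @ fx)   # shares with above-average utility grow
        y = y + dt * y * (fy - y @ fy)
    print(x, y)                          # the shares keep oscillating around (1/3, 1/3, 1/3)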

Evolutionarily Stable Strategies. A strategy is an ESS if it is immune to invasion by mutant strategies, given that the mutants initially occupy a small fraction of the population. Every ESS is an asymptotically stable fixed point of the replicator dynamics. Every ESS is a Nash equilibrium, but not the converse: while a NE always exists, an ESS may not exist.
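
Formally (a standard textbook statement, not verbatim from the slides): a strategy π is an ESS if, for every mutant strategy π' ≠ π, there exists a threshold ε̄ > 0 such that for all ε ∈ (0, ε̄)

    U( π , ε π' + (1 - ε) π )  >  U( π' , ε π' + (1 - ε) π )

where U is the payoff function of the underlying symmetric game.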

Prisoner's dilemma

                Cooperate   Defect
    Cooperate   3,3         0,5
    Defect      5,0         1,1

Prisoner's dilemma. [Phase portrait of the replicator dynamics: x axis = player 1's cooperate probability, y axis = player 2's cooperate probability. The all-Defect profile (0, 0) is asymptotically stable.]
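
This matches a short dominance calculation (not spelled out on the slide): if the opponent cooperates with probability q, then

    U(Cooperate) = 3q    and    U(Defect) = 5q + 1·(1 - q) = 4q + 1 > 3q for every q ∈ [0, 1],

so defectors always earn more than the population average whenever any cooperators remain, and the dynamics converge to all-Defect from any interior starting point.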

Stag hunt

            Stag    Hare
    Stag    4,4     1,3
    Hare    3,1     3,3

Stag hunt. [Phase portrait of the replicator dynamics: x axis = player 1's stag probability, y axis = player 2's stag probability. Both the all-Stag profile (1, 1) and the all-Hare profile (0, 0) are asymptotically stable.]
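
The boundary between the two basins of attraction passes through the mixed rest point, which follows from the payoffs (this step is not shown on the slide): if the opponent plays Stag with probability p,

    U(Stag) = 4p + 1·(1 - p) = 3p + 1    and    U(Hare) = 3p + 3·(1 - p) = 3,

which are equal at p = 2/3; above this threshold the population drifts toward all-Stag, below it toward all-Hare.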

Matching pennies

            Head    Tail
    Head    0,1     1,0
    Tail    1,0     0,1

Matching pennies. [Phase portrait of the replicator dynamics: x axis = player 1's head probability, y axis = player 2's head probability.]

Multi-agent learning

Markov decision problem

Reinforcement learning

Q-learning (1). For every state/action pair:

    Q(s, a) ← Q(s, a) + α [ r + γ max over a' of Q(s', a') - Q(s, a) ]

where α is the learning rate, r the reward, and γ the discount factor.
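
A minimal tabular sketch of this update rule (the α and γ values and the helper names are illustrative assumptions, not from the slides):

    from collections import defaultdict

    alpha, gamma = 0.2, 0.9
    Q = defaultdict(float)                    # Q[(state, action)] -> estimated value

    def q_update(s, a, r, s_next, actions):
        # Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ]
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    # e.g., after observing reward 3 for Cooperate in the (single) state "s0":
    q_update("s0", "Cooperate", 3.0, "s0", ["Cooperate", "Defect"])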

Example: normal-form games. A single state and 2 actions (Cooperate, Defect):

                Cooperate   Defect
    Cooperate   3,3         0,5
    Defect      5,0         1,1

Example: normal-form games.

    Q(a) ← Q(a) + α [ r - Q(a) ],   α = 0.2

Player 1's strategy: π₁(Cooperate) = 1.0, π₁(Defect) = 0.0.
Player 2's strategy: π₂(Cooperate) = 0.2, π₂(Defect) = 0.8.

Example: normal-form games.

    Q(a) ← Q(a) + α [ r - Q(a) ],   α = 0.2

    round   2's action       1's Q function
    t = 0                    Q(Cooperate) = 0
    t = 1   a = Cooperate    Q(Cooperate) = 0.6
    t = 2   a = Defect       Q(Cooperate) = 0.48
    t = 3   a = Defect       Q(Cooperate) = 0.384
    t = 4   a = Defect       Q(Cooperate) = 0.3072
    t = 5   a = Defect       Q(Cooperate) = 0.24576
    t = 6   a = Cooperate    Q(Cooperate) = 0.796608
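
A short Python sketch (illustrative; the opponent's action sequence is fixed to the one in the table) reproduces this trace:

    # Player 1 always cooperates (pi1 = (1.0, 0.0)); its reward is 3 when player 2
    # cooperates and 0 when player 2 defects (see the payoff matrix above).
    alpha = 0.2
    rewards = {"Cooperate": 3.0, "Defect": 0.0}
    opponent_actions = ["Cooperate", "Defect", "Defect",
                        "Defect", "Defect", "Cooperate"]

    q_cooperate = 0.0
    for t, a2 in enumerate(opponent_actions, start=1):
        q_cooperate += alpha * (rewards[a2] - q_cooperate)
        print(t, a2, q_cooperate)   # ..., t = 6 -> about 0.796608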

Q-learning (2). Softmax (a.k.a. Boltzmann exploration):

    π_i(a) = exp(Q(s, a)/τ) / Σ over a' of exp(Q(s, a')/τ)

where τ is the temperature. Every action is played with strictly positive probability. The larger the temperature, the smoother the function. As the temperature goes to 0, the strategy approaches a best response.

Example: normal-form games (with temperature τ = 1):

    Q(Cooperate)   Q(Defect)   π₁(Cooperate)   π₁(Defect)
    0              0           0.5             0.5
    1              0           0.731           0.269
    5              0           0.99331         0.00669
    10             0           0.999955        0.000045
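
A short Python sketch reproduces these probabilities (the temperature τ = 1 is inferred from the numbers, not stated on the slide):

    import math

    def softmax(q_values, tau=1.0):
        # Boltzmann exploration: pi(a) proportional to exp(Q(a)/tau)
        exps = [math.exp(q / tau) for q in q_values]
        z = sum(exps)
        return [e / z for e in exps]

    for q_c in (0, 1, 5, 10):
        print(q_c, softmax([q_c, 0]))   # [pi(Cooperate), pi(Defect)]
    # 0 -> [0.5, 0.5], 1 -> [0.731, 0.269], 5 -> [0.99331, 0.00669], ...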

Self-play Q-learning dynamics

Self-play learning. Two Q-learning algorithms play against each other in the Prisoner's dilemma:

                Cooperate   Defect
    Cooperate   3,3         0,5
    Defect      5,0         1,1

Learning dynamics. Assumptions: time is continuous; all the actions can be selected simultaneously.

    dπ₁(a, t)/dt = π₁(a, t) [ e_a U₁ π₂(t) - π₁(t) U₁ π₂(t) ] - τ π₁(a, t) [ log π₁(a, t) - Σ over a' of π₁(a', t) log π₁(a', t) ]

The first bracket is the exploitation term, the second is the exploration term. When the temperature is 0, the Q-learning dynamics behave as the replicator dynamics.
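
An illustrative Euler-integration sketch of these dynamics for the Prisoner's dilemma (the τ value, step size, horizon, and starting strategies are assumptions, not from the slides):

    import numpy as np

    # Self-play Q-learning dynamics: exploitation term minus tau * exploration term.
    U1 = np.array([[3.0, 0.0],
                   [5.0, 1.0]])          # row player's payoffs (Cooperate, Defect)
    U2 = U1.T                            # symmetric game: column player's payoffs
    tau, dt = 0.1, 0.01

    x = np.array([0.6, 0.4])             # player 1: P(Cooperate), P(Defect)
    y = np.array([0.7, 0.3])             # player 2
    for _ in range(50_000):
        fx, fy = U1 @ y, x @ U2          # per-action utilities
        explore_x = np.log(x) - x @ np.log(x)
        explore_y = np.log(y) - y @ np.log(y)
        x = x + dt * (x * (fx - x @ fx) - tau * x * explore_x)
        y = y + dt * (y * (fy - y @ fy) - tau * y * explore_y)
        x, y = x / x.sum(), y / y.sum()  # guard against numerical drift
    print(x, y)                          # both players end up defecting almost always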

Prisoner's dilemma. [Phase portrait of the self-play Q-learning dynamics: x axis = player 1's cooperate probability, y axis = player 2's cooperate probability; one asymptotically stable rest point is marked, near the all-Defect corner.]

Stag hunt. [Phase portrait of the self-play Q-learning dynamics: x axis = player 1's stag probability, y axis = player 2's stag probability; two asymptotically stable rest points are marked, near all-Stag and near all-Hare.]

Matching pennies. [Phase portrait of the self-play Q-learning dynamics: x axis = player 1's head probability, y axis = player 2's head probability.]