Walras-Bowley Lecture 2003

Similar documents
Correlated Equilibria: Rationality and Dynamics

Smooth Calibration, Leaky Forecasts, Finite Recall, and Nash Dynamics

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

Calibration and Nash Equilibrium

Lecture 14: Approachability and regret minimization Ramesh Johari May 23, 2007

Deterministic Calibration and Nash Equilibrium

Brown s Original Fictitious Play

Self-stabilizing uncoupled dynamics

Irrational behavior in the Brown von Neumann Nash dynamics

Existence of sparsely supported correlated equilibria

Existence of sparsely supported correlated equilibria

Irrational behavior in the Brown von Neumann Nash dynamics

Game-Theoretic Learning:

האוניברסיטה העברית בירושלים

6.254 : Game Theory with Engineering Applications Lecture 8: Supermodular and Potential Games

Games A game is a tuple = (I; (S i ;r i ) i2i) where ffl I is a set of players (i 2 I) ffl S i is a set of (pure) strategies (s i 2 S i ) Q ffl r i :

1 The General Definition

MS&E 246: Lecture 12 Static games of incomplete information. Ramesh Johari

Network Games: Learning and Dynamics

EC319 Economic Theory and Its Applications, Part II: Lecture 7

1 Equilibrium Comparisons

A General Class of Adaptive Strategies 1

Possibility and Impossibility of Learning with Limited Behavior Rules

NOTE. A 2 2 Game without the Fictitious Play Property

Satisfaction Equilibrium: Achieving Cooperation in Incomplete Information Games

Reinforcement Learning

EC319 Economic Theory and Its Applications, Part II: Lecture 2

Distributional stability and equilibrium selection Evolutionary dynamics under payoff heterogeneity (II) Dai Zusai

Learning Equilibrium as a Generalization of Learning to Optimize

Bayes Correlated Equilibrium and Comparing Information Structures

Mechanism Design: Implementation. Game Theory Course: Jackson, Leyton-Brown & Shoham

Game Theory and Control

BELIEFS & EVOLUTIONARY GAME THEORY

epub WU Institutional Repository

6.254 : Game Theory with Engineering Applications Lecture 7: Supermodular Games

Models of Strategic Reasoning Lecture 2

Game Theory. Bargaining Theory. ordi Massó. International Doctorate in Economic Analysis (IDEA) Universitat Autònoma de Barcelona (UAB)

The Query Complexity of Correlated Equilibria

Belief-based Learning

Correlated Equilibrium in Games with Incomplete Information

Game Theory: Spring 2017

Robust Learning Equilibrium

CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria

Advanced Machine Learning

Learning correlated equilibria in games with compact sets of strategies

Economics 703 Advanced Microeconomics. Professor Peter Cramton Fall 2017

Learning in Mean Field Games

MS&E 246: Lecture 4 Mixed strategies. Ramesh Johari January 18, 2007

: Cryptography and Game Theory Ran Canetti and Alon Rosen. Lecture 8

Evolution & Learning in Games

Consistent Beliefs in Extensive Form Games

Global Games I. Mehdi Shadmehr Department of Economics University of Miami. August 25, 2011

Cooperative bargaining: independence and monotonicity imply disagreement

4. Opponent Forecasting in Repeated Games

Bargaining Efficiency and the Repeated Prisoners Dilemma. Bhaskar Chakravorti* and John Conley**

Prediction and Playing Games

Small Sample of Related Literature

Area I: Contract Theory Question (Econ 206)

6.891 Games, Decision, and Computation February 5, Lecture 2

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

1 Lattices and Tarski s Theorem

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg*

Distributed Learning based on Entropy-Driven Game Dynamics

Recap Social Choice Functions Fun Game Mechanism Design. Mechanism Design. Lecture 13. Mechanism Design Lecture 13, Slide 1

Algorithmic Strategy Complexity

Near-Potential Games: Geometry and Dynamics

Mathematical Economics - PhD in Economics

Game Theory Fall 2003

An Experiment to Evaluate Bayesian Learning of Nash Equilibrium Play

The Game of Normal Numbers

Computational Evolutionary Game Theory and why I m never using PowerPoint for another presentation involving maths ever again

Incentive Compatible Regression Learning

Nonlinear Dynamics between Micromotives and Macrobehavior

LONG PERSUASION GAMES

Achieving Pareto Optimality Through Distributed Learning

Minimax regret and strategic uncertainty

Convergence and No-Regret in Multiagent Learning

Coevolutionary Modeling in Networks 1/39

Sequential Equilibria of Multi-Stage Games with Infinite Sets of Types and Actions

Stackelberg-solvable games and pre-play communication

First Prev Next Last Go Back Full Screen Close Quit. Game Theory. Giorgio Fagiolo

Math 152: Applicable Mathematics and Computing

Correlated Equilibria of Classical Strategic Games with Quantum Signals

Title: The Castle on the Hill. Author: David K. Levine. Department of Economics UCLA. Los Angeles, CA phone/fax

Game Theory. Professor Peter Cramton Economics 300

A (Brief) Introduction to Game Theory

On the Stability of the Cournot Equilibrium: An Evolutionary Approach

General-sum games. I.e., pretend that the opponent is only trying to hurt you. If Column was trying to hurt Row, Column would play Left, so

Evolutionary Dynamics and Extensive Form Games by Ross Cressman. Reviewed by William H. Sandholm *

Hybrid Machine Learning Algorithms

Tijmen Daniëls Universiteit van Amsterdam. Abstract

Secure Protocols or How Communication Generates Correlation

Theory Field Examination Game Theory (209A) Jan Question 1 (duopoly games with imperfect information)

Algorithmic Game Theory and Applications. Lecture 4: 2-player zero-sum games, and the Minimax Theorem

Better-Reply Dynamics with Bounded Recall

Uncoupled automata and pure Nash equilibria

Game Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin

Quantum Games Have No News for Economists 1

Informed Principal in Private-Value Environments

Learning to play partially-specified equilibrium

Transcription:

Walras-Bowley Lecture 2003 Sergiu Hart This version: September 2004 SERGIU HART c 2004 p. 1

ADAPTIVE HEURISTICS A Little Rationality Goes a Long Way Sergiu Hart Center for Rationality, Dept. of Economics, Dept. of Mathematics The Hebrew University of Jerusalem http://www.ma.huji.ac.il/~hart SERGIU HART c 2004 p. 2

Most of this talk is based on joint work with Andreu Mas-Colell (Universitat Pompeu Fabra, Barcelona) SERGIU HART c 2004 p. 3

Most of this talk is based on joint work with Andreu Mas-Colell (Universitat Pompeu Fabra, Barcelona) All papers and this presentation are available on my home page http://www.ma.huji.ac.il/~hart SERGIU HART c 2004 p. 3

Papers http://www.ma.huji.ac.il/~hart SERGIU HART c 2004 p. 4

Papers http://www.ma.huji.ac.il/~hart Econometrica (2000) Journal of Economic Theory (2001) Economic Essays (2001) Games and Economic Behavior (2003) American Economic Review (2003) SERGIU HART c 2004 p. 4

PART I Introduction: Dynamics SERGIU HART c 2004 p. 5

Dynamics SERGIU HART c 2004 p. 6

Dynamics Learning SERGIU HART c 2004 p. 6

Dynamics Learning START: prior beliefs STEP: observe update (Bayes) optimize (best-reply) REPEAT SERGIU HART c 2004 p. 6

Dynamics Evolution SERGIU HART c 2004 p. 7

Dynamics Evolution populations each individual fixed action ( gene ) frequencies of each action in the population mixed strategy SERGIU HART c 2004 p. 7

Dynamics Evolution populations each individual fixed action ( gene ) frequencies of each action in the population mixed strategy Change: Selection higher payoff higher frequency Mutation random and relatively rare SERGIU HART c 2004 p. 7

Dynamics Adaptive Heuristics SERGIU HART c 2004 p. 8

Dynamics Adaptive Heuristics rules of thumb myopic simple stimulus response, reinforcement behavioral, experiments non-bayesian SERGIU HART c 2004 p. 8

Dynamics Adaptive Heuristics rules of thumb myopic simple stimulus response, reinforcement behavioral, experiments non-bayesian Example: Fictitious Play SERGIU HART c 2004 p. 8

Dynamics Adaptive Heuristics rules of thumb myopic simple stimulus response, reinforcement behavioral, experiments non-bayesian Example: Fictitious Play (Play optimally against the empirical distribution of past play of the other player) SERGIU HART c 2004 p. 8

?. jevolution. jheuristics. Learning? SERGIU HART c 2004 p. 9

Rationality. jevolution. jheuristics. Learning Rationality SERGIU HART c 2004 p. 9

Rationality. jevolution. jheuristics. Learning Rationality most of this talk SERGIU HART c 2004 p. 9

Can simple adaptive heuristics lead to sophisticated rational behavior? SERGIU HART c 2004 p. 10

Game N-person game in strategic (normal) form Players i = 1, 2,..., N SERGIU HART c 2004 p. 11

Game N-person game in strategic (normal) form Players i = 1, 2,..., N For each player i: Actions s i in S i SERGIU HART c 2004 p. 11

Game N-person game in strategic (normal) form Players i = 1, 2,..., N For each player i: Actions s i in S i For each player i: Payoffs (utilities) u i (s) u i (s 1, s 2,..., s N ) SERGIU HART c 2004 p. 11

Dynamics Dynamics Time t = 1, 2,... SERGIU HART c 2004 p. 12

Dynamics Dynamics Time t = 1, 2,... At time t each player i chooses an action s i t in S i SERGIU HART c 2004 p. 12

PART II Regret Matching SERGIU HART c 2004 p. 13

Advertisement DON T YOU FEEL A PANG OF REGRET? 47.15% YIELD 47.15% January May 2003 17.92% 4.43% XYZ Nasdaq Dow Jones Don t wait! Ask your broker today Haaretz June 3, 2003 SERGIU HART c 2004 p. 14

Regret Matching REGRET MATCHING = Switch next period to a different action with a probability that is proportional to the regret for that action SERGIU HART c 2004 p. 15

Regret Matching REGRET MATCHING = Switch next period to a different action with a probability that is proportional to the regret for that action REGRET = increase in payoff had such a change always been made in the past SERGIU HART c 2004 p. 15

Regret U = average payoff up to now SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played R(k) = [V (k) U] + = regret for action k SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played R(k) = [V (k) U] + = regret for action k R(k) R i T (j k) = [ 1 T ) t T : s (u ] it = j i (k, s i t ) u i (s t ) + SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played R(k) = [V (k) U] + = regret for action k R(k) R i T (j k) = [ 1 T ) t T : s (u ] it = j i (k, s i t ) u i (s t ) + SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played R(k) = [V (k) U] + = regret for action k R(k) R i T (j k) = [ 1 T ) t T : s (u ] it = j i (k, s i t ) u i (s t ) + SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played R(k) = [V (k) U] + = regret for action k R(k) R i T (j k) = [ 1 T ) t T : s (u ] it = j i (k, s i t ) u i (s t ) + SERGIU HART c 2004 p. 16

Regret U = average payoff up to now V (k) = average payoff if action k had been played instead of the current action j every time in the past that j was played R(k) = [V (k) U] + = regret for action k R(k) R i T (j k) = [ 1 T ) t T : s (u ] it = j i (k, s i t ) u i (s t ) + SERGIU HART c 2004 p. 16

Regret Matching Next period play: Switch to action k with a probability that is proportional to the regret R(k) (for k j) SERGIU HART c 2004 p. 17

Regret Matching Next period play: Switch to action k with a probability that is proportional to the regret R(k) (for k j) Play the same action j of last period with the remaining probability SERGIU HART c 2004 p. 17

Regret Matching Next period play: Switch to action k with a probability that is proportional to the regret R(k) (for k j) Play the same action j of last period with the remaining probability σ(k) σ i T+1 (k) = cr(k), for k j σ(j) σ i T+1 (j) = 1 k j cr(k) SERGIU HART c 2004 p. 17

Regret Matching Next period play: Switch to action k with a probability that is proportional to the regret R(k) (for k j) Play the same action j of last period with the remaining probability σ(k) σ i T+1 (k) = cr(k), for k j σ(j) σ i T+1 (j) = 1 k j cr(k) c = a fixed positive constant (so that the probability of not switching is > 0) SERGIU HART c 2004 p. 17

Regret Matching Theorem Theorem If all players play Regret Matching then the joint distribution of play converges to the set of CORRELATED EQUILIBRIA of the game SERGIU HART c 2004 p. 18

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T SERGIU HART c 2004 p. 19

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T SERGIU HART c 2004 p. 19

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T T = 1 SERGIU HART c 2004 p. 19

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T T = 1 0 0 0 1 0 0 z 1 SERGIU HART c 2004 p. 19

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T T = 2 0 0 1/2 1/2 0 0 z 2 SERGIU HART c 2004 p. 19

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T T = 3 0 0 2/3 1/3 0 0 z 3 SERGIU HART c 2004 p. 19

Joint Distribution of Play Joint distribution of play z T = The relative frequencies that the N-tuples of actions have been played up to time T T = 10 3/10 0 2/10 1/10 3/10 1/10 z 10 SERGIU HART c 2004 p. 19

Joint Distribution of Play Note 1: The fact that the players randomize independently at each period does not imply that the joint distribution is independent! T = 10 3/10 0 2/10 1/10 3/10 1/10 z 10 SERGIU HART c 2004 p. 19

Joint Distribution of Play Note 1: The fact that the players randomize independently at each period does not imply that the joint distribution is independent! Note 2: Players observe the joint distribution (the history of play) SERGIU HART c 2004 p. 19

Joint Distribution of Play Note 1: The fact that the players randomize independently at each period does not imply that the joint distribution is independent! Note 2: Players observe the joint distribution (the history of play) Note 3: Players react to the joint distribution (patterns, coincidences, communication, signals,...) SERGIU HART c 2004 p. 19

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) SERGIU HART c 2004 p. 20

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: SERGIU HART c 2004 p. 20

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals SERGIU HART c 2004 p. 20

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals Nash equilibrium SERGIU HART c 2004 p. 20

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals Nash equilibrium Public signals ( sunspots ) SERGIU HART c 2004 p. 20

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals Nash equilibrium Public signals ( sunspots ) convex combinations of Nash equilibria SERGIU HART c 2004 p. 20

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals Nash equilibrium Public signals ( sunspots ) convex combinations of Nash equilibria Butterflies play the Chicken Game ( Speckled Wood Pararge aegeria) SERGIU HART c 2004 p. 20

Correlated Equilibria "Chicken" game LEAVE STAY LEAVE 5, 5 3, 6 STAY 6, 3 0, 0 SERGIU HART c 2004 p. 21

Correlated Equilibria "Chicken" game LEAVE STAY LEAVE 5, 5 3, 6 STAY 6, 3 0, 0 a Nash equilibrium SERGIU HART c 2004 p. 21

Correlated Equilibria "Chicken" game LEAVE STAY LEAVE 5, 5 3, 6 STAY 6, 3 0, 0 another Nash equilibriumi SERGIU HART c 2004 p. 21

Correlated Equilibria "Chicken" game LEAVE STAY LEAVE 5, 5 3, 6 STAY 6, 3 0, 0 L 0 1/2 1/2 0 a (publicly) correlated equilibrium SERGIU HART c 2004 p. 21

Correlated Equilibria "Chicken" game LEAVE STAY LEAVE 5, 5 3, 6 STAY 6, 3 0, 0 L S L 1/3 1/3 S 1/3 0 another correlated equilibrium after signal L play LEAVE after signal S play STAY SERGIU HART c 2004 p. 21

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals Nash equilibrium Public signals ( sunspots ) convex combinations of Nash equilibria Butterflies play the Chicken Game ( Speckled Wood Pararge aegeria) SERGIU HART c 2004 p. 22

Correlated Equilibrium A Correlated Equilibrium is a Nash equilibrium when the players receive payoff-irrelevant signals before playing the game (Aumann 1974) Examples: Independent signals Nash equilibrium Public signals ( sunspots ) convex combinations of Nash equilibria Butterflies play the Chicken Game ( Speckled Wood Pararge aegeria) Boston Celtics front line SERGIU HART c 2004 p. 22

Correlated Equilibrium Signals (public, correlated) are unavoidable SERGIU HART c 2004 p. 23

Correlated Equilibrium Signals (public, correlated) are unavoidable Bayesian Rationality Correlated Equilibrium (Aumann 1987) SERGIU HART c 2004 p. 23

Correlated Equilibrium Signals (public, correlated) are unavoidable Bayesian Rationality Correlated Equilibrium (Aumann 1987) A joint distribution z is a correlated equilibrium u(j, s i )z(j, s i ) u(k, s i )z(j, s i ) s i s i for all i N and all j, k S i SERGIU HART c 2004 p. 23

Regret Matching Theorem [recall] Theorem If all players play Regret Matching then the joint distribution of play converges to the set of CORRELATED EQUILIBRIA of the game SERGIU HART c 2004 p. 24

Regret Matching Theorem CE = set of correlated equilibria z T = joint distribution of play up to time T distance(z T, CE) 0 as T (a.s.) SERGIU HART c 2004 p. 25

Regret Matching Theorem CE = set of correlated equilibria z T = joint distribution of play up to time T distance(z T, CE) 0 as T (a.s.) z T is approximately a correlated equilibrium (or z T is a correlated approximate equilibrium) from some time on (for all large enough T ) SERGIU HART c 2004 p. 25

Regret Matching Theorem Proof z T is a correlated equilibrium there is no regret: RT i (j k) = 0 for all players and all actions SERGIU HART c 2004 p. 26

Regret Matching Theorem Proof z T is a correlated equilibrium there is no regret: RT i (j k) = 0 for all players and all actions Regret Matching all regrets converge to 0 (Proof: Blackwell Approachability for the vector of regrets + approximate eigenvector probabilities by transition probabilities) SERGIU HART c 2004 p. 26

Regret Matching Theorem Proof z T is a correlated equilibrium there is no regret: RT i (j k) = 0 for all players and all actions Regret Matching all regrets converge to 0 (Proof: Blackwell Approachability for the vector of regrets + approximate eigenvector probabilities by transition probabilities) Note: z T converges to the set CE, not to a point SERGIU HART c 2004 p. 26

Remarks Correlating device: the history of play SERGIU HART c 2004 p. 27

Remarks Correlating device: the history of play Other procedures leading to correlated equilibria: Foster Vohra 1997 Calibrated Learning: best-reply to calibrated forecasts Fudenberg Levine 1999 Conditional Smooth Fictitious Play Eigenvector strategy SERGIU HART c 2004 p. 27

Remarks Correlating device: the history of play Other procedures leading to correlated equilibria: Foster Vohra 1997 Calibrated Learning: best-reply to calibrated forecasts Fudenberg Levine 1999 Conditional Smooth Fictitious Play Eigenvector strategy Not heuristics! SERGIU HART c 2004 p. 27

Behavioral Aspects Behavioral aspects of Regret Matching: SERGIU HART c 2004 p. 28

Behavioral Aspects Behavioral aspects of Regret Matching: Commonly used rules of behavior SERGIU HART c 2004 p. 28

Behavioral Aspects Behavioral aspects of Regret Matching: Commonly used rules of behavior Never change a winning team SERGIU HART c 2004 p. 28

Behavioral Aspects Behavioral aspects of Regret Matching: Commonly used rules of behavior Never change a winning team The higher would have been the payoff from another action the higher the tendency to switch to it SERGIU HART c 2004 p. 28

Behavioral Aspects Behavioral aspects of Regret Matching: Commonly used rules of behavior Never change a winning team The higher would have been the payoff from another action the higher the tendency to switch to it Small probability of switching (the status quo bias") SERGIU HART c 2004 p. 28

Behavioral Aspects Behavioral aspects of Regret Matching: Commonly used rules of behavior Never change a winning team The higher would have been the payoff from another action the higher the tendency to switch to it Small probability of switching (the status quo bias") Stimulus-response, reinforcement SERGIU HART c 2004 p. 28

Behavioral Aspects Behavioral aspects of Regret Matching: Commonly used rules of behavior Never change a winning team The higher would have been the payoff from another action the higher the tendency to switch to it Small probability of switching (the status quo bias") Stimulus-response, reinforcement No beliefs (defined directly on actions) No best-reply (better-reply?) SERGIU HART c 2004 p. 28

Behavioral Aspects Similar to models of learning, experimental and behavioral economics: SERGIU HART c 2004 p. 29

Behavioral Aspects Similar to models of learning, experimental and behavioral economics: Bush Mosteller 1955 Erev Roth 1995, 1998 Camerer Ho 1997, 1998, 1999... SERGIU HART c 2004 p. 29

Behavioral Aspects Similar to models of learning, experimental and behavioral economics: Bush Mosteller 1955 Erev Roth 1995, 1998 Camerer Ho 1997, 1998, 1999... SERGIU HART c 2004 p. 29

Behavioral Aspects Similar to models of learning, experimental and behavioral economics: Bush Mosteller 1955 Erev Roth 1995, 1998 Camerer Ho 1997, 1998, 1999... N. Camille et al, The Involvement of the Orbitofrontal Cortex in the Experience of Regret Science May 2004 (304: 1167 1170) SERGIU HART c 2004 p. 29

PART III Generalized Regret Matching SERGIU HART c 2004 p. 30

Questions How special is Regret Matching? SERGIU HART c 2004 p. 31

Questions How special is Regret Matching? Why does conditional smooth fictitious play work? SERGIU HART c 2004 p. 31

Questions How special is Regret Matching? Why does conditional smooth fictitious play work? Any connections? SERGIU HART c 2004 p. 31

Generalized Regret Matching Regret Matching = Switching probabilities are proportional to the regrets: σ(k) = cr(k) SERGIU HART c 2004 p. 32

Generalized Regret Matching Regret Matching = Switching probabilities are proportional to the regrets: σ(k) = cr(k) Generalized Regret Matching = Switching probabilities are a function of the regrets: σ(k) = f(r(k)) SERGIU HART c 2004 p. 32

Generalized Regret Matching Regret Matching = Switching probabilities are proportional to the regrets: σ(k) = cr(k) Generalized Regret Matching = Switching probabilities are a function of the regrets: σ(k) = f(r(k)) f is a sign-preserving function: f(0) = 0, and x > 0 f(x) > 0 f is a Lipschitz continuous function (in fact, much more general: f k,j, potential) SERGIU HART c 2004 p. 32

Generalized Regret Matching Theorem If all players play Generalized Regret Matching then the joint distribution of play converges to the set of correlated equilibria of the game SERGIU HART c 2004 p. 33

Generalized Regret Matching Theorem If all players play Generalized Regret Matching then the joint distribution of play converges to the set of correlated equilibria of the game Proof: Universal approachability strategies + Amotz Cahn, M.Sc. thesis, 2000 SERGIU HART c 2004 p. 33

Special Cases Play probabilities proportional to the m-th power of the regrets (f(x) = cx m, for m 1) SERGIU HART c 2004 p. 34

Special Cases Play probabilities proportional to the m-th power of the regrets (f(x) = cx m, for m 1) m = 1: Regret Matching SERGIU HART c 2004 p. 34

Special Cases Play probabilities proportional to the m-th power of the regrets (f(x) = cx m, for m 1) m = 1: Regret Matching m = : Positive probability only to actions with maximal regret SERGIU HART c 2004 p. 34

Special Cases Play probabilities proportional to the m-th power of the regrets (f(x) = cx m, for m 1) m = 1: Regret Matching m = : Positive probability only to actions with maximal regret Conditional Fictitious Play SERGIU HART c 2004 p. 34

Special Cases Play probabilities proportional to the m-th power of the regrets (f(x) = cx m, for m 1) m = 1: Regret Matching m = : Positive probability only to actions with maximal regret Conditional Fictitious Play But: Not continuous SERGIU HART c 2004 p. 34

Special Cases Play probabilities proportional to the m-th power of the regrets (f(x) = cx m, for m 1) m = 1: Regret Matching m = : Positive probability only to actions with maximal regret Conditional Fictitious Play But: Not continuous Therefore: Smooth Conditional Fictitious Play SERGIU HART c 2004 p. 34

PART IV Unknown Game SERGIU HART c 2004 p. 35

Unknown Game The case of the Unknown game : The player knows only Its own set of actions Its own past actions and payoffs SERGIU HART c 2004 p. 36

Unknown Game The case of the Unknown game : The player knows only Its own set of actions Its own past actions and payoffs The player does not know the game (other players, actions, payoff functions, history of other players actions and payoffs) SERGIU HART c 2004 p. 36

Proxy Regret Unknown game Unknown regret (The player does not know what the payoff would have been if he had played a different action k) SERGIU HART c 2004 p. 37

Proxy Regret Unknown game Unknown regret (The player does not know what the payoff would have been if he had played a different action k) Proxy Regret for k: Use the payoffs received when k has been actually played in the past SERGIU HART c 2004 p. 37

Proxy Regret Unknown game Unknown regret (The player does not know what the payoff would have been if he had played a different action k) Proxy Regret for k: Use the payoffs received when k has been actually played in the past Theorem. If all players play strategies based on proxy regret, then the joint distribution of play converges to the set of correlated equilibria of the game SERGIU HART c 2004 p. 37

PART V Uncoupled Dynamics SERGIU HART c 2004 p. 38

Nash Equilibrium Question: Adaptive heuristics Nash equilibria? SERGIU HART c 2004 p. 39

Nash Equilibrium Question: Adaptive heuristics Nash equilibria? In SPECIAL classes of games: YES SERGIU HART c 2004 p. 39

Nash Equilibrium Question: Adaptive heuristics Nash equilibria? In SPECIAL classes of games: YES Fictitious play, Regret-based,... Two-person zero-sum games Two-person potential games Supermodular games... SERGIU HART c 2004 p. 39

Nash Equilibrium Question: Adaptive heuristics Nash equilibria? In SPECIAL classes of games: YES Fictitious play, Regret-based,... Two-person zero-sum games Two-person potential games Supermodular games... In GENERAL games: NO SERGIU HART c 2004 p. 39

Nash Equilibrium Question: Adaptive heuristics Nash equilibria? In SPECIAL classes of games: YES Fictitious play, Regret-based,... Two-person zero-sum games Two-person potential games Supermodular games... In GENERAL games: NO WHY? SERGIU HART c 2004 p. 39

Uncoupled Dynamics General dynamic for 2-person games: ẋ(t) = F ( x(t), y(t) ; u 1, u 2 ) ẏ(t) = G ( x(t), y(t) ; u 1, u 2 ) x(t) (S 1 ), y(t) (S 2 ) SERGIU HART c 2004 p. 40

Uncoupled Dynamics General dynamic for 2-person games: ẋ(t) = F ( x(t), y(t) ; u 1, u 2 ) ẏ(t) = G ( x(t), y(t) ; u 1, u 2 ) Uncoupled dynamic: ẋ(t) = F ( x(t), y(t) ; u 1 ) ẏ(t) = G ( x(t), y(t) ; u 2 ) x(t) (S 1 ), y(t) (S 2 ) SERGIU HART c 2004 p. 40

Uncoupled Dynamics Adaptive ( rational ) dynamics (best-reply, better-reply, payoff-improving, monotonic, fictitious play, regret-based, replicator dynamics,...) are uncoupled SERGIU HART c 2004 p. 41

Uncoupled Dynamics Adaptive ( rational ) dynamics (best-reply, better-reply, payoff-improving, monotonic, fictitious play, regret-based, replicator dynamics,...) are uncoupled Uncoupledness is a natural informational condition SERGIU HART c 2004 p. 41

Nash-Convergent Dynamics Consider a family of games, each having a unique Nash equilibrium (no coordination problems ) SERGIU HART c 2004 p. 42

Nash-Convergent Dynamics Consider a family of games, each having a unique Nash equilibrium (no coordination problems ) A dynamic is Nash-convergent if it always converges to the unique Nash equilibrium Regularity conditions: The unique Nash equilibrium is a stable rest-point of the dynamic SERGIU HART c 2004 p. 42

Impossibility There exist no uncoupled dynamics which guarantee Nash convergence SERGIU HART c 2004 p. 43

Impossibility There exist no uncoupled dynamics which guarantee Nash convergence There are simple families of games whose unique Nash equilibrium is unstable for every uncoupled dynamic SERGIU HART c 2004 p. 43

Impossibility Adaptive ( rational ) dynamics (best-reply, better-reply, payoff-improving, monotonic, fictitious play, regret-based, replicator dynamics,...) are uncoupled SERGIU HART c 2004 p. 44

Impossibility Adaptive ( rational ) dynamics (best-reply, better-reply, payoff-improving, monotonic, fictitious play, regret-based, replicator dynamics,...) are uncoupled cannot always converge to Nash equilibria SERGIU HART c 2004 p. 44

Nash vs Correlated Correlated equilibria Uncoupled dynamics SERGIU HART c 2004 p. 45

Nash vs Correlated Correlated equilibria Uncoupled dynamics Nash equilibria Coupled dynamics SERGIU HART c 2004 p. 45

Nash vs Correlated Correlated equilibria Uncoupled dynamics Nash equilibria Coupled dynamics Law of Conservation of Coordination There must be coordination either in the equilibrium concept or in the dynamic SERGIU HART c 2004 p. 45

Summary SERGIU HART c 2004 p. 46

Where Do We Go From Here? SERGIU HART c 2004 p. 47

Where Do We Go From Here? Dynamics and equilibria Which equilibria? Which dynamics? SERGIU HART c 2004 p. 47

Where Do We Go From Here? Dynamics and equilibria Which equilibria? Which dynamics? Correlated equilibria: theory and practice Coordination Communication Bounded complexity SERGIU HART c 2004 p. 47

Where Do We Go From Here? Dynamics and equilibria Which equilibria? Which dynamics? Correlated equilibria: theory and practice Coordination Communication Bounded complexity Experiments, empirics Theory SERGIU HART c 2004 p. 47

Where Do We Go From Here? Dynamics and equilibria Which equilibria? Which dynamics? Correlated equilibria: theory and practice Coordination Communication Bounded complexity Experiments, empirics Theory Joint distribution of play (instead of just the marginals) SERGIU HART c 2004 p. 47

Summary SERGIU HART c 2004 p. 48

Summary Therejis a simple adaptive heuristic always leading to correlated equilibria SERGIU HART c 2004 p. 48

Summary Therejis a simple adaptive heuristic always leading to correlated equilibria (Regret Matching) SERGIU HART c 2004 p. 48

Summary Therejis a simple adaptive heuristic always leading to correlated equilibria (Regret Matching) There are many adaptive heuristics always leading to correlated equilibria SERGIU HART c 2004 p. 48

Summary Therejis a simple adaptive heuristic always leading to correlated equilibria (Regret Matching) There are many adaptive heuristics always leading to correlated equilibria (Generalized Regret Matching) SERGIU HART c 2004 p. 48

Summary Therejis a simple adaptive heuristic always leading to correlated equilibria (Regret Matching) There are many adaptive heuristics always leading to correlated equilibria (Generalized Regret Matching) There can be jno adaptive heuristics always leading to Nash equilibria SERGIU HART c 2004 p. 48

Summary Therejis a simple adaptive heuristic always leading to correlated equilibria (Regret Matching) There are many adaptive heuristics always leading to correlated equilibria (Generalized Regret Matching) There can be jno adaptive heuristics always leading to Nash equilibria (Uncoupledness) SERGIU HART c 2004 p. 48

Summary ADAPTIVE HEURISTICS SERGIU HART c 2004 p. 49

Summary BEHAVIORAL ADAPTIVE HEURISTICS SERGIU HART c 2004 p. 49

Summary BEHAVIORAL ADAPTIVE HEURISTICS CORRELATED EQUILIBRIA SERGIU HART c 2004 p. 49

Summary BEHAVIORAL ADAPTIVE HEURISTICS CORRELATED EQUILIBRIA Regret Matching SERGIU HART c 2004 p. 49

Summary BEHAVIORAL ADAPTIVE HEURISTICS CORRELATED EQUILIBRIA Generalized Regret Matching SERGIU HART c 2004 p. 49

Summary BEHAVIORAL ADAPTIVE HEURISTICS CORRELATED EQUILIBRIA Uncoupledness SERGIU HART c 2004 p. 49

Question Can simple adaptive heuristics lead to sophisticated rational behavior? SERGIU HART c 2004 p. 50

Answer Q Can simple adaptive heuristics lead to sophisticated rational behavior? YES! SERGIU HART c 2004 p. 50

Answer Q Can simple adaptive heuristics lead to sophisticated rational behavior? YES! in time... SERGIU HART c 2004 p. 50

Summary Macro BEHAVIORAL RATIONAL SERGIU HART c 2004 p. 51

Summary Macro BEHAVIORAL RATIONAL ADAPTIVE HEURISTICS SERGIU HART c 2004 p. 51

Title ADAPTIVE HEURISTICS ( A Little Rationality Goes a Long Way) SERGIU HART c 2004 p. 52

Title ADAPTIVE HEURISTICS (A Little Rationality Goes a Long Way) Rationality Takes Time SERGIU HART c 2004 p. 52

Title ADAPTIVE HEURISTICS (A Little Rationality Goes a Long Way) Rationality Takes Time Regret?... No Regret! SERGIU HART c 2004 p. 52

Title ADAPTIVE HEURISTICS (A Little Rationality Goes a Long Way) Rationality Takes Time Regret?... No Regret! SERGIU HART c 2004 p. 52