Dynamic Level-k Model in Games
Teck Ho and Xuanming Su
UC Berkeley
March, 2010

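4-Stage Centipede Game

Payoffs (Player A, Player B) by outcome:
  Outcome 1 (take at stage 1):   (4, 1)
  Outcome 2 (take at stage 2):   (2, 8)
  Outcome 3 (take at stage 3):   (16, 4)
  Outcome 4 (take at stage 4):   (8, 32)
  Outcome 5 (pass at all stages): (64, 16)

Outcome               1        2        3        4        5
Rounds 1-5            6.2%     30.3%    35.9%    20.0%    7.6%
Rounds 6-10           8.1%     41.2%    38.2%    10.3%    2.2%
Backward Induction    100%     0%       0%       0%       0%
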
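6-Stage Centipede Game

Payoffs (Player A, Player B) by outcome:
  Outcome 1 (take at stage 1):   (4, 1)
  Outcome 2 (take at stage 2):   (2, 8)
  Outcome 3 (take at stage 3):   (16, 4)
  Outcome 4 (take at stage 4):   (8, 32)
  Outcome 5 (take at stage 5):   (64, 16)
  Outcome 6 (take at stage 6):   (32, 128)
  Outcome 7 (pass at all stages): (256, 64)

Outcome               1        2        3        4        5        6        7
Rounds 1-5            0.0%     5.5%     17.2%    33.1%    33.1%    9.0%     2.1%
Rounds 6-10           1.5%     7.4%     22.8%    44.1%    16.9%    6.6%     0.7%
Backward Induction    100%     0%       0%       0%       0%       0%       0%
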
Outline
• Backward induction and its systematic violations
• Dynamic Level-k model and the main theoretical results
• The centipede game
• Resolving well-known paradoxes:
    o Cooperation in finitely repeated prisoner's dilemma
    o Chain-store paradox
• An empirical application: the centipede game
• Alternative explanations:
    o Reputation-based story
    o Social preferences

Backward Induction Principle
• Backward induction is the most widely accepted principle for generating predictions in dynamic games of complete information:
    o Extensive-form games (e.g., centipede)
    o Finitely repeated games (e.g., repeated PD and chain-store paradox)
    o Multi-person dynamic programming
• For the principle to work, every player must be willing to bet on others' rationality

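A minimal sketch of how backward induction solves a centipede game, working from the last stage to the first. It assumes the 4-stage payoffs (Player A, Player B) read as (4, 1), (2, 8), (16, 4), (8, 32) for taking at stages 1 to 4 and (64, 16) if everyone passes. The function and variable names are illustrative and do not come from the slides; taking when indifferent is also an assumption (no ties arise with these payoffs).

```python
# Payoffs (Player A, Player B) for outcomes 1..5 of the 4-stage game (assumed reading).
PAYOFFS_4 = [(4, 1), (2, 8), (16, 4), (8, 32), (64, 16)]


def backward_induction(payoffs):
    """Solve a centipede game from the last stage back to the first.

    Returns the action ('T' = take, 'P' = pass) at every stage and the
    payoff pair that results when play follows those actions.
    """
    n_stages = len(payoffs) - 1
    continuation = payoffs[-1]              # outcome if the last mover passes
    actions = [None] * n_stages
    for stage in range(n_stages - 1, -1, -1):
        mover = stage % 2                   # 0 = Player A, 1 = Player B
        take = payoffs[stage]
        if take[mover] >= continuation[mover]:
            actions[stage] = "T"            # taking is at least as good as passing
            continuation = take
        else:
            actions[stage] = "P"
    return actions, continuation


actions, outcome = backward_induction(PAYOFFS_4)
print(actions, outcome)
```

Under these payoffs every mover takes, so the game ends at outcome 1, which is the "100% / 0% / ..." backward induction row in the outcome tables.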
Violations of Backward Induction
• Well-known violations in economic experiments include (http://en.wikipedia.org/wiki/Backward_induction):
    o Passing in the centipede game
    o Cooperation in the finitely repeated PD
    o Chain-store paradox
• Likely to be a failure of the mutual consistency condition (different people make different initial bets on others' rationality)

Standard Assumptions in Equilibrium Analysis

                         Solution Method
Assumptions              Backward Induction    DLk Model
Strategic Thinking       X                     X
Best Response            X                     X
Mutual Consistency       X                     ?
Instant Equilibration    X                     ?

Notations
• $S$: total number of subgames (indexed by $s$)
• $I$: total number of players (indexed by $i$)
• $N_s$: total number of players who are active at subgame $s$

[Figure: 4-stage centipede game tree]
Example (4-stage centipede game): $S = 4$, $I = 2$, $N_1 = N_2 = N_3 = N_4 = 1$

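Deviation from Backward Induction

$\delta(L_1, \ldots, L_I, G) = \frac{1}{S} \sum_{s=1}^{S} \left[ \frac{1}{N_s} \sum_{i=1}^{N_s} D_s(L_i, L_\infty) \right]$

$D_s(L_i, L_\infty) = 1$ if $a_s(L_i) \neq a_s(L_\infty)$, and $0$ otherwise

$0 \le \delta(\cdot) \le 1$

Here $L_\infty$ denotes the backward induction rule and $a_s(L)$ the action that rule $L$ specifies at subgame $s$.
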
Examples (4-stage centipede game, $L_\infty = \{T, T, T, T\}$)

Ex1: $L_A = \{P, -, T, -\}$, $L_B = \{-, P, -, T\}$
     $\delta(L_A, L_B, G_4) = \frac{1}{4}[1 + 1 + 0 + 0] = \frac{1}{2}$

Ex2: $L_A = \{P, -, T, -\}$, $L_B = \{-, T, -, T\}$
     $\delta(L_A, L_B, G_4) = \frac{1}{4}[1 + 0 + 0 + 0] = \frac{1}{4}$

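The deviation measure can be sketched in a few lines: for each subgame, take the share of active players whose action differs from the backward induction action, then average over subgames. The sketch below assumes one active player per subgame of the 4-stage centipede game and two illustrative rule pairs (Player A passes then takes; Player B either passes then takes, or always takes). The function name and data layout are hypothetical.

```python
def deviation(rules, active, bi_rule):
    """Average share of active players whose action differs from backward induction.

    rules   : dict mapping player -> list of actions, one per subgame
              ('-' where the player does not move)
    active  : list where active[s] is the list of players moving at subgame s
    bi_rule : list of backward induction actions, one per subgame
    """
    total = 0.0
    for s, players in enumerate(active):
        mismatches = sum(rules[i][s] != bi_rule[s] for i in players)
        total += mismatches / len(players)     # inner average over active players
    return total / len(active)                 # outer average over subgames


ACTIVE = [["A"], ["B"], ["A"], ["B"]]          # who moves at stages 1..4
BI = ["T", "T", "T", "T"]                      # backward induction: always take

ex1 = deviation({"A": list("P-T-"), "B": list("-P-T")}, ACTIVE, BI)
ex2 = deviation({"A": list("P-T-"), "B": list("-T-T")}, ACTIVE, BI)
print(ex1, ex2)  # 0.5 0.25
```

The measure is 0 when every active player follows backward induction at every subgame and 1 when none does.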
Systematic Violation 1: Limited Induction
[Figure: 4-stage and 6-stage centipede game trees]

$\delta(L_A, L_B, G_4) < \delta(L_A, L_B, G_6)$

Limited Induction in Centipede Game
Figure 1: Deviation in 4-stage versus 6-stage game (1st round)

Systematic Violation 2: Time Unraveling
[Figure: 4-stage centipede game tree]

$\delta(L_A(t), L_B(t), G) \to 0$ as $t \to \infty$

Time Unraveling in Centipede Game
Figure 2: Deviation in 1st vs. 10th round of the 4-stage game

Research Question
To develop a good descriptive model that predicts $P_{ij}(s)$, the probability of player $i$ ($i = 1, \ldots, I$) choosing strategy $j$ at subgame $s$ ($s = 1, \ldots, S$), in any dynamic game of complete information

Criteria of a Good Model
• Nests backward induction as a special case
• Behaviorally plausible
• Allows players to be heterogeneous in their bets on others' rationality
• Captures limited induction and time unraveling
• Fits data well
• Simple (with as few parameters as the data would allow)

Standard Assumptions in Equilibrium Analysis

                         Solution Method
Assumptions              Backward Induction    Hierarchical Strategizing
Strategic Thinking       X                     X
Best Response            X                     X
Mutual Consistency       X                     Heterogeneous Bets
Instant Equilibration    X                     Learning

Dynamic Level-k Model: Summary
• Players choose a rule from a rule hierarchy
• Players make different initial bets on others' chosen rules
• After each game play, players observe others' rules (e.g., strategy method)
• Players update their beliefs on rules chosen by others
• Players always choose a rule to maximize their subjective expected utility in each round

Dynamic Level-k Model: Rule Hierarchy
• Players choose a rule from a rule hierarchy generated by best responses
• Rule hierarchy: $L_0, L_1, L_2, \ldots$ with $L_k = BR(L_{k-1})$
• Restrict $L_0$ to follow behavior proposed in the existing literature
• $L_\infty = BI$

Dynamic Level-k Model: Poisson Initial Belief
• Different people make different initial bets on others' chosen rules
• Poisson distributed initial beliefs: $f(K) = \frac{e^{-\lambda} \lambda^{K}}{K!}$
• $\lambda$: average belief of rules used by opponents
• $f(k)$: fraction of players who think that their opponents use the $L_{k-1}$ rule

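As a small illustration of the Poisson initial belief, the sketch below evaluates $f(K) = e^{-\lambda} \lambda^{K} / K!$ for the first few levels. The value $\lambda = 1.5$ and the truncation at level 6 are assumptions chosen only for the example; they are not estimates from the slides, and the function name is hypothetical.

```python
import math


def poisson_beliefs(lam, max_level):
    """Return [f(0), ..., f(max_level)] with f(K) = exp(-lam) * lam**K / K!.

    The list is truncated at max_level and not renormalized, so it sums to
    slightly less than 1.
    """
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_level + 1)]


f = poisson_beliefs(lam=1.5, max_level=6)      # lam = 1.5 is illustrative only
for level, share in enumerate(f):
    print(f"K={level}: {share:.3f}")
```

A larger $\lambda$ shifts mass toward higher rules, so more players start out believing that their opponents are sophisticated.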
Dynamic Level-k Model: Belief Updating at the End of Round t
• Initial belief strength: $N_k^i(0) = \beta$
• Update after observing which rule the opponent chose:
    $N_k^i(t) = N_k^i(t-1) + I(k, t) \cdot 1$
    $B_k^i(t) = \frac{N_k^i(t)}{\sum_{k'=0}^{S} N_{k'}^i(t)}$
    $I(k, t) = 1$ if the opponent chose $L_k$, and $0$ otherwise
• Bayesian updating involving a multinomial distribution with a Dirichlet prior (Fudenberg and Levine, 1998; Camerer and Ho, 1999)

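The updating rule amounts to keeping one count per rule, adding one to the rule the opponent was just observed using, and normalizing the counts into beliefs. A minimal sketch follows, assuming five rules ($L_0$ to $L_4$), $\beta = 0.5$, all initial weight on $L_1$, and an opponent observed using $L_3$; these starting values and the function names are illustrative. Exact fractions are used so the beliefs can be read off directly.

```python
from fractions import Fraction


def update_counts(counts, observed_level):
    """N_k(t) = N_k(t-1) + I(k, t): add one to the rule the opponent used."""
    new_counts = list(counts)
    new_counts[observed_level] += 1
    return new_counts


def beliefs(counts):
    """B_k(t) = N_k(t) divided by the sum of all counts."""
    total = sum(counts)
    return [n / total for n in counts]


beta = Fraction(1, 2)
counts = [0, beta, 0, 0, 0]                    # initial weight beta on L1 (assumed)
counts = update_counts(counts, observed_level=3)
print(beliefs(counts))                         # beliefs over L0..L4 after one round
```

A small $\beta$ means the initial bet is held weakly, so a single observation already moves most of the belief onto the observed rule.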
Dynamic Level-k Model: Optimal Rule in Round t+1
• Optimal rule $k^*$:
    $k^* = \arg\max_{k = 1, \ldots, S} \sum_{s=1}^{S} \sum_{k'=1}^{S} B_{k'}^i(t) \, \pi(a_{ks}, a_{k's})$
• $a_{ks}$ is the action that rule $L_k$ specifies at subgame $s$

The Centipede Game (Rule Hierarchy)

Level    Player A       Player B
L_0      (P,-,P,-)      (-,P,-,P)
L_1      (P,-,P,-)      (-,P,-,T)
L_2      (P,-,T,-)      (-,P,-,T)
L_3      (P,-,T,-)      (-,T,-,T)
L_4      (T,-,T,-)      (-,T,-,T)

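The hierarchy for the 4-stage centipede game can be generated mechanically from $L_k = BR(L_{k-1})$. The sketch below assumes that $L_0$ always passes, that payoffs (Player A, Player B) are (4, 1), (2, 8), (16, 4), (8, 32), (64, 16) for outcomes 1 to 5, and that a best response is computed node by node from the end of the game, so that it also specifies sensible actions at nodes the opponent's rule would never reach. That last choice, and all names, are assumptions of this sketch rather than something stated on the slides.

```python
PAYOFFS_4 = [(4, 1), (2, 8), (16, 4), (8, 32), (64, 16)]   # (A, B) per outcome


def best_response(opponent, mover, payoffs):
    """Node-by-node best reply of `mover` (0 = A, 1 = B) to a fixed opponent rule."""
    n_stages = len(payoffs) - 1
    rule = ["-"] * n_stages
    continuation = payoffs[-1]                 # outcome if everyone passes from here
    for stage in range(n_stages - 1, -1, -1):
        if stage % 2 == mover:                 # own decision node
            if payoffs[stage][mover] >= continuation[mover]:
                rule[stage] = "T"
                continuation = payoffs[stage]
            else:
                rule[stage] = "P"
        elif opponent[stage] == "T":           # opponent ends the game here
            continuation = payoffs[stage]
    return rule


def rule_hierarchy(payoffs, max_level):
    """Return [(rule_A, rule_B)] for L0..L_max_level, with L0 = always pass."""
    n_stages = len(payoffs) - 1
    level0 = (["P" if s % 2 == 0 else "-" for s in range(n_stages)],
              ["P" if s % 2 == 1 else "-" for s in range(n_stages)])
    levels = [level0]
    for _ in range(max_level):
        prev_a, prev_b = levels[-1]
        levels.append((best_response(prev_b, 0, payoffs),
                       best_response(prev_a, 1, payoffs)))
    return levels


for level, (a, b) in enumerate(rule_hierarchy(PAYOFFS_4, 4)):
    print(f"L{level}: A=({','.join(a)})  B=({','.join(b)})")
```

By $L_4$ both players take at every node, which is the backward induction rule for the 4-stage game.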
Player A in 4-Stage Centipede Game ($\beta = 0.5$)

             $N_k^i(t)$
Round (t)    L_0    L_1    L_2    L_3    L_4    Rule Used by Opponent    Optimal Rule (Player A)
0                   β                                                     L_2
1                   β             1             L_3                      L_2
2                   β             2             L_3                      L_2
3                   β             3             L_3                      L_4

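The round-by-round choices of Player A can be traced with a short simulation. The sketch assumes the 4-stage payoffs (4, 1), (2, 8), (16, 4), (8, 32), (64, 16), initial weight $\beta = 0.5$ on the opponent using $L_1$, an opponent who is observed using $L_3$ in every round, and rules for levels 0 to 4 hard-coded as "pass/take" strings. It scores each rule by the expected payoff of the whole game, which is a simplification of the subgame-by-subgame sum. With these numbers the payoffs of $L_2$ and $L_4$ tie exactly in round 3; breaking ties toward the rule with more "take" actions is an assumption of this sketch.

```python
from fractions import Fraction

PAYOFFS = [(4, 1), (2, 8), (16, 4), (8, 32), (64, 16)]   # (A, B) per outcome
RULES_A = ["P-P-", "P-P-", "P-T-", "P-T-", "T-T-"]       # L0..L4 for Player A
RULES_B = ["-P-P", "-P-T", "-P-T", "-T-T", "-T-T"]       # L0..L4 for Player B


def payoff_to_a(rule_a, rule_b):
    """Player A's payoff when the two rules are played against each other."""
    for stage in range(4):
        action = rule_a[stage] if stage % 2 == 0 else rule_b[stage]
        if action == "T":
            return PAYOFFS[stage][0]
    return PAYOFFS[-1][0]


def optimal_rule(counts):
    """Level (1..4) whose strategy maximizes expected payoff under current beliefs."""
    total = sum(counts)

    def expected(rule_a):
        return sum(n / total * payoff_to_a(rule_a, RULES_B[k])
                   for k, n in enumerate(counts))

    strategies = sorted(set(RULES_A[1:]))                 # distinct strategies of L1..L4
    best = max(expected(r) for r in strategies)
    tied = [r for r in strategies if expected(r) == best]
    chosen = max(tied, key=lambda r: r.count("T"))        # assumed tie-break
    return RULES_A.index(chosen, 1)                       # lowest level using it


counts = [0, Fraction(1, 2), 0, 0, 0]      # beta = 0.5 on L1 (assumed starting belief)
history = [optimal_rule(counts)]
for _ in range(3):
    counts[3] += 1                         # opponent observed using L3
    history.append(optimal_rule(counts))
print(history)  # [2, 2, 2, 4]
```

Player A keeps playing $L_2$ while the initial bet on $L_1$ still carries enough weight, and moves to $L_4$ once three observations of $L_3$ have accumulated.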
Dynamic Level-k Model: Summary
• Players choose a rule from a rule hierarchy
• Players make different initial bets on others' chosen rules
• After each game play, players observe others' rules (e.g., strategy method)
• Players update their beliefs on rules chosen by others
• Players always choose a rule to maximize their subjective expected utility in each round
• 2-parameter extension of backward induction ($\lambda$ and $\beta$)

Main Theoretical Results: Limited Induction

$\delta(L_A, L_B, G_4) < \delta(L_A, L_B, G_6)$

Main Theoretical Results: Time Unraveling

$\delta(L_A(t), L_B(t), G) \to 0$ as $t \to \infty$

Iterated Prisoner's Dilemma (Rule Hierarchy)

Stage-game payoffs (row player, column player):
               Cooperate    Defect
  Cooperate    3, 3         0, 5
  Defect       5, 0         1, 1

Level    Strategy
0        TFT*
1        TFT, D
2        TFT, D, D
3        TFT, D, D, D
k        TFT, $D_1, \ldots, D_k$

* Kreps et al. (1982)

Main Theoretical Results

$\delta(L_A, L_B, G_T) < \delta(L_A, L_B, G_{T'})$ for $T' > T$

Main Theoretical Results

$\delta(L_A(t), L_B(t), G) \to 0$ as $t \to \infty$

Properties of Level-0 Rule
• Maximize group payoff: a level-0 player always chooses a decision that, if others do the same, will lead to the largest total payoff for the group (e.g., TFT in RPD)
• Protect individual payoff: while maximizing group payoff, a level-0 player also ensures that the chosen decision rule is robust against continued exploitation by others (e.g., TFT in RPD)

Chain-Store Paradox (Rule Hierarchy)

Stage game (payoffs: CS, E):
  E chooses OUT:                 (5, 1)
  E chooses IN, CS plays FIGHT:  (0, 0)
  E chooses IN, CS plays SHARE:  (2, 2)

Level    Chain Store (CS)                     Entrant
0        FIGHT (F)                            GTR: OUT unless CS is observed to share (then ENTER (E))
1        F, F, F, ..., F, F, S                GTR
2        F, F, F, ..., F, S, S                GTR, E
3        F, F, ..., F, S, S, S                GTR, E, E
k        F, ..., F, $S_1, \ldots, S_k$        GTR, $E_1, \ldots, E_{k-1}$

Main Theoretical Results

4-Stage versus 6-Stage Centipede Games

Payoffs (Player A, Player B) by outcome:
  4-stage game: (4, 1), (2, 8), (16, 4), (8, 32), (64, 16)
  6-stage game: (4, 1), (2, 8), (16, 4), (8, 32), (64, 16), (32, 128), (256, 64)

Empirical Regularities

4-stage game
Outcome        1        2        3        4        5
Rounds 1-5     6.2%     30.3%    35.9%    20.0%    7.6%
Rounds 6-10    8.1%     41.2%    38.2%    10.3%    2.2%

6-stage game
Outcome        1        2        3        4        5        6        7
Rounds 1-5     0.0%     5.5%     17.2%    33.1%    33.1%    9.0%     2.1%
Rounds 6-10    1.5%     7.4%     22.8%    44.1%    16.9%    6.6%     0.7%

Dynamic Level-k Model's Prediction in 4-Stage Game

Dynamic Level-k Model's Prediction in 6-Stage Game

MLE Model Estimates
• Special cases are rejected
• Both heterogeneity and learning are important

Model Predictions

Alternative 1: Gang of Four's Story (Kreps et al., 1982)
• $\theta$ = proportion of altruistic players (level-0 players)

Gang of Four's Predictions (LL = -955.7)

Alternative 2: Social Preferences

Conclusions
• Dynamic level-k model is an empirical alternative to BI
• Captures limited induction and time unraveling
• Explains violations of BI in the centipede game
• Explains paradoxical behaviors in 2 well-known games (cooperation in the finitely repeated PD, chain-store paradox)
• Dynamic level-k model can be considered a tracing procedure for backward induction (since the former converges to the latter as time goes to infinity)