Evolution of the Theory of Mind


Daniel Monte, São Paulo School of Economics (EESP-FGV)
Nikolaus Robalino, Simon Fraser University
Arthur Robson, Simon Fraser University

Conference on the Biological Basis of Economic Preferences and Behavior, Becker Friedman Institute, University of Chicago, May 5, 2012

The Theory of Mind (TOM) refers to the ability to ascribe agency to other individuals (and to oneself) and, more particularly, to ascribe beliefs and desires to one and all. This ability is manifest in non-autistic humans beyond infancy, but is less obvious in other species. A key early experiment is the Sally-Ann test (Baron-Cohen, Leslie, and Frith (1985), for example).

Onishi and Baillargeon (2005) find that even 15-month-olds exhibit behavior consistent with TOM in a non-verbal task where violation of expectation is inferred from increased length of gaze. Would this work on nonhuman species too?

Brodmann area 10

Such an ability is crucial to game theory. That is, it is usually necessary to understand an opponent's payoffs, to put oneself in his shoes, in order to predict his behavior and therefore to choose an optimal strategy oneself.

This ability might be seen as an aspect of Machiavellian Intelligence: the hypothesis that our intelligence evolved under pressure to outsmart our conspecifics (see, for example, Byrne and Whiten (1998)). This is often contrasted with the Ecological Intelligence Hypothesis, under which the nonhuman environment provided the impetus (Robson and Kaplan, AER (2003), for example).

The present approach substantially extends and generalizes Robson, "Why Would Nature Give Individuals Utility Functions?" JPE (2001), who considered the evolutionary rationale for own utility in a decision-theoretic framework. In contrast, we consider a rationale for knowing the utility functions of others in a strategic framework. In both cases, however, it is the need to address novelty that is the evolutionary impetus. The approach here contrasts with that of a small literature represented by Stahl, GEB (1993), who considers the evolutionary advantage of greater strategic sophistication. (See also Crawford and Iriberri (2007) and Mohlin (2012), for example.) Stahl argues that it may be better to be (lucky and) dumb than to be smart. We obtain a clearer advantage to being smart, mainly because we consider a game with outcomes that are randomly selected from a growing outcome set, rather than a particular fixed game.

Here we investigate the evolutionary basis of an ability to acquire the preferences of an opponent. We contrast such more sophisticated agents with naive agents who adapt to each game as in evolutionary game theory and as in reinforcement learning in psychology.

Consider an anecdote about vervet monkeys from Cheney and Seyfarth (1990). When a male vervet sought to join Kitui's group, where Kitui was bottom-ranked, Kitui might make the leopard warning cry. The interloper would stay in his tree, his plans thwarted. A TOM Kitui understood the effect of such a cry on the others and was being deliberately deceptive. A naive Kitui, in contrast, had no model of other vervets' preferences and beliefs: inadvertently, perhaps, he once made the leopard warning in such a circumstance and it worked. (That the latter was actually true was suggested by Kitui occasionally walking on the ground towards the other male, alarm-calling all the while.)

Here we model the contest between TOMers and naive players. We consider games of perfect information, the simplest of which has two stages. If player 1 is naive, she must see all games before learning to play appropriately; if she is a TOMer, she need merely see player 2 confronted with all possible pairs of outcomes. The edge for the TOMers then derives from there being many more games than pairs of outcomes. (For instance, with 100 outcomes there are $\binom{100}{4} \approx 3.9$ million four-outcome games but only $\binom{100}{2} = 4950$ outcome pairs.) We demonstrate this edge by considering an environment that becomes richer as time passes. If the environment rapidly becomes more complex, neither type is able to keep up; if the environment only slowly becomes more complex, both types can keep up. In an intermediate range, however, the TOMers learn essentially everything, the naive types essentially nothing.

Two-Stage Model

Two large populations: player 1's ("she") and player 2's ("he"). At t = 1, 2, ..., players are randomly paired to play a simple two-stage game of perfect information: a game tree with exactly two moves at each of the non-terminal nodes. The games vary on account of the outcomes assigned to the terminal nodes. Each player has a fixed strict preference ordering over the countably infinite set X.

[Figure: the two-stage game tree. Player 1 moves first, choosing L or R; player 2 then chooses L or R at the resulting node, yielding terminal outcomes $x_1, x_2, x_3, x_4$.]

The two-stage game at t is $\Gamma_t$. At each t, $\Gamma_t$ is completed by a random draw of four outcomes from the finite $X_t \subset X$. If any two of the outcomes are identical, it is known to the sophisticated players that they are indifferent; if the outcomes are different, they induce a strict preference ordering for the player in question, but this strict preference is not known to other players. Naive players adapt to each game; theory-of-mind (TOM) players construct the opponent's preferences.

Each history of outcomes and choices is public information. If a mixed strategy is used, this would be observed, although just on the nodes that are reached. We consider only symmetric strategies. There are no feedback effects of individuals' actions on their opponents, and so there is no incentive to play non-myopically. Player 2's have no interest in player 1's payoffs. Further, the player 1's react only to the distribution of player 2 choices. Since any particular player 2 has no effect on this distribution of player 2 choices, any particular player 2 is myopic. The outcome sets increase over time, capturing growing complexity. There is a sequence $t_1, \dots, t_k, \dots$, where $t_k$ is the arrival date of the k-th new object. The initial set of objects is $X_0$, with size n. Thus, from $t_k$ to $t_{k+1} - 1$ the number of objects is n + k. We assume $t_k = k^\alpha$, for $\alpha \geq 0$.
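As a concrete illustration (our sketch, not the paper's; the helper num_objects is hypothetical), the size of the outcome set implied by the arrival rule $t_k = k^\alpha$ can be computed directly:

    def num_objects(t, n, alpha):
        # The k-th new object arrives at date t_k = k**alpha, so |X_t| is
        # the initial size n plus the number of k with k**alpha <= t.
        k = 0
        while (k + 1) ** alpha <= t:
            k += 1
        return n + k

    # With n = 10 and alpha = 3 (inside the interval (2, 4) of Theorem 1,
    # stated below), the outcome set grows like t**(1/3).
    print([num_objects(t, 10, 3.0) for t in (1, 10, 100, 1000)])
    # -> [11, 12, 14, 20]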

A naive player adapts to each game $\Gamma_t$. For simplicity, we assume that if the game $\Gamma_t$ is novel, the naive player plays inappropriately at t, say by randomizing 50-50 at each decision node. However, the next time this same game is encountered, the naive player makes the appropriate choices. That is, the adaptive learning here is as fast as it possibly could be. Even with this advantage, however, the naive types will be outdone by the TOM types. In contrast, a player has a theory of mind if she knows that her opponent will take binary decisions that are consistent with some initially unknown preference ordering. She does not avail herself of the transitivity of her opponent's preferences. Our comparison of the naive type with the TOM type is then a comparison of the speed of the associated learning processes.

We now characterize the maximum amount of knowledge in this environment for naive and TOM players. For naive player 1's or 2's at date t, the maximum amount of knowledge is simply the number of games, $G_t$, say. We have that $G_t = \binom{|X_t|}{4}$. If the number of distinct games that have been played before date t is $K_t^N$, then we consider the fraction of these games that are known to any naive player, $L_t^N = K_t^N / G_t$.

For the TOM players, consider the number of binary choices for a particular player. When that particular player makes some binary choice, it becomes common knowledge to all TOM players. For both players, the number of outcome pairs is $Q_t = \binom{|X_t|}{2}$. If $K_t^2$ is the number of player 2's binary choices that have been revealed to TOM player 1's, then the fraction of these that have been revealed is $L_t^2 = K_t^2 / Q_t$.
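As a back-of-the-envelope check (our calculation, consistent with Theorem 1 below): since $t_k = k^\alpha$, the outcome set has $|X_t| \sim t^{1/\alpha}$ members at date t, so

$$G_t = \binom{|X_t|}{4} \sim t^{4/\alpha}, \qquad Q_t = \binom{|X_t|}{2} \sim t^{2/\alpha}.$$

At most a bounded amount of new knowledge is revealed per period, so $K_t^N$ and $K_t^2$ grow at most linearly in t. The naive stock can therefore keep pace with $G_t$ only if $4/\alpha \leq 1$, and the TOM stock with $Q_t$ only if $2/\alpha \leq 1$, which is where the thresholds 4 and 2 originate.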

We now have the main result for the two-stage game case.

Theorem 1 (All the results here concern player 1.)
i) If $\alpha \in [0, 2)$ then $L_t^2 \to 0$ and $L_t^N \to 0$ as $t \to \infty$, in probability. That is, both the sophisticated and the naive type are overwhelmed by the rapid rate of arrival of new outcomes.
ii) If $\alpha \in (4, \infty)$, then $L_t^2 \to 1$ and $L_t^N \to 1$ as $t \to \infty$, in probability. That is, the rate of arrival of new outcomes is slow enough that both types are able to essentially learn everything.
iii) Finally, however, if $\alpha \in (2, 4)$, then $L_t^2 \to 1$ but $L_t^N \to 0$ as $t \to \infty$, in probability. That is, for this intermediate range of arrival rates, the TOM type learns essentially everything, while the naive type learns essentially nothing.

Except possibly for the values 2 and 4, the results are dramatic: either everything is learnt in the limit or nothing is. Indeed, the range where nothing is learnt is inescapable, in that the arrival rate of novelty there outstrips the maximum rate at which learning can occur. So the real contribution is to show the less obvious result that full learning occurs essentially whenever it is even possible that it could. In terms of the contest between the two types, the existence of an interval over which the TOM type learns everything and the naive type learns nothing implies that we can finesse the issue of considering payoffs explicitly. Whatever these payoffs might be, it is clear that the TOM type is outdoing the naive type in this intermediate range.
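A crude Monte Carlo illustration of the three regimes (our sketch: the function simulate and its one-pair-per-period revelation rule are our simplifications, not the authors' model, and finite-horizon numbers are only suggestive of the limits):

    import math
    import random

    def simulate(alpha, T, n=10, seed=0):
        # Each period, four distinct outcomes are drawn from the current
        # outcome set to form the game. The naive player knows a game once
        # she has seen that exact set of four outcomes; the TOM player knows
        # a pair once player 2's choice over it has been observed, and we
        # reveal one of the two player-2 pairs (the reached node) per period.
        rng = random.Random(seed)
        seen_games, seen_pairs = set(), set()
        m = n
        for t in range(1, T + 1):
            m = n + math.floor(t ** (1.0 / alpha))   # |X_t|, t_k = k**alpha
            outcomes = rng.sample(range(m), 4)
            seen_games.add(frozenset(outcomes))
            pair = rng.choice([outcomes[:2], outcomes[2:]])
            seen_pairs.add(frozenset(pair))
        L_N = len(seen_games) / math.comb(m, 4)      # naive fraction known
        L_2 = len(seen_pairs) / math.comb(m, 2)      # TOM fraction known
        return L_N, L_2

    for alpha in (1.5, 3.0, 6.0):   # the three regimes of Theorem 1
        L_N, L_2 = simulate(alpha, T=200_000)
        print(f"alpha = {alpha}: L^N ~ {L_N:.2f}, L^2 ~ {L_2:.2f}")

In a typical run, both fractions are small for alpha = 1.5, both are near one for alpha = 6, and in the intermediate case alpha = 3 the TOM fraction is near one while the naive fraction is small and shrinking.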

Revealed Preference

Suppose that $x \succ_2 y$ and $w \succ_2 z$ for some distinct $x, y, z, w \in X$. Consider the following game.

[Figure: the two-stage game tree. Player 1 chooses L or R; at the left player-2 node the choice is between x and y, at the right player-2 node between z and w.]

Then, if 1 knows these aspects of 2, she will choose L and get x if $x \succ_1 w$, but R and get w if $w \succ_1 x$. Conversely, if 1 chooses appropriately in any game like this, she has revealed that she has a correct representation of 2's preferences.

There is no simpler way to ensure that player 1 always chooses correctly in all two-stage games, for all preferences $\succ_1$.
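To make the revealed-preference construction concrete, a minimal sketch (ours; tom_choice, known, and pref1 are hypothetical names) of how a TOM player 1 converts observed player-2 choices into her own move:

    def tom_choice(game, known, pref1):
        # game  = ((x, y), (z, w)): the pairs at 2's left and right nodes.
        # known = dict from frozenset({a, b}) to the outcome 2 picks from it.
        # pref1 = dict from outcome to 1's utility rank (higher is better).
        # Returns 'L' or 'R', or None if a needed pair is still unknown.
        left, right = game
        kl = known.get(frozenset(left))
        kr = known.get(frozenset(right))
        if kl is None or kr is None:
            return None          # player 1 must fall back on a guess
        return 'L' if pref1[kl] > pref1[kr] else 'R'

    known = {frozenset({'x', 'y'}): 'x',   # observed: 2 chose x over y
             frozenset({'z', 'w'}): 'w'}   # observed: 2 chose w over z
    pref1 = {'x': 2, 'y': 0, 'z': 1, 'w': 3}
    print(tom_choice((('x', 'y'), ('z', 'w')), known, pref1))  # -> 'R', since w is ranked above x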

Three (or More)-Stage Model

Three equally large populations: player 1's, player 2's, and player 3's. At t = 1, 2, ..., players are matched randomly in triples to play a three-stage game of perfect information with exactly two moves at each of the non-terminal nodes. Each player has a fixed strict preference ordering over a countably infinite set of outcomes X. The three-stage game at t is $\Gamma_t$, which is now completed by a random draw of a set of eight outcomes from the finite $X_t \subset X$.

[Figure: the three-stage game tree. Player 1 moves first; player 2 moves at each second-stage node; player 3 moves at each third-stage node, yielding eight terminal outcomes.]

What if the naive type of player 2, for example, were less naive, keying not on the entire three-stage game, but on the subgame deriving from each choice by player 1? A somewhat weaker version of the argument goes through.

The type of the last player, now player 3, is again irrelevant, since player 3 uses only his own preferences. TOM player 2's construct a simple model of player 3's preferences. TOM player 1's construct simple models of the other two players' preferences, and the player 1's also need to know that the player 2's know player 3's preferences.

Individual players again have no incentive to mislead opponents about their own preferences. In the three-stage case, this observation has force only for the player 2's and 3's. That is, the player 1's cannot advantageously mislead the player 2's or 3's, because the player 2's and 3's do not consider 1's preferences. The player 1's and 2's react only to the distribution of choices made by the player 3's. Since any particular player 3 then has no effect on the distribution of player 3 choices, and hence cannot affect the play of the player 1's or 2's, any such particular player 3 must behave myopically. Similarly, the player 1's react only to the distribution of choices made by the player 2's, and so there is no incentive for the player 2's to distort their choices in order to influence the player 1's. We again focus on outcome sets that increase over time, described precisely as before.

As before, a player who is naive adapts to each game $\Gamma_t$ as a distinct circumstance, but takes only two trials to play appropriately. Again, in contrast, a player with a theory of mind knows her opponent will take decisions that are consistent with some preference ordering. She has the capacity to learn this ordering.

For naive player 1's, 2's, or 3's at date t, the maximum amount of knowledge is again the number of games, $G_t$, say. Now $G_t = \binom{|X_t|}{8}$. If the number of distinct games that have been played before date t is $K_t^N$, the fraction of games that are known to any naive player is $L_t^N = K_t^N / G_t$.

For the players with a theory of mind, it is convenient to consider the number of binary choices that can be made by a particular player, j, say. When player j makes some binary choice, it becomes common knowledge to all TOM players. For all players, the number of outcome pairs is, as before, $Q_t = \binom{|X_t|}{2}$. If $K_t^j$ is the number of player j's binary choices, $j = 2, 3$, that have been revealed to all TOM players, as common knowledge, then the fraction of these is $L_t^j = K_t^j / Q_t$.

The following is then the main result for the three-stage game case. We first show that player 2 derives an advantage from TOM over naivete, essentially as in the two-stage game. Once player 2 is of the TOM type, we then show that player 1 derives an advantage from TOM over naivete.

Theorem 2 A) Suppose that player 1 plays in an arbitrary fashion. Then we have the following results for player 2.
i) If $\alpha \in [0, 2)$ then $L_t^3 \to 0$ and $L_t^N \to 0$ as $t \to \infty$, in probability. That is, both the sophisticated and the naive type of player 2 are overwhelmed by the rapid rate of arrival of new outcomes.
ii) If $\alpha \in (8, \infty)$, then $L_t^3 \to 1$ and $L_t^N \to 1$ as $t \to \infty$, in probability. That is, the rate of arrival of new outcomes is slow enough that both types are able to essentially learn everything.
iii) Finally, however, if $\alpha \in (2, 8)$, then $L_t^3 \to 1$ but $L_t^N \to 0$ as $t \to \infty$, in probability. That is, for this intermediate range of arrival rates, the TOM type learns essentially everything, while the naive type learns essentially nothing.

B) Suppose that player 2 is TOM. Then we have the following results for player 1.
i) If $\alpha \in [0, 2)$ then $L_t^2 \to 0$, $L_t^3 \to 0$, and $L_t^N \to 0$ as $t \to \infty$, in probability. That is, both the sophisticated and the naive type are overwhelmed by the rapid rate of arrival of new outcomes.
ii) If $\alpha \in (8, \infty)$, then $L_t^2 \to 1$, $L_t^3 \to 1$, and $L_t^N \to 1$ as $t \to \infty$, in probability. That is, the rate of arrival of new outcomes is slow enough that both types are able to essentially learn everything.
iii) Finally, however, if $\alpha \in (2, 8)$, then $L_t^2 \to 1$, $L_t^3 \to 1$, but $L_t^N \to 0$ as $t \to \infty$, in probability. That is, for this intermediate range of arrival rates, the TOM type learns essentially everything, while the naive type learns essentially nothing.

Again, the results are dramatic: except possibly at two points, either everything is learnt in the limit or nothing is. Again, when nothing is learnt, it is because it is simply mechanically impossible to keep up with the rate of novelty, so that the key contribution of this theorem is to show that everything is learnt essentially whenever this is not mechanically ruled out. As in the two-stage case, there is then an interval over which the TOM type learns everything and the naive type learns nothing. This interval is larger in the three-stage game case because the naive types of player 1 or 2 face a larger set of possible games, now with eight outcomes drawn from the outcome set, and hence can only keep up with a slower rate of novelty. On the other hand, for the result in A), the TOM type of player 2 still needs to know only how player 3 would make each possible binary choice (see the calculation below).
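Repeating the earlier back-of-the-envelope calculation (ours) with eight terminal outcomes:

$$G_t = \binom{|X_t|}{8} \sim t^{8/\alpha}, \qquad Q_t = \binom{|X_t|}{2} \sim t^{2/\alpha},$$

so the naive stock can keep pace only if $\alpha > 8$, while each TOM player still tracks only pairwise choices, leaving the TOM threshold at $\alpha = 2$.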

For B), the situation for player 1 is more complex, because player 1 not only needs to know both player 2's preferences and player 3's, but also needs to know that player 2 knows player 3's preferences. This would seem likely to shift the transition point from no learning to full learning for TOM player 1's. However, the following observations apply. As long as $\alpha > 2$, player 1 will learn player 3's preferences completely in the limit. In addition, at the same time that 3's choices reveal information about 3's preferences to player 1, they reveal the same information to player 2, and player 1 knows this. But now, given only that $\alpha > 2$, player 1 can also completely learn player 2's preferences. For A), the large numbers of player 1's, 2's, and 3's ensure that the player 2's should be sequentially rational. It might somehow be to the player 2's overall advantage to be perceived by the player 1's as naive rather than sophisticated. However, when there are many player 2's, with some of these naive and some sophisticated, each individual player 2 has no effect on player 1's perceptions of the distribution of player 2's. Thus each sophisticated, sequentially rational player 2 outperforms each naive player 2.

It is always clear that learning must be slow if $L_t^2$ is close to one. When $\alpha > 2$, however, the proof involves showing that this is the only circumstance under which learning is slow. There are two complicating factors. The first is that there are subgames in which 2's choice cannot reveal information about 2's preferences because there is insufficient knowledge about 3's preferences, and therefore choices. The second factor concerns the existence of player 2 subgames with outcomes that are avoided by player 3, thus making it difficult to reveal information about 2's preferences. Such games arise even as $t \to \infty$. However, A1 implies that these problematic games are a vanishing fraction of all games as $t \to \infty$.

These results extend straightforwardly to a perfect-information game with S stages and S players. When considering the sophistication of player s, we assume that players s + 1, ..., S are already TOM. The critical value of α for any TOM player remains 2, for all s. The critical value of α for a naive player grows with the number of possible games that can be formed.

Further Extensions

How much do the current results depend on the particular model described here? Although the environment is rather particular, it is best seen merely as a test to discriminate between the underlying characteristics of the TOM players and of the naive players. What about games with imperfect information? With multiple Nash equilibria, it is not clear how to disentangle the lack of knowledge of payoffs from the lack of information about which equilibrium is to be played, at least in the absence of strong assumptions. Perhaps normal form games crop up along with the games of perfect information emphasized here, using outcomes drawn from the same set. Whether or not any learning can be accomplished in such normal form games, our approach shows that learning would arise based only on the games of perfect information.

Within the class of games of perfect information, it is only for simplicity that we restrict attention to a fixed game tree. The tree itself could be random: it might involve a random number of moves or a random order of play, for example. Similarly, players could be allowed to move multiple times, and so on. Similarly, the assumption that individuals have a strict ranking over each distinct pair of outcomes is basically innocuous. If indifference is allowed, suppose, for example, that individuals randomize over each pair of indifferent outcomes. The indifference of player i between z and z′ would then become common knowledge to the TOM types if ever player i chose z over z′ and z′ over z.

We assume here that TOM types do not apply transitivity in their deductions about the preferences of other players. This might make a difference to the relevant ranges of the growth parameter α. The new critical value of α cannot exceed 2, since applying transitivity could not be disadvantageous. Nor could it fall below 1: even with transitivity, ordering $|X_t| \sim t^{1/\alpha}$ outcomes requires on the order of $|X_t|$ observed choices, which is feasible only if $\alpha \geq 1$. More sophisticated naive types could clearly do better than the ones we describe here. If naive types assign beliefs to subgames, rather than to entire games, for example, they would do as well as TOM types in the two-stage game case. More generally, with three or more stages, such more sophisticated naive players would do better than the naive players considered here, but not as well as the sophisticated players.
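For concreteness, a minimal sketch (ours; infer_closure is a hypothetical name) of the extra inferences that a transitivity-using TOM type could draw from the same observed choices:

    def infer_closure(observed):
        # observed: set of pairs (a, b) meaning 'a was chosen over b'.
        # Returns every pair deducible by chaining strict preferences,
        # computed via Warshall's transitive-closure algorithm.
        better = set(observed)
        nodes = {x for pair in better for x in pair}
        for k in nodes:
            for i in nodes:
                for j in nodes:
                    if (i, k) in better and (k, j) in better:
                        better.add((i, j))
        return better

    # Two observed choices yield a third pairwise ranking for free:
    print(infer_closure({('x', 'y'), ('y', 'z')}))
    # contains ('x', 'z') in addition to the two observed pairs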

Experiments

It would be of interest to experimentally implement the model, perhaps simplified to have no innovation. Also, it seems we might not need a very large number of subjects in each of the I pools. Induce the same preferences over a large set of outcomes for each of the player i's, i = 1, ..., I, by using monetary payoffs. No player knows the other players' payoffs. Play the game otherwise as above. How fast would players learn other players' preferences? Would they be closer to the sophisticated TOM types described above or to the naive types? How would the number of stages I affect matters?

The End

Sketch of Proofs

We treat a general case with I stages and A choices at each decision node.

No learning in the limit. If outcomes arrive at too fast a rate, it is straightforward to prove that learning cannot occur, even when the greatest possible amount of information is revealed in every period.

Lemma 1 In each of the following, convergence is sure.
i) Suppose $\alpha \in [0, 2)$. Then $L_t^i \to 0$ for each preference type $i = 1, \dots, I$.
ii) Suppose there are T terminal nodes. If $\alpha \in [0, T)$, then $L_t^N \to 0$.

Results About Learning

Theorem 1 is a special case of Theorem 2. Attention is restricted to the TOM players, and so to the $L_t^i$'s. (The corresponding claim about the naive types goes through with minor changes to the analysis.) As hypothesized in Theorem 2, when considering $L_t^i$, players $i + 1, \dots, I$ are taken to be TOM.

Auxiliary Results

The first minor result, Proposition 1, relates how much is commonly known about i's preferences over pairs of outcomes to what is commonly known about i's preferences over A-tuples of outcomes. The need for this result can be finessed, for expositional simplicity, by setting A = 2.

The gist of Proposition 2 is the following. Suppose types $i, \dots, I$ are all TOM and that $L_t^{i+1}, \dots, L_t^I$ each converge to one in probability. Then, in the limit, the probability of revealing new information about i's preferences is small only if the fraction of extant knowledge about preferences, $L_t^i$, is close to one. Indeed, although the probability of revealing new information about i is clearly small if $1 - L_t^i$ is small, the converse is not obviously true. Proposition 2, however, establishes an appropriate lower bound. This bound decomposes the expected amount learnt into a factor involving $1 - L_t^i$, reflecting what is yet to be revealed about i's preferences, and a residual term.

Proposition 2 Suppose each of the random variables $L_t^{i+1}, \dots, L_t^I$ converges to one in probability. Then for each $\varepsilon \in [0, 1]$ there exists a random variable $\xi_t^{i\varepsilon}$ such that

$$E(K_{t+1} \mid H_t) - K_t \geq \varepsilon^{2^i}\,(1 - L_t^i)^{2^i} - \xi_t^{i\varepsilon},$$

where $\xi_t^{i\varepsilon}$ converges in probability to a continuous function $m_i : [0, 1] \to [0, 1]$ of $\varepsilon$, with $m_i \to 0$ as $\varepsilon \to 0$.

Proposition 2 is the heart of the matter. Consider the case of player I. What is a lower bound on the probability that something new will be learned about this player's preferences? Such a lower bound arises from the event that every pair of outcomes that follows a choice by player I is unfamiliar to the TOM players. Since player I has $2^{I-1}$ decision nodes, this lower bound is then $[1 - L_t^I]^{2^{I-1}} \geq [1 - L_t^I]^{2^I}$.

For players who are not last, the situation is more complex. There are two complicating factors. The first is that there are i-type subgames in which i's choice cannot reveal information about i's preferences because there is insufficient knowledge about the remaining players' choices. The second factor concerns the existence of i-type subgames with outcomes that are avoided by the remaining opponents, thus making it difficult to reveal information about i's preferences. Such games arise even as $t \to \infty$. However, A1 implies that these problematic games are a vanishing fraction of all games as $t \to \infty$. These two complicating factors account for the additional multiplicative and additive terms in the bound obtained in Proposition 2.

Proposition 3 then essentially completes the proof of Theorem 2 (and Theorem 1) by applying the result of Proposition 2.

Proposition 3 Suppose that, for each $\varepsilon \in [0, 1]$, there exists a random variable $\xi_t^{i\varepsilon}$ such that

$$E(K_{t+1} \mid H_t) - K_t \geq \varepsilon^{2^i}\,(1 - L_t^i)^{2^i} - \xi_t^{i\varepsilon},$$

where $\xi_t^{i\varepsilon}$ converges in probability to a continuous function $m_i : [0, 1] \to [0, 1]$ of $\varepsilon$, with $m_i \to 0$ as $\varepsilon \to 0$. If, in addition, $\alpha > 2$, then $L_t^i$ converges to one in probability.

Consider the case I = 3, for simplicity of exposition, and consider player 2. Assume for the moment that $L_t^2$ converges in probability to the random variable L. Fix $\eta > 0$ and let A denote the event $L < 1 - \eta$. Given the above facts about the probability of new information being revealed, the random variable $E(K_{t+\tau}^2 \mid h_t)$ is bounded below asymptotically by $K_t^2 + \eta^2 \tau$ on A, as $\tau \to \infty$. That is, on A, in the limit, learning occurs at least linearly in $\tau$. Observe now that $Q_{t+\tau}$ is of order $\tau^{2/\alpha}$. Then, dividing through by the non-random $Q_{t+\tau}$, we obtain that $E(L_{t+\tau}^2 \mid h_t) \to \infty$ on A. Since by definition $L_t^2$ is bounded above by one surely, it must be that P(A) = 0. That is, if $\alpha > 2$ and $L_t^2$ converges in probability to L, it must be that L = 1 almost surely.
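Spelling out the division step (our arithmetic, taking the linear lower bound above as given, with C a positive constant):

$$\frac{E(K_{t+\tau}^2 \mid h_t)}{Q_{t+\tau}} \geq \frac{K_t^2 + \eta^2 \tau}{C\,\tau^{2/\alpha}} \to \infty \quad \text{as } \tau \to \infty, \text{ since } 2/\alpha < 1 \text{ when } \alpha > 2,$$

which is incompatible with $L_t^2 \leq 1$ unless P(A) = 0.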

It remains, then, to show the convergence in probability just assumed. We do this by showing that the $L_t$ processes lie in a class of generalized martingales. Consider the following.

Weak Submartingale in the Limit: The adapted process $(L_t, h_t)$ is a weak submartingale in the limit (w-submil) if, for each $\eta > 0$, there is a T such that $\tau \geq t \geq T$ implies $P(E(L_\tau \mid h_t) \geq L_t - \eta) > 1 - \eta$, a.e. and uniformly in $\tau$.

Egghe (1984) shows that bounded weak submartingales in the limit have limits in probability.

Given the above, we need to prove that the $L_t$ sequences are w-submils. Clearly the $L_t$ process has the submartingale property in between arrival dates. However, at an arrival date, $L_t$ is discounted by the factor $Q_t / Q_{t+1}$, and increases by at most $1 / Q_{t+1}$. The key is then to show that the sequence of $L_t$ at the arrival dates is nevertheless a w-submil. This follows because there is a process converging to 1 such that $L_t$ at the arrival dates can only decrease if it is greater than this process or close to it.

The End