Behavioral learning in a population of interacting agents. Jean-Pierre Nadal


1 Behavioral learning in a population of interacting agents. Jean-Pierre Nadal, nadal@lps.ens.fr, Laboratoire de Physique Statistique, ENS, and Centre d'Analyse et de Mathématique Sociales, EHESS

2 Continental divide game (Van Huyck, Battalio & Cook, 1997). [Figure: payoff table, Nash equilibria marked]

3 Continental divide. [Figure]

4 Continental divide. [Figure from Camerer]

5 Behavioral learning: modeling human and animal behaviour (experimental psychology); computational neuroscience (at the neuronal level): decision making based on expected reward/punishment, motor control; economics/game theory: behavioral game theory.
Some references:
Bush, R. & Mosteller, F., Psychological Review (1951)
Rescorla, R. A. & Wagner, A. R. (1972), A theory of Pavlovian conditioning
Sutton and Barto, Reinforcement learning (1984, 1988, 1990); book: The MIT Press, 1998, free online
Behavioral game theory: Cross 1973; Arthur 1991; McAllister 1991; Walliser 1997; Camerer 1998
Dayan, P. & Daw, N. D. (2008), Decision theory, reinforcement learning, and the brain, Cognitive, Affective & Behavioral Neuroscience

6 Behavioral learning.
For a set of possible actions, the utility/payoff/profit is not known in advance.
Exploration: making choices whose possible outcomes are not (well) known.
Learning: reinforcement of the actions that appear to be the most efficient: higher probability of choosing such actions in the future.
Exploitation of acquired knowledge: past experience (possibly that of others) allows one to form expectations about the outcomes of some actions/choices/strategies.
Efficient learning: a compromise between exploration and exploitation.
Collective scale: learning in a population of interacting agents.
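The exploration/exploitation compromise can be illustrated by a minimal sketch (not from the slides): a single agent facing a few actions with unknown noisy payoffs, choosing epsilon-greedily and reinforcing its payoff estimates. The payoff values, epsilon and mu below are arbitrary illustration choices.

```python
import random

def epsilon_greedy_bandit(true_payoffs, epsilon=0.1, mu=0.1, steps=5000, seed=0):
    """With probability epsilon, explore a random action; otherwise exploit
    the action with the highest current estimate. The chosen action's
    estimate is reinforced towards the received (noisy) payoff."""
    rng = random.Random(seed)
    A = [0.0] * len(true_payoffs)   # payoff estimates ("attractions"), one per action
    for _ in range(steps):
        if rng.random() < epsilon:
            w = rng.randrange(len(A))                   # exploration
        else:
            w = max(range(len(A)), key=lambda i: A[i])  # exploitation
        payoff = true_payoffs[w] + rng.gauss(0.0, 0.1)  # noisy outcome
        A[w] += mu * (payoff - A[w])                    # reinforcement
    return A

A = epsilon_greedy_bandit([0.2, 0.5, 1.0])
```

With enough steps the estimates single out the best action while exploration keeps the others roughly calibrated.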

7 Attraction dynamics.
At each time step every agent i makes a choice (among a set of possible actions/choices/strategies ω = 1, ..., Ω).
Iterated game: at each time step t, agent i associates to each possible action ω a weight (an "attraction") A_i(ω, t), an estimate of <u_i(ω)>.
Choice of ω_i(t): p_i(ω_i(t) = ω) = f(A_i(ω, t)) / Σ_{ω'} f(A_i(ω', t)), with, e.g., f(x) = exp(βx) ("logit").

8 Reinforcement learning: basic idea.
[Figure: bar chart of the attractions A_i(ω, t) over the strategies (actions) ω_i = 1, 2, 3, 4; action ω = 3 is chosen at t and yields the payoff u_i(3, ω_{-i}(t)).]
A_i(ω, t) = agent i's attraction for action ω: the larger A_i(ω, t), the larger the probability for agent i to choose ω_i = ω.

9 Basic reinforcement learning.
If the payoffs u_i(ω, ω_{-i}(t)) are known for all ω = 1, 2, 3, 4: "fictitious play" (Cournot 1838; Brown 1951; Robinson 1951).
[Figure: attractions A_i(ω, t) over the strategies (actions) ω_i = 1, 2, 3, 4.]
The larger A_i(ω, t), the larger the probability for agent i to choose ω_i = ω.

10 Basic reinforcement learning.
[Figure: attractions A_i(ω, t+1) over the strategies (actions) ω_i = 1, 2, 3, 4.]
Renormalisation: uniform weakening of the attractions A_i.

11 Attraction dynamics.
At each time step every agent i makes a choice (among a set of possible actions/choices/strategies ω = 1, ..., Ω).
Choice rule: depends on the "attractions" (weights) {A_i(ω, t), ω = 1, ..., Ω}: the greater the attraction A_i(ω, t) for ω, the greater the probability p_i(ω, t) that i chooses ω at time t.
A_i(ω, t) ~ expectation/estimate of the payoff if ω_i = ω ~ opinion on the usefulness of ω.
Deterministic choice: ω_i(t) = ω_i^0(t) = argmax_ω A_i(ω, t).
Probabilistic choice ("trembling hand"): ω_i(t) = ω_i^0(t) with probability 1 - ε, any other ω with probability ε/(Ω - 1).
Or: p_i(ω, t) = f(A_i(ω, t)) / Z_i(t), with Z_i(t) = Σ_ω f(A_i(ω, t)); example: f(A) = exp(βA), the logit choice function.
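A minimal sketch of the logit choice rule, assuming f(A) = exp(βA); the attraction values below are arbitrary:

```python
import numpy as np

def choice_probabilities(attractions, beta):
    """Logit choice: p(w) = f(A(w)) / sum_w' f(A(w')), with f(x) = exp(beta * x)."""
    a = beta * np.asarray(attractions, dtype=float)
    a -= a.max()          # shift for numerical stability; probabilities unchanged
    e = np.exp(a)
    return e / e.sum()

p = choice_probabilities([1.0, 2.0, 0.5], beta=2.0)
```

β interpolates between pure exploration (β = 0, uniform choice) and the deterministic argmax rule (β → ∞).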

12 Adaptation of attractions.
Updating of attractions: a family of learning rules,
A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ Φ[π_i(ω, t), ω_i(t)],
where π_i(ω, t) is the payoff which would have been received at t if ω_i(t) = ω.
[The payoff at t depends on the actions/choices of the other agents at time t: π_i(ω, t) = π_i(ω_i = ω, {ω_j(t), j = 1, ..., N; j ≠ i}).]
"Fictitious play": A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ π_i(ω, t), for all ω = 1, ..., Ω.
"Weighted belief learning":
A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ π_i(ω, t) if ω_i(t) = ω,
A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ δ π_i(ω, t) otherwise, with 0 < δ < 1.
Fictitious play: δ = 1. Myopic best response: μ = 1, δ = 1, ε = 0.
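One step of the weighted belief learning update can be sketched as follows (a sketch in the slide's notation; foregone payoffs for unchosen actions are assumed observable, as in the fictitious-play setting):

```python
import numpy as np

def update_attractions(A, payoffs, chosen, mu, delta):
    """One step of weighted belief learning:
    A(w) <- (1-mu) A(w) + mu * pi(w)          if w is the chosen action,
    A(w) <- (1-mu) A(w) + mu * delta * pi(w)  otherwise (0 <= delta <= 1).
    delta = 1 recovers fictitious play; delta = 0 reinforces only the chosen action."""
    A = np.asarray(A, dtype=float)
    payoffs = np.asarray(payoffs, dtype=float)
    weight = np.full(A.shape, delta)
    weight[chosen] = 1.0
    return (1.0 - mu) * A + mu * weight * payoffs

A1 = update_attractions([0.0, 0.0], payoffs=[1.0, 1.0], chosen=0, mu=0.5, delta=0.5)
```

The chosen action is reinforced by the full payoff, the foregone one only by the factor δ.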

13 The Marseille fish market.
G. Weisbuch, A. Kirman, D. Herreiner (2000) (data: A. Vignes).
No posted prices (hence no fictitious play).
Observation: a mixture of loyal and non-loyal customers.
Take the point of view of a buyer i facing K sellers.
Strategies S_i of i = {k = go to seller number k (k = 1, ..., K)}.
Reinforcement: A_ik(t) = weight attributed by i to the strategy "visit seller k".
For k = S_i(t): A_ik(t+1) = (1 - μ) A_ik(t) + μ u_ik(t).
For k ≠ S_i(t): A_ik(t+1) = (1 - μ) A_ik(t).
p_i(S_i(t) = k, t) = f(A_ik(t)) / Σ_{k'=1,...,K} f(A_ik'(t)).

14 The Marseille fish market (continued).
A_ik(t+1) - A_ik(t) = -μ A_ik(t) + μ u_ik(t).
Hypothesis: convergence (stationary regime); each time i visits k: u_ik(t) = u_ik.
Mean-field-type approximation: u_ik(t) → <u_ik>, with <u_ik> = p_i(S_i = k) u_ik.
Fixed point: A_ik = <u_ik>, i.e. A_ik = u_ik exp(β A_ik) / Σ_{k'} exp(β A_ik').
Simplest case: K = 2 sellers, with u_i1 = u_i2 = u. Let Δ_i = A_i1 - A_i2.
Fixed points: Δ = u tanh(βΔ/2).
β < β_c = 2/u: Δ = 0 is stable, p_i(S_i = k) = 1/2.
β > β_c: Δ = 0 is unstable; Δ_+ and Δ_- are stable (0 < Δ_+ = -Δ_-); for each agent i the dynamics leads to Δ_i = Δ_+ or Δ_i = Δ_-.
If in addition the β's are heterogeneous: β_i < β_c gives "infidelity", β_i > β_c gives "fidelity".
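The fixed-point equation Δ = u tanh(βΔ/2) can be checked numerically by iterating the map from a positive initial condition (a sketch; the parameter values are arbitrary):

```python
import numpy as np

def stable_delta(u, beta, n_iter=2000):
    """Iterate Delta <- u * tanh(beta * Delta / 2) starting from Delta = u > 0;
    this converges to the stable fixed point on the positive side."""
    d = u
    for _ in range(n_iter):
        d = u * np.tanh(beta * d / 2.0)
    return float(d)

u = 1.0
beta_c = 2.0 / u
low = stable_delta(u, 0.5 * beta_c)    # below beta_c: Delta -> 0 (no loyalty)
high = stable_delta(u, 2.0 * beta_c)   # above beta_c: Delta -> Delta_+ > 0 (loyalty)
```

The bifurcation at β_c = 2/u separates the "infidelity" regime (both sellers visited with probability 1/2) from the "fidelity" regime (each buyer locks onto one seller).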

15 Laboratory experiments

16 Experiment: laboratory version of the Dying Seminar. Alexis GARAPIN, Bernard RUFFIEUX, Viktoriya SEMESHENKO, Mirta B. GORDON

17 Four different information treatments.
All treatments (A, B, C and D): for each individual, information on his individual threshold and payoffs.
Treatment A, "on line" (OL). Each individual is given the following additional information: all individuals' thresholds; all on-line individual decisions; ex post, the number of participants for each period of each seminar.
Treatment B, "(ex post) # participants" (NP). Additional information: all individuals' thresholds; ex post, the number of participants for each period of each seminar.
Treatment C, "(ex post) threshold reached" (H). Additional information: ex post, whether the individual threshold was reached or not.
Treatment D, "payoff" (P). No additional information (therefore, a subject who did not participate in a seminar does not know whether his threshold would have been reached).

18 Putting the Dying Seminar into the lab.
Seminars with N = 16 potential participants.
In every treatment, the payoffs are the same for an individual seminar (one period):
Individual endowment: 50 (with 100 = 1.45)
Non-participant: 50
Participant with threshold reached: 200
Participant with threshold not reached: 0

19 Seminar 1. [Figure: equilibria marked fragile, stable, stable.]

20 Seminar 1. [Figure]

21 Seminar 1. [Figure]

22 Seminar #2. Distribution of thresholds (IWP).
[Figure: density f1bis(Hi) and cumulatives F1bis(Hi), F1(Hi) versus the threshold Hi; equilibria marked stable, stable and fragile.]

23 Experiment: laboratory version of the Dying Seminar. Alexis GARAPIN, Bernard RUFFIEUX, Viktoriya SEMESHENKO, Mirta B. GORDON

24 Seminar #3. Distribution of thresholds (IWP).
[Figure: equilibria marked stable, unstable, stable.]

25 Seminar 3. [Figure]

26 Seminar #4.
[Figure: equilibria marked stable, stable, fragile.]

27 Seminar 4. [Figure]

28 Modeling.
Attraction dynamics ~ reinforcement learning: the Experience-Weighted Attraction (EWA) learning scheme (Camerer, 2003).
Numerical simulations.

29 Treatment A: simulations vs experiments.
[Figure: simulations and experiments side by side.]
Treatment A, "on line" (OL): for each individual, information on his threshold and payoffs; additionally: all individuals' thresholds, all on-line individual decisions, and, ex post, the number of participants for each period of each seminar.

30 Treatment B: experiments vs simulations.
[Figure: experiments and simulations side by side.]
Treatment B, "(ex post) # participants" (NP): for each individual, information on his threshold and payoffs; additionally: all individuals' thresholds and, ex post, the number of participants for each period of each seminar.

31 Treatment C: simulations vs experiments.
[Figure: simulations and experiments side by side.]
Treatment C, "(ex post) threshold reached" (H): for each individual, information on his threshold and payoffs; additionally, ex post: whether his individual threshold was reached or not.

32 Treatment D: experiments vs simulations.
[Figure: experiments and simulations side by side.]
Treatment D, "payoff" (P): for each individual, information on his threshold and payoffs; no additional information (therefore, a subject who did not participate in a seminar does not know whether his threshold would have been reached).

33 Perspectives.
(Many) more participants in large experiments: are the results homothetic?
What is a "large" group in coordination and critical-mass problems?
Models of (thinking and) learning.

34 (Not too) rational expectations.
"Beauty contest" (Keynes, 1936): vote for the most beautiful face; the winner is the contestant closest to the average choice; best strategy: vote for what you expect to be the choice of the majority.
"p-beauty contest": N players; every player must choose a number between 0 and 100; winner: the player whose number is closest to p = 2/3 of the mean.

35 p-beauty contest.
1-step thinker: assumes others make a random choice between 0 and 100, hence expected average = 50; their choice = 2/3 × 50 ≈ 33.
2-step thinker: assumes others are 1-step thinkers, hence expected average = 33; their choice = 2/3 × 33 ≈ 22. Or rather: assumes a mixture of zero-step and one-step thinkers; their choice ≈ 27.
Rational expectations (consistency of expectations): every player chooses the same number = 2/3 of the average, so average = 2/3 × average, hence average = 0.
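The level-k choices above follow a simple geometric rule, which can be written out (a sketch; `anchor` is the level-0 mean assumed on the slide):

```python
def level_k_choice(k, p=2.0 / 3.0, anchor=50.0):
    """Choice of a k-step thinker when level-0 play has mean `anchor`:
    each additional step of reasoning multiplies the expected mean by p."""
    return anchor * p ** k

choices = [round(level_k_choice(k)) for k in (1, 2, 3)]   # 33, 22, 15
```

Iterating k → ∞ drives the choice to 0, the rational-expectations value.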

36 p-beauty contest. [Figure from Camerer: distribution of choices]

37 (Not too) rational expectations: Cognitive Hierarchy (Camerer, Ho & Chong, 2002).
Thinking steps: a zero-step thinker behaves myopically; a k-step thinker anticipates k steps of reasoning and assumes the other players anticipate at most k - 1 steps.
f(k) = distribution of k-step thinkers in the population; τ = <k> = Σ_k k f(k) = mean number of thinking steps; empirical data: τ ≈ 1-2.
Simplest hypothesis: Poisson distribution, f(k) = (τ^k / k!) e^{-τ}.
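A quick check of the Poisson hypothesis (a sketch with τ = 1.5, inside the empirical range quoted on the slide; the truncation point is an arbitrary choice):

```python
from math import exp, factorial

tau = 1.5                                                      # mean number of thinking steps
f = [tau ** k * exp(-tau) / factorial(k) for k in range(12)]   # f(k), truncated at k = 11
mean_k = sum(k * fk for k, fk in enumerate(f))                 # recovers ~tau
```

For τ in the empirical range 1-2 the distribution is dominated by 0-, 1- and 2-step thinkers, consistent with the shallow reasoning observed in p-beauty contest data.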


39 Cognitive hierarchy, example: a 2-player game.
Strategy of a k-step thinker: expected payoff from choosing strategy ω:
E_k[π(ω)] = Σ_{ω'} π(ω, ω') Σ_{k'<k} g_k(k') P_{k'}(ω'),
where P_{k'}(ω') is the probability that a k'-step opponent chooses strategy ω', and
g_k(k') = fraction of k'-step thinkers among those with fewer than k steps = f(k') / Σ_{k''<k} f(k'').
Case of best response: P_k(ω) = 1 if E_k[π(ω)] = max_{ω'} E_k[π(ω')], and P_k(ω) = 0 otherwise.
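The best-response case can be sketched end to end for a symmetric 2-player game; the stag-hunt payoff matrix and τ = 1.5 below are illustration choices, not from the slides:

```python
from math import exp, factorial

def poisson(tau, k):
    return tau ** k * exp(-tau) / factorial(k)

def ch_choices(payoff, tau=1.5, kmax=6):
    """Cognitive-hierarchy best responses: payoff[w][w2] is the payoff of
    playing w against w2. A k-step thinker best-responds to the mixture of
    lower-step players, weighted by g_k(k') = f(k') / sum_{k''<k} f(k'')."""
    n = len(payoff)
    P = [[1.0 / n] * n]              # P[k][w]: prob. that a k-step thinker plays w
    choices = []
    for k in range(1, kmax + 1):
        norm = sum(poisson(tau, kp) for kp in range(k))
        g = [poisson(tau, kp) / norm for kp in range(k)]
        # opponent mixture seen by a k-step thinker
        q = [sum(g[kp] * P[kp][w] for kp in range(k)) for w in range(n)]
        E = [sum(payoff[w][w2] * q[w2] for w2 in range(n)) for w in range(n)]
        best = max(range(n), key=lambda w: E[w])
        choices.append(best)
        P.append([1.0 if w == best else 0.0 for w in range(n)])
    return choices

# stag hunt: strategy 0 = "stag" (4 if coordinated, else 0), strategy 1 = "hare" (3 always)
steps = ch_choices([[4.0, 0.0], [3.0, 3.0]])
```

In this example every thinker of level ≥ 1 plays the safe strategy, because the level-0 players in the mixture are too unpredictable to make "stag" worthwhile.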



More information

Statistiques en grande dimension

Statistiques en grande dimension Statistiques en grande dimension Christophe Giraud 1,2 et Tristan Mary-Huart 3,4 (1) Université Paris-Sud (2) Ecole Polytechnique (3) AgroParistech (4) INRA - Le Moulon M2 MathSV & Maths Aléa C. Giraud

More information

First Prev Next Last Go Back Full Screen Close Quit. Game Theory. Giorgio Fagiolo

First Prev Next Last Go Back Full Screen Close Quit. Game Theory. Giorgio Fagiolo Game Theory Giorgio Fagiolo giorgio.fagiolo@univr.it https://mail.sssup.it/ fagiolo/welcome.html Academic Year 2005-2006 University of Verona Summary 1. Why Game Theory? 2. Cooperative vs. Noncooperative

More information

Animal learning theory

Animal learning theory Animal learning theory Based on [Sutton and Barto, 1990, Dayan and Abbott, 2001] Bert Kappen [Sutton and Barto, 1990] Classical conditioning: - A conditioned stimulus (CS) and unconditioned stimulus (US)

More information

Partially observable Markov decision processes. Department of Computer Science, Czech Technical University in Prague

Partially observable Markov decision processes. Department of Computer Science, Czech Technical University in Prague Partially observable Markov decision processes Jiří Kléma Department of Computer Science, Czech Technical University in Prague https://cw.fel.cvut.cz/wiki/courses/b4b36zui/prednasky pagenda Previous lecture:

More information

Theory Field Examination Game Theory (209A) Jan Question 1 (duopoly games with imperfect information)

Theory Field Examination Game Theory (209A) Jan Question 1 (duopoly games with imperfect information) Theory Field Examination Game Theory (209A) Jan 200 Good luck!!! Question (duopoly games with imperfect information) Consider a duopoly game in which the inverse demand function is linear where it is positive

More information

A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games

A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games International Journal of Fuzzy Systems manuscript (will be inserted by the editor) A Residual Gradient Fuzzy Reinforcement Learning Algorithm for Differential Games Mostafa D Awheda Howard M Schwartz Received:

More information

Modeling Bounded Rationality of Agents During Interactions

Modeling Bounded Rationality of Agents During Interactions Interactive Decision Theory and Game Theory: Papers from the 2 AAAI Workshop (WS--3) Modeling Bounded Rationality of Agents During Interactions Qing Guo and Piotr Gmytrasiewicz Department of Computer Science

More information

On Equilibria of Distributed Message-Passing Games

On Equilibria of Distributed Message-Passing Games On Equilibria of Distributed Message-Passing Games Concetta Pilotto and K. Mani Chandy California Institute of Technology, Computer Science Department 1200 E. California Blvd. MC 256-80 Pasadena, US {pilotto,mani}@cs.caltech.edu

More information

Economics 209B Behavioral / Experimental Game Theory (Spring 2008) Lecture 3: Equilibrium refinements and selection

Economics 209B Behavioral / Experimental Game Theory (Spring 2008) Lecture 3: Equilibrium refinements and selection Economics 209B Behavioral / Experimental Game Theory (Spring 2008) Lecture 3: Equilibrium refinements and selection Theory cannot provide clear guesses about with equilibrium will occur in games with multiple

More information

6 Reinforcement Learning

6 Reinforcement Learning 6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,

More information

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 17th May 2018 Time: 09:45-11:45. Please answer all Questions.

Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE. Date: Thursday 17th May 2018 Time: 09:45-11:45. Please answer all Questions. COMP 34120 Two hours UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE AI and Games Date: Thursday 17th May 2018 Time: 09:45-11:45 Please answer all Questions. Use a SEPARATE answerbook for each SECTION

More information

Learning by Similarity-weighted Imitation in Winner-takes-all Games

Learning by Similarity-weighted Imitation in Winner-takes-all Games Learning by Similarity-weighted Imitation in Winner-takes-all Games Erik Mohlin Robert Östling Joseph Tao-yi Wang April 6, 2018 Abstract We study how a large population of players in the field learn to

More information

Spatial Economics and Potential Games

Spatial Economics and Potential Games Outline Spatial Economics and Potential Games Daisuke Oyama Graduate School of Economics, Hitotsubashi University Hitotsubashi Game Theory Workshop 2007 Session Potential Games March 4, 2007 Potential

More information

Economics 3012 Strategic Behavior Andy McLennan October 20, 2006

Economics 3012 Strategic Behavior Andy McLennan October 20, 2006 Economics 301 Strategic Behavior Andy McLennan October 0, 006 Lecture 11 Topics Problem Set 9 Extensive Games of Imperfect Information An Example General Description Strategies and Nash Equilibrium Beliefs

More information

A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation

A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation Hajime Fujita and Shin Ishii, Nara Institute of Science and Technology 8916 5 Takayama, Ikoma, 630 0192 JAPAN

More information

Temporal Difference. Learning KENNETH TRAN. Principal Research Engineer, MSR AI

Temporal Difference. Learning KENNETH TRAN. Principal Research Engineer, MSR AI Temporal Difference Learning KENNETH TRAN Principal Research Engineer, MSR AI Temporal Difference Learning Policy Evaluation Intro to model-free learning Monte Carlo Learning Temporal Difference Learning

More information

Emotions and the Design of Institutions

Emotions and the Design of Institutions Emotions and the Design of Institutions Burkhard C. Schipper Incomplete and preliminary: January 15, 2017 Abstract Darwin (1872) already observed that emotions may facilitate communication. Moreover, since

More information

Knowing What Others Know: Coordination Motives in Information Acquisition

Knowing What Others Know: Coordination Motives in Information Acquisition Knowing What Others Know: Coordination Motives in Information Acquisition Christian Hellwig and Laura Veldkamp UCLA and NYU Stern May 2006 1 Hellwig and Veldkamp Two types of information acquisition Passive

More information

Payments System Design Using Reinforcement Learning: A Progress Report

Payments System Design Using Reinforcement Learning: A Progress Report Payments System Design Using Reinforcement Learning: A Progress Report A. Desai 1 H. Du 1 R. Garratt 2 F. Rivadeneyra 1 1 Bank of Canada 2 University of California Santa Barbara 16th Payment and Settlement

More information

Contents Quantum-like Paradigm Classical (Kolmogorovian) and Quantum (Born) Probability

Contents Quantum-like Paradigm Classical (Kolmogorovian) and Quantum (Born) Probability 1 Quantum-like Paradigm... 1 1.1 Applications of Mathematical Apparatus of QM Outside ofphysics... 1 1.2 Irreducible Quantum Randomness, Copenhagen Interpretation... 2 1.3 Quantum Reductionism in Biology

More information

Paul Bourgine and Jean-Pierre Nadal. Cognitive Economics An Interdisciplinary Approach

Paul Bourgine and Jean-Pierre Nadal. Cognitive Economics An Interdisciplinary Approach Paul Bourgine and Jean-Pierre Nadal Cognitive Economics An Interdisciplinary Approach Springer Berlin Heidelberg New York Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo Preface

More information

Introduction to Game Theory

Introduction to Game Theory Introduction to Game Theory Part 2. Dynamic games of complete information Chapter 2. Two-stage games of complete but imperfect information Ciclo Profissional 2 o Semestre / 2011 Graduação em Ciências Econômicas

More information

Online Appendices for Large Matching Markets: Risk, Unraveling, and Conflation

Online Appendices for Large Matching Markets: Risk, Unraveling, and Conflation Online Appendices for Large Matching Markets: Risk, Unraveling, and Conflation Aaron L. Bodoh-Creed - Cornell University A Online Appendix: Strategic Convergence In section 4 we described the matching

More information

The Time Consistency Problem - Theory and Applications

The Time Consistency Problem - Theory and Applications The Time Consistency Problem - Theory and Applications Nils Adler and Jan Störger Seminar on Dynamic Fiscal Policy Dr. Alexander Ludwig November 30, 2006 Universität Mannheim Outline 1. Introduction 1.1

More information

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning

Today s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides

More information

Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms

Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca

More information

Introduction to Reinforcement Learning

Introduction to Reinforcement Learning Introduction to Reinforcement Learning Rémi Munos SequeL project: Sequential Learning http://researchers.lille.inria.fr/ munos/ INRIA Lille - Nord Europe Machine Learning Summer School, September 2011,

More information

Models of Strategic Reasoning Lecture 2

Models of Strategic Reasoning Lecture 2 Models of Strategic Reasoning Lecture 2 Eric Pacuit University of Maryland, College Park ai.stanford.edu/~epacuit August 7, 2012 Eric Pacuit: Models of Strategic Reasoning 1/30 Lecture 1: Introduction,

More information

Costly Social Learning and Rational Inattention

Costly Social Learning and Rational Inattention Costly Social Learning and Rational Inattention Srijita Ghosh Dept. of Economics, NYU September 19, 2016 Abstract We consider a rationally inattentive agent with Shannon s relative entropy cost function.

More information

Augmented Rescorla-Wagner and Maximum Likelihood Estimation.

Augmented Rescorla-Wagner and Maximum Likelihood Estimation. Augmented Rescorla-Wagner and Maximum Likelihood Estimation. Alan Yuille Department of Statistics University of California at Los Angeles Los Angeles, CA 90095 yuille@stat.ucla.edu In Advances in Neural

More information

David Silver, Google DeepMind

David Silver, Google DeepMind Tutorial: Deep Reinforcement Learning David Silver, Google DeepMind Outline Introduction to Deep Learning Introduction to Reinforcement Learning Value-Based Deep RL Policy-Based Deep RL Model-Based Deep

More information