Behavioral learning in a population of interacting agents Jean-Pierre Nadal
1 Behavioral learning in a population of interacting agents. Jean-Pierre Nadal, nadal@lps.ens.fr, Laboratoire de Physique Statistique, ENS, and Centre d'Analyse et de Mathématique Sociales, EHESS 1
2 Continental divide game (Van Huyck, Battalio & Cook, 1997); Nash equilibria 2
3 Continental divide 3
4 Continental divide (Camerer) 4
5 Modeling human and animal behaviour (experimental psychology); computational neuroscience (at the neuronal level): decision making based on expected reward/punishment, motor control; economics/game theory: behavioral game theory.
Some references. Behavioral learning: Bush, R. & Mosteller, F., Psychological Review; Rescorla, R.A. & Wagner, A.R. (1972), A theory of Pavlovian conditioning; Sutton and Barto, Reinforcement learning, 1984, 1988, 1990; book: Reinforcement Learning: An Introduction, The MIT Press, 1998 (free online). Behavioral game theory: Cross 1973; Arthur 1991; McAllister 1991; Walliser 1997; Camerer 1998. Dayan, P. & Daw, N.D. (2008), Decision theory, reinforcement learning, and the brain, Cognitive, Affective & Behavioral Neuroscience
6 Behavioral learning. For a set of possible actions, the utility/payoff/profit is not known in advance. Exploration: making choices whose possible outcomes are not (well) known. Learning: reinforcement of the actions which appear to be the most efficient, i.e. a higher probability to choose such actions in the future. Exploitation of acquired knowledge: past experience (possibly of others) allows one to form expectations about the outcomes of some actions/choices/strategies. Efficient learning requires a compromise between exploration and exploitation. Collective scale: learning in a population of interacting agents 6
7 Attraction dynamics. At each time step every agent i makes a choice (among a set of possible actions/choices/strategies ω = 1, …, Ω). Iterated game: at each time step t, agent i associates to each possible action ω a weight (an "attraction") A_i(ω, t) (an estimate of <u_i(ω)>). Choice of ω_i(t): p_i(ω_i(t) = ω) = f(A_i(ω, t)) / Σ_{ω'} f(A_i(ω', t)), with, e.g., f(x) = exp(βx) ("logit") 7
8 Reinforcement learning: basic idea. [Bar chart of A_i(ω, t) over the strategies (actions) ω_i = 1, 2, 3, 4; ω = 3 is chosen at t and yields payoff u_i(3, ω_{-i}(t)).] A_i(ω, t) = agent i's attraction for action ω: the larger A_i(ω, t), the larger the probability for agent i to choose ω_i = ω 8
9 Basic reinforcement learning. [Bar chart of A_i(ω, t) over the strategies (actions) ω_i = 1, 2, 3, 4.] If the payoffs u_i(ω, ω_{-i}(t)) are known for ω = 1, 2, 3, 4: "fictitious play". The larger A_i(ω, t), the larger the probability for agent i to choose ω_i = ω (Cournot 1838; Brown 1951; Robinson 1951) 9
10 Basic reinforcement learning. [Bar chart of A_i(ω, t+1) over the strategies (actions) ω_i = 1, 2, 3, 4.] Renormalisation: uniform weakening of the attractions A_i 10
11 Attraction dynamics. At each time step every agent i makes a choice (among a set of possible actions/choices/strategies ω = 1, …, Ω). Choice rule: depends on the "attractions" (weights) {A_i(ω, t), ω = 1, …, Ω}: the greater the attraction A_i(ω, t) for ω, the greater the probability p_i(ω, t) that i chooses ω at time t. A_i(ω, t) ~ expectation/estimate of the payoff if ω_i = ω; ~ opinion on the usefulness of ω.
Deterministic choice: ω_i(t) = ω_i^0(t) = argmax_ω A_i(ω, t).
Probabilistic choice ("trembling hand"): ω_i(t) = ω_i^0(t) with probability 1 - ε, any other ω with probability ε/(Ω - 1).
Or: p_i(ω, t) = f(A_i(ω, t)) / Z_i(t), with Z_i(t) = Σ_ω f(A_i(ω, t)); example: f(A) = exp(βA), the logit choice function 11
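The choice rules above can be sketched in a few lines of Python (a minimal illustration; the function names are mine, not from the slides):

```python
import math
import random

def logit_probs(attractions, beta):
    """Logit (softmax) choice: p(w) proportional to exp(beta * A(w)).
    The max attraction is subtracted for numerical stability."""
    m = max(attractions)
    weights = [math.exp(beta * (a - m)) for a in attractions]
    z = sum(weights)
    return [w / z for w in weights]

def trembling_hand_choice(attractions, eps, rng=random):
    """Deterministic argmax, except that with probability eps the agent
    picks uniformly among the other actions ('trembling hand')."""
    best = max(range(len(attractions)), key=lambda w: attractions[w])
    if rng.random() < eps:
        others = [w for w in range(len(attractions)) if w != best]
        return rng.choice(others)
    return best
```

With beta = 0 the logit rule is pure exploration (uniform choice); as beta grows it approaches the deterministic argmax, so beta tunes the exploration/exploitation compromise of slide 6.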
12 Adaptation of attractions. Updating of attractions: a family of learning rules. A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ Φ[π_i(ω, t), ω_i(t)], where π_i(ω, t) is the payoff which would have been received at t if ω_i(t) = ω. [The payoff at t depends on the actions/choices of the other agents at time t: π_i(ω, t) = π_i(ω_i = ω, {ω_j(t), j = 1, …, N; j ≠ i}).]
"Fictitious play": A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ π_i(ω, t), for all ω = 1, …, Ω.
"Weighted belief learning": A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ π_i(ω, t) if ω_i(t) = ω; A_i(ω, t+1) = (1 - μ) A_i(ω, t) + μ δ π_i(ω, t) otherwise, with 0 < δ < 1.
Fictitious play: δ = 1. Myopic best response: μ = 1, δ = 1, ε = 0 12
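A one-step sketch of this updating rule (a hypothetical helper, assuming the foregone payoffs π_i(ω, t) are observable, as in fictitious play):

```python
def update_attractions(A, payoffs, chosen, mu, delta):
    """Weighted belief learning:
    A(w) <- (1-mu)*A(w) + mu*payoff(w)        if w was the chosen action,
    A(w) <- (1-mu)*A(w) + mu*delta*payoff(w)  otherwise (0 < delta < 1).
    delta = 1 recovers fictitious play, where foregone payoffs are
    reinforced as strongly as the realised one."""
    return [(1 - mu) * a + mu * (1.0 if w == chosen else delta) * payoffs[w]
            for w, a in enumerate(A)]
```

For example, with mu = 0.5 and delta = 0.5, an unchosen action is reinforced half as strongly as the chosen one; with delta = 1 both actions are updated identically.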
13 The Marseille fish market. G. Weisbuch, A. Kirman, D. Herreiner (2000) (data: A. Vignes). No posted prices (hence no fictitious play). Observation: a mixture of loyal and disloyal customers. Take the point of view of a buyer i facing K sellers. Strategies of i: S_i = { k = go to seller number k (k = 1, …, K) }. Reinforcement: A_ik(t) = weight attributed by i to the strategy "visit seller k". For k = S_i(t): A_ik(t+1) = (1 - μ) A_ik(t) + μ u_ik(t). For k ≠ S_i(t): A_ik(t+1) = (1 - μ) A_ik(t). Choice: p_i(S_i(t) = k, t) = f(A_ik(t)) / Σ_{k'=1,…,K} f(A_ik'(t)) 13
14 The Marseille fish market (continued). A_ik(t+1) - A_ik(t) = -μ A_ik(t) + μ u_ik(t). Hypothesis: convergence (stationary regime), with u_ik(t) = u_ik each time i visits k. Mean-field-type approximation: u_ik(t) → <u_ik>, with <u_ik> = p_i(S_i = k) u_ik. Fixed point: A_ik = <u_ik>, i.e. A_ik = u_ik exp(β A_ik) / Σ_{k'} exp(β A_ik').
Simplest case: K = 2 sellers and u_i1 = u_i2 = u. Let Δ_i = A_i1 - A_i2. Fixed points: Δ = u tanh(βΔ/2). For β < β_c = 2/u: Δ = 0 is stable, p_i(S_i = k) = 1/2. For β > β_c: Δ = 0 is unstable, and Δ_+, Δ_- are stable (0 < Δ_+ = -Δ_-); for each agent i the dynamics leads to Δ_i = Δ_+ or Δ_i = Δ_-. If moreover the β_i are heterogeneous: β_i < β_c gives "disloyalty", β_i > β_c gives "loyalty" 14
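The fixed-point equation Δ = u tanh(βΔ/2) and the transition at β_c = 2/u can be checked numerically (a sketch; the parameter values are illustrative):

```python
import math

def fixed_point(beta, u, d0=1.0, n=500):
    """Iterate Delta <- u * tanh(beta * Delta / 2) to a stable fixed point."""
    d = d0
    for _ in range(n):
        d = u * math.tanh(beta * d / 2)
    return d

u = 1.0
beta_c = 2.0 / u
low = fixed_point(0.5 * beta_c, u)    # below beta_c: Delta -> 0 (no loyalty)
high = fixed_point(2.0 * beta_c, u)   # above beta_c: Delta -> Delta_+ > 0 (loyalty)
```

Starting the iteration from d0 = -1 instead of +1 converges to the symmetric root Δ_- = -Δ_+, which is how individual buyers end up loyal to one seller or the other.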
15 Laboratory experiments 15
16 Experiment: laboratory version of the Dying Seminar Alexis GARAPIN, Bernard RUFFIEUX, Viktoriya SEMESHENKO, Mirta B. GORDON 16
17 Four different information treatments. All treatments (A, B, C & D): for each individual, information on his own threshold and payoffs.
Treatment A, "On line" (OL): each individual is additionally given all individuals' thresholds; all on-line individual decisions; and, ex post, the number of participants for each period of each seminar.
Treatment B, "(ex post) # participants" (NP): each individual is additionally given all individuals' thresholds and, ex post, the number of participants for each period of each seminar.
Treatment C, "(ex post) threshold reached" (H): each individual is additionally told, ex post, whether his individual threshold was reached or not.
Treatment D, "Payoff" (P): no additional information (therefore, a subject who did not participate in a seminar does not know whether his threshold would have been reached). 17
18 Putting the Dying Seminar into the lab. Seminars with N = 16 potential participants. In all treatments, the payoffs are the same for an individual seminar (one period): individual endowment: 50 (with 100 = 1.45); non-participant: 50; participant with threshold reached: 200; participant with threshold not reached: 0
19 Seminar 1 [figure: equilibria labelled fragile / stable / stable]
20 Seminar 1
21 Seminar 1
22 Seminar #2. Distribution of thresholds (IWP) [figure: density f1bis(Hi) and cumulatives F1bis(Hi), F1(Hi) against Hi; equilibria labelled stable / stable / fragile] 22
23 Experiment: laboratory version of the Dying Seminar Alexis GARAPIN, Bernard RUFFIEUX, Viktoriya SEMESHENKO, Mirta B. GORDON 23
24 Seminar #3. Distribution of thresholds (IWP) [figure: equilibria labelled stable / unstable / stable] 24
25 Seminar 3 25
26 Seminar #4 [figure: equilibria labelled stable / stable / fragile] 26
27 Seminar 4 27
28 Modeling. Attraction dynamics ~ reinforcement learning: the Experience-Weighted Attraction (EWA) learning scheme (Camerer, 2003). Numerical simulations 28
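As an illustration of what such simulations involve (not the authors' EWA code: a simplified reinforcement scheme under treatment-D-like information, where each agent observes only his own payoff; the parameter values and function name are mine):

```python
import math
import random

def simulate_seminar(thresholds, beta=0.2, mu=0.3, periods=50, seed=0):
    """Agents repeatedly decide whether to attend a seminar.
    Each agent keeps attractions for [stay, go], chooses by a logit rule,
    and reinforces the chosen action with the realised payoff.
    Payoffs as in the experiment: non-participant 50; participant 200 if
    the number of participants reaches the agent's threshold, else 0."""
    rng = random.Random(seed)
    n = len(thresholds)
    A = [[50.0, 50.0] for _ in range(n)]   # attractions for [stay, go]
    attendance = []
    for _ in range(periods):
        go = []
        for i in range(n):
            p_go = 1.0 / (1.0 + math.exp(-beta * (A[i][1] - A[i][0])))
            go.append(rng.random() < p_go)
        k = sum(go)
        attendance.append(k)
        for i in range(n):
            payoff = (200.0 if k >= thresholds[i] else 0.0) if go[i] else 50.0
            a = 1 if go[i] else 0
            A[i][a] = (1 - mu) * A[i][a] + mu * payoff
    return attendance
```

With very low thresholds the population learns to attend (the seminar stabilises at full attendance); with unreachable thresholds participation is never rewarded and the seminar dies, which is the qualitative contrast the treatments probe.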
29 Treatment A: simulations vs. experiments. Treatment A ("On line"): for each individual, information on his threshold and payoffs; additionally, all individuals' thresholds, all on-line individual decisions, and, ex post, the number of participants for each period of each seminar 29
30 Treatment B: experiments vs. simulations. Treatment B ("(ex post) # participants", NP): for each individual, information on his threshold and payoffs; additionally, all individuals' thresholds and, ex post, the number of participants for each period of each seminar 30
31 Treatment C: simulations vs. experiments. Treatment C ("(ex post) threshold reached", H): for each individual, information on his threshold and payoffs; additionally, ex post, whether his individual threshold was reached or not 31
32 Treatment D: experiments vs. simulations. Treatment D ("Payoff", P): for each individual, information on his threshold and payoffs; no additional information (therefore, a subject who did not participate in a seminar does not know whether his threshold would have been reached) 32
33 Perspectives. (Many) more participants in large experiments: are the results homothetic? What is a "large" group in coordination and critical-mass problems? Models of (thinking and) learning 33
34 (Not too rational) expectations. The "beauty contest" (Keynes, 1936): vote for the most beautiful face; the winner is the voter closest to the average choice; the best strategy is to vote for what you expect to be the choice of the majority. The "p-beauty contest": N players; every player must choose a number between 0 and 100; the winner is the player whose number is closest to p = 2/3 of the mean 34
35 p-beauty contest. A 1-step thinker assumes the others make a random choice between 0 and 100, hence an expected average of 50; their choice = 2/3 × 50 ≈ 33. A 2-step thinker assumes the others are 1-step thinkers, hence an expected average of 33; their choice = 2/3 × 33 ≈ 22. Or rather: assumes a mixture of zero-step and one-step thinkers; their choice ≈ 27. Rational expectations (consistency of expectations): every player chooses the same number = 2/3 of the average, so average = 2/3 × average, hence average = 0 35
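The slide's iteration can be written out directly (a small sketch; `level0_mean = 50` encodes the assumption that zero-step players choose uniformly on [0, 100]):

```python
def level_k_choice(k, p=2/3, level0_mean=50.0):
    """Choice of a k-step thinker who assumes everyone else reasons
    with exactly k-1 steps: guess_k = p * guess_{k-1}, starting from
    the level-0 mean of 50. As k grows the guess shrinks towards 0,
    the rational-expectations answer."""
    guess = level0_mean
    for _ in range(k):
        guess *= p
    return guess
```

So level_k_choice(1) ≈ 33 and level_k_choice(2) ≈ 22, matching the slide, while large k reproduces the unique rational-expectations equilibrium at 0.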
36 p-beauty contest Camerer 36
37 (Not too rational) expectations: Cognitive Hierarchy (Camerer, Ho & Chong 2002). Thinking steps: zero-step thinkers show myopic behavior; a k-step thinker anticipates k steps of reasoning and assumes the other players anticipate at most k-1 steps. f(k) = distribution of k-step thinkers in the population; τ = mean number of steps of thinking = <k> = Σ_k k f(k); empirical data: τ ~ 1–2. Simplest hypothesis: Poisson distribution, f(k) = (τ^k / k!) e^{-τ} 37
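The Poisson hypothesis is easy to check numerically (truncating the support at a large kmax, which introduces only a negligible error):

```python
import math

def poisson_f(tau, kmax=60):
    """Distribution of thinking steps: f(k) = tau^k e^{-tau} / k!."""
    return [tau**k * math.exp(-tau) / math.factorial(k)
            for k in range(kmax + 1)]

f = poisson_f(tau=1.5)   # tau in the empirically observed 1-2 range
mean_steps = sum(k * fk for k, fk in enumerate(f))
```

A single parameter τ thus fixes the whole population mixture of thinking levels, which is what makes the model easy to fit to experimental data.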
39 Cognitive hierarchy, example: a 2-player game. Strategy of a k-step thinker: the expected payoff from choosing strategy ω is E_k[π(ω)] = Σ_{ω'} π(ω, ω') Σ_{k'<k} g_k(k') P_{k'}(ω'), where P_{k'}(ω') is the probability that a k'-step opponent chooses strategy ω', and g_k(k') = fraction of k'-step thinkers among those with k' < k = f(k') / Σ_{k''<k} f(k''). Case of best response: P_k(ω) = 1 if E_k[π(ω)] = max_{ω'} E_k[π(ω')], and 0 otherwise 39
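The expected-payoff formula can be sketched for a 2-player matrix game (a hypothetical helper; `payoff[w][w2]` is assumed to be the row player's payoff matrix):

```python
def ch_expected_payoffs(k, payoff, f, P):
    """Expected payoffs E_k[pi(w)] of a k-step thinker facing a mixture
    of lower-step opponents: g_k(k') = f(k') / sum_{k''<k} f(k''), and
    P[k'][w'] is the choice probability of a k'-step opponent."""
    z = sum(f[k2] for k2 in range(k))          # normalisation of g_k
    out = []
    for w in range(len(payoff)):
        e = 0.0
        for k2 in range(k):
            g = f[k2] / z
            for w2 in range(len(payoff[w])):
                e += payoff[w][w2] * g * P[k2][w2]
        out.append(e)
    return out
```

For instance, a 1-step thinker facing a uniform level-0 opponent in a 2x2 coordination game with payoffs [[2, 0], [0, 1]] expects 1.0 from the first strategy and 0.5 from the second, so his best response is the first.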
More informationA reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation
A reinforcement learning scheme for a multi-agent card game with Monte Carlo state estimation Hajime Fujita and Shin Ishii, Nara Institute of Science and Technology 8916 5 Takayama, Ikoma, 630 0192 JAPAN
More informationTemporal Difference. Learning KENNETH TRAN. Principal Research Engineer, MSR AI
Temporal Difference Learning KENNETH TRAN Principal Research Engineer, MSR AI Temporal Difference Learning Policy Evaluation Intro to model-free learning Monte Carlo Learning Temporal Difference Learning
More informationEmotions and the Design of Institutions
Emotions and the Design of Institutions Burkhard C. Schipper Incomplete and preliminary: January 15, 2017 Abstract Darwin (1872) already observed that emotions may facilitate communication. Moreover, since
More informationKnowing What Others Know: Coordination Motives in Information Acquisition
Knowing What Others Know: Coordination Motives in Information Acquisition Christian Hellwig and Laura Veldkamp UCLA and NYU Stern May 2006 1 Hellwig and Veldkamp Two types of information acquisition Passive
More informationPayments System Design Using Reinforcement Learning: A Progress Report
Payments System Design Using Reinforcement Learning: A Progress Report A. Desai 1 H. Du 1 R. Garratt 2 F. Rivadeneyra 1 1 Bank of Canada 2 University of California Santa Barbara 16th Payment and Settlement
More informationContents Quantum-like Paradigm Classical (Kolmogorovian) and Quantum (Born) Probability
1 Quantum-like Paradigm... 1 1.1 Applications of Mathematical Apparatus of QM Outside ofphysics... 1 1.2 Irreducible Quantum Randomness, Copenhagen Interpretation... 2 1.3 Quantum Reductionism in Biology
More informationPaul Bourgine and Jean-Pierre Nadal. Cognitive Economics An Interdisciplinary Approach
Paul Bourgine and Jean-Pierre Nadal Cognitive Economics An Interdisciplinary Approach Springer Berlin Heidelberg New York Barcelona Budapest Hong Kong London Milan Paris Santa Clara Singapore Tokyo Preface
More informationIntroduction to Game Theory
Introduction to Game Theory Part 2. Dynamic games of complete information Chapter 2. Two-stage games of complete but imperfect information Ciclo Profissional 2 o Semestre / 2011 Graduação em Ciências Econômicas
More informationOnline Appendices for Large Matching Markets: Risk, Unraveling, and Conflation
Online Appendices for Large Matching Markets: Risk, Unraveling, and Conflation Aaron L. Bodoh-Creed - Cornell University A Online Appendix: Strategic Convergence In section 4 we described the matching
More informationThe Time Consistency Problem - Theory and Applications
The Time Consistency Problem - Theory and Applications Nils Adler and Jan Störger Seminar on Dynamic Fiscal Policy Dr. Alexander Ludwig November 30, 2006 Universität Mannheim Outline 1. Introduction 1.1
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca
More informationIntroduction to Reinforcement Learning
Introduction to Reinforcement Learning Rémi Munos SequeL project: Sequential Learning http://researchers.lille.inria.fr/ munos/ INRIA Lille - Nord Europe Machine Learning Summer School, September 2011,
More informationModels of Strategic Reasoning Lecture 2
Models of Strategic Reasoning Lecture 2 Eric Pacuit University of Maryland, College Park ai.stanford.edu/~epacuit August 7, 2012 Eric Pacuit: Models of Strategic Reasoning 1/30 Lecture 1: Introduction,
More informationCostly Social Learning and Rational Inattention
Costly Social Learning and Rational Inattention Srijita Ghosh Dept. of Economics, NYU September 19, 2016 Abstract We consider a rationally inattentive agent with Shannon s relative entropy cost function.
More informationAugmented Rescorla-Wagner and Maximum Likelihood Estimation.
Augmented Rescorla-Wagner and Maximum Likelihood Estimation. Alan Yuille Department of Statistics University of California at Los Angeles Los Angeles, CA 90095 yuille@stat.ucla.edu In Advances in Neural
More informationDavid Silver, Google DeepMind
Tutorial: Deep Reinforcement Learning David Silver, Google DeepMind Outline Introduction to Deep Learning Introduction to Reinforcement Learning Value-Based Deep RL Policy-Based Deep RL Model-Based Deep
More information
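The attraction dynamics above can be sketched in a few lines of code. This is a minimal illustration, not code from the lecture: the payoff function, learning rate `lam`, and the exponential-moving-average update are assumptions chosen for the sketch; only the logit choice rule p(ω) = f(A(ω)) / Σ f(A(ω')) with f(x) = exp(βx) comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def choice_probabilities(attractions, beta):
    """Logit ("softmax") choice rule: p(w) = exp(beta*A(w)) / sum_w' exp(beta*A(w'))."""
    x = beta * attractions
    x = x - x.max()  # subtract max for numerical stability; probabilities unchanged
    e = np.exp(x)
    return e / e.sum()

def step(attractions, beta, payoff, lam=0.1):
    """One learning step for a single agent: draw an action from the logit
    rule, then reinforce its attraction toward the realized payoff
    (an exponential moving average, one possible estimate of <u(w)>)."""
    p = choice_probabilities(attractions, beta)
    w = rng.choice(len(attractions), p=p)
    u = payoff(w)  # realized payoff of the chosen action
    attractions[w] += lam * (u - attractions[w])
    return w, attractions

# Hypothetical example: 3 actions with mean payoffs 0, 0.5 and 1 plus noise.
# Reinforcement shifts the attractions toward the means, so the agent comes
# to prefer the highest-payoff action while beta controls exploration.
means = np.array([0.0, 0.5, 1.0])
A = np.zeros(3)
for _ in range(2000):
    _, A = step(A, beta=2.0,
                payoff=lambda w: means[w] + 0.1 * rng.standard_normal())
```

With a larger β the choice rule concentrates on the current best attraction (more exploitation); with β → 0 it approaches uniform random choice (pure exploration), which is the exploration/exploitation compromise discussed above.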