A Behaviouristic Model of Signalling and Moral Sentiments


A Behaviouristic Model of Signalling and Moral Sentiments
Johannes Zschache, University of Leipzig
Monte Verità, October 18, 2012

Outline: Introduction, Model, Parameter analysis, Conclusion

Introduction

The evolution of cooperative behaviour in one-time PD interaction, with pay-offs

        C   D
    C   3   0
    D   4   1

Robert Frank (1987):
- Homo Economicus might prefer a utility function with a conscience.
- Moral sentiments have evolved to counteract the temptation to cheat in one-time interactions.
- In combination with signals that are contingent upon the sentiments, compliant behaviour is stable.

Introduction

Given a signal S_j from actor j, the probability that j has moral sentiments (type H rather than type D) and the expected pay-off of cooperating are

    Pr(H | S_j) = h·f_H(S_j) / ( h·f_H(S_j) + (1 - h)·f_D(S_j) )

    E(π_H | S_j) = π(C, C)·Pr(H | S_j) + π(C, D)·(1 - Pr(H | S_j)),

where h is the population share of type-H actors and f_H, f_D are the signal densities of the two types. Interaction with j takes place if E(π_H | S_j) > π(E) = π(D, D).

Stringent assumptions:
- knowledge of the population structure
- correct interpretation of signals
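A small numerical sketch of this decision rule; all input values below (h, the signal densities, the outside option) are made up for illustration, only the pay-offs 3, 0 and 1 come from the PD matrix above.

    # Sketch of the signalling decision rule with hypothetical numbers.
    def pr_honest(h, f_H, f_D):
        """Posterior probability that the partner is of type H, given its signal."""
        return h * f_H / (h * f_H + (1 - h) * f_D)

    def expected_payoff(p_H, pi_CC=3.0, pi_CD=0.0):
        """Expected pay-off of cooperating, using the PD pay-offs from the intro slide."""
        return pi_CC * p_H + pi_CD * (1 - p_H)

    p = pr_honest(h=0.5, f_H=0.8, f_D=0.4)        # = 2/3
    outside_option = 1.0                          # π(E) = π(D, D)
    print(expected_payoff(p) > outside_option)    # True: interact with j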

Introduction

The alternative idea in Frank (1988):
1. Moral sentiments develop in stable relationships.
2. The matching law (Herrnstein, 1997) is the behavioural model.
3. Impulsiveness: the immediate reward often outweighs long-term benefits (chocolate cake during a diet, smoking, ...).
4. Moral sentiments make the actor prudent when
   a) choosing an action in the iterated PD,
   b) choosing to interact with a partner.
5. Moral sentiments develop by an evolutionary process.
6. The same moral sentiments affect interactions with strangers.

Model

1. Moral sentiments develop in stable relationships.

[Figure: a 10 x 10 grid of agents, each annotated with a value between 0 and 1.]

- 100 agents on a two-dimensional grid
- Moore neighbourhoods N
- recurrent interactions with neighbours
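A minimal sketch of how the Moore neighbourhood of a cell could be computed; the wrap-around (torus) boundary is an assumption, as the slides do not state how the grid edges are handled.

    # Moore neighbourhood (8 surrounding cells) on a 10 x 10 grid with wrap-around.
    def moore_neighbours(x, y, width=10, height=10):
        return [((x + dx) % width, (y + dy) % height)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if not (dx == 0 and dy == 0)]

    print(moore_neighbours(0, 0))   # the 8 neighbours of a corner cell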

Model

2. The matching law (Herrnstein, 1997) is the behavioural model.

Definition. Let A = {a_1, ..., a_m} be the set of all possible actions, and let T(a_i) denote the number of times action a_i was chosen during a specified time period. Furthermore, let U(a_i) = Σ_t u_t(a_i) be the sum of all reinforcements that were received after emitting action a_i during this period. The matching law holds if and only if

    T(a_i) / (T(a_1) + T(a_2) + ... + T(a_m)) = U(a_i) / (U(a_1) + U(a_2) + ... + U(a_m))

for all i ∈ {1, ..., m}.
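As a sketch, the condition can be checked directly from choice counts and accumulated reinforcements; the numbers below are made up and happen to satisfy the law exactly.

    # Check the matching law for hypothetical choice counts T and reinforcement sums U.
    def matching_law_holds(T, U, tol=1e-9):
        """True if relative choice frequencies equal relative reinforcement shares."""
        total_T, total_U = sum(T.values()), sum(U.values())
        return all(abs(T[a] / total_T - U[a] / total_U) < tol for a in T)

    T = {'C': 30, 'D': 70}            # how often each action was chosen
    U = {'C': 60.0, 'D': 140.0}       # reinforcement accumulated after each action
    print(matching_law_holds(T, U))   # True: 30/100 == 60/200 and 70/100 == 140/200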

Model

3. Impulsiveness: if the reinforcement u_t(a_i) is delayed by d_t(a_i), its value is discounted hyperbolically:

    V(a_i) = u_t(a_i) / (1 + I·d_t(a_i))

Exponential discounting, δ^{d(a_i)}·U(a_i) with δ ∈ [0, 1], is time-consistent: for all x ∈ R,

    δ^{d(a)}·U(a) > δ^{d(b)}·U(b)  ⇔  δ^{d(a)+x}·U(a) > δ^{d(b)+x}·U(b)

Hyperbolic discounting allows preference reversals. With I = 1.0 and a common delay of x = 100:

    10 / (1 + I·100) < 20 / (1 + I·102),  but  10 / (1 + I·0) > 20 / (1 + I·2)

Implication for the PD: the immediate value of the temptation pay-off overwhelms the player of an iterated prisoner's dilemma.
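The reversal can be reproduced directly; a quick sketch with the numbers from the slide (the value of δ for the exponential case is an arbitrary illustrative choice).

    # Hyperbolic vs. exponential discounting: the slide's preference-reversal example.
    def hyperbolic(u, d, I=1.0):
        return u / (1 + I * d)

    def exponential(u, d, delta=0.95):
        return delta ** d * u

    # smaller-sooner reward (10, delay 0) vs. larger-later reward (20, delay 2)
    for x in (0, 100):   # add a common delay x to both options
        print(x, hyperbolic(10, 0 + x) > hyperbolic(20, 2 + x),
              exponential(10, 0 + x) > exponential(20, 2 + x))
    # hyperbolic: True at x = 0 but False at x = 100 (preference reversal);
    # exponential: the ordering is the same for both values of x.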

Model

4. Moral sentiments help to overcome impulsiveness:

    V(a_i) = U(a_i) / (1 + I·d(a_i))

"Guilt is just such a feeling. [..] If it is felt strongly enough, it can negate the spurious attraction of the imminent material reward" (Frank, 1988, p. 82).

Model

a) Choosing an action in the iterated PD. One memory entry of length λ for each neighbour n ∈ N: (σ(n), π(n)) with σ(n) ∈ {C, D, E}^λ and π(n) ∈ R^λ. With bookkeeping β:

    V(n, a) = Σ_{j: σ(n)_j = a} Σ_{i=j}^{min(j+β, λ)} π(n)_i / (1 + I·(i - j))

Algorithm 1:
1: for all a ∈ {C, D} do
2:    calculate v(a) = ( Σ_{n ∈ N} V(n, a) ) / T(a)
3: end for
4: â ← the action with the highest v(a)
5: return â
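A sketch of the value computation and of Algorithm 1 in Python; the memory layout, tie-breaking, and the handling of actions that were never chosen are assumptions (the original source code is linked in the references).

    # Sketch of the bookkeeping value V(n, a) and of Algorithm 1 (action choice).
    def V(sigma, pi, action, beta, I):
        """Sum, over every memory position j where `action` was chosen, the
        reinforcements of the following beta entries, hyperbolically discounted."""
        lam = len(sigma)
        total = 0.0
        for j in range(lam):
            if sigma[j] == action:
                for i in range(j, min(j + beta, lam - 1) + 1):
                    total += pi[i] / (1 + I * (i - j))
        return total

    def choose_action(memory, beta, I):
        """memory: dict neighbour -> (sigma, pi); returns the action with highest v(a)."""
        def v(a):
            T_a = sum(s.count(a) for s, _ in memory.values()) or 1   # avoid division by zero
            return sum(V(s, p, a, beta, I) for s, p in memory.values()) / T_a
        return max(('C', 'D'), key=v)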

Model

b) Choosing to interact with a partner. The average value of a partner n is

    v(n) = ( Σ_{i=1}^{λ} π(n)_i ) / T(n)

Algorithm 2:
1: for all n ∈ N do
2:    calculate v(n)
3: end for
4: n̂ ← the neighbour with the highest v(n)
5: return n̂

Algorithms 1 and 2 are called melioration learning (Herrnstein, 1997); melioration learning is a process that leads to the matching law.
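A companion sketch of Algorithm 2 under the same assumptions; here T(n) is read as the number of actual interactions with n recorded in the memory, with 'E' taken to mark entries without an interaction (an assumption).

    # Sketch of Algorithm 2: pick the neighbour with the highest average reinforcement.
    def choose_partner(memory):
        """memory: dict neighbour -> (sigma, pi); v(n) = sum(pi(n)) / T(n)."""
        def v(n):
            sigma, pi = memory[n]
            T_n = sum(1 for s in sigma if s != 'E') or 1   # interactions recorded for n
            return sum(pi) / T_n
        return max(memory, key=v)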

Model

5. Moral sentiments develop in an evolutionary process.
- The impulsiveness I represents the impact of an actor's moral sentiments; the evolution of moral sentiments is the evolution of I.
- An agent's fitness is the average of the reinforcements received during one generation (1000 interactions).
- After one generation, new agents are bred: a parent is chosen with a probability directly proportional to the parent's fitness and passes on its impulsiveness value to the new agent, subject to random noise (p_mut = 0.1).
- Probability of experimenting ε: choose a random neighbour as partner and a random action.
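A rough sketch of the reproduction step; fitness-proportional selection and the mutation probability come from the slide, while the form and scale of the mutation noise and the clamping to [0, 1] are assumptions.

    import random

    # Breed a new generation of impulsiveness values by fitness-proportional selection.
    def next_generation(impulsiveness, fitness, p_mut=0.1, noise=0.05):
        """impulsiveness, fitness: lists with one entry per agent."""
        n = len(impulsiveness)
        children = random.choices(impulsiveness, weights=fitness, k=n)
        # with probability p_mut, perturb the inherited value (noise model is a guess)
        return [min(1.0, max(0.0, c + random.gauss(0, noise)))
                if random.random() < p_mut else c
                for c in children]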

Parameter analysis: experimenting & memory length

One memory entry of length λ for each neighbour n ∈ N: (σ, π), σ ∈ {C, D, E}^λ, π ∈ R^λ.

[Figure: average impulsiveness, cooperation, and interaction over 1000 generations, for experimenting ε ∈ {0.05, 0.10} and memory length λ ∈ {25, 50, 100} (bookkeeping β = 10).]

Parameter analysis: bookkeeping, i.e. accounting for the future

    V(n, a) = Σ_{j: σ(n)_j = a} Σ_{i=j}^{min(j+β, λ)} π(n)_i / (1 + I·(i - j))

[Figure: average impulsiveness, cooperation, and interaction over 1000 generations, for bookkeeping β ∈ {1, 5, 10} (experimenting ε = 0.1, memory length λ = 100).]

Model: Interactions with Strangers

6. The same moral sentiments affect interactions with strangers.
- A certain percentage φ of interactions takes place with strangers (actors who are met only once).
- There is no memory of past interactions with a stranger, but there might be a signal that is contingent on the existence of moral sentiments.
- A signal s is a number between 0 and 9 indicating the actor's impulsiveness.
- There is one memory entry for each signal strength; the average value of a signal s is

      v(s) = ( Σ_{i=1}^{λ} π(s)_i ) / T(s)

- Actors can choose not to interact with a stranger.
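A minimal sketch of the stranger decision: keep one memory entry per signal value and decline the interaction when the signal's average value falls below a cutoff; the cutoff value and the behaviour for signals never seen before are assumptions.

    # Decide whether to interact with a stranger, based only on its signal (0-9).
    def accept_stranger(signal, signal_memory, cutoff=0.0):
        """signal_memory: dict signal -> (list of past pay-offs π(s), count T(s))."""
        pi_s, T_s = signal_memory.get(signal, ([], 0))
        if T_s == 0:
            return True                 # no experience with this signal yet (assumption)
        v_s = sum(pi_s) / T_s           # v(s) = Σ_i π(s)_i / T(s)
        return v_s > cutoff             # otherwise refuse the interaction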

Parameter analysis: Interactions with Strangers

[Figure: average impulsiveness over 1000 generations, without and with signals, for stranger shares φ ∈ {0.2, 0.4, 0.6, 0.8} (experimenting ε = 0.1, memory length λ = 100, bookkeeping β = 10).]

Parameter analysis: Interactions with Strangers

[Figure: impulsiveness, cooperation with partners, and cooperation with strangers as a function of the stranger share (0.2 to 1.0), without and with signals (generation > 900, experimenting ε = 0.1, memory length λ = 100, bookkeeping β = 10).]

Conclusion

- Frank (1987): a formal model of signalling and moral sentiments that lead to the evolution of cooperation in the one-time PD, but under heavy assumptions.
- Frank (1988): informal ideas about the development of moral sentiments, based on a behaviouristic view of human behaviour (the matching law).
- Sentiments emerge because of their effect of repressing impulsiveness, which leads to cooperation among friends.
- In the case of one-time interactions, signals are needed to support the development of moral sentiments and cooperation among strangers.

References

Frank, R. H. (1987). If Homo Economicus could choose his own utility function, would he want one with a conscience? The American Economic Review 77(4), 593-604.
Frank, R. H. (1988). Passions within Reason: The Strategic Role of the Emotions. W. W. Norton & Company.
Herrnstein, R. J. (1997). The Matching Law: Papers in Psychology and Economics. Harvard University Press.

Source code: http://code.google.com/p/signalling-emotions

Supplementary

- Each of the simulations was performed for a fixed set of parameter values, with several repetitions that differ in their random seeds.
- Since the simulations can be represented as time-homogeneous stochastic Markov chains, they tend toward a unique stationary state distribution.
- We use the average level of I and the rates of interaction and cooperation as summary statistics to describe this unique distribution.
- Statistical tests check whether a summary statistic has reached the stationary state.
- Additionally: check for the most promising conditions that lead to the evolution of moral sentiments.

Supplementary: Convergence Statistics

Gelman, A. and D. B. Rubin (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7(4), 457-472.
Brooks, S. P. and A. Gelman (1998). General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7(4), 434-455.

We generate m > 1 chains of a simulation with n time steps each: (x_11, x_12, ..., x_1n), (x_21, x_22, ..., x_2n), ..., (x_m1, x_m2, ..., x_mn).

    R̂_I = (length of the pooled-chains interval) / (mean length of the within-chain intervals)

    R̂_s = [ 1/(mn-1) · Σ_{j=1}^{m} Σ_{i=1}^{n} |x_ji - x̄|^s ] / [ 1/(m(n-1)) · Σ_{j=1}^{m} Σ_{i=1}^{n} |x_ji - x̄_j|^s ],

where x̄_j is the mean of chain j and x̄ the mean over all chains.

Iterated graphical approach: compute the statistics on sub-chains (x_j1, ..., x_j(2kb)), with b being a batch length and k = 1, ..., n/b.
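A sketch of the moment-based statistic R̂_s as written above (s = 2 corresponds to the usual variance ratio); the toy chains are made up.

    # R̂_s: pooled s-th absolute central moment over the within-chain moment, for m chains.
    def r_hat_s(chains, s=2):
        """chains: list of m lists, each holding n values of the summary statistic."""
        m, n = len(chains), len(chains[0])
        pooled_mean = sum(sum(c) for c in chains) / (m * n)
        pooled = sum(abs(x - pooled_mean) ** s for c in chains for x in c) / (m * n - 1)
        within = sum(abs(x - sum(c) / n) ** s for c in chains for x in c) / (m * (n - 1))
        return pooled / within

    # close to 1 when all chains sample the same distribution, larger otherwise
    print(r_hat_s([[0.1, 0.2, 0.3, 0.2], [0.2, 0.1, 0.3, 0.3], [0.3, 0.2, 0.1, 0.2]]))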

Supplementary: Example

[Figure: average impulsiveness over 1000 generations and the convergence statistics R̂_c, R̂_l, and R̂_3 over 50 iterations, for 100, 250, 500, 750, and 1000 ticks per generation.]

Values reported in the panels:

ticks per generation   R̂_c    R̂_l    R̂_3
100                    1.91    2.12    1.59
250                    1.18    1.35    1.14
500                    1.73    1.73    1.42
750                    1.34    1.36    1.21
1000                   1.09    1.06    1.04

Supplementary: The Wald-Wolfowitz Test

Wald, A. and J. Wolfowitz (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics 11(2), 147-162.
Grazzini, J. (2012). Analysis of the emergent properties: Stationarity and ergodicity. Journal of Artificial Societies and Social Simulation 15(2), 7.

- The test checks whether two samples X and Y, with n and m observations respectively, are from the same population, i.e. whether the distributions of the two samples are identical.
- The two samples are pooled and arranged in ascending order as Z = (z_1, z_2, ..., z_{n+m}), where z_1 < z_2 < ... < z_{n+m}.
- A sequence V of zeros and ones is formed by replacing each element z_i of Z by 0 if z_i is an element of X and by 1 if z_i is an element of Y.
- The statistic U(X, Y) of two samples X and Y is the number of runs in the corresponding V sequence.
- Example: given X = {5, 2.2, 4.5, 1} and Y = {2, 4.3, 2.5, 1.4, 3}, V = (0, 1, 1, 0, 1, 1, 1, 0, 0) and U(X, Y) = 5.
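A short sketch that reproduces the runs count U(X, Y) for the example above.

    # Number of runs U(X, Y) in the pooled, sorted membership sequence.
    def runs_statistic(X, Y):
        pooled = sorted([(z, 0) for z in X] + [(z, 1) for z in Y])   # 0 = from X, 1 = from Y
        labels = [label for _, label in pooled]
        return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

    X = [5, 2.2, 4.5, 1]
    Y = [2, 4.3, 2.5, 1.4, 3]
    print(runs_statistic(X, Y))   # 5, as in the example above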

Supplementary: Example

- Stationarity: check whether sections of the mean distribution belong to the same distribution.
- Ergodicity: check whether the m chains of a simulation belong to the same distribution.

Runs statistics u and p-values p of the stationarity (s) and ergodicity (e) tests, for different ticks per generation (tpg) and batch lengths b:

                b = 10                              b = 20
tpg     u_s   p_s        u_e   p_e        u_s   p_s        u_e   p_e
100     27    0.000000   33    0.004850   40    0.000000   57    0.000000
250     31    0.000640   42    0.700180   57    0.000000   74    0.052510
500     23    0.000000   37    0.101700   46    0.000000   65    0.000110
750     34    0.011770   42    0.700180   66    0.000270   76    0.126710
1000    39    0.281090   40    0.412330   67    0.000600   78    0.256780