A Behaviouristic Model of Signalling and Moral Sentiments


1 A Behaviouristic Model of Signalling and Moral Sentiments
Johannes Zschache, University of Leipzig
Monte Verità, October 18, 2012

2 Outline
- Introduction
- Model
- Parameter analysis
- Conclusion

3 Introduction

The evolution of cooperative behaviour in one-time PD interactions:

        C   D
  C     3   0
  D     4   1

Robert Frank (1987): Homo economicus might prefer a utility function with a conscience.
- Moral sentiments have evolved to counteract the temptation to cheat in one-time interactions.
- In combination with signals that are contingent upon the sentiments, compliant behaviour is stable.

4 Introduction

Frank's (1987) signalling model: given agent j's signal S_j, a share h of honest types H, and signal densities f_H and f_D of the honest and dishonest types,

  Pr(H | S_j) = h f_H(S_j) / ( h f_H(S_j) + (1 − h) f_D(S_j) )

  E(π_H | S_j) = π(C, C) · Pr(H | S_j) + π(C, D) · (1 − Pr(H | S_j))

Interact with j only if E(π_H | S_j) > π(E) = π(D, D), where E denotes not interacting.

Stringent assumptions:
- knowledge of the population structure
- correct interpretation of signals

5 Introduction

An alternative idea in Frank (1988):
1. Moral sentiments develop in stable relationships.
2. The matching law (Herrnstein, 1997) serves as the behavioural model.
3. Impulsiveness: the immediate reward often outweighs long-term benefits (chocolate cake during a diet, smoking, ...).
4. Moral sentiments make the actor prudent when:
   a) choosing an action in the iterated PD
   b) choosing to interact with a partner
5. Moral sentiments develop by an evolutionary process.
6. The same moral sentiments affect interactions with strangers.

6 Model

1. Moral sentiments develop in stable relationships:
- agents on a two-dimensional grid
- Moore neighbourhoods N
- recurrent interactions with neighbours

7 Model

2. The matching law (Herrnstein, 1997) as the behavioural model.

Definition. Let A = {a_1, ..., a_m} be the set of all possible actions, and let T(a_i) denote the number of times action a_i was chosen during a specified time period. Furthermore, let U(a_i) = Σ_t u_t(a_i) be the sum of all reinforcements received after emitting action a_i during this period. The matching law holds if and only if

  T(a_i) / (T(a_1) + T(a_2) + ... + T(a_m)) = U(a_i) / (U(a_1) + U(a_2) + ... + U(a_m))

for all i ∈ {1, ..., m}.
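To make the definition concrete, here is a minimal Python sketch (not from the talk's source code) that checks whether recorded choice counts T and reinforcement totals U satisfy the matching law:

```python
import numpy as np

def matching_law_holds(T, U, tol=1e-9):
    """Check: T(a_i)/sum_k T(a_k) == U(a_i)/sum_k U(a_k) for all i.

    T[i]: number of times action a_i was chosen in the period.
    U[i]: total reinforcement received after emitting a_i in the period.
    """
    T = np.asarray(T, dtype=float)
    U = np.asarray(U, dtype=float)
    return np.allclose(T / T.sum(), U / U.sum(), atol=tol)

# Example: choice shares 0.6/0.3/0.1 match reinforcement shares 0.6/0.3/0.1.
print(matching_law_holds([60, 30, 10], [600, 300, 100]))  # True
```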

8 Model

3. Impulsiveness: if the reinforcement u_t(a_i) is delayed by d_t(a_i), its present value is

  V(a_i) = u_t(a_i) / (1 + I · d_t(a_i))

Exponential discounting, δ^{d(a_i)} · U(a_i) with δ ∈ [0, 1], is time-consistent: for all x ∈ ℝ,

  δ^{d(a)} U(a) > δ^{d(b)} U(b)  ⟺  δ^{d(a)+x} U(a) > δ^{d(b)+x} U(b).

Hyperbolic discounting is not: with I = 1.0 and a common delay shift of x = 100, an option preferred at delays 0 vs. 2 can be dispreferred at delays 100 vs. 102 (preference reversal).

Implication for the PD: the immediate value of the temptation pay-off overwhelms the player of an iterated prisoner's dilemma.
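A small numeric sketch of the preference reversal; the reward sizes (50 now vs. 100 two steps later) are illustrative assumptions, not values from the slides:

```python
def hyperbolic(u, d, I=1.0):
    """Present value of a reward u delayed by d steps: V = u / (1 + I*d)."""
    return u / (1.0 + I * d)

small_soon, large_late = 50.0, 100.0  # assumed reward sizes

# At delays 0 vs. 2 the small immediate reward wins ...
print(hyperbolic(small_soon, 0) > hyperbolic(large_late, 2))      # True  (50.0 > 33.3)

# ... but after shifting both delays by x = 100 the preference reverses.
print(hyperbolic(small_soon, 100) > hyperbolic(large_late, 102))  # False (0.50 < 0.97)
```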

9 Model

4. Moral sentiments help to overcome impulsiveness:

  V(a_i) = U(a_i) / (1 + I · d(a_i))

"Guilt is just such a feeling. [..] If it is felt strongly enough, it can negate the spurious attraction of the imminent material reward" (Frank, 1988, p. 82).

10 Model

a) Choosing an action in the iterated PD.

One memory entry of length λ for each neighbour n ∈ N: (σ(n), π(n)) with σ(n) ∈ {C, D, E}^λ and π(n) ∈ ℝ^λ.

Bookkeeping β:

  V(n, a) = Σ_{j: σ(n)_j = a}  Σ_{i=j}^{min(j+β, λ)}  π(n)_i / (1 + I · (i − j))

Algorithm 1:
1: for all a ∈ {C, D} do
2:   calculate v(a) = Σ_{n ∈ N} V(n, a) / T(a)
3: end for
4: â ← the action with the highest v(a)
5: return â
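A minimal Python sketch of the bookkeeping value V(n, a) and Algorithm 1, assuming per-neighbour memories sigma[n] (sequences over {'C', 'D', 'E'}) and pi[n] (sequences of pay-offs); the data layout is an assumption, not the talk's source code:

```python
def V(sigma_n, pi_n, a, beta, I):
    """V(n, a): for every position j where a was emitted, add the pay-offs
    of entries j..min(j+beta, lambda), hyperbolically discounted by the lag."""
    lam = len(sigma_n)
    total = 0.0
    for j, action in enumerate(sigma_n):
        if action == a:
            for i in range(j, min(j + beta + 1, lam)):
                total += pi_n[i] / (1.0 + I * (i - j))
    return total

def choose_action(sigma, pi, beta=10, I=1.0):
    """Algorithm 1: return the action with the highest average value v(a)."""
    values = {}
    for a in ('C', 'D'):
        T_a = sum(s.count(a) for s in sigma.values())  # times a was chosen
        if T_a > 0:
            values[a] = sum(V(sigma[n], pi[n], a, beta, I) for n in sigma) / T_a
    if not values:          # no history yet: fall back to a default action
        return 'C'
    return max(values, key=values.get)
```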

11 Model

b) Choosing to interact with a partner.

Average value of a partner n:

  v(n) = Σ_{i=1}^{λ} π(n)_i / T(n)

Algorithm 2:
1: for all n ∈ N do
2:   calculate v(n)
3: end for
4: n̂ ← the neighbour with the highest v(n)
5: return n̂

Algorithms 1 and 2 are called melioration learning (Herrnstein, 1997); melioration learning is a process that leads to the matching law.
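Algorithm 2 in the same sketch style, assuming pi[n] holds the remembered pay-offs from neighbour n and T[n] the number of interactions with n:

```python
def choose_partner(pi, T):
    """Algorithm 2: return the neighbour with the highest average value
    v(n) = (sum of remembered pay-offs from n) / T(n)."""
    return max(pi, key=lambda n: sum(pi[n]) / T[n] if T[n] > 0 else 0.0)
```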

12 Model

5. Moral sentiments develop in an evolutionary process:
- The impulsiveness I represents the impact of an actor's moral sentiments, so the evolution of moral sentiments is modelled as the evolution of I.
- An agent's fitness is the average of the reinforcements received during one generation (1000 interactions).
- After one generation, new agents are bred: a parent is chosen with a probability directly proportional to the parent's fitness and passes its impulsiveness value on to a new agent, subject to random noise with p_mut = 0.1 (see the sketch below).
- Probability of experimenting ε: choose a random neighbour as partner and a random action.
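A minimal sketch of the reproduction step, assuming roulette-wheel parent selection; the slides give p_mut = 0.1 but not the noise distribution, so the Gaussian perturbation is an assumption:

```python
import random

def breed(parent_I, fitness, p_mut=0.1, noise_sd=0.1):
    """Create the next generation of impulsiveness values I.

    parent_I[k]: impulsiveness of parent k.
    fitness[k]:  average reinforcement of parent k over the generation.
    """
    children = []
    for _ in range(len(parent_I)):
        # parent chosen with probability proportional to fitness
        I = random.choices(parent_I, weights=fitness, k=1)[0]
        if random.random() < p_mut:                 # random noise on the trait
            I = max(0.0, I + random.gauss(0.0, noise_sd))
        children.append(I)
    return children
```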

13 Parameter analysis: experimenting & memory length

One memory entry of length λ for each neighbour n ∈ N: (σ, π), σ ∈ {C, D, E}^λ, π ∈ ℝ^λ.

[Figure: average impulsiveness, cooperation and interaction over generations for several combinations of experimenting ε and memory length λ (bookkeeping β = 10).]

14 Parameter analysis: bookkeeping (accounting for the future)

  V(n, a) = Σ_{j: σ(n)_j = a}  Σ_{i=j}^{min(j+β, λ)}  π(n)_i / (1 + I · (i − j))

[Figure: average impulsiveness, cooperation and interaction over generations for several bookkeeping values β (experimenting ε = 0.1, memory length λ = 100).]

15 Model: Interactions with Strangers

6. The same moral sentiments affect interactions with strangers:
- A certain percentage φ of interactions takes place with strangers (actors who are met only once).
- There is no memory of past interactions with a stranger, but there might be a signal that is contingent on the existence of moral sentiments.
- A signal s is a number between 0 and 9 indicating the actor's impulsiveness.
- One memory entry for each signal strength; average value of a signal:

  v(s) = Σ_{i=1}^{λ} π(s)_i / T(s)

- Actors can choose not to interact with a stranger (see the sketch below).
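A sketch of the stranger decision, assuming a memory pi_s of past pay-offs per signal strength s and treating the value of not interacting as 0 (the outside option is an assumption, not stated on the slides):

```python
def stranger_value(pi_s, T_s):
    """Average value v(s) of interacting with strangers that showed signal s."""
    return sum(pi_s) / T_s if T_s > 0 else 0.0

def accept_stranger(pi_s, T_s, outside_option=0.0):
    """Interact only if the learned value of the signal beats not interacting."""
    return stranger_value(pi_s, T_s) > outside_option
```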

16 Parameter analysis: Interactions with Strangers

[Figure: average impulsiveness over generations, without and with signals, for stranger shares φ ∈ {0.2, 0.4, 0.6, 0.8} (experimenting ε = 0.1, memory length λ = 100, bookkeeping β = 10).]

17 Parameter analysis: Interactions with Strangers

[Figure: three panels over the share of strangers φ, without and with signals: impulsiveness, cooperation with partners, and cooperation with strangers (generation > 900, experimenting ε = 0.1, memory length λ = 100, bookkeeping β = 10).]

18 Conclusion

- Frank (1987): a formal model of signalling and moral sentiments that leads to the evolution of cooperation in the one-time PD, but relies on heavy assumptions.
- Frank (1988): informal ideas about the development of moral sentiments, based on a behaviouristic view of human behaviour (the matching law).
- Sentiments emerge because of their effect of repressing impulsiveness, which leads to cooperation among friends.
- In the case of one-time interactions, signals are needed to support the development of moral sentiments and cooperation among strangers.

19 References

Frank, R. H. (1987). If Homo Economicus could choose his own utility function, would he want one with a conscience? The American Economic Review 77(4).
Frank, R. H. (1988). Passions within Reason: The Strategic Role of the Emotions. W. W. Norton & Company.
Herrnstein, R. J. (1997). The Matching Law: Papers in Psychology and Economics. Harvard University Press.

source code:

20 Supplementary

- Each of the simulations was performed for a fixed set of parameter values, with several repetitions that differ in their random seeds.
- Since the simulations can be represented as stochastic, time-homogeneous Markov chains, they tend toward a unique stationary state distribution.
- We use the average level of I and the rates of interaction and cooperation as summary statistics to describe this unique distribution.
- Statistical tests check whether the summary statistics describe the stationary state.
- Additionally: check for the most promising conditions that lead to the evolution of moral sentiments.

21 Supplementary: Convergence Statistics

Gelman, A. and D. B. Rubin (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7(4).
Brooks, S. P. and A. Gelman (1998). General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7(4).

We generate m ≥ 1 chains of a simulation with n time steps each:
(x_11, x_12, ..., x_1n), (x_21, x_22, ..., x_2n), ..., (x_m1, x_m2, ..., x_mn).

Interval-based statistic:

  R̂_I = length of the pooled-chains interval / mean length of the within-chain intervals

Spread-based statistic R̂_s = s / s̄, with

  s = 1/(mn − 1) · Σ_{j=1}^{m} Σ_{i=1}^{n} |x_ji − x̄|    and    s̄ = 1/(m(n − 1)) · Σ_{j=1}^{m} Σ_{i=1}^{n} |x_ji − x̄_j|.

Iterated graphical approach: compute the statistics on sub-chains (x_j1, ..., x_j(2kb)), with b being a batch length and k = 1, ..., n/(2b); a sketch of R̂_s follows below.
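A sketch of the spread-based statistic R̂_s under the definition above, assuming chains is an m × n array (m chains, n time steps):

```python
import numpy as np

def r_hat_s(chains):
    """R_hat_s = pooled mean absolute deviation / within-chain deviation."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    s = np.abs(chains - chains.mean()).sum() / (m * n - 1)             # pooled
    s_bar = (np.abs(chains - chains.mean(axis=1, keepdims=True)).sum()
             / (m * (n - 1)))                                          # within
    return s / s_bar

# Two well-mixed chains should give a value close to 1.
rng = np.random.default_rng(1)
print(r_hat_s(rng.normal(size=(2, 1000))))
```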

22 Supplementary: Example

[Figure: average impulsiveness and ticks per generation plotted over generations, together with the iterated convergence statistics R̂_c and R̂_l recomputed at successive iterations (e.g. R̂_l falling from 2.12 to 1.06); both statistics approach 1 as the number of iterations grows.]

23 Supplementary: The Wald-Wolfowitz Test

Wald, A. and J. Wolfowitz (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics 11(2).
Grazzini, J. (2012). Analysis of the emergent properties: Stationarity and ergodicity. Journal of Artificial Societies and Social Simulation 15(2), 7.

- Check whether two samples X and Y, with n and m observations respectively, are from the same population, i.e. whether the distributions of the two samples are identical.
- The two samples are pooled and arranged in ascending order as Z = (z_1, z_2, ..., z_{n+m}), where z_1 < z_2 < ... < z_{n+m}.
- A sequence V of zeros and ones is built by replacing each element z_i of Z by 0 if z_i is an element of X and by 1 if z_i is an element of Y.
- The statistic U(X, Y) of two samples X and Y is the number of runs in the corresponding V sequence (see the sketch below).
- Example: given X = {5, 2.2, 4.5, 1} and Y = {2, 4.3, 2.5, 1.4, 3}, V = (0, 1, 1, 0, 1, 1, 1, 0, 0) and U(X, Y) = 5.
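A sketch of the runs statistic U(X, Y), reproducing the example from the slide:

```python
def runs_statistic(X, Y):
    """Pool both samples, sort ascending, label by origin (0 for X, 1 for Y),
    and count the runs of equal labels."""
    pooled = sorted([(x, 0) for x in X] + [(y, 1) for y in Y])
    labels = [label for _, label in pooled]
    # a new run starts wherever the label changes
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))

X = [5, 2.2, 4.5, 1]
Y = [2, 4.3, 2.5, 1.4, 3]
print(runs_statistic(X, Y))  # V = (0,1,1,0,1,1,1,0,0), so U(X, Y) = 5
```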

24 Supplementary: Example

- Stationarity: check whether sections of the mean distribution belong to the same distribution.
- Ergodicity: check whether the m chains of a simulation belong to the same distribution.

[Table: runs statistics and p-values of the stationarity test (u_s, p_s) and the ergodicity test (u_e, p_e) for the ticks per generation (tpg), at batch lengths b = 10 and b = 20.]
