A Behaviouristic Model of Signalling and Moral Sentiments
1 A Behaviouristic Model of Signalling and Moral Sentiments Johannes Zschache, University of Leipzig Monte Verità, October 18, 2012
2 Introduction Model Parameter analysis Conclusion
3 Introduction
- the evolution of cooperative behaviour in one-time PD interactions
- pay-off matrix (pay-off of the row player):
        C   D
    C   3   0
    D   4   1
- Robert Frank (1987): Homo Economicus might prefer a utility function with a conscience
- moral sentiments have evolved to counteract the temptation to cheat in one-time interactions
- in combination with signals that are contingent upon the sentiments, compliant behaviour is stable
4 Introduction
$$\Pr(H \mid S_j) = \frac{h\, f_H(S_j)}{h\, f_H(S_j) + (1 - h)\, f_D(S_j)}$$
$$E(\pi_H \mid S_j) = \pi(c, C)\, \Pr(H \mid S_j) + \pi(c, D)\, (1 - \Pr(H \mid S_j))$$
- interaction with $j$ if $E(\pi_H \mid S_j) > \pi(e) = \pi(d, D)$
- stringent assumptions:
  - knowledge of the population structure
  - correct interpretation of signals
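The decision rule on this slide can be sketched in a few lines. This is only an illustration: the normal signal densities, their parameters and the function names are assumptions made for the example, while the pay-offs (3, 0 and 1) are taken from the pay-off matrix on slide 3.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (hypothetical choice for f_H and f_D)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_honest(signal, h, f_h, f_d):
    """Pr(H | S_j): Bayes' rule with population share h of honest types."""
    numerator = h * f_h(signal)
    return numerator / (numerator + (1 - h) * f_d(signal))


def expected_payoff(p_honest, pay_cc=3.0, pay_cd=0.0):
    """E(pi_H | S_j) = pi(c, C) Pr(H | S_j) + pi(c, D) (1 - Pr(H | S_j))."""
    return pay_cc * p_honest + pay_cd * (1 - p_honest)


def should_interact(signal, h, f_h, f_d, pay_exit=1.0):
    """Interact with j only if the expected pay-off exceeds pi(e) = pi(d, D)."""
    return expected_payoff(posterior_honest(signal, h, f_h, f_d)) > pay_exit


# Hypothetical signal densities: honest types tend to emit stronger signals.
f_honest = lambda s: normal_pdf(s, mean=3.0, sd=1.0)
f_dishonest = lambda s: normal_pdf(s, mean=2.0, sd=1.0)

print(should_interact(2.5, h=0.5, f_h=f_honest, f_d=f_dishonest))
print(should_interact(0.0, h=0.5, f_h=f_honest, f_d=f_dishonest))
```

The sketch makes the two stringent assumptions visible: the actor must know `h` (the population structure) and both densities (the correct interpretation of signals).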
5 Introduction
alternative idea in Frank (1988):
1. moral sentiments develop in stable relationships
2. the matching law (Herrnstein, 1997) is the behavioural model
3. impulsiveness: the immediate reward often outweighs long-term benefits (chocolate cake during a diet, smoking, ...)
4. moral sentiments make the actor prudent when
   a) choosing an action in the iterated PD
   b) choosing to interact with a partner
5. moral sentiment develops by an evolutionary process
6. the same moral sentiments affect interactions with strangers
6 Model
1. moral sentiments develop in stable relationships
- agents on a two-dimensional grid
- Moore neighbourhoods $N$
- recurrent interactions with neighbours
7 Model
2. the matching law (Herrnstein, 1997) is the behavioural model

Definition. Let $A = \{a_1, \dots, a_m\}$ be the set of all possible actions, and let $T(a_i)$ denote the number of times action $a_i$ was chosen during a specified time period. Furthermore, let $U(a_i) = \sum_t u_t(a_i)$ be the sum of all reinforcements that were received after emitting action $a_i$ during this period. The matching law holds if and only if
$$\frac{T(a_i)}{T(a_1) + T(a_2) + \dots + T(a_m)} = \frac{U(a_i)}{U(a_1) + U(a_2) + \dots + U(a_m)} \quad \text{for all } i \in \{1, \dots, m\}.$$
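As a small illustration of the definition, the following sketch checks whether observed choice counts and reinforcement sums satisfy the matching law. The function name, the tolerance and the example numbers are assumptions made for this example.

```python
def satisfies_matching_law(counts, reinforcements, tol=1e-9):
    """Check T(a_i) / sum(T) == U(a_i) / sum(U) for every action a_i.

    counts:         dict mapping action -> T(a), number of times a was chosen
    reinforcements: dict mapping action -> U(a), summed reinforcements after a
    """
    total_t = sum(counts.values())
    total_u = sum(reinforcements.values())
    if total_t == 0 or total_u == 0:
        raise ValueError("the matching law is undefined without choices or reinforcements")
    return all(
        abs(counts[a] / total_t - reinforcements[a] / total_u) <= tol
        for a in counts
    )


# Hypothetical observation period: C chosen 30 times, D chosen 10 times.
print(satisfies_matching_law({"C": 30, "D": 10}, {"C": 90.0, "D": 30.0}))
print(satisfies_matching_law({"C": 30, "D": 10}, {"C": 90.0, "D": 40.0}))
```

In the first call both ratios are 0.75 for C, so the relative frequency of each action matches its relative share of reinforcement; in the second call they differ.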
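8 Model
3. impulsiveness: if the reinforcement is delayed by $d_t(a_i)$:
$$V(a_i) = \sum_t \frac{u_t(a_i)}{1 + I\, d_t(a_i)}$$
- exponential discounting: $\delta^{d(a_i)}\, U(a_i)$ with $\delta \in [0, 1]$; for all $x \in \mathbb{R}$: $\delta^{d(a)}\, U(a) > \delta^{d(b)}\, U(b) \Leftrightarrow \delta^{d(a)+x}\, U(a) > \delta^{d(b)+x}\, U(b)$
- hyperbolic discounting: with $I = 1.0$ and $x = 100$, preferences can reverse: $\frac{U(a)}{1 + I \cdot 100} < \frac{U(b)}{1 + I \cdot 102}$, but $\frac{U(a)}{1 + I \cdot 0} > \frac{U(b)}{1 + I \cdot 2}$
- implications for PD: the immediate value of the temptation pay-off overwhelms the player of an iterated prisoner's dilemma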
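The difference between the two discounting schemes can be checked numerically. In the sketch below the rewards (1 for the earlier option, 2 for the later one) and the value of δ are hypothetical and chosen only for illustration; I = 1.0, the delay gap of 2 and the shift x = 100 follow the slide.

```python
def hyperbolic_value(reward, delay, impulsiveness):
    """V = U / (1 + I * d), the hyperbolic form used on the slide."""
    return reward / (1 + impulsiveness * delay)


def exponential_value(reward, delay, delta):
    """V = delta**d * U, exponential discounting."""
    return delta ** delay * reward


# Hypothetical rewards: a small reward now versus a larger one two steps later.
small, large = 1.0, 2.0
I, delta, x = 1.0, 0.6, 100

# Hyperbolic discounting: the ordering flips when both delays grow by x.
near_prefers_small = hyperbolic_value(small, 0, I) > hyperbolic_value(large, 2, I)
far_prefers_small = hyperbolic_value(small, 0 + x, I) > hyperbolic_value(large, 2 + x, I)
print(near_prefers_small, far_prefers_small)

# Exponential discounting: the ordering is the same for both horizons.
near_exp = exponential_value(small, 0, delta) > exponential_value(large, 2, delta)
far_exp = exponential_value(small, 0 + x, delta) > exponential_value(large, 2 + x, delta)
print(near_exp, far_exp)
```

With these numbers the hyperbolic actor plans to wait for the larger reward while both options are far away, but switches to the smaller one once it is immediately available. The exponential actor never switches.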
9 Model
4. moral sentiments help to overcome impulsiveness
$$V(a_i) = \frac{U(a_i)}{1 + I\, d(a_i)}$$
"Guilt is just such a feeling. [..] If it is felt strongly enough, it can negate the spurious attraction of the imminent material reward" (Frank, 1988, p. 82).
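10 Model
a) choosing an action in the iterated PD
- one memory entry of length $\lambda$ for each neighbour $n \in N$: $(\sigma(n), \pi(n))$ with $\sigma(n) \in \{C, D, E\}^{\lambda}$ and $\pi(n) \in \mathbb{R}^{\lambda}$
- bookkeeping $\beta$:
$$V(n, a) = \sum_{j:\, \sigma(n)_j = a} \; \sum_{i=j}^{\min(j+\beta,\, \lambda)} \frac{\pi(n)_i}{1 + I\,(i - j)}$$
- algorithm 1:
    1: for all $a \in \{C, D\}$ do
    2:     calculate $v(a) = \sum_{n \in N} \frac{V(n, a)}{T(a)}$
    3: end for
    4: $\hat{a} \leftarrow$ select action with highest $v(a)$
    5: return $\hat{a}$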
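Algorithm 1 and the bookkeeping sum might be implemented roughly as follows. This is a sketch under several assumptions that the slide does not state: memories are stored oldest entry first, indices are 0-based (so the upper limit becomes min(j + β, λ − 1)), T(a) counts the occurrences of a over all neighbour memories, and actions that were never tried are skipped. All names and the example memory are hypothetical.

```python
def discounted_value(actions, payoffs, action, impulsiveness, beta):
    """V(n, a): for every time j at which `action` was chosen, add the pay-offs
    of the following beta steps, each discounted hyperbolically by its delay."""
    lam = len(payoffs)
    total = 0.0
    for j, chosen in enumerate(actions):
        if chosen != action:
            continue
        last = min(j + beta, lam - 1)  # 0-based version of min(j + beta, lambda)
        for i in range(j, last + 1):
            total += payoffs[i] / (1 + impulsiveness * (i - j))
    return total


def choose_action(memory, impulsiveness, beta):
    """Algorithm 1: return the action in {C, D} with the highest v(a).

    memory maps each neighbour to a pair (actions, payoffs) of equal length.
    Returns None if neither action appears in any memory.
    """
    best_action, best_value = None, float("-inf")
    for action in ("C", "D"):
        times = sum(acts.count(action) for acts, _ in memory.values())
        if times == 0:
            continue  # assumption: untried actions are not evaluated
        value = sum(
            discounted_value(acts, pays, action, impulsiveness, beta)
            for acts, pays in memory.values()
        ) / times
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Hypothetical memory about a single neighbour (oldest entry first).
memory = {"north": (["C", "D", "D", "C"], [3.0, 4.0, 1.0, 0.0])}
print(choose_action(memory, impulsiveness=0.0, beta=2))
print(choose_action(memory, impulsiveness=10.0, beta=2))
```

In this example the patient agent (I = 0) values cooperation at 4 and defection at 3, whereas the impulsive agent (I = 10) weights the immediate temptation pay-off so heavily that defection comes out ahead.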
11 Model
b) choosing to interact with a partner
- average value of a partner $v(n)$:
$$v(n) = \sum_{i=1}^{\lambda} \frac{\pi(n)_i}{T(n)}$$
- algorithm 2:
    1: for all $n \in N$ do
    2:     calculate $v(n)$
    3: end for
    4: $\hat{n} \leftarrow$ select neighbour with highest $v(n)$
    5: return $\hat{n}$
- algorithms 1 and 2 are called melioration learning (Herrnstein, 1997)
- melioration learning is a process that leads to the matching law
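A sketch of algorithm 2 follows. The slide does not define T(n); the example assumes it is the number of remembered interactions with neighbour n and passes it in explicitly. Names and data are hypothetical.

```python
def choose_partner(payoff_memory, interaction_counts):
    """Algorithm 2: return the neighbour with the highest average value v(n).

    payoff_memory:      dict neighbour -> list of remembered pay-offs pi(n)
    interaction_counts: dict neighbour -> T(n) (assumed: remembered interactions)
    Neighbours without any recorded interaction are skipped.
    """
    best_neighbour, best_value = None, float("-inf")
    for neighbour, payoffs in payoff_memory.items():
        times = interaction_counts.get(neighbour, 0)
        if times == 0:
            continue
        value = sum(payoffs) / times  # v(n) = sum_i pi(n)_i / T(n)
        if value > best_value:
            best_neighbour, best_value = neighbour, value
    return best_neighbour


payoffs = {"north": [3.0, 3.0, 0.0], "east": [1.0, 1.0, 1.0]}
counts = {"north": 3, "east": 3}
print(choose_partner(payoffs, counts))
```

Choosing the alternative with the highest average reinforcement, for actions and for partners alike, is the melioration step.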
12 Model
5. moral sentiment develops in an evolutionary process
- the impulsiveness $I$ resembles the impact of an actor's moral sentiments
- evolution of moral sentiment: evolution of $I$
- an agent's fitness is the average of the reinforcements during one generation (1000 interactions)
- after one generation, new agents are bred
- a parent is chosen with a probability directly proportional to the parent's fitness
- a parent passes on its impulsiveness value to a new agent
- random noise: $p_{mut} = 0.1$
- probability of experimenting $\varepsilon$: choose a random neighbour as partner and a random action
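The breeding step can be sketched with fitness-proportional sampling. The slide gives only the mutation probability, so the form of the noise (Gaussian with a standard deviation of 0.1, truncated at zero) is an assumption of this example, as are the function and parameter names.

```python
import random


def breed(impulsiveness, fitness, p_mut=0.1, noise_sd=0.1, rng=None):
    """Create the next generation of impulsiveness values.

    Each new agent picks a parent with probability proportional to the
    parent's fitness and inherits its impulsiveness. With probability p_mut
    the inherited value is perturbed (assumed: Gaussian noise, kept >= 0).
    Fitness values must be non-negative and not all zero.
    """
    rng = rng or random.Random()
    parents = rng.choices(impulsiveness, weights=fitness, k=len(impulsiveness))
    children = []
    for value in parents:
        if rng.random() < p_mut:
            value = max(0.0, value + rng.gauss(0.0, noise_sd))
        children.append(value)
    return children


# Hypothetical population of three agents; fitness is the average pay-off.
next_generation = breed([0.0, 0.5, 2.0], [2.8, 2.1, 1.2], rng=random.Random(42))
print(next_generation)
```

Because selection acts on average reinforcement, a low I spreads only if it actually earns more in the repeated interactions with neighbours.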
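13 Parameter analysis: experimenting & memory length
- one memory entry of length $\lambda$ for each neighbour $n \in N$: $(\sigma, \pi)$ with $\sigma \in \{C, D, E\}^{\lambda}$ and $\pi \in \mathbb{R}^{\lambda}$
[Figure: average impulsiveness, cooperation and interaction by generation for different combinations of experimenting $\varepsilon$ and memory length $\lambda$ (bookkeeping $\beta = 10$)]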
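14 Parameter analysis: bookkeeping (accounting for the future)
$$V(n, a) = \sum_{j:\, \sigma(n)_j = a} \; \sum_{i=j}^{\min(j+\beta,\, \lambda)} \frac{\pi(n)_i}{1 + I\,(i - j)}$$
[Figure: average impulsiveness, cooperation and interaction by generation for different values of bookkeeping $\beta$ (experimenting $\varepsilon = 0.1$, memory length $\lambda = 100$)]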
15 Model: Interactions with Strangers
6. same moral sentiments affect interactions with strangers
- a certain percentage, $\varphi$, of interactions take place with strangers (= actors who are met only once)
- there is no memory of past interactions with a stranger
- but there might be a signal that is contingent on the existence of moral sentiments
- a signal $s$ is a number between 0 and 9 indicating the actor's impulsiveness
- one memory entry for each signal strength
- average value of a signal $v(s)$:
$$v(s) = \sum_{i=1}^{\lambda} \frac{\pi(s)_i}{T(s)}$$
- actors can choose not to interact with a stranger
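One way to read this slide is that an actor keeps a pay-off memory per signal strength and avoids strangers whose signal has paid off badly. The sketch below follows that reading. The decision rule (interact if v(s) exceeds the exit pay-off of 1, taken from π(e) = π(d, D) on slide 4), the treatment of unseen signals and the reading of T(s) as the number of remembered encounters are assumptions, not details given in the slides.

```python
def signal_value(payoffs):
    """v(s): average remembered pay-off from strangers who showed signal s."""
    return sum(payoffs) / len(payoffs)


def interact_with_stranger(signal, signal_memory, exit_payoff=1.0):
    """Decide whether to interact with a stranger showing `signal` (0..9).

    signal_memory maps a signal strength to the list of remembered pay-offs.
    Assumption: signals without any memory are explored, i.e. accepted.
    """
    payoffs = signal_memory.get(signal, [])
    if not payoffs:
        return True
    return signal_value(payoffs) > exit_payoff


# Hypothetical memory: low signals went well, high signals went badly.
signal_memory = {2: [3.0, 3.0, 0.0], 8: [0.0, 1.0, 0.0]}
print(interact_with_stranger(2, signal_memory))
print(interact_with_stranger(8, signal_memory))
```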
16 Parameter analysis: Interactions with Strangers
[Figure: average impulsiveness by generation, without and with signals, for a share of strangers $\varphi$ of 0.2, 0.4, 0.6 and 0.8 (experimenting $\varepsilon = 0.1$, memory length $\lambda = 100$, bookkeeping $\beta = 10$)]
17 Parameter analysis: Interactions with Strangers
[Figure, three panels: impulsiveness, cooperation with partners and cooperation with strangers, each plotted against the share of strangers $\varphi$, without and with signals (generation > 900, experimenting $\varepsilon = 0.1$, memory length $\lambda = 100$, bookkeeping $\beta = 10$)]
18 Conclusion
- Frank (1987): formal model of signalling and moral sentiments that lead to the evolution of cooperation in the one-time PD; heavy assumptions
- Frank (1988): informal ideas about the development of moral sentiments
  - behaviouristic view on human behaviour (matching law)
  - sentiments emerge because they repress impulsiveness, which leads to cooperation among friends
  - in case of one-time interactions, signals are needed to support the development of moral sentiments and cooperation among strangers
19 References
- Frank, R. H. (1987). If Homo Economicus could choose his own utility function, would he want one with a conscience? The American Economic Review 77(4).
- Frank, R. H. (1988). Passions within Reason. The Strategic Role of the Emotions. W. W. Norton & Company.
- Herrnstein, R. J. (1997). The Matching Law. Papers in Psychology and Economics. Harvard University Press.
source code:
20 Supplementary
- each simulation was performed for a fixed set of parameter values
- several repetitions that differ in their random seeds
- since the simulations can be represented as stochastic time-homogeneous Markov chains, they tend toward a unique and stationary state distribution
- we use the average level of $I$ and the rates of interaction and cooperation as summary statistics to describe the unique distribution
- statistical tests check whether the summary statistics describe the stationary state
- additionally: check for the most promising conditions that lead to the evolution of moral sentiments
21 Supplementary: Convergence Statistics
- Gelman, A. and D. B. Rubin (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7(4).
- Brooks, S. P. and A. Gelman (1998). General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7(4).
- we generate $m > 1$ chains of a simulation with $n$ time steps: $(x_{11}, x_{12}, \dots, x_{1n}), (x_{21}, x_{22}, \dots, x_{2n}), \dots, (x_{m1}, x_{m2}, \dots, x_{mn})$
$$\hat{R}_I = \frac{\text{length of pooled-chains interval}}{\text{mean length of the within-chain intervals}}$$
$$\hat{R}_s = \frac{\frac{1}{mn - 1} \sum_{j=1}^{m} \sum_{i=1}^{n} |x_{ji} - \bar{x}|^s}{\frac{1}{m(n - 1)} \sum_{j=1}^{m} \sum_{i=1}^{n} |x_{ji} - \bar{x}_j|^s}$$
- iterated graphical approach: sub-chains $(x_{j1}, \dots, x_{j(2kb)})$, with $b$ being a batch length and $k = 1, \dots, n/b$
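The moment-based statistic can be computed directly from a list of chains. The sketch below implements only the ratio of the pooled to the within-chain s-th absolute central moment as written above; it does not cover the interval-based variant or the batching of sub-chains. Function name and example data are made up for illustration.

```python
def rhat_s(chains, s=2):
    """Ratio of the pooled to the within-chain s-th absolute central moment.

    chains: list of m >= 2 sequences, each with the same length n >= 2.
    Values near 1 suggest that the chains sample the same distribution.
    Raises ZeroDivisionError if every chain is constant.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")

    pooled_mean = sum(sum(chain) for chain in chains) / (m * n)
    pooled = sum(
        abs(x - pooled_mean) ** s for chain in chains for x in chain
    ) / (m * n - 1)

    within = 0.0
    for chain in chains:
        chain_mean = sum(chain) / n
        within += sum(abs(x - chain_mean) ** s for x in chain)
    within /= m * (n - 1)

    return pooled / within


# Two chains stuck in different regions give a ratio far above 1.
print(rhat_s([[0, 1, 0, 1], [10, 11, 10, 11]]))
# Two chains that cover the same values give a ratio close to 1.
print(rhat_s([[0, 1, 0, 1], [0, 1, 0, 1]]))
```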
22 Supplementary: Example
[Figure: average impulsiveness by generation for different numbers of ticks per generation, annotated with convergence statistics per panel: $\hat{R}_c$ = 1.91, 1.18, 1.73, 1.34 (fifth value missing); $\hat{R}_l$ = 2.12, 1.35, 1.73, 1.36, 1.06; $\hat{R}_3$ = 1.04 in the last panel (other values missing). A second plot shows the statistics $\hat{R}_c$, $\hat{R}_l$ and $\hat{R}_3$ by iteration number.]
23 Supplementary: The Wald-Wolfowitz Test
- Wald, A. and J. Wolfowitz (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics 11(2).
- Grazzini, J. (2012). Analysis of the emergent properties: Stationarity and ergodicity. Journal of Artificial Societies and Social Simulation 15(2), 7.
- check whether two samples $X$ and $Y$, with $n$ and $m$ observations respectively, are from the same population, i.e. whether the distributions of the two samples are identical
- the two samples are pooled and arranged in ascending order as $Z = (z_1, z_2, \dots, z_{n+m})$, where $z_1 < z_2 < \dots < z_{n+m}$
- sequence $V$ of zeros and ones: each element $z_i$ of $Z$ is replaced by 0 if $z_i$ is an element of $X$ and by 1 if $z_i$ is an element of $Y$
- the statistic $U(X, Y)$ of two samples $X$ and $Y$ is the number of runs in the corresponding sequence $V$
- example: given $X = \{5, 2.2, 4.5, 1\}$ and $Y = \{2, 4.3, 2.5, 1.4, 3\}$, $V = (0, 1, 1, 0, 1, 1, 1, 0, 0)$ and $U(X, Y) = 5$
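The construction of V and the run count U(X, Y) can be reproduced with a short function. The sketch covers only the statistic itself, not the p-value of the test, and like the slide it assumes there are no ties between the two samples. The function name is hypothetical.

```python
def runs_statistic(x, y):
    """Return the 0/1 sequence V and the number of runs U(X, Y).

    The samples are pooled and sorted in ascending order; each pooled value
    is replaced by 0 if it comes from x and by 1 if it comes from y.
    """
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    labels = [label for _, label in pooled]
    if not labels:
        return labels, 0
    # A new run starts wherever two neighbouring labels differ.
    runs = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return labels, runs


v, u = runs_statistic([5, 2.2, 4.5, 1], [2, 4.3, 2.5, 1.4, 3])
print(v)  # [0, 1, 1, 0, 1, 1, 1, 0, 0]
print(u)  # 5
```

Few runs indicate that the two samples occupy different regions of the pooled ordering; many runs indicate that they are well mixed.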
24 Supplementary: Example
- stationarity: check whether sections of the mean distribution belong to the same distribution
- ergodicity: check whether $m$ chains of a simulation belong to the same distribution
[Table: for batch lengths $b = 10$ and $b = 20$, columns tpg (ticks per generation), $u_s$, $p_s$, $u_e$, $p_e$; the values are missing]
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Artificial Intelligence Review manuscript No. (will be inserted by the editor) Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Howard M. Schwartz Received:
More informationUsing Gaussian Processes for Variance Reduction in Policy Gradient Algorithms *
Proceedings of the 8 th International Conference on Applied Informatics Eger, Hungary, January 27 30, 2010. Vol. 1. pp. 87 94. Using Gaussian Processes for Variance Reduction in Policy Gradient Algorithms
More informationModels of Molecular Evolution
Models of Molecular Evolution Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison September 15, 2007 Genetics 875 (Fall 2009) Molecular Evolution September 15, 2009 1 /
More informationOptimism in the Face of Uncertainty Should be Refutable
Optimism in the Face of Uncertainty Should be Refutable Ronald ORTNER Montanuniversität Leoben Department Mathematik und Informationstechnolgie Franz-Josef-Strasse 18, 8700 Leoben, Austria, Phone number:
More informationOptimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games
Optimal Efficient Learning Equilibrium: Imperfect Monitoring in Symmetric Games Ronen I. Brafman Department of Computer Science Stanford University Stanford, CA 94305 brafman@cs.stanford.edu Moshe Tennenholtz
More informationOnline Learning Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das
Online Learning 9.520 Class 12, 20 March 2006 Andrea Caponnetto, Sanmay Das About this class Goal To introduce the general setting of online learning. To describe an online version of the RLS algorithm
More informationOutline. CSE 573: Artificial Intelligence Autumn Agent. Partial Observability. Markov Decision Process (MDP) 10/31/2012
CSE 573: Artificial Intelligence Autumn 2012 Reasoning about Uncertainty & Hidden Markov Models Daniel Weld Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer 1 Outline
More informationReinforcement learning
Reinforcement learning Based on [Kaelbling et al., 1996, Bertsekas, 2000] Bert Kappen Reinforcement learning Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More information