Fictitious Self-Play in Extensive-Form Games
|
|
- Roy McDaniel
- 6 years ago
- Views:
Transcription
1 Johannes Heinrich, Marc Lanctot, David Silver University College London, Google DeepMind July 9, 05
2 Problem Learn from self-play in games with imperfect information. Games: Multi-agent decision making domains, e.g. poker, politics, security Self-play: Agents learn from interaction with each other, without prior knowledge
3 Extensive-Form Game Game-theoretic model of sequential interaction of multiple players Based on a game tree Can represent imperfect (asymmetric, private) information
4 Fictitious Play Game-theoretic model of learning in games Players repeatedly play a game At each iteration each player chooses a best response to their opponents average behaviour Players average strategies converge to Nash equilibrium in some classes of games, e.g. potential games and two-player zero-sum games (Brown, 95)
5 Fictitious Play Game-theoretic model of learning in games Players repeatedly play a game At each iteration each player chooses a best response to their opponents average behaviour Players average strategies converge to Nash equilibrium in some classes of games, e.g. potential games and two-player zero-sum games Problem: Almost exclusively studied in normal-form games (tabular, no explicit sequential structure) (Brown, 95)
6 wo algorithms Fictitious play in extensive-form games Computation linear in time and space rather than exponential Preserve convergence guarantees Fictitious Self-Play Experiential and sample-based approximation of fictitious play Leverages machine learning
7 wo steps of fictitious play At each iteration Compute a best response to opponents average strategies Update own average strategy with computed best response Π k+ Π k + k + (BR [Π k] Π k )
8 Information state tree C rest of tree...
9 Computing a best response E rest of tree... E E
10 Strategies Behavioural Pure p p 0 q q 0 Mixed pq p( q) p 0
11 Aggregating strategies Π B ( α) α Π B
12 Aggregating strategies ( α)π + αb α 0 0.5( α) 0.5( α) α α 0
13 Realization-equivalence p ( p) pq p( q) wo strategies are realization-equivalent iff for any opponent strategies they define the same probability distribution over the states of the game. Realization plan: x π (σ u ) := π(u, a) u U (u,a) σ u (Koller et al., 994; Von Stengel, 996)
14 Aggregating behavioural strategies Lemma Given π and β two behavioural strategies Π and B two mixed strategies 3 Π and B are realization equivalent to π and β hen [ µ(u) = π(u)+α x β (σ u ) ( α)x π (σ u ) + αx β (σ u ) ] (β(u) π(u)) u U is realization-equivalent to M = Π + α (B Π)
15 Fictitious play in extensive-form games Extensive-Form Fictitious Play (Hendon et al., 996) Extensive-Form Fictitious Play (this paper) Exploitability Iterations Figure: Learning curves in Leduc Hold em.
16 Motivation Full-width fictitious play performs computation at all states of the game, no matter whether they are likely to occur Sampling can focus on relevant states In big games event a single iteration of full-width fictitious play might be too costly Function approximator could generalise between states he state space is larger than information state space Learning agents only operate on their information states
17 Generalised weakened fictitious play At each iteration Compute an approximate best response to opponents average strategies Update own average strategy with computed best response, allowing for some kinds of perturbations Π k+ Π k + α k+ ( BRɛk+ [Π k ] Π k + M k+ ) (Benaïm et al., 005; Leslie & Collins, 006)
18 Learning a best response E rest of tree... E E
19 Knowledge transfer Opponent s update (in a two-player game): Π i k = ( α k )Π i k + α kb i k hus the MDP defined by Π i k ( MDP has the following structure: Π i k ) C α k α k (α k 0, k ) ( ) MDP Π i k ( MDP B i k )
20 Learning a strategy update Learn Π i k = ( α k)π i k + αbi k, by sampling data (state-action pairs) from { π i k with prob. α k β i k with prob. α k against some fixed, fully mixed opponent sampling policy, e.g. π i k.
21 Fictitious Self-Play Generate an approximate best response by reinforcement learning from experience Learn a model of own average behaviour by supervised learning from experience 3 Generate experience from self-play, using combinations of (π i, π i ), (β i, π i ), (π i, β i )
22 Experiments Fitted Q-Iteration Counting model: a A(u t ) : N(u t, a) N(u t, a) + {at=a} a A(u t ) : π(u t, a) N(u t, a) N(u t )
23 Fictitious Self-Play in Leduc Hold em.5 XFP, 6-card Leduc FSP:FQI, 6-card Leduc Exploitability ime in s Figure: Learning curves in Leduc Hold em.
24 Fictitious Self-Play in Leduc Hold em Exploitability XFP, 6-card Leduc XFP, 60-card Leduc FSP:FQI, 6-card Leduc FSP:FQI, 60-card Leduc ime in s Figure: Learning curves in Leduc Hold em.
25 Fictitious Self-Play in River Poker.5 XFP, River Poker (uniform beliefs) FSP:FQI, River Poker (uniform beliefs) Exploitability ime in s Figure: Learning curves in River Poker.
26 Fictitious Self-Play in River Poker Exploitability 0 0. XFP, River Poker (uniform beliefs) XFP, River Poker (defined beliefs) FSP:FQI, River Poker (uniform beliefs) FSP:FQI, River Poker (defined beliefs) ime in s Figure: Learning curves in River Poker.
27 Summary Convergent, full-width fictitious play in extensive-form games Fictitious Self-Play is an experiential, sample- and learning-based approach to fictitious play
Solving Zero-Sum Extensive-Form Games. Branislav Bošanský AE4M36MAS, Fall 2013, Lecture 6
Solving Zero-Sum Extensive-Form Games ranislav ošanský E4M36MS, Fall 2013, Lecture 6 Imperfect Information EFGs States Players 1 2 Information Set ctions Utility Solving II Zero-Sum EFG with perfect recall
More information(Deep) Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 17 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationDeep Reinforcement Learning
Martin Matyášek Artificial Intelligence Center Czech Technical University in Prague October 27, 2016 Martin Matyášek VPD, 2016 1 / 50 Reinforcement Learning in a picture R. S. Sutton and A. G. Barto 2015
More informationMS&E 246: Lecture 12 Static games of incomplete information. Ramesh Johari
MS&E 246: Lecture 12 Static games of incomplete information Ramesh Johari Incomplete information Complete information means the entire structure of the game is common knowledge Incomplete information means
More informationA Polynomial-time Nash Equilibrium Algorithm for Repeated Games
A Polynomial-time Nash Equilibrium Algorithm for Repeated Games Michael L. Littman mlittman@cs.rutgers.edu Rutgers University Peter Stone pstone@cs.utexas.edu The University of Texas at Austin Main Result
More informationDistributed Learning based on Entropy-Driven Game Dynamics
Distributed Learning based on Entropy-Driven Game Dynamics Bruno Gaujal joint work with Pierre Coucheney and Panayotis Mertikopoulos Inria Aug., 2014 Model Shared resource systems (network, processors)
More informationDavid Silver, Google DeepMind
Tutorial: Deep Reinforcement Learning David Silver, Google DeepMind Outline Introduction to Deep Learning Introduction to Reinforcement Learning Value-Based Deep RL Policy-Based Deep RL Model-Based Deep
More informationReinforcement Learning
Reinforcement Learning Cyber Rodent Project Some slides from: David Silver, Radford Neal CSC411: Machine Learning and Data Mining, Winter 2017 Michael Guerzhoy 1 Reinforcement Learning Supervised learning:
More informationThis question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.
This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you
More informationTime Indexed Hierarchical Relative Entropy Policy Search
Time Indexed Hierarchical Relative Entropy Policy Search Florentin Mehlbeer June 19, 2013 1 / 15 Structure Introduction Reinforcement Learning Relative Entropy Policy Search Hierarchical Relative Entropy
More informationReinforcement Learning
Reinforcement Learning Ron Parr CompSci 7 Department of Computer Science Duke University With thanks to Kris Hauser for some content RL Highlights Everybody likes to learn from experience Use ML techniques
More informationBelief-based Learning
Belief-based Learning Algorithmic Game Theory Marcello Restelli Lecture Outline Introdutcion to multi-agent learning Belief-based learning Cournot adjustment Fictitious play Bayesian learning Equilibrium
More informationCyber Security Games with Asymmetric Information
Cyber Security Games with Asymmetric Information Jeff S. Shamma Georgia Institute of Technology Joint work with Georgios Kotsalis & Malachi Jones ARO MURI Annual Review November 15, 2012 Research Thrust:
More informationBayesian Opponent Exploitation in Imperfect-Information Games
Bayesian Opponent Exploitation in Imperfect-Information Games SAM GANFRIED, Florida International University QINGYUN SUN, Stanford University arxiv:1603.03491v4 cs.gt] 13 Feb 2017 Two fundamental problems
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationNotes on Coursera s Game Theory
Notes on Coursera s Game Theory Manoel Horta Ribeiro Week 01: Introduction and Overview Game theory is about self interested agents interacting within a specific set of rules. Self-Interested Agents have
More informationAmbiguity and the Centipede Game
Ambiguity and the Centipede Game Jürgen Eichberger, Simon Grant and David Kelsey Heidelberg University, Australian National University, University of Exeter. University of Exeter. June 2018 David Kelsey
More informationReinforcement Learning Part 2
Reinforcement Learning Part 2 Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ From previous tutorial Reinforcement Learning Exploration No supervision Agent-Reward-Environment
More informationCyber-Awareness and Games of Incomplete Information
Cyber-Awareness and Games of Incomplete Information Jeff S Shamma Georgia Institute of Technology ARO/MURI Annual Review August 23 24, 2010 Preview Game theoretic modeling formalisms Main issue: Information
More informationGame Theory. Wolfgang Frimmel. Perfect Bayesian Equilibrium
Game Theory Wolfgang Frimmel Perfect Bayesian Equilibrium / 22 Bayesian Nash equilibrium and dynamic games L M R 3 2 L R L R 2 2 L R L 2,, M,2, R,3,3 2 NE and 2 SPNE (only subgame!) 2 / 22 Non-credible
More informationGeneralized Sampling and Variance in Counterfactual Regret Minimization
Generalized Sampling and Variance in Counterfactual Regret Minimization Richard Gison and Marc Lanctot and Neil Burch and Duane Szafron and Michael Bowling Department of Computing Science, University of
More informationA Modified Q-Learning Algorithm for Potential Games
Preprints of the 19th World Congress The International Federation of Automatic Control A Modified Q-Learning Algorithm for Potential Games Yatao Wang Lacra Pavel Edward S. Rogers Department of Electrical
More informationLINEAR PROGRAMMING III
LINEAR PROGRAMMING III ellipsoid algorithm combinatorial optimization matrix games open problems Lecture slides by Kevin Wayne Last updated on 7/25/17 11:09 AM LINEAR PROGRAMMING III ellipsoid algorithm
More informationBayesian Opponent Exploitation in Imperfect-Information Games
Bayesian Opponent Exploitation in Imperfect-Information Games Sam Ganzfried Ganzfried Research Miami Beach, Florida, 33139 sam@ganzfriedresearch.com School of Computing and Information Sciences Florida
More informationReinforcement Learning
1 Reinforcement Learning Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Plan 1 Why reinforcement learning? Where does this theory come from? Markov decision
More informationRealization Plans for Extensive Form Games without Perfect Recall
Realization Plans for Extensive Form Games without Perfect Recall Richard E. Stearns Department of Computer Science University at Albany - SUNY Albany, NY 12222 April 13, 2015 Abstract Given a game in
More information15-889e Policy Search: Gradient Methods Emma Brunskill. All slides from David Silver (with EB adding minor modificafons), unless otherwise noted
15-889e Policy Search: Gradient Methods Emma Brunskill All slides from David Silver (with EB adding minor modificafons), unless otherwise noted Outline 1 Introduction 2 Finite Difference Policy Gradient
More informationReinforcement Learning
Reinforcement Learning Markov decision process & Dynamic programming Evaluative feedback, value function, Bellman equation, optimality, Markov property, Markov decision process, dynamic programming, value
More informationLearning to Coordinate Efficiently: A Model-based Approach
Journal of Artificial Intelligence Research 19 (2003) 11-23 Submitted 10/02; published 7/03 Learning to Coordinate Efficiently: A Model-based Approach Ronen I. Brafman Computer Science Department Ben-Gurion
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationConvergence and No-Regret in Multiagent Learning
Convergence and No-Regret in Multiagent Learning Michael Bowling Department of Computing Science University of Alberta Edmonton, Alberta Canada T6G 2E8 bowling@cs.ualberta.ca Abstract Learning in a multiagent
More information4. Opponent Forecasting in Repeated Games
4. Opponent Forecasting in Repeated Games Julian and Mohamed / 2 Learning in Games Literature examines limiting behavior of interacting players. One approach is to have players compute forecasts for opponents
More informationMachine Learning. Reinforcement learning. Hamid Beigy. Sharif University of Technology. Fall 1396
Machine Learning Reinforcement learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1396 1 / 32 Table of contents 1 Introduction
More informationIntroduction to Reinforcement Learning
CSCI-699: Advanced Topics in Deep Learning 01/16/2019 Nitin Kamra Spring 2019 Introduction to Reinforcement Learning 1 What is Reinforcement Learning? So far we have seen unsupervised and supervised learning.
More informationarxiv: v1 [cs.gt] 1 Sep 2015
HC selection for MCTS in Simultaneous Move Games Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games arxiv:1509.00149v1 [cs.gt] 1 Sep 2015 Vojtěch Kovařík vojta.kovarik@gmail.com
More informationReinforcement Learning
Reinforcement Learning Temporal Difference Learning Temporal difference learning, TD prediction, Q-learning, elibigility traces. (many slides from Marc Toussaint) Vien Ngo Marc Toussaint University of
More informationLecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation
Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation CS234: RL Emma Brunskill Winter 2018 Material builds on structure from David SIlver s Lecture 4: Model-Free
More informationMulti-Agent Learning with Policy Prediction
Multi-Agent Learning with Policy Prediction Chongjie Zhang Computer Science Department University of Massachusetts Amherst, MA 3 USA chongjie@cs.umass.edu Victor Lesser Computer Science Department University
More informationMultiagent (Deep) Reinforcement Learning
Multiagent (Deep) Reinforcement Learning MARTIN PILÁT (MARTIN.PILAT@MFF.CUNI.CZ) Reinforcement learning The agent needs to learn to perform tasks in environment No prior knowledge about the effects of
More informationA Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker
A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker Duane Szafron University of Alberta Edmonton, Alberta dszafron@ualberta.ca Richard Gibson University of Alberta Edmonton, Alberta
More informationCS230: Lecture 9 Deep Reinforcement Learning
CS230: Lecture 9 Deep Reinforcement Learning Kian Katanforoosh Menti code: 21 90 15 Today s outline I. Motivation II. Recycling is good: an introduction to RL III. Deep Q-Learning IV. Application of Deep
More informationPartially observable Markov decision processes. Department of Computer Science, Czech Technical University in Prague
Partially observable Markov decision processes Jiří Kléma Department of Computer Science, Czech Technical University in Prague https://cw.fel.cvut.cz/wiki/courses/b4b36zui/prednasky pagenda Previous lecture:
More informationMultiagent Learning Using a Variable Learning Rate
Multiagent Learning Using a Variable Learning Rate Michael Bowling, Manuela Veloso Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213-3890 Abstract Learning to act in a multiagent
More informationReduced Space and Faster Convergence in Imperfect-Information Games via Pruning
Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning Noam Brown 1 uomas Sandholm 1 Abstract Iterative algorithms such as Counterfactual Regret Minimization CFR) are the most
More informationReinforcement Learning
CS7/CS7 Fall 005 Supervised Learning: Training examples: (x,y) Direct feedback y for each input x Sequence of decisions with eventual feedback No teacher that critiques individual actions Learn to act
More informationBayesian Games and Mechanism Design Definition of Bayes Equilibrium
Bayesian Games and Mechanism Design Definition of Bayes Equilibrium Harsanyi [1967] What happens when players do not know one another s payoffs? Games of incomplete information versus games of imperfect
More informationSequential and reinforcement learning: Stochastic Optimization I
1 Sequential and reinforcement learning: Stochastic Optimization I Sequential and reinforcement learning: Stochastic Optimization I Summary This session describes the important and nowadays framework of
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 2013
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture 24 Scribe: Sachin Ravi May 2, 203 Review of Zero-Sum Games At the end of last lecture, we discussed a model for two player games (call
More informationQ-Learning in Continuous State Action Spaces
Q-Learning in Continuous State Action Spaces Alex Irpan alexirpan@berkeley.edu December 5, 2015 Contents 1 Introduction 1 2 Background 1 3 Q-Learning 2 4 Q-Learning In Continuous Spaces 4 5 Experimental
More informationQ-learning. Tambet Matiisen
Q-learning Tambet Matiisen (based on chapter 11.3 of online book Artificial Intelligence, foundations of computational agents by David Poole and Alan Mackworth) Stochastic gradient descent Experience
More informationThe Value of Information in Zero-Sum Games
The Value of Information in Zero-Sum Games Olivier Gossner THEMA, Université Paris Nanterre and CORE, Université Catholique de Louvain joint work with Jean-François Mertens July 2001 Abstract: It is not
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Reinforcement learning Daniel Hennes 4.12.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Reinforcement learning Model based and
More information15-780: Graduate Artificial Intelligence. Reinforcement learning (RL)
15-780: Graduate Artificial Intelligence Reinforcement learning (RL) From MDPs to RL We still use the same Markov model with rewards and actions But there are a few differences: 1. We do not assume we
More informationIncremental Policy Learning: An Equilibrium Selection Algorithm for Reinforcement Learning Agents with Common Interests
Incremental Policy Learning: An Equilibrium Selection Algorithm for Reinforcement Learning Agents with Common Interests Nancy Fulda and Dan Ventura Department of Computer Science Brigham Young University
More informationIntroduction to Game Theory
Introduction to Game Theory Part 1. Static games of complete information Chapter 3. Mixed strategies and existence of equilibrium Ciclo Profissional 2 o Semestre / 2011 Graduação em Ciências Econômicas
More informationMechanism Design: Implementation. Game Theory Course: Jackson, Leyton-Brown & Shoham
Game Theory Course: Jackson, Leyton-Brown & Shoham Bayesian Game Setting Extend the social choice setting to a new setting where agents can t be relied upon to disclose their preferences honestly Start
More informationReinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies
Reinforcement earning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies Presenter: Roi Ceren THINC ab, University of Georgia roi@ceren.net Prashant Doshi THINC ab, University
More informationAdaptive State Feedback Nash Strategies for Linear Quadratic Discrete-Time Games
Adaptive State Feedbac Nash Strategies for Linear Quadratic Discrete-Time Games Dan Shen and Jose B. Cruz, Jr. Intelligent Automation Inc., Rocville, MD 2858 USA (email: dshen@i-a-i.com). The Ohio State
More informationUniversity of Alberta. Richard Gibson. Doctor of Philosophy. Department of Computing Science
University of Alberta REGRET MINIMIZATION IN GAMES AND THE DEVELOPMENT OF CHAMPION MULTIPLAYER COMPUTER POKER-PLAYING AGENTS by Richard Gibson A thesis submitted to the Faculty of Graduate Studies and
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Formal models of interaction Daniel Hennes 27.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Taxonomy of domains Models of
More informationTowards a General Theory of Non-Cooperative Computation
Towards a General Theory of Non-Cooperative Computation (Extended Abstract) Robert McGrew, Ryan Porter, and Yoav Shoham Stanford University {bmcgrew,rwporter,shoham}@cs.stanford.edu Abstract We generalize
More informationCOMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning. Hanna Kurniawati
COMP3702/7702 Artificial Intelligence Lecture 11: Introduction to Machine Learning and Reinforcement Learning Hanna Kurniawati Today } What is machine learning? } Where is it used? } Types of machine learning
More information16.410/413 Principles of Autonomy and Decision Making
16.410/413 Principles of Autonomy and Decision Making Lecture 23: Markov Decision Processes Policy Iteration Emilio Frazzoli Aeronautics and Astronautics Massachusetts Institute of Technology December
More informationLearning in Zero-Sum Team Markov Games using Factored Value Functions
Learning in Zero-Sum Team Markov Games using Factored Value Functions Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 27708 mgl@cs.duke.edu Ronald Parr Department of Computer
More informationGames with Perfect Information
Games with Perfect Information Yiling Chen September 7, 2011 Non-Cooperative Game Theory What is it? Mathematical study of interactions between rational and self-interested agents. Non-Cooperative Focus
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationCorrelated Equilibria: Rationality and Dynamics
Correlated Equilibria: Rationality and Dynamics Sergiu Hart June 2010 AUMANN 80 SERGIU HART c 2010 p. 1 CORRELATED EQUILIBRIA: RATIONALITY AND DYNAMICS Sergiu Hart Center for the Study of Rationality Dept
More informationDeep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017
Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)
More informationLEARNING IN CONCAVE GAMES
LEARNING IN CONCAVE GAMES P. Mertikopoulos French National Center for Scientific Research (CNRS) Laboratoire d Informatique de Grenoble GSBE ETBC seminar Maastricht, October 22, 2015 Motivation and Preliminaries
More informationReinforcement Learning. Introduction
Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control
More informationBalancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm
Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu
More informationInstitute of Electrical and Electronics Engineers (IEEE) 53rd IEEE Conference on Decision and Control
KAUST Repository LP formulation of asymmetric zero-sum stochastic games Item type Conference Paper Authors Li, Lichun; Shamma, Jeff S. Eprint version DOI Publisher Journal Rights Post-print 0.09/CDC.204.7039680
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 2018
COS 5: heoretical Machine Learning Lecturer: Rob Schapire Lecture #24 Scribe: Jad Bechara May 2, 208 Review of Game heory he games we will discuss are two-player games that can be modeled by a game matrix
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More informationLecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010
Lecture 25: Learning 4 Victor R. Lesser CMPSCI 683 Fall 2010 Final Exam Information Final EXAM on Th 12/16 at 4:00pm in Lederle Grad Res Ctr Rm A301 2 Hours but obviously you can leave early! Open Book
More informationReinforcement Learning and Control
CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make
More informationComputing Proper Equilibria of Zero-Sum Games
Computing Proper Equilibria of Zero-Sum Games Peter Bro Miltersen and Troels Bjerre Sørensen University of Aarhus, Denmark. Abstract. We show that a proper equilibrium of a matrix game can be found in
More informationepub WU Institutional Repository
epub WU Institutional Repository Ulrich Berger Learning in games with strategic complementarities revisited Article (Submitted) Original Citation: Berger, Ulrich (2008) Learning in games with strategic
More informationLearning Equilibrium as a Generalization of Learning to Optimize
Learning Equilibrium as a Generalization of Learning to Optimize Dov Monderer and Moshe Tennenholtz Faculty of Industrial Engineering and Management Technion Israel Institute of Technology Haifa 32000,
More informationarxiv: v2 [cs.gt] 23 Jan 2019
Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games Vojtěch Kovařík, Viliam Lisý vojta.kovarik@gmail.com, viliam.lisy@agents.fel.cvut.cz arxiv:1804.09045v2 [cs.gt]
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Artificial Intelligence Review manuscript No. (will be inserted by the editor) Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Howard M. Schwartz Received:
More informationPracticable Robust Markov Decision Processes
Practicable Robust Markov Decision Processes Huan Xu Department of Mechanical Engineering National University of Singapore Joint work with Shiau-Hong Lim (IBM), Shie Mannor (Techion), Ofir Mebel (Apple)
More informationWhen to Ask for an Update: Timing in Strategic Communication. National University of Singapore June 5, 2018
When to Ask for an Update: Timing in Strategic Communication Ying Chen Johns Hopkins University Atara Oliver Rice University National University of Singapore June 5, 2018 Main idea In many communication
More informationDECENTRALIZED LEARNING IN GENERAL-SUM MATRIX GAMES: AN L R I LAGGING ANCHOR ALGORITHM. Xiaosong Lu and Howard M. Schwartz
International Journal of Innovative Computing, Information and Control ICIC International c 20 ISSN 349-498 Volume 7, Number, January 20 pp. 0 DECENTRALIZED LEARNING IN GENERAL-SUM MATRIX GAMES: AN L R
More information, and rewards and transition matrices as shown below:
CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount
More informationOptimal Machine Strategies to Commit to in Two-Person Repeated Games
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Optimal Machine Strategies to Commit to in Two-Person Repeated Games Song Zuo and Pingzhong Tang Institute for Interdisciplinary
More informationGame Theory, Evolutionary Dynamics, and Multi-Agent Learning. Prof. Nicola Gatti
Game Theory, Evolutionary Dynamics, and Multi-Agent Learning Prof. Nicola Gatti (nicola.gatti@polimi.it) Game theory Game theory: basics Normal form Players Actions Outcomes Utilities Strategies Solutions
More informationRemote Estimation Games over Shared Networks
October st, 04 Remote Estimation Games over Shared Networks Marcos Vasconcelos & Nuno Martins marcos@umd.edu Dept. of Electrical and Computer Engineering Institute of Systems Research University of Maryland,
More informationRetrospective Spectrum Access Protocol: A Completely Uncoupled Learning Algorithm for Cognitive Networks
Retrospective Spectrum Access Protocol: A Completely Uncoupled Learning Algorithm for Cognitive Networks Marceau Coupechoux, Stefano Iellamo, Lin Chen + TELECOM ParisTech (INFRES/RMS) and CNRS LTCI + University
More informationHigher Order Beliefs in Dynamic Environments
University of Pennsylvania Department of Economics June 22, 2008 Introduction: Higher Order Beliefs Global Games (Carlsson and Van Damme, 1993): A B A 0, 0 0, θ 2 B θ 2, 0 θ, θ Dominance Regions: A if
More informationAsymmetric Information Security Games 1/43
Asymmetric Information Security Games Jeff S. Shamma with Lichun Li & Malachi Jones & IPAM Graduate Summer School Games and Contracts for Cyber-Physical Security 7 23 July 2015 Jeff S. Shamma Asymmetric
More informationAn Introduction to Reinforcement Learning
An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@cse.iitb.ac.in Department of Computer Science and Engineering Indian Institute of Technology Bombay April 2018 What is Reinforcement
More informationPractical exact algorithm for trembling-hand equilibrium refinements in games
Practical exact algorithm for trembling-hand equilibrium refinements in games Gabriele Farina Computer Science Department Carnegie Mellon University gfarina@cs.cmu.edu Nicola Gatti DEIB Politecnico di
More informationFactored State Spaces 3/2/178
Factored State Spaces 3/2/178 Converting POMDPs to MDPs In a POMDP: Action + observation updates beliefs Value is a function of beliefs. Instead we can view this as an MDP where: There is a state for every
More informationMarks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:
Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,
More informationMachine Learning I Reinforcement Learning
Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:
More informationIntrinsic and Extrinsic Motivation
Intrinsic and Extrinsic Motivation Roland Bénabou Jean Tirole. Review of Economic Studies 2003 Bénabou and Tirole Intrinsic and Extrinsic Motivation 1 / 30 Motivation Should a child be rewarded for passing
More informationCSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes
CSL302/612 Artificial Intelligence End-Semester Exam 120 Minutes Name: Roll Number: Please read the following instructions carefully Ø Calculators are allowed. However, laptops or mobile phones are not
More information