Fictitious Self-Play in Extensive-Form Games


1 Johannes Heinrich, Marc Lanctot, David Silver (University College London, Google DeepMind), July 9, 2015

2 Problem: Learn from self-play in games with imperfect information. Games: multi-agent decision-making domains, e.g. poker, politics, security. Self-play: agents learn from interaction with each other, without prior knowledge.

3 Extensive-Form Game: a game-theoretic model of sequential interaction between multiple players. Based on a game tree. Can represent imperfect (asymmetric, private) information.

4 Fictitious Play: a game-theoretic model of learning in games. Players repeatedly play a game; at each iteration each player chooses a best response to their opponents' average behaviour. The players' average strategies converge to a Nash equilibrium in some classes of games, e.g. potential games and two-player zero-sum games (Brown, 1951).

5 Fictitious Play: a game-theoretic model of learning in games. Players repeatedly play a game; at each iteration each player chooses a best response to their opponents' average behaviour. The players' average strategies converge to a Nash equilibrium in some classes of games, e.g. potential games and two-player zero-sum games (Brown, 1951). Problem: almost exclusively studied in normal-form games (tabular, no explicit sequential structure).

6 Two algorithms. 1. Fictitious play in extensive-form games: computation linear in time and space rather than exponential; preserves convergence guarantees. 2. Fictitious Self-Play: an experiential, sample-based approximation of fictitious play; leverages machine learning.

7 Two steps of fictitious play. At each iteration: compute a best response to the opponents' average strategies, then update own average strategy with the computed best response: $\Pi_{k+1} = \Pi_k + \frac{1}{k+1}\left(\mathrm{BR}(\Pi_k) - \Pi_k\right)$
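For intuition, here is a minimal sketch of this update in a normal-form (matrix) game, not the extensive-form setting the talk develops. The $\frac{1}{k+1}$ averaging step is implemented by keeping counts of how often each action was chosen as a best response; the payoff matrix and iteration budget are illustrative assumptions.

```python
import numpy as np

def fictitious_play(A, iterations=1000):
    """Vanilla fictitious play for a two-player zero-sum matrix game.

    A[i, j] is the payoff to the row player when row plays i and column plays j
    (the column player receives -A[i, j]).  Returns the players' average strategies.
    """
    n_rows, n_cols = A.shape
    row_counts = np.zeros(n_rows)   # how often each row action was chosen as a best response
    col_counts = np.zeros(n_cols)
    row_counts[0] += 1              # arbitrary initial actions
    col_counts[0] += 1

    for k in range(iterations):
        row_avg = row_counts / row_counts.sum()   # average strategies seen by the opponent
        col_avg = col_counts / col_counts.sum()
        # each player best-responds to the other's average strategy
        br_row = np.argmax(A @ col_avg)           # maximises expected payoff for row
        br_col = np.argmin(row_avg @ A)           # minimises row payoff, i.e. best for column
        # averaging with step 1/(k+1) is exactly "increment the count of the chosen action"
        row_counts[br_row] += 1
        col_counts[br_col] += 1

    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

if __name__ == "__main__":
    # Matching pennies: both average strategies approach (0.5, 0.5).
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    print(fictitious_play(A))
```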

8 Information state tree. [Figure: information state tree rooted at a chance node C; rest of tree omitted.]

9 Computing a best response. [Figure: best-response computation on the information state tree, with expected values propagated up from the rest of the tree.]

10 Strategies. [Figure: behavioural, pure, and mixed strategies illustrated on a small tree, with action probabilities $p$, $1-p$, $q$, $1-q$ and realization weights $pq$, $p(1-q)$.]

11 Aggregating strategies. [Figure: mixed strategies $\Pi$ and $B$ combined with weights $1-\alpha$ and $\alpha$.]

12 Aggregating strategies. [Figure: the behavioural combination $(1-\alpha)\pi + \alpha\beta$ illustrated on a small tree, with the resulting action probabilities expressed in terms of $\alpha$.]

13 Realization-equivalence. [Figure: a small tree annotated with action probabilities $p$, $1-p$ and realization weights $pq$, $p(1-q)$.] Two strategies are realization-equivalent iff for any opponent strategies they define the same probability distribution over the states of the game. Realization plan: $x_\pi(\sigma_u) := \prod_{(u',a) \in \sigma_u} \pi(u', a) \quad \forall u \in U$ (Koller et al., 1994; Von Stengel, 1996)
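A realization plan is just the product of behavioural action probabilities along the sequence $\sigma_u$ that reaches an information state. A minimal sketch, assuming the strategy is stored as a dictionary from (information state, action) pairs to probabilities; the example states and numbers are invented for illustration.

```python
from typing import Dict, List, Tuple

Strategy = Dict[Tuple[str, str], float]   # (information state, action) -> probability

def realization_plan(pi: Strategy, sequence: List[Tuple[str, str]]) -> float:
    """Probability that `pi` takes every action in `sequence`,
    i.e. x_pi(sigma_u) = product over (u', a) in sigma_u of pi(u', a)."""
    x = 1.0
    for info_state, action in sequence:
        x *= pi.get((info_state, action), 0.0)
    return x

# Example: raise with prob. 0.7 at the root, then call with prob. 0.4.
pi = {("root", "raise"): 0.7, ("root", "fold"): 0.3,
      ("root/raise", "call"): 0.4, ("root/raise", "fold"): 0.6}
print(realization_plan(pi, [("root", "raise"), ("root/raise", "call")]))  # 0.28
```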

14 Aggregating behavioural strategies. Lemma: given (1) $\pi$ and $\beta$, two behavioural strategies, (2) $\Pi$ and $B$, two mixed strategies, (3) $\Pi$ and $B$ realization-equivalent to $\pi$ and $\beta$, then $\mu(u) = \pi(u) + \frac{\alpha\, x_\beta(\sigma_u)}{(1-\alpha)\, x_\pi(\sigma_u) + \alpha\, x_\beta(\sigma_u)} \bigl(\beta(u) - \pi(u)\bigr) \quad \forall u \in U$ is realization-equivalent to $M = \Pi + \alpha (B - \Pi)$.
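A sketch of this aggregation rule, reusing the `Strategy` type and `realization_plan` helper from the previous sketch. It assumes $\pi$ and $\beta$ are defined over the same information states and actions, and that `sequences` (a hypothetical lookup) maps each information state to the sequence $\sigma_u$ reaching it; the fallback weight for unreachable states is an assumption, since any value works there.

```python
def mixed_update(pi: Strategy, beta: Strategy,
                 sequences: Dict[str, List[Tuple[str, str]]],
                 alpha: float) -> Strategy:
    """Behavioural strategy realization-equivalent to (1 - alpha) * Pi + alpha * B."""
    mu: Strategy = {}
    for (u, a), p in pi.items():
        x_pi = realization_plan(pi, sequences[u])
        x_beta = realization_plan(beta, sequences[u])
        denom = (1 - alpha) * x_pi + alpha * x_beta
        # weight on beta at u; if u is unreachable under both strategies, any weight is fine
        w = alpha * x_beta / denom if denom > 0 else alpha
        mu[(u, a)] = p + w * (beta.get((u, a), 0.0) - p)
    return mu
```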

15 Fictitious play in extensive-form games. [Figure: learning curves in Leduc Hold'em, exploitability vs. iterations, comparing Extensive-Form Fictitious Play (Hendon et al., 1996) with Extensive-Form Fictitious Play (this paper).]

16 Motivation. Full-width fictitious play performs computation at all states of the game, no matter whether they are likely to occur; sampling can focus on relevant states. In big games even a single iteration of full-width fictitious play might be too costly; a function approximator could generalise between states. The state space is larger than the information state space; learning agents only operate on their information states.

17 Generalised weakened fictitious play. At each iteration: compute an approximate best response to the opponents' average strategies, then update own average strategy with the computed best response, allowing for some kinds of perturbations: $\Pi_{k+1} = \Pi_k + \alpha_{k+1}\left(\mathrm{BR}_{\epsilon_{k+1}}(\Pi_k) - \Pi_k + M_{k+1}\right)$ (Benaïm et al., 2005; Leslie & Collins, 2006)
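For intuition only, the matrix-game loop from the earlier sketch rewritten with the weakened ingredients: an explicit step size $\alpha_k$ rather than the implicit $\frac{1}{k+1}$ of the counting version, and a best response that is allowed to be slightly suboptimal. The perturbation term $M_{k+1}$ is omitted and the decay schedules below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def gwfp(A, iterations=10_000):
    """Generalised-weakened-style fictitious play on a zero-sum matrix game A (row payoffs)."""
    n_rows, n_cols = A.shape
    row_avg = np.full(n_rows, 1.0 / n_rows)
    col_avg = np.full(n_cols, 1.0 / n_cols)

    for k in range(1, iterations + 1):
        alpha = 1.0 / k          # step size: alpha_k -> 0, sum of alpha_k diverges
        eps = 1.0 / np.sqrt(k)   # approximation tolerance shrinks over time

        # approximate best responses: exact best response mixed with a vanishing amount of noise
        row_values = A @ col_avg
        col_values = -(row_avg @ A)
        row_br = np.full(n_rows, eps / n_rows)
        row_br[np.argmax(row_values)] += 1.0 - eps
        col_br = np.full(n_cols, eps / n_cols)
        col_br[np.argmax(col_values)] += 1.0 - eps

        # Pi_{k+1} = Pi_k + alpha_{k+1} * (BR_eps(Pi_k) - Pi_k)
        row_avg += alpha * (row_br - row_avg)
        col_avg += alpha * (col_br - col_avg)

    return row_avg, col_avg
```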

18 Learning a best response. [Figure: best-response learning on the information state tree; rest of tree omitted.]

19 Knowledge transfer. Opponent's update (in a two-player game): $\Pi^{-i}_{k} = (1-\alpha_k)\Pi^{-i}_{k-1} + \alpha_k B^{-i}_{k}$. Thus the MDP defined by $\Pi^{-i}_{k}$ has the following structure: [Figure: an initial chance node C leads to MDP$(\Pi^{-i}_{k-1})$ with probability $1-\alpha_k$ and to MDP$(B^{-i}_{k})$ with probability $\alpha_k$, where $\alpha_k \to 0$ as $k \to \infty$.]

20 Learning a strategy update. Learn $\Pi^{i}_{k} = (1-\alpha_k)\Pi^{i}_{k-1} + \alpha_k B^{i}_{k}$ by sampling data (state-action pairs) from $\pi^{i}_{k-1}$ with prob. $1-\alpha_k$ and $\beta^{i}_{k}$ with prob. $\alpha_k$, against some fixed, fully mixed opponent sampling policy, e.g. $\pi^{-i}_{k-1}$.
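A small sketch of this sampling scheme. It assumes a hypothetical helper `play_episode(own_policy, opponent_policy)` that plays one episode and returns the (information state, action) pairs taken by the learning player; those pairs then serve as supervised data whose empirical distribution approximates the mixture above.

```python
import random

def sample_strategy_data(pi_prev, beta, opponent_policy, alpha, n_episodes, play_episode):
    """Collect state-action pairs approximating (1 - alpha) * pi_prev + alpha * beta."""
    data = []
    for _ in range(n_episodes):
        # choose which of the two own policies generates this episode
        own_policy = beta if random.random() < alpha else pi_prev
        data.extend(play_episode(own_policy, opponent_policy))
    return data
```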

21 Fictitious Self-Play: 1. generate an approximate best response by reinforcement learning from experience; 2. learn a model of own average behaviour by supervised learning from experience; 3. generate experience from self-play, using combinations of $(\pi^i, \pi^{-i})$, $(\beta^i, \pi^{-i})$, $(\pi^i, \beta^{-i})$.
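A high-level sketch of how these three steps fit together; it is not the paper's pseudocode, and `rl_best_response`, `supervised_update`, and `play_episodes` are hypothetical components standing in for the reinforcement learner, the supervised average-strategy learner, and the self-play experience generator.

```python
def fictitious_self_play(n_iterations, rl_best_response, supervised_update,
                         play_episodes, initial_policies):
    """Skeleton FSP loop.

    rl_best_response(rl_memory)           -> approximate best response beta^i      (step 1)
    supervised_update(avg_policy, data)   -> updated average-strategy model pi^i   (step 2)
    play_episodes(player, pi, beta)       -> (RL transitions, own state-action pairs),
                                             sampled from the strategy combinations above (step 3)
    """
    pi = dict(initial_policies)          # average-strategy models, one per player
    beta = dict(initial_policies)        # best-response policies, one per player
    rl_memory = {p: [] for p in pi}      # transitions for reinforcement learning
    sl_memory = {p: [] for p in pi}      # own state-action pairs for supervised learning

    for _ in range(n_iterations):
        for player in pi:
            transitions, behaviour = play_episodes(player, pi, beta)
            rl_memory[player].extend(transitions)
            sl_memory[player].extend(behaviour)
            beta[player] = rl_best_response(rl_memory[player])
            pi[player] = supervised_update(pi[player], sl_memory[player])
    return pi
```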

22 Experiments. Fitted Q-Iteration (reinforcement learning); counting model (supervised learning): $\forall a \in A(u_t): N(u_t, a) \leftarrow N(u_t, a) + \mathbb{1}\{a_t = a\}$, and $\forall a \in A(u_t): \pi(u_t, a) \leftarrow N(u_t, a) / N(u_t)$.
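The counting model is simple enough to sketch directly: normalised visit counts of the player's own state-action pairs. Data structures and naming are assumptions.

```python
from collections import defaultdict

class CountingModel:
    """Average-strategy model: normalised visit counts of own state-action pairs."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))   # info state -> action -> N(u, a)

    def update(self, info_state, action):
        # N(u_t, a) <- N(u_t, a) + 1{a_t = a}
        self.counts[info_state][action] += 1

    def policy(self, info_state):
        # pi(u_t, a) = N(u_t, a) / N(u_t)
        actions = self.counts[info_state]
        total = sum(actions.values())
        return {a: n / total for a, n in actions.items()}
```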

23 Fictitious Self-Play in Leduc Hold'em. [Figure: learning curves in Leduc Hold'em, exploitability vs. time in seconds, comparing XFP and FSP:FQI on 6-card Leduc.]

24 Fictitious Self-Play in Leduc Hold'em. [Figure: learning curves in Leduc Hold'em, exploitability vs. time in seconds, comparing XFP and FSP:FQI on 6-card and 60-card Leduc.]

25 Fictitious Self-Play in River Poker. [Figure: learning curves in River Poker, exploitability vs. time in seconds, comparing XFP and FSP:FQI with uniform beliefs.]

26 Fictitious Self-Play in River Poker. [Figure: learning curves in River Poker, exploitability vs. time in seconds, comparing XFP and FSP:FQI with uniform and defined beliefs.]

27 Summary. A convergent, full-width fictitious play algorithm for extensive-form games; Fictitious Self-Play, an experiential, sample- and learning-based approach to fictitious play.
