Optimally Solving Dec-POMDPs as Continuous-State MDPs

1 Optimally Solving Dec-POMDPs as Continuous-State MDPs. Jilles Dibangoye (1), Chris Amato (2), Olivier Buffet (1) and François Charpillet (1). (1) Inria, Université de Lorraine, France. (2) MIT, CSAIL, USA. IJCAI, August 8, 2013.

2 Outline. 1. Background: overview; decentralized POMDPs; existing methods. 2. Dec-POMDPs as continuous-state MDPs: overview; solving the occupancy MDP; exploiting multiagent structure. 3. Experiments. 4. Conclusion.

3 Background: Overview. General overview: agents situated in a world, receiving information and choosing actions; uncertainty about outcomes and sensors; sequential domains; cooperative multi-agent settings; a decision-theoretic approach; developing approaches that scale to real-world domains.

4 Background: Overview. Cooperative multiagent problems: each agent's choice affects all others, but must be made using only local information. Communication may be costly, slow or noisy. Domains of interest: robotics, disaster response, networks, ... (Figure: an example domain with surveillance, communication and base areas.)

5 Background: Decentralized POMDPs. Multi-agent decision making under uncertainty: the decentralized partially observable Markov decision process (Dec-POMDP). Sequential decision-making: at each stage, each agent takes an action and receives a local observation and a joint immediate reward. (Figure: agents 1..n each send an action a_i to the environment and receive a local observation o_i and a shared reward r.)

6 Background: Decentralized POMDPs. Dec-POMDP definition: a Dec-POMDP is a tuple ⟨I, S, {A_i}, {Z_i}, p, r, o, b_0, T⟩, where
I is a finite set of agents;
S is a finite set of states;
A_i is each agent's finite set of actions;
Z_i is each agent's finite set of observations;
p is the state transition model, Pr(s' | s, a);
o is the observation model, Pr(o | s', a);
r is the reward model, R(s, a);
b_0 is the initial state distribution;
T is the planning horizon.
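To make the tuple concrete, the sketch below shows one way the Dec-POMDP components could be stored in code. This is only an illustrative data structure (the class and field names are ours, not the authors' implementation), assuming tabular transition, observation and reward models.

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    # Joint actions and joint observations are tuples with one entry per agent.
    JointAction = Tuple[int, ...]
    JointObs = Tuple[int, ...]

    @dataclass
    class DecPOMDP:
        n_agents: int                        # |I|
        n_states: int                        # |S|
        actions: List[List[int]]             # actions[i] = agent i's action set A_i
        observations: List[List[int]]        # observations[i] = agent i's observation set Z_i
        transition: Dict[Tuple[int, JointAction], Dict[int, float]]           # p: Pr(s' | s, a)
        observation_fn: Dict[Tuple[int, JointAction], Dict[JointObs, float]]  # o: Pr(o | s', a)
        reward: Dict[Tuple[int, JointAction], float]                          # r: R(s, a)
        b0: List[float]                      # initial state distribution
        horizon: int                         # planning horizon T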

7 Background: Decentralized POMDPs. Dec-POMDP solutions. History: θ_i^t = (a_i^0, o_i^1, ..., a_i^{t-1}, o_i^t); the state is unknown, so it is beneficial to remember the history. Local policy: each agent maps histories to actions, π_i : Θ_i → A_i; π_i is a sequence of decision rules π_i = (π_i^0, ..., π_i^{T-1}) mapping histories to actions, π_i^t(θ_i^t) = a_i. Joint policy: π = (π_1, ..., π_n) with individual (local) agent policies π_i. The goal is to maximize the expected cumulative reward over a finite horizon. (Figure: per-agent policy trees branching on observations o_1, o_2 and prescribing actions a_1, a_2, a_3.)
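A local history and a decision rule have very direct representations: a history is a tuple of alternating actions and observations, and a stage-t decision rule is a lookup table from length-t histories to actions. A minimal sketch with illustrative names (not the authors' code):

    from typing import Dict, Tuple

    LocalHistory = Tuple[int, ...]          # (a^0, o^1, a^1, o^2, ...), flattened
    DecisionRule = Dict[LocalHistory, int]  # pi_i^t: local history -> action

    def extend_history(theta: LocalHistory, action: int, obs: int) -> LocalHistory:
        """Append the latest action and observation to a local history."""
        return theta + (action, obs)

    def joint_action(rules: Tuple[DecisionRule, ...],
                     joint_history: Tuple[LocalHistory, ...]) -> Tuple[int, ...]:
        """Apply each agent's decision rule to its own local history only."""
        return tuple(rule[theta] for rule, theta in zip(rules, joint_history))

    # Tiny usage example: two agents with empty histories at t = 0.
    rules_t0 = ({(): 0}, {(): 1})            # agent 1 plays action 0, agent 2 plays action 1
    print(joint_action(rules_t0, ((), ())))  # -> (0, 1)

Note that joint_action never lets one agent's choice depend on another agent's history, which is exactly the decentralization constraint.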

8 Background: Decentralized POMDPs. POMDPs: the subclass of Dec-POMDPs with only one agent. The agent maintains a belief state (a distribution over states). A policy is a mapping from histories or belief states, π : B → A. A POMDP can be solved as a continuous-state belief MDP:
V^π(b) = R(b, a) + Σ_o Pr(o | b, a) Σ_{b'} Pr(b' | b, a, o) V^π(b').
Structure: the value function is piecewise linear and convex (PWLC). (Figure: the agent-environment loop and a PWLC value function over the belief simplex.)
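The belief-MDP view rests on the standard Bayes-filter belief update, which also yields Pr(o | b, a) as the normalizing constant. A minimal single-agent sketch, assuming tabular models (names illustrative):

    def belief_update(b, a, o, T, O):
        """Return (b', Pr(o | b, a)), where b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b[s].

        b: list of probabilities over states
        T: T[a][s][s2] = Pr(s2 | s, a)
        O: O[a][s2][o] = Pr(o | s2, a)
        """
        n = len(b)
        b_next = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n)) for s2 in range(n)]
        norm = sum(b_next)                  # this is Pr(o | b, a)
        if norm == 0.0:
            raise ValueError("observation o has zero probability under (b, a)")
        return [x / norm for x in b_next], norm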

9 Background: Decentralized POMDPs. Example: 2-agent navigation, meeting in a grid. States: pairs of grid cells. Actions: move up, down, left, right, or stay. Transitions: noisy. Observations: shown as red lines in the figure. Rewards: negative unless the agents share the same square.

10 Background: Decentralized POMDPs. Challenges in solving Dec-POMDPs: partial observability makes the problem difficult to solve; there is no common state estimate (centralized belief state) or concise sufficient statistic; each agent depends on the others. One can't directly transform a Dec-POMDP into a continuous-state MDP from a single agent's perspective. Therefore, Dec-POMDPs are fundamentally different and more complex (NEXP-complete rather than PSPACE-complete).

11 Background: Existing methods. Current methods assume an offline planning phase that is centralized; generate explicit policy representations (trees) for each agent; search bottom up (dynamic programming) or top down (heuristic search); often use game-theoretic ideas from the perspective of a single agent; and search in the space of policies for the optimal set. (Figure: top-down and bottom-up construction of policy trees over stages t = 0, 1, 2.)

12 Dec-POMDPs as continuous-state MDPs: Overview. Overview of our approach: current methods don't take full advantage of the centralized planning phase. We push common information into an occupancy state; move local information into action selection as decision rules; formalize Dec-POMDPs as continuous-state MDPs with a PWLC value function; and exploit multiagent structure in the representation, making it scalable. This doesn't use explicit policy representations or construct policies from a single agent's perspective.

13 Dec-POMDPs as continuous-state MDPs: Overview. Centralized sufficient statistic. Policy: π is a sequence of decentralized decision rules, π = (π^0, ..., π^{T-1}). Joint history: θ^t = (θ_1^t, ..., θ_n^t), with π^t(θ^t) = (a_1, ..., a_n). (Figure: per-agent policy trees branching on observations.) An occupancy state is a distribution η(s, θ^t) = Pr(s, θ^t | π^{0:t-1}, b_0). The occupancy state is a sufficient statistic: the future policy π^{t:T-1} can be optimized over η rather than over the initial belief and the past joint policies.
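The occupancy state updates much like a belief, except that it is indexed by (state, joint history) pairs and the "action" is a whole decentralized decision rule. A hedged, dictionary-based sketch (illustrative names, not the authors' implementation; the joint history is kept here as a flat tuple of alternating joint actions and joint observations):

    def occupancy_update(eta, rule, T, O):
        """One-step update: eta'(s2, (theta, a, o)) = sum_s eta(s, theta) * Pr(s2, o | s, a).

        eta:  dict mapping (s, joint_history) -> probability
        rule: decentralized decision rule pi^t, dict mapping joint_history -> joint action a
        T:    dict keyed by joint action, T[a][s][s2] = Pr(s2 | s, a)
        O:    dict keyed by joint action, O[a][s2] = dict mapping joint observation o -> Pr(o | s2, a)
        """
        eta_next = {}
        for (s, theta), p in eta.items():
            a = rule[theta]
            for s2, p_trans in enumerate(T[a][s]):
                for o, p_obs in O[a][s2].items():
                    key = (s2, theta + (a, o))       # the joint history grows by one (a, o) pair
                    eta_next[key] = eta_next.get(key, 0.0) + p * p_trans * p_obs
        return eta_next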

14 Dec-POMDPs as continuous-state MDPs: Overview. Occupancy state: η^t(s, θ^t) = Pr(s, θ^t | π^{0:t-1}, η^0) with η^0 = b_0. Transform the Dec-POMDP into a continuous-state MDP:
s_MDP: occupancy states η;
a_MDP: decentralized decision rules π^t;
T_MDP: Pr(η^t | π^{t-1}, η^{t-1}), deterministic with P(η^t, π^t) = η^{t+1};
R_MDP: R_MDP(η^t, π^t) = Σ_{s, θ^t} η^t(s, θ^t) R(s, π^t(θ^t)).
The centralized sufficient statistic (the occupancy state) drives planning; the decision rules ensure decentralization.
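The occupancy reward is just the expectation of the Dec-POMDP reward under the occupancy state, which is why it is linear in η. A small sketch consistent with the dictionary representation above (illustrative, not the authors' code):

    def occupancy_reward(eta, rule, R):
        """R_MDP(eta, pi^t) = sum over (s, theta) of eta(s, theta) * R(s, pi^t(theta)).

        eta:  dict mapping (s, joint_history) -> probability
        rule: dict mapping joint_history -> joint action
        R:    dict mapping (s, joint action) -> immediate reward
        """
        return sum(p * R[(s, rule[theta])] for (s, theta), p in eta.items())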

15 Dec-POMDPs as continuous-state MDPs: Overview. Piecewise linear convexity. Bellman optimality operator: V^t(η^t) = max over decision rules π^t of [ R_MDP(η^t, π^t) + V^{t+1}(P(η^t, π^t)) ]. 1. The operator preserves the PWLC property (piecewise linearity and convexity). 2. R_MDP(η^t, π^t) is linear in η^t. (Figure: a PWLC value function over the occupancy space S × Θ^t.) POMDP algorithms can be used!
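Piecewise linearity and convexity means the value function can be represented, as in POMDPs, by a finite set of linear functions (hyperplanes) over (state, joint history) pairs, and evaluated as a maximum of inner products. A minimal sketch with illustrative names:

    def pwlc_value(eta, hyperplanes):
        """Evaluate a PWLC lower bound: v(eta) = max_k <eta, beta_k>.

        eta:         dict mapping (s, joint_history) -> probability
        hyperplanes: list of dicts beta_k mapping (s, joint_history) -> value
                     (missing entries are treated as 0)
        """
        return max(
            sum(p * beta.get(key, 0.0) for key, p in eta.items())
            for beta in hyperplanes
        )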

16 Dec-POMDPs as continuous-state MDPs: Solving the occupancy MDP. Feature-based heuristic search value iteration (FB-HSVI), based on heuristic search value iteration (Smith and Simmons, UAI 2004). Sample occupancy distributions starting from the initial occupancy; update upper bounds based on decision rules (on the way down); update lower bounds (on the way back up); stop when the bounds converge for the initial occupancy. (Figure: the lower bound υ(η) = max_k ⟨η, β_k⟩ is maintained as a set of hyperplanes Λ = {β_k}; the upper bound ῡ(η) is maintained as a point set Γ = {(η_k, v_k)}.)
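At a high level, one trial of an HSVI-style solver on the occupancy MDP is the recursion sketched below: expand greedily with respect to the upper bound, recurse on the successor occupancy state, and back up the lower bound on the way back. This is only a structural sketch under our own assumptions; the bound objects and their value/greedy_rule/update methods are assumed interfaces, and the real FB-HSVI adds the feature-based compression and constraint-based action selection described on the next slide.

    def fbhsvi_trial(eta, t, horizon, lower, upper, step, epsilon):
        """One top-down trial of an HSVI-style solver on the occupancy MDP.

        lower / upper: bound objects assumed to expose value(eta, t), greedy_rule(eta, t)
                       and update(eta, t) (a point-based backup at eta).
        step(eta, rule): the deterministic transition P(eta, rule) -> next occupancy state.
        """
        if t >= horizon or upper.value(eta, t) - lower.value(eta, t) <= epsilon:
            return
        rule = upper.greedy_rule(eta, t)     # optimistic choice of a decentralized decision rule
        upper.update(eta, t)                 # tighten the upper bound on the way down
        fbhsvi_trial(step(eta, rule), t + 1, horizon, lower, upper, step, epsilon)
        lower.update(eta, t)                 # tighten the lower bound on the way back up

Trials are repeated from the initial occupancy η^0 until the gap at η^0 falls below ε.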

17 Dec-POMDPs as continuous-state MDPs: Exploiting multiagent structure. Scaling up: the occupancy MDP has very large action and state spaces. Two key ideas deal with these combinatorial explosions: 1. State reduction through history compression: compress histories of the same length (Oliehoek et al., JAIR 2013) and reduce history length without loss. 2. More efficient action selection: generate a greedy decision rule for an occupancy state by solving a weighted constraint satisfaction problem (a brute-force version of this step is sketched below).
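For intuition only, the greedy decision rule that the weighted constraint satisfaction problem computes can also be found by brute force on tiny instances: enumerate every combination of per-agent (history -> action) assignments and keep the one with the best one-step score. The sketch below does exactly that (illustrative names; the number of candidates grows as |A_i|^|Θ_i| per agent, which is the explosion the WCSP formulation avoids):

    from itertools import product

    def greedy_decision_rule(eta, local_histories, local_actions, q_value):
        """Brute-force greedy selection of a decentralized decision rule at one occupancy state.

        local_histories: list (per agent) of that agent's local histories at this stage
        local_actions:   list (per agent) of that agent's action set
        q_value(eta, rule): scalar score of a joint rule, e.g. R_MDP plus an upper bound
                            on the successor occupancy state
        """
        n = len(local_histories)
        # All local decision rules for agent i: one action per local history.
        per_agent_rules = [
            [dict(zip(local_histories[i], choice))
             for choice in product(local_actions[i], repeat=len(local_histories[i]))]
            for i in range(n)
        ]
        best_rule, best_q = None, float("-inf")
        for combo in product(*per_agent_rules):
            # Decentralized joint rule: each agent looks only at its own local history.
            rule = {joint: tuple(combo[i][joint[i]] for i in range(n))
                    for joint in product(*local_histories)}
            q = q_value(eta, rule)
            if q > best_q:
                best_rule, best_q = rule, q
        return best_rule, best_q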

18 Experiments. We tested 3 versions of our algorithm: Algorithm 0, HSVI with the occupancy MDP; Algorithm 1, HSVI with efficient action selection; Algorithm 2, HSVI with efficient action selection and the feature-based state space. Comparison algorithms: forward search, GMAA*-ICE (Spaan et al., IJCAI 2011); dynamic programming, IPG (Amato et al., ICAPS 2009) and LPC (Boularias and Chaib-draa, ICAPS 2008); optimization, MILP (Aras and Dutech, JAIR 2010).

19 Experiments. Optimal value υ within ε = 0.01. Benchmarks: the multi-agent tiger problem (|S| = 2, |Z| = 4, |A| = 9, K = 3); the recycling-robot problem (|S| = 4, |Z| = 4, |A| = 9, K = 1); the mars-rovers problem (|S| = 256, |Z| = 81, |A| = 36, K = 3). (Table 2: computation times, in seconds, and values on the benchmarks over horizons T, comparing MILP, LPC, IPG, ICE and FB-HSVI. A blank entry means the algorithm ran over the 200 s time limit; red marks the fastest times and previously unsolvable horizons; K is the largest history window used.)

20 Conclusion. Summary: Dec-POMDPs are powerful multiagent models. We formulated Dec-POMDPs as continuous-state MDPs with a PWLC value function, so POMDP (and continuous-state MDP) methods can now be applied, and we can also take advantage of multiagent structure in the problem. Our approach shows significantly improved scalability. Future work: approximate solutions (with bounds on the solution quality); more concise statistics; subclasses such as transition-independent (TI) Dec-MDPs, as in our AAMAS-13 paper; using just observation histories, as in Oliehoek, IJCAI 2013.
