Optimally Solving Dec-POMDPs as Continuous-State MDPs
1 Optimally Solving Dec-POMDPs as Continuous-State MDPs
Jilles Dibangoye (1), Chris Amato (2), Olivier Buffet (1) and François Charpillet (1)
(1) Inria, Université de Lorraine, France
(2) MIT, CSAIL, USA
IJCAI, August 8, 2013

2 Outline
1 Background: overview, decentralized POMDPs, existing methods
2 Dec-POMDPs as continuous-state MDPs: overview, solving the occupancy MDP, exploiting multiagent structure
3 Experiments
4 Conclusion

3 Background: General overview
- Agents situated in a world, receiving information and choosing actions
- Uncertainty about action outcomes and sensors
- Sequential domains
- Cooperative multi-agent settings
- Decision-theoretic approach
- Aim: developing approaches that scale to real-world domains

4 Background: Cooperative multiagent problems
- Each agent's choice affects all others, but must be made using only local information
- Communication may be costly, slow or noisy
- Domains of interest: robotics, disaster response, networks, ...
[Figure: surveillance scenario showing a surveillance area, a communication area and a base area]

5 Multi-Agent Decision Making Under Uncertainty
Decentralized partially observable Markov decision process (Dec-POMDP):
- Sequential decision-making
- At each stage, each agent takes an action and receives:
  - a local observation
  - a joint immediate reward
[Figure: n agents in a loop with the environment; agent i takes action a_i and receives local observation o_i, while the team receives joint reward r]

6 Dec-POMDP definition
A Dec-POMDP is a tuple ⟨I, S, {A_i}, {Z_i}, p, r, o, b_0, T⟩:
- I, a finite set of agents
- S, a finite set of states
- A_i, each agent's finite set of actions
- Z_i, each agent's finite set of observations
- p, the state transition model: Pr(s' | s, a)
- o, the observation model: Pr(o | s', a)
- r, the reward model: R(s, a)
- b_0, the initial state distribution
- T, the planning horizon
(Here a and o denote joint actions and joint observations.)

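To make the tuple concrete, here is a minimal Python container sketch (our own illustration; the field layout and the flat joint-action/joint-observation indexing are assumptions, not the authors' code):

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class DecPOMDP:
    """Finite Dec-POMDP <I, S, {A_i}, {Z_i}, p, r, o, b_0, T> (a sketch)."""
    n_agents: int                   # |I|
    states: List[str]               # S
    actions: List[List[str]]        # A_i, one action set per agent
    observations: List[List[str]]   # Z_i, one observation set per agent
    p: np.ndarray                   # p[s, a, s2] = Pr(s2 | s, a), a = flat joint-action index
    o: np.ndarray                   # o[a, s2, z] = Pr(z | s2, a), z = flat joint-observation index
    r: np.ndarray                   # r[s, a] = R(s, a)
    b0: np.ndarray                  # initial distribution over S
    T: int                          # planning horizon
```
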
7 Dec-POMDP solutions
- History: θ_i^t = (a_i^0, o_i^1, ..., a_i^{t-1}, o_i^t)
- The state is unknown, so it is beneficial to remember the history
- Local policy: each agent maps histories to actions, π_i : Θ_i → A_i
- π_i is a sequence of decision rules π_i = (π_i^0, ..., π_i^{T-1}) mapping histories to actions, π_i^t(θ_i^t) = a_i
- Joint policy: π = (π_1, ..., π_n) with individual (local) agent policies π_i
- Goal: maximize the expected cumulative reward over a finite horizon
[Figure: two local policy trees over actions a_1, a_2, a_3, branching on observations o_1, o_2]

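As a toy illustration of these definitions, mirroring the trees on the slide, a deterministic local policy can be stored as one dictionary per stage, keyed by action-observation history tuples (a sketch; the action and observation names are made up):

```python
# Stage-t decision rule: maps agent i's history (a^0, o^1, ..., o^t) to an action.
local_policy = [
    {(): "a1"},                          # pi_i^0: empty history -> a1
    {("a1", "o1"): "a2",                 # pi_i^1: branch on the first observation
     ("a1", "o2"): "a3"},
]

def act(policy, t, history):
    """Action prescribed by decision rule pi_i^t for a given history tuple."""
    return policy[t][tuple(history)]

print(act(local_policy, 1, ("a1", "o2")))  # -> a3
```
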
8 POMDPs
- Subclass of Dec-POMDPs with only one agent
- The agent maintains a belief state b (a distribution over states)
- Policy = mapping from histories or belief states, π : B → A
- Can solve a POMDP as a continuous-state belief MDP; with a = π(b),
  V^π(b) = R(b, a) + Σ_o Pr(o | b, a) V^π(b'),
  where b' is the belief that results from taking a in b and observing o
- Structure: piecewise linear and convex (PWLC) value function
[Figure: agent-environment loop; PWLC value function over the belief simplex spanned by states s_1 and s_2]

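For the single-agent case, the belief-MDP dynamics reduce to a Bayes update. A minimal sketch, assuming the transition tensor p[s, a, s'] and observation tensor o[a, s', z] from the definition slide:

```python
import numpy as np

def belief_update(b, a, z, p, o):
    """b'(s2) ∝ Pr(z | s2, a) * sum_s Pr(s2 | s, a) b(s); also returns Pr(z | b, a)."""
    b_next = o[a, :, z] * (b @ p[:, a, :])
    pr_z = b_next.sum()                    # Pr(z | b, a), the normalizer
    if pr_z == 0.0:
        raise ValueError("observation has zero probability under (b, a)")
    return b_next / pr_z, pr_z
```
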
9 Example: 2-Agent Navigation
Meeting in a grid:
- States: pairs of grid cells
- Actions: move in each of the four directions, or stay
- Transitions: noisy
- Observations: the red lines in the figure
- Rewards: negative unless the agents share the same square
[Figure: grid world with two agents]

10 Challenges in solving Dec-POMDPs
- Partial observability makes the problem difficult to solve
- No common state estimate (centralized belief state) or obvious concise sufficient statistic
- Each agent's choice depends on the others'
- Can't directly transform a Dec-POMDP into a continuous-state MDP from a single agent's perspective
- Therefore, Dec-POMDPs are fundamentally different and more complex (NEXP-complete instead of PSPACE-complete)

11 Current methods
- Assume a centralized offline planning phase
- Generate explicit policy representations (trees) for each agent
- Search bottom-up (dynamic programming) or top-down (heuristic search)
- Often use game-theoretic ideas from the perspective of a single agent
- Search the space of policies for the optimal set
[Figure: top-down and bottom-up construction of policy trees over stages t = 0, 1, 2]

12 Overview of our approach
Current methods don't take full advantage of the centralized planning phase. We instead:
- Push common information into an occupancy state
- Move local information into action selection as decision rules
- Formalize Dec-POMDPs as continuous-state MDPs with a PWLC value function
- Exploit multiagent structure in the representation, making it scalable
This doesn't use explicit policy representations or construct policies from a single agent's perspective.

13 Centralized sufficient statistic
- Policy π: a sequence of decentralized decision rules, π = (π^0, ..., π^{T-1})
- Joint history: θ^t = (θ_1^t, ..., θ_n^t), with π^t(θ^t) = (a_1, ..., a_n)
- An occupancy state is a distribution η(s, θ^t) = Pr(s, θ^t | π^{0:t-1}, b_0)
- The occupancy state is a sufficient statistic: we can optimize the future policy π^{t:T} over η rather than over the initial belief and past joint policies
[Figure: a decentralized decision rule shown as one local policy tree per agent]

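One naive way to hold an occupancy state is a dictionary over (state, joint-history) pairs. The helper below (our own illustration) recovers the conditional state distribution given a joint history:

```python
def conditional_belief(eta, theta):
    """Pr(s | theta) induced by occupancy state eta = {(s, theta): prob}."""
    weights = {s: p for (s, h), p in eta.items() if h == theta}
    total = sum(weights.values())          # marginal Pr(theta)
    return {s: p / total for s, p in weights.items()} if total > 0 else {}
```
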
14 Dec-POMDPs as continuous-state MDPs
- Occupancy state: η^t(s, θ^t) = Pr(s, θ^t | π^{0:t-1}, η^0), with η^0 = b_0
- Transform the Dec-POMDP into a continuous-state MDP:
  - s_MDP: η
  - a_MDP: π^t (decentralized decision rules)
  - T_MDP: Pr(η^t | π^{t-1}, η^{t-1}), deterministic with P(η^t, π^t) = η^{t+1}
  - R_MDP: Σ_{s,θ^t} η^t(s, θ^t) R(s, π^t(θ^t))
- Centralized sufficient statistic (the occupancy state)
- Decision rules ensure decentralization

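Given a decision rule, the occupancy transition is deterministic and the reward is an expectation under η. A sketch using nested-dict models (our own encoding of the formulas above, not the paper's implementation):

```python
from collections import defaultdict

def occupancy_step(eta, rule, p, o, r):
    """One occupancy-MDP step: returns (eta_{t+1}, R_MDP(eta_t, pi_t)).

    eta:  {(s, theta): prob}, theta a tuple of (joint_action, joint_obs) pairs
    rule: decentralized decision rule, rule(theta) -> joint action a
    p:    p[s][a][s2] = Pr(s2 | s, a);  o: o[a][s2][z] = Pr(z | s2, a);  r: r[s][a]
    """
    reward = 0.0
    eta_next = defaultdict(float)
    for (s, theta), prob in eta.items():
        a = rule(theta)
        reward += prob * r[s][a]
        for s2, pt in p[s][a].items():
            for z, po in o[a][s2].items():
                # eta_{t+1}(s2, (theta, a, z)) += Pr(z|s2,a) Pr(s2|s,a) eta(s, theta)
                eta_next[(s2, theta + ((a, z),))] += prob * pt * po
    return dict(eta_next), reward
```
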
15 Piecewise linear convexity
Bellman optimality operator:
  V^t(η^t) = max_{π^t ∈ D} [ R_MDP(η^t, π^t) + V^{t+1}(P(η^t, π^t)) ]
1. The operator preserves the PWLC property (piecewise linearity and convexity)
2. R_MDP(η^t, π^t) is linear in η^t
The value function is PWLC over S × Θ^t, so POMDP algorithms can be used!
[Figure: PWLC value function over the simplex of distributions on S × Θ^t]

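As in POMDPs, the PWLC property lets a finite set Λ of linear functions (α-vector analogues over S × Θ^t) under-approximate the value function. A minimal sketch:

```python
def pwlc_value(eta, Lambda):
    """Lower bound at eta: max over linear functions beta of <beta, eta>,
    with each beta stored as a dict over (state, joint_history) keys."""
    return max(
        sum(beta.get(key, 0.0) * prob for key, prob in eta.items())
        for beta in Lambda
    )
```

The upper bound is kept in the dual, point-set form and evaluated by interpolation, as pictured on the next slide.
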
16 Solving the occupancy MDP
Feature-based heuristic search value iteration (FB-HSVI):
- Based on heuristic search value iteration (Smith and Simmons, UAI 2004)
- Sample occupancy distributions starting from the initial occupancy
- Update upper bounds based on decision rules (on the way down)
- Update lower bounds (on the way back up)
- Stop when the bounds converge at the initial occupancy
[Figure: lower bound υ(η) as the maximum of a set of linear functions Λ = {β_k}; upper bound ῡ(η) as a point set Γ = {(η_k, v_k)}]

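A structural skeleton of the trial loop, under the strong assumption that step, greedy, lb and ub helpers exist; this sketches only the control flow, not the authors' FB-HSVI code:

```python
def fb_hsvi_trials(eta0, T, step, greedy, lb, ub, eps=0.01):
    """Run HSVI-style trials on the occupancy MDP until the gap closes at eta0."""
    while ub.value(eta0, 0) - lb.value(eta0, 0) > eps:
        path, eta = [], eta0
        for t in range(T):                     # on the way down
            rule = greedy(eta, t, ub)          # decision rule maximizing the upper bound
            ub.update(eta, t, rule)
            path.append((eta, t))
            eta, _ = step(eta, rule)           # deterministic occupancy transition
        for eta, t in reversed(path):          # on the way back up
            lb.update(eta, t)
    return lb
```
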
17 Scaling up: exploiting multiagent structure
The occupancy MDP has very large action and state spaces. Two key ideas to deal with these combinatorial explosions:
1. State reduction through history compression (see the sketch after this slide)
   - Compress histories of the same length (Oliehoek et al., JAIR 2013)
   - Reduce history length without loss
2. More efficient action selection
   - Generate a greedy decision rule for an occupancy state by solving a weighted constraint satisfaction problem

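For idea 1, the lossless criterion merges two local histories of an agent whenever they induce the same conditional distribution over states and the other agents' histories. A brute-force sketch (our own; rounding stands in for a proper equality test, and the WCSP-based action selection of idea 2 is not shown):

```python
from collections import defaultdict

def compress_agent(eta, i):
    """Merge agent i's probabilistically equivalent local histories (a sketch).

    eta: {(s, thetas): prob}, with thetas a tuple of n local histories.
    Two histories merge when they induce the same Pr(s, theta_{-i} | theta_i).
    """
    cond, mass = defaultdict(dict), defaultdict(float)
    for (s, thetas), prob in eta.items():
        h, rest = thetas[i], thetas[:i] + thetas[i + 1:]
        cond[h][(s, rest)] = cond[h].get((s, rest), 0.0) + prob
        mass[h] += prob
    # group histories by their (rounded) conditional distribution
    label, seen = {}, {}
    for h, dist in cond.items():
        sig = frozenset((k, round(v / mass[h], 9)) for k, v in dist.items())
        label[h] = seen.setdefault(sig, h)   # first history with this signature wins
    # rebuild the occupancy state with merged labels
    merged = defaultdict(float)
    for (s, thetas), prob in eta.items():
        merged[(s, thetas[:i] + (label[thetas[i]],) + thetas[i + 1:])] += prob
    return dict(merged)
```
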
18 Experiments
We tested three versions of our algorithm:
- Algorithm 0: HSVI on the occupancy MDP
- Algorithm 1: HSVI with efficient action selection
- Algorithm 2: HSVI with efficient action selection + feature-based state space
Comparison algorithms:
- Forward search: GMAA*-ICE (Spaan et al., IJCAI 2011)
- Dynamic programming: IPG (Amato et al., ICAPS 2009), LPC (Boularias and Chaib-draa, ICAPS 2008)
- Optimization: MILP (Aras and Dutech, JAIR 2010)

19 Experiments: results
Optimal value υ within ε = 0.01; time and value reported on benchmarks:
- the multi-agent tiger problem (|S| = 2, |Z| = 4, |A| = 9, K = 3)
- the recycling-robot problem (|S| = 4, |Z| = 4, |A| = 9, K = 1)
- the mars-rovers problem (|S| = 256, |Z| = 81, |A| = 36, K = 3)
[Table: computation times (in seconds) of MILP, LPC, IPG, ICE and FB-HSVI for increasing horizons T; the numeric entries did not survive transcription]
- Blank space = algorithm exceeded the time limit (200 s)
- Red = fastest, and previously unsolvable horizons
- K is the largest history window used

20 Conclusion
Summary:
- Dec-POMDPs are powerful multiagent models
- We formulated Dec-POMDPs as continuous-state MDPs with a PWLC value function
- POMDP (and continuous-state MDP) methods can now be applied
- We can also take advantage of multiagent structure in the problem
- Our approach shows significantly improved scalability
Future work:
- Approximate solutions (with bounds on solution quality)
- More concise statistics:
  - subclasses like TI (transition-independent) Dec-MDPs, as in our AAMAS-13 paper
  - just observation histories, as in Oliehoek, IJCAI-13