Information Gathering and Reward Exploitation of Subgoals for POMDPs

1 Information Gathering and Reward Exploitation of Subgoals for POMDPs. Hang Ma and Joelle Pineau, McGill University. AAAI, January 27, 2015.

3 POMDPs. A partially observable Markov decision process (POMDP) is a tuple $\langle S, A, \Omega, T, O, R, b_0, \gamma \rangle$: $S$ is a finite set of states, $A$ is a finite set of actions, and $\Omega$ is a finite set of observations; $T(s, a, s') = p(s' \mid s, a)$ is the transition function, mapping each state and action to a probability distribution over successor states; $O(s', a, o) = p(o \mid s', a)$ is the observation function, mapping each successor state and action to a probability distribution over observations; $R(s, a)$ is the reward function; $b_0(s)$ is the initial belief state; and $\gamma$ is the discount factor.

4 Beliefs. A belief state $b \in B$ is a sufficient statistic for the history. It is updated after taking action $a$ and receiving observation $o$ as follows:
$$b^{a,o}(s') = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{p(o \mid a, b)}, \qquad (1)$$
where $p(o \mid a, b) = \sum_{s \in S} b(s) \sum_{s' \in S} T(s, a, s')\, O(s', a, o)$ is a normalizing factor.
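For concreteness, here is a minimal Python sketch of Equation (1); it is an illustration, not code from the paper, and the two-state toy model at the end uses purely illustrative numbers.

```python
def belief_update(b, a, o, states, T, O):
    """Posterior belief b^{a,o} after taking action a and observing o.
    b: dict state -> probability; T[(s, a, s2)] = p(s2 | s, a);
    O[(s2, a, o)] = p(o | s2, a)."""
    unnorm = {s2: O[(s2, a, o)] * sum(T[(s, a, s2)] * b[s] for s in states)
              for s2 in states}
    p_o = sum(unnorm.values())  # normalizer p(o | a, b)
    if p_o == 0.0:
        raise ValueError("observation has zero probability under b and a")
    return {s2: x / p_o for s2, x in unnorm.items()}

# Toy model (illustrative only): one passive action and a sensor that
# reports the true state with probability 0.85.
states = ["s0", "s1"]
T = {(s, "stay", s2): float(s == s2) for s in states for s2 in states}
O = {(s2, "stay", o): 0.85 if s2 == o else 0.15 for s2 in states for o in states}
b0 = {"s0": 0.5, "s1": 0.5}
print(belief_update(b0, "stay", "s0", states, T, O))  # -> {'s0': 0.85, 's1': 0.15}
```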

5 Policies. A policy is a mapping from the current belief state to an action. The value function $V^\pi(b)$ specifies the expected reward gained by starting from $b$ and following policy $\pi$:
$$V^\pi(b) = \sum_{s \in S} b(s)\, R(s, \pi(b)) + \gamma \sum_{o \in \Omega} p(o \mid b, \pi(b))\, V^\pi\!\left(b^{\pi(b), o}\right).$$
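Expanding this definition directly gives a simple finite-horizon evaluator; the sketch below reuses `belief_update` from the previous snippet and is a definition check (exponential in depth), not a solver.

```python
def policy_value(b, policy, depth, states, obs, T, O, R, gamma):
    """Finite-horizon estimate of V^pi(b) by literal expansion of the
    equation above; policy maps a belief to an action."""
    if depth == 0:
        return 0.0
    a = policy(b)
    v = sum(b[s] * R[(s, a)] for s in states)        # expected immediate reward
    for o in obs:
        p_o = sum(b[s] * T[(s, a, s2)] * O[(s2, a, o)]
                  for s in states for s2 in states)  # p(o | b, a)
        if p_o > 0.0:
            b_ao = belief_update(b, a, o, states, T, O)
            v += gamma * p_o * policy_value(b_ao, policy, depth - 1,
                                            states, obs, T, O, R, gamma)
    return v
```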

6 POMDP Planning. Goal: find an optimal policy, i.e., one that maximizes the value function.

7 Challenges. Computational complexity: we must reason in an $(n-1)$-dimensional continuous belief space, where $n = |S|$ (the curse of dimensionality), and complexity also grows quickly with the length of the planning horizon (the curse of history).

8 Challenges. Computational complexity: we must reason in an $(n-1)$-dimensional continuous belief space, where $n = |S|$ (the curse of dimensionality), and complexity also grows quickly with the length of the planning horizon (the curse of history). Practical challenges: how to carry out intelligent information gathering in a large, high-dimensional belief space, and how to scale up planning with long sequences of actions and delayed rewards.

9 Figure: A belief tree rooted at $b_0$, branching on actions $a_1, a_2$ and observations $o_1, o_2$.

10 Existing Solvers. PBVI (Pineau et al., 2003); HSVI/HSVI2 (Smith and Simmons, 2004, 2005); FSVI (Shani et al., 2007); SARSOP (Kurniawati et al., 2008); RTDP-Bel (Bonet and Geffner, 2009); MiGS (Kurniawati et al., 2011). Some online solvers, e.g., PUMA (He et al., 2010) and POMCP (Silver and Veness, 2010).

11 Our Approach: Information Gathering and Reward Exploitation of Subgoals (IGRES) Main ideas:

12 Our Approach: Information Gathering and Reward Exploitation of Subgoals (IGRES) Main ideas: 1 Sample potentially important states as subgoals.

13 Our Approach: Information Gathering and Reward Exploitation of Subgoals (IGRES) Main ideas: 1 Sample potentially important states as subgoals. 2 Generate macro-actions (sequences of actions) for transitions to subgoals.

14 Our Approach: Information Gathering and Reward Exploitation of Subgoals (IGRES) Main ideas: 1 Sample potentially important states as subgoals. 2 Generate macro-actions (sequences of actions) for transitions to subgoals. 3 Generate macro-actions for information gathering and reward exploitation in the neighborhood of subgoals.

15 Capturing Important States. 1. Identify importance heuristic functions with respect to immediate reward and information gain. 2. Sample states as subgoals, each with probability proportional to its importance (see the sketch below).
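Step 2 amounts to importance-weighted sampling. A minimal sketch follows, where `importance` is a caller-supplied stand-in for the slide's heuristics (the paper's exact reward and information-gain formulas are only summarized here):

```python
import random

def sample_subgoals(states, importance, k, rng=random):
    """Draw k subgoal states with probability proportional to their
    importance (with replacement; deduplicate if distinct subgoals are
    required). `importance(s)` is a stand-in for the slide's heuristics."""
    weights = [max(importance(s), 0.0) for s in states]
    if sum(weights) == 0.0:
        return rng.sample(list(states), k)  # degenerate case: uniform
    return rng.choices(list(states), weights=weights, k=k)
```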

16 Leverage the Structure. 1. Specify a distance between states that reflects the approximate similarity of their values. 2. Group all states by their distance to the subgoals, as sketched below.
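As a minimal illustration of the grouping step, the sketch below assigns each state to its nearest subgoal under a caller-supplied distance; the value-similarity distance itself is only summarized on the slide, so `dist` is an assumption here.

```python
def group_by_subgoal(states, subgoals, dist):
    """Partition states by nearest subgoal. `dist(s, g)` stands in for
    the value-similarity distance, which is not spelled out here."""
    groups = {g: [] for g in subgoals}
    for s in states:
        nearest = min(subgoals, key=lambda g: dist(s, g))
        groups[nearest].append(s)
    return groups
```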

17 Sampling Belief States with Macro-actions. 1. Associate each belief with an estimated current state. 2. Generate a macro-action towards the corresponding subgoal (one possible realization is sketched below). 3. Gather information and exploit rewards around the subgoal. Figure: a macro-action drives the belief sequence $b_0 \xrightarrow{a_1, o_1} b_1 \to \cdots \to b_l$ in parallel with the sampled state sequence $s_0 \xrightarrow{a_1} s_1 \to \cdots \to s_l$.
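One plausible way to realize step 2 (an assumption on my part, since the slide only names the step) is to plan in a determinized model: take the belief's most likely state, then search the most-likely-transition graph for an action sequence reaching the subgoal.

```python
from collections import deque

def macro_action_to_subgoal(belief, subgoal, states, actions, T):
    """Sketch, not the paper's construction: BFS in the graph whose edges
    are each (state, action)'s most likely successor."""
    start = max(states, key=lambda s: belief[s])      # step 1: belief's mode
    succ = {(s, a): max(states, key=lambda s2: T[(s, a, s2)])
            for s in states for a in actions}         # determinized dynamics
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == subgoal:                              # walk back to recover the plan
            plan = []
            while parent[s] is not None:
                s, a = parent[s]
                plan.append(a)
            return list(reversed(plan))
        for a in actions:
            s2 = succ[(s, a)]
            if s2 not in parent:
                parent[s2] = (s, a)
                frontier.append(s2)
    return []  # subgoal unreachable in the determinized graph
```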

18 Overview of IGRES

19 Conclusion. Reduce planning complexity by using macro-actions. Sample the potentially most useful beliefs (based on subgoal states). Gain the capability of planning in large state spaces over long horizons.

20 Figure: Tiger Domain. $s_0$ = tiger-left: $\Pr(o = \text{obs-left} \mid s = s_0, a = \text{listen}) = 0.85$ and $\Pr(o = \text{obs-right} \mid s = s_0, a = \text{listen}) = 0.15$. $s_1$ = tiger-right: $\Pr(o = \text{obs-left} \mid s = s_1, a = \text{listen}) = 0.15$ and $\Pr(o = \text{obs-right} \mid s = s_1, a = \text{listen}) = 0.85$. Reward function: opening the wrong door: $-100$; opening the correct door: $+10$; listen action: $-1$.
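The slide's numbers assemble into the following tables, compatible with the `belief_update` sketch earlier. The slide gives only the observation and reward numbers; the reset-after-opening transition is the standard Tiger convention and is assumed here.

```python
states = ["tiger-left", "tiger-right"]
actions = ["listen", "open-left", "open-right"]
observations = ["obs-left", "obs-right"]

T = {}
for s in states:
    for s2 in states:
        T[(s, "listen", s2)] = 1.0 if s == s2 else 0.0  # listening is passive
        T[(s, "open-left", s2)] = 0.5                    # assumed: opening resets
        T[(s, "open-right", s2)] = 0.5                   # the tiger uniformly

O = {}
for s2 in states:
    for o in observations:
        correct = (s2 == "tiger-left") == (o == "obs-left")
        O[(s2, "listen", o)] = 0.85 if correct else 0.15
        O[(s2, "open-left", o)] = 0.5   # opening a door gives no information
        O[(s2, "open-right", o)] = 0.5

R = {("tiger-left", "listen"): -1, ("tiger-right", "listen"): -1,
     ("tiger-left", "open-left"): -100, ("tiger-right", "open-left"): 10,
     ("tiger-left", "open-right"): 10, ("tiger-right", "open-right"): -100}
b0 = {"tiger-left": 0.5, "tiger-right": 0.5}

# e.g. belief_update(b0, "listen", "obs-left", states, T, O)
#   -> {'tiger-left': 0.85, 'tiger-right': 0.15}
```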

21 Figure: RockSample Domain

22 Figure: Underwater Navigation Domain, showing the robot's possible initial positions, its destinations, and the locations (marked O) where it can fully localize.

23 Figure: Homecare Domain

24 Figure: 3D-Navigation Domain, in which an unmanned aerial vehicle (UAV) navigates a 3-D environment containing danger zones; the state $(x, y, z, \theta_p, \theta_y)$ comprises the position, the pitch angle $\theta_p$, and the yaw angle $\theta_y$.

25 Table: Results on the benchmark domains, reporting average return and computation time (s) for RTDP-Bel, HSVI, FSVI, SARSOP, MiGS, and IGRES. Domains and IGRES subgoal counts: Tiger (|S| = 2, |A| = 3, |Ω| = 2; 1 subgoal), Noisy-tiger (|S| = 2, |A| = 3, |Ω| = 2; 1 subgoal), RockSample(4,4) (|S| = 257, |A| = 9, |Ω| = 2; 4 subgoals), RockSample(7,8) (|S| = 12545, |A| = 13, |Ω| = 2; 8 subgoals), Hallway2 (|S| = 92, |A| = 5, |Ω| = 17; 20 subgoals), Tag (|S| = 870, |A| = 5, |Ω| = 30; 20 subgoals), Underwater Navigation (|S| = 2653, |A| = 6, |Ω| = 103; 20 subgoals), Homecare (|S| = 5408, |A| = 9, |Ω| = 928; 30 subgoals), and 3D-Navigation (|S| = 16969, |A| = 5, |Ω| = 14; 163 subgoals). FSVI failed on Tiger and Noisy-tiger (*), RTDP-Bel (**) and FSVI (***) failed on Homecare, and FSVI failed on 3D-Navigation (**).
* ArrayIndexOutOfBoundsException is thrown. ** Solver is not able to compute a solution given a large amount of computation time. *** OutOfMemoryError is thrown.

26 Table: Results on the adaptive management of migratory birds problem, reporting average return and computation time (s) for symbolic Perseus* and IGRES. Species and IGRES subgoal counts: Lesser sand plover (|S| = 108, |A| = 3, |Ω| = 36; 18 subgoals), Bar-tailed godwit b. (|S| = 972, |A| = 5, |Ω| = 324; 36 subgoals), Terek sandpiper (|S| = 2916, |A| = 6, |Ω| = 972; 72 subgoals), Bar-tailed godwit m. (|S| = 2916, |A| = 6, |Ω| = 972; 72 subgoals), and Grey-tailed tattler (|S| = 2916, |A| = 6, |Ω| = 972; 72 subgoals).
* Results from Nicol et al. (2013).

27 Figure: Performance on the Grey-tailed tattler domain.

28 Thank You!

29 Reference I
B. Bonet and H. Geffner. Solving POMDPs: RTDP-Bel vs. point-based algorithms. In International Joint Conference on Artificial Intelligence, 2009.
R. He, E. Brunskill, and N. Roy. PUMA: Planning under uncertainty with macro-actions. In National Conference on Artificial Intelligence, 2010.
H. Kurniawati, D. Hsu, and W. S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, 2008.
H. Kurniawati, Y. Du, D. Hsu, and W. S. Lee. Motion planning under uncertainty for robotic tasks with long time horizons. International Journal of Robotics Research, 30(3), 2011.

30 Reference II
S. Nicol, O. Buffet, T. Iwamura, and I. Chadès. Adaptive management of migratory birds under sea level rise. In International Joint Conference on Artificial Intelligence, 2013.
J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence, 2003.
G. Shani, R. I. Brafman, and S. E. Shimony. Forward search value iteration for POMDPs. In International Joint Conference on Artificial Intelligence, 2007.
D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, 2010.
T. Smith and R. G. Simmons. Heuristic search value iteration for POMDPs. In Uncertainty in Artificial Intelligence, 2004.

31 Reference III
T. Smith and R. G. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Uncertainty in Artificial Intelligence, 2005.
