Information Gathering and Reward Exploitation of Subgoals for POMDPs
1 Information Gathering and Reward Exploitation of Subgoals for POMDPs. Hang Ma and Joelle Pineau, McGill University. AAAI, January 27, 2015.
3 POMDPs
A partially observable Markov decision process (POMDP) is a tuple $\langle S, A, \Omega, T, O, R, b_0, \gamma \rangle$, where $S$ is a finite set of states, $A$ is a finite set of actions, and $\Omega$ is a finite set of observations; $T(s, a, s') = p(s' \mid s, a)$ is the transition function, mapping each state and action to a probability distribution over successor states; $O(s', a, o) = p(o \mid s', a)$ is the observation function, mapping each successor state and action to a probability distribution over observations; $R(s, a)$ is the reward function; $b_0$ is the initial belief state; and $\gamma$ is the discount factor.
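To make the tuple concrete, here is a minimal Python container for a finite POMDP. It is an illustrative sketch, not the authors' implementation; all names (POMDP, states, T, O, R, b0, gamma) simply mirror the definition above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

@dataclass
class POMDP:
    """A finite POMDP <S, A, Omega, T, O, R, b0, gamma> as defined above."""
    states: Sequence[str]                 # S
    actions: Sequence[str]                # A
    observations: Sequence[str]           # Omega
    T: Callable[[str, str, str], float]   # T(s, a, s') = p(s' | s, a)
    O: Callable[[str, str, str], float]   # O(s', a, o) = p(o | s', a)
    R: Callable[[str, str], float]        # R(s, a)
    b0: Dict[str, float]                  # initial belief state b_0
    gamma: float                          # discount factor
```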
4 Beliefs
A belief state $b \in B$ is a sufficient statistic for the history; it is updated after taking action $a$ and receiving observation $o$ as follows:
$$b^{a,o}(s') = \frac{O(s', a, o) \sum_{s \in S} T(s, a, s')\, b(s)}{p(o \mid a, b)}, \qquad (1)$$
where $p(o \mid a, b) = \sum_{s \in S} b(s) \sum_{s' \in S} T(s, a, s')\, O(s', a, o)$ is a normalizing factor.
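Equation (1) transcribes directly into code. A minimal sketch reusing the POMDP container above; belief_update is a hypothetical helper name, not part of any library.

```python
def belief_update(m: POMDP, b: Dict[str, float], a: str, o: str) -> Dict[str, float]:
    """Equation (1): the belief b^{a,o} after taking action a and observing o."""
    # Numerator of (1): O(s', a, o) * sum_{s in S} T(s, a, s') b(s).
    unnormalized = {
        s2: m.O(s2, a, o) * sum(m.T(s, a, s2) * p for s, p in b.items())
        for s2 in m.states
    }
    # Denominator of (1): the normalizing factor p(o | a, b).
    p_o = sum(unnormalized.values())
    # An empty dict signals an impossible observation (p(o | a, b) = 0).
    return {s2: v / p_o for s2, v in unnormalized.items()} if p_o > 0 else {}
```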
5 Policies
A policy $\pi$ is a mapping from the current belief state to an action. The value function $V^\pi(b)$ gives the expected discounted reward obtained by starting from $b$ and following policy $\pi$:
$$V^\pi(b) = \sum_{s \in S} b(s)\, R(s, \pi(b)) + \gamma \sum_{o \in \Omega} p(o \mid b, \pi(b))\, V^\pi\!\left(b^{\pi(b), o}\right).$$
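The recursion can be evaluated to a fixed depth by expanding the belief tree (slide 9). A sketch under that truncation, assuming the belief_update helper above; an exact evaluation would require the full infinite-horizon fixed point.

```python
def policy_value(m: POMDP, b: Dict[str, float], policy, depth: int) -> float:
    """Depth-limited estimate of V^pi(b); policy maps a belief to an action."""
    if depth == 0 or not b:
        return 0.0
    a = policy(b)
    immediate = sum(p * m.R(s, a) for s, p in b.items())
    future = 0.0
    for o in m.observations:
        # p(o | b, a): the same normalizing factor as in the belief update.
        p_o = sum(p * sum(m.T(s, a, s2) * m.O(s2, a, o) for s2 in m.states)
                  for s, p in b.items())
        if p_o > 0:
            future += p_o * policy_value(m, belief_update(m, b, a, o), policy,
                                         depth - 1)
    return immediate + m.gamma * future
```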
6 POMDP Planning: find an optimal policy, i.e., one that maximizes the value function.
7–8 Challenges
Computational complexity: we must reason in an $(n-1)$-dimensional continuous belief space, where $n = |S|$ (the curse of dimensionality), and complexity also grows quickly with the length of the planning horizon (the curse of history).
Practical challenges: how to carry out intelligent information gathering in a large, high-dimensional belief space, and how to scale up planning with long sequences of actions and delayed rewards.
9 Figure: A belief tree rooted at $b_0$, branching on actions ($a_1$, $a_2$) and observations ($o_1$, $o_2$).
10 Existing Solvers
PBVI (Pineau et al., 2003); HSVI/HSVI2 (Smith and Simmons, 2004, 2005); FSVI (Shani et al., 2007); SARSOP (Kurniawati et al., 2008); RTDP-Bel (Bonet and Geffner, 2009); MiGS (Kurniawati et al., 2011); and some online solvers, e.g., PUMA (He et al., 2010) and POMCP (Silver and Veness, 2010).
11–14 Our Approach: Information Gathering and Reward Exploitation of Subgoals (IGRES)
Main ideas:
1. Sample potentially important states as subgoals.
2. Generate macro-actions (sequences of actions) for transitions to subgoals.
3. Generate macro-actions for information gathering and reward exploitation in the neighborhood of subgoals.
15 Capturing Important States
1. Define importance heuristic functions with respect to immediate reward and information gain.
2. Sample a state as a subgoal with probability proportional to its importance (see the sketch below).
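A minimal sketch of step 2, with a caller-supplied importance function standing in for the reward and information-gain heuristics; it samples with replacement for brevity, which may differ from the paper's exact procedure.

```python
import random

def sample_subgoals(states, importance, k):
    """Draw k subgoal states with probability proportional to importance(s)."""
    weights = [importance(s) for s in states]  # must not be all zero
    return random.choices(states, weights=weights, k=k)  # with replacement
```

For instance, importance(s) might combine the best immediate reward available at s with a bonus for how informative the observations around s are.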
16 Leverage the Structure
1. Define a distance between states that approximates the similarity of their values.
2. Group all states by their distance to the subgoals.
17 Sampling Belief States with Macro-actions
1. Associate each belief with an estimated current state.
2. Generate a macro-action towards the corresponding subgoal (see the sketch below).
3. Gather information and exploit rewards around the subgoal.
Figure: a macro-action drives the belief sequence $b_0, b_1, \ldots, b_l$ (via actions $a_1, \ldots$ and observations $o_1, \ldots$) alongside the corresponding state sequence $s_0, s_1, \ldots, s_l$.
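A deliberately naive sketch of steps 1 and 2: take the belief's most likely state and greedily chain one-step transitions toward the subgoal. The paper's actual macro-action construction builds on the distance structure from slide 16; this only conveys the shape of the idea, and macro_action_to_subgoal is a hypothetical name.

```python
def macro_action_to_subgoal(m: POMDP, b: Dict[str, float], subgoal: str,
                            max_len: int = 20) -> list:
    """Greedy open-loop action sequence from the estimated state to a subgoal."""
    s = max(b, key=b.get)  # step 1: the belief's most likely state
    macro = []
    for _ in range(max_len):
        if s == subgoal:
            break
        # Myopic choice: the action most likely to reach the subgoal in one step
        # (a real planner would search multi-step paths).
        a = max(m.actions, key=lambda act: m.T(s, act, subgoal))
        macro.append(a)
        s = max(m.states, key=lambda s2: m.T(s, a, s2))  # most likely successor
    return macro
```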
18 Overview of IGRES
19 Conclusion
IGRES reduces planning complexity by using macro-actions, samples the potentially most useful beliefs (guided by subgoal states), and is capable of planning in large state spaces over long horizons.
20 Figure: Tiger Domain
$s_0$ = tiger-left: Pr(o = obs-left | s = $s_0$, a = listen) = 0.85; Pr(o = obs-right | s = $s_0$, a = listen) = 0.15.
$s_1$ = tiger-right: Pr(o = obs-left | s = $s_1$, a = listen) = 0.15; Pr(o = obs-right | s = $s_1$, a = listen) = 0.85.
Reward function: opening the wrong door: −100; opening the correct door: +10; listen action: −1.
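The figure maps directly onto the POMDP container from slide 3. A sketch with two assumptions the slide leaves open: the discount factor, and the standard convention that opening a door resets the tiger's position uniformly.

```python
TIGER = POMDP(
    states=["tiger-left", "tiger-right"],
    actions=["listen", "open-left", "open-right"],
    observations=["obs-left", "obs-right"],
    # Listening leaves the state unchanged; opening a door resets it (assumed).
    T=lambda s, a, s2: (1.0 if s == s2 else 0.0) if a == "listen" else 0.5,
    # Listening is 85% accurate; observations are uninformative otherwise.
    O=lambda s2, a, o: ((0.85 if o.split("-")[1] == s2.split("-")[1] else 0.15)
                        if a == "listen" else 0.5),
    R=lambda s, a: (-1.0 if a == "listen" else
                    -100.0 if a.split("-")[1] == s.split("-")[1] else 10.0),
    b0={"tiger-left": 0.5, "tiger-right": 0.5},
    gamma=0.95,  # assumed; the slide does not give a discount factor
)

# One accurate listen shifts the belief from (0.5, 0.5) to (0.85, 0.15):
print(belief_update(TIGER, TIGER.b0, "listen", "obs-left"))
```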
21 Figure: RockSample Domain
22 Figure: Underwater Navigation Domain.
23 Figure: Homecare Domain
24 Figure: 3D-Navigation Domain. An unmanned aerial vehicle (UAV) navigates a 3-D environment containing danger zones; its state $(x, y, z, \theta_p, \theta_y)$ comprises the position, the pitch angle $\theta_p$, and the yaw angle $\theta_y$. Insets show the sampled milestones: (a) the initial roadmap, (b) the milestones.
25 Results of the benchmark domains (Return / Time(s)):
Tiger (S = 2, A = 3, Ω = 2): RTDP-Bel ±; HSVI ± 0.09, <1; FSVI* N/A; SARSOP ±; MiGS ±; IGRES (# subgoals: 1) ±
Noisy-tiger (S = 2, A = 3, Ω = 2): RTDP-Bel ±; HSVI ± 0.04, <1; FSVI* N/A; SARSOP ±; MiGS ±; IGRES (# subgoals: 1) ±
RockSample(4,4) (S = 257, A = 9, Ω = 2): RTDP-Bel ±; HSVI ± 0.01, <1; FSVI ±; SARSOP ±; MiGS 8.57 ±; IGRES (# subgoals: 4) ±
RockSample(7,8) (S = 12545, A = 13, Ω = 2): RTDP-Bel ±; HSVI ±; FSVI ±; SARSOP ±; MiGS 7.35 ±; IGRES (# subgoals: 8) ±
Hallway2 (S = 92, A = 5, Ω = 17): RTDP-Bel ±; HSVI ±; FSVI ±; SARSOP ±; MiGS ±; IGRES (# subgoals: 20) ±
Tag (S = 870, A = 5, Ω = 30): RTDP-Bel 6.32 ±; HSVI ±; FSVI 6.11 ±; SARSOP 6.08 ±; MiGS 6.00 ±; IGRES (# subgoals: 20) 6.12 ±
Underwater Navigation (S = 2653, A = 6, Ω = 103): RTDP-Bel ±; HSVI ±; FSVI ±; SARSOP ±; MiGS ±; IGRES (# subgoals: 20) ±
Homecare (S = 5408, A = 9, Ω = 928): RTDP-Bel** N/A; HSVI ±; FSVI*** N/A; SARSOP ±; MiGS ±; IGRES (# subgoals: 30) ±
3D-Navigation (S = 16969, A = 5, Ω = 14): RTDP-Bel ±; HSVI ±; FSVI** N/A; SARSOP ±; MiGS (2.977 ± 0.512); IGRES (# subgoals: 163) (3.272 ± 0.193)
* ArrayIndexOutOfBoundsException is thrown.
** Solver is not able to compute a solution given a large amount of computation time.
*** OutOfMemoryError is thrown.
26 Results of the adaptive management of migratory birds problem (Return / Time(s)):
Lesser sand plover (S = 108, A = 3, Ω = 36): symbolic Perseus*; IGRES (# subgoals: 18) ±
Bar-tailed godwit b. (S = 972, A = 5, Ω = 324): symbolic Perseus*; IGRES (# subgoals: 36) ±
Terek sandpiper (S = 2916, A = 6, Ω = 972): symbolic Perseus*; IGRES (# subgoals: 72) ±
Bar-tailed godwit m. (S = 2916, A = 6, Ω = 972): symbolic Perseus*; IGRES (# subgoals: 72) ±
Grey-tailed tattler (S = 2916, A = 6, Ω = 972): symbolic Perseus*; IGRES (# subgoals: 72) ±; IGRES (# subgoals: 72) ±
* Results from Nicol et al. (2013).
27 Figure: Performance on the Grey-tailed tattler domain.
28 Thank You!
29 Reference I
B. Bonet and H. Geffner. Solving POMDPs: RTDP-Bel vs. point-based algorithms. In International Joint Conference on Artificial Intelligence, 2009.
R. He, E. Brunskill, and N. Roy. PUMA: Planning under uncertainty with macro-actions. In National Conference on Artificial Intelligence, 2010.
H. Kurniawati, D. Hsu, and W. S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, 2008.
H. Kurniawati, Y. Du, D. Hsu, and W. S. Lee. Motion planning under uncertainty for robotic tasks with long time horizons. International Journal of Robotics Research, 30(3), 2011.
30 Reference II
S. Nicol, O. Buffet, T. Iwamura, and I. Chadès. Adaptive management of migratory birds under sea level rise. In International Joint Conference on Artificial Intelligence, 2013.
J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence, 2003.
G. Shani, R. I. Brafman, and S. E. Shimony. Forward search value iteration for POMDPs. In International Joint Conference on Artificial Intelligence, 2007.
D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, 2010.
T. Smith and R. G. Simmons. Heuristic search value iteration for POMDPs. In Uncertainty in Artificial Intelligence, 2004.
31 Reference III
T. Smith and R. G. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Uncertainty in Artificial Intelligence, 2005.