POMDPs: Exact and Approximate Solutions
- Harvey Baker
1 POMDPs: Exact and Approximate Solutions
Bharaneedharan R, University of Illinois at Chicago
Automated optimal decision making, Fall 2002
2 Road-map
- Definition of POMDP
- Belief as a sufficient statistic
- General solutions to a POMDP
  - Monahan's enumeration algorithm
  - Incremental pruning
- Approximations
  - MDP based approximations
  - Grid based approximation
3 Definition of POMDP
A POMDP is defined by the tuple $\langle S, A, T, \theta, O, R \rangle$.
- $S$ is a finite set of world states.
- $A$ is a finite set of actions that can be performed by the agent.
- $T : S \times A \times S \to [0,1]$ is the likelihood of an action changing the system state from one to another.
- $\theta$ is the finite set of observations that the agent can make (the sensory inputs available to the agent).
- $O : S \times \theta \times A \to [0,1]$ is the likelihood of making a certain observation at a state after performing a particular action.
- $R : S \times A \times S \to \mathbb{R}$ defines the payoffs for the agent in a given system state after performing an action.
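As a minimal sketch of how the tuple above might be stored, the following uses nested Python lists. The class name `POMDP`, the index order of the tables, and every number in the `toy` model are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import List

Table3 = List[List[List[float]]]


@dataclass
class POMDP:
    """Container for the tables of a finite POMDP (hypothetical layout)."""
    T: Table3  # T[s][a][s2] = P(s2 | s, a)
    O: Table3  # O[s2][o][a] = P(o | s2, a), following O : S x theta x A
    R: Table3  # R[s][a][s2] = payoff for moving from s to s2 under a

    @property
    def n_states(self) -> int:
        return len(self.T)

    @property
    def n_actions(self) -> int:
        return len(self.T[0])

    @property
    def n_obs(self) -> int:
        return len(self.O[0])

    def validate(self, tol: float = 1e-9) -> bool:
        """Check that T and O hold proper probability distributions."""
        for s in range(self.n_states):
            for a in range(self.n_actions):
                if abs(sum(self.T[s][a]) - 1.0) > tol:
                    raise ValueError(f"T[{s}][{a}] does not sum to 1")
                obs_total = sum(self.O[s][o][a] for o in range(self.n_obs))
                if abs(obs_total - 1.0) > tol:
                    raise ValueError(f"O[{s}][.][{a}] does not sum to 1")
        return True


# Toy two-state, two-action, two-observation model (made-up numbers).
toy = POMDP(
    T=[[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]],
    O=[[[0.85, 0.5], [0.15, 0.5]], [[0.15, 0.5], [0.85, 0.5]]],
    R=[[[0.0, 0.0], [1.0, -1.0]], [[0.0, 0.0], [-1.0, 1.0]]],
)
```

Calling `toy.validate()` raises `ValueError` if any row of `T` or `O` fails to sum to one, which catches most transcription mistakes in a hand-written model.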
4 Beliefs and Information States
- Let the information state process (ISP) be $I_0, I_1, \ldots, I_t$, with $I_t = [o_t, a_{t-1}, I_{t-1}]$.
- The belief, or the probability distribution over states at any time $t$, is given as $Be_j(t) = P(S(t) = j \mid I_t)$.
- $I_0$ is the information state before the agent starts performing actions.
- If observations result only after performing actions, $I_0 = Be(0)$, the prior at time $t = 0$.
- Redefining the ISP: $I_0 = Be(0)$, $I_1 = [o_1, a_0, I_0]$, ...
5 Belief is a sufficient statistic
- $Be_j(t) = P(S(t) = j \mid o_t, a_{t-1}, I_{t-1})$
- At $t = 0$, $Be(0)$ is just a prior.
- At $t = 1$, $Be_j(1) = P(S(1) = j \mid o_1, a_0, Be(0))$.
- At any time $t$, $Be(t)$ depends on the current observation, the previous action and the previous belief state.
- The current observation and state depend only on the previous action and information state.
- Consequently $Be(t)$ is a compact representation of $I_t$.
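The recursion above is usually implemented as a Bayes filter: predict the next state with the transition model, weight by the observation likelihood, then normalize. The sketch below assumes the index conventions `T[s][a][s2]` and `O[s2][o][a]`; the function name and the toy numbers are hypothetical.

```python
def belief_update(b, a, o, T, O):
    """Return the belief after taking action a and observing o.

    b[s]        : current belief Be(t-1)
    T[s][a][s2] : P(s2 | s, a)
    O[s2][o][a] : P(o | s2, a)
    """
    n = len(b)
    unnormalized = []
    for s2 in range(n):
        predicted = sum(b[s] * T[s][a][s2] for s in range(n))
        unnormalized.append(O[s2][o][a] * predicted)
    z = sum(unnormalized)  # equals P(o | b, a)
    if z == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / z for u in unnormalized]


# Toy model: one action that leaves the state unchanged and a sensor
# that reports the true state with probability 0.85 (made-up numbers).
T = [[[1.0, 0.0]], [[0.0, 1.0]]]
O = [[[0.85], [0.15]], [[0.15], [0.85]]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, a=0, o=0, T=T, O=O)
b2 = belief_update(b1, a=0, o=0, T=T, O=O)
```

Only the previous belief, the action and the observation are needed at each step, which is the practical meaning of the belief being a sufficient statistic.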
6 General solution to a POMDP
Let $b_i$ be a belief over $S$, for all $i$.
$$V_t(b_i) = \max_{a \in A} \Big[ R(b_i, a) + \gamma \sum_{o \in \theta} P(o \mid b_i, a)\, V_{t-1}(\tau(b_i, o, a)) \Big]$$
$$\alpha_t(s_i, a) = R(s_i, a) + \gamma \sum_{o \in \theta} \sum_{s_j \in S} P(o \mid s_j, a)\, P(s_j \mid s_i, a)\, \alpha^{o}_{t-1}(s_j)$$
Here $\alpha^{o}_{t-1}$ is an alpha vector from the previous epoch for a given observation $o$.
[Figure: two policy trees for $\alpha_1(t)$, with branches labeled by observations o1, o2, o3 leading to the previous-epoch vectors $\alpha_1(t-1)$, $\alpha_2(t-1)$, $\alpha_3(t-1)$.]
7 Continued...
- Each choice of $\alpha_{t-1}$ for a given $o$ is a choice of a future plan given $o$.
- $\alpha_t$ is constructed using the best of such choices of plans for all possible observations.

Candidate assignments for α1(t):

o1        o2        o3
α1(t-1)   α1(t-1)   α1(t-1)
α1(t-1)   α1(t-1)   α2(t-1)
α1(t-1)   α1(t-1)   α3(t-1)
:         :         :
α3(t-1)   α1(t-1)   α2(t-1)
:         :         :
α3(t-1)   α3(t-1)   α3(t-1)
8 Piecewise linearity and consequences
- We know that the value function is piecewise linear, i.e. $V_t(b_i) = \max_{k} \alpha^k_t \cdot b_i$.
- Computing the value function involves just projecting the alpha vectors.
- Alpha vectors are linear over the belief space, so only their values at the corners of the belief simplex need to be computed. These corner values are exactly the entries of each alpha vector.
9 Enumeration algorithm [Monahan 82]
- At a given epoch, project all the alpha vectors describing the value function at the previous epoch.
- If $\Gamma_t$ is the set of alpha vectors at epoch $t$, then $|\Gamma_t| = |A|\,|\Gamma_{t-1}|^{|\theta|}$.
- Find the useful set of alpha vectors that forms the value function.
[Figure: alpha vectors over the belief space, with the useless alpha vectors labeled.]
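The generation step of the enumeration can be sketched as follows: for every action and every way of assigning one previous-epoch vector to each observation, build one new alpha vector. The function name, the two-argument reward table `R[s][a]` and the toy numbers are assumptions made for illustration; this block only generates candidates and does not prune them.

```python
from itertools import product


def enumerate_backup(prev_vectors, T, O, R, gamma):
    """Generate all |A| * |Gamma|^|theta| candidate alpha vectors.

    T[s][a][s2] = P(s2 | s, a), O[s2][o][a] = P(o | s2, a), R[s][a] = reward.
    Returns a list of (action, alpha) pairs.
    """
    n_s, n_a, n_o = len(T), len(T[0]), len(O[0])
    candidates = []
    for a in range(n_a):
        # choice[o] is the previous-epoch vector followed after observing o
        for choice in product(prev_vectors, repeat=n_o):
            alpha = []
            for s in range(n_s):
                future = sum(
                    O[s2][o][a] * T[s][a][s2] * choice[o][s2]
                    for o in range(n_o)
                    for s2 in range(n_s)
                )
                alpha.append(R[s][a] + gamma * future)
            candidates.append((a, alpha))
    return candidates


# Toy two-state, two-action, two-observation model (made-up numbers).
T = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]]
O = [[[0.85, 0.5], [0.15, 0.5]], [[0.15, 0.5], [0.85, 0.5]]]
R = [[0.0, 1.0], [0.0, -1.0]]

gamma_prev = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
gamma_next = enumerate_backup(gamma_prev, T, O, R, gamma=0.9)
```

With three previous vectors, two actions and two observations the call produces 2 * 3^2 = 18 candidates, matching the growth formula on the slide.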
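10 Testing for redundancy
- Test whether each alpha vector maximizes the value function at at least one point in the belief space.
- Let $\pi$ be a belief with $\pi_i = Be(s = i)$, and let $\alpha^k$ be the vector to be tested.
- $\alpha^k$ is a useful vector iff, at some $\pi$, $\sum_i \pi_i \alpha^j_i \le \sum_i \pi_i \alpha^k_i$ holds for all vectors $j \ne k$, i.e. $\sum_i \pi_i (\alpha^j_i - \alpha^k_i) \le 0$.
- To find the point where the vector maximizes the most, we rewrite the above as an LP:
$$\begin{aligned} \text{maximize} \quad & \delta \\ \text{subject to} \quad & \sum_i \pi_i (\alpha^j_i - \alpha^k_i) + \delta \le 0 \quad \text{for each alpha vector } j \ne k \\ & \sum_i \pi_i = 1 \\ & \pi_i \ge 0 \end{aligned}$$
- $\alpha^k$ is useful when the optimal $\delta$ is positive.
- This leads to an LP with $(|\Gamma| - 1) + 1 + |S|$ constraints and $|S| + 1$ variables.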
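For a two-state problem the LP above can be solved without an LP solver, because the belief is a single number $p = \pi_0$ and the optimum lies at an endpoint or where two constraint lines cross. The sketch below uses that shortcut; it is a stand-in for the general LP, restricted to $|S| = 2$, and the function names are hypothetical.

```python
def best_margin_two_states(alpha_k, others):
    """Maximum delta such that alpha_k beats every other vector by at
    least delta at some belief (p, 1 - p). Two-state case only."""
    if not others:
        return float("inf")
    # Margin against vector j is linear in p: intercept + slope * p.
    lines = []
    for alpha_j in others:
        at_p0 = alpha_k[1] - alpha_j[1]  # margin at belief (0, 1)
        at_p1 = alpha_k[0] - alpha_j[0]  # margin at belief (1, 0)
        lines.append((at_p0, at_p1 - at_p0))
    # The concave piecewise-linear minimum peaks at an endpoint or at
    # an intersection of two lines inside [0, 1].
    candidates = [0.0, 1.0]
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (c1, m1), (c2, m2) = lines[i], lines[j]
            if m1 != m2:
                p = (c2 - c1) / (m1 - m2)
                if 0.0 <= p <= 1.0:
                    candidates.append(p)

    def worst_margin(p):
        return min(c + m * p for c, m in lines)

    return max(worst_margin(p) for p in candidates)


def is_useful(alpha_k, others):
    """alpha_k is useful iff it strictly wins somewhere in belief space."""
    return best_margin_two_states(alpha_k, others) > 0.0


corner_vectors = [(1.0, 0.0), (0.0, 1.0)]
```

With the two corner vectors above, a flat vector at height 0.6 pokes above the envelope around $p = 0.5$ and is kept, while one at height 0.4 stays below it everywhere and is redundant.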
11 Problems with simple enumeration
- Brute-force generate and test.
- For a problem with 30 states, 4 actions and 8 observations, we would need about 60 MB of memory to store the new vectors in epoch 2.
- With 12 observations, that's a whopping 15 GB.
- Useful only for solving very small problems.
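The memory figures above can be reproduced with a few lines of arithmetic. The sketch assumes that epoch 1 holds one alpha vector per action (so $|\Gamma_1| = |A| = 4$) and that each vector entry is an 8-byte float. Neither assumption is stated on the slide, but together they give exactly the quoted sizes in binary units.

```python
def enumeration_memory_bytes(n_states, n_actions, n_obs, n_prev,
                             bytes_per_value=8):
    """Bytes needed to store every vector generated in one epoch."""
    n_vectors = n_actions * n_prev ** n_obs  # |A| * |Gamma_{t-1}|^|theta|
    return n_vectors * n_states * bytes_per_value


# Epoch 2, assuming epoch 1 holds one vector per action.
mem_8_obs = enumeration_memory_bytes(30, 4, 8, n_prev=4)
mem_12_obs = enumeration_memory_bytes(30, 4, 12, n_prev=4)

print(mem_8_obs / 2 ** 20, "MiB")   # 8 observations
print(mem_12_obs / 2 ** 30, "GiB")  # 12 observations
```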
12 Incremental pruning [Zhang, Liu 96]
- An extension to the enumeration algorithm, using interleaved generation and redundancy testing.
- The key is in breaking up the dynamic programming update of the value function, where $V'$ denotes the value function from the previous epoch and $b^a_o$ the belief reached from $b$ after action $a$ and observation $o$:
$$V(b) = \max_{a \in A} V^a(b)$$
$$V^a(b) = \sum_{o \in \theta} V^a_o(b)$$
$$V^a_o(b) = \frac{\sum_s R(a,s)\, b(s)}{|\theta|} + \gamma\, P(o \mid b, a)\, V'(b^a_o)$$
- The maximizing set of alpha vectors is determined from bottom to top, which leads to iterative purging of smaller sets of alpha vectors.
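13 Partial projection and pruning
Let $W$, $W^a$, $W^a_o$ be the sets of vectors describing $V$, $V^a$, $V^a_o$, and let $W'$ be the set of vectors from the previous epoch, so that $W = \mathrm{projection}(W')$.
$$W = \mathrm{purge}\Big( \bigcup_{a \in A} W^a \Big)$$
$$W^a = \mathrm{purge}\Big( \bigoplus_{o \in \theta} W^a_o \Big)$$
$$W^a_o = \mathrm{purge}\big( \{ \tau(\alpha, a, o) \mid \alpha \in W' \} \big)$$
$$\tau(\alpha, a, o)(s) = \frac{R(a,s)}{|\theta|} + \gamma \sum_{s'} \alpha(s')\, P(o \mid s', a)\, P(s' \mid s, a)$$
Here $\bigoplus$ is the cross-sum of vector sets, and purge(.) returns the maximizing set of vectors in the belief space (an $|S|$-dimensional space).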
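The interleaving of generation and purging can be sketched as a cross-sum that is purged after every pairwise combination. The `purge_pointwise` helper below is a deliberately weaker stand-in: it removes only duplicates and pointwise-dominated vectors, so it keeps more vectors than the LP-based purge described on the slides. All function names are hypothetical.

```python
def cross_sum(set_a, set_b):
    """All pairwise sums of a vector from set_a and a vector from set_b."""
    return [tuple(x + y for x, y in zip(a, b)) for a in set_a for b in set_b]


def purge_pointwise(vectors):
    """Stand-in purge: drop duplicates and pointwise-dominated vectors.

    This never removes a vector that the LP-based purge would keep,
    but it may keep vectors the LP-based purge would discard.
    """
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    kept = []
    for v in unique:
        dominated = any(
            w != v and all(wi >= vi for wi, vi in zip(w, v)) for w in unique
        )
        if not dominated:
            kept.append(v)
    return kept


def incremental_cross_sum(vector_sets, purge=purge_pointwise):
    """Compute purge(W_1 + W_2 + ... + W_n), purging after each step."""
    result = purge(vector_sets[0])
    for next_set in vector_sets[1:]:
        result = purge(cross_sum(result, purge(next_set)))
    return result


# One hypothetical set of vectors per observation, for a fixed action.
per_observation_sets = [
    [(1, 0), (0, 1), (0, 0)],
    [(2, 0), (0, 2)],
]
w_a = incremental_cross_sum(per_observation_sets)
```

Because `(0, 0)` is removed before the cross-sum, the example combines 2 x 2 vectors instead of 3 x 2. This is the saving that incremental pruning aims for, and it grows with the number of observations.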
14 Example
[Figure: example diagram.]
15 Approximations
- MDP based approximation
- Grid based approximation
16 MDP based approximation
- A crude approximation that assumes complete observability.
- Solve the underlying MDP and compute the value function $V_{MDP}$.
- The value of a belief state is computed as $V(b) = \sum_s b(s)\, V_{MDP}(s)$.
- We can also redefine our update rule as
$$V_{i+1}(b) = \sum_{s \in S} b(s) \max_{a \in A} \Big[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{MDP}_i(s') \Big]$$
- Note that $V_i$ will always be described by a single vector.
- The MDP based update above provides an upper bound on the general update using observations.
[Figure: $V(b)$ over the belief interval from 0 to 1, with endpoint values MDP-V(s1) and MDP-V(s2).]
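A minimal sketch of this approximation is to run value iteration on the fully observable MDP, then take the belief-weighted average of the state values. The function names, the fixed iteration count and the toy model are assumptions made for illustration.

```python
def mdp_value_iteration(T, R, gamma, iterations=500):
    """Value iteration on the underlying MDP.

    T[s][a][s2] = P(s2 | s, a), R[s][a] = reward. Returns V_MDP as a list.
    """
    n_s, n_a = len(T), len(T[0])
    values = [0.0] * n_s
    for _ in range(iterations):
        values = [
            max(
                R[s][a] + gamma * sum(T[s][a][s2] * values[s2]
                                      for s2 in range(n_s))
                for a in range(n_a)
            )
            for s in range(n_s)
        ]
    return values


def belief_value(b, v_mdp):
    """V(b) = sum_s b(s) * V_MDP(s): a single linear function of b."""
    return sum(bs * vs for bs, vs in zip(b, v_mdp))


# Toy model (made-up numbers): action 0 stays put, action 1 switches state;
# only staying in state 0 is rewarded.
T = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0], [0.0, 0.0]]

v_mdp = mdp_value_iteration(T, R, gamma=0.5)
v_uniform = belief_value([0.5, 0.5], v_mdp)
```

In this toy model the fixed point is $V_{MDP} = (2, 1)$, so the uniform belief is valued at 1.5. The approximation ignores the cost of first finding out which state the agent is in, which is why it can only overestimate.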
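17 Upper bound property
Let $H$ be our regular POMDP update function involving observations, and let $H_{MDP}$ be the MDP based update function. We have to show that $H V_i \le H_{MDP} V_i$. Since $V_i$ is described by the single vector $\alpha^{MDP}_i = V^{MDP}_i$, and $\sum_{o \in \theta} P(s', o \mid s, a) = P(s' \mid s, a)$:
$$\begin{aligned} H V_i(b) &= \max_{a \in A} \Big[ \sum_{s \in S} R(s,a)\, b(s) + \gamma \sum_{o \in \theta} \sum_{s \in S} \sum_{s' \in S} P(s', o \mid s, a)\, b(s)\, \alpha^{MDP}_i(s') \Big] \\ &= \max_{a \in A} \sum_{s \in S} b(s) \Big[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{MDP}_i(s') \Big] \\ &\le \sum_{s \in S} b(s) \max_{a \in A} \Big[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{MDP}_i(s') \Big] \\ &= H_{MDP} V_i(b) \end{aligned}$$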
18 Grid based approximation [Lovejoy 91]
- Value iteration in all exact algorithms computes the value function over the complete belief space.
- Instead, compute the value function at a finite number of points in the belief space.
- Compute the value of other belief states from the values of our chosen set of belief states using interpolation.
19 Grid based value function updates
Let $G$ be the set of grid points.
$$V(b) = E(b, V_G) = \sum_{j=1}^{|G|} \lambda_j\, V_G(b_j), \qquad 0 \le \lambda_j \le 1, \qquad \sum_{j=1}^{|G|} \lambda_j = 1$$
$$V_{i+1}(b^G_j) = \max_{a \in A} \Big[ R(b^G_j, a) + \gamma \sum_{o \in \theta} P(o \mid \tau(b^G_j, a), a)\, V_i(\tau(b^G_j, a, o)) \Big]$$
Let us call this update function based on grid points $H_G$.
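For a two-state problem the interpolation rule $V(b) = \sum_j \lambda_j V_G(b_j)$ reduces to linear interpolation between the two neighboring grid points. The sketch below covers only that case. The function name, the regular grid and the stored values are hypothetical, and how the weights $\lambda_j$ are chosen in higher dimensions is not addressed here.

```python
from bisect import bisect_right


def interpolate_value(p, grid, values):
    """Interpolate V at belief (p, 1 - p) from values stored on a grid.

    grid   : sorted, distinct values of p, starting at 0.0 and ending at 1.0
    values : V_G at the corresponding grid points
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    hi = min(bisect_right(grid, p), len(grid) - 1)
    lo = hi - 1
    lam = (p - grid[lo]) / (grid[hi] - grid[lo])  # weight on the upper point
    return (1.0 - lam) * values[lo] + lam * values[hi]


# Hypothetical grid and grid values.
grid = [0.0, 0.5, 1.0]
grid_values = [1.0, 0.0, 2.0]

v_quarter = interpolate_value(0.25, grid, grid_values)
```

Only the two grid points that bracket $p$ receive non-zero weight, so each query costs a binary search plus a constant amount of arithmetic, regardless of the grid size.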
20 Lower bounds
The grid based update function $H_G$ provides a lower bound on the regular update function $H$: $H_G V_i \le H V_i$.
Proof:
- Let $V_i$ be a set of vectors describing the value function.
- $H_G$ will always use only a partial set of vectors from $V_i$ for updating the value of the grid points.
- Since $H_G$ only considers maximal vectors at grid points, it ignores the maximal vectors at other belief points.
- In the best case scenario, $H_G V_i = H V_i$.
21 Upper bounds
The value function based on the grid points, together with point interpolation, provides us with an upper bound on the value function. By convexity of $HV$:
$$HV(b) = HV\Big( \sum_{i=1}^{|G|} \lambda_i\, b^G_i \Big) \le \sum_{i=1}^{|G|} \lambda_i \big[ HV(b^G_i) \big] = H_G V(b)$$
More informationProbabilistic Planning. George Konidaris
Probabilistic Planning George Konidaris gdk@cs.brown.edu Fall 2017 The Planning Problem Finding a sequence of actions to achieve some goal. Plans It s great when a plan just works but the world doesn t
More informationComputing Optimal Policies for Partially Observable Decision Processes using Compact Representations
Computing Optimal Policies for Partially Observable Decision Processes using Compact epresentations Craig Boutilier and David Poole Department of Computer Science niversity of British Columbia Vancouver,
More information15-780: ReinforcementLearning
15-780: ReinforcementLearning J. Zico Kolter March 2, 2016 1 Outline Challenge of RL Model-based methods Model-free methods Exploration and exploitation 2 Outline Challenge of RL Model-based methods Model-free
More informationExploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes. Pascal Poupart
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes by Pascal Poupart A thesis submitted in conformity with the requirements for the degree of Doctor of
More informationOptimal Control of Partiality Observable Markov. Processes over a Finite Horizon
Optimal Control of Partiality Observable Markov Processes over a Finite Horizon Report by Jalal Arabneydi 04/11/2012 Taken from Control of Partiality Observable Markov Processes over a finite Horizon by
More informationarxiv: v2 [cs.gt] 4 Aug 2016
Dynamic Programming for One-Sided Partially Observable Pursuit-Evasion Games Karel Horák, Branislav Bošanský {horak,bosansky}@agents.fel.cvut.cz arxiv:1606.06271v2 [cs.gt] 4 Aug 2016 Department of Computer
More informationInformation Gathering and Reward Exploitation of Subgoals for P
Information Gathering and Reward Exploitation of Subgoals for POMDPs Hang Ma and Joelle Pineau McGill University AAAI January 27, 2015 http://www.cs.washington.edu/ai/mobile_robotics/mcl/animations/global-floor.gif
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationLecture 3: Markov Decision Processes
Lecture 3: Markov Decision Processes Joseph Modayil 1 Markov Processes 2 Markov Reward Processes 3 Markov Decision Processes 4 Extensions to MDPs Markov Processes Introduction Introduction to MDPs Markov
More information