Fast SSP Solvers Using Short-Sighted Labeling
1 Luis Pineda, Kyle H. Wray, and Shlomo Zilberstein. College of Information and Computer Sciences, University of Massachusetts, Amherst, USA. July 9th
2 Introduction: Motivation. SSPs are a highly expressive model for sequential decision making. They can be used for decision making in the presence of multiple goals. Caveat: solving SSPs optimally is very computationally intensive.
4 Introduction: Motivation. A range of model reduction and heuristic search techniques for solving SSPs are available. But even restricting the search to the states visited by optimal policies can be prohibitive.
5 Introduction: Motivation. Recent approaches attempt to reduce the reachable state space even more and use re-planning during execution. However, they still have several drawbacks: they are restricted to particular problem representations (e.g., RFF and FF-Replan); they involve pre-processing (e.g., $M^k_l$-reduction); or they result in only moderate reductions in time (e.g., SSiPP).
6 Introduction: In this work. We introduce a new algorithm called FLARES (Fast Labeling from Residuals Using Samples). It modifies LRTDP to find high-performing policies much faster. It can be extended to also find optimal policies.
7 Background: Stochastic Shortest Path Problems. An SSP is a tuple $\langle S, A, T, C, s_0, s_g \rangle$, where: $S$ is a finite set of states; $A$ is a finite set of actions; $T(s' \mid s, a) \in [0, 1]$ is a transition function; $C(s, a) \in (0, \infty)$ is a cost function; $s_0$ is an initial state; $s_g$ is a goal state.
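The tuple on this slide maps directly onto a small data structure. The following is a minimal sketch, not code from the presentation: the class name `SSP`, its field names, and the three-state toy problem are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass(frozen=True)
class SSP:
    """Container for the tuple <S, A, T, C, s_0, s_g> (names are illustrative)."""
    states: List[State]
    actions: List[Action]
    # transitions[(s, a)] maps each successor s' to T(s' | s, a)
    transitions: Dict[Tuple[State, Action], Dict[State, float]]
    # cost[(s, a)] is C(s, a), which must be strictly positive
    cost: Dict[Tuple[State, Action], float]
    start: State
    goal: State

    def validate(self) -> None:
        """Check that every T(. | s, a) is a distribution and every cost is > 0."""
        for (s, a), dist in self.transitions.items():
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"T(.|{s},{a}) does not sum to 1")
            if self.cost[(s, a)] <= 0.0:
                raise ValueError(f"C({s},{a}) must be positive")


# Hypothetical toy problem: a noisy two-step chain from s0 to the goal g.
toy = SSP(
    states=["s0", "s1", "g"],
    actions=["go"],
    transitions={
        ("s0", "go"): {"s1": 0.8, "s0": 0.2},
        ("s1", "go"): {"g": 0.9, "s0": 0.1},
    },
    cost={("s0", "go"): 1.0, ("s1", "go"): 1.0},
    start="s0",
    goal="g",
)
toy.validate()
```

Storing transitions sparsely, keyed by state-action pair, keeps successor enumeration cheap, which is what the heuristic search methods discussed later rely on.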
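8 Background: Solutions to SSPs. The optimal value function for an SSP can be found using the Bellman update (1), and the corresponding greedy policy is given by (2):
$$BU(s) := \min_{a \in A} \Bigl\{ C(s, a) + \sum_{s' \in S} T(s' \mid s, a)\, V(s') \Bigr\} \qquad (1)$$
$$\pi(s) = \arg\min_{a \in A} \Bigl\{ C(s, a) + \sum_{s' \in S} T(s' \mid s, a)\, V(s') \Bigr\} \qquad (2)$$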
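A small sketch may make equations (1) and (2) concrete. It is not taken from the slides: the toy problem (states `s0`, `s1`, `g`, actions `go` and `slow`) is hypothetical, and the residual is assumed to be the absolute change a Bellman update would cause, $R(s) = |V(s) - BU(s)|$.

```python
# Hypothetical toy SSP: T[(s, a)] maps successors to probabilities, C[(s, a)] is the cost.
T = {
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s0", "slow"): {"g": 1.0},
    ("s1", "go"): {"g": 0.9, "s0": 0.1},
}
C = {("s0", "go"): 1.0, ("s0", "slow"): 3.0, ("s1", "go"): 1.0}
NON_GOAL = ("s0", "s1")


def actions(s):
    """Actions applicable in state s."""
    return [a for (u, a) in T if u == s]


def q_value(V, s, a):
    """C(s, a) + sum over s' of T(s' | s, a) * V(s')."""
    return C[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())


def bellman_update(V, s):
    """Right-hand side of equation (1)."""
    return min(q_value(V, s, a) for a in actions(s))


def greedy_action(V, s):
    """Equation (2): the action that minimizes the one-step lookahead."""
    return min(actions(s), key=lambda a: q_value(V, s, a))


def residual(V, s):
    """Assumed definition: how much a Bellman update would change V(s)."""
    return abs(V[s] - bellman_update(V, s))


# Repeated sweeps of update (1) until every residual is tiny (plain value iteration).
V = {"s0": 0.0, "s1": 0.0, "g": 0.0}
while max(residual(V, s) for s in NON_GOAL) > 1e-9:
    for s in NON_GOAL:
        V[s] = bellman_update(V, s)
```

For this toy problem the fixed point can be checked by hand: $V(s_1) = 1 + 0.1\,V(s_0)$ and $V(s_0) = 1 + 0.8\,V(s_1) + 0.2\,V(s_0)$ give $V(s_1) = 1.25$ and $V(s_0) = 2.5$, so `go` (expected cost 2.5) beats `slow` (cost 3) in `s0`.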
9 Motivation for FLARES: Problems with optimal heuristic search algorithms. [Diagram with start S and goal G]
12 Motivation for FLARES: Depth-limited search. A large horizon is needed to propagate information from states close to the goal. [Diagram with start S, intermediate state x, and goal G]
15 Motivation for FLARES: RTDP with short-sighted SSPs. This approach doesn't seem to work well in practice, typically resulting in worse anytime performance than plain RTDP. [Diagram with start S and goal G]
16 FLARES: Labeling states (LRTDP). If the states reachable by following the greedy policy all have low residual, then we can label all of them as solved. [Diagram with start S and goal G]
18 FLARES: Labeling states. Computation can be saved by using cached values and actions for labeled states during the search. [Diagram with start S and goal G]
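The labeling idea on these slides can be sketched as a traversal of the greedy graph. This is a simplified illustration in the spirit of LRTDP's labeling step, not the published procedure: the toy problem, the function name `check_solved`, and the residual definition $|V(s) - BU(s)|$ are assumptions.

```python
# Hypothetical toy SSP (same shape as a noisy two-step chain to the goal "g").
T = {
    ("s0", "go"): {"s1": 0.8, "s0": 0.2},
    ("s0", "slow"): {"g": 1.0},
    ("s1", "go"): {"g": 0.9, "s0": 0.1},
}
C = {("s0", "go"): 1.0, ("s0", "slow"): 3.0, ("s1", "go"): 1.0}
GOAL = "g"


def q_value(V, s, a):
    return C[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())


def greedy_action(V, s):
    return min((a for (u, a) in T if u == s), key=lambda a: q_value(V, s, a))


def residual(V, s):
    return abs(V[s] - q_value(V, s, greedy_action(V, s)))


def check_solved(V, s, eps, solved):
    """Label s and its greedy descendants as solved if all residuals are below eps."""
    stack, seen, visited = [s], {s}, []
    all_low = True
    while stack:
        u = stack.pop()
        if u == GOAL or u in solved:
            continue  # labeled states are not searched again
        visited.append(u)
        if residual(V, u) >= eps:
            all_low = False  # do not expand below a high-residual state
            continue
        for v in T[(u, greedy_action(V, u))]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if all_low:
        solved.update(visited)
    else:
        # Otherwise use the visit to improve the value estimates.
        for u in reversed(visited):
            V[u] = q_value(V, u, greedy_action(V, u))
    return all_low


# Converged values get labeled; a zero initialization does not.
V_exact = {"s0": 2.5, "s1": 1.25, "g": 0.0}
solved_exact = set()
ok_exact = check_solved(V_exact, "s0", 1e-3, solved_exact)

V_zero = {"s0": 0.0, "s1": 0.0, "g": 0.0}
solved_zero = set()
ok_zero = check_solved(V_zero, "s0", 1e-3, solved_zero)
```

Skipping states already in `solved` is where the saving comes from: once a region is labeled, later searches stop at its boundary and reuse the cached values.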
19 FLARES: But to label s, all states reachable from s must be checked.
20 FLARES: Labeling states can also be costly. [Diagram with start S and goal G]
21 FLARES: Proposed approach. Move the short-sightedness to the labeling.
22 FLARES. [Diagram with start S and goal G; legend: reachable states; low-residual states up to horizon 2t; labeled states up to horizon t]
23 FLARES: Similar to the RTDP approach mentioned before, except that now computation can be reused and there is a crisp termination condition that exploits short-sightedness. [Diagram with start S and goal G]
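A depth-limited labeling check along these lines might look as follows. This is a simplified sketch reconstructed from the slide legend (residuals checked up to depth 2t, labels assigned up to depth t), not the authors' exact procedure; the function name `dl_check_solved`, the chain problem, and the bookkeeping of the two label sets are assumptions.

```python
from collections import deque

# Hypothetical deterministic chain s0 -> s1 -> s2 -> s3 -> s4 -> g with unit costs.
CHAIN = ["s0", "s1", "s2", "s3", "s4", "g"]
T = {(CHAIN[i], "go"): {CHAIN[i + 1]: 1.0} for i in range(5)}
C = {key: 1.0 for key in T}
GOAL = "g"


def q_value(V, s, a):
    return C[(s, a)] + sum(p * V[s2] for s2, p in T[(s, a)].items())


def greedy_action(V, s):
    return min((a for (u, a) in T if u == s), key=lambda a: q_value(V, s, a))


def residual(V, s):
    return abs(V[s] - q_value(V, s, greedy_action(V, s)))


def dl_check_solved(V, s, t, eps, solved, d_solved):
    """Check residuals in the greedy graph up to depth 2t; label states up to depth t."""
    frontier, seen, visited = deque([(s, 0)]), {s}, []
    complete = True       # False if the search was cut off anywhere
    low_residual = True
    while frontier:
        u, d = frontier.popleft()
        if u == GOAL or u in solved:
            continue
        if u in d_solved or d > 2 * t:
            complete = False  # relied on a depth-limited label or hit the horizon
            continue
        visited.append((u, d))
        if residual(V, u) >= eps:
            low_residual = False
        for v in T[(u, greedy_action(V, u))]:
            if v not in seen:
                seen.add(v)
                frontier.append((v, d + 1))
    if not low_residual:
        for u, _ in reversed(visited):
            V[u] = q_value(V, u, greedy_action(V, u))
        return False
    for u, d in visited:
        if complete:
            solved.add(u)       # nothing was cut off, so the label is unconditional
        if complete or d <= t:
            d_solved.add(u)     # depth-t-solved
    return True


V = {s: float(5 - i) for i, s in enumerate(CHAIN)}  # exact costs-to-go

solved_short, d_solved_short = set(), set()
ok_short = dl_check_solved(V, "s0", 1, 1e-3, solved_short, d_solved_short)

solved_full, d_solved_full = set(), set()
ok_full = dl_check_solved(V, "s0", 3, 1e-3, solved_full, d_solved_full)
```

With t = 1 the search stops at depth 2 and only `s0` and `s1` receive the depth-limited label; with t = 3 the whole chain fits inside the 2t horizon, so every state can be labeled as fully solved.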
26 Theoretical properties of FLARES: Correctness of the labeling procedure. Proposition: FLARES labels a state $s$ with s.solv = true only if all states $s'$ that can be reached from $s$ by following the greedy policy satisfy $R(s') < \epsilon$. Proposition: As long as a Bellman update of a state $s'$ with $R(s') < \epsilon$ never results in $R(s') \ge \epsilon$, FLARES labels a state $s$ with s.d-solv = true only if $s$ is depth-$t$-solved.
28 Theoretical properties of FLARES: FLARES is guaranteed to terminate. Theorem: If the heuristic is admissible and monotone, FLARES terminates after at most $\epsilon^{-1} \sum_{s \in S} \bigl( V^*(s) - V(s) \bigr)$ trials.
29 Theoretical properties of FLARES: An optimal version of FLARES. An optimal version of FLARES can be produced by calling FLARES multiple times with increasing horizon $t$. Theorem: If the initial heuristic is admissible and monotone, and $\rho(V, t) > t$ for all $t$, with $t_0 \ge 0$, then OPT-FLARES computes an optimal policy.
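The outer loop described on this slide can be sketched as a small driver. Everything here is hypothetical scaffolding: `run_flares` stands in for a real FLARES call and is assumed to report whether the start state ended up labeled as fully solved, and the stopping rule itself is an assumption, since the slide only says that FLARES is called repeatedly with an increasing horizon.

```python
from typing import Callable, Dict, List


def opt_flares_driver(
    run_flares: Callable[[int], bool],
    rho: Callable[[Dict[str, float], int], int],
    V: Dict[str, float],
    t0: int = 0,
    max_calls: int = 50,
) -> List[int]:
    """Call run_flares with horizons t0, rho(V, t0), ... and return the horizons tried."""
    horizons = []
    t = t0
    for _ in range(max_calls):
        horizons.append(t)
        if run_flares(t):  # assumed to return True once the start state is fully solved
            break
        t_next = rho(V, t)
        if t_next <= t:
            raise ValueError("rho(V, t) must be strictly greater than t")
        t = t_next
    return horizons


# Stand-in for FLARES: pretend a horizon of at least 4 suffices on this problem.
def fake_flares(t: int) -> bool:
    return t >= 4


tried = opt_flares_driver(fake_flares, lambda V, t: 2 * t + 1, V={}, t0=0)
```

The check on `rho` mirrors the condition in the theorem: the horizon has to grow strictly, otherwise the loop could repeat the same depth-limited search forever.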
31 Gridworld domain: Results. [Figure: gridworld with Start, Goal 1, and Goal 2; two regions of 25 x 100 cells.] [Table: columns algorithm, cost, time; rows LRTDP, FLARES(0), FLARES(1), HDP(0,0), HDP(4,0), HDP(4,4), SSiPP(16), SSiPP(32), SSiPP(64); values not preserved in the transcription.]
32 Racetrack domain: Average costs. [Figure not preserved in the transcription.]
33 Racetrack domain: Average planning time (seconds). [Table: columns square-4, square-5, ring-5, ring-6; rows LRTDP, FLARES(0), FLARES(1), HDP(0,0), HDP(0,1), HDP(1,0), HDP(1,1), SSiPP(4), SSiPP(8); values not preserved in the transcription.]
34 Sailing domain: Average costs. [Plots, one panel each for size=(20,20) goal=(10,10); size=(40,40) goal=(20,20); size=(20,20) goal=(19,19); size=(40,40) goal=(39,39). Series in each panel: OPT, F(0), F(1), H(0,0), H(0,1), H(1,0), H(1,1), S(4), S(8).]
35 Sailing domain: Average planning time (seconds). [Table: columns s=20 g=corner, s=40 g=corner, s=20 g=middle, s=40 g=middle; rows LRTDP, FLARES(0), FLARES(1), FLARES(2), HDP(0,0), HDP(0,1), HDP(1,0), HDP(1,1), SSiPP(4), SSiPP(8); values not preserved in the transcription.]
36 Conclusions. We present an approach to short-sightedness in SSPs that applies it only to labeling. This allows larger sections of the state space to be explored and accelerates running times. Based on this idea, we introduce a novel extension of LRTDP called FLARES. Experimental results suggest that FLARES can produce near-optimal policies orders of magnitude faster than other state-of-the-art MDP solvers.
More informationApproximate Optimal-Value Functions. Satinder P. Singh Richard C. Yee. University of Massachusetts.
An Upper Bound on the oss from Approximate Optimal-Value Functions Satinder P. Singh Richard C. Yee Department of Computer Science University of Massachusetts Amherst, MA 01003 singh@cs.umass.edu, yee@cs.umass.edu
More informationReinforcement Learning
Reinforcement Learning Temporal Difference Learning Temporal difference learning, TD prediction, Q-learning, elibigility traces. (many slides from Marc Toussaint) Vien Ngo MLR, University of Stuttgart
More information15-780: ReinforcementLearning
15-780: ReinforcementLearning J. Zico Kolter March 2, 2016 1 Outline Challenge of RL Model-based methods Model-free methods Exploration and exploitation 2 Outline Challenge of RL Model-based methods Model-free
More information5. Solving the Bellman Equation
5. Solving the Bellman Equation In the next two lectures, we will look at several methods to solve Bellman s Equation (BE) for the stochastic shortest path problem: Value Iteration, Policy Iteration and
More informationA Gentle Introduction to Reinforcement Learning
A Gentle Introduction to Reinforcement Learning Alexander Jung 2018 1 Introduction and Motivation Consider the cleaning robot Rumba which has to clean the office room B329. In order to keep things simple,
More informationAn Analytic Solution to Discrete Bayesian Reinforcement Learning
An Analytic Solution to Discrete Bayesian Reinforcement Learning Pascal Poupart (U of Waterloo) Nikos Vlassis (U of Amsterdam) Jesse Hoey (U of Toronto) Kevin Regan (U of Waterloo) 1 Motivation Automated
More informationApproximate Dynamic Programming
Master MVA: Reinforcement Learning Lecture: 5 Approximate Dynamic Programming Lecturer: Alessandro Lazaric http://researchers.lille.inria.fr/ lazaric/webpage/teaching.html Objectives of the lecture 1.
More informationBayes-Adaptive POMDPs 1
Bayes-Adaptive POMDPs 1 Stéphane Ross, Brahim Chaib-draa and Joelle Pineau SOCS-TR-007.6 School of Computer Science McGill University Montreal, Qc, Canada Department of Computer Science and Software Engineering
More information