Planning Under Uncertainty II
- Bertha Horton
- 5 years ago
1 Planning Under Uncertainty II Intelligent Robotics 2014/15 Bruno Lacerda
2 Announcement
No class next Monday (17/11/2014).
3 Previous Lecture
Approach to cope with uncertainty on the outcome of actions: Markov Decision Processes (MDPs)
- Model definition
- Planning for MDPs (value iteration)
4 Previous Lecture
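5 Previous Lecture
An MDP is a tuple $\mathcal{M} = \langle S, \bar{s}, A, \delta \rangle$, where:
- $S$ is a finite set of states
- $\bar{s} \in S$ is the initial state
- $A$ is a finite set of actions
- $\delta : S \times A \times S \to [0, 1]$ is a probabilistic transition function. For all $s \in S$ and every action $a \in A$ that can be executed in $s$, $\sum_{s' \in S} \delta(s, a, s') = 1$
Markov assumption: the distribution over the next state depends only on the current state and the chosen action, not on the earlier history.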
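The tuple can be written down directly as a small data structure. This is a minimal sketch: the field names, the dictionary encoding of the transition function, and the two-location example are assumptions made here for illustration, not notation from the lecture. A missing (state, action) key is taken to mean that the action cannot be executed in that state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    states: frozenset
    initial: str
    actions: frozenset
    # transitions[(s, a)] = {s_next: probability}. A missing (s, a) key means
    # that a cannot be executed in s (an assumption of this sketch).
    transitions: dict

    def validate(self, tol=1e-9):
        """Raise ValueError unless every listed (s, a) row is a distribution."""
        if self.initial not in self.states:
            raise ValueError("initial state is not in the state set")
        for (s, a), dist in self.transitions.items():
            if s not in self.states or a not in self.actions:
                raise ValueError(f"unknown state or action in {(s, a)}")
            if any(t not in self.states or p < 0.0 for t, p in dist.items()):
                raise ValueError(f"bad successor or probability for {(s, a)}")
            if abs(sum(dist.values()) - 1.0) > tol:
                raise ValueError(f"probabilities for {(s, a)} do not sum to 1")
        return True


# Hypothetical two-location example: moving succeeds with probability 0.9.
mdp = MDP(
    states=frozenset({"hall", "room"}),
    initial="hall",
    actions=frozenset({"go", "stay"}),
    transitions={
        ("hall", "go"): {"room": 0.9, "hall": 0.1},
        ("hall", "stay"): {"hall": 1.0},
        ("room", "stay"): {"room": 1.0},
    },
)
```

Calling `mdp.validate()` checks that each listed row is a probability distribution over known states.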
6 Previous Lecture
Plans for MDPs are represented by policies. A memoryless policy is of the form $\pi : S \to A$. We can find memoryless policies that maximize the discounted cumulative reward for a reward structure $r$.
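For reference, the discounted cumulative reward objective can be written out as follows. The symbols are assumptions of this note rather than the slide's own notation: $\pi$ is the memoryless policy, $r(s, a)$ the reward for executing $a$ in $s$, $\gamma \in [0, 1)$ the discount factor, and $s_t$ the state at step $t$.

```latex
V^{\pi}(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} \, r\bigl(s_t, \pi(s_t)\bigr) \;\middle|\; s_0 = s \right],
\qquad
\pi^{*} \in \arg\max_{\pi} V^{\pi}(\bar{s})
```

Here $\bar{s}$ stands for the initial state of the MDP.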
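7 Previous Lecture
Value iteration:
1. For each state $s \in S$, initialize $V(s)$.
2. Repeat until the values converge: for each state $s \in S$, set $V(s) \leftarrow \max_{a \in A} \bigl[ r(s, a) + \gamma \sum_{s' \in S} \delta(s, a, s') \, V(s') \bigr]$.
3. Return the policy that picks, in each state, an action attaining the maximum.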
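A minimal runnable sketch of value iteration for the discounted-reward case. The dictionary encodings, the discount factor of 0.9, the convergence threshold, and the two-location example are all assumptions made for illustration. The sketch updates values in place within a sweep, which is one common variant.

```python
def value_iteration(states, actions, trans, reward, gamma=0.9, eps=1e-8):
    """Return (V, policy) for a discounted-reward MDP.

    trans[(s, a)] -> {s_next: prob}; reward[(s, a)] -> float.
    Assumes every state has at least one (s, a) entry in trans.
    """
    def q(s, a, V):
        return reward[(s, a)] + gamma * sum(p * V[t] for t, p in trans[(s, a)].items())

    def enabled(s):
        return [a for a in actions if (s, a) in trans]

    V = {s: 0.0 for s in states}              # initialise the values
    while True:                               # sweep until the values settle
        delta = 0.0
        for s in states:
            best = max(q(s, a, V) for a in enabled(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best                       # in-place update
        if delta < eps:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(enabled(s), key=lambda a: q(s, a, V)) for s in states}
    return V, policy


# Hypothetical example: being in "room" pays reward 1 per step.
states = ["hall", "room"]
actions = ["go", "stay"]
trans = {
    ("hall", "go"): {"room": 0.9, "hall": 0.1},
    ("hall", "stay"): {"hall": 1.0},
    ("room", "stay"): {"room": 1.0},
}
reward = {("hall", "go"): 0.0, ("hall", "stay"): 0.0, ("room", "stay"): 1.0}
V, policy = value_iteration(states, actions, trans, reward)
```

In this example the value of "room" approaches 1 / (1 - 0.9) = 10, and the greedy action in "hall" is "go".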
8 This Lecture
- Cost-optimal policy generation with temporal logic goals
- Partially observable Markov decision processes (POMDPs)
9 Cost-optimal Policy Generation with Temporal Logic Goals
10 Motivation
Sometimes we want to specify more intricate goals for our robot, e.g. "Patrol the hall area, and if someone asks to go to a room, guide them there".
Until now, we have used rewards to specify our goals. How would you create a reward structure for the task above?
- Going from a natural-language specification to a reward structure is far from straightforward
Linear temporal logic provides an intuitive way to specify such tasks.
11 Linear Temporal Logic
Extension of propositional logic which allows reasoning about infinite sequences of states.
Propositional connectives + new operators to reason about time:
- X operator: read "next". (X p) means that p must be true in the next state
12 Linear Temporal Logic
- G operator: read "always". (G p) means that p must be true in all states
13 Linear Temporal Logic
- F operator: read "eventually". (F p) means that there must exist at least one state where p is true
14 Linear Temporal Logic
- U operator: read "until". (p U q) means that p must be true in all states until we reach a state where q is true
15 Linear Temporal Logic
We will restrict ourselves to co-safe LTL
- Class of LTL formulas that can be satisfied by finite traces
- Co-safe: (¬p) U q, "not p until q"
- Not co-safe: G F p, "always eventually p"
- Syntactic restriction: formulas in positive normal form, using only the X, F and U operators
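To make "satisfied by finite traces" concrete, here is a small sketch of a checker for the X, F and U fragment over a finite trace. The tuple encoding of formulas and the example traces are assumptions of this sketch. "Next" is read strictly (the next position must exist), so a `True` result means the finite trace already witnesses the formula.

```python
def holds(f, trace, i=0):
    """Check formula f on the finite trace (a list of sets of propositions),
    starting at position i.

    Formulas are nested tuples (an encoding assumed for this sketch):
    ("ap", "p"), ("not", "p"), ("and", f, g), ("or", f, g),
    ("X", f), ("F", f), ("U", f, g). Negation is applied to atoms only.
    """
    if i >= len(trace):
        return False                      # nothing left to witness the formula
    op = f[0]
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return f[1] not in trace[i]
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "X":
        return holds(f[1], trace, i + 1)
    if op == "F":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "U":
        for j in range(i, len(trace)):
            if holds(f[2], trace, j):     # right-hand side reached
                return True
            if not holds(f[1], trace, j): # left-hand side broken too early
                return False
        return False
    raise ValueError(f"unknown operator: {op}")


not_p_until_q = ("U", ("not", "p"), ("ap", "q"))
good = [{"r"}, {"r"}, {"q"}, {"p"}]       # q is reached before any p
bad = [{"p"}, {"q"}]                      # p holds before q is reached
```

`holds(not_p_until_q, good)` is true because q appears at the third position with no p before it, while `holds(not_p_until_q, bad)` is false.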
16 Policy Generation for LTL
- $AP$ is a finite set of atomic propositions
- $Lab : S \to 2^{AP}$ maps each state to the set of atomic propositions that are true in that state
  - Logical representation of the current state of the system
- LTL formulas are written over $AP$, e.g. "don't leave the kitchen without the cup"
17 Policy Generation for LTL
Problem: given a co-safe LTL formula φ, find a policy that minimizes the expected cumulative cost to satisfy φ.
The automata-theoretic approach to LTL model checking is used
- A co-safe LTL formula φ can be translated into a deterministic finite-state automaton Aφ
18 Policy Generation for LTL
Figure: the MDP and the automaton are combined into the product MDP.
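A minimal sketch of how a product MDP can be built: each product state pairs an MDP state with an automaton state, and the automaton component advances on the label of the state being entered. The encodings, the convention that the initial product state already reads the initial label, the hand-written automaton for "visit v4 and v28", and the three-node line graph are all assumptions made here for illustration.

```python
def product_mdp(trans, init, label, dfa_step, dfa_init, dfa_accepting):
    """Build the reachable part of the product of an MDP and a DFA.

    trans[(s, a)] -> {s_next: prob}; label[s] -> set of propositions;
    dfa_step(q, props) -> q_next. Returns (start, prod_trans, accepting).
    """
    start = (init, dfa_step(dfa_init, label[init]))
    prod_trans, frontier, seen = {}, [start], {start}
    while frontier:
        s, q = frontier.pop()
        for (s0, a), dist in trans.items():
            if s0 != s:
                continue
            out = {}
            for t, p in dist.items():
                nxt = (t, dfa_step(q, label[t]))   # automaton reads the new label
                out[nxt] = out.get(nxt, 0.0) + p
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
            prod_trans[((s, q), a)] = out
    accepting = {sq for sq in seen if sq[1] in dfa_accepting}
    return start, prod_trans, accepting


# Hand-written DFA for "visit v4 and v28": its state is the set of goals seen so far.
GOALS = frozenset({"v4", "v28"})

def dfa_step(q, props):
    return q | (frozenset(props) & GOALS)

# Hypothetical deterministic line graph v1 - v4 - v28.
trans = {
    ("v1", "east"): {"v4": 1.0},
    ("v4", "east"): {"v28": 1.0},
    ("v4", "west"): {"v1": 1.0},
    ("v28", "west"): {"v4": 1.0},
}
label = {s: {s} for s in ("v1", "v4", "v28")}
start, prod_trans, accepting = product_mdp(
    trans, "v1", label, dfa_step, frozenset(), {GOALS}
)
```

The same location can now appear in several product states, for example v1 before and after v4 has been visited, which is what lets a policy behave differently depending on progress towards the goal.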
19 Policy Generation for LTL
Policy generation for co-safe LTL can be solved by performing value iteration on the product MDP:
- Last lecture: MDP + rewards (or costs), solved by value iteration
- Now: MDP + co-safe LTL is an equivalent problem to product MDP + costs, which is again solved by value iteration
- The automaton component of the states can be seen as a memory mechanism
- We are now generating finite-memory policies
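Once the product is built, the remaining step is ordinary value iteration that minimizes the expected cumulative cost to reach an accepting product state. The sketch below assumes positive costs, that every non-goal state has at least one action, and that the goal is reachable with probability 1. The state names, costs and probabilities in the example are hypothetical.

```python
def min_expected_cost(trans, cost, goal, eps=1e-9, max_iter=100_000):
    """Return (V, policy) minimising expected cumulative cost to reach goal.

    trans[(s, a)] -> {s_next: prob}; cost[(s, a)] -> positive float;
    goal is a set of states with cost-to-go 0.
    """
    states = {s for s, _ in trans} | {t for dist in trans.values() for t in dist}
    V = {s: 0.0 for s in states}

    def q(s, a):
        return cost[(s, a)] + sum(p * V[t] for t, p in trans[(s, a)].items())

    def enabled(s):
        return [a for (s0, a) in trans if s0 == s]

    for _ in range(max_iter):
        delta = 0.0
        for s in states - goal:
            best = min(q(s, a) for a in enabled(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {s: min(enabled(s), key=q_for(s, q)) for s in states - goal}
    return V, policy


def q_for(s, q):
    """Helper: fix the state so that actions can be ranked by their Q-value."""
    return lambda a: q(s, a)


# Hypothetical product states (location, automaton state).
s0, g = ("v13", "q0"), ("v17", "acc")
trans = {
    (s0, "direct"): {g: 0.5, s0: 0.5},   # cheap edge that fails half the time
    (s0, "detour"): {g: 1.0},            # reliable but more expensive
}
cost = {(s0, "direct"): 1.0, (s0, "detour"): 3.0}
V, policy = min_expected_cost(trans, cost, {g})
```

Here the direct edge has expected cost V = 1 + 0.5 V, so V = 2, which beats the detour's cost of 3. Raising the failure probability far enough would flip the choice.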
20 Application to Motion Planning
MDP model of the navigation graph. Probabilities of navigation failures and expected times between nodes are learned by the robot.
22 Application to Motion Planning
- No navigation failures
- Equal costs for all transitions
- Minimize expected cost to visit v4 and v28
- Case shown: none visited
23 Application to Motion Planning
- No navigation failures
- Equal costs for all transitions
- Minimize expected cost to visit v4 and v28
- Case shown: v4 visited, v28 not visited
24 Application to Motion Planning
- No navigation failures
- Equal costs for all transitions
- Minimize expected cost to visit v4 and v28
- Case shown: v28 visited, v4 not visited
25 Application to Motion Planning
- No navigation failures
- Equal costs for all transitions
- Minimize expected cost to visit v28 while avoiding v15, and then visit v15
- Case shown: v28 not visited
26 Application to Motion Planning
- No navigation failures
- Equal costs for all transitions
- Minimize expected cost to visit v28 while avoiding v15, and then visit v15
- Case shown: v28 visited
27 Application to Motion Planning
- No navigation failures
- Equal costs for all transitions
- Minimize expected cost to visit either v25 or v28
28 Application to Motion Planning
- Edges between first and second row with cost 10
- 10% failure probability between v13 and v17
- Minimize expected cost to visit either v25 or v28
29 Application to Motion Planning
- Edges between first and second row with cost 10
- 50% failure probability between v13 and v17
- Cost to recover from failure is 55
30 Summary
- LTL allows the specification of intricate tasks in an intuitive manner
- We reduce the policy generation problem for co-safe LTL to value iteration on a product MDP
- We can also send new goals during execution and regenerate optimal policies on the fly (not described in the lecture)
- This approach has been implemented for high-level motion planning
  - Future work: better MDP model of an office-like environment
31 Summary
This approach not only provides optimal policies, but also time estimates for their execution at different times of day
- Can be integrated with a scheduler that orders execution of tasks throughout the day
- Lenka Mudrová's research on
32 Reading
- [Lacerda, Parker, Hawes]: PlanSIG 13, IROS 14
- Principles of Model Checking [Baier, Katoen]: Chapters 5, 10
- Automated Planning (Theory and Practice) [Ghallab, Nau, Traverso]: Chapter 17
33 Partially Observable Markov Decision Processes
34 Motivation
We haven't used this belief: we assume perfect state estimation when generating policies/plans.
35 Motivation
Partially Observable Markov Decision Processes
36 Partially Observable MDPs
- $O$ is a set of observations (what the robot can see with its sensors)
- $Z$ is a set of conditional probabilities, a sensor model
  - $Z(o \mid s, a)$ is the probability of observing $o$ given that we reached $s$ after executing $a$
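37 Beliefs
The current state of execution on a POMDP is now a belief over S
- MDP: the current state is a single state in S
- POMDP: the current state is a probability distribution over S
Example: assume S has four states s1, s2, s3, s4
- MDP: s2, "I'm at state s2"
- POMDP: [0.2, 0.7, 0.1, 0], "I'm either at s1 with probability 0.2; or s2 with probability 0.7; or s3 with probability 0.1"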
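A belief is maintained by a Bayes-filter update after each action and observation: predict with the transition function, weight by the sensor model, then normalize. The sketch below is a minimal illustration. The dictionary encodings, the state, action and observation names, and the sensor probabilities are assumptions, not values from the lecture.

```python
def belief_update(belief, action, observation, trans, sensor):
    """Return the new belief after executing action and seeing observation.

    belief: {s: prob}; trans[(s, a)] -> {s_next: prob};
    sensor[(s_next, a)] -> {o: prob}, the probability of observing o given
    that s_next was reached after executing a.
    """
    predicted = {}
    for s, b in belief.items():                       # prediction step
        for t, p in trans.get((s, action), {}).items():
            predicted[t] = predicted.get(t, 0.0) + b * p
    weighted = {                                      # correction step
        t: v * sensor[(t, action)].get(observation, 0.0)
        for t, v in predicted.items()
    }
    norm = sum(weighted.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {t: v / norm for t, v in weighted.items()}


# Hypothetical example: the robot stays put and looks for a door.
trans = {("s1", "stay"): {"s1": 1.0}, ("s2", "stay"): {"s2": 1.0}}
sensor = {
    ("s1", "stay"): {"door": 0.8, "wall": 0.2},
    ("s2", "stay"): {"door": 0.2, "wall": 0.8},
}
prior = {"s1": 0.5, "s2": 0.5}
posterior = belief_update(prior, "stay", "door", trans, sensor)
```

Seeing "door" shifts the belief towards s1: 0.5 · 0.8 / (0.5 · 0.8 + 0.5 · 0.2) = 0.8.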
38 Policies
Now our policies need to be over the belief space
- MDP: $\pi : S \to A$
- POMDP: $\pi : B \to A$, where the belief space $B$ is an infinite set
39 Policies
Fortunately, for the maximum discounted cumulative reward problem, there are optimal policies that partition the belief space into a finite number of sets.
For example, an optimal policy can take the form of a case distinction on the belief, choosing one action in each region.
Of course, in general these policies have many more cases, but they can still be represented finitely.
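One common finite representation of such a policy, shown here as an illustration and not necessarily the form used on the slide, is a set of "alpha vectors": each vector is tied to an action and gives a linear value over beliefs, and the policy picks the action whose vector scores highest. The regions where each vector wins partition the belief space. The two vectors below are hypothetical numbers for a two-state problem.

```python
def alpha_vector_policy(belief, alpha_vectors):
    """Pick the action whose alpha vector has the largest dot product
    with the belief.

    belief: list of probabilities, one per state;
    alpha_vectors: list of (action, vector) pairs.
    """
    def value(vec):
        return sum(b * v for b, v in zip(belief, vec))

    action, _ = max(alpha_vectors, key=lambda av: value(av[1]))
    return action


# Hypothetical vectors: "go" pays off only if the robot is in the first state.
ALPHAS = [("go", [1.0, 0.0]), ("wait", [0.4, 0.4])]

confident = alpha_vector_policy([0.9, 0.1], ALPHAS)
unsure = alpha_vector_policy([0.2, 0.8], ALPHAS)
```

With these numbers the belief space splits into two regions: "go" whenever the probability of the first state exceeds 0.4, and "wait" otherwise.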
40 Policies
There are algorithms to generate these policies. However, they are hard:
- They are pretty involved and require some effort to understand
- In computational complexity terms, generating policies for POMDPs is PSPACE-hard
  - Awful scaling properties
We won't get into these solvers in this course.
41 Summary
POMDP models allow coping with uncertain outcomes of actions, plus uncertainty about the current state
- Very elegant models for describing robot systems
However, generating policies for them, while possible, is computationally very hard
- Lots of current research focuses on tackling this issue
42 Reading
- Probabilistic Robotics [Thrun, Burgard, Fox]: Chapters 15, 16
- POMDP webpage [Cassandra]:
More informationOptimally Solving Dec-POMDPs as Continuous-State MDPs
Optimally Solving Dec-POMDPs as Continuous-State MDPs Jilles Dibangoye (1), Chris Amato (2), Olivier Buffet (1) and François Charpillet (1) (1) Inria, Université de Lorraine France (2) MIT, CSAIL USA IJCAI
More informationBüchi Automata and Linear Temporal Logic
Büchi Automata and Linear Temporal Logic Joshua D. Guttman Worcester Polytechnic Institute 18 February 2010 Guttman ( WPI ) Büchi & LTL 18 Feb 10 1 / 10 Büchi Automata Definition A Büchi automaton is a
More informationProbabilistic robot planning under model uncertainty: an active learning approach
Probabilistic robot planning under model uncertainty: an active learning approach Robin JAULMES, Joelle PINEAU and Doina PRECUP School of Computer Science McGill University Montreal, QC CANADA H3A 2A7
More informationA Framework for Automated Competitive Analysis of On-line Scheduling of Firm-Deadline Tasks
A Framework for Automated Competitive Analysis of On-line Scheduling of Firm-Deadline Tasks Krishnendu Chatterjee 1, Andreas Pavlogiannis 1, Alexander Kößler 2, Ulrich Schmid 2 1 IST Austria, 2 TU Wien
More informationArtificial Intelligence & Sequential Decision Problems
Artificial Intelligence & Sequential Decision Problems (CIV6540 - Machine Learning for Civil Engineers) Professor: James-A. Goulet Département des génies civil, géologique et des mines Chapter 15 Goulet
More informationDuality in Probabilistic Automata
Duality in Probabilistic Automata Chris Hundt Prakash Panangaden Joelle Pineau Doina Precup Gavin Seal McGill University MFPS May 2006 Genoa p.1/40 Overview We have discovered an - apparently - new kind
More informationRobust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications
Robust Control of Uncertain Markov Decision Processes with Temporal Logic Specifications Eric M. Wolff, Ufuk Topcu, and Richard M. Murray Abstract We present a method for designing a robust control policy
More informationReinforcement Learning
Reinforcement Learning Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ Task Grasp the green cup. Output: Sequence of controller actions Setup from Lenz et. al.
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More informationEfficient Maximization in Solving POMDPs
Efficient Maximization in Solving POMDPs Zhengzhu Feng Computer Science Department University of Massachusetts Amherst, MA 01003 fengzz@cs.umass.edu Shlomo Zilberstein Computer Science Department University
More information