Planning Under Uncertainty II


1 Planning Under Uncertainty II Intelligent Robotics 2014/15 Bruno Lacerda

2 Announcement No class next Monday - 17/11/2014

3 Previous Lecture Approach to cope with uncertainty in the outcome of actions: Markov Decision Processes (MDPs) - Model definition - Planning for MDPs (value iteration)

4 Previous Lecture (recap figure)

5 Previous Lecture An MDP is a tuple M = ⟨S, s0, A, δ⟩, where: S is a finite set of states; s0 ∈ S is the initial state; A is a finite set of actions; δ : S × A × S → [0, 1] is a probabilistic transition function, where δ(s, a, s') is the probability of moving to s' when executing a in s (Markov assumption: this probability depends only on the current state). For all s ∈ S and a ∈ A, Σ_{s' ∈ S} δ(s, a, s') ∈ {0, 1}, with sum 0 meaning a is not enabled in s
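To make the tuple concrete, here is a minimal sketch of an MDP as plain Python data; the states, actions and probabilities are invented for illustration.

```python
# A toy MDP <S, s_init, A, delta> as plain Python data.
# All names and probabilities are made up for illustration.
mdp = {
    "S": ["s0", "s1", "s2"],
    "s_init": "s0",
    "A": ["go", "stay"],
    # delta maps (state, action) to a distribution over successor states;
    # each distribution sums to 1, and a missing pair means the action
    # is not enabled in that state.
    "delta": {
        ("s0", "go"):   {"s1": 0.8, "s0": 0.2},  # "go" may fail and stay put
        ("s0", "stay"): {"s0": 1.0},
        ("s1", "go"):   {"s2": 0.9, "s0": 0.1},
        ("s1", "stay"): {"s1": 1.0},
        ("s2", "stay"): {"s2": 1.0},
    },
}

def enabled_actions(mdp, s):
    """Actions that have a transition distribution defined in state s."""
    return [a for a in mdp["A"] if (s, a) in mdp["delta"]]
```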

6 Previous Lecture Plans for MDPs are represented by policies. A memoryless policy is of the form π : S → A. We can find memoryless policies that maximize the expected discounted cumulative reward E[Σ_{t≥0} γ^t r(s_t, a_t)] for a reward structure r : S × A → ℝ

7 Previous Lecture Value iteration: 1. For each s ∈ S do: V(s) := 0 end for 2. Repeat: for each s ∈ S do: V'(s) := max_{a ∈ A} [ r(s, a) + γ Σ_{s' ∈ S} δ(s, a, s') V(s') ] end for; V := V', until max_{s ∈ S} |V'(s) − V(s)| < ε 3. Return V
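A runnable sketch of this loop, reusing the toy mdp above; the reward structure, discount gamma and threshold eps below are illustrative assumptions.

```python
def value_iteration(mdp, reward, gamma=0.95, eps=1e-6):
    """Repeat Bellman backups until the value function stops changing.

    reward maps (state, action) to an immediate reward."""
    V = {s: 0.0 for s in mdp["S"]}            # step 1: V(s) := 0
    while True:                               # step 2: repeat ...
        V_new = {
            s: max(
                reward[(s, a)]
                + gamma * sum(p * V[s2]
                              for s2, p in mdp["delta"][(s, a)].items())
                for a in enabled_actions(mdp, s)
            )
            for s in mdp["S"]
        }
        if max(abs(V_new[s] - V[s]) for s in mdp["S"]) < eps:
            return V_new                      # step 3: return V
        V = V_new

# Example: reward 1 for acting in s2, 0 otherwise (illustrative numbers).
reward = {sa: (1.0 if sa[0] == "s2" else 0.0) for sa in mdp["delta"]}
print(value_iteration(mdp, reward))
```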

8 This Lecture Cost-optimal policy generation with temporal logic goals Partially observable Markov decision processes (POMDPs)

9 Cost-optimal Policy Generation with Temporal Logic Goals

10 Motivation Sometimes we want to specify more intricate goals for our robot: patrol the hall area and, if someone asks to go to a room, guide them there. Until now, we have used rewards to specify our goals. How would you create a reward structure for the task above? - Going from a natural language specification to a reward structure is far from straightforward. Linear temporal logic provides an intuitive way to specify such tasks

11 Linear Temporal Logic Extension of propositional logic which allows reasoning about infinite sequences of states. Propositional connectives + new operators to reason about time: - X operator: read next - (X p) means that p must be true in the next state (example trace shown in the slide figure)

12 Linear Temporal Logic - G operator: read always - (G p) means that p must be true in all states (example trace shown in the slide figure)

13 Linear Temporal Logic - F operator: read eventually - (F p) means that there must exist at least one state where p is true (example trace shown in the slide figure)

14 Linear Temporal Logic - U operator: read until - (p U q) means that p must be true in all states until we reach a state where q is true (example trace shown in the slide figure)

15 Linear Temporal Logic We will restrict ourselves to co-safe LTL - the class of LTL formulas that can be satisfied by finite traces. For example, (¬p) U q ('not p until q') can be satisfied by a finite trace, while G F p ('always eventually p') cannot (example traces shown in the slide figure). Syntactic restriction: formulas in positive normal form, using only the 'X', 'F' and 'U' operators

16 Policy Generation for LTL AP is a finite set of atomic propositions. A labelling function L : S → 2^AP maps each state to the set of atomic propositions that are true in that state - a logical representation of the current state of the system. LTL formulas are written over AP, e.g., don't leave the kitchen without the cup

17 Policy Generation for LTL Problem: given an MDP with a cost structure c : S × A → ℝ≥0 and a co-safe LTL formula φ, find a policy that minimizes the expected cumulative cost to satisfy φ. The automata-theoretic approach to LTL model checking is used - a co-safe LTL formula φ can be translated into a deterministic finite state automaton A_φ that accepts exactly the finite traces satisfying φ
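In practice A_φ is produced automatically by a translation tool; purely as an illustration, here is a hand-built two-state DFA for the co-safe formula F p ('eventually p'), reading state labels of the kind produced by L.

```python
# Hand-built DFA for the co-safe formula "F p" ("eventually p").
# Real systems generate A_phi automatically; this tiny automaton is
# an illustration only.
dfa = {
    "Q": ["q0", "q1"],
    "q_init": "q0",
    "acc": {"q1"},  # accepting: the formula has been satisfied
    # The DFA reads the label L(s) of each MDP state visited: once a
    # state satisfying p is seen, we move to (and stay in) q1.
    "trans": lambda q, label: "q1" if q == "q1" or "p" in label else "q0",
}
```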

18 Policy Generation for LTL MDP × Automaton → Product MDP: the states of the product are pairs (s, q) of an MDP state and an automaton state, so the product keeps track of both the system state and the progress towards satisfying φ (construction shown as a figure on the slide)
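A sketch of the construction, combining the toy mdp and dfa above; the labelling dictionary labels stands in for L and is invented for illustration.

```python
def product_mdp(mdp, labels, dfa):
    """Product of an MDP and a DFA: states are pairs (s, q).

    labels maps each MDP state to its set of atomic propositions,
    i.e. the labelling function L. The automaton component q tracks
    progress towards satisfying the formula."""
    prod = {"S": [], "delta": {}}
    for s in mdp["S"]:
        for q in dfa["Q"]:
            prod["S"].append((s, q))
            for a in enabled_actions(mdp, s):
                dist = {}
                for s2, p in mdp["delta"][(s, a)].items():
                    q2 = dfa["trans"](q, labels[s2])  # DFA reads L(s')
                    dist[(s2, q2)] = dist.get((s2, q2), 0.0) + p
                prod["delta"][((s, q), a)] = dist
    # Initial state: the DFA first reads the label of the initial MDP state.
    prod["s_init"] = (mdp["s_init"],
                      dfa["trans"](dfa["q_init"], labels[mdp["s_init"]]))
    return prod

# Example labelling (illustrative): proposition p holds only in s2.
labels = {"s0": set(), "s1": set(), "s2": {"p"}}
prod = product_mdp(mdp, labels, dfa)
```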

19 Policy Generation for LTL Policy generation for co-safe LTL can be solved by performing value iteration on the product MDP: - MDP + rewards (or costs) → value iteration (last lecture) - MDP + co-safe LTL → product MDP + costs (equivalent problems) → value iteration - The automaton component of the states can be seen as a memory mechanism: we are now generating finite-memory policies

20 Application to Motion Planning MDP model of navigation graph. Probabilities of navigation failures and expected time between nodes are learned by the robot

21 Application to Motion Planning MDP model of navigation graph. Probabilities of navigation failures and expected time between nodes are learned by the robot

22 Application to Motion Planning No navigation failures Equal costs for all transitions Minimize expected cost to visit v4 and v28 None visited

23 Application to Motion Planning No navigation failures Equal costs for all transitions Minimize expected cost to visit v4 and v28 v4 visited, v28 not visited

24 Application to Motion Planning No navigation failures Equal costs for all transitions Minimize expected cost to visit v4 and v28 v28 visited, v4 not visited

25 Application to Motion Planning No navigation failures Equal costs for all transitions Minimize expected cost to visit v28 while avoiding v15, and then visit v15 v28 not visited

26 Application to Motion Planning No navigation failures Equal costs for all transitions Minimize expected cost to visit v28 while avoiding v15, and then visit v15 v28 visited

27 Application to Motion Planning No navigation failures Equal costs for all transitions Minimize expected cost to visit either v25 or v28

28 Application to Motion Planning Edges between first and second row with cost 10 10% failure probability between v13 and v17 Minimize expected cost to visit either v25 or v28

29 Application to Motion Planning Edges between first and second row with cost 10 50% failure probability between v13 and v17 Cost to recover from failure is 55

30 Summary LTL allows the specification of intricate tasks in an intuitive manner We reduce the policy generation problem for co-safe LTL to value iteration on a product MDP We can also send new goals during execution and regenerate optimal policies on-the-fly (not described in the lecture) This approach has been implemented for high-level motion planning - Future work: a better MDP model of an office-like environment

31 Summary This approach not only provides optimal policies, but also time estimates for their execution at different times of day - Can be integrated with a scheduler that orders execution of tasks throughout the day - Lenka Mudrová's research

32 Reading [Lacerda, Parker, Hawes] - PlanSIG 13, IROS 14 Principles of Model Checking [Baier, Katoen] - Chapters 5, 10 Automated Planning (Theory and Practice) [Ghallab, Nau, Traverso] - Chapter 17

33 Partially Observable Markov Decision Processes

34 Motivation We haven't used this belief, and assume perfect state estimation when generating policies/plans (slide shows a localization belief figure)

35 Motivation We haven't used this belief, and assume perfect state estimation when generating policies/plans → Partially Observable Markov Decision Processes

36 Partially Observable MDPs A POMDP extends an MDP with: Z, a set of observations (what the robot can see with its sensors), and a sensor model O, a set of conditional probabilities - O(o | s, a) is the probability of observing o given that we reached s after executing a

37 Beliefs The current state of execution in a POMDP is now a belief over S - MDP: the current state s ∈ S - POMDP: a belief b : S → [0, 1] with Σ_{s ∈ S} b(s) = 1 Example: assume S = {s1, s2, s3, s4} - MDP: s2 ("I'm at state s2") - POMDP: [0.2, 0.7, 0.1, 0] ("I'm either at s1 with probability 0.2; or s2 with probability 0.7; or s3 with probability 0.1")
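Beliefs are maintained by Bayesian filtering: predict through the transition function, then correct with the sensor model. A minimal sketch, reusing the toy mdp above and assuming a sensor model stored as obs[(o, s', a)]:

```python
def belief_update(b, a, o, mdp, obs):
    """Bayes filter: new belief after executing a and observing o.

    obs[(o, s2, a)] is the (assumed) layout of the sensor model: the
    probability of observing o given we reached s2 after executing a."""
    new_b = {}
    for s2 in mdp["S"]:
        # Prediction: push the current belief through the transition function.
        pred = sum(b[s] * mdp["delta"].get((s, a), {}).get(s2, 0.0)
                   for s in mdp["S"])
        # Correction: weight by how likely the observation is in s2.
        new_b[s2] = obs.get((o, s2, a), 0.0) * pred
    total = sum(new_b.values())  # normalisation constant P(o | b, a)
    return {s: p / total for s, p in new_b.items()} if total > 0 else new_b
```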

38 Policies Now our policies need to be defined over the belief space - MDP: π : S → A - POMDP: π : B → A, where B, the set of beliefs over S, is an infinite set

39 Policies Fortunately, for the maximum discounted cumulative reward problem, there are optimal policies that partition the belief space into a finite number of sets. For example, an optimal policy can be a case statement mapping each such region of the belief space to a single action (example shown on the slide). Of course, in general these policies have many more cases, but they can still be represented finitely
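One common finite representation (not detailed in the lecture) tags each region with a linear "alpha vector"; the policy executes the action of the vector with the highest expected value at the current belief. A toy sketch with invented numbers:

```python
# Each alpha vector is a linear value function over beliefs, tagged with
# the action to execute where that vector is maximal. Invented numbers.
alpha_vectors = [
    ({"s1": 1.0, "s2": 0.0}, "a1"),  # best when we believe we are in s1
    ({"s1": 0.2, "s2": 0.9}, "a2"),  # best when we believe we are in s2
]

def policy(belief):
    """Execute the action of the alpha vector maximizing alpha . b."""
    dot = lambda alpha: sum(v * belief.get(s, 0.0) for s, v in alpha.items())
    best_alpha, best_action = max(alpha_vectors, key=lambda av: dot(av[0]))
    return best_action

print(policy({"s1": 0.8, "s2": 0.2}))  # -> "a1"
```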

40 Policies There are algorithms to generate these policies. However, they are hard: - pretty involved, requiring some effort to understand - in computational complexity terms, generating policies for POMDPs is PSPACE-hard, which means awful scaling properties We won't get into these solvers in this course

41 Summary POMDP models allow coping with uncertain outcomes of actions, plus uncertainty about the current state - very elegant models for describing robot systems However, generating policies for them, while possible, scales very poorly - much current research focuses on tackling this issue

42 Reading Probabilistic Robotics [Thrun, Burgard, Fox] - Chapters 15, 16 POMDP webpage [Cassandra]
