AUTONOMOUS SYSTEMS. Task Planning. Pedro U. Lima M. Isabel Ribeiro Luis Custódio


1 AUTONOMOUS SYSTEMS Task Planning Pedro U. Lima M. Isabel Ribeiro Luis Custódio Institute for Systems and Robotics Instituto Superior Técnico Lisbon, Portugal March 2007 Revised by Pedro U. Lima in November 2015

2 Outline 1. Planning Problem 2. Logic 3. Logic-Based Planning: Situation Calculus, STRIPS 4. Plan Representation and Modeling: Petri Net Task Models 5. Plan Analysis 6. Planning Under Uncertainty 7. Markov Decision Processes (MDP) 8. Dynamic Programming Solution of MDPs 9. Reinforcement Learning Solution of MDPs

3 Planning Planning consists of determining the sequence of actions that enables reaching the goal(s) of an agent. Robot Task Planning consists of determining the appropriate sequence of actions to move a robot from the current world situation to a world situation that satisfies its preferences.

4 Planning Robot Task Planning [Courtesy of JSK Lab U. Tokyo, Japan]

5 Logic Logic can be seen as a language to represent the knowledge about the world and a particular problem to be solved. Syntactic system: an alphabet (set of accepted symbols) plus formation rules (rules establishing how symbols can be aggregated so as to build formulas/sentences) define the LANGUAGE; inference rules are the set of rules that establish how to derive formulas from other formulas.

6 Logic Semantic system: assigns a meaning to the language formulas, linking formulas (language, syntax) to facts in the world (semantics).

7 Logic Syntactic system vs semantic system. Language rules: g + r + e + e + n -> "green" (syntax); the semantics associates a color to the word "green". Arithmetic rules: if x, y are expressions representing numbers, then x > y is a formula over numbers (syntax); the fact is true when the number represented by x is greater than the number represented by y (semantics).

8 Logic Typically, one deals only with the aspects of the world relevant to the problem, through a conceptualization of reality. Objects and their relations are defined. Functions: given a set of objects, a function establishes which object is related to the object(s) in the set and how, e.g., left_room(kitchen). Relations: given a set of objects, a relation establishes whether that set is related in a certain way, e.g., on(laptop, table).

9 Logic The concept of interpretation establishes the link between the language elements and the elements of the conceptualization of reality (objects, functions and relations). Given a formula written in the defined language, its interpretation is designated a proposition. A proposition is true iff it correctly describes the world, based on the adopted conceptualization of reality. A formula is satisfied iff there is an interpretation that associates it with a true proposition.

10 Logic A fact is a true proposition for a given (conceptualized) world state. The initially known facts compose the initial knowledge base. Inference is the process of obtaining new propositions (conclusions) from the knowledge base. To ensure that a reached conclusion is satisfied by the adopted interpretation, only a conclusion satisfied for all the interpretations that satisfy the starting propositions (premises) is accepted. This way, we guarantee that, should the premises be satisfied, so is the conclusion, irrespective of the interpretation. Inference Rule Ex. (Modus Ponens): Premises: IF on(a,b) THEN above(a,b); on(a,b). Conclusion: above(a,b).

11 Logic Propositional logic: facts. Predicate logic: objects, functions and relations; variables; quantifiers.

12 Propositional vs Predicate Logic Example: rooms S1, S2, S3; door P1 connects S1 and S2, door P2 connects S2 and S3; robot R in room S1, box B in room S3. World Model (KB): Propositional logic: robot_inroom_s1, box_inroom_s3, door_p1_connects_rooms_s1_s2, door_p2_connects_rooms_s2_s3. Predicate logic: inroom(<OBJ>, <ROOM>) with <OBJ> <- robot, <ROOM> <- S1 and <OBJ> <- box, <ROOM> <- S3; connects(<DOOR>, <ROOM1>, <ROOM2>) with <DOOR> <- P1, <ROOM1> <- S1, <ROOM2> <- S2 and <DOOR> <- P2, <ROOM1> <- S2, <ROOM2> <- S3.
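
As an illustration of the predicate-logic representation above, the following minimal Python sketch (the names kb, holds and rooms_of are ours, not from the slides) stores the world model as a set of ground literals and answers simple queries:

```python
# Minimal sketch of the predicate-logic world model above (hypothetical helper names).

# Knowledge base: set of ground literals (predicate, arguments...)
kb = {
    ("inroom", "robot", "S1"),
    ("inroom", "box", "S3"),
    ("connects", "P1", "S1", "S2"),
    ("connects", "P2", "S2", "S3"),
}

def holds(predicate, *args):
    """A ground literal is satisfied iff it belongs to the knowledge base."""
    return (predicate, *args) in kb

def rooms_of(obj):
    """Instantiate the variable <ROOM> in inroom(obj, <ROOM>)."""
    return [fact[2] for fact in kb if fact[0] == "inroom" and fact[1] == obj]

print(holds("inroom", "box", "S3"))   # True
print(rooms_of("robot"))              # ['S1']
```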

13 Situation Calculus Logic handles the truth of propositions, not action execution: logic cannot tell which action should be executed; at most it can suggest the possible actions. Time and change are not adequately handled by basic logic (propositional, predicate). Idea: the world state is represented by a proposition set; the set is changed according to received perceptions and executed actions; the world evolution is described by diachronic rules, which express how the world changes (representation of change). Situation Calculus attempts to solve the problems associated with representation and reasoning under change. It is based on predicate logic and describes the world as a sequence of situations, each of which represents a world state.

14 Situation Calculus One situation is generated from another situation by executing an action. An argument is added to each property (represented by a predicate) that may change, denoting the situation where the property is satisfied. Ex: localization(agent, (1,1), S_0), localization(agent, (1,2), S_1). To represent passing from one situation to another, the following function is used: Result(action, situation): A × Σ → Σ. Ex: Result(go_ahead, S_0) = S_1.

15 Situation Calculus Effect Axioms: pre-conditions (to execute the action) ⇒ predicate (whose logical value changes after the action is executed). They state the action effects, describing the change(s) due to the action, e.g.: ∀x ∀s Present(x, s) ∧ Portable(x) ⇒ Hold(x, Result(pickup, s)); ∀x ∀s Hold(x, s) ⇒ ¬Hold(x, Result(release, s)).

16 Situation Calculus Frame Axioms: predicate (logical value in the current situation) ∧ conditions (for no change) ⇒ predicate (in the situation following the action). One needs to state what does not change due to the action execution, e.g.: ∀a ∀x ∀s Hold(x, s) ∧ (a ≠ release) ⇒ Hold(x, Result(a, s)); ∀a ∀x ∀s ¬Hold(x, s) ∧ (a ≠ pickup ∨ ¬(Present(x, s) ∧ Portable(x))) ⇒ ¬Hold(x, Result(a, s)).

17 Situation Calculus Successor State Axioms merge effect and frame axioms: predicate true in the next situation ⇔ [ some action makes it true ∨ (it was true in the previous situation ∧ no action made it false) ], e.g.: ∀a ∀x ∀s Hold(x, Result(a, s)) ⇔ [ (a = pickup ∧ Present(x, s) ∧ Portable(x)) ∨ (Hold(x, s) ∧ a ≠ release) ]; ∀a ∀x ∀s ¬Hold(x, Result(a, s)) ⇔ [ (a = release) ∨ (¬Hold(x, s) ∧ ¬(a = pickup ∧ Present(x, s) ∧ Portable(x))) ].

18 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Effect Axioms: ∀x ∀y ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ⇒ On(x, y, Result(PutOn(x,y), s)); ∀x ∀y ∀w ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ On(x, w, s) ⇒ ClearTop(w, Result(PutOn(x,y), s)).

19 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Frame Axioms: ∀a ∀x ∀y ∀z ∀s On(x, y, s) ∧ (a ≠ PutOn(x, z)) ⇒ On(x, y, Result(a, s)); ∀a ∀x ∀y ∀s ClearTop(y, s) ∧ (a ≠ PutOn(x, y)) ⇒ ClearTop(y, Result(a, s)).

20 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Resulting Successor State Axioms: ∀x ∀y ∀z ∀a ∀s On(x, y, Result(a, s)) ⇔ [ (a = PutOn(x,y) ∧ On(x, z, s) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ Block(x) ∧ (Block(y) ∨ y = M)) ∨ (a ≠ PutOn(x, z) ∧ On(x, y, s)) ]; ∀x ∀y ∀z ∀a ∀s ClearTop(z, Result(a, s)) ⇔ [ (a = PutOn(x,y) ∧ On(x, z, s) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ Block(x) ∧ (Block(y) ∨ y = M)) ∨ (a ≠ PutOn(x, z) ∧ ClearTop(z, s)) ].

21 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (C on the table M). Final Situation: C on B on A. Action sequence? Predicates: On(x, y, s), ClearTop(x, s), Block(x). Objects: A, B, C, M (blocks and table). Action: PutOn(x, y). Initial State: Block(A) ∧ Block(B) ∧ Block(C) ∧ On(C, M, s_0) ∧ On(B, C, s_0) ∧ On(A, B, s_0) ∧ ClearTop(A, s_0) ∧ ClearTop(M, s_0). Goal State (for some s): Block(A) ∧ Block(B) ∧ Block(C) ∧ On(A, M, s) ∧ On(B, A, s) ∧ On(C, B, s) ∧ ClearTop(C, s) ∧ ClearTop(M, s).

22 Situation Calculus Example (Blocks World) Effect Axioms: E1) ∀x ∀y ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ⇒ On(x, y, Result(PutOn(x,y), s)); E2) ∀x ∀y ∀w ∀s Block(x) ∧ (Block(y) ∨ y = M) ∧ ClearTop(x, s) ∧ ClearTop(y, s) ∧ On(x, w, s) ⇒ ClearTop(w, Result(PutOn(x,y), s)). Frame Axioms: N1) ∀a ∀x ∀y ∀z ∀s On(x, y, s) ∧ (a ≠ PutOn(x, z)) ⇒ On(x, y, Result(a, s)); N2) ∀a ∀x ∀y ∀s ClearTop(y, s) ∧ (a ≠ PutOn(x, y)) ⇒ ClearTop(y, Result(a, s)). Initial Situation (s_0, stack: A on B on C on the table M): Block(A); Block(B); Block(C); On(C, M, s_0); On(B, C, s_0); On(A, B, s_0); ClearTop(A, s_0); ClearTop(M, s_0). In s_0, axiom E1) is applicable with x=A and y=M: Block(A) ∧ (Block(M) ∨ M = M) ∧ ClearTop(A, s_0) ∧ ClearTop(M, s_0) ⇒ On(A, M, Result(PutOn(A,M), s_0)). If s_1 = Result(PutOn(A,M), s_0), then On(A, M, s_1). In s_0, axiom E2) is applicable with x=A, y=M and w=B, so ClearTop(B, s_1). In s_0, axiom N1) is applicable with w=A, z=M, x=C and y=M, so On(C, M, s_1). In s_0, axiom N1) is applicable with w=A, z=M, x=B and y=C, so On(B, C, s_1). In s_0, axiom N1) is not applicable with w=A, z=M, x=A and y=B (i.e., On(A, B, s_1) is false). In s_0, axiom N2) is applicable with x=A, w=M and y=A, so ClearTop(A, s_1). Situation s_1 (stack: B on C on the table; A on the table): On(A, M, s_1); On(C, M, s_1); On(B, C, s_1); ClearTop(B, s_1); ClearTop(A, s_1); ClearTop(M, s_1).

23 Situation Calculus Example (Blocks World) In s_1 (stack: B on C on the table; A on the table), using the same Effect Axioms E1), E2), Frame Axioms N1), N2) and initial situation s_0 as above: In s_1, axiom E1) is applicable with x=B and y=A: Block(B) ∧ (Block(A) ∨ A = M) ∧ ClearTop(B, s_1) ∧ ClearTop(A, s_1) ⇒ On(B, A, Result(PutOn(B,A), s_1)). If s_2 = Result(PutOn(B,A), s_1), then On(B, A, s_2). In s_1, axiom E2) is applicable with x=B, y=A and w=C, so ClearTop(C, s_2). In s_1, axiom N1) is applicable with w=B, z=A, x=C and y=M, so On(C, M, s_2). In s_1, axiom N1) is applicable with w=B, z=A, x=A and y=M, so On(A, M, s_2). In s_1, axiom N1) is not applicable with w=B, z=A, x=B and y=C (i.e., On(B, C, s_2) is false). In s_1, axiom N2) is applicable with x=B, w=A and y=B, so ClearTop(B, s_2). In s_1, axiom N2) is not applicable with x=B, w=A and y=A (i.e., ClearTop(A, s_2) is false). Situation s_2 (stack: B on A on the table; C on the table): On(A, M, s_2); On(C, M, s_2); On(B, A, s_2); ClearTop(C, s_2); ClearTop(B, s_2); ClearTop(M, s_2).

24 Situation Calculus Example (Blocks World) In s_2 (stack: B on A on the table; C on the table), using the same Effect Axioms E1), E2), Frame Axioms N1), N2) and initial situation s_0 as above: In s_2, axiom E1) is applicable with x=C and y=B: Block(C) ∧ (Block(B) ∨ B = M) ∧ ClearTop(C, s_2) ∧ ClearTop(B, s_2) ⇒ On(C, B, Result(PutOn(C,B), s_2)). If s_3 = Result(PutOn(C,B), s_2), then On(C, B, s_3). In s_2, axiom N1) is applicable with w=C, z=B, x=B and y=A, so On(B, A, s_3). In s_2, axiom N1) is applicable with w=C, z=B, x=A and y=M, so On(A, M, s_3). In s_2, axiom N1) is not applicable with w=C, z=B, x=C and y=M (i.e., On(C, M, s_3) is false). In s_2, axiom N2) is applicable with x=C, w=B and y=C, so ClearTop(C, s_3). In s_2, axiom N2) is not applicable with x=C, w=B and y=B (i.e., ClearTop(B, s_3) is false). Situation s_3 (stack: C on B on A on the table): On(C, B, s_3); On(B, A, s_3); On(A, M, s_3); ClearTop(C, s_3); ClearTop(M, s_3).

25 Situation Calculus Example (Blocks World) Initial Situation: A on B on C (on the table). Action Sequence (plan): [ PutOn(A, M), PutOn(B, A), PutOn(C, B) ]. Final Situation: C on B on A (on the table).
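
A compact way to check the plan above is to simulate the effect of PutOn on a symbolic state. The Python sketch below is our own encoding (with the table called M as in the slides): moving x onto y requires both to be clear, makes On(x, y) true and clears whatever x was previously on.

```python
# Sketch: simulate the blocks-world plan [PutOn(A,M), PutOn(B,A), PutOn(C,B)].
# State: dict "on" mapping each block to what it sits on; the table M is always clear.

def clear(on, x):
    """A block (or the table M) is clear if nothing is on top of it."""
    return x == "M" or x not in on.values()

def put_on(on, x, y):
    """Effect of PutOn(x, y): requires ClearTop(x) and ClearTop(y)."""
    assert clear(on, x) and clear(on, y), f"PutOn({x},{y}) not applicable"
    new_on = dict(on)
    new_on[x] = y
    return new_on

state = {"C": "M", "B": "C", "A": "B"}          # initial situation: A on B on C on the table
plan = [("A", "M"), ("B", "A"), ("C", "B")]
for x, y in plan:
    state = put_on(state, x, y)
print(state)   # {'C': 'B', 'B': 'A', 'A': 'M'} -> C on B on A, as in the final situation
```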

26 Complexity of Planning Problem The problem is intractable in the general case. Simplifying assumptions: the agent knows everything that is relevant for the planning problem; the agent knows how its available actions can change the world state from one state to another; the planning agent is in control of the world: the only state changes are the result of its deliberate actions; the agent's preferred world states are constant during a planning episode. Based on these assumptions, a typical approach is: first formulate the plan, then execute it.

27 Extensions of Planning Problem The real world surrounding the robot does not meet most of the simplifying assumptions, especially in dynamic, uncertain environments. EXTENSIONS: conditional planning: handles uncertainty by enumerating the possible states that may arise after the execution of an action and provides alternative courses of action for each of them; plan monitoring and repair: during plan execution, progress is monitored and, when deviations from the predicted nominal conditions occur, the plan execution halts and a revised plan is created; continual planning: in dynamic environments, one may allow context and/or agent's preference changes, and plan revision is an ongoing process rather than one triggered by failures of the nominal plan; planning is not made in too much detail into the future, and it is interleaved with execution.

28 Basic Planning Problem Formulation A possible formulation of the Planning problem is (LaValle, 1996): 1. A nonempty state space, X, which is a finite or countably infinite set of states. 2. For each state, x ∈ X, a finite action space, U(x). 3. A state transition function, f, which produces a state f(x, u) ∈ X for every x ∈ X and u ∈ U(x). The state transition equation is derived from f as x' = f(x, u). 4. An initial state, x_I ∈ X. 5. A goal set, X_G ⊆ X.

29 Basic Planning Problem Formulation Represent the planning problem as a directed state transition graph: the set of vertices is the state space X; a directed edge from x ∈ X to x' ∈ X exists in the graph if there exists an action u ∈ U(x) such that x' = f(x, u); the initial state and goal set are designated as special vertices in the graph. [Example graph: X1 (obj on the left), X2 (obj in hand), X3 (obj on ground), X4 (obj on the right), connected by actions u1 = pick, u2 = push, u3 = release.]

30 Basic Planning Problem Formulation Based on this formulation, several problem-solving algorithms are available to find a feasible plan (i.e., one that leads from the initial state to one of the goal states, not necessarily optimal). Examples: breadth-first, depth-first, best-first, A*, ... Algorithms to solve Discrete Optimal Planning problems also exist, typically based on Dynamic Programming: find the sequence of actions that leads to the goal set and optimizes some criterion, such as distance traversed or energy spent; costs are associated with actions.
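
For the formulation on slide 28, a feasible (not necessarily optimal) plan can be found with any of the graph searches listed above. The breadth-first sketch below is ours; the tiny problem encoding (states X1-X4 and actions pick/push/release, loosely following the example of slide 29) is illustrative only.

```python
# Sketch: breadth-first search for a feasible plan in the formulation (X, U(x), f, x_I, X_G).
# The problem encoding below is illustrative only.
from collections import deque

def bfs_plan(x_init, goal_set, actions, f):
    """Return a list of actions leading from x_init to some state in goal_set, or None."""
    frontier = deque([x_init])
    parent = {x_init: None}                      # state -> (previous state, action)
    while frontier:
        x = frontier.popleft()
        if x in goal_set:
            plan = []
            while parent[x] is not None:
                x, u = parent[x]
                plan.append(u)
            return list(reversed(plan))
        for u in actions(x):
            x_next = f(x, u)
            if x_next not in parent:
                parent[x_next] = (x, u)
                frontier.append(x_next)
    return None

# Tiny example: reach "obj on ground" (X3) starting from "obj on the left" (X1).
U = {"X1": ["pick", "push"], "X2": ["release"], "X3": ["pick"], "X4": []}
F = {("X1", "pick"): "X2", ("X1", "push"): "X4", ("X2", "release"): "X3", ("X3", "pick"): "X2"}
print(bfs_plan("X1", {"X3"}, lambda x: U[x], lambda x, u: F[(x, u)]))  # ['pick', 'release']
```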

31 Logic-Based Planning ADVANTAGES build compact representations for discrete planning problems, when their regularity allows such compression convenient for producing output that logically explains the steps involved to arrive at some goal DISADVANTAGES difficult to generalize to enable concepts such as modeling uncertainty, unpredictability, sensing errors, and game theory to be incorporated into planning

32 Logic-Based Planning It is possible to convert the logic-based formulation into the graph-based formulation: e.g., the set of literals may be encoded as a binary string by imposing a linear ordering on the instances and predicates, and using 1 for true and 0 for false. This way, even optimal solutions can be found, if we associate costs with actions (e.g., a state x1 is a binary string over the literals obj_on_the_left, obj_on_the_right, obj_in_hand, obj_on_ground).

33 Logic-Based Planning However, the problem dimension may become intractable, even for a small number of predicates and instances: with a constant number k of arguments per predicate, the state space dimension is 2^(|P|·|I|^k), where |P| is the number of predicates and |I| the number of instances per predicate argument. E.g., 4 predicates (|P| = 4) with 1 argument (k = 1), left(<obj>), right(<obj>), inhand(<obj>), ground(<obj>), and 3 objects (|I| = 3), bolt, nut, bin, give 2^(4·3) = 4096 states.

34 Logic-Based Planning A STRIPS-like Planning formulation is (LaValle, 1996): 1. A nonempty set, I, of instances. 2. A nonempty set, P, of predicates, which are binary-valued (partial) functions of one or more instances. Each application of a predicate to a specific set of instances is called a positive literal if the predicate is true or a negative literal if it is false. 3. A nonempty set, O, of operators, each of which has: 1) preconditions, a set of positive and negative literals that must hold for the operator to apply, and 2) effects, a set of positive and negative literals that are the result of applying the operator. 4. An initial set, S, expressed as a set of positive literals. All literals not appearing in S are assumed to be negative. 5. A goal set, G, expressed as a set of both positive and negative literals.

35 Logic-Based Planning STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971) Example: a mobile robot should move a box from room S3 to S2 (door P1 connects rooms S1 and S2, door P2 connects rooms S2 and S3; the robot R starts in S1, the box B is in S3). World Model (KB): inroom(robot, room_s1), inroom(box, room_s3), connects(door_p1, room_s1, room_s2), connects(door_p2, room_s2, room_s3). Goal: inroom(box, room_s2). Plan (Action Sequence): move(robot, room_s1, room_s3), search(box), push(box, room_s3, room_s2, door_p2).

36 Logic-Based Planning STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971) Tasks are specified as well-formed formulas, or wffs (predicate calculus). The planning system attempts to find an action sequence that modifies the world model so as to make the wff TRUE. To generate a plan, the effect of each action is modeled.

37 Logic-Based Planning STRIPS (Stanford Research Institute Problem Solver) (Fikes, Nilsson, 1971) An operator (action over the world model) maps world model S_i (a clause set) into world model S_{i+1} (a clause set): if its pre-conditions are satisfied, it adds clauses and removes clauses. Example: {inroom(robot, room_s1), inroom(box, room_s3), connects(door_p1, room_s1, room_s2), connects(door_p2, room_s2, room_s3)} becomes {inroom(robot, room_s2), inroom(box, room_s3), connects(door_p1, room_s1, room_s2), connects(door_p2, room_s2, room_s3)}. Planning loop: 1. Is the goal clause in the current world model? YES: success. NO: 2. Search the operator list for one whose pre-conditions are satisfied and that, when applied to the current model, produces a new world model where the goal is closer to being satisfied. 3. Go to 1.
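
The STRIPS loop above can be sketched with operators carrying precondition, add and delete lists over ground literals. The Python fragment below is our own simplification of the example on slide 35 (only two ground operators, omitting search(box); helper names are ours), and it searches forward over clause sets until the goal clause is contained in the world model.

```python
# Sketch of forward STRIPS-style planning over ground literals (illustrative encoding).
from collections import deque

# Each operator: (name, preconditions, add list, delete list), as sets of ground literals.
operators = [
    ("move(robot,s1,s3)", {"inroom(robot,s1)"}, {"inroom(robot,s3)"}, {"inroom(robot,s1)"}),
    ("push(box,s3,s2)",   {"inroom(robot,s3)", "inroom(box,s3)"},
                          {"inroom(box,s2)", "inroom(robot,s2)"},
                          {"inroom(box,s3)", "inroom(robot,s3)"}),
]
initial = frozenset({"inroom(robot,s1)", "inroom(box,s3)"})
goal = {"inroom(box,s2)"}

def strips_plan(initial, goal, operators):
    """Breadth-first search over clause sets; returns a sequence of operator names."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # goal clauses contained in the world model
            return plan
        for name, pre, add, delete in operators:
            if pre <= state:                     # pre-conditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

print(strips_plan(initial, goal, operators))  # ['move(robot,s1,s3)', 'push(box,s3,s2)']
```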

38 Logic-Based Planning STRIPS and Situation Calculus. STRIPS OPERATOR move(robot, room_s1, room_s2): Pre-conditions: inroom(robot, room_s1), connects(door_p1, room_s1, room_s2). Effects: Add: inroom(robot, room_s2); Delete: inroom(robot, room_s1). Situation Calculus (successor state axiom for the same action): ∀a ∀s room(s2) ⇒ [ inroom(robot, s2, Result(a, s)) ⇔ (room(s1) ∧ a = move(robot, s1, s2) ∧ inroom(robot, s1, s)) ∨ (inroom(robot, s2, s) ∧ ∀x (room(x) ⇒ a ≠ move(robot, s2, x))) ].

39 Plan Representation and Task Modeling How to represent and determine the right plan? [Figure: behavior-switching finite state machine for a soccer robot, with behaviors Standby, GetClose2Ball, TakeBall2Goal, Score, ClearBall, GoEmptySpot and GoHome, and transition events such as saw_ball, lost_ball, undribbable, unreachable_ball, unreachable_posture, obstacle, success, can_shoot_safely and ShouldIGo.]

40 Plan Representation and Task Modeling To design a plan that meets some specifications, we need a model of the robot task the plan is supposed to carry out. A model enables performance analysis and formal verification (model checking). Robot tasks are discrete event systems (DES): event-driven (not time-driven), with a discrete (not continuous) state space.

41 Plan Representation and Task Modeling [Figure: comparison of a time-driven system with a continuous state space, x(t) evolving with time t, and an event-driven system with a discrete state space, where events e1, e2, e3 occurring at times t1, ..., t4 move the state among x0, x1, x2.]

42 Plan Representation and Task Modeling DES: State Machines / Finite State Automata

43 Petri Nets [Figure: example Petri net with places p1, p2, p3, p4 and transitions t1, t2, t3.]

44 Petri Nets Def.: A Petri net (PN) graph or structure is a weighted bipartite graph (P, T, A, w), where: P = {p_1, p_2, ..., p_n} is the finite set of places; T = {t_1, t_2, ..., t_m} is the finite set of transitions; A ⊆ (P × T) ∪ (T × P) is the set of arcs from places to transitions, (p_i, t_j), and from transitions to places, (t_j, p_i); w: A → {1, 2, 3, ...} is the weight function on the arcs. Set of input places to t_j ∈ T: I(t_j) = {p_i ∈ P : (p_i, t_j) ∈ A}. Set of output places from t_j ∈ T: O(t_j) = {p_i ∈ P : (t_j, p_i) ∈ A}.

45 Petri Nets Def.: A marked Petri net is a five-tuple (P, T, A, w, x), where (P, T, A, w) is a Petri net graph and x is a marking of the set of n places P; x = [x(p_1), x(p_2), ..., x(p_n)] ∈ ℕⁿ is the row vector associated with x. [Figure: marked Petri net with places p1-p4 and transitions t1-t3.]

46 Petri Nets Def. (PN dynamics): The state transition function f: ℕⁿ × T → ℕⁿ of Petri net (P, T, A, w, x) is defined for transition t_j ∈ T iff x(p_i) ≥ w(p_i, t_j) for all p_i ∈ I(t_j) (t_j is then said to be enabled). If f(x, t_j) is defined, the new state is x' = f(x, t_j), where x'(p_i) = x(p_i) − w(p_i, t_j) + w(t_j, p_i), i = 1, ..., n. [Figure: marked Petri net with places p1-p4 and transitions t1-t3.]
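
The definition above translates directly into code. The sketch below is our own minimal encoding (unit arc weights, arc dictionary and place/transition names are illustrative): it checks whether a transition is enabled and computes the new marking x' = f(x, t_j).

```python
# Sketch of Petri net dynamics: enabling test and firing rule from the definition above.

# Arc weights: w[(source, target)] for place->transition and transition->place arcs.
w = {("p1", "t1"): 1, ("t1", "p2"): 1, ("t1", "p3"): 1,
     ("p2", "t2"): 1, ("t2", "p4"): 1,
     ("p3", "t3"): 1, ("t3", "p4"): 1}

def enabled(x, t):
    """t is enabled iff every input place holds at least w(p, t) tokens."""
    return all(x[p] >= k for (p, tt), k in w.items() if tt == t and p in x)

def fire(x, t):
    """New marking: x'(p) = x(p) - w(p, t) + w(t, p)."""
    assert enabled(x, t), f"{t} is not enabled"
    x_new = dict(x)
    for (src, dst), k in w.items():
        if dst == t:                 # input arc p -> t: remove tokens
            x_new[src] -= k
        elif src == t:               # output arc t -> p: add tokens
            x_new[dst] += k
    return x_new

x0 = {"p1": 1, "p2": 0, "p3": 0, "p4": 0}
print(fire(x0, "t1"))                # {'p1': 0, 'p2': 1, 'p3': 1, 'p4': 0}
```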

47-51 Petri Nets [Figures: successive firings of the example Petri net with places p1-p4 and transitions t1-t3, showing how the marking (token distribution) evolves.]

52 Plan Representation and Modeling Petri Net Models of Robotic Tasks (Lima et al, 1998) (Milutinovic, Lima, 2002) (Costelha, Lima, 2012) Places with tokens represent available resources and running primitive actions. The state is distributed over the places with tokens (PN marking). Events are assigned to transitions and represent uncontrolled changes of state (e.g., caused by other agents or simply by the environment dynamics) or controlled decisions to start a primitive action. A transition fires when it is enabled and the labeling event occurs (note: the labeling event may be replaced by input/output arcs to places representing the reading of sensors, for modeling and analysis).

53 Plan Representation and Modeling [Figure: PN model of a multi-task single robot, with places such as standby, following_track, move2post, teardown_pole, back2track, check_if_track, following_resuming_point, look_ahead, look_left and look_right, and events such as track_found, track_not_found, detected_post, reached_post, pole_down, found_interrupt, no_resuming_point and resuming_point.]

54 Plan Representation and Modeling (Lima et al, 1998) A Tool for Robotic Task Design and Distributed Execution; further developments in (Milutinovic, Lima, 2002). [Figure: Petri net with places p1 (standby), p2 (locating_ball), p3 (vision_ready2locate_ball), p4 (robot_ready2move), p5 (moving2ball), p6 (catching_ball) and transitions t1 (start), t2 (new_frame), t3 (ball_located), t4 (ready2catch), t5 (ball_catched).]

55 Petri Nets Def. (Labeled Petri net): A labeled Petri net N is an eight-tuple N = (P, T, A, w, E, l, x_0, X_m) where (P, T, A, w) is a PN graph, E is the event set for transition labeling, l: T → E is the transition labeling function, x_0 ∈ ℕⁿ is the initial state, and X_m ⊆ ℕⁿ is the set of marked states. Def. (Languages generated and marked): L(N) := { l(s) ∈ E* : s ∈ T* and f(x_0, s) is defined }; L_m(N) := { l(s) ∈ L(N) : s ∈ T* and f(x_0, s) ∈ X_m }.

56-60 Plan Representation and Modeling Petri Nets (PN) Language Model For the Petri net N of slide 54 (places p1-p6, transitions t1-t5): E = {s, nf, bl, r2c, bc}, with l(t1) = s (start), l(t2) = nf (new_frame), l(t3) = bl (ball_located), l(t4) = r2c (ready2catch), l(t5) = bc (ball_catched), and X_m containing the initial marking x_0. Starting from x_0, successive firings of t1, t2, t3, ... change the marking (state) and generate the strings ε, s, s nf, s nf bl, ... Generated and marked languages: L(N) = {ε, s, s nf, s nf bl, ...}; L_m(N) ⊆ L(N) contains the strings that lead to a marked state, e.g., ε and s nf bl r2c bc.

61 Plan Representation and Modeling Monitoring algorithms check the value of predicates over world state variables. An event occurrence means that a logical function of the predicates changed from true to false or vice-versa. Examples of events: found_ball: see(ball) changes from false to true; lost_ball: see(ball) changes from true to false; see_ball AND closest_player2ball changes from false to true.

62 Plan Representation and Modeling PN markings represent world states. A plan to carry out a task is the sequence of primitive actions in a sequence of markings (world states). Plans are conditional, as resource places in markings represent logical pre-conditions for the execution of the next primitive action. Example: primitive actions set X = {GetCloseToBall, TakeBallToGoal, Score}. Plan: GetCloseToBall. TakeBallToGoal. Score.

63 Plan Representation and Modeling Event sequences (i.e., strings) are an equivalent representation of plans. A language is the set of all possible plans for a robot. Different language classes are equivalent to the machine types used to represent and execute the task (Finite State Automata, PNs, ...). Of course, larger classes have an increased modeling power (e.g., PN languages vs regular/finite state machine languages). Do not confuse this with modeling elegance: it is more natural to program with a rule-based system than with a state machine, but it is not necessarily more powerful (compare C vs assembly).

64 Plan Representation and Modeling Abstraction Levels in Discrete Event Systems Untimed: event sequences e1, e2, ..., ek, ... and state sequences x0, x1, ..., xk, ... (models: FSA, PN). Timed: a time (duration) is associated to events/transitions, giving x(t) (models: Timed FSA, Timed PN). Stochastic Timed: a stochastic time (duration) is associated to events/transitions, giving x(t) and p(x(t)) (models: STA, SPN).

65 Stochastic DES STOCHASTIC TIMED AUTOMATA (STA) inter-event time is stochastically distributed (typical case: exponential pdf) STOCHASTIC PETRI NET (SPN) inter-event time is stochastically distributed (typical case: exponential pdf) stochastic inter-event time assigned to transitions SPN with exponential timed transitions is equivalent to a Markov Chain

66 Controllable vs Uncontrollable Events in PNs (Costelha and Lima, 2012) Conflict between transitions enabled by different predicates (whose value is not controlled by the robot) models uncertain action effects, e.g., the probability that the robot stops seeing the ball before getting close to it is λ2 / (λ2 + λ3). Conflict between controllable events (associated to commands to start Dribble2Goal or Kick2Goal) is resolved by a random switch: the probability of choosing Dribble2Goal is p5, the probability of choosing Kick2Goal is p7. This is a probabilistic policy.

67 PN Hierarchical Plan Representation (Costelha and Lima, 2012)

68 Generalized Stochastic Petri Net Closed Loop Robot Plan / Environment (Costelha and Lima, 2012) stochastic transitions, immediate transitions

69 Plan Qualitative Analysis (Formal Verification) Qualitative view/models enable answering analysis questions such as: will bad behaviors occur? will unsafe states be avoided? will we attempt to use more resources than those available? Qualitative view/models enable designing supervisors for specifications such as: eliminate substrings corresponding to bad behaviors avoid blocking ensure bounded usage of resources

70 Plan Qualitative Analysis (Formal Verification) Safety properties: for all executions, the system avoids a bad set of events, or a set of bad strings is never generated or marked. E.g., the robot does not enter a room where there are holes in the ground (any sequence including traversing a door leading to the room must be disabled from happening). Blocking properties: deadlocks or livelocks. E.g., a robot that can only move forward enters a corridor with a dead end.

71 Plan Qualitative Analysis Def. (Boundedness): Place p_i ∈ P in PN N with initial state x_0 is said to be k-bounded, or k-safe, if x(p_i) ≤ k for all states x ∈ R(N), i.e., for all reachable states. E.g., the robot cannot be called for a transportation task a second time while it is performing the same task (the place corresponding to the robot performing the transportation task should be 1-bounded, or safe). Def. (Conservation): A PN N with initial state x_0 is said to be conservative with respect to γ = [γ_1, γ_2, ..., γ_n] if Σ_{i=1}^{n} γ_i x(p_i) = constant for all reachable states. E.g., a robot with only one tool can never use two tools simultaneously during the performance of the whole task.
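
For nets with a finite reachability set, k-boundedness of a place can be checked by exhaustively enumerating R(N). The sketch below is our own (the tiny two-place net, the pre/post encoding and the helper names are illustrative): it does a breadth-first enumeration of reachable markings and reports the maximum token count observed in each place.

```python
# Sketch: check k-boundedness by enumerating the reachability set (finite nets only).
from collections import deque

# Simple net given by pre/post incidence dicts: pre[t][p] and post[t][p] are arc weights.
pre  = {"t1": {"p1": 1}, "t2": {"p2": 1}}
post = {"t1": {"p2": 1}, "t2": {"p1": 1}}
places = ("p1", "p2")
x0 = (1, 0)                          # marking of (p1, p2) as a tuple, so it is hashable

def successors(x):
    m = dict(zip(places, x))
    for t in pre:
        if all(m[p] >= k for p, k in pre[t].items()):          # t enabled
            m2 = dict(m)
            for p, k in pre[t].items():
                m2[p] -= k
            for p, k in post[t].items():
                m2[p] += k
            yield tuple(m2[p] for p in places)

reachable, frontier = {x0}, deque([x0])
while frontier:
    for x in successors(frontier.popleft()):
        if x not in reachable:
            reachable.add(x)
            frontier.append(x)

bounds = {p: max(x[i] for x in reachable) for i, p in enumerate(places)}
print(bounds)                        # {'p1': 1, 'p2': 1}: both places are 1-bounded (safe)
```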

72 Plan Qualitative Analysis Def. (Liveness): A PN N with initial state x_0 is said to be live if there always exists some sample path such that any transition can eventually fire from any state reached from x_0. Liveness levels: a transition in a PN may be Dead or L0-live, if the transition can never fire from this state; L1-live, if there is some firing sequence from x_0 such that the transition can fire at least once; L2-live, if the transition can fire at least k times for some given positive integer k; L3-live, if there exists some infinite firing sequence in which the transition appears infinitely often; L4-live, if the transition is L1-live for every possible state reached from x_0 (L4-liveness implies L3, which implies L2, which implies L1). This property is related to the reachability of given states and to the repeatability of system states (e.g., error recovery and returning to the initial state).

73 Plan Qualitative Analysis Def. (Liveness): A PN N with initial state x_0 is said to be live if there always exists some sample path such that any transition can eventually fire from any state reached from x_0. Liveness levels, robot task examples: Dead or L0-live: robot in a deadlock situation. L1-live: after the robot picks an object it will not be able to pick it again later. L2-live: the robot can only perform an action sequence with a finite number of steps (e.g., release as many objects as those it picked before, until its transported bin is empty). L3-live: the robot keeps repeating the same action sequence forever. L4-live: the robot can always return to the same state and repeat the same operation.

74 Plan Quantitative Analysis Stochastic view/models enable answering analysis questions such as: what is the probability of success of a task plan? given a probability of success for the plan, how many steps (actions) will it take to accomplish the task? Stochastic view/models enable designing controllers for specifications such as: given some allowed number of steps for a plan, determine the plan that maximizes the probability of success given some desired probability of success, determine the plan that minimizes the number of required actions, or the accumulated action cost

75 Markov Decision Process (MDP) A Markov Chain is a stochastic process X(t) with discrete state space which satisfies the Markov property: Pr{x_{t+1} = x_j | x_t = x_i, x_{t−1} = x_k, ..., x_0} = Pr{x_{t+1} = x_j | x_t = x_i} = p_ij, where the p_ij are the transition probabilities. Adding actions to a Markov Chain makes the transition probabilities depend on the action taken.

76 Markov Decision Process (MDP) A Markov Chain with transition probabilities dependent on actions (u), rewards r associated to each (state x, action u) pair, and an associated cost/performance function is known as a Markov Decision Process (MDP): Pr{x_{t+1} = x', r_{t+1} = r | x_t, u_t, r_t, x_{t−1}, u_{t−1}, ..., r_1, x_0, u_0} = Pr{x_{t+1} = x', r_{t+1} = r | x_t, u_t}. [Figure: MDP with states "object on the table", "object grasped" and "object on the floor", actions grasp, pickup and release, and transition probabilities such as 1.0 and 0.5; no rewards are included in the diagram.]

77 Planning as Solving MDPs Conflict between transitions enabled by different predicates (whose value is not controlled by the robot) models uncertain action effects, e.g., the probability that the robot stops seeing the ball before getting close to it is λ2 / (λ2 + λ3). Conflict between controllable events (commands to start Dribble2Goal or Kick2Goal) is resolved by a random switch (probabilistic policy): the probability of choosing Dribble2Goal is p5, of choosing Kick2Goal is p7. The GSPN is equivalent to an MDP.

78 STOCHASTIC PETRI NETS PN Stochastic Timed Model Def.: A Stochastic PN is a 6-tuple (P, T, A, w, x, F) where (P, T, A, w, x) is a marked PN, and F: R[x_0] × T → ℝ is a function that associates to each transition t, in each reachable marking x, a random variable. Def.: A Generalized Stochastic PN is a 7-tuple (P, T = T_0 ∪ T_D, A, w, x, F, S) where (P, T, A, w, x) is a marked PN, F: R[x_0] × T_D → ℝ is a function that associates to each timed transition t ∈ T_D, in each reachable marking x, a random variable, each t ∈ T_0 has zero firing time in all reachable x, and S is a set (possibly empty) of elements called random switches, which associate probability distributions to subsets of conflicting immediate transitions.

79 EXPONENTIAL TIMED PETRI NETS For Exponential Timed PNs, in the two previous definitions F: R[x_0] × T_D → ℝ is a function that associates to each transition t_j ∈ T_D, in each reachable marking x, an exponential random variable with rate λ_j(x). The transitions in T_D are known as exponential transitions, and λ_j(x) is referred to as the firing rate of t_j in x.

80 EXPONENTIAL TIMED PETRI NETS Theorem: The marking process of an exponential timed Petri net is a continuous time Markov Chain (CTMC). The state space of the equivalent CTMC is the reachability set R[x_0] of the exponential timed Petri net. The transition rate from state x_i to state x_j ≠ x_i is q_ij = Σ_{t_k ∈ T_ij} λ_k(x_i), where T_ij is the subset of T_D of enabled transitions in x_i such that the firing of any transition in T_ij leaves the CTMC in x_j, and q_ii = −Σ_{j ≠ i} q_ij.

81 GENERALIZED SPN (GSPN) When there is conflict in state x_i, if T_i is the set of enabled transitions in x_i, the probability of firing t_j ∈ T_i is: if T_i is composed of exponential transitions only, λ_j(x_i) / Σ_{t_k ∈ T_i} λ_k(x_i); if T_i includes one single immediate transition, this is the one that will fire; if T_i includes two or more immediate transitions, a probability mass function is specified over them by an element of S. The subset of immediate transitions plus the switching distribution is called a random switch.

82 GSPN AND EQUIVALENT CTMC To ensure the existence of a unique steady-state probability vector (ρ_1, ..., ρ_s) for the marking process of a GSPN with s tangible markings, the following simplifying assumptions are made: 1. The GSPN is bounded, i.e., its reachability set is finite. 2. Firing rates do not depend on time parameters, ensuring that the equivalent MC is homogeneous. 3. The GSPN model is proper and deadlock-free, i.e., the initial marking is reachable with a non-zero probability from any marking in the reachability set, and there is no absorbing marking (this assumption can be lifted).

83 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: GSPN for a pick-and-carry task, with places such as p.ontable(obj), p.grasped(obj), a.observing_table, a.pickingup_obj, a.carrying_obj and a.depositing_obj, exponential transitions with rates λ1, λ2, λ5, and random switches sel_pickup_obj, sel_carry_obj, sel_deposit_obj with probabilities q3 and q4, q3 + q4 = 1.]

84 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: marking graph of the example GSPN, with tangible and vanishing markings connected by transitions t1-t6.]

85 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: embedded Markov chain (EMC) of the example GSPN, with transition probabilities λ1/(λ1+λ2), λ2/(λ1+λ2), q3, q4 and 1 between the tangible and vanishing markings.]

86 EXAMPLE: GSPN AND EQUIVALENT CTMC [Figure: reduced embedded Markov chain (REMC) over the tangible markings only, with transition probabilities such as q3 λ1/(λ1+λ2), λ2/(λ1+λ2) + q4 λ1/(λ1+λ2) and 1.] MDP view: the random switch probabilities can be manipulated to achieve the optimal decision.

87 GSPN, REMC AND PERFORMANCE MEASURES The PNs of the robot controller and of the world model must be connected in closed loop. The closed-loop PN can be analyzed w.r.t., e.g. (ρ_j is the steady-state probability of marking j): 1. Probability that a particular condition C holds: Pr(C) = Σ_{j ∈ S_1} ρ_j, S_1 = {j ∈ {1, ..., s} : C is satisfied in x_j}. 2. Probability that place p_i has exactly k tokens: Pr(p_i, k) = Σ_{j ∈ S_2} ρ_j, S_2 = {j ∈ {1, ..., s} : x_j(p_i) = k}. 3. Expected number of tokens in a place p_i: ET[p_i] = Σ_{k=1}^{K} k Pr(p_i, k), where K is the maximum number of tokens p_i may contain in any reachable marking.

88 GSPN, REMC AND PERFORMANCE MEASURES 4. Throughput rate of an exponential transition t_j: TR(t_j) = Σ_{i ∈ S_3} ρ_i λ(x_i, t_j) υ_ij, S_3 = {i ∈ {1, ..., s} : t_j enabled in x_i}, where υ_ij is the probability that t_j fires among all enabled transitions in x_i. 5. Throughput rates of immediate transitions can be computed from those of the exponential transitions and from the structure of the model. 6. Mean waiting time in a place p_i: WAIT(p_i) = ET[p_i] / Σ_{t_j ∈ IN(p_i)} TR(t_j) = ET[p_i] / Σ_{t_j ∈ OUT(p_i)} TR(t_j).
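
Once the steady-state probabilities ρ_j of the tangible markings are known, measures 1-3 of slide 87 reduce to simple sums. The sketch below uses made-up markings and ρ values (the place names and numbers are illustrative only, not from the slides):

```python
# Sketch: GSPN performance measures from a steady-state distribution over tangible markings.
# Markings x_j given as dicts place -> tokens; rho[j] is the probability of marking j.
markings = [{"p_idle": 1, "p_busy": 0},
            {"p_idle": 0, "p_busy": 1}]
rho = [0.25, 0.75]                                   # illustrative values only

def prob_condition(cond):
    """1. Probability that a condition over markings holds."""
    return sum(r for x, r in zip(markings, rho) if cond(x))

def prob_tokens(place, k):
    """2. Probability that a place holds exactly k tokens."""
    return prob_condition(lambda x: x[place] == k)

def expected_tokens(place):
    """3. Expected number of tokens in a place."""
    return sum(k * prob_tokens(place, k)
               for k in range(1, max(x[place] for x in markings) + 1))

print(prob_condition(lambda x: x["p_busy"] >= 1))    # 0.75
print(expected_tokens("p_busy"))                     # 0.75
```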

89 Markov Decision Process (MDP) Given: states x, actions u, transition probabilities p(x' | u, x), reinforcement r_t / expected payoff function r(x, u). Wanted: a policy π(x) that maximizes the future expected (discounted) reward.

90 MDP Rewards and Policies A policy (fully observable case) is a map of states onto actions: π: x_t → u_t. Expected discounted cumulative reward / payoff: R_T = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} ], 0 < γ ≤ 1. T = 0: greedy policy; T > 0: finite-horizon case, typically no discount; T = ∞: infinite-horizon case, finite reward if the discount is < 1.

91 Markov Decision Process (MDP) [Figure: agent-environment loop. At each step the agent applies action u_t ∈ U(x_t) to the environment (stochastic state dynamics) and observes the next state x_{t+1} ∈ X and the reinforcement r_{t+1} ∈ ℝ.] Goal: choose the action sequence that maximizes R_T = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} ], 0 < γ ≤ 1; T may go to infinity, as long as γ < 1.

92 MDP Ex.: Recycling robot [Figure: MDP with states Battery High and Battery Low; actions search_trash, wait and recharge_battery; each arc is labeled with a transition probability and an expected reward, e.g., (α, R_search_trash) and (1−α, R_search_trash) for search_trash in the high-battery state, (β, R_search_trash) and (1−β, −3) for search_trash in the low-battery state (the penalty when the robot has to be rescued because its battery is depleted), (1, R_wait) for wait and (1, 0) for recharge_battery.] R_search_trash > R_wait > 0: the rewards are the expected number of cans collected while performing the corresponding tasks.

93 Policies Expected cumulative payoff of policy π: R^π_T(x_t) = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} | x_t, u_{t+τ} = π(x_{t+τ}) ]. Bellman equation for continuous action and state spaces (the policy may be deterministic or probabilistic; the bracketed term is the expected payoff): V^π_T(x) = E{ R^π_T | x_t = x } = ∫ π(x, u) [ ∫ p(x' | u, x) [ r(x, u) + γ V^π_{T−1}(x') ] dx' ] du. Bellman equation for discrete action and state spaces: V^π_T(x) = E{ R^π_T | x_t = x } = Σ_u π(x, u) Σ_{x'} p(x' | u, x) [ r(x, u) + γ V^π_{T−1}(x') ]. V_T(x) = max_π V^π_T(x).

94 Policies cont'd. Expected cumulative payoff of policy π: R^π_T(x_t) = E[ Σ_{τ=0}^{T} γ^τ r_{t+τ+1} | x_t, u_{t+τ} = π(x_t) ]. Optimal policy: π* = argmax_π R^π_T(x_t). 1-step optimal policy: π_0(x) = argmax_u r(x, u). Value function of the 1-step optimal policy: V_0(x) = max_u r(x, u).

95 2-step Policies Optimal policy: π_1(x) = argmax_u [ r(x, u) + γ Σ_{x'} V_0(x') p(x' | u, x) ]. Value function: V_1(x) = max_u [ r(x, u) + γ Σ_{x'} V_0(x') p(x' | u, x) ].

96 T-step Policies Optimal policy: π_T(x) = argmax_u [ r(x, u) + γ Σ_{x'} V_{T−1}(x') p(x' | u, x) ]. Value function: V_T(x) = max_u [ r(x, u) + γ Σ_{x'} V_{T−1}(x') p(x' | u, x) ].

97 Infinite Horizon Optimal value function, infinite horizon (Bellman equation): V(x) = max_u [ r(x, u) + γ Σ_{x'} V(x') p(x' | u, x) ]. Its fixed point yields the optimal policy. Necessary and sufficient condition: the induced policy is optimal iff the value function satisfies the above condition.

98 Value Iteration 1. for all x do V̂(x) ← r_min endfor 2. repeat until convergence: for all x do V̂(x) ← max_u [ r(x, u) + γ Σ_{x'} V̂(x') p(x' | u, x) ] endfor endrepeat 3. π(x) = argmax_u [ r(x, u) + γ Σ_{x'} V̂(x') p(x' | u, x) ]
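
A direct implementation of the value-iteration loop above for a small finite MDP encoded as dictionaries is sketched below; the two-state example (loosely inspired by the recycling robot, but with made-up numbers) and all names are illustrative.

```python
# Sketch: value iteration for a finite MDP.  p[(x, u)] is a dict x' -> probability,
# r[(x, u)] is the expected payoff; states, actions and numbers are illustrative only.

p = {("low", "wait"):     {"low": 1.0},
     ("low", "recharge"): {"high": 1.0},
     ("high", "wait"):    {"high": 1.0},
     ("high", "search"):  {"high": 0.8, "low": 0.2}}
r = {("low", "wait"): 1.0, ("low", "recharge"): 0.0,
     ("high", "wait"): 1.0, ("high", "search"): 3.0}
states = ["low", "high"]
actions = {"low": ["wait", "recharge"], "high": ["wait", "search"]}
gamma = 0.9

def q_value(V, x, u):
    """r(x, u) + gamma * sum_x' p(x'|u, x) V(x')."""
    return r[(x, u)] + gamma * sum(prob * V[x2] for x2, prob in p[(x, u)].items())

V = {x: 0.0 for x in states}                          # step 1: initialize V_hat
for _ in range(1000):                                 # step 2: repeat until convergence
    V_new = {x: max(q_value(V, x, u) for u in actions[x]) for x in states}
    converged = max(abs(V_new[x] - V[x]) for x in states) < 1e-8
    V = V_new
    if converged:
        break

policy = {x: max(actions[x], key=lambda u: q_value(V, x, u)) for x in states}  # step 3
print(V, policy)
```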

99 Value Iteration for Motion Planning

100 Reinforcement Learning The previous (DP) methods to solve MDPs assume full knowledge of p(x' | u, x) and r(x, u). Dynamic Programming (DP): to determine V* for |X| = N, a system of N non-linear equations must be solved. Well-established mathematical method. A complete model of the environment is required (P and R known). Often faces the curse of dimensionality [Bellman, 1957]. Alternative approaches, if we do not know p(x' | u, x) and r(x, u): Monte Carlo: similar to DP, but P and R are unknown; P and R are determined from the average of several trial-and-error trials; inappropriate for a step-by-step incremental approximation of V*. Temporal Differences: knowledge of P and R is not required; step-by-step incremental approximation of V; mathematical analysis more complex; Q-learning.

101 Value Functions State value for policy π: V^π(x) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | x_t = x ]: the expected value of starting in state x and following policy π thereafter. (State, action) value for policy π: Q^π(x, u) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | x_t = x, u_t = u ]: the expected value of starting in state x, carrying out action u, and following policy π thereafter.

102 Value Functions (cont'd) Relation between the state value and the Q function for policy π: Q(x, u) = E{ r_{t+1} + γ V(x_{t+1}) | x_t = x, u_t = u }; V(x) = max_{u'} Q(x, u'); Q(x, u) = E{ r_{t+1} + γ max_{u'} Q(x_{t+1}, u') | x_t = x, u_t = u }. Q is such that its value is the maximum discounted cumulative reward that can be achieved starting from state x and applying action u as the first action.

103 Value Functions (cont'd) Bellman equations for V and Q (discrete action and state spaces, deterministic policy): V^π_T(x) = Σ_{x'} p(x' | u, x) [ r(x, u) + γ V^π_{T−1}(x') ]; Q^π_T(x, u) = Σ_{x'} p(x' | u, x) [ r(x, u) + γ max_{u'} Q^π_{T−1}(x', u') ]; V*_T = max_π V^π_T(x); Q*_T = max_π Q^π_T(x, u). Solutions are unique, and the equations are also satisfied by the optimal functions.

104 Q-Learning - Algorithm Initialize Q(x, u) randomly or arbitrarily. Repeat forever (for each episode or trial): initialize x; repeat (for each step n of the episode): choose action u in x; execute action u and observe r and x'; update Q for the n_xu-th visit to (x, u): Q_{n_xu+1}(x, u) ← Q_{n_xu}(x, u) + α_{n_xu} [ r(x, u) + γ max_{u'} Q_{n_xu}(x', u') − Q_{n_xu}(x, u) ]; x ← x'; until x is final. A constant α allows adaptability to slow environment changes, but it does not guarantee convergence; convergence is only possible with a temporal decay, under given circumstances.
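
A minimal tabular Q-learning loop matching the pseudo-code above is sketched below; the toy environment (a 1-D corridor with a goal on the right), the constant α and all names are our own illustration, not from the slides.

```python
# Sketch: tabular Q-learning with epsilon-greedy action selection on a toy 1-D corridor.
import random

N_CELLS, GOAL = 5, 4                 # states 0..4, goal at the right end
ACTIONS = [-1, +1]                   # move left / move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = {(x, u): 0.0 for x in range(N_CELLS) for u in ACTIONS}

def step(x, u):
    """Environment: deterministic move, reward 1 on reaching the goal, else 0."""
    x_next = min(max(x + u, 0), N_CELLS - 1)
    return x_next, (1.0 if x_next == GOAL else 0.0)

for episode in range(500):
    x = 0                                           # initialize x
    while x != GOAL:                                # repeat for each step of the episode
        # epsilon-greedy choice of the action u in state x
        if random.random() < epsilon:
            u = random.choice(ACTIONS)
        else:
            u = max(ACTIONS, key=lambda a: Q[(x, a)])
        x_next, reward = step(x, u)                 # execute u, observe r and x'
        best_next = max(Q[(x_next, a)] for a in ACTIONS)
        Q[(x, u)] += alpha * (reward + gamma * best_next - Q[(x, u)])
        x = x_next

print({x: max(ACTIONS, key=lambda a: Q[(x, a)]) for x in range(N_CELLS - 1)})  # learned policy
```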

105 Q-Learning Algorithm Convergence Should each pair (x, u) be visited an infinite number of times, with 0 ≤ α_{n_xu} < 1, Σ_{i=1}^{∞} α_{n_xu(i)} = ∞ and Σ_{i=1}^{∞} α_{n_xu(i)}² < ∞, then, for all x, u: Pr{ lim_{n→∞} Q̂_{n_xu}(x, u) = Q(x, u) } = 1.

106 Action Selection: Exploration vs Exploitation Exploration: less promising actions, which may lead to good results, are tested. Exploitation: takes advantage of tested actions which are more promising, i.e., which have a larger Q(x, u). ε-greedy: at each step n, picks the best action so far with probability 1−ε, for small ε, but can also pick, with probability ε, one of the other actions in a uniformly distributed random fashion. Softmax: at each step n, picks the action to be executed according to a Gibbs or Boltzmann distribution: π_n(x, u) = e^{Q_n(x, u)/τ} / Σ_{u' ∈ U(x)} e^{Q_n(x, u')/τ}.
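
Both selection rules above can be coded in a few lines; the sketch below (function and variable names are ours) implements ε-greedy selection and the Boltzmann distribution with temperature τ over the Q values of the current state.

```python
# Sketch: epsilon-greedy and softmax (Boltzmann) action selection over Q(x, u).
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """q_values: dict action -> Q(x, action) for the current state x."""
    if random.random() < epsilon:
        return random.choice(list(q_values))          # explore: uniform over actions
    return max(q_values, key=q_values.get)            # exploit: best action so far

def softmax(q_values, tau=1.0):
    """Pick an action with probability exp(Q(x,u)/tau) / sum_u' exp(Q(x,u')/tau)."""
    actions = list(q_values)
    weights = [math.exp(q_values[u] / tau) for u in actions]
    return random.choices(actions, weights=weights, k=1)[0]

q = {"Dribble2Goal": 0.6, "Kick2Goal": 0.4}            # illustrative Q values
print(epsilon_greedy(q), softmax(q, tau=0.5))
```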

107 Q-Learning an Example [Figure: grid world with goal state G, showing the immediate rewards r(x, u), the optimal values V*(x) (100 at the goal) and the learned Q_n(x, u) values, for α = 1 and a given γ.]

108 Q-Learning an Example [Figure: the same grid world after further learning, showing r(x, u), V*(x) and the updated Q_n(x, u) values.]


More information

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING

More information

DES. 4. Petri Nets. Introduction. Different Classes of Petri Net. Petri net properties. Analysis of Petri net models

DES. 4. Petri Nets. Introduction. Different Classes of Petri Net. Petri net properties. Analysis of Petri net models 4. Petri Nets Introduction Different Classes of Petri Net Petri net properties Analysis of Petri net models 1 Petri Nets C.A Petri, TU Darmstadt, 1962 A mathematical and graphical modeling method. Describe

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Probabilistic Planning. George Konidaris

Probabilistic Planning. George Konidaris Probabilistic Planning George Konidaris gdk@cs.brown.edu Fall 2017 The Planning Problem Finding a sequence of actions to achieve some goal. Plans It s great when a plan just works but the world doesn t

More information

Stochastic Petri Nets. Jonatan Lindén. Modelling SPN GSPN. Performance measures. Almost none of the theory. December 8, 2010

Stochastic Petri Nets. Jonatan Lindén. Modelling SPN GSPN. Performance measures. Almost none of the theory. December 8, 2010 Stochastic Almost none of the theory December 8, 2010 Outline 1 2 Introduction A Petri net (PN) is something like a generalized automata. A Stochastic Petri Net () a stochastic extension to Petri nets,

More information

Introduction to Artificial Intelligence. Logical Agents

Introduction to Artificial Intelligence. Logical Agents Introduction to Artificial Intelligence Logical Agents (Logic, Deduction, Knowledge Representation) Bernhard Beckert UNIVERSITÄT KOBLENZ-LANDAU Winter Term 2004/2005 B. Beckert: KI für IM p.1 Outline Knowledge-based

More information

Machine Learning I Reinforcement Learning

Machine Learning I Reinforcement Learning Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:

More information

Logical agents. Chapter 7. Chapter 7 1

Logical agents. Chapter 7. Chapter 7 1 Logical agents Chapter 7 Chapter 7 Outline Knowledge-based agents Wumpus world Logic in general models and entailment Propositional (Boolean) logic Equivalence, validity, satisfiability Inference rules

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour

More information

Intelligent Agents. First Order Logic. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University. last change: 19.

Intelligent Agents. First Order Logic. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University. last change: 19. Intelligent Agents First Order Logic Ute Schmid Cognitive Systems, Applied Computer Science, Bamberg University last change: 19. Mai 2015 U. Schmid (CogSys) Intelligent Agents last change: 19. Mai 2015

More information

Predicate Logic: Sematics Part 1

Predicate Logic: Sematics Part 1 Predicate Logic: Sematics Part 1 CS402, Spring 2018 Shin Yoo Predicate Calculus Propositional logic is also called sentential logic, i.e. a logical system that deals with whole sentences connected with

More information

Logical Agent & Propositional Logic

Logical Agent & Propositional Logic Logical Agent & Propositional Logic Berlin Chen 2005 References: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 7 2. S. Russell s teaching materials Introduction The representation

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Analysis and Optimization of Discrete Event Systems using Petri Nets

Analysis and Optimization of Discrete Event Systems using Petri Nets Volume 113 No. 11 2017, 1 10 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu Analysis and Optimization of Discrete Event Systems using Petri Nets

More information

20/c/applet/more.html Local Beam Search The best K states are selected.

20/c/applet/more.html Local Beam Search The best K states are selected. Have some fun Checker: http://www.cs.caltech.edu/~vhuang/cs 20/c/applet/more.html 1 Local Beam Search Run multiple searches to find the solution The best K states are selected. Like parallel hill climbing

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning 1 Reinforcement Learning Mainly based on Reinforcement Learning An Introduction by Richard Sutton and Andrew Barto Slides are mainly based on the course material provided by the

More information

Decidability: Church-Turing Thesis

Decidability: Church-Turing Thesis Decidability: Church-Turing Thesis While there are a countably infinite number of languages that are described by TMs over some alphabet Σ, there are an uncountably infinite number that are not Are there

More information

Reinforcement Learning: An Introduction

Reinforcement Learning: An Introduction Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is

More information

Logical Agents: Propositional Logic. Chapter 7

Logical Agents: Propositional Logic. Chapter 7 Logical Agents: Propositional Logic Chapter 7 Outline Topics: Knowledge-based agents Example domain: The Wumpus World Logic in general models and entailment Propositional (Boolean) logic Equivalence, validity,

More information

Kecerdasan Buatan M. Ali Fauzi

Kecerdasan Buatan M. Ali Fauzi Kecerdasan Buatan M. Ali Fauzi Artificial Intelligence M. Ali Fauzi Logical Agents M. Ali Fauzi In which we design agents that can form representations of the would, use a process of inference to derive

More information

c 2011 Nisha Somnath

c 2011 Nisha Somnath c 2011 Nisha Somnath HIERARCHICAL SUPERVISORY CONTROL OF COMPLEX PETRI NETS BY NISHA SOMNATH THESIS Submitted in partial fulfillment of the requirements for the degree of Master of Science in Aerospace

More information

A Gentle Introduction to Reinforcement Learning

A Gentle Introduction to Reinforcement Learning A Gentle Introduction to Reinforcement Learning Alexander Jung 2018 1 Introduction and Motivation Consider the cleaning robot Rumba which has to clean the office room B329. In order to keep things simple,

More information

Title: Logical Agents AIMA: Chapter 7 (Sections 7.4 and 7.5)

Title: Logical Agents AIMA: Chapter 7 (Sections 7.4 and 7.5) B.Y. Choueiry 1 Instructor s notes #12 Title: Logical Agents AIMA: Chapter 7 (Sections 7.4 and 7.5) Introduction to Artificial Intelligence CSCE 476-876, Fall 2018 URL: www.cse.unl.edu/ choueiry/f18-476-876

More information

Time and Timed Petri Nets

Time and Timed Petri Nets Time and Timed Petri Nets Serge Haddad LSV ENS Cachan & CNRS & INRIA haddad@lsv.ens-cachan.fr DISC 11, June 9th 2011 1 Time and Petri Nets 2 Timed Models 3 Expressiveness 4 Analysis 1/36 Outline 1 Time

More information

Proof Methods for Propositional Logic

Proof Methods for Propositional Logic Proof Methods for Propositional Logic Logical equivalence Two sentences are logically equivalent iff they are true in the same models: α ß iff α β and β α Russell and Norvig Chapter 7 CS440 Fall 2015 1

More information

TDT4136 Logic and Reasoning Systems

TDT4136 Logic and Reasoning Systems TDT436 Logic and Reasoning Systems Chapter 7 - Logic gents Lester Solbakken solbakke@idi.ntnu.no Norwegian University of Science and Technology 06.09.0 Lester Solbakken TDT436 Logic and Reasoning Systems

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

AI Programming CS S-09 Knowledge Representation

AI Programming CS S-09 Knowledge Representation AI Programming CS662-2013S-09 Knowledge Representation David Galles Department of Computer Science University of San Francisco 09-0: Overview So far, we ve talked about search, which is a means of considering

More information

Some techniques and results in deciding bisimilarity

Some techniques and results in deciding bisimilarity Some techniques and results in deciding bisimilarity Petr Jančar Dept of Computer Science Technical University Ostrava (FEI VŠB-TU) Czech Republic www.cs.vsb.cz/jancar Talk at the Verification Seminar,

More information

ARTIFICIAL INTELLIGENCE. Reinforcement learning

ARTIFICIAL INTELLIGENCE. Reinforcement learning INFOB2KI 2018-2019 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Reinforcement learning Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Lecture 1: March 7, 2018

Lecture 1: March 7, 2018 Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights

More information

Probabilistic Model Checking and Strategy Synthesis for Robot Navigation

Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Probabilistic Model Checking and Strategy Synthesis for Robot Navigation Dave Parker University of Birmingham (joint work with Bruno Lacerda, Nick Hawes) AIMS CDT, Oxford, May 2015 Overview Probabilistic

More information

Revised by Hankui Zhuo, March 21, Logical agents. Chapter 7. Chapter 7 1

Revised by Hankui Zhuo, March 21, Logical agents. Chapter 7. Chapter 7 1 Revised by Hankui Zhuo, March, 08 Logical agents Chapter 7 Chapter 7 Outline Wumpus world Logic in general models and entailment Propositional (oolean) logic Equivalence, validity, satisfiability Inference

More information

Logic in AI Chapter 7. Mausam (Based on slides of Dan Weld, Stuart Russell, Subbarao Kambhampati, Dieter Fox, Henry Kautz )

Logic in AI Chapter 7. Mausam (Based on slides of Dan Weld, Stuart Russell, Subbarao Kambhampati, Dieter Fox, Henry Kautz ) Logic in AI Chapter 7 Mausam (Based on slides of Dan Weld, Stuart Russell, Subbarao Kambhampati, Dieter Fox, Henry Kautz ) 2 Knowledge Representation represent knowledge about the world in a manner that

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Reinforcement learning Daniel Hennes 4.12.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Reinforcement learning Model based and

More information

Inference in first-order logic. Production systems.

Inference in first-order logic. Production systems. CS 1571 Introduction to AI Lecture 17 Inference in first-order logic. Production systems. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Sentences in Horn normal form Horn normal form (HNF) in

More information

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Tianyu Wang: tiw161@eng.ucsd.edu Yongxi Lu: yol070@eng.ucsd.edu

More information

Linear-time Temporal Logic

Linear-time Temporal Logic Linear-time Temporal Logic Pedro Cabalar Department of Computer Science University of Corunna, SPAIN cabalar@udc.es 2015/2016 P. Cabalar ( Department Linear oftemporal Computer Logic Science University

More information

Petri Nets (for Planners)

Petri Nets (for Planners) Petri (for Planners) B. Bonet, P. Haslum... from various places... ICAPS 2011 & Motivation Petri (PNs) is formalism for modelling discrete event systems Developed by (and named after) C.A. Petri in 1960s

More information

Reinforcement Learning

Reinforcement Learning 1 Reinforcement Learning Chris Watkins Department of Computer Science Royal Holloway, University of London July 27, 2015 2 Plan 1 Why reinforcement learning? Where does this theory come from? Markov decision

More information

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks

More information

Internet Monetization

Internet Monetization Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition

More information

Logical Agents. Santa Clara University

Logical Agents. Santa Clara University Logical Agents Santa Clara University Logical Agents Humans know things Humans use knowledge to make plans Humans do not act completely reflexive, but reason AI: Simple problem-solving agents have knowledge

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Reinforcement Learning. Machine Learning, Fall 2010

Reinforcement Learning. Machine Learning, Fall 2010 Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30

More information

Lecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010

Lecture 25: Learning 4. Victor R. Lesser. CMPSCI 683 Fall 2010 Lecture 25: Learning 4 Victor R. Lesser CMPSCI 683 Fall 2010 Final Exam Information Final EXAM on Th 12/16 at 4:00pm in Lederle Grad Res Ctr Rm A301 2 Hours but obviously you can leave early! Open Book

More information

Decision Problems with TM s. Lecture 31: Halting Problem. Universe of discourse. Semi-decidable. Look at following sets: CSCI 81 Spring, 2012

Decision Problems with TM s. Lecture 31: Halting Problem. Universe of discourse. Semi-decidable. Look at following sets: CSCI 81 Spring, 2012 Decision Problems with TM s Look at following sets: Lecture 31: Halting Problem CSCI 81 Spring, 2012 Kim Bruce A TM = { M,w M is a TM and w L(M)} H TM = { M,w M is a TM which halts on input w} TOTAL TM

More information

Logic. Propositional Logic: Syntax

Logic. Propositional Logic: Syntax Logic Propositional Logic: Syntax Logic is a tool for formalizing reasoning. There are lots of different logics: probabilistic logic: for reasoning about probability temporal logic: for reasoning about

More information

Elements of Reinforcement Learning

Elements of Reinforcement Learning Elements of Reinforcement Learning Policy: way learning algorithm behaves (mapping from state to action) Reward function: Mapping of state action pair to reward or cost Value function: long term reward,

More information

Reinforcement Learning. George Konidaris

Reinforcement Learning. George Konidaris Reinforcement Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom

More information

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University Intelligent Agents Formal Characteristics of Planning Ute Schmid Cognitive Systems, Applied Computer Science, Bamberg University Extensions to the slides for chapter 3 of Dana Nau with contributions by

More information

Inference Methods In Propositional Logic

Inference Methods In Propositional Logic Lecture Notes, Artificial Intelligence ((ENCS434)) University of Birzeit 1 st Semester, 2011 Artificial Intelligence (ENCS434) Inference Methods In Propositional Logic Dr. Mustafa Jarrar University of

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Logical Agent & Propositional Logic

Logical Agent & Propositional Logic Logical Agent & Propositional Logic Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. S. Russell and P. Norvig. Artificial Intelligence:

More information

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig First-Order Logic First-Order Theories Roopsha Samanta Partly based on slides by Aaron Bradley and Isil Dillig Roadmap Review: propositional logic Syntax and semantics of first-order logic (FOL) Semantic

More information

Propositional Logic: Syntax

Propositional Logic: Syntax Logic Logic is a tool for formalizing reasoning. There are lots of different logics: probabilistic logic: for reasoning about probability temporal logic: for reasoning about time (and programs) epistemic

More information