ADVANCED ROBOTICS. PLAN REPRESENTATION Generalized Stochastic Petri nets and Markov Decision Processes

Pedro U. Lima, Instituto Superior Técnico / Instituto de Sistemas e Robótica. September 2009, reviewed April 2016. PDEEC Course Handouts.

PETRI NET TASK AND PLAN MODELS
Representing robot plans by Petri nets (PN) enables tackling a considerable number of issues:
- non-deterministic control policies (using pmfs over the possible actions for a given state);
- a plan represented as a Petri net can be executed by following the Petri net firing rules, and can be event-based, rule-based or a mix of the two;
- (sequential) decision-making algorithms (e.g., Reinforcement Learning) can be used for conflict resolution whenever more than one action is available for a given state.

PETRI NET TASK AND PLAN MODELS
Plan Representation Views
- qualitative (untimed Petri net) view: plans can be analyzed regarding their formal properties, e.g., using algorithms that address Petri net analysis problems (such as conservation, blocking, liveness, invariants);
- quantitative (stochastic timed Petri net) view: plans can be analyzed regarding their performance under uncertainty, e.g., using closed-form algorithms and/or Monte Carlo simulations that address Petri net stochastic performance (such as plan success probability, plan robustness).

PN ROBOT TASK MODEL
PN Untimed Model
A Petri net N is a 6-tuple $N = (P, T, A, w, x, x_0)$ where
- $P = \{p_1, \ldots, p_n\}$ is a set of n places;
- $T = \{t_1, \ldots, t_m\}$ is a set of m transitions;
- $A$ is a set of arcs, connecting places to transitions and transitions to places;
- $w : A \to \mathbb{N}^+$ assigns a weight to each arc (all weights are 1 in this model);
- $x : P \to \mathbb{N}$ is the marking or state of the PN (it assigns zero or more tokens to each place), written as the vector $x = [x(p_1), \ldots, x(p_n)]^T \in \mathbb{N}^n$;
- $x_0 \in \mathbb{N}^n$ is the initial state.
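The firing rule behind this definition can be sketched in a few lines of Python. This is only an illustration: the two-place net below is a hypothetical toy example, not the robot task net of these slides, and the names `pre`, `post`, `enabled` and `fire` are assumptions introduced here.

```python
# Minimal sketch of the untimed Petri net firing rule (hypothetical toy net).
# pre[t][p]  = weight of the arc from place p to transition t (0 if no arc)
# post[t][p] = weight of the arc from transition t to place p (0 if no arc)

def enabled(x, pre_t):
    """A transition is enabled if every input place holds enough tokens."""
    return all(tokens >= need for tokens, need in zip(x, pre_t))

def fire(x, pre_t, post_t):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(x, pre_t):
        raise ValueError("transition is not enabled in this marking")
    return [tokens - need + gain
            for tokens, need, gain in zip(x, pre_t, post_t)]

# Toy net (assumption): p1 --t1--> p2 --t2--> p1, all arc weights equal to 1.
pre = {"t1": [1, 0], "t2": [0, 1]}
post = {"t1": [0, 1], "t2": [1, 0]}

x0 = [1, 0]                             # initial marking: one token in p1
x1 = fire(x0, pre["t1"], post["t1"])    # token moves from p1 to p2
x2 = fire(x1, pre["t2"], post["t2"])    # token moves back to p1
```

Storing the arcs as per-transition weight vectors keeps the enabling test and the state update as simple element-wise comparisons and sums over the marking vector.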

PN ROBOT TASK MODEL
[Figure sequence: Petri net with places p_1, ..., p_6 and transitions t_1, ..., t_5 (t_1 is labeled "start"), shown with its tokens in successive markings.]
Markings (states) shown, in slide order:
- $x = x_0 = [1\ 0\ 0\ 0\ 0\ 0]^T$
- $x = [0\ 1\ 0\ 1\ 0\ 0]^T$
- $x = [0\ 1\ 1\ 1\ 0\ 0]^T$
- $x = [0\ 1\ 0\ 0\ 1\ 0]^T$ and $x = [0\ 1\ 1\ 0\ 0\ 0]^T$
- $x = [0\ 1\ 1\ 0\ 1\ 0]^T$
- $x = [0\ 1\ 1\ 0\ 0\ 1]^T$

PETRI NET TASK AND PLAN MODELS
Why Petri nets over, e.g., Finite State Automata?
- PN languages (languages marked by Petri nets) are a superset of regular languages (languages marked by finite state automata), mainly due to the distinctive memory and concurrency features of Petri nets → richer set of plans;
- PNs enable distributed state modeling, i.e., one can start with simple models (e.g., a primitive action and its preconditions) and build more complex ones (e.g., a behavior PN out of several primitive action PNs);
- tools for PN formal analysis exist (e.g., PIPE, TimeNET):
  - formal verification, useful for programming;
  - stochastic performance evaluation, useful to evaluate plans under uncertainty.

PN ROBOT TASK MODEL
PN Untimed Model: Robot Task Model
- each place in the Petri net is labeled by an associated primitive action or by a predicate, i.e., $l_p : P \to \Pi \cup D$, where $l_p$ is the place labeling function;
- each transition in the Petri net is labeled by an event, i.e., $l_t : T \to E \cup \{\varepsilon\}$, where $l_t$ is the (in general non-injective) transition labeling function, and $\varepsilon$ is the ever-occurring event.
[Figure: the robot task Petri net (places p_1, ..., p_6, transitions t_1, ..., t_5) with its labels: vision_ready2locate_ball, locating_ball, standby, new_frame, robot_ready2move, moving2ball, catching_ball, start, ball_catched, ball_located, ready2catch.]

STOCHASTIC PETRI NETS
PN Stochastic Timed Model
Def.: A Stochastic PN is a 7-tuple $(P, T, A, w, x, x_0, F)$ where $(P, T, A, w, x, x_0)$ is a marked PN, and $F : R[x_0] \times T \to \mathbb{R}$ is a function that associates to each transition t in each reachable marking x a random variable.
Def.: A Generalized Stochastic PN is an 8-tuple $(P, T = T_0 \cup T_D, A, w, x, x_0, F, S)$ where $(P, T, A, w, x, x_0)$ is a marked PN, and $F : R[x_0] \times T_D \to \mathbb{R}$ is a function that associates to each timed transition $t \in T_D$ in each reachable marking x a random variable. Each $t \in T_0$ has zero firing time in all reachable x. S is a set (possibly empty) of elements called random switches, which associate probability distributions to subsets of conflicting immediate transitions.

EXPONENTIAL TIMED PETRI NETS
For Exponential Timed PNs, in the two previous definitions $F : R[x_0] \times T \to \mathbb{R}$ is a function that associates to each transition $t_j \in T_D$ in each reachable marking x an exponential random variable with rate $\lambda_j(x)$. The transitions in $T_D$ are known as exponential transitions, and $\lambda_j(x)$ is referred to as the firing rate of $t_j$ in x.

EXPONENTIAL TIMED PETRI NETS
Theorem: The marking process of an exponential timed Petri net is a continuous time Markov Chain (CTMC).
- State space of the equivalent CTMC: the reachability set $R[x_0]$ of the exponential timed Petri net.
- The transition rate from state $x_i$ to state $x_j \neq x_i$ is given by
  $q_{ij} = \sum_{t_k \in T_{ij}} \lambda_k(x_i)$
  where $T_{ij}$ is the subset of $T_D$ of enabled transitions in $x_i$ such that the firing of any transition in $T_{ij}$ leaves the CTMC in $x_j$.
- If $x_j = x_i$,
  $q_{ii} = -\sum_{j \neq i} q_{ij}$
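The computation of the rates q_ij can be sketched as follows. The input format is an assumption made for this illustration: markings are indexed 0, 1, ..., and `firings` holds one hypothetical `(i, j, rate)` triple per exponential transition enabled in x_i whose firing leads to x_j. The numbers are illustrative only.

```python
# Sketch: build the CTMC rate matrix Q of an exponential timed Petri net.
# `firings` (assumed input format) lists one (i, j, rate) triple for each
# exponential transition enabled in marking x_i whose firing leads to x_j.

def rate_matrix(num_markings, firings):
    Q = [[0.0] * num_markings for _ in range(num_markings)]
    for i, j, rate in firings:
        if i != j:
            # q_ij is the sum of the rates of all transitions in T_ij
            Q[i][j] += rate
    for i in range(num_markings):
        # q_ii is minus the sum of the off-diagonal rates of row i
        Q[i][i] = -sum(Q[i][j] for j in range(num_markings) if j != i)
    return Q

# Illustrative numbers: two different transitions both lead from x_0 to x_1.
firings = [(0, 1, 2.0), (0, 1, 0.5), (1, 2, 1.0), (2, 0, 4.0)]
Q = rate_matrix(3, firings)
```

Each row of `Q` sums to zero, which is the usual property of a CTMC rate (generator) matrix.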

GENERALIZED STOCHASTIC TIMED PETRI NETS (GSPN)
When there is conflict in state $x_i$, if $T_i$ is the set of enabled transitions in $x_i$, the probability of firing $t_j \in T_i$ is determined as follows:
- if $T_i$ is composed of exponential transitions only:
  $\dfrac{\lambda_j(x_i)}{\sum_{t_k \in T_i} \lambda_k(x_i)}$
- if $T_i$ includes one single immediate transition, this is the one that will fire;
- if $T_i$ includes two or more immediate transitions, a probability mass function will be specified over them by an element of S. The subset of immediate transitions plus the switching distribution is called a random switch.
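The three cases of this conflict-resolution rule can be written as one small function. This is a sketch under assumptions: the function name `firing_pmf`, its argument layout and the transition names are hypothetical, and the rates and switch probabilities are illustrative values.

```python
# Sketch of the GSPN conflict-resolution rule for one marking x_i.
# exp_rates: {transition: firing rate in x_i} for enabled exponential transitions
# immediate: enabled immediate transitions
# switch:    {transition: probability}, the random switch over `immediate`

def firing_pmf(exp_rates, immediate=(), switch=None):
    immediate = list(immediate)
    if len(immediate) == 1:
        # a single enabled immediate transition is the one that fires
        return {immediate[0]: 1.0}
    if len(immediate) >= 2:
        # two or more immediate transitions: the random switch decides
        if switch is None or set(switch) != set(immediate):
            raise ValueError("a random switch over the immediate transitions is required")
        return dict(switch)
    # exponential transitions only: lambda_j / sum of the enabled rates
    total = sum(exp_rates.values())
    return {t: lam / total for t, lam in exp_rates.items()}

race = firing_pmf({"t2": 1.0, "t3": 3.0})
single = firing_pmf({"t2": 1.0}, immediate=["t4"])
switched = firing_pmf({}, immediate=["t3", "t4"], switch={"t3": 0.4, "t4": 0.6})
```

With the illustrative rates 1.0 and 3.0, the exponential-only case gives probabilities 0.25 and 0.75.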

GSPN FOR MOTIVATING EXAMPLE
[Figure: GSPN of the robot task model; t_2, t_3, t_4 and t_5 are stochastic transitions with associated exponential pdfs and rates λ_2, λ_3, λ_4 and λ_5.]
λ_2, λ_3, λ_4 and λ_5 are the rates of the corresponding exponential transitions. They represent the estimated rates of sampling frames, locating a ball by the vision system, moving the manipulator towards the estimated exit point of the ball, and catching the ball by the manipulator, respectively, each with uncertainty involved.
If λ_2 > λ_3 + λ_4 + λ_5, a problem of resource management will occur, due to the accumulation of tokens in p_3. One might prefer to control the event new_frame, adjusting its (deterministic) sampling rate.

EXAMPLE: GSPN AND ROBOT SOCCER TASK
[Figure: GSPN of a robot soccer task, with the following annotations.]
- Conflict between transitions enabled by different predicates (whose value is not controlled by the robot).
- Uncertain action effects.
- Conflict between controllable events (associated to commands to start Dribble2Goal or Kick2Goal).
- E.g., the probability that "robot does not see ball" happens before "getting close to ball" is $\lambda_2 / (\lambda_2 + \lambda_3)$.
- Random switch: the probability of choosing Dribble2Goal is p_5 and the probability of choosing Kick2Goal is p_7. This is a probabilistic policy (p_5 + p_7 = 1).
- GSPN equivalent to MDP.

GSPN AND EQUIVALENT CTMC
To ensure the existence of a unique steady-state probability vector $(\rho_1, \ldots, \rho_s)$ for the marking process of the GSPN with s tangible markings, the following simplifying assumptions are made:
1. The GSPN is bounded, i.e., its reachability set is finite.
2. Firing rates do not depend on time parameters, ensuring that the equivalent MC is homogeneous.
3. The GSPN model is proper and deadlock-free, i.e., the initial marking is reachable with a non-zero probability from any marking in the reachability set, and there is no absorbing marking (can be lifted).
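Under these assumptions the steady-state vector can be obtained numerically. A minimal sketch follows, assuming the marking process is given as a CTMC rate matrix Q over the s tangible markings: it solves ρQ = 0 together with the normalisation Σρ_j = 1 by plain Gauss-Jordan elimination. The function name `steady_state` and the two-state matrix are illustrative assumptions, not part of the original slides.

```python
# Sketch: steady-state probabilities of a CTMC from its rate matrix Q,
# solving rho * Q = 0 subject to sum(rho) = 1 (pure Python, no libraries).

def steady_state(Q):
    s = len(Q)
    # Transpose Q so that the unknown rho becomes a column vector.
    A = [[Q[j][i] for j in range(s)] for i in range(s)]
    b = [0.0] * s
    # The balance equations are linearly dependent: replace the last one
    # by the normalisation condition sum(rho) = 1.
    A[-1] = [1.0] * s
    b[-1] = 1.0
    for col in range(s):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, s), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Eliminate this column from every other row.
        for r in range(s):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(s)]

# Illustrative two-state chain: rate 1.0 from x_0 to x_1, rate 3.0 back.
Q = [[-1.0, 1.0],
     [3.0, -3.0]]
rho = steady_state(Q)
```

For a two-state chain the result can be checked by hand against the balance equation ρ_0 · 1.0 = ρ_1 · 3.0.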

EXAMPLE: GSPN AND EQUIVALENT CTMC
[Figure: GSPN with places p_1, ..., p_6 and transitions t_1, ..., t_6. Labels appearing in the figure: p.grasped(obj), p.ontable(obj), a.pickingup_obj, a.observing_table, a.carrying_obj, a.depositing_obj, sel_carry_obj, sel_deposit_obj, deposited + sel_pickup_obj. Exponential rates λ_1, λ_2 and λ_5; random switches with probabilities q_3 and q_4, where q_3 + q_4 = 1.]

EXAMPLE: GSPN AND EQUIVALENT CTMC
Marking graph
[Figure: marking graph with arcs labeled t_1, ..., t_6 over the markings (0 1 1 0 0 0) (tangible), (1 0 0 1 0 0) (vanishing), (1 0 0 0 1 0) (tangible) and (1 0 0 0 0 1) (vanishing).]

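EXAMPLE: GSPN AND EQUIVALENT CTMC
Embedded MC (EMC)
[Figure: EMC over the markings (0 1 1 0 0 0) (tangible), (1 0 0 1 0 0) (vanishing), (1 0 0 0 1 0) (tangible) and (1 0 0 0 0 1) (vanishing), with edge probabilities $\lambda_2/(\lambda_1+\lambda_2)$, $\lambda_1/(\lambda_1+\lambda_2)$, $q_3$, $q_4$, 1 and 1.]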

EXAMPLE: GSPN AND EQUIVALENT CTMC
Reduced Embedded MC (REMC)
[Figure: REMC over the tangible markings (0 1 1 0 0 0) and (1 0 0 0 1 0), with edge probabilities $q_3\,\lambda_1/(\lambda_1+\lambda_2)$, $\lambda_2/(\lambda_1+\lambda_2) + q_4\,\lambda_1/(\lambda_1+\lambda_2)$ and 1.]
MDP: random switch probabilities can be manipulated to achieve optimal decision.
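The step from the EMC to the REMC, removing the vanishing markings, can be sketched in Python. Several things here are assumptions: the edge structure below is one reading of the example figures (which are not fully recoverable from the text), the numeric values of λ_1, λ_2, q_3 and q_4 are illustrative, and the function `reduce_emc` is a hypothetical helper that only works when no cycle consists solely of vanishing markings.

```python
# Sketch: eliminate vanishing markings from an embedded Markov chain (EMC).
# emc:      {marking: {successor marking: probability}}
# tangible: set of tangible markings
# Assumption: there is no cycle made only of vanishing markings.

def reduce_emc(emc, tangible):
    def absorb(marking, weight, out):
        # Push `weight` through vanishing markings until a tangible one is hit.
        for succ, p in emc[marking].items():
            if succ in tangible:
                out[succ] = out.get(succ, 0.0) + weight * p
            else:
                absorb(succ, weight * p, out)
        return out

    return {m: absorb(m, 1.0, {}) for m in emc if m in tangible}

# Assumed reading of the example, with illustrative parameter values.
lam1, lam2, q3, q4 = 2.0, 1.0, 0.25, 0.75
A = (0, 1, 1, 0, 0, 0)   # tangible
B = (1, 0, 0, 1, 0, 0)   # vanishing
C = (1, 0, 0, 0, 1, 0)   # tangible
D = (1, 0, 0, 0, 0, 1)   # vanishing
emc = {
    A: {A: lam2 / (lam1 + lam2), B: lam1 / (lam1 + lam2)},
    B: {C: q3, D: q4},
    C: {D: 1.0},
    D: {A: 1.0},
}
remc = reduce_emc(emc, tangible={A, C})
```

With this reading, `remc[A][C]` equals q_3 λ_1/(λ_1+λ_2) and `remc[A][A]` equals λ_2/(λ_1+λ_2) + q_4 λ_1/(λ_1+λ_2), which agrees with the edge labels of the REMC slide.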

GSPN, REMC AND PERFORMANCE MEASURES
PNs of robot controller and world model must be connected in closed loop. The closed-loop PN can be analyzed w.r.t., e.g. (where $\rho_j$ is the probability of marking $x_j$):
1. Probability that a particular condition C holds:
   $\Pr(C) = \sum_{j \in S_1} \rho_j$, with $S_1 = \{\, j \in \{1,\ldots,s\} : C \text{ is satisfied in } x_j \,\}$
2. Probability that place $p_i$ has exactly k tokens:
   $\Pr(p_i, k) = \sum_{j \in S_2} \rho_j$, with $S_2 = \{\, j \in \{1,\ldots,s\} : x_j(p_i) = k \,\}$
3. Expected number of tokens in a place $p_i$:
   $ET[p_i] = \sum_{k=1}^{K} k \Pr(p_i, k)$
   where K is the maximum number of tokens $p_i$ may contain in any reachable marking.

GSPN, REMC AND PERFORMANCE MEASURES (cont'd)
4. Throughput rate of an exponential transition $t_j$:
   $TR(t_j) = \sum_{i \in S_3} \rho_i \, \lambda(x_i, t_j) \, \upsilon_{ij}$, with $S_3 = \{\, i \in \{1,\ldots,s\} : t_j \text{ enabled in } x_i \,\}$
   where $\upsilon_{ij}$ is the probability that $t_j$ fires among all enabled transitions in $x_i$.
5. Throughput rates of immediate transitions can be computed from those of the exponential transitions and from the structure of the model.
6. Mean waiting time in a place $p_i$:
   $WAIT(p_i) = \dfrac{ET[p_i]}{\sum_{t_j \in IN(p_i)} TR(t_j)} = \dfrac{ET[p_i]}{\sum_{t_j \in OUT(p_i)} TR(t_j)}$
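Once the steady-state probabilities are known, the measures above reduce to a few sums. The sketch below follows the formulas as stated. The function names, the data layout (markings as tuples, places as indices, dictionaries keyed by marking index) and the two-marking example are assumptions made for illustration.

```python
# Sketch: GSPN performance measures from steady-state probabilities rho.
# markings[j] is the marking x_j as a tuple; rho[j] is its probability.

def prob_condition(rho, markings, condition):
    """Pr(C): sum of rho_j over the markings where C holds."""
    return sum(r for r, x in zip(rho, markings) if condition(x))

def prob_tokens(rho, markings, place, k):
    """Pr(p_i, k): probability that `place` holds exactly k tokens."""
    return prob_condition(rho, markings, lambda x: x[place] == k)

def expected_tokens(rho, markings, place):
    """ET[p_i]: sum over k = 1..K of k * Pr(p_i, k)."""
    K = max(x[place] for x in markings)
    return sum(k * prob_tokens(rho, markings, place, k) for k in range(1, K + 1))

def throughput(rho, rates, fire_probs):
    """TR(t_j): rates[i] and fire_probs[i] hold lambda(x_i, t_j) and the
    firing probability of t_j, for the markings i where t_j is enabled."""
    return sum(rho[i] * rates[i] * fire_probs[i] for i in rates)

def mean_wait(expected, input_throughputs):
    """WAIT(p_i): ET[p_i] divided by the total throughput into p_i."""
    return expected / sum(input_throughputs)

# Illustrative two-marking cycle: t1 moves the token p0 -> p1 (rate 1.0),
# t2 moves it back p1 -> p0 (rate 3.0); rho is its steady-state vector.
markings = [(1, 0), (0, 1)]
rho = [0.75, 0.25]
et_p1 = expected_tokens(rho, markings, place=1)
tr_t1 = throughput(rho, rates={0: 1.0}, fire_probs={0: 1.0})
tr_t2 = throughput(rho, rates={1: 3.0}, fire_probs={1: 1.0})
wait_p1 = mean_wait(et_p1, [tr_t1])
```

In this toy cycle the input and output throughputs of p1 coincide, and the mean waiting time in p1 comes out as the mean firing delay of t2, as one would expect.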

CONCLUSIONS AND OPEN ISSUES
- Petri nets are suitable representations for (multi-)robot plans.
- Formal PN models of a (multi-)robot task enable qualitative and quantitative analysis.
- Quantitative analysis results from GSPN models.
- GSPN with exponential timed transitions are equivalent to Markov Chains.
- If some of the events are controllable and represent actions, the GSPN is indeed equivalent to an MDP.
- Decreased complexity of the MDP due to the structure embedded in building the GSPN robot + world models.
- How to represent state observation uncertainty? (Probabilistic PNs, POMDPs)
- How to move beyond Propositional Logic?
- PN Supervision to meet plan specifications.
- Reinforcement learning to learn optimal plans.


Final Illustrative Example: Soccer Goalkeeper

[Figures: Petri nets of the goalkeeper behaviors BehaviorGKDefault, BehaviorGKRemoveBall and BehaviorGKDefendGoal.]
