ADVANCED ROBOTICS. PLAN REPRESENTATION Generalized Stochastic Petri nets and Markov Decision Processes

ADVANCED ROBOTICS: PLAN REPRESENTATION
Generalized Stochastic Petri Nets and Markov Decision Processes
Pedro U. Lima, Instituto Superior Técnico / Instituto de Sistemas e Robótica
September 2009, reviewed April 2016
PDEEC Course Handouts

PETRI NET TASK AND PLAN MODELS
Representing robot plans by Petri nets (PN) makes it possible to address several issues:
- non-deterministic control policies can be expressed, using probability mass functions (pmfs) over the possible actions for a given state;
- a plan represented as a Petri net can be executed by following the Petri net firing rules, and can be event-based, rule-based, or a mix of the two;
- (sequential) decision-making algorithms (e.g., Reinforcement Learning) can be used for conflict resolution whenever more than one action is available for a given state.

PETRI NET TASK AND PLAN MODELS
Plan Representation Views
- Qualitative (untimed Petri net) view: plans can be analyzed regarding their formal properties, e.g., using algorithms that address Petri net analysis problems (such as conservation, blocking, liveness, invariants).
- Quantitative (stochastic timed Petri net) view: plans can be analyzed regarding their performance under uncertainty, e.g., using closed-form algorithms and/or Monte Carlo simulations that address Petri net stochastic performance (such as plan success probability, plan robustness).

PN ROBOT TASK MODEL
PN Untimed Model
A Petri net $N$ is a 6-tuple $N = (P, T, A, w, x, x_0)$ where
- $P = \{p_1, \ldots, p_n\}$ is a set of $n$ places;
- $T = \{t_1, \ldots, t_m\}$ is a set of $m$ transitions;
- $A$ is a set of arcs, connecting places to transitions and transitions to places;
- $w : A \to \mathbb{N}^+$ is the arc weight function (all weights are 1 in this model);
- $x : P \to \mathbb{N}$ is the marking, which assigns to each place zero or more tokens; the state of the PN is the vector $x = [x(p_1), \ldots, x(p_n)]^T \in \mathbb{N}^n$;
- $x_0 \in \mathbb{N}^n$ is the initial state.

PN ROBOT TASK MODEL
[Figures, slides 5 to 10: the example net, with places p_1, ..., p_6 and transitions t_1, ..., t_5 (t_1 is labeled "start"), drawn under successive markings.]
Markings (states) shown, in slide order:
- slide 5: x = x_0 = [1 0 0 0 0 0]^T
- slide 6: x = [0 1 0 1 0 0]^T
- slide 7: x = [0 1 1 1 0 0]^T
- slide 8: x = [0 1 0 0 1 0]^T and x = [0 1 1 0 0 0]^T
- slide 9: x = [0 1 1 0 1 0]^T
- slide 10: x = [0 1 1 0 0 1]^T
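The firing rule behind these markings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arc structure in `PRE` and `POST` is inferred from the markings listed on the slides and is not given explicitly in the source, and t_5 is left out because its arcs cannot be recovered from the text. All names (`PRE`, `POST`, `enabled`, `fire`) are hypothetical.

```python
# Token-game sketch for the example net (all arc weights are 1).
# ASSUMPTION: the arcs below are inferred from the markings shown on the
# slides; t5 is omitted because its arcs cannot be recovered from the text.
PLACES = ["p1", "p2", "p3", "p4", "p5", "p6"]

PRE = {  # input places of each transition
    "t1": {"p1": 1},
    "t2": {"p2": 1},
    "t3": {"p3": 1, "p4": 1},
    "t4": {"p5": 1},
}
POST = {  # output places of each transition
    "t1": {"p2": 1, "p4": 1},
    "t2": {"p2": 1, "p3": 1},
    "t3": {"p5": 1},
    "t4": {"p6": 1},
}


def enabled(marking, t):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking[p] >= w for p, w in PRE[t].items())


def fire(marking, t):
    """Return the marking reached by firing t; the input marking is not modified."""
    if not enabled(marking, t):
        raise ValueError(f"{t} is not enabled")
    new = dict(marking)
    for p, w in PRE[t].items():
        new[p] -= w
    for p, w in POST[t].items():
        new[p] += w
    return new


def as_vector(marking):
    return [marking[p] for p in PLACES]


x = dict(zip(PLACES, [1, 0, 0, 0, 0, 0]))  # initial marking x_0
trace = [as_vector(x)]
for t in ["t1", "t2", "t3", "t2", "t4"]:
    x = fire(x, t)
    trace.append(as_vector(x))
```

Under the assumed arcs, the firing sequence t1, t2, t3, t2, t4 visits six of the markings shown on the slides, ending in [0 1 1 0 0 1].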

PETRI NET TASK AND PLAN MODELS
Why Petri nets over, e.g., Finite State Automata?
- PN languages (languages marked by Petri nets) are a superset of regular languages (languages marked by finite state automata), mainly due to the distinctive memory and concurrency features of Petri nets ⇒ a richer set of plans.
- PNs enable distributed state modeling, i.e., one can start with simple models (e.g., a primitive action and its preconditions) and build more complex ones (e.g., a behavior PN out of several primitive action PNs).
- Tools for PN formal analysis exist (e.g., PIPE, TimeNET). They support:
  - formal verification, useful for programming;
  - stochastic performance evaluation, useful to evaluate plans under uncertainty.

PN ROBOT TASK MODEL
PN Untimed Model: Robot Task Model
- Each place in the Petri net is labeled by an associated primitive action or by a predicate, i.e., $l_p : P \to \Pi \cup D$, where $l_p$ is the place labeling function.
- Each transition in the Petri net is labeled by an event, i.e., $l_t : T \to E \cup \{\varepsilon\}$, where $l_t$ is the (in general non-injective) transition labeling function, and $\varepsilon$ is the ever-occurring event.
[Figure: the example net with place labels standby, vision_ready2locate_ball, locating_ball, robot_ready2move, moving2ball and catching_ball, and transition labels start, new_frame, ball_located, ready2catch and ball_catched.]

STOCHASTIC PETRI NETS
PN Stochastic Timed Model
Def.: A Stochastic PN is a 7-tuple $(P, T, A, w, x, x_0, F)$ where $(P, T, A, w, x, x_0)$ is a marked PN, and $F : R[x_0] \times T \to \mathbb{R}$ is a function that associates to each transition $t$ in each reachable marking $x$ a random variable.
Def.: A Generalized Stochastic PN is an 8-tuple $(P, T = T_0 \cup T_D, A, w, x, x_0, F, S)$ where $(P, T, A, w, x, x_0)$ is a marked PN, and $F : R[x_0] \times T_D \to \mathbb{R}$ is a function that associates to each timed transition $t \in T_D$ in each reachable marking $x$ a random variable. Each $t \in T_0$ has zero firing time in all reachable $x$. $S$ is a (possibly empty) set of elements called random switches, which associate probability distributions to subsets of conflicting immediate transitions.

EXPONENTIAL TIMED PETRI NETS
For Exponential Timed PNs, in the two previous definitions $F : R[x_0] \times T \to \mathbb{R}$ is a function that associates to each transition $t_j \in T_D$ in each reachable marking $x$ an exponential random variable with rate $\lambda_j(x)$. The transitions in $T_D$ are known as exponential transitions, and $\lambda_j(x)$ is referred to as the firing rate of $t_j$ in $x$.

EXPONENTIAL TIMED PETRI NETS
Theorem: The marking process of an exponential timed Petri net is a continuous time Markov Chain (CTMC).
- State space of the equivalent CTMC: the reachability set $R[x_0]$ of the exponential timed Petri net.
- The transition rate from state $x_i$ to state $x_j \neq x_i$ is given by
  $$q_{ij} = \sum_{t_k \in T_{ij}} \lambda_k(x_i)$$
  where $T_{ij}$ is the subset of $T_D$ of transitions enabled in $x_i$ such that the firing of any transition in $T_{ij}$ leaves the CTMC in $x_j$.
- If $x_j = x_i$,
  $$q_{ii} = -\sum_{j \neq i} q_{ij}$$
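As a rough illustration of the rate formulas above, the following Python sketch assembles the CTMC generator matrix from a list of enabled firings. The function name `ctmc_generator`, the triple format `(i, j, rate)` and the three-marking example are assumptions made for this sketch, not part of the original slides. Firings that leave the marking unchanged are ignored, which is also an assumption of the sketch.

```python
def ctmc_generator(num_states, firings):
    """Build the generator matrix Q of the equivalent CTMC.

    firings: iterable of (i, j, rate), one entry per exponential transition
    that is enabled in marking x_i and whose firing leads to marking x_j.
    """
    Q = [[0.0] * num_states for _ in range(num_states)]
    for i, j, rate in firings:
        if i != j:  # firings that do not change the marking are ignored
            Q[i][j] += rate  # q_ij is the sum of the rates in T_ij
    for i in range(num_states):
        # q_ii is minus the sum of the off-diagonal rates of row i
        Q[i][i] = -sum(Q[i][j] for j in range(num_states) if j != i)
    return Q


# Hypothetical three-marking example: two transitions lead from x_0 to x_1.
Q = ctmc_generator(3, [(0, 1, 2.0), (0, 1, 1.0), (1, 2, 4.0), (2, 0, 0.5)])
```

Each row of the resulting matrix sums to zero, as expected for a CTMC generator.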

GENERALIZED STOCHASTIC TIMED PETRI NETS (GSPN)
When there is a conflict in state $x_i$, and $T_i$ is the set of transitions enabled in $x_i$, the probability of firing $t_j \in T_i$ is determined as follows:
- if $T_i$ is composed of exponential transitions only, the probability is
  $$\frac{\lambda_j(x_i)}{\sum_{t_k \in T_i} \lambda_k(x_i)}$$
- if $T_i$ includes one single immediate transition, this is the one that will fire;
- if $T_i$ includes two or more immediate transitions, a probability mass function is specified over them by an element of $S$. The subset of immediate transitions plus the switching distribution is called a random switch.
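The three cases above can be sketched as a single Python function. The name `firing_probabilities` and its argument layout are hypothetical choices for this illustration, and the numeric rates are made up.

```python
def firing_probabilities(exp_rates, immediate=(), switch=None):
    """Firing probabilities of the transitions enabled in one marking.

    exp_rates: {transition: rate} for the enabled exponential transitions.
    immediate: names of the enabled immediate transitions.
    switch: {transition: probability}, the random switch over the immediate
            transitions (needed only when two or more are enabled).
    """
    immediate = list(immediate)
    if not immediate:
        # Exponential transitions only: each fires with probability rate / total rate.
        total = sum(exp_rates.values())
        return {t: rate / total for t, rate in exp_rates.items()}
    if len(immediate) == 1:
        # A single enabled immediate transition is the one that fires.
        return {immediate[0]: 1.0}
    if switch is None or set(switch) != set(immediate):
        raise ValueError("a random switch over the immediate transitions is required")
    if abs(sum(switch.values()) - 1.0) > 1e-9:
        raise ValueError("switch probabilities must sum to 1")
    return dict(switch)


race = firing_probabilities({"t2": 2.0, "t3": 6.0})
single = firing_probabilities({"t2": 2.0}, immediate=["t4"])
switched = firing_probabilities({}, immediate=["t3", "t4"],
                                switch={"t3": 0.3, "t4": 0.7})
```

With the made-up rates 2 and 6, the exponential race gives probabilities 0.25 and 0.75.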

GSPN FOR MOTIVATING EXAMPLE
[Figure: the robot task net (places p_1, ..., p_6, transitions t_1, ..., t_5), with t_2, t_3, t_4 and t_5 drawn as stochastic transitions with associated exponential pdfs and rates $\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_5$.]
$\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_5$ are the rates of the corresponding exponential transitions. They represent the estimated rates of sampling frames, locating a ball by the vision system, moving the manipulator towards the estimated exit point of the ball, and catching the ball by the manipulator, respectively, each with uncertainty involved.
If $\lambda_2 > \lambda_3 + \lambda_4 + \lambda_5$, a resource management problem will occur, due to the accumulation of tokens in $p_3$. One might prefer to control the event new_frame, adjusting its (deterministic) sampling rate.

EXAMPLE: GSPN AND ROBOT SOCCER TASK
[Figure: GSPN model of a robot soccer task, with the following annotations.]
- Conflict between transitions enabled by different predicates (whose values are not controlled by the robot).
- Uncertain action effects.
- Conflict between controllable events (associated with the commands to start Dribble2Goal or Kick2Goal).
- Example of a conflict between exponential transitions: the probability that "robot does not see ball" happens before "getting close to ball" is $\frac{\lambda_2}{\lambda_2 + \lambda_3}$.
- Random switch: the probability of choosing Dribble2Goal is $p_5$ and the probability of choosing Kick2Goal is $p_7$. This defines a probabilistic policy ($p_5 + p_7 = 1$).
- The GSPN is equivalent to an MDP.

GSPN AND EQUIVALENT CTMC
To ensure the existence of a unique steady-state probability vector $(\rho_1, \ldots, \rho_s)$ for the marking process of a GSPN with $s$ tangible markings, the following simplifying assumptions are made:
1. The GSPN is bounded, i.e., its reachability set is finite.
2. Firing rates do not depend on time parameters, ensuring that the equivalent MC is homogeneous.
3. The GSPN model is proper and deadlock-free, i.e., the initial marking is reachable with non-zero probability from any marking in the reachability set, and there is no absorbing marking (this assumption can be lifted).

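EXAMPLE: GSPN AND EQUIVALENT CTMC
[Figure: GSPN with places p_1, ..., p_6 and transitions t_1, ..., t_6. Labels appearing in the figure: p.grasped(obj), p.ontable(obj), a.pickingup_obj, a.observing_table, a.carrying_obj, a.depositing_obj, sel_carry_obj, sel_deposit_obj, sel_pickup_obj, deposited. Exponential rates $\lambda_1$, $\lambda_2$ and $\lambda_5$ are shown, together with a random switch with probabilities $q_3$ and $q_4$, where $q_3 + q_4 = 1$.]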

EXAMPLE: GSPN AND EQUIVALENT CTMC
Marking graph
[Figure: marking graph with arcs labeled t_1, ..., t_6 over four markings.]
- (0 1 1 0 0 0): tangible
- (1 0 0 1 0 0): vanishing
- (1 0 0 0 1 0): tangible
- (1 0 0 0 0 1): vanishing

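EXAMPLE: GSPN AND EQUIVALENT CTMC
Embedded MC (EMC)
[Figure: the marking graph with arcs labeled by transition probabilities.]
- From (0 1 1 0 0 0), tangible: $\frac{\lambda_2}{\lambda_1 + \lambda_2}$ and $\frac{\lambda_1}{\lambda_1 + \lambda_2}$
- From (1 0 0 1 0 0), vanishing: $q_3$ and $q_4$
- From (1 0 0 0 1 0), tangible: 1
- From (1 0 0 0 0 1), vanishing: 1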

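EXAMPLE: GSPN AND EQUIVALENT CTMC
Reduced Embedded MC (REMC)
[Figure: chain over the two tangible markings (0 1 1 0 0 0) and (1 0 0 0 1 0). Arc probabilities shown: $q_3 \frac{\lambda_1}{\lambda_1 + \lambda_2}$, $1$, and $\frac{\lambda_2}{\lambda_1 + \lambda_2} + q_4 \frac{\lambda_1}{\lambda_1 + \lambda_2}$.]
MDP: the random switch probabilities can be manipulated to achieve an optimal decision.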
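The step from the EMC to the REMC amounts to eliminating the vanishing markings and folding their outgoing probabilities into the tangible ones. The Python sketch below shows one way to do this, assuming there is no cycle made only of vanishing markings. The function `reduce_emc`, the numeric values of the rates and switch probabilities, and the arc structure of the EMC (in particular, that (1 0 0 0 1 0) leads to (1 0 0 0 0 1)) are assumptions read off the figure residue, not statements from the source.

```python
def reduce_emc(P, tangible):
    """Eliminate vanishing markings from an embedded Markov chain.

    P: {state: {successor: probability}} for all markings.
    tangible: set of tangible markings.
    ASSUMPTION: no cycle consists only of vanishing markings, so the
    recursion below terminates.
    """
    def absorb(state, mass, row):
        if state in tangible:
            row[state] = row.get(state, 0.0) + mass
        else:
            # Push the probability mass through the vanishing marking.
            for nxt, p in P[state].items():
                absorb(nxt, mass * p, row)

    reduced = {}
    for s in tangible:
        row = {}
        for nxt, p in P[s].items():
            absorb(nxt, p, row)
        reduced[s] = row
    return reduced


# Hypothetical numbers: lambda_1 = 1, lambda_2 = 3, q_3 = 0.25, q_4 = 0.75.
lam1, lam2, q3, q4 = 1.0, 3.0, 0.25, 0.75
A, B, C, D = "011000", "100100", "100010", "100001"
emc = {
    A: {A: lam2 / (lam1 + lam2), B: lam1 / (lam1 + lam2)},
    B: {C: q3, D: q4},   # random switch in the vanishing marking
    C: {D: 1.0},         # assumed target of the arc leaving (1 0 0 0 1 0)
    D: {A: 1.0},
}
remc = reduce_emc(emc, tangible={A, C})
```

With these numbers, the self-loop probability of (0 1 1 0 0 0) is $\frac{\lambda_2}{\lambda_1+\lambda_2} + q_4 \frac{\lambda_1}{\lambda_1+\lambda_2} = 0.9375$ and the probability of moving to (1 0 0 0 1 0) is $q_3 \frac{\lambda_1}{\lambda_1+\lambda_2} = 0.0625$, matching the arc labels of the REMC.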

GSPN, REMC AND PERFORMANCE MEASURES
The PNs of the robot controller and of the world model must be connected in closed loop. The closed-loop PN can be analyzed w.r.t., e.g.,
1. Probability that a particular condition $C$ holds:
   $$\Pr(C) = \sum_{j \in S_1} \rho_j, \quad S_1 = \{ j \in \{1, \ldots, s\} : C \text{ is satisfied in } x_j \}$$
2. Probability that place $p_i$ has exactly $k$ tokens:
   $$\Pr(p_i, k) = \sum_{j \in S_2} \rho_j, \quad S_2 = \{ j \in \{1, \ldots, s\} : x_j(p_i) = k \}$$
3. Expected number of tokens in a place $p_i$:
   $$ET[p_i] = \sum_{k=1}^{K} k \Pr(p_i, k)$$
   where $K$ is the maximum number of tokens $p_i$ may contain in any reachable marking.
Here $\rho_i$ is the probability of marking $i$.
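Measures 1 to 3 can be computed directly once the steady-state probabilities are known. The Python sketch below assumes that the vector of probabilities `rho` and the list of tangible `markings` are already available; the function names and the numeric values of `rho` are hypothetical.

```python
def prob_condition(rho, markings, condition):
    """Measure 1: sum of rho_j over the markings x_j that satisfy the condition."""
    return sum(r for r, x in zip(rho, markings) if condition(x))


def prob_tokens(rho, markings, place, k):
    """Measure 2: probability that the place (given by index) holds exactly k tokens."""
    return prob_condition(rho, markings, lambda x: x[place] == k)


def expected_tokens(rho, markings, place):
    """Measure 3: expected number of tokens in the place."""
    K = max(x[place] for x in markings)  # largest token count in any marking
    return sum(k * prob_tokens(rho, markings, place, k) for k in range(1, K + 1))


# Hypothetical steady-state probabilities for two tangible markings.
markings = [(0, 1, 1, 0, 0, 0), (1, 0, 0, 0, 1, 0)]
rho = [0.8, 0.2]

p_busy = prob_condition(rho, markings, lambda x: x[4] == 1)  # token in p_5
p_p2_one = prob_tokens(rho, markings, place=1, k=1)          # p_2 holds 1 token
et_p1 = expected_tokens(rho, markings, place=0)              # ET[p_1]
```

Places are addressed by zero-based index here, so `place=0` refers to p_1.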

GSPN, REMC AND PERFORMANCE MEASURES (cont'd)
4. Throughput rate of an exponential transition $t_j$:
   $$TR(t_j) = \sum_{i \in S_3} \rho_i \, \lambda(x_i, t_j) \, \upsilon_{ij}, \quad S_3 = \{ i \in \{1, \ldots, s\} : t_j \text{ enabled in } x_i \}$$
   where $\upsilon_{ij}$ is the probability that $t_j$ fires among all enabled transitions in $x_i$.
5. The throughput rates of immediate transitions can be computed from those of the exponential transitions and from the structure of the model.
6. Mean waiting time in a place $p_i$:
   $$WAIT(p_i) = \frac{ET[p_i]}{\sum_{t_j \in IN(p_i)} TR(t_j)} = \frac{ET[p_i]}{\sum_{t_j \in OUT(p_i)} TR(t_j)}$$
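A minimal Python sketch of measures 4 and 6 follows, using the throughput formula as it is written on the slide. The function names, the dictionary layout and all numeric values are assumptions for illustration only.

```python
def throughput(rho, rate, fire_prob):
    """Measure 4: TR(t_j) as the sum over markings enabling t_j of
    rho_i * lambda(x_i, t_j) * upsilon_ij.

    rho, rate and fire_prob are dictionaries keyed by marking index;
    rate and fire_prob contain only the markings in which t_j is enabled.
    """
    return sum(rho[i] * rate[i] * fire_prob[i] for i in rate)


def mean_wait(expected_tokens, input_throughputs):
    """Measure 6: expected tokens in the place divided by the total
    throughput of its input transitions."""
    return expected_tokens / sum(input_throughputs)


# Hypothetical values: t_j is enabled only in marking 1, where it always fires.
rho = {0: 0.8, 1: 0.2}
tr = throughput(rho, rate={1: 2.0}, fire_prob={1: 1.0})
wait = mean_wait(expected_tokens=0.2, input_throughputs=[tr])
```

With these made-up numbers the throughput is 0.4 firings per time unit and the mean waiting time is 0.5 time units.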

CONCLUSIONS AND OPEN ISSUES
- Petri nets are suitable representations for (multi-)robot plans.
- Formal PN models of a (multi-)robot task enable qualitative and quantitative analysis.
- Quantitative analysis results from GSPN models.
- GSPNs with exponential timed transitions are equivalent to Markov chains.
- If some of the events are controllable and represent actions, the GSPN is equivalent to an MDP.
- The complexity of the MDP is decreased by the structure embedded in building the GSPN robot + world models.
- How to represent state observation uncertainty? (Probabilistic PNs, POMDPs)
- How to move beyond Propositional Logic?
- PN supervision to meet plan specifications.
- Reinforcement learning to learn optimal plans.


Final Illustrative Example: Soccer Goalkeeper
[Figures not recovered. The slides name the behaviors BehaviorGKDefault, BehaviorGKRemoveBall and BehaviorGKDefendGoal.]