A Region-based Algorithm for Discovering Petri Nets from Event Logs

J. Carmona (1), J. Cortadella (1), and M. Kishinevsky (2)
(1) Universitat Politècnica de Catalunya, Spain
(2) Intel Corporation, USA

Abstract. The paper presents a new method for the synthesis of Petri nets from event logs in the area of Process Mining. The method derives a bounded Petri net that over-approximates the behavior of an event log. The most important property is that it produces a net with the smallest behavior that still contains the behavior of the event log. The methods described in this paper have been implemented in a tool and tested on a set of examples.

1 Introduction

The discovery of formal models from event logs in information systems is known as process mining. Since the nineties, the area of process mining has been focused in providing formal support to business information systems [16]. In the industrial domain, ranging from hospitals and banks to sensor networks or CAD for VLSI, process mining can be applied to succinctly summarize the behavior observed in large event logs [14]. Nowadays, several approaches can be used to mine formal models, most of them included in the ProM framework [15].

The synthesis problem [7] is related to process mining: it consists in building a Petri net that has a behavior equivalent to a given transition system. The problem was first addressed by Ehrenfeucht and Rozenberg [8] introducing regions to model the sets of states that characterize marked places. Process mining differs from synthesis in the knowledge assumption: while in synthesis one assumes a complete description of the system, only a partial description of the system is assumed in process mining. Therefore, bisimulation is no longer a goal to achieve in process mining. Instead, obtaining approximations that succinctly represent the log under consideration are more valuable [19].

In the area of synthesis, some approaches have been studied to take the theory of regions into practice. In [3] polynomial algorithms for the synthesis of bounded nets were presented. This approach has been recently adapted for the problem of process mining in [4]. In [6], the theory of regions was applied for the synthesis of safe Petri nets with bisimilar behavior.
Recently, the theory from [6] has been extended to bounded Petri nets [5]. In this paper we adapt the theory from [5] to the problem of process mining.

The work presented in this paper aims at constructing (mining) a Petri net that covers the behavior observed in the event log, i.e. traces in the event log will be feasible in the Petri net. Moreover, the Petri net may accept traces not observed in the log. Additionally, a minimality property is demonstrated on the mined Petri net: no other net exists that both covers the log and accepts less traces than the mined Petri net. This capability of minimal over-approximation represents the main theoretical contribution of this paper. The methods presented in the paper can mine a particular k-bounded Petri net, for a given bound k. We have implemented the theory of this paper in a tool, and some preliminary results from logs are reported.

The approach taken in this paper is a formal one and differs from the more heuristic methods in the literature. Although the methods presented might have high complexity for large logs, they can be combined with recent iterative approaches [18] to alleviate their complexity.

This paper shares common goals with the previously presented paper [4]. In [4], two process mining strategies on region of languages are presented, having the same minimality goal as the one that we have in this paper. However the strategy is different: integer linear models are solved in order to find a set of special places called feasible places that guarantee the inclusion of the traces from the event log. The more places added, the more traces are forbidden in the resulting net. If the net contains all the possible feasible places, then the minimality property can be demonstrated. However, the set of feasible places might be infinite. In our case, given a maximal bound k for the mining of a k-bounded Petri net, minimal regions of the transition system are enough to demonstrate the minimality property on this bound.

Fig. 1. Petri net mining to avoid overfitting.

Example. In [14], a small log is presented to motivate the overfitting produced by synthesis tools. The log contains the following activities: r=register, s=ship, sb=send bill, p=payment, ac=accounting, ap=approved, c=close, em=express mail, rj=rejected, and rs=resolve. Now assume that the event log contains the traces (r, s, sb, p, ac, ap, c), (r, sb, em, p, ac, ap, c), (r, sb, p, em, ac, rj, rs, c), (r, em, sb, p, ac, ap, c), (r, sb, s, p, ac, rj, rs, c), (r, sb, p, s, ac, ap, c) and (r, sb, p, em, ac, ap, c).
From this log, a TS can be obtained [13] and a PN as the one shown in Figure 1 will be synthesized by a tool like petrify [6]. If the log is slightly changed (for instance, trace (r, sb, s, p, ac, rj, rs, c) is replaced by (r, sb, s, p, ac, ap, c)), the synthesis tool will adapt the PN to account for the changes, deriving a different PN. This means that synthesis algorithms are very sensitive to variations in the logs. However, the techniques presented in this paper, as it happens also with traditional mining approaches like the α-algorithm [16], are less sensitive to variations in event logs, and will derive the same PN over the modified log.

The two models used in this paper are Petri nets and transition systems. We will assume that a transition system represents an event log obtained from observing a real system from which an event-based representation (e.g. Petri net) approximating its behavior must be obtained. The derivation of the transition system from an event log is an important step, that may have big impact in the final mined Petri net, as it is demonstrated in [13]. A two-step approach is presented in [13], emphasizing that the first step (generation of the transition system) is crucial for the balance between underfitting and overfitting. If the desired abstraction is attained in the first step, i.e. the transition system represents an abstraction of the event log, the second step is expected to reproduce exactly this abstraction, via synthesis. The methods presented in this paper extend the possibilities of this two-step approach, given that the second step might also introduce further abstraction in a controlled manner. The approaches based on regions of languages perform the mining process in only one step, provided that logs can be directly interpreted as languages [4].

2 Preliminaries: theory of regions

2.1 Finite transition systems and Petri nets

Definition 1 (Transition system). A transition system (TS) is a tuple $(S, E, A, s_{in})$, where $S$ is a set of states, $E$ is an alphabet of actions, such that $S \cap E = \emptyset$, $A \subseteq S \times E \times S$ is a set of (labelled) transitions, and $s_{in}$ is the initial state.

Let $TS = (S, E, A, s_{in})$ be a transition system. We consider connected TSs that satisfy the following axioms:
- $S$ and $E$ are finite sets.
- Every event has an occurrence: $\forall e \in E : \exists (s, e, s') \in A$;
- Every state is reachable from the initial state: $\forall s \in S : s_{in} \xrightarrow{*} s$.

A TS is called deterministic if for each state $s$ and each label $e$ there can be at most one state $s'$ such that $s \xrightarrow{e} s'$. The relation between TSs will be studied in this paper. The language of a TS, $L(TS)$, is the set of traces feasible from the initial state. When $L(TS_1) \subseteq L(TS_2)$, we will denote $TS_2$ as an over-approximation of $TS_1$.
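The definitions above can be made concrete with a minimal Python sketch. The tuple layout, the function names and the two toy systems (`chain`, `flower`) are illustrative assumptions, not part of the paper, and the language is only enumerated up to a finite trace length.

```python
from collections import namedtuple

# Hypothetical encoding of a TS (S, E, A, s_in); arcs are (source, event, target).
TS = namedtuple("TS", ["states", "events", "arcs", "initial"])

def is_deterministic(ts):
    """At most one target state per (state, event) pair."""
    targets = {}
    for src, event, dst in ts.arcs:
        if targets.setdefault((src, event), dst) != dst:
            return False
    return True

def language(ts, max_len):
    """All traces of length <= max_len that are feasible from the initial state."""
    traces = {()}
    frontier = {((), ts.initial)}
    for _ in range(max_len):
        frontier = {(trace + (event,), dst)
                    for trace, state in frontier
                    for src, event, dst in ts.arcs if src == state}
        traces |= {trace for trace, _ in frontier}
    return traces

def over_approximates(ts2, ts1, max_len):
    """Bounded check of L(ts1) being contained in L(ts2)."""
    return language(ts1, max_len) <= language(ts2, max_len)

# Two toy systems: a chain a.b and a one-state "flower" that accepts any word.
chain = TS({"s0", "s1", "s2"}, {"a", "b"},
           {("s0", "a", "s1"), ("s1", "b", "s2")}, "s0")
flower = TS({"q"}, {"a", "b"}, {("q", "a", "q"), ("q", "b", "q")}, "q")
```

Here `over_approximates(flower, chain, 3)` holds while the converse does not, which is the asymmetry that the term over-approximation captures.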
The notion of simulation between two TSs is related to this concept:

Definition 2 (Simulation [2]). Let $TS_1 = (S_1, E, A_1, s_{in1})$ and $TS_2 = (S_2, E, A_2, s_{in2})$ be two TSs with the same set of events. A simulation of $TS_1$ by $TS_2$ is a relation $\pi$ between $S_1$ and $S_2$ such that
- for every $s_1 \in S_1$, there exists $s_2 \in S_2$ such that $s_1 \pi s_2$;
- for every $(s_1, e, s_1') \in A_1$ and for every $s_2 \in S_2$ such that $s_1 \pi s_2$, there exists $(s_2, e, s_2') \in A_2$ such that $s_1' \pi s_2'$.
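As an illustration of Definition 2, the sketch below computes the largest relation that satisfies the second item by iterated refinement and then checks the first item. This is a generic fixpoint construction chosen for the example, not a procedure taken from the paper; the pair-based TS encoding and the function names are assumptions.

```python
def greatest_simulation(ts1, ts2):
    """Largest pi such that every e-step of s1 is matched by an e-step of s2.

    Each TS is given as a pair (states, arcs), arcs being (source, event, target).
    """
    states1, arcs1 = ts1
    states2, arcs2 = ts2
    pi = {(s1, s2) for s1 in states1 for s2 in states2}
    changed = True
    while changed:
        changed = False
        for s1, s2 in sorted(pi):          # sorted() iterates over a copy
            for src, event, dst in arcs1:
                if src != s1:
                    continue
                matched = any(a == s2 and ev == event and (dst, b) in pi
                              for a, ev, b in arcs2)
                if not matched:
                    pi.discard((s1, s2))   # s2 cannot mimic this step of s1
                    changed = True
                    break
    return pi

def is_simulated_by(ts1, ts2):
    """First item of Definition 2: every state of ts1 is related to some state of ts2."""
    pi = greatest_simulation(ts1, ts2)
    return all(any((s1, s2) in pi for s2 in ts2[0]) for s1 in ts1[0])

chain = ({"s0", "s1", "s2"}, {("s0", "a", "s1"), ("s1", "b", "s2")})
flower = ({"q"}, {("q", "a", "q"), ("q", "b", "q")})
```

With these toy systems the chain is simulated by the flower, but not the other way around.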

When $TS_1$ is simulated by $TS_2$ with relation $\pi$, and vice versa with relation $\pi^{-1}$, $TS_1$ and $TS_2$ are bisimilar [2].

Definition 3 (Petri net [12]). A Petri net (PN) is a tuple $(P, T, F, M_0)$ where $P$ and $T$ represent finite sets of places and transitions, respectively, and $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation. The initial marking $M_0 \subseteq P$ defines the initial state of the system (see footnote 3).

The sets of input and output transitions of place $p$ are denoted by ${}^{\bullet}p$ and $p^{\bullet}$, respectively. The set of all markings of the PN reachable from the initial marking $M_0$ is called its Reachability Set. The Reachability Graph of PN (RG(PN)) is a transition system in which the set of states is the Reachability Set, the events are the transitions of the net and a transition $(m_1, t, m_2)$ exists if and only if $m_1 \xrightarrow{t} m_2$. We use $L(PN)$ as a shortcut for $L(RG(PN))$.

2.2 Regions

We now review the classical theory of regions for the synthesis of Petri nets [6-8]. Let $S'$ be a subset of the states of a TS, $S' \subseteq S$. If $s \notin S'$ and $s' \in S'$, then we say that transition $s \xrightarrow{e} s'$ enters $S'$. If $s \in S'$ and $s' \notin S'$, then transition $s \xrightarrow{e} s'$ exits $S'$. Otherwise, transition $s \xrightarrow{e} s'$ does not cross $S'$.

Definition 4. Let $TS = (S, E, A, s_{in})$ be a TS. Let $S' \subseteq S$ be a subset of states and $e \in E$ be an event. The following conditions (in the form of predicates) are defined for $S'$ and $e$:

$\mathrm{nocross}(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \in S' \Leftrightarrow s_2 \in S'$
$\mathrm{enter}(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \notin S' \wedge s_2 \in S'$
$\mathrm{exit}(e, S') \equiv \exists (s_1, e, s_2) \in A : s_1 \in S' \wedge s_2 \notin S'$

The notion of region is central for the synthesis of PNs. Intuitively, each region is a set of states that corresponds to a place in the synthesized PN, so that every state in the region models the marking of the place.

Definition 5 (region). A set of states $r \subseteq S$ in $TS = (S, E, A, s_{in})$ is called a region if the following two conditions are satisfied for each event $e \in E$:

(i) $\mathrm{enter}(e, r) \Rightarrow \neg\mathrm{nocross}(e, r) \wedge \neg\mathrm{exit}(e, r)$
(ii) $\mathrm{exit}(e, r) \Rightarrow \neg\mathrm{nocross}(e, r) \wedge \neg\mathrm{enter}(e, r)$

A region is a subset of states in which all transitions labeled with the same event $e$ have exactly the same entry/exit relation. This relation will become the predecessor/successor relation in the Petri net.
The event may always be either an enter event for the region (case (i) in the previous definition), or always be an exit event (case (ii)), or never cross the region's boundaries, i.e. each transition labeled with $e$ is internal or external to the region, where the antecedents of neither (i) nor (ii) hold. The transition corresponding to the event will be successor, predecessor or unrelated with the corresponding place respectively. Examples of regions are reported in Figure 2: from the TS of Figure 2(a), some regions are enumerated in Figure 2(b). For instance, for region $r_2$, one event is an exit event, another event is an entry event, while the rest of events do not cross the region.

Footnote 3: Although this paper deals with bounded Petri nets, for the sake of clarity we restrict the theory of current and next sections to the simpler class of safe (1-bounded) Petri nets. Section 4 discusses how to generalize the method for bounded Petri nets.

Fig. 2. (a) Transition system over the states s1 to s5, (b) minimal regions $r_1 = \{s1, s2\}$, $r_2 = \{s1, s3\}$, $r_3 = \{s2, s4\}$, $r_4 = \{s3, s4\}$, $r_5 = \{s5\}$, (c) synthesis applying Algorithm of Figure 3.

Definition 6 (Minimal region). Let $r$ and $r'$ be regions of a TS. A region $r'$ is said to be a subregion of $r$ if $r' \subset r$. A region $r$ is a minimal region if there is no other region $r'$ which is a subregion of $r$.

Going back to the example of Figure 2, in Figure 2(b) we report the set of minimal regions. The union of disjoint regions is a region, so for instance the union of the regions $r_1$ and $r_4$ is the set $\{s1, s2, s3, s4\}$ which is also a (non-minimal) region.

Each TS has two trivial regions: the set of all states, $S$, and the empty set. Further on we will always consider only non-trivial regions. The set of non-trivial regions of TS will be denoted by $R_{TS}$. Given a set $S' \subseteq S$ and a region $r$, $r|_{S'}$ represents the projection of the region $r$ into the set $S'$, i.e. $r|_{S'} = r \cap S'$.

A region $r$ is a pre-region of event $e$ if there is a transition labeled with $e$ which exits $r$. A region $r$ is a post-region of event $e$ if there is a transition labeled with $e$ which enters $r$. The sets of all pre-regions and post-regions of $e$ are denoted with ${}^{\circ}e$ and $e^{\circ}$, respectively. By definition it follows that if $r \in {}^{\circ}e$, then all transitions labeled with $e$ exit $r$. Similarly, if $r \in e^{\circ}$, then all transitions labeled with $e$ enter $r$.
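The region conditions of Definition 5 and the minimality of Definition 6 can be checked by brute force on a small example. The transition system below is a hypothetical reconstruction: the arcs of Figure 2(a) did not survive extraction, so the arcs and the event names a, b, c, d are assumptions, chosen so that the minimal regions come out as the five sets listed in Figure 2(b). The exhaustive subset enumeration is only for illustration and is not the generation procedure of the paper.

```python
from itertools import combinations

STATES = ["s1", "s2", "s3", "s4", "s5"]
# Assumed arcs (source, event, target); not recoverable from the source figure.
ARCS = {("s1", "a", "s2"), ("s1", "b", "s3"), ("s2", "b", "s4"),
        ("s3", "a", "s4"), ("s4", "c", "s5"), ("s5", "d", "s1")}
EVENTS = sorted({event for _, event, _ in ARCS})

def crossing(event, region):
    """Which of enter / exit / nocross hold for this event and set of states."""
    kinds = set()
    for src, ev, dst in ARCS:
        if ev != event:
            continue
        if src not in region and dst in region:
            kinds.add("enter")
        elif src in region and dst not in region:
            kinds.add("exit")
        else:
            kinds.add("nocross")
    return kinds

def is_region(region):
    """Definition 5: all arcs of an event agree on a single crossing kind."""
    return all(len(crossing(event, region)) <= 1 for event in EVENTS)

def minimal_regions():
    """Non-trivial regions that contain no smaller non-trivial region."""
    regions = [frozenset(c)
               for k in range(1, len(STATES))
               for c in combinations(STATES, k)
               if is_region(frozenset(c))]
    return {r for r in regions if not any(other < r for other in regions)}

def pre_regions(event):
    return {r for r in minimal_regions() if crossing(event, r) == {"exit"}}

def post_regions(event):
    return {r for r in minimal_regions() if crossing(event, r) == {"enter"}}
```

On this assumed TS, the union of the disjoint regions {s1, s2} and {s3, s4} is again a region, in line with the remark after Definition 6.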

Algorithm: PN synthesis
- For each event $e \in E$ generate a transition labeled with $e$ in the PN;
- For each minimal region $r_i \in R_{TS}$ generate a place $r_i$;
- Place $r_i$ contains a token in the initial marking iff the corresponding region $r_i$ contains the initial state of the TS $s_{in}$;
- The flow relation is as follows: $e \in r_i^{\bullet}$ iff $r_i$ is a pre-region of $e$ and $e \in {}^{\bullet}r_i$ iff $r_i$ is a post-region of $e$, i.e.,
  $F_{TS} \stackrel{def}{=} \{(r, e) \mid r \in R_{TS} \wedge e \in E \wedge r \in {}^{\circ}e\} \cup \{(e, r) \mid r \in R_{TS} \wedge e \in E \wedge r \in e^{\circ}\}$

Fig. 3. Algorithm for Petri net synthesis from [11].

2.3 Generation of minimal regions

The computation of the minimal regions is crucial for the synthesis methods in [5, 6]. It is based on the notion of excitation region [10].

Definition 7 (Excitation region, see footnote 4). The excitation region of an event $e$, $ER(e)$, is the set of states in which $e$ is enabled, i.e.

$ER(e) = \{s \mid \exists s' : (s, e, s') \in A\}$

Minimal regions can be generated from the ERs of the events in a TS in the following way: starting from the ER of each event, set expansion is performed on those events that violate the region condition (a pseudocode of the expansion algorithm is given in Figure 10 from [6]). This exploration can be done efficiently by considering only sets that are not supersets of regions already computed [6], because only minimal regions are required. Each time a region is obtained (according to Definition 5), it is added to the set of regions. Finally, from the set of regions computed, non-minimal regions are removed.

2.4 Region-based synthesis

The procedure given by [11] to synthesize a PN, $N_{TS} = (R_{TS}, E, F_{TS}, R_{s_{in}})$, from an elementary transition system (see footnote 5), $TS = (S, E, A, s_{in})$, is illustrated in Figure 3. Notice that only minimal regions are required in the algorithm [7]. An example of the application of the algorithm is shown in Figure 2. The initial TS and the set of minimal regions is reported in Figures 2(a) and (b), respectively. The synthesized PN is shown in Figure 2(c).

Footnote 4: Excitation regions are not regions in the terms of Definition 5. The term is used due to historical reasons.

Footnote 5: Elementary transition systems are a proper subclass of the TS considered in this paper, where additional conditions to the ones presented in Section 2.1 are required.
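The steps of the algorithm in Figure 3 can be sketched in Python as follows. The minimal regions are the ones listed in Figure 2(b); the arcs of the TS and the event names a, b, c, d are assumptions, since the figure itself is not recoverable, and the safe-net firing rule and helper names are illustrative.

```python
# Assumed arcs (source, event, target) of the TS; hypothetical reconstruction.
ARCS = {("s1", "a", "s2"), ("s1", "b", "s3"), ("s2", "b", "s4"),
        ("s3", "a", "s4"), ("s4", "c", "s5"), ("s5", "d", "s1")}
S_IN = "s1"
# Minimal regions as listed in Figure 2(b); each one becomes a place.
REGIONS = {"r1": {"s1", "s2"}, "r2": {"s1", "s3"}, "r3": {"s2", "s4"},
           "r4": {"s3", "s4"}, "r5": {"s5"}}

def excitation_region(event):
    """ER(e): states in which the event is enabled (Definition 7)."""
    return {src for src, ev, _ in ARCS if ev == event}

def crossing(event, region):
    kinds = set()
    for src, ev, dst in ARCS:
        if ev == event:
            if src in region and dst not in region:
                kinds.add("exit")
            elif src not in region and dst in region:
                kinds.add("enter")
            else:
                kinds.add("nocross")
    return kinds

def synthesize():
    """One transition per event, one place per region, flow from pre/post-regions."""
    events = sorted({ev for _, ev, _ in ARCS})
    pre = {e: frozenset(p for p, r in REGIONS.items() if crossing(e, r) == {"exit"})
           for e in events}
    post = {e: frozenset(p for p, r in REGIONS.items() if crossing(e, r) == {"enter"})
            for e in events}
    m0 = frozenset(p for p, r in REGIONS.items() if S_IN in r)
    return events, pre, post, m0

def reachable_markings(events, pre, post, m0):
    """Reachability set of the safe net: a marking is the set of marked places."""
    seen, stack = {m0}, [m0]
    while stack:
        marking = stack.pop()
        for e in events:
            if pre[e] <= marking:                      # all input places marked
                nxt = (marking - pre[e]) | post[e]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen
```

For this example the net has five reachable markings, one per state of the assumed TS, which is what one would expect when the reachability graph is bisimilar to the input.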

Fig. 4. (a) TS, (b) ECTS by label-splitting, (c) synthesized PN.

2.5 Excitation-closed transition systems

Definition 8 (Excitation-closed TS). A transition system $TS = (S, E, A, s_{in})$ is called excitation-closed (ECTS) if it satisfies the following two axioms:
- Excitation closure: for each event $e$: $\bigcap_{r \in {}^{\circ}e} r = ER(e)$
- Event effectiveness: for each event $e$: ${}^{\circ}e \neq \emptyset$

The synthesis algorithm in Figure 3 applied to an ECTS derives a Petri net with reachability graph bisimilar to the initial TS [6]. Note that the state separation property of elementary transition systems, which enforces every pair of states to be distinguished by the set of regions, is not required. The set of regions needed by the algorithm to preserve bisimilarity can be constrained to minimal pre-regions.

When the TS is not excitation closed, then it must be transformed to enforce that property. One possible strategy is to represent every event by multiple transitions with the same label. This technique is called label splitting. Figure 4 illustrates this technique. The initial TS, shown in Figure 4(a), is transformed by splitting one event into two events indexed 1 and 2, as shown in Figure 4(b), resulting in an ECTS. The synthesized PN, with two transitions for that event, is shown in Figure 4(c).

Hence in PN synthesis label splitting might be crucial for the existence of a PN with bisimilar behavior. However, sometimes label splitting might degrade the resulting PN structure significantly, deriving intricate causality relations that are not helpful for visualization. This phenomenon is discussed in [5]. The following sections introduce PN mining, a version of PN synthesis where the excitation closure is dropped.

3 Algorithm for Petri net mining

The goal of Petri net mining is to generate a PN that over-approximates all observed behaviors in the TS, i.e. $L(TS) \subseteq L(PN)$, and where $L(PN) \setminus L(TS)$ is small [4]. Additionally, obtaining a succinct PN with nice causality relations is desirable. For this purpose, the classical synthesis conditions must be adapted to allow the discovery of behaviors not present in the input TS. In this section we show a simple yet powerful approach for relaxing the region-based synthesis conditions from [5, 6] to obtain over-approximations of the TS.

Formally, given $TS = (S, E, A, s_{in})$, the theory of regions can be adapted for mining a Petri net $PN = (P, T, F, M_0)$ with the following characteristics:
1. $L(TS) \subseteq L(PN)$,
2. $T = E$, i.e. there is no label splitting, and
3. Minimal language containment (MLC) property: $\forall PN' = (P', T, F', M_0')$ s.t. $T = E$ : $L(TS) \subseteq L(PN') \Rightarrow L(PN) \subseteq L(PN')$

Therefore the obtained Petri net represents the minimal over-approximation of the input TS that can be synthesized without label splitting. The remainder of this section will show how to relax the region-based synthesis to derive such a Petri net.

3.1 Mining over-approximations of a TS

Bisimilarity or language equivalence are very restricting equivalence relations, not very useful for the area of Petri net mining where over-approximations of the initial event log are more valuable [4, 19]. In [5, 6], bisimulation between the TS and the synthesized PN holds due to the excitation closure condition. Let us assume in this section that the excitation closure condition is dropped, i.e. the intersection of the minimal pre-regions of some events may properly include the excitation region of the event. With this simple relaxation, the PN obtained by the Algorithm of Figure 3 will satisfy the MLC property (Theorem 2).

Theorem 1. Let $TS = (S, E, A, s_{in})$ be a transition system, and $PN = (P, T, F, M_0)$ be the synthesized net with the set of minimal regions of TS, using Algorithm of Figure 3. Then $L(TS) \subseteq L(PN)$.

Proof. The proof corresponds to the sufficiency direction from Theorem 3.4 in [6]. The theorem guarantees bisimilarity between an ECTS and the reachability graph of the synthesized Petri net from the set of minimal regions. From the two simulations necessary for having bisimulation in that theorem, only one is preserved if the excitation closure condition is dropped. This remaining simulation is the one between the TS and the reachability graph of the PN.
Hence every trace in the TS can be simulated by the PN when minimal regions are used, even if the TS is not excitation closed. This suffices to prove the theorem.

Moreover, as the following results show, regions are preserved under language containment or simulation.

Lemma 1. Let $TS = (S, E, A, s_{in})$, $TS' = (S', E', A', s_{in})$ be two transition systems such that $S \subseteq S'$, $E \subseteq E'$, $A \subseteq A'$. If $r \in R_{TS'}$ then $r|_S \in R_{TS}$.

Proof. If predicates (i), (ii) in Definition 5 hold in $r$ for transitions in $A'$, then they also hold for the transitions in $A$ when $r$ is restricted to $S$, given that $A \subseteq A'$ and $S \subseteq S'$, i.e. by removing arcs, no new violations of the region conditions can be created (see Definition 5).

We now prove a similar lemma on the correspondence of regions between simulated TSs.

Lemma 2. Let $TS = (S, E, A, s_{in})$, $TS' = (S', E, A', s_{in}')$ be such that there exists a simulation relation of TS by TS' with relation $\pi$. If $r \in R_{TS'}$, then $\pi^{-1}(r) \in R_{TS}$, and the nocross/enter/exit predicates for every event at $r$ are preserved in $\pi^{-1}(r)$.

Proof. The proof for this lemma is similar to the one used in Lemma 1, but on simulated states: for every transition $(s, e, s') \in A$ there exists a transition $(\pi(s), e, \pi(s')) \in A'$. Therefore, the predicates (i), (ii) in Definition 5 hold in TS for the set $\pi^{-1}(r)$.

In general, language containment between two TSs does not imply simulation [9]. However, if the TS corresponding to the superset language is deterministic then language containment guarantees the existence of a simulation:

Lemma 3. Let $TS_1 = (S_1, E_1, A_1, s_{in1})$ and $TS_2 = (S_2, E_2, A_2, s_{in2})$ be two TSs such that $TS_2$ is deterministic, and $L(TS_1) \subseteq L(TS_2)$. Then $TS_2$ is a simulation of $TS_1$.

Proof. The relation $\pi \subseteq (S_1 \times S_2)$ defined as follows:

$s_1 \pi s_2 \Leftrightarrow \exists \sigma : s_{in1} \xrightarrow{\sigma} s_1 \wedge s_{in2} \xrightarrow{\sigma} s_2$

represents a simulation of $TS_1$ by $TS_2$: the first item of Definition 2 holds since $L(TS_1) \subseteq L(TS_2)$. If the contrary is assumed, i.e. $\exists s_1 \in S_1 : \nexists s_2 \in S_2 : s_1 \pi s_2$, then the trace leading to $s_1$ in $TS_1$ is not feasible in $TS_2$, which contradicts the assumption. The second item holds because of the first item and the determinism of $TS_2$: for every $s_1 \in S_1$, $TS_2$ deterministic implies that there is only one state possible $s_2 \in S_2$ such that $s_1 \pi s_2$. But now if $e$ is enabled in $s_1$ and not enabled in $s_2$, this will imply that the trace $\sigma e$, with $s_{in1} \xrightarrow{\sigma} s_1$, is not feasible in $TS_2$, reaching a contradiction to $L(TS_1) \subseteq L(TS_2)$.

And now we can prove the MLC property on the mined Petri net from TS:

Theorem 2.
Let $PN = (P, T, F, M_0)$ be the synthesized net with the set of minimal regions of $TS = (S, E, A, s_{in})$, using Algorithm of Figure 3. Then PN satisfies the MLC property.

Proof. By contradiction. Let $TS' = (S', E', A', s_{in}')$ be the reachability graph corresponding to $PN' = (P', T, F', M_0')$ such that $E = T$, $L(TS) \subseteq L(TS')$ and $L(PN) \not\subseteq L(TS')$. The following facts can be observed:

- TS' and RG(PN) are deterministic because $E' = E = T$ and therefore they correspond to the reachability graph of Petri nets with a different label for each transition.
- Since TS' is deterministic and $L(TS) \subseteq L(TS')$, then there is a simulation $\pi$ of TS by TS' (Lemma 3).
- $\forall r' \in R_{TS'}$, $r = \pi^{-1}(r') \in R_{TS}$, and the nocross/enter/exit predicates of the events is the same in $r$ and $r'$ (Lemma 2).
- Each non-minimal region can be described as the union of disjoint minimal regions [6], and therefore we can focus only on minimal regions.
- Each minimal region $r \in R_{TS}$ is a region in $R_{RG(PN)}$, since PN has been obtained with Algorithm of Figure 3 from the set of minimal regions in TS. Moreover, since RG(PN) is deterministic and $L(TS) \subseteq L(PN)$ (Theorem 1), then there is a simulation of TS by RG(PN) (Lemma 3). Now using Lemma 2, together with the fact that $r$ is a region both in $R_{TS}$ and $R_{RG(PN)}$, the nocross/enter/exit predicates of events in TS hold also in RG(PN).

Hence, the previous items show that for a region in TS' there is a corresponding region in RG(PN) with the same nocross/enter/exit predicates on the events. In Petri net terms, this fact means that the flow relation of PN' is included in the flow relation of PN. Additionally, the simulations connecting both transition systems preserve the initial states (see Lemma 3). This contradicts the assumption that $L(PN) \not\subseteq L(TS')$.

3.2 Related issues and further extensions

Now we discuss the features of the mining strategy and possible extensions.

Visualization capabilities. By removing the excitation closure condition, one guarantees that there is a 1-to-1 correspondence between the events in the log and the transitions in the Petri net. This is important in terms of visualization. Moreover, the set of minimal regions can include redundant regions: a region $r$ is redundant if the language of the Petri net without $r$ is preserved. Therefore redundant regions can be safely removed from the net. The theory in [5, 6] proposes methods to detect redundant places, based in the preservation of the excitation closure. Those methods have been adapted and included in the mining approach presented in this paper.

Mining of Petri net subclasses.
As it has been done in synthesis (see [6], Section 4.4), the approach presented in this paper might be adapted to mine Petri net subclasses. The basic idea is to restrict the generation of regions in order to generate regions satisfying structural Petri net conditions. Let us use the example in Figure 5 to illustrate this. In Figure 5(b) we report the mined two-bounded Petri net from the TS of Figure 5(a) (next Section shows how to generalize the mining method to the bounded case). Now imagine that we are interested in the mining of marked graphs, i.e. Petri nets where places have at most one predecessor (successor) transition. Notice that place p in Figure 5(b) does not satisfy this condition. If the mining of a marked graph is applied, the Petri net shown in Figure 5(c) is obtained.

Fig. 5. (a) Initial TS, (b) Mined Petri net, (c) Mined marked graph.

Fig. 6. (a) Transition system, (b) Mined safe Petri net, (c) Mined 3-bounded Petri net.

Critical events. The methods presented can be extended to select those events that might be critical in terms of representation: for those events, the avoidance of over-approximation might be imposed by requiring excitation closure on them. The application of label-splitting can be guided to attain this goal.

4 Mining bounded Petri nets

In the literature for the mining of Petri nets from event logs, it is widely accepted the use of ordinary and safe Petri nets for the discovery of process models (a remarkable exception is presented in [4]). Due to the recent results for the synthesis of bounded and weighted Petri nets [5], this limitation can be waived, and therefore a more succinct and accurate model of the log can be obtained using the techniques developed in this paper. Moreover, the possibility to search for unsafe regions might be crucial in order not to over-approximate the event log too much. To illustrate this fact, see the example in Figure 6. The mining of a safe Petri net from the TS of Figure 6(a) is shown in Figure 6(b), whereas Figure 6(c) reports the mining of a 3-bounded net. The language accepted by the PN from Figure 6(b) is ( ) which might be an over-conservative approximation (see footnote 6), while the net in Figure 6(c) accepts ( 3 ), which although being also an over-approximation, it is a more accurate one.

Fig. 7. A transition system and an equivalent bounded Petri net. (State labels in the figure: 200, 120, 111, 102, 040, 031, 022, 013, 004; places p1, p2, p3.)

This section introduces informally how the theory of the previous sections can be generalized to mine bounded systems. In [5], an extension of the region-based synthesis of Petri nets has been presented to support bounded nets. The methods assume that k is initially given for the search of a k-bounded Petri net. Let us use the example in Figure 7 to summarize the theory.

In the bounded case, the basic idea is that regions are represented by multisets (i.e., a state might have multiplicity greater than one). Figure 7 depicts a TS with 9 states and 3 events. After synthesis, the Petri net at the right is obtained. Each state has a 3-digit label that corresponds to the marking of places p1, p2 and p3 of the Petri net, respectively. The shadowed states represent the general region that characterizes place p2. Each grey tone represents a different multiplicity of the state (4 for the darkest and 1 for the lightest). Each event has a gradient with respect to the region (+2, -1 and 0 for the three events, respectively). The gradient indicates how the event changes the multiplicity of the state after firing. For the same example, the equivalent safe Petri net has 5 places and 10 transitions.

In summary, the generalization of the theory of Sections 2 and 3 is based on the idea that regions are no longer sets but multisets, and the predicates for region conditions must take into account the gradient of each event on the multisets. The excitation closure notion is defined on the support (states with multiplicity greater or equal than one) of the multiset. Finally, the algorithm for synthesis of bounded Petri nets is generalized to account for bounded markings and weighted arcs.
The interested reader may refer to [5] for details. The theory in [5] has been incorporated in the Petri net mining approach presented in this paper. Hence the mining of Petri nets can be guided to find bounded Petri nets. An example of k-bounded mining is shown in Section 5.

Footnote 6: The expression combining $e_1$ and $e_2$ denotes the set of possible interleavings between $e_1$ and $e_2$.
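The multiset view of regions can be illustrated with a small Python sketch built on the nine state labels of Figure 7. The event names a, b, c and their effects on the marking are assumptions made for the example (the arcs of the figure are not recoverable); they are chosen so that the gradients on place p2 are +2, -1 and 0, as stated in the text.

```python
# State labels from Figure 7: digits are the markings of places p1, p2, p3.
STATES = ["200", "120", "111", "102", "040", "031", "022", "013", "004"]
# Assumed effect of each (hypothetically named) event on (p1, p2, p3).
EFFECT = {"a": (-1, 2, 0), "b": (0, -1, 1), "c": (1, 0, -2)}

def transitions():
    """Arcs (source, event, target) induced by the assumed token-game effects."""
    arcs = []
    for label in STATES:
        marking = tuple(int(digit) for digit in label)
        for event, delta in EFFECT.items():
            nxt = tuple(m + d for m, d in zip(marking, delta))
            if min(nxt) >= 0:                     # no place may go negative
                arcs.append((label, event, "".join(map(str, nxt))))
    return arcs

def gradients(multiset, arcs):
    """For each event, the set of multiplicity changes observed over its arcs."""
    result = {}
    for src, event, dst in arcs:
        result.setdefault(event, set()).add(multiset[dst] - multiset[src])
    return result

def is_multiset_region(multiset, arcs):
    """Generalized region condition: every event has one constant gradient."""
    return all(len(changes) == 1 for changes in gradients(multiset, arcs).values())

ARCS = transitions()
P2_REGION = {s: int(s[1]) for s in STATES}        # multiplicity = tokens in p2
P2_SUPPORT = {s: min(1, int(s[1])) for s in STATES}
```

In this sketch the multiset for p2 is a region with gradients +2, -1 and 0, while its support taken as a plain set is not a region, which suggests why a safe net needs a different structure for the same behavior.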

Fig. 8. (a) Initial TS, (b) Mined Petri net.

5 Examples, experiments and tool

The theory described in this paper has been incorporated in Genet, a tool for the synthesis and mining of concurrent systems [5]. Most of the examples have been obtained from [1].

Mining of safe Petri nets

Some examples have been presented along the paper. An additional example is shown in Figure 8. The language accepted by the PN of Figure 8(b) is e ( ), which properly includes the language of the TS of Figure 8(a). Remarkably, applying the α-algorithm [17] to this event log results in the same PN.

Mining of bounded Petri nets

An example of the power of k-bounded PN mining is shown in Figure 9. The system modeled represents 5 processes sharing 3 resources. Every process requires one resource, but there is one process that requires two resources. We assume that the TS used for this example can be constructed from a set of simulations. The transition system from which the 3-bounded PN is mined contains 20 states and 74 arcs. The synthesis of a safe PN from the transition system applies many label splittings in order to enforce the excitation closure, deriving in a PN with 15 places, 34 transitions and 128 arcs. Clearly, neither the initial TS nor the synthesized PN are of any help to realize the control flow of this example. However, the mined 3-bounded PN is a succinct representation of the log.
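The resource-sharing example can be explored with a short Python sketch. The net structure below is an assumption inferred from the prose (a shared resource pool with three tokens, an acquire event e_i and a release event t_i per process, and weight-2 arcs for the one process that needs two resources); the actual net of Figure 9 is not recoverable from the source. Under this assumption the reachability graph has 20 states and 74 arcs, the figures quoted in the text.

```python
from collections import deque

# Assumed demand per process: process 1 needs two resources, the others one.
NEEDS = {1: 2, 2: 1, 3: 1, 4: 1, 5: 1}
RESOURCES = 3

def enabled_moves(marking):
    """Yield (label, next_marking); a marking is (free resources, busy processes)."""
    free, busy = marking
    for process, need in NEEDS.items():
        if process in busy:
            # Release event t_i returns the held resources to the pool.
            yield f"t{process}", (free + need, busy - {process})
        elif free >= need:
            # Acquire event e_i consumes `need` tokens (arc weight) from the pool.
            yield f"e{process}", (free - need, busy | {process})

def reachability_graph():
    """Breadth-first construction of the reachable markings and labelled arcs."""
    m0 = (RESOURCES, frozenset())
    seen, arcs, queue = {m0}, [], deque([m0])
    while queue:
        marking = queue.popleft()
        for label, nxt in enabled_moves(marking):
            arcs.append((marking, label, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, arcs

states, arcs = reachability_graph()
```

The pool never holds more than three tokens in this model, so a place for it needs a bound of 3, which a safe net cannot express directly.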

Fig. 9. Mined 3-bounded PN for a system of five processes sharing three resources.

                   |                 | synthesis  | mining
                   |                 | petrify    | Genet      | Genet      | ProM            | ProM
                   |                 | safe       | safe       | 2-bounded  | Parikh          | Heuristics
benchmark          |    S  [S]    E  |    P    T  |    P  [S]  |    P  [S]  |    P  T_U  [S]  |    P  T_I  [S]
groupedfollowsa7   |   18   10    7  |    7    7  |    6   11  |    7   11  |    7    0   10  |    7    1    8
groupedfollowsal1  |   15   15    7  |    8    9  |   10   16  |   12   15  |    7    0    7  |   14   10   22
groupedfollowsal2  |   25   25   11  |   15   11  |   15   25  |   15   25  |   11    0   13  |   15    3   25
herbstfig6p21      |   16   16    7  |   11   13  |    7   22  |   11   16  |    1    6    2  |   18   15 unb.
herbstfig6p34      |   32   32   12  |   16   13  |   16   34  |   18   32  |    8    2   12  |   19   12 unb.
herbstfig6p41      |   20   18   14  |   16   14  |   16   18  |   16   18  |   17    0   18  |   14    0   18
staffware 15       |   31   24   19  |   20   20  |   18   22  |   19   31  |   18    0   21  |   19    0   19
pn ex 10           |  233  210   11  |   64  218  |   13  281  |   16  145  |    8    2   14  |   41   25 unb.

Table 1. PN mining applied to event logs from [1]. The entry "unb." marks an unbounded net.

Experiments

The mining of some examples is summarized in Table 1. Following the two-step mining approach from [13], we have obtained the transition systems from each log with the FSM Miner plugin available in ProM. For each log, columns report the number of states of the initial log S, the number of states of the minimal bisimilar transition system [S] (which gives an idea of the amount of redundancy present in the initial log) and the number of events E. Next, the number of places P and transitions T of the PN obtained by synthesis is reported. For each version of the mining algorithm (safe and 2-bounded), the number of places of the mined PN and the number of states of the corresponding minimal bisimilar reachability graph are reported. The mining of all examples but the last one took less than two seconds of CPU time. The mining of pn ex 10, 2-bounded version, took one minute. Finally, the same information is provided for two well-known mining algorithms in ProM: the Parikh Language-based Region and the Heuristics [20] miners. The number of unconnected transitions (T_U) derived by the Parikh miner and the number of invisible transitions introduced by the Heuristics miner (T_I) are also reported.

The numbers in Table 1 suggest some remarks. If the synthesis is compared with the mining in the case of safe PNs, it should be noticed that even for those small examples the number of transitions is reduced, due to the absence of label splitting (see the row for pn ex 10). The number of places is also reduced in general. It should also be noticed that 2-bounded mining represents the log more accurately, and thus more places are needed with respect to the mining of safe nets. Sometimes the mined PN accepts more traces but the corresponding minimal bisimilar transition system has fewer states, e.g. pn ex 10: after overapproximating the initial TS, several states become bisimilar and can be minimized.

The Parikh miner tends to derive very aggressive abstractions, as demonstrated in the pn ex 10 and herbstfig6p21 logs. Sometimes the Petri nets obtained with this miner contain isolated transitions, because the miner could not find places connecting them to the net.

The Heuristics miner is based on the frequency of patterns in the log. The miner derives a heuristic net that can be afterwards converted to a Petri net with ProM. Some of the Petri nets obtained with this conversion turned out to be unbounded (marked unb. in Table 1), and contain a significant number of invisible transitions. This miner is however very robust to noise in the log.

In conclusion, different miners can achieve different mining goals, widening the application of Process mining in several directions.

6 Conclusions

A strategy for adapting the theory of regions for the area of Process mining has been presented. The main contribution is to allow the generation of overapproximations of the event log by means of a bounded Petri net, not necessarily safe. An important result is presented that guarantees the minimal language containment property on the mined PN. The theory has been incorporated in a synthesis tool.

Acknowledgements

We would like to thank W. van der Aalst, E. Verbeek and B. van Dongen for helpful discussions and guidance, and anonymous referees for their help in improving the final version of the paper.
This work has been supported by the project FORMALISM (TIN2007-66523), and a grant by Intel Corporation.

References

1. Process mining. www.processmining.org.
2. A. Arnold. Finite Transition Systems. Prentice Hall, 1994.
3. E. Badouel, L. Bernardinello, and P. Darondeau. Polynomial algorithms for the synthesis of bounded nets. Lecture Notes in Computer Science, 915:364–383, 1995.
4. R. Bergenthum, J. Desel, R. Lorenz, and S. Mauser. Process mining based on regions of languages. In Proc. 5th Int. Conf. on Business Process Management, pages 375–383, Sept. 2007.
5. J. Carmona, J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. A symbolic algorithm for the synthesis of bounded Petri nets. In 29th International Conference on Application and Theory of Petri Nets and Other Models of Concurrency, June 2008.
6. J. Cortadella, M. Kishinevsky, L. Lavagno, and A. Yakovlev. Deriving Petri nets from finite transition systems. IEEE Transactions on Computers, 47(8):859–882, Aug. 1998.
7. J. Desel and W. Reisig. The synthesis problem of Petri nets. Acta Inf., 33(4):297–315, 1996.
8. A. Ehrenfeucht and G. Rozenberg. Partial (Set) 2-Structures. Part I, II. Acta Informatica, 27:315–368, 1990.
9. J. Engelfriet. Determinacy → (observation equivalence = trace equivalence). Theor. Comput. Sci., 36:21–25, 1985.
10. M. Kishinevsky, A. Kondratyev, A. Taubin, and V. Varshavsky. Concurrent Hardware: The Theory and Practice of Self-Timed Design. John Wiley and Sons, London, 1993.
11. M. Nielsen, G. Rozenberg, and P. Thiagarajan. Elementary transition systems. Theoretical Computer Science, 96:3–33, 1992.
12. C. A. Petri. Kommunikation mit Automaten. PhD thesis, Bonn, Institut für Instrumentelle Mathematik, 1962. (technical report Schriften des IIM Nr. 3).
13. W. van der Aalst, V. Rubin, H. Verbeek, B. van Dongen, E. Kindler, and C. Günther. Process mining: A two-step approach to balance between underfitting and overfitting. Technical Report BPM-08-01, BPM Center, 2008.
14. W. M. P. van der Aalst and C. W. Günther. Finding structure in unstructured processes: The case for process mining. In T. Basten, G. Juhás, and S. K. Shukla, editors, ACSD, pages 3–12. IEEE Computer Society, 2007.
15. W. M. P. van der Aalst, B. F. van Dongen, C. W. Günther, R. S. Mans, A. K. A. de Medeiros, A. Rozinat, V. Rubin, M. Song, H. M. W. E. Verbeek, and A. J. M. M. Weijters. ProM 4.0: Comprehensive support for real process analysis. In J. Kleijn and A. Yakovlev, editors, ICATPN, volume 4546 of Lecture Notes in Computer Science, pages 484–494. Springer, 2007.
16. W. M. P. van der Aalst, B. F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A. J. M. M. Weijters. Workflow mining: A survey of issues and approaches. Data Knowl. Eng., 47(2):237–267, 2003.
17. W. M. P. van der Aalst, T. Weijters, and L. Maruster. Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng., 16(9):1128–1142, 2004.
18. B. van Dongen, N. Busi, G. Pinna, and W. van der Aalst. An iterative algorithm for applying the theory of regions in process mining. Technical Report Beta rapport 195, Department of Technology Management, Eindhoven University of Technology, 2007.
19. H. Verbeek, A. Pretorius, W.M.P. van der Aalst, and J.J. van Wijk. On Petri-net synthesis and attribute-based visualization. In Proc. Workshop on Petri Nets and Software Engineering (PNSE 07), pages 127–141, June 2007.
20. A. Weijters, W. van der Aalst, and A. A. de Medeiros. Process mining with the heuristics miner-algorithm. Technical Report WP 166, BETA Working Paper Series, Eindhoven University of Technology, 2006.