A Framework for Semi-automated Process Instance Discovery from Decorative Attributes


Andrea Burattin, Student Member, IEEE
Department of Pure and Applied Mathematics, University of Padua, Italy

Roberto Vigo
Research and Development Department, SIAV S.p.A., Rubano, Italy

Abstract

Process mining is a relatively new field of research: its final aim is to bridge the gap between data mining and business process modelling. In particular, the assumption underpinning this discipline is the availability of data coming from business process executions. In business process theory, once a process has been defined, any number of its instances can run at the same time. Process mining techniques usually identify the different instances through a specific case id field in the log. The software systems that support the execution of a business process, however, often do not record such information explicitly. This paper presents an approach that copes with the absence of the case id: we are given a set of extra fields, decorating each activity log entry, that are known to carry the information on the process instance. We propose a framework, based on simple relational algebra notions, to extract the most promising case ids from the extra fields. The work is a generalization of a real business case.

I. INTRODUCTION

Process mining is a relatively young discipline that emerged from the combination of data mining and business process management (BPM) [1]. The final goal of this discipline is the reconstruction of a process model that highlights a certain perspective of a given business process. Together with discovery, process mining also provides techniques for process conformance analysis (given a model, it tries to map the actual executions onto the original model). It has many real-world applications, such as the one described in [2].
The idea underpinning process mining techniques is that most business processes executed with the support of an information system leave traces of their activity executions, and this information is stored in so-called log files. The aim of process mining is to discover, starting from those logs, as much information as possible. Examples of reconstructed information are the control flow of the activities [3] and the network of social interactions between the originators involved in the process [4].

In order to perform such reconstructions, the log must contain a minimum set of information. In particular, for every recorded activity, there must be:
- the name of the activity;
- the name of the process the activity belongs to;
- the process case identifier (i.e. a field with the same value for all the activity executions of the same process instance);
- the time when the activity is performed;
- the activity originator (if available).

Typically, these logs are collected in MXML files [5] or, according to a more recent standard, in XES files [6], so that they can be analyzed using the ProM tool [7].

When process mining techniques are introduced in new environments, the data can be sufficient for the mining (or even more than necessary, as in [8]), or it can lack some fundamental information, as considered in this paper.

In the context of this work, two similar concepts are used in different ways: process instance and case id. The first term indicates the logical flow of the activities for one particular instance of the process. The case id is the way the process instance is implemented: typically, the case id is a set of one or more fields of the dataset whose values identify a single execution of the process. It is important to note that there can be many possible case ids but, of course, not all of them recognize the actual process.
The final aim of this paper is to give a general framework for the selection of the most interesting case ids, leaving the final decision to an expert user. The source of this work is a real business need encountered by Siav S.p.A. on logs coming from its document management solution.

Section II introduces the main issues related to the selection of the case id and the contexts where the problem arises. Section III recalls other works addressing the question and highlights the differences from our paper. Section IV presents the conditions that enable the use of our technique, which is described in Section V. Section VI reports some notes about the implementation of the algorithms, and the settings and results of our experiments. Finally, Section VII concludes and sketches a line of future work.

II. THE PROBLEM OF SELECTING THE CASE ID

As introduced in the previous section, process mining techniques assume that all the logs used as input for the mining come from executions of business process activities. In a typical environment, there is a formal process definition (the model is not required to be explicit) that is implemented and executed. All these executions are sequentially recorded into the log files; these act as input for the mining algorithms [3], which generate the mined models (possibly different from the original process models).

One of the fundamental principles underpinning process modeling is that a defined process can generate any number of concurrent instances. Consider, as an example, the process model defined (as a Petri net) in Figure 1: it is composed of five activities. The process always starts by executing A, then B; C and D can run in any order; finally, E is executed. More than one instance can run at the same time so, for example, at time t we can observe any number of activities being executed. Figure 2 shows three instances of the same process: the first is almost complete, the second has just started, and the last one has just completed the first two activities.

In order to tell different instances apart, we need an element that connects all the observations belonging to the same instance. This element is called case identifier (or case id). Figure 2 represents it with different colors and with the three labels $c_i$, $c_{i+1}$ and $c_{i+2}$.

A. Process Mining in New Contexts

There is a clear distinction between a process model and its implementation, and many software tools exist for supporting the application of a business process. Many software systems concur to support the process implementation, but typically the information they provide is not explicitly linked to the process model, because of the lack of a case id. This is mainly due to business reasons and software interoperability issues.

Consider, for example, a document management system (DMS): a tool which can store and manage digital documents (or images of paper documents). Such systems are widely used in large companies, public administrations, municipalities, etc. Of course, the documents managed with a DMS can be related to processes and protocols (consider, for example, the documents involved in supporting the process of selling a product).
In this case, the set of documents managed by the DMS users leaves a trace of what happens at the business process level, since the documents support the process activities. The idea presented in this paper is to exploit the information produced by such a support system, not limited to DMSs, in order to mine the processes in which the system itself is involved. The crucial point is that such systems typically do not log the case id explicitly. Therefore, it is necessary to infer this information, which makes it possible to relate the system entities (e.g. documents in a DMS) to process instances.

Figure 1. Example of a process model. In this case, activities C and D can be executed in parallel, i.e. in no specific order.

Figure 2. Two representations (one graphical and one tabular) of three instances of the same process (the one of Figure 1). It can be noticed that, at time t, the three executions of the same process reached different points.

Time    Case       Activity
t-3     c_i        A
t-2     c_i        B
t-1     c_i        C
t-1     c_{i+2}    A
t       c_{i+1}    A
t       c_i        D
t       c_{i+2}    B

III. RELATED WORK

The problem of relating a set of activities to the same process instance is already known in the literature. In [9], Ferreira and Gillblad presented an approach for the identification of the case id based on the Expectation-Maximization (EM) technique. The main characteristics of this approach are its generality, which allows it to be executed in all possible environments, and its computational complexity; furthermore, EM-based algorithms converge to a local maximum of the likelihood function. Other approaches, such as the one presented by Ingvaldsen et al. in [10] and in [11], use the input and output produced by the activities registered in the SAP ERP. In particular, they construct process chains (composed of activities belonging to the same instance) by identifying the events that produce and consume the same set of resources. The assumption underpinning this approach (i.e. the presence of resources produced and consumed by two joint activities) seems too strong for a broad and general application. Other works that can be reduced to the same issue are presented in [12], [13], but all of them solve the problem only partially and do not provide a sufficiently general approach that can be applied to other environments.

The most important difference between this work and others in the literature is that, in our case, the information on the process instance is hidden inside the log (it is recorded in some fields whose meaning we do not know), and therefore it has to be extracted properly. This difference matters for two main reasons: (i) the settings we require are sufficiently general to be observed in a number of real environments and (ii) our technique is devised for this particular problem, hence it is more efficient than others.

IV. WORKING FRAMEWORK

The process mining framework we address is based on a set $L$ of log entries originated by auditing activities on a given system. Each log entry $l \in L$ is a tuple of the form

\[ (\mathit{activity}, \mathit{timestamp}, \mathit{originator}, \mathit{info}_1, \ldots, \mathit{info}_m) \]

where:
- activity is the name of the registered activity;
- timestamp is the temporal instant in which the activity is registered;
- originator is the agent that executed the registered activity;
- $\mathit{info}_1, \ldots, \mathit{info}_m$ are possibly empty additional attributes.

The semantics of these additional attributes is a function of the activity of the respective log entry: given an attribute $\mathit{info}_k$ and activities $a_1$, $a_2$, the $\mathit{info}_k$ entries may represent different information for $a_1$ and $a_2$; moreover, the semantics is not explicit. We call these data decorative or additional, since they are not exploited by standard process mining algorithms. Observe that two log entries referring to the same activity are not required to share the values of their additional attributes. Table I shows an example of such a log in a document management environment; note the semantic dependency of attribute $\mathit{info}_1$ on activities: in case of Invoice it may represent the sender, in case of Cash order it may represent the account owner.

The difference between such log entries and an event in the standard process mining approach is the lack of process instance information: bridging this gap, i.e. identifying the case id, is the target of our work. More generally, the source systems we consider do not explicitly implement some workflow concepts, since $L$ might not come from the sampling of a formalized business process at all. Actually, $L$ also lacks the process information: this issue is addressed in Section V-D.

In the following we take a relational algebra point of view over the framework: a log $L$ is a relation (set of tuples) whose attributes are $(\mathit{activity}, \mathit{timestamp}, \mathit{originator}, \mathit{info}_1, \ldots, \mathit{info}_m)$. As usual, we define the projection operator $\pi_{a_1, \ldots, a_n}$ on a relation $R$ as the restriction of $R$ to attributes $a_1, \ldots, a_n$ (observe that duplicates are removed by projection, otherwise the output would not be a set).
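As a rough illustration of this relational view, the following Python sketch models a log as a list of dictionaries and implements projection as a set of tuples, so that duplicates disappear as required. The helper names (`project`, `select`), the dictionary keys and the sample entries are assumptions made for this example only; `select` covers just the equality case of the selection operator that the framework applies before projecting.

```python
def project(relation, attributes):
    """Projection: keep only `attributes`; building a set removes duplicates."""
    return {tuple(entry.get(a) for a in attributes) for entry in relation}


def select(relation, attribute, value):
    """Selection restricted to equality between `attribute` and `value`."""
    return [entry for entry in relation if entry.get(attribute) == value]


# Hypothetical log entries (timestamps omitted for brevity).
log = [
    {"activity": "Invoice", "originator": "Alice", "info_1": "A"},
    {"activity": "Invoice", "originator": "Eve", "info_1": "B"},
    {"activity": "Waybill", "originator": "Alice", "info_1": "A"},
]

# Distinct activities occurring in the log.
activities = project(log, ["activity"])

# Distinct info_1 values recorded for the Invoice activity.
invoice_ids = project(select(log, "activity", "Invoice"), ["info_1"])
```

Representing each projected row as a tuple keeps multi-attribute projections hashable, which is what allows plain set operations to be used on them later.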
For the sake of brevity, given a set of attributes $A = \{a_1, \ldots, a_n\}$, we denote a projection on all the attributes in $A$ as $\pi_A(R)$. Analogously, given an attribute $a$, a constant value $v$ and a binary operation $\theta$, we define the selection operator $\sigma_{a \theta v}(R)$ as the selection of the tuples of $R$ for which $\theta$ holds between $a$ and $v$; for example, given $a = \mathit{activity}$, $v = \text{Invoice}$, and $\theta$ being equality, $\sigma_{a \theta v}(L)$ is the set of elements of $L$ having Invoice as value for attribute activity. For a complete survey of relational algebra concepts refer to [14].

Note that relational algebra does not deal with tuple ordering, a crucial issue in process mining. This is not a problem, however, because (i) the log can be sorted whenever required and (ii) this work concerns the generation of a log suitable for applying process mining techniques (and not a process mining algorithm itself).

From now on, identifiers $a_1, \ldots, a_n$ will range over activities. Moreover, given a log $L$, we define the set $A(L) = \pi_{\mathit{activity}}(L)$ (the distinct activities occurring in $L$). Finally, we denote with $I$ the set of attributes $\{\mathit{info}_1, \ldots, \mathit{info}_m\}$.

As stated above, our efforts are concentrated on extracting the flow of information from $L$, that is, guessing the case id for each entry $l \in L$, according to the following restrictions on the framework. Given a log $L$, we assume that:
1) given a log entry $l \in L$, if a case id exists in $l$, then it is a combination of values in a set of attributes $PI \subseteq I$ (i.e. activity, timestamp and originator do not participate in the process instance definition);
2) given two log entries $l_1, l_2 \in L$ such that $\pi_{\mathit{activity}}(l_1) = \pi_{\mathit{activity}}(l_2)$, if $PI$ contains the case id for $l_1$, then it also contains the case id for $l_2$ (i.e. the set of process instance attributes is fixed per activity); this is implied by the assumption that the semantics of the additional fields is a function of the activity, as stated above.

V. IDENTIFICATION OF PROCESS INSTANCES

From the basis we defined, it follows that the process instance has to be guessed as a subset of $I$; however, since the semantics is not explicitly given, it cannot be exploited to establish correlations between activities, hence the process instance selection must be carried out by looking at the entries $\pi_{\mathit{info}_i}(L)$, for each $i \in [1, |I|]$. Moreover, since the semantics of $I$ is a function of the activity, the selection should be performed for each activity in $A(L)$ and for each attribute in $I$, resulting in a computationally expensive procedure. In order to reduce the search space, Section V-A presents some intuitive heuristics that we applied successfully in our experimental environment (cf. Section VI). Then, Section V-B shows how to carry out the selection process over the resulting search space.

A. Exploiting a-priori Knowledge

Experts of the source domain typically hold some knowledge about the data that can be exploited to discard the less promising attributes. Let $a$ be an activity, and $C(a) \subseteq I$ the set of attributes that are candidates to participate in the process instance definition with respect to the given activity. Clearly, if no a-priori knowledge can be exploited to discard some attributes, then $C(a) = I$. The experiments we carried out helped us define some simple heuristics for reducing the cardinality of $C(a)$, based on:
- assumptions on the data type (e.g. discarding timestamps);
- assumptions on the expected format of the case id, like upper and lower bounds on the average length, length variance, presence or absence of given symbols, etc.

Note that this procedure may discard all the attributes $\mathit{info}_i$ for some activities in $A(L)$. In the following we denote with $A(C)$ the set of all the activities that survive this step, that is

\[ A(C) = \bigcup_{a \in A(L)} \{ a \mid C(a) \neq \emptyset \}. \]

$A(C)$ contains all the activities which have some candidate attribute, that is, all the activities that can participate in the process we are looking for.
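The pruning step can be pictured with a minimal Python sketch, assuming that every additional attribute is stored as a string. The timestamp pattern, the length bounds and the function names are hypothetical placeholders chosen for the example; they are not the settings used in the experiments.

```python
import re
from statistics import mean

# Hypothetical pattern: values shaped like HH:MM or HH:MM:SS count as timestamps.
TIMESTAMP = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")


def candidate_attributes(log, activity, info_attributes, min_avg_len=1, max_avg_len=20):
    """Return C(a): the attributes of `activity` that survive the heuristics."""
    rows = [entry for entry in log if entry["activity"] == activity]
    candidates = set()
    for attr in info_attributes:
        values = [entry[attr] for entry in rows if entry.get(attr)]
        if not values:
            continue  # attribute never filled in for this activity
        if all(TIMESTAMP.match(v) for v in values):
            continue  # data-type assumption: discard timestamps
        if not min_avg_len <= mean(len(v) for v in values) <= max_avg_len:
            continue  # format assumption: average length out of bounds
        candidates.add(attr)
    return candidates


def activities_with_candidates(log, info_attributes):
    """Return {a: C(a)} for the activities in A(C), i.e. those with C(a) non-empty."""
    activities = {entry["activity"] for entry in log}
    per_activity = {a: candidate_attributes(log, a, info_attributes) for a in activities}
    return {a: c for a, c in per_activity.items() if c}


# Hypothetical entries: "Audit" only carries a timestamp, so it drops out of A(C).
sample_log = [
    {"activity": "Invoice", "info_1": "A", "info_2": "09:35:47"},
    {"activity": "Waybill", "info_1": "A", "info_2": "B"},
    {"activity": "Audit", "info_1": "10:00:00", "info_2": ""},
]
candidates = activities_with_candidates(sample_log, ["info_1", "info_2"])
```

Computing the candidates per activity, rather than once for the whole log, mirrors the assumption that the meaning of an attribute depends on the activity that recorded it.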

Table I. An example of log extracted from the document management system: there is all the basic information (such as the activity, the timestamps and the originator), plus a set of information on the documents (info_1 ... info_m).

Activity                 Timestamp   Originator   info_1   info_2   ...   info_m
a_1 = Invoice            :35:47      Alice        A
a_2 = Waybill            :36:18      Alice        A        B
a_3 = Cash order         :41:1       Bob          A
a_4 = Carrier receipt    :12:28      Charlie      A        B
a                        :45:12      Eve          B
a                        :21:2       Alice        B        A
a                        :54:23      Bob          C
a                        :15:37      Charlie      B        A
a                        :55:14      Bob          D
a                        :11:22      Bob          D        C
a                        :1:28       Bob          C
a                        :45:9       Charlie      D        D

B. Selection of the Identifier

After the search space has been reduced and the set $C(a)$ has been computed for each activity $a \in A(L)$, we must select those elements of $C(a)$ that participate in the process instance. The only information we can exploit to perform this selection automatically is the amount of data shared by different attributes. Aiming at modeling real settings, we fix a sharing threshold $T$, and we retain as candidates those subsets of $C(a)$ that share more than $T$ entries with some attribute set of other activities. This threshold must be defined with respect to the number of distinct entries of the involved activities, for instance as a fraction of the number of entries of the least frequent one.

Let $(a_1, a_2)$ be a pair of activities such that $a_1 \neq a_2$, and let $PI_{a_1}$ and $PI_{a_2}$ be the corresponding process instance fields. We define the function $S$ that calculates the values shared between them:

\[ S(a_1, a_2, PI_{a_1}, PI_{a_2}) = \pi_{PI_{a_1}}(\sigma_{\mathit{activity} = a_1}(L)) \cap \pi_{PI_{a_2}}(\sigma_{\mathit{activity} = a_2}(L)) \]

Observe that, in order to perform the intersection, it must hold that $|PI_{a_1}| = |PI_{a_2}|$. Using this function, we define the process instance candidates for $(a_1, a_2)$ as:

\[ \phi(a_1, a_2) = \{ (PI_{a_1}, PI_{a_2}) \mid PI_{a_1} \in \mathcal{P}(C(a_1)),\ PI_{a_2} \in \mathcal{P}(C(a_2)),\ |S(a_1, a_2, PI_{a_1}, PI_{a_2})| > T \} \]

where $\mathcal{P}$ denotes the power set.
Elements of $\phi(a_1, a_2)$ are pairs whose components are those attribute sets, of $a_1$ and $a_2$ respectively, that share a number of values greater than $T$ (i.e. the cardinality of the intersection of the values of $PI_{a_1}$ and $PI_{a_2}$ is greater than $T$). In the following, we denote with $\phi_a$ the set of all the candidate attribute sets for activity $a$, i.e.

\[ \phi_a = \{ PI \mid \exists a_1 \in A(C),\ \exists PI_{a_1} \in \mathcal{P}(C(a_1)) \,.\, (PI, PI_{a_1}) \in \phi(a, a_1) \}. \]

This formula identifies candidate process instances that may relate two activities; our target, however, is the correlation of a set of activities whose cardinality is in general greater than 2. Actually, we want to build a sequence $S = a_{S_1}, \ldots, a_{S_n}$ of distinct activities. Given an activity $a_{S_i}$, there may be a number of choices for $a_{S_{i+1}}$, and then a number of process instances in $\phi(a_{S_i}, a_{S_{i+1}})$. Hence, a number of sequences may be built.

We call chain a finite sequence $C$ of $n$ components of the form $[a, X]$, where $a$ is an activity and $X \in \phi_a$. We denote it as follows:

\[ C = [a_1, PI_{a_1}], [a_2, PI_{a_2}], \ldots, [a_n, PI_{a_n}] \]

such that $(PI_{a_i}, PI_{a_{i+1}}) \in \phi(a_i, a_{i+1})$, with $i \in [1, n-1]$. We denote with $C^{a}_i$ the $i$-th activity of the chain $C$, and with $C^{PI}_i$ the $i$-th $PI$ of the chain $C$. Observe that a given activity must appear only once in a chain, since a process instance is defined by a single attribute set.

Given a chain $C$ ending with element $[a_j, PI_{a_j}]$, we say that $C$ is extensible if there exists an activity $a_k \in A(C)$ such that $(PI_{a_j}, X) \in \phi(a_j, a_k)$, for some set $X \in \mathcal{P}(C(a_k))$. Otherwise, $C$ is said to be complete. Moreover, we say that an activity $a$ occurs in a chain $C$, denoted $a \in C$, if there exists a chain component $[a, X]$ in $C$ for some attribute set $X$. Since an activity can occur in more than one chain with different process instances, in some cases we write $PI_{a_i, C}$ to denote the process instance of activity $a_i$ in chain $C$. Finally, let $A(C)$ denote the set of activities occurring in a chain $C$.
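The sharing function $S$ and the candidate pairs $\phi$ can be sketched in Python as follows. This is only an illustration under stated assumptions: attribute sets are represented as sorted tuples (the text does not fix an attribute order for multi-attribute sets), the empty attribute set is skipped, and the function names and the toy log are invented for the example.

```python
from itertools import combinations


def attribute_sets(candidates):
    """Non-empty subsets of the candidate attributes, as sorted tuples."""
    attrs = sorted(candidates)
    return [c for r in range(1, len(attrs) + 1) for c in combinations(attrs, r)]


def projected_values(log, activity, attrs):
    """Distinct value tuples of `attrs` over the entries of `activity`."""
    return {tuple(e.get(a) for a in attrs) for e in log if e["activity"] == activity}


def shared(log, a1, a2, pi1, pi2):
    """S: values shared by attribute set pi1 of a1 and attribute set pi2 of a2."""
    if len(pi1) != len(pi2):
        return set()  # the intersection is only defined for equal cardinalities
    return projected_values(log, a1, pi1) & projected_values(log, a2, pi2)


def phi(log, a1, a2, candidates, threshold):
    """Pairs of attribute sets sharing more than `threshold` values."""
    return [
        (p1, p2)
        for p1 in attribute_sets(candidates[a1])
        for p2 in attribute_sets(candidates[a2])
        if len(shared(log, a1, a2, p1, p2)) > threshold
    ]


# Toy log (hypothetical values).
toy_log = [
    {"activity": "a1", "info_1": "A"},
    {"activity": "a1", "info_1": "B"},
    {"activity": "a1", "info_1": "D"},
    {"activity": "a2", "info_1": "A", "info_2": "B"},
    {"activity": "a2", "info_1": "B", "info_2": "A"},
    {"activity": "a2", "info_1": "D", "info_2": "C"},
]
toy_candidates = {"a1": {"info_1"}, "a2": {"info_1", "info_2"}}
pairs = phi(toy_log, "a1", "a2", toy_candidates, threshold=1)
```

With a threshold of 1, the single attribute of `a1` pairs with either attribute of `a2`, because both share at least two values with it, while the two-attribute set of `a2` is ruled out by the cardinality constraint.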
The empty chain is denoted with $\emptyset$. Given a chain $C$, we define the average value sharing $S(C)$ among the selected attributes of $C$ as:

\[ S(C) = \frac{\sum_{1 \le i < n} \left| S\left(C^{a}_i, C^{a}_{i+1}, C^{PI}_i, C^{PI}_{i+1}\right) \right|}{n - 1} \]

where $n$ denotes the chain length. All the possible complete chains on $L$ are built according to Algorithms 1 and 2.

Algorithm 1 Build Chains
for all a ∈ A(C) do
    for all PI ∈ ϕ_a do
        Extend Chain([a, PI])
    end for
end for

Algorithm 2 Extend Chain
Require: a chain C = [a_1, PI_1], ..., [a_{i-1}, PI_{i-1}]
for all a_i ∈ A(C) such that a_i ∉ C do
    for all PI_i ∈ ϕ_{a_i} such that (PI_{i-1}, PI_i) ∈ ϕ(a_{i-1}, a_i) do
        C′ = C, [a_i, PI_i]
        return Extend Chain(C′)
    end for
end for

Algorithm 1 calls Algorithm 2 for all the extensible chains of length 1. Observe that the pseudocode of Algorithms 1 and 2 also builds some chains which are permutations of one another (cf. Section V-D).

Example 1. The following example highlights the bias introduced by the order in which activities and attributes are selected when building chains. Limiting ourselves to the first two attributes of the log given in Table I, we can derive the following settings:

\[ A(C) = \{a_1, a_2, a_3, a_4\} \]
\[ C(a_1) = \{\mathit{info}_1\} \qquad C(a_2) = \{\mathit{info}_1, \mathit{info}_2\} \]
\[ C(a_3) = \{\mathit{info}_1\} \qquad C(a_4) = \{\mathit{info}_1, \mathit{info}_2\} \]

Observe that field $\mathit{info}_2$ is discarded as a candidate attribute for activities $a_1$, $a_3$, since its values have the form of timestamps (cf. Section V-A). We now compute the values of function $\phi$ for the activities in $A(C)$. Let $T = 1$ and consider $\phi(a_1, a_2)$: $PI_{a_1} = \{\mathit{info}_1\}$, since it is the only available candidate attribute; $PI_{a_2}$ may be chosen among the sets $\{\mathit{info}_1\}$, $\{\mathit{info}_2\}$, $\{\mathit{info}_1, \mathit{info}_2\}$. We need to find those attribute sets of $a_1$ and $a_2$, respectively, that share at least two values in the log (since $T = 1$).
It is simple to see that the values of $\mathit{info}_1$ exactly overlap for both activities, thus $(\{\mathit{info}_1\}, \{\mathit{info}_1\})$ is such a pair; another valid pair is $(\{\mathit{info}_1\}, \{\mathit{info}_2\})$, since

\[ |S(a_1, a_2, \{\mathit{info}_1\}, \{\mathit{info}_2\})| = |\pi_{\mathit{info}_1}(\sigma_{\mathit{activity} = a_1}(L)) \cap \pi_{\mathit{info}_2}(\sigma_{\mathit{activity} = a_2}(L))| = 2 > 1 \]

in fact

\[ \pi_{\mathit{info}_1}(\sigma_{\mathit{activity} = a_1}(L)) = \{A, B, D\} \quad \text{and} \quad \pi_{\mathit{info}_2}(\sigma_{\mathit{activity} = a_2}(L)) = \{B, A, C\}. \]

The values of function $\phi$ for all the activities in $A(C)$, with the respective values of $S$, are the following:
- $\phi(a_1, a_2) = \{(\{\mathit{info}_1\}, \{\mathit{info}_1\}), (\{\mathit{info}_1\}, \{\mathit{info}_2\})\}$ with $S = 3$ and $S = 2$, respectively;
- $\phi(a_1, a_3) = \emptyset$;
- $\phi(a_1, a_4) = \{(\{\mathit{info}_1\}, \{\mathit{info}_1\}), (\{\mathit{info}_1\}, \{\mathit{info}_2\})\}$ with $S = 3$ and $S = 3$, respectively;
- $\phi(a_2, a_3) = \{(\{\mathit{info}_2\}, \{\mathit{info}_1\})\}$ with $S = 2$;
- $\phi(a_2, a_4) = \{(\{\mathit{info}_1\}, \{\mathit{info}_1\}), (\{\mathit{info}_1\}, \{\mathit{info}_2\}), (\{\mathit{info}_2\}, \{\mathit{info}_1\}), (\{\mathit{info}_2\}, \{\mathit{info}_2\}), (\{\mathit{info}_1, \mathit{info}_2\}, \{\mathit{info}_1, \mathit{info}_2\})\}$ with $S = 3$, $S = 3$, $S = 2$, $S = 2$ and $S = 2$, respectively;
- $\phi(a_3, a_4) = \emptyset$.

Assume that the chain building process starts from activity $a_1$ and, for the sake of simplicity, restrict the log to activities $a_1$, $a_2$, $a_3$. The next activity $a_i$ has to be chosen among those such that $\phi(a_1, a_i) \neq \emptyset$, thus only $a_2$ is selectable. $\phi(a_1, a_2)$ contains two pairs: suppose we select $(\{\mathit{info}_1\}, \{\mathit{info}_1\})$. The first selection step is finished and the chain is

\[ C = [a_1, \{\mathit{info}_1\}], [a_2, \{\mathit{info}_1\}] \]

Then, in order to extend the chain, we must look for an activity $a_i \neq a_1, a_2$ such that $\phi(a_2, a_i) \neq \emptyset$ and there exists a pair in $\phi(a_2, a_i)$ that contains $\{\mathit{info}_1\}$ as its first component. Since no such activity exists, the chain is complete. It is simple to see, however, that both choosing the process instance pair $(\{\mathit{info}_1\}, \{\mathit{info}_2\})$ for $(a_1, a_2)$ and starting from $a_3$ would produce complete chains of length 3.
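The exhaustive construction of complete chains can be sketched in Python with a depth-first search. This is an illustrative variant, not the authors' implementation: it takes the $\phi$ values listed in Example 1 as precomputed input, treats each pair as usable in both directions, and collects every complete chain instead of returning at the first extension. The names `build_chains`, `phi_example` and the tuple encoding of attribute sets are assumptions of the sketch.

```python
def build_chains(phi):
    """Enumerate all complete chains from a mapping (a_i, a_j) -> list of (PI_i, PI_j)."""
    # Index every [activity, PI] component by the components it can be linked to.
    links = {}
    for (a, b), pairs in phi.items():
        for pi_a, pi_b in pairs:
            links.setdefault((a, pi_a), set()).add((b, pi_b))
            links.setdefault((b, pi_b), set()).add((a, pi_a))

    complete = []

    def extend(chain):
        used = {activity for activity, _ in chain}
        options = sorted(n for n in links.get(chain[-1], ()) if n[0] not in used)
        if not options:
            complete.append(chain)  # no extension possible: the chain is complete
            return
        for component in options:
            extend(chain + [component])

    for start in sorted(links):
        extend([start])
    return complete


# Attribute sets encoded as tuples; phi values copied from Example 1 (empty ones omitted).
I1, I2, I12 = ("info_1",), ("info_2",), ("info_1", "info_2")
phi_example = {
    ("a1", "a2"): [(I1, I1), (I1, I2)],
    ("a1", "a4"): [(I1, I1), (I1, I2)],
    ("a2", "a3"): [(I2, I1)],
    ("a2", "a4"): [(I1, I1), (I1, I2), (I2, I1), (I2, I2), (I12, I12)],
}
chains = build_chains(phi_example)

# Same input restricted to a1, a2, a3, as in the walk-through of Example 1.
phi_restricted = {pair: v for pair, v in phi_example.items() if "a4" not in pair}
restricted_chains = build_chains(phi_restricted)
```

On the restricted input, the search finds both the length-2 chain obtained by choosing `info_1` for `a2` and the length-3 chain obtained by choosing `info_2`, which is the order dependence the example points out. Chains that are permutations of one another are not filtered here.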
Since the candidate attributes are computed on a statistical basis, the chain length is not the only significant parameter to take into account, but the example underlines the importance of building all the complete chains.

1) Match Heuristics: In computing the amount of data shared by two activities via function ϕ, heuristic approaches may help in modeling the complexity of a real domain. In fact, the comparison performed between values does not need to be an identity match: a fuzzy match can be implemented instead. Guided by this basic heuristic, we can substitute the intersection operator in ϕ with an approximation of it, whose definition may or may not be domain specific. Simple examples we tested in our experimental environment are: equality match up to X leading characters, equality match up to Y trailing characters, and their combinations. In general, any string distance measure can be used.

C. Results Organization and Filtering

In the previous sections we have shown how to compute a number of chains (i.e. a number of logs); in general, a domain expert is able to tell good chains apart from less reasonable ones, but this can be a demanding task. This section presents the problem of comparing different chains. To address this issue, we look for a methodology that helps to restrict the number of possible chains. Generally, we can reject a chain in favor of another one if and only if the latter contains all the activities of the former, and it is either simpler or it supports a higher confidence. Examples of parameters to take into account are:

- the number of attributes in the process instance of a chain component (recall that each component has the same number of process instance attributes): a chain that involves fewer attributes may be considered simpler, and thus preferable, since it is more readable for a human agent;
- the cardinality of the value sharing between chain components ($S(\cdot)$): a chain whose share factor is higher gives higher confidence; this parameter could be tuned by a threshold.

Let $H$ be the set of complete chains computed by Algorithms 1 and 2, without permutations. Given two chains

\[ C_1 = [a_1, PI_{a_1}], \ldots, [a_n, PI_{a_n}] \]
\[ C_2 = [b_1, PI_{b_1}], \ldots, [b_m, PI_{b_m}] \]

in the set $H$, we define an ordering operator $\sqsubseteq$ as in Equation 1. $\sqsubseteq$ defines a reflexive, antisymmetric, and transitive relation over chains, hence $(H, \sqsubseteq)$ is a partially ordered set [15]. For the sake of simplicity, in the above formulation we do not use any threshold to tune the value sharing comparison.

The ordering we defined strives to equip the framework with a notion of best chains, i.e. those chains which are worth suggesting to a domain expert. The maximal elements of the poset are exactly those chains, as underlined in Example 2.

Example 2. Recall the settings of Example 1. In this case the set $H$ contains the complete chains listed below.
\[ C_1 = [a_1, \{\mathit{info}_1\}], [a_2, \{\mathit{info}_1\}], [a_4, \{\mathit{info}_1\}], \quad S(C_1) = 3 \]
\[ C_2 = [a_1, \{\mathit{info}_1\}], [a_2, \{\mathit{info}_1\}], [a_4, \{\mathit{info}_2\}], \quad S(C_2) = 3 \]
\[ C_3 = [a_1, \{\mathit{info}_1\}], [a_2, \{\mathit{info}_2\}], [a_3, \{\mathit{info}_1\}], \quad S(C_3) = 2 \]
\[ C_4 = [a_1, \{\mathit{info}_1\}], [a_2, \{\mathit{info}_2\}], [a_4, \{\mathit{info}_1\}], \quad S(C_4) = 2 \]
\[ C_5 = [a_1, \{\mathit{info}_1\}], [a_2, \{\mathit{info}_2\}], [a_4, \{\mathit{info}_2\}], \quad S(C_5) = 2 \]
\[ C_6 = [a_1, \{\mathit{info}_1\}], [a_4, \{\mathit{info}_2\}], [a_2, \{\mathit{info}_2\}], [a_3, \{\mathit{info}_1\}] \]
\[ C_7 = [a_2, \{\mathit{info}_1, \mathit{info}_2\}], [a_4, \{\mathit{info}_1, \mathit{info}_2\}] \]

Applying the ordering relation we can compare different chains, as follows:
- $C_1 = C_2$: in fact both $C_1 \sqsubseteq C_2$ and $C_1 \sqsupseteq C_2$ hold;
- $C_1$, $C_2$ are not comparable with $C_3$, since $A(C_1) = A(C_2)$ neither contains nor is contained in $A(C_3)$;
- $C_1 = C_2 \sqsupseteq C_4$ since $S(C_1) = S(C_2) > S(C_4)$;
- $C_4 = C_5$;
- $\nexists\, i \in [1, 5] \cup \{7\}$ such that $C_i \sqsupseteq C_6$;
- $C_7$ is not comparable with $C_3$.

The Hasse diagram representing the poset $(H \cup \{\emptyset\}, \sqsubseteq)$ is given in Figure 3. It follows that only chain $C_6$ should undergo the experts' inspection.

Figure 3. Hasse diagram representing the poset $(H \cup \{\emptyset\}, \sqsubseteq)$ of Example 2.

D. Deriving a Log to Mine

For each chain $C$ with positive length, we can build a log $L'$ whose tuples have the form:

\[ (\mathit{activity}, \mathit{timestamp}, \mathit{originator}, \mathit{caseid}, \mathit{processid}) \]

Observe that the process instance we selected is a set of attributes, whereas a single one is expected by standard process mining techniques. Hence, a composition function $k$ from a set of values to a single one is needed (e.g. string concatenation). Eventually, the log $L'$ is obtained, starting from $L$, with the execution of Algorithm 3.
Algorithm 3 Conversion of L to L′
L′ ← ∅
chainno ← 0
for all C ∈ H do
    L_C ← σ_{activity ∈ A(C)}(L)
    for all l ∈ L_C do
        activity ← π_activity(l)
        timestamp ← π_timestamp(l)
        originator ← π_originator(l)
        caseid ← k(π_{PI_{activity,C}}(l))
        processid ← chainno
        L′ ← L′ ∪ {(activity, timestamp, originator, caseid, processid)}
    end for
    chainno ← chainno + 1
end for

Observe that the chain construction is sequential (i.e. it iteratively adds a component), and hence there is an implicit ordering among its elements. In order to build the log $L'$, once all the chains are complete (no longer extensible), it is possible to ignore the chains that are permutations of a given one. Thus, some chains computed by Algorithm 1 can be discarded.

Note, however, that the maximal elements in the poset represent different processes. A conservative approach compels us to consider each maximal chain as defining a distinct process. The following example illustrates the reason why we chose this approach: let

\[ C_1 = \ldots, [a_{i-1}, PI_{a_{i-1}}], [a_i, PI_{a_i}], [a_{i+1}, PI_{a_{i+1}}], \ldots \]
\[ C_2 = \ldots, [b_{j-1}, PI_{b_{j-1}}], [b_j, PI_{b_j}], [b_{j+1}, PI_{b_{j+1}}], \ldots \]

be two maximal chains where $PI_{a_{i-1}} \neq PI_{b_{j-1}}$; $PI_{a_i} \neq PI_{b_j}$; $PI_{a_{i+1}} \neq PI_{b_{j+1}}$ and $a_i = b_j$. In other words, $C_1$ and $C_2$ contain the same activity $a_i$ but with different process instances. Considering $C_1$ and $C_2$ as belonging to the same process is not desirable, since it can lead to an inconsistent control flow reconstruction. Hence, each maximal chain defines a process, and the domain expert is in charge of recognizing whether different chains belong to the same real process. During the conversion of $L$ to the process log $L'$, we assign a chain counter as process id.

Equation 1, the ordering operator introduced in Section V-C, for two chains $A$ and $B$:

\[
A \sqsubseteq B \iff
\begin{cases}
|A^{PI}_1| \ge |B^{PI}_1| & \text{if } A(A) = A(B) \wedge S(A) = S(B) \\
S(A) \le S(B) & \text{if } A(A) = A(B) \wedge S(A) \neq S(B) \\
A(A) \subseteq A(B) & \text{otherwise}
\end{cases}
\tag{1}
\]

VI. EXPERIMENTAL RESULTS

As explained in the introduction, this paper stems from a real business need, and it presents a generalization of a research activity carried out by Siav S.p.A. The existing implementation is limited to process instances constituted by a single attribute (i.e. $|PI_i| = 1$), due both to a-priori knowledge about the domain and to computational requirements. In particular, all the preprocessing steps that reduce the search space are implemented as Oracle stored procedures, written in PL/SQL. The chain building algorithms are implemented in C#. Moreover, to improve performance, we do not compute the heuristics on the whole log, but on a random fraction of its entries.

We tested our implementation on logs coming from a document management system; the source log is reduced to the form described in Section IV after undergoing some preprocessing steps. We applied the algorithms to real logs, obtaining concrete results validated by domain experts. Table II summarizes the main information: notice that the experts' chains are always within the set of maximal chains (computed by the algorithm), because they are selected among the first ones. Figure 4 shows how chains evolve when the log cardinality scales up: in particular, notice that the number of chains tends to increase, while the poset structure cuts down the number of chains we present to the domain experts.
Figure 5 plots the processing time: it is a function of the log cardinality, of the number of activities in the log (i.e. the number of possible chain components), and of the number of decorative attributes (i.e. the number of possible ways of chaining two components).

The knowledge of the application domain gave us the opportunity to implement some heuristics, as explained in Sections V-A and V-B1. The following criteria were selected in order to reduce the search space: a candidate attribute must have a string type (e.g. we discard timestamps and numeric types, which in our case mostly represent prices); the values of a candidate attribute must fulfil these requirements: maximum average length: 2 characters, minimum average length: 3 characters, maximum variation w.r.t. the average length: 1. Finally, we relax the intersection operator in ϕ, requiring values to be identical only up to the first leading character and up to 2 trailing characters.

The experiments were carried out on an Intel Core2 Quad at 2.4 GHz, equipped with 4 GB of RAM. The DBMS where the logs were stored was local to the machine, so no network overhead has to be considered.

Figure 4. This figure plots the total number of chains the procedure identifies, the number of maximal chains and the number of chains the expert will select, given the size of the preprocessed log.

Figure 5. This figure represents the time (expressed in seconds) required for the extraction of the chains.

Table II. Results summary. Horizontal lines separate different log sources. Columns: L, A(L), I, Time, H, Maximal chains, Experts' chains. [Numeric entries are not recoverable from this transcription.]

VII. CONCLUSION AND FUTURE WORK

This work presents a new approach for the identification of process instances in logs generated by systems that are not process-aware. Process instance information is inferred by exploiting additional information with respect to a standard process mining framework, but this information is typically available when dealing with software systems. Our work is a generalization of a real business case related to a document management system, where discovering the process instance means correlating different document sets. The procedure we described to solve the problem is entirely based on the information that decorates documents, and relies on a relational algebra approach. Moreover, we believe that our generalization can be adopted in a number of domains with reasonable effort. Finally, observe that we aim to apply process mining techniques to a log computed on a statistical basis; thus error-robust mining algorithms must be used.

There is still some important work that could improve the presented results. For example, we think it would be interesting to consider not only the value of the case id candidates, but also their semantic meaning (if any), which could act as a-priori knowledge. Moreover, a flexible framework for expressing a-priori knowledge and feeding it to the system is desirable, in order to achieve a higher level of generalization. Other refinements are domain-specific: dealing with documents, for instance, we could exploit their content in order to confirm or reject the findings of our algorithms when the result confidence is low. Finally, before pursuing these new ideas, we are planning a wider test campaign on logs coming from other systems. From the implementation point of view, it is desirable to extend the current version of the software in order to consider more than one attribute per process instance.

ACKNOWLEDGMENT

This work was entirely supported by SIAV S.p.A. We thank Nicola Costantini from SIAV and Prof. Alessandro Sperduti for their important suggestions and support.

REFERENCES

[1] W. M. P. van der Aalst, "Process Discovery: Capturing the Invisible," IEEE Computational Intelligence Magazine, vol. 5, no. 1, 2010.
[2] W. M. P. van der Aalst, H. A. Reijers, A. J. M. M. Weijters, B. F. van Dongen, A. K. A. de Medeiros, M. Song, and H. M. W. Verbeek, "Business process mining: An industrial application," Information Systems, vol. 32, no. 5, Jul. 2007.
[3] W. M. P. van der Aalst, A. J. M. M. Weijters, B. F. van Dongen, J. Herbst, L. Maruster, and G. Schimm, "Workflow mining: a survey of issues and approaches," Data & Knowledge Engineering, vol. 47, no. 2, 2003.
[4] W. M. P. van der Aalst and A. H. M. ter Hofstede, "YAWL: yet another workflow language," Information Systems, vol. 30, no. 4, Jun. 2005.
[5] W. M. P. van der Aalst, B. F. van Dongen, C. W. Günther, R. S. Mans, A. K. A. de Medeiros, A. Rozinat, V. Rubin, M. Song, H. M. W. Verbeek, and A. J. M. M. Weijters, "ProM 4.0: Comprehensive Support for Real Process Analysis," in Petri Nets and Other Models of Concurrency, J. Kleijn and A. Yakovlev, Eds. Springer, 2007.
[6] C. W. Günther, "XES - Standard Definition," 2009.
[7] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and W. M. P. van der Aalst, "The ProM framework: A new era in process mining tool support," Application and Theory of Petri Nets, vol. 3536, 2005.
[8] A. Burattin and A. Sperduti, "Heuristics Miner for Time Intervals," in European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 2010.
[9] D. R. Ferreira and D. Gillblad, "Discovering Process Models from Unlabelled Event Logs," in Business Process Management, 7th International Conference, BPM 2009, ser. Lecture Notes in Computer Science, U. Dayal, H. A. Reijers, J. Eder, and J. Koehler, Eds. Ulm, Germany: Springer, 2009.
[10] J. Espen Ingvaldsen and J. Atle Gulla, "Preprocessing Support for Large Scale Process Mining of SAP Transactions," in Business Process Management Workshops, A. H. M. ter Hofstede, B. Benatallah, and H.-Y. Paik, Eds. Springer Berlin / Heidelberg, 2008.
[11] J. Espen Ingvaldsen and J. Atle Gulla, Semantic Business Process Mining of SAP Transactions, 1st ed. Business Science Reference, 2010, ch. 17.
[12] M. Walicki and D. R. Ferreira, "Mining Sequences for Patterns with Non-Repeating Symbols," in IEEE Congress on Evolutionary Computation 2010, Barcelona, Spain, 2010.
[13] D. R. Ferreira, M. Zacarias, M. Malheiros, and P. Ferreira, "Approaching Process Mining with Sequence Clustering: Experiments and Findings," in Business Process Management, G. Alonso, P. Dadam, and M. Rosemann, Eds., no. 1. Springer Berlin / Heidelberg, 2007.
[14] R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 5th ed. Boston: Addison-Wesley, 2006.
[15] B. Schröder, Ordered Sets: An Introduction. Boston: Birkhäuser Boston, 2002.


Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Sound Recognition in Mixtures

Sound Recognition in Mixtures Sound Recognition in Mixtures Juhan Nam, Gautham J. Mysore 2, and Paris Smaragdis 2,3 Center for Computer Research in Music and Acoustics, Stanford University, 2 Advanced Technology Labs, Adobe Systems

More information

Reading 11 : Relations and Functions

Reading 11 : Relations and Functions CS/Math 240: Introduction to Discrete Mathematics Fall 2015 Reading 11 : Relations and Functions Instructor: Beck Hasti and Gautam Prakriya In reading 3, we described a correspondence between predicates

More information

arxiv: v1 [cs.cl] 21 May 2017

arxiv: v1 [cs.cl] 21 May 2017 Spelling Correction as a Foreign Language Yingbo Zhou yingbzhou@ebay.com Utkarsh Porwal uporwal@ebay.com Roberto Konow rkonow@ebay.com arxiv:1705.07371v1 [cs.cl] 21 May 2017 Abstract In this paper, we

More information

SPATIAL INFORMATION GRID AND ITS APPLICATION IN GEOLOGICAL SURVEY

SPATIAL INFORMATION GRID AND ITS APPLICATION IN GEOLOGICAL SURVEY SPATIAL INFORMATION GRID AND ITS APPLICATION IN GEOLOGICAL SURVEY K. T. He a, b, Y. Tang a, W. X. Yu a a School of Electronic Science and Engineering, National University of Defense Technology, Changsha,

More information

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH M. De Cock C. Cornelis E. E. Kerre Dept. of Applied Mathematics and Computer Science Ghent University, Krijgslaan 281 (S9), B-9000 Gent, Belgium phone: +32

More information

Efficient Approximate Reasoning with Positive and Negative Information

Efficient Approximate Reasoning with Positive and Negative Information Efficient Approximate Reasoning with Positive and Negative Information Chris Cornelis, Martine De Cock, and Etienne Kerre Fuzziness and Uncertainty Modelling Research Unit, Department of Applied Mathematics

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

for XPS surface analysis

for XPS surface analysis Thermo Scientific Avantage XPS Software Powerful instrument operation and data processing for XPS surface analysis Avantage Software Atomic Concentration (%) 100 The premier software for surface analysis

More information

Data Mining and Machine Learning

Data Mining and Machine Learning Data Mining and Machine Learning Concept Learning and Version Spaces Introduction Concept Learning Generality Relations Refinement Operators Structured Hypothesis Spaces Simple algorithms Find-S Find-G

More information

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions Data Cleaning and Query Answering with Matching Dependencies and Matching Functions Leopoldo Bertossi 1, Solmaz Kolahi 2, and Laks V. S. Lakshmanan 2 1 Carleton University, Ottawa, Canada. bertossi@scs.carleton.ca

More information

Cell-based Model For GIS Generalization

Cell-based Model For GIS Generalization Cell-based Model For GIS Generalization Bo Li, Graeme G. Wilkinson & Souheil Khaddaj School of Computing & Information Systems Kingston University Penrhyn Road, Kingston upon Thames Surrey, KT1 2EE UK

More information

From Constructibility and Absoluteness to Computability and Domain Independence

From Constructibility and Absoluteness to Computability and Domain Independence From Constructibility and Absoluteness to Computability and Domain Independence Arnon Avron School of Computer Science Tel Aviv University, Tel Aviv 69978, Israel aa@math.tau.ac.il Abstract. Gödel s main

More information

Lecture notes on Turing machines

Lecture notes on Turing machines Lecture notes on Turing machines Ivano Ciardelli 1 Introduction Turing machines, introduced by Alan Turing in 1936, are one of the earliest and perhaps the best known model of computation. The importance

More information

A Hierarchical Markov Model to Understand the Behaviour of Agents in Business Processes

A Hierarchical Markov Model to Understand the Behaviour of Agents in Business Processes A Hierarchical Markov Model to Understand the Behaviour of Agents in Business Processes Diogo R. Ferreira 1, Fernando Szimanski 2, Célia Ghedini Ralha 2 1 IST Technical University of Lisbon, Portugal diogo.ferreira@ist.utl.pt

More information

A Simple Model for Sequences of Relational State Descriptions

A Simple Model for Sequences of Relational State Descriptions A Simple Model for Sequences of Relational State Descriptions Ingo Thon, Niels Landwehr, and Luc De Raedt Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001 Heverlee,

More information

On Tuning OWA Operators in a Flexible Querying Interface

On Tuning OWA Operators in a Flexible Querying Interface On Tuning OWA Operators in a Flexible Querying Interface Sławomir Zadrożny 1 and Janusz Kacprzyk 2 1 Warsaw School of Information Technology, ul. Newelska 6, 01-447 Warsaw, Poland 2 Systems Research Institute

More information