A Framework for Semi-automated Process Instance Discovery from Decorative Attributes
Andrea Burattin, Student Member, IEEE
Department of Pure and Applied Mathematics, University of Padua, Italy

Roberto Vigo
Research and Development Department, SIAV S.p.A., Rubano, Italy

Abstract: Process mining is a relatively new field of research: its final aim is to bridge the gap between data mining and business process modelling. In particular, the assumption underpinning this discipline is the availability of data coming from business process executions. In business process theory, once the process has been defined, a number of instances of the process can run at the same time. Usually, process mining techniques identify the different instances through a specific case id field in the log. The software systems that support the execution of a business process, however, often do not record this information explicitly. This paper presents an approach that copes with the absence of the case id information: we have a set of extra fields, decorating each activity log entry, that are known to carry the information on the process instance. A framework based on simple relational algebra notions is presented to extract the most promising case ids from the extra fields. The work is a generalization of a real business case.

I. INTRODUCTION

Process mining is a relatively young discipline that emerged from the combination of data mining and BPM [1]. The final goal of this discipline is the reconstruction of a process model that highlights a certain perspective of a given business process. Together with discovery, process mining also provides techniques for process conformance analysis (given a model, it tries to map the actual executions onto the original model). It has many real-world applications, such as the one described in [2].
The idea underpinning process mining techniques is that most business processes that are executed with the support of an information system leave traces of their activity executions, and this information is stored in so-called log files. The aim of process mining is to discover, starting from those logs, as much information as possible. Examples of reconstructed information are the control flow of the activities [3] and the network of social interactions between the originators involved in the process [4]. In order to perform such reconstructions, the log must contain a minimum set of information. In particular, for all the recorded activities, there must be:
- the name of the activity;
- the name of the process the activity belongs to;
- the process case identifier (i.e. a field with the same value for all the executions of the same process instance);
- the time when the activity is performed;
- the activity originator (if available).

Typically, these logs are collected in MXML files [5] or, according to a more recent standard, in XES files [6], so that they can be analyzed using the ProM tool [7]. When process mining techniques are introduced in new environments, the data can be sufficient for the mining (or can even be more than necessary, as in [8]) or can lack some fundamental information, as considered in this paper.

In the context of this work, two similar concepts are used in different ways: process instance and case id. The first term indicates the logical flow of the activities for one particular instance of the process. The case id is the way the process instance is implemented: typically, the case id is a set of one or more fields of the dataset, whose values identify a single execution of the process. It is important to note that there can be many possible case ids but, of course, not all of them recognize the actual process.
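To make the role of the case id concrete, the following minimal sketch groups log entries into one trace per case id. The record layout and the sample events are hypothetical and only illustrate the idea; they are not taken from the logs discussed in this paper.

```python
from collections import defaultdict

# Hypothetical event records: (case_id, activity, timestamp, originator).
events = [
    ("c1", "A", 1, "Alice"),
    ("c2", "A", 2, "Bob"),
    ("c1", "B", 3, "Alice"),
    ("c2", "B", 4, "Bob"),
    ("c1", "C", 5, "Charlie"),
]


def traces_by_case(events):
    """Group events by case id and order each group by timestamp."""
    grouped = defaultdict(list)
    for case_id, activity, timestamp, _originator in events:
        grouped[case_id].append((timestamp, activity))
    # Each trace is the time-ordered list of activities of one process instance.
    return {case: [act for _, act in sorted(pairs)] for case, pairs in grouped.items()}


traces = traces_by_case(events)
```

Without the first field, the two interleaved instances above could not be told apart, which is exactly the situation addressed in this work.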
The final aim of this paper is to give a general framework for the selection of the most interesting case ids and then leave the final decision to an expert user. The source of this work is a real business need encountered by Siav S.p.A. on logs coming from its document management solution. Section II introduces the main issues related to the selection of the case id and the contexts where the problem arises. Section III recalls other works addressing the question and highlights the differences from our paper. Section IV presents the conditions that enable the use of our technique, which is described in Section V. Section VI reports some notes about the implementation of the algorithms, and the settings and results of our experiments. Finally, Section VII concludes and sketches a line for future work.

II. THE PROBLEM OF SELECTING THE CASE ID

As already introduced in the previous section, process mining techniques assume that all the logs used as input for the mining come from executions of business process activities. In a typical environment, there is a formal process definition (the model is not required to be explicit) that is implemented and executed. All these executions are sequentially recorded into the log files; these act as input for the mining algorithms [3], which generate the mined models (possibly different from the original process models).
One of the fundamental principles that underpins the idea of process modeling is that a defined process can generate any number of concurrent instances, running at the same time. Consider, as an example, the process model defined (in terms of a Petri net) in Figure 1: it is composed of five activities. The process always starts by executing A, then B; C and D can run in any order and, finally, E is executed. We can have more than one instance running at the same time so, for example, at time t we can observe any number of activities being executed. Figure 2 shows three instances of the same process: the first is almost complete, the second has just started and the last one has just completed the first two activities. In order to identify different instances, we need an element that connects all the observations belonging to the same instance. This element is called case identifier (or case id). Fig. 2 represents it with different colors and with the three labels $c_i$, $c_{i+1}$ and $c_{i+2}$.

A. Process Mining in New Contexts

There is a clear distinction between a process model and its implementation, and there exist many software tools for supporting the application of a business process. Many software systems concur to support the process implementation, but typically the information they provide is not explicitly linked to the process model because of the lack of a case id. This is mainly due to business reasons and software interoperability issues. Consider, for example, a document management system (DMS): a tool which can store and manage digital documents (or images of paper documents). Such systems are widely used in large companies, public administrations, municipalities, etc. Of course, the documents managed with a DMS can be related to processes and protocols (consider, for example, the documents involved in supporting the process of selling a product).
In this case, the set of documents managed by the DMS users leaves a trace of what happens at the business process level, since they are a support for the process activities. The idea presented in this paper is to exploit the information produced by such a support system, not limited to DMSs, in order to mine the processes in which the system itself is involved. The crucial point is that such systems typically do not log the case id explicitly. Therefore, it is necessary to infer this information, which makes it possible to relate the system entities (e.g. documents in a DMS) to process instances.

Figure 1. Example of a process model. In this case, activities C and D can be executed in parallel, i.e. in no specific order.

Figure 2. Two representations (one graphical and one tabular) of three instances of the same process (the one of Figure 1). It can be noticed that, at time t, the three executions of the same process reached different points. The tabular representation is:

Time | Case | Activity
$t-3$ | $c_i$ | A
$t-2$ | $c_i$ | B
$t-1$ | $c_i$ | C
$t-1$ | $c_{i+2}$ | A
$t$ | $c_{i+1}$ | A
$t$ | $c_i$ | D
$t$ | $c_{i+2}$ | B

III. RELATED WORK

The problem of relating a set of activities to the same process instance is already known in the literature. In [9], Ferreira and Gillblad presented an approach for the identification of the case id based on the Expectation-Maximization (EM) technique. The main characteristics of this approach are its generality, which allows it to be executed in all possible environments, and its computational complexity; furthermore, EM-based algorithms converge to a local maximum of the likelihood function. Other approaches, such as the one presented by Ingvaldsen et al. in [1] and in [11], use the input and output produced by the activities registered in the SAP ERP. In particular, they construct process chains (composed of activities belonging to the same instance) by identifying the events that produce and consume the same set of resources. The assumption underpinning this approach (i.e.
the presence of resources produced and consumed by two joint activities) seems too strong for a broad and general application. Other works that can be reduced to the same issue are presented in [12], [13], but all of them solve the problem only partially and do not provide a sufficiently general approach that can be applied to other environments.

The most important difference between this work and others in the literature is that, in our case, the information on the process instance is hidden inside the log (it is recorded in some fields whose identity we do not know), and therefore it has to be extracted properly. Such a difference is very important for two main reasons: (i) the settings we require are sufficiently general to be observed in a number of real environments and (ii) our technique is devised for this particular problem, hence it is more efficient than others.

IV. WORKING FRAMEWORK

The process mining framework we address is based on a set $L$ of log entries originated by auditing activities on a given system. Each log entry $l \in L$ is a tuple of the form

(activity, timestamp, originator, $info_1, \ldots, info_m$)
where:
- activity is the name of the registered activity;
- timestamp is the temporal instant in which the activity is registered;
- originator is the agent that executed the registered activity;
- $info_1, \ldots, info_m$ are possibly empty additional attributes.

The semantics of these additional attributes is a function of the activity of the respective log entry, that is, given an attribute $info_k$ and activities $a_1$, $a_2$, the $info_k$ entries may represent different information for $a_1$ and $a_2$; moreover, the semantics is not explicit. We call these data decorative or additional since they are not exploited by standard process mining algorithms. Observe that two log entries referring to the same activity are not required to share the values of their additional attributes. Table I shows an example of such a log in a document management environment; please note the semantic dependency of attribute $info_1$ on activities: in case of Invoice it may represent the sender, in case of Cash order it may represent the account owner.

The difference between such log entries and an event in the standard process mining approach is the lack of process instance information: bridging this gap, i.e. identifying the case id, is the target of our work. More generally, the source systems we consider do not explicitly implement some workflow concepts, since $L$ might not come from the sampling of a formalized business process at all. Actually, $L$ also lacks the process information: this issue is addressed in Section V-D.

In the following we assume a relational algebra point of view over the framework: a log $L$ is a relation (set of tuples) whose attributes are (activity, timestamp, originator, $info_1, \ldots, info_m$). As usual, we define the projection operator $\pi_{a_1, \ldots, a_n}$ on a relation $R$ as the restriction of $R$ to attributes $a_1, \ldots, a_n$ (observe that duplicates are removed by projection, otherwise the output would not be a set).
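As a small illustration of the set semantics of projection, the sketch below represents a relation as a list of dictionaries and returns the projection as a Python set, so that duplicate tuples collapse. The function name `project` and the three sample rows are assumptions made for this example only.

```python
def project(relation, attributes):
    """Projection: restrict every row to `attributes`.

    The result is a set of tuples, so duplicates are removed automatically.
    """
    return {tuple(row[a] for a in attributes) for row in relation}


# Hypothetical three-entry log with a single decorative attribute.
log = [
    {"activity": "Invoice", "originator": "Alice", "info1": "A"},
    {"activity": "Invoice", "originator": "Eve", "info1": "B"},
    {"activity": "Waybill", "originator": "Alice", "info1": "A"},
]

activities = project(log, ["activity"])
```

Projecting the three rows on the activity attribute yields two tuples only, because the two Invoice entries collapse into one.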
For the sake of brevity, given a set of attributes $A = \{a_1, \ldots, a_n\}$, we denote a projection on all the attributes in $A$ as $\pi_A(R)$. Analogously, given an attribute $a$, a constant value $v$ and a binary operation $\theta$, we define the selection operator $\sigma_{a \theta v}(R)$ as the selection of the tuples of $R$ for which $\theta$ holds between $a$ and $v$; for example, given $a$ = activity, $v$ = Invoice, and $\theta$ being equality, $\sigma_{a \theta v}(L)$ is the set of elements of $L$ having Invoice as value for attribute activity. For a complete survey of relational algebra concepts refer to [14].

It is worthwhile to notice that relational algebra does not deal with tuple ordering, a crucial issue in process mining. This is not a problem, however, because (i) the log can be sorted whenever required and (ii) this work concerns the generation of a log suitable for applying process mining techniques (and not a process mining algorithm itself).

From now on, identifiers $a_1, \ldots, a_n$ will range over activities. Moreover, given a log $L$, we define the set $A(L) = \pi_{activity}(L)$ (the distinct activities occurring in $L$). Finally, we denote with $I$ the set of attributes $\{info_1, \ldots, info_m\}$.

As stated above, our efforts are concentrated on extracting the flow of information from $L$, that is, guessing the case id for each entry $l \in L$, according to the following restrictions on the framework. Given a log $L$, we assume that:
1) given a log entry $l \in L$, if a case id exists in $l$, then it is a combination of values in a set of attributes $PI \subseteq I$ (i.e. activity, timestamp and originator do not participate in the process instance definition);
2) given two log entries $l_1, l_2 \in L$ such that $\pi_{activity}(l_1) = \pi_{activity}(l_2)$, if $PI$ contains the case id for $l_1$, then it also contains the case id for $l_2$ (i.e. the process instance attribute set is fixed per activity); this is implied by the assumption that the semantics of the additional fields is a function of the activity, as stated above.

V. IDENTIFICATION OF PROCESS INSTANCES

From the basis we defined, it follows that the process instance has to be guessed as a subset of $I$; however, since the semantics is not explicitly given, it cannot be exploited to establish a correlation between activities, hence the process instance selection must be carried out looking at the entries $\pi_{info_i}(L)$, for each $i \in [1, |I|]$. Moreover, since the semantics of $I$ is a function of the activity, the selection should be performed for each activity in $A(L)$ and for each attribute in $I$, resulting in a computationally expensive procedure. In order to reduce the search space, Section V-A describes some intuitive heuristics that we applied successfully in our experimental environment (cf. Section VI). Then, in Section V-B, we show how to carry out the selection process over the resulting search space.

A. Exploiting a-priori Knowledge

Experts of the source domain typically hold some knowledge about the data that can be exploited to discard the less promising attributes. Let $a$ be an activity, and $C(a) \subseteq I$ the set of attributes that are candidates to participate in the process instance definition, with respect to the given activity. Clearly, if no a-priori knowledge can be exploited to discard some attributes, then $C(a) = I$. The experiments we carried out helped us define some simple heuristics for reducing the cardinality of $C(a)$, based on:
- assumptions on the data type (e.g. discarding timestamps);
- assumptions on the expected format of the case id, like upper and lower bounds on the average length, length variance, presence or absence of given symbols, etc.

It is worthwhile to notice that this procedure may lead to discarding all the attributes $info_i$ for some activities in $A(L)$. In the following we denote with $A(C)$ the set of all the activities that survive this step, that is $A(C) = \bigcup_{a \in A(L)} \{a \mid C(a) \neq \emptyset\}$.
A(C) contains all the activities which have some candidate attribute, that is, all the activities that can participate in the process we are looking for.
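The pruning step can be sketched as follows. This is only an illustration under stated assumptions: the timestamp pattern, the length bounds (`min_avg_len`, `max_avg_len`) and the sample log are hypothetical, and the real heuristics may combine more criteria (length variance, required symbols, and so on).

```python
import re
from statistics import mean

# Assumed timestamp layout, e.g. "2010-01-01 10:35:47" (hypothetical format).
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}$")


def is_candidate(values, min_avg_len=1, max_avg_len=20):
    """Keep an attribute unless it is empty, timestamp-like, or out of length bounds."""
    values = [v for v in values if v]          # ignore empty entries
    if not values:
        return False
    if all(TIMESTAMP.match(v) for v in values):
        return False                            # data-type heuristic
    avg = mean(len(v) for v in values)
    return min_avg_len <= avg <= max_avg_len    # format heuristic


def candidates(log, info_attrs):
    """Compute C(a) for every activity a occurring in the log."""
    result = {}
    for a in {row["activity"] for row in log}:
        rows = [row for row in log if row["activity"] == a]
        result[a] = {x for x in info_attrs
                     if is_candidate([row.get(x) for row in rows])}
    return result


log = [
    {"activity": "Invoice", "info1": "A", "info2": "2010-01-01 10:35:47"},
    {"activity": "Waybill", "info1": "A", "info2": "B"},
    {"activity": "Note", "info1": None, "info2": "2010-01-01 11:00:00"},
]
C = candidates(log, ["info1", "info2"])
# A(C): the activities that keep at least one candidate attribute.
A_C = {a for a, attrs in C.items() if attrs}
```

In this toy log the hypothetical activity Note loses all its attributes and therefore drops out of A(C), while Invoice keeps only its non-timestamp attribute.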
Table I. An example of log extracted from the document management system: there is all the basic information (such as the activity, the timestamps and the originator), plus a set of information on the documents ($info_1 \ldots info_m$).

Activity | Timestamp | Originator | $info_1$ | $info_2$ | ... | $info_m$
$a_1$ = Invoice | :35:47 | Alice | A | | |
$a_2$ = Waybill | :36:18 | Alice | A | B | |
$a_3$ = Cash order | :41:1 | Bob | A | | |
$a_4$ = Carrier receipt | :12:28 | Charlie | A | B | |
$a_1$ | :45:12 | Eve | B | | |
$a_2$ | :21:2 | Alice | B | A | |
$a_3$ | :54:23 | Bob | C | | |
$a_4$ | :15:37 | Charlie | B | A | |
$a_1$ | :55:14 | Bob | D | | |
$a_2$ | :11:22 | Bob | D | C | |
$a_3$ | :1:28 | Bob | C | | |
$a_4$ | :45:9 | Charlie | D | D | |

B. Selection of the Identifier

After the search space has been reduced and the set $C(a)$ has been computed for each activity $a \in A(L)$, we must select those elements of $C(a)$ that participate in the process instance. The only information we can exploit in order to perform this selection automatically is the amount of data shared by different attributes. Aiming at modeling real settings, we fix a sharing threshold $T$, and we retain as candidates those subsets of $C(a)$ that share at least $T$ entries with some attribute sets of other activities. This threshold must be defined with respect to the number of distinct entries of the involved activities, for instance as a fraction of the number of entries of the less frequent one.

Let $(a_1, a_2)$ be a pair of activities such that $a_1 \neq a_2$, and let $PI_{a_1}$ and $PI_{a_2}$ be the corresponding process instance fields. We define the function $S$ that calculates the number of values shared among them:

$S(a_1, a_2, PI_{a_1}, PI_{a_2}) = \left| \pi_{PI_{a_1}}(\sigma_{activity=a_1}(L)) \cap \pi_{PI_{a_2}}(\sigma_{activity=a_2}(L)) \right|$

Observe that, in order to perform the intersection, it must hold that $|PI_{a_1}| = |PI_{a_2}|$. Using this function, we define the process instance candidates for $(a_1, a_2)$ as:

$\varphi(a_1, a_2) = \{ (PI_{a_1} \in \mathcal{P}(C(a_1)), PI_{a_2} \in \mathcal{P}(C(a_2))) \mid S(a_1, a_2, PI_{a_1}, PI_{a_2}) > T \}$

where $\mathcal{P}$ denotes the power set.
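The two definitions above can be sketched directly in code. The log below is modeled on the first two decorative attributes of Table I (timestamp-like values are replaced by `None`), and the candidate sets `C` are fixed by hand; both are assumptions of this example. The sketch also skips the empty attribute set, which is a simplification with respect to the full power set.

```python
from itertools import combinations

# Rows are (activity, info1, info2); values modeled on Table I.
LOG = [
    ("a1", "A", None), ("a2", "A", "B"), ("a3", "A", None), ("a4", "A", "B"),
    ("a1", "B", None), ("a2", "B", "A"), ("a3", "C", None), ("a4", "B", "A"),
    ("a1", "D", None), ("a2", "D", "C"), ("a3", "C", None), ("a4", "D", "D"),
]
COLUMN = {"info1": 1, "info2": 2}
# Candidate attributes per activity, assumed to come from the pruning step.
C = {"a1": ["info1"], "a2": ["info1", "info2"],
     "a3": ["info1"], "a4": ["info1", "info2"]}


def values(activity, attrs):
    """Projection on `attrs` of the entries selected by activity (a set of tuples)."""
    return {tuple(row[COLUMN[x]] for x in attrs) for row in LOG if row[0] == activity}


def shared(a1, a2, pi1, pi2):
    """S: number of value tuples shared by the two attribute sets."""
    return len(values(a1, pi1) & values(a2, pi2))


def subsets(attrs):
    """Non-empty subsets of `attrs`, as tuples in a fixed order."""
    return [c for r in range(1, len(attrs) + 1) for c in combinations(attrs, r)]


def phi(a1, a2, threshold):
    """Candidate pairs of attribute sets sharing more than `threshold` values."""
    return {(p1, p2): shared(a1, a2, p1, p2)
            for p1 in subsets(C[a1]) for p2 in subsets(C[a2])
            if len(p1) == len(p2) and shared(a1, a2, p1, p2) > threshold}


pairs_12 = phi("a1", "a2", threshold=1)
```

With a threshold of 1, a pair of attribute sets is retained only when at least two value tuples are shared; the returned dictionary also records the value of S for each retained pair.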
Elements of $\varphi(a_1, a_2)$ are pairs whose components are attribute sets of $a_1$ and $a_2$, respectively, that share a number of values greater than $T$ (i.e. the cardinality of the intersection of the values of $PI_{a_1}$ and $PI_{a_2}$ is greater than $T$). In the following, we denote with $\varphi_a$ the set of all the candidate attribute sets for activity $a$, i.e.

$\varphi_a = \{ PI \mid \exists a_1 \in A(C), PI_{a_1} \in \mathcal{P}(C(a_1)) \,.\, (PI, PI_{a_1}) \in \varphi(a, a_1) \}$.

This formula identifies some candidate process instances that may relate two activities: it is worthwhile noticing, however, that our target is the correlation of a set of activities whose cardinality is in general greater than 2. Actually, we want to build a sequence $S = a_{S_1}, \ldots, a_{S_n}$ of distinct activities. Given an activity $a_{S_i}$, there may be a number of choices for $a_{S_{i+1}}$, and then a number of process instances in $\varphi(a_{S_i}, a_{S_{i+1}})$. Hence, a number of sequences may be built.

We call chain a finite sequence $C$ of $n$ components of the form $[a, X]$, where $a$ is an activity and $X \in \varphi_a$. Let us denote it as follows:

$C = [a_1, PI_{a_1}], [a_2, PI_{a_2}], \ldots, [a_n, PI_{a_n}]$

such that $(PI_{a_i}, PI_{a_{i+1}}) \in \varphi(a_i, a_{i+1})$, with $i \in [1, n-1]$. We denote with $C^a_i$ the $i$-th activity of the chain $C$, and with $C^{PI}_i$ the $i$-th $PI$ of the chain $C$. Observe that a given activity must appear only once in a chain, since a process instance is defined by a single attribute set. Given a chain $C$ ending with element $[a_j, PI_{a_j}]$, we say that $C$ is extensible if there exists an activity $a_k \in A(C)$ such that $(PI_{a_j}, X) \in \varphi(a_j, a_k)$, for some set $X \in \mathcal{P}(C(a_k))$. Otherwise, $C$ is said to be complete. Moreover, we say that an activity $a$ occurs in a chain $C$, denoted $a \in C$, if there exists a chain component $[a, X]$ in $C$ for some attribute set $X$. Since an activity can occur in more than one chain with different process instances, in some cases we write $PI_{a_i, C}$ to denote the process instance of activity $a_i$ in chain $C$. Finally, let $A(C)$ denote the set of activities occurring in a chain $C$.
We also admit the empty chain, i.e. the chain with no components. Given a chain $C$ we define the average value sharing $S(C)$ among the selected attributes of $C$ as:

$S(C) = \dfrac{\sum_{1 \le i < n} S\left(C^a_i, C^a_{i+1}, C^{PI}_i, C^{PI}_{i+1}\right)}{n - 1}$

where $n$ denotes the chain length. All the possible complete chains on $L$ are built according to Algorithms 1 and 2.

Algorithm 1 Build Chains
  for all $a \in A(C)$ do
    for all $PI \in \varphi_a$ do
      Extend Chain($[a, PI]$)
    end for
  end for

Algorithm 1 calls Algorithm 2 for all the extensible chains of length 1. Observe that the pseudo code of Algorithms 1 and 2 also builds some chains which are permutations of one
another (cf. Section V-D).

Algorithm 2 Extend Chain
Require: a chain $C = [a_1, PI_1], \ldots, [a_{i-1}, PI_{i-1}]$
  for all $a_i \in A(C) \wedge a_i \notin C$ do
    for all $PI_i \in \varphi_{a_i} \wedge (PI_{i-1}, PI_i) \in \varphi(a_{i-1}, a_i)$ do
      $C' = C, [a_i, PI_i]$
      return Extend Chain($C'$)
    end for
  end for

Example 1. The following example highlights the bias introduced by the order in which activities and attributes are selected when building chains. Limiting ourselves to the first two attributes of the log given in Table I, we can derive the following settings:

$A(C) = \{a_1, a_2, a_3, a_4\}$
$C(a_1) = \{info_1\}$
$C(a_2) = \{info_1, info_2\}$
$C(a_3) = \{info_1\}$
$C(a_4) = \{info_1, info_2\}$

Observe that field $info_2$ is discarded as a candidate attribute for activities $a_1$, $a_3$, since its values have the form of timestamps (cf. Section V-A). We now compute the values of function $\varphi$ for the activities in $A(C)$. Let $T = 1$ and consider $\varphi(a_1, a_2)$: $PI_{a_1} = \{info_1\}$, since it is the only available candidate attribute; $PI_{a_2}$ may be chosen among the sets $\{info_1\}$, $\{info_2\}$, $\{info_1, info_2\}$. We need to find those attribute sets of $a_1$ and $a_2$, respectively, that share at least two values in the log (since $T = 1$).
It is simple to see that the values of $info_1$ exactly overlap for both the activities, thus $(\{info_1\}, \{info_1\})$ is such a pair; another valid pair is $(\{info_1\}, \{info_2\})$, since

$S(a_1, a_2, \{info_1\}, \{info_2\}) = \left| \pi_{info_1}(\sigma_{activity=a_1}(L)) \cap \pi_{info_2}(\sigma_{activity=a_2}(L)) \right| = 2 > 1$

in fact

$\pi_{info_1}(\sigma_{activity=a_1}(L)) = \{A, B, D\}$ and $\pi_{info_2}(\sigma_{activity=a_2}(L)) = \{B, A, C\}$.

The values of function $\varphi$ for all the activities in $A(C)$, and the respective values of $S$, are the following:
- $\varphi(a_1, a_2) = \{(\{info_1\}, \{info_1\}), (\{info_1\}, \{info_2\})\}$ with $S = 3$ and $S = 2$, respectively;
- $\varphi(a_1, a_3) = \emptyset$;
- $\varphi(a_1, a_4) = \{(\{info_1\}, \{info_1\}), (\{info_1\}, \{info_2\})\}$ with $S = 3$ and $S = 3$, respectively;
- $\varphi(a_2, a_3) = \{(\{info_2\}, \{info_1\})\}$ with $S = 2$;
- $\varphi(a_2, a_4) = \{(\{info_1\}, \{info_1\}), (\{info_1\}, \{info_2\}), (\{info_2\}, \{info_1\}), (\{info_2\}, \{info_2\}), (\{info_1, info_2\}, \{info_1, info_2\})\}$ with $S = 3$, $S = 3$, $S = 2$, $S = 2$ and $S = 2$, respectively;
- $\varphi(a_3, a_4) = \emptyset$.

Assume that the chain building process starts from activity $a_1$ and, for the sake of simplicity, that the log is restricted to activities $a_1$, $a_2$, $a_3$. The next activity $a_i$ has to be chosen among those such that $\varphi(a_1, a_i) \neq \emptyset$, thus only $a_2$ is selectable. $\varphi(a_1, a_2)$ contains two pairs: suppose we select $(\{info_1\}, \{info_1\})$. The first selection step is finished and the chain is

$C = [a_1, \{info_1\}], [a_2, \{info_1\}]$

Then, in order to extend the chain, we must look for an activity $a_i \neq a_1, a_2$ such that $\varphi(a_2, a_i) \neq \emptyset$ and there exists a pair in $\varphi(a_2, a_i)$ that contains $\{info_1\}$ as its first component. No activity satisfies these conditions, so the chain is complete. It is simple to see, however, that both choosing the process instance pair $(\{info_1\}, \{info_2\})$ for $(a_1, a_2)$, or starting from $a_3$, would produce complete chains of length 3.
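The construction walked through in Example 1 can be sketched as a depth-first enumeration. The sketch below hard-codes the values of ϕ for the log restricted to a1, a2, a3, treats ϕ as symmetric (ϕ(b, a) holds the swapped pairs of ϕ(a, b)), and, unlike the literal pseudo code of Algorithm 2, explores every extension instead of returning after the first one. These choices, the function names and the single-attribute string encoding are assumptions of this illustration.

```python
# phi values of Example 1, restricted to a1, a2, a3 (single-attribute sets as strings).
PHI = {
    ("a1", "a2"): {("info1", "info1"), ("info1", "info2")},
    ("a2", "a3"): {("info2", "info1")},
}
ACTIVITIES = ["a1", "a2", "a3"]


def phi(a, b):
    """Look up phi(a, b); the relation is assumed symmetric."""
    if (a, b) in PHI:
        return PHI[(a, b)]
    return {(q, p) for p, q in PHI.get((b, a), set())}


def extensions(chain):
    """Components [b, X] that can follow the last component of the chain."""
    last_act, last_pi = chain[-1]
    used = {a for a, _ in chain}
    return [(b, q) for b in ACTIVITIES if b not in used
            for p, q in sorted(phi(last_act, b)) if p == last_pi]


def complete_chains():
    """Enumerate all complete chains, permutations included."""
    found = []

    def extend(chain):
        nxt = extensions(chain)
        if not nxt:                      # not extensible: the chain is complete
            found.append(tuple(chain))
        for step in nxt:
            extend(chain + [step])

    # Starting components [a, PI] with PI in phi_a.
    starts = sorted({(a, p) for a in ACTIVITIES for b in ACTIVITIES
                     if a != b for p, _ in phi(a, b)})
    for start in starts:
        extend([start])
    return found


chains = complete_chains()
```

The enumeration returns both the length-2 chain obtained in the example and the length-3 chains obtained by the alternative choices, including one that is a permutation of another.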
Since the candidate attributes are computed on a statistical basis, the chain length is not the only significant parameter to take into account, but the example underlines the importance of building all the complete chains.

1) Match Heuristics: In computing the amount of data shared by two activities via function $\varphi$, heuristic approaches may help in modeling the complexity of a real domain. In particular, the comparison performed between values does not need to be an identity match: a fuzzy match can be implemented instead. Guided by this basic heuristic, we can substitute the intersection operator in $\varphi$ with an approximation of it, whose definition may or may not be domain specific. Simple examples we tested in our experimental environment are: equality match up to X leading characters, equality match up to Y trailing characters, and their combinations. In general, it is possible to use a string distance measure.

C. Results Organization and Filtering

In the previous sections we showed how to compute a number of chains (i.e. a number of logs); in general, a domain expert is able to tell good chains apart from less reasonable ones, but this could be a demanding task. This section presents the problem of comparing different chains. In order to address this issue, it is worthwhile to analyze a methodology that helps restrict the number of possible chains. Generally, we can reject a chain in favor of another one if and only if the latter contains all the activities of the former, and it is either simpler or it supports a higher confidence. Examples of parameters to take into account are:
- the number of attributes in the process instance of a chain component (recall that each component has the same number of process instance attributes): a chain that involves fewer attributes may be considered simpler, thus preferable since it is more readable for a human agent;
- the cardinality of the value sharing between chain components ($S(\cdot)$): a chain whose share factor is higher gives higher confidence; this parameter could be tuned by a threshold.

Let $H$ be the set of complete chains computed by Algorithms 1 and 2, without permutations. Given two chains

$C_1 = [a_1, PI_{a_1}], \ldots, [a_n, PI_{a_n}]$
$C_2 = [b_1, PI_{b_1}], \ldots, [b_m, PI_{b_m}]$

in the set $H$, we define an ordering operator $\preceq$ as in Equation 1. $\preceq$ defines a reflexive, antisymmetric, and transitive relation over chains, hence $(H, \preceq)$ is a partially ordered set [15]. For the sake of simplicity, in the above formulation we do not use any threshold to tune the value sharing comparison.

The ordering we defined strives to equip the framework with a notion of best chains, i.e. those chains which are worth suggesting to a domain expert. The maximal elements of the poset are exactly those chains, as underlined in Example 2.

Example 2. Recall the settings of Example 1. In this case the set $H$ contains the complete chains listed below.
$C_1 = [a_1, \{info_1\}], [a_2, \{info_1\}], [a_4, \{info_1\}]$, $S(C_1) = 3$
$C_2 = [a_1, \{info_1\}], [a_2, \{info_1\}], [a_4, \{info_2\}]$, $S(C_2) = 3$
$C_3 = [a_1, \{info_1\}], [a_2, \{info_2\}], [a_3, \{info_1\}]$, $S(C_3) = 2$
$C_4 = [a_1, \{info_1\}], [a_2, \{info_2\}], [a_4, \{info_1\}]$, $S(C_4) = 2$
$C_5 = [a_1, \{info_1\}], [a_2, \{info_2\}], [a_4, \{info_2\}]$, $S(C_5) = 2$
$C_6 = [a_1, \{info_1\}], [a_4, \{info_2\}], [a_2, \{info_2\}], [a_3, \{info_1\}]$
$C_7 = [a_2, \{info_1, info_2\}], [a_4, \{info_1, info_2\}]$

Applying the ordering relation we can compare different chains, as follows:
- $C_1 = C_2$: in fact both $C_1 \preceq C_2$ and $C_2 \preceq C_1$ hold;
- $C_1$, $C_2$ are not comparable with $C_3$, since $A(C_1) = A(C_2)$ neither contains nor is contained in $A(C_3)$;
- $C_1 = C_2 \succeq C_4$ since $S(C_1) = S(C_2) > S(C_4)$;
- $C_4 = C_5$;
- $C_i \preceq C_6$ for all $i \in [1, 5] \cup \{7\}$;
- $C_7$ is not comparable with $C_3$.

The Hasse diagram representing the poset obtained by adding the empty chain to $H$ is given in Figure 3. It follows that only chain $C_6$ should undergo the experts' inspection.

Figure 3. Hasse diagram representing the poset of Example 2 ($H$ extended with the empty chain, ordered by $\preceq$).

D. Deriving a Log to Mine

For each chain $C$ with positive length, we can build a log $L'$ whose tuples have the form:

(activity, timestamp, originator, caseid, processid)

Please observe that the process instance we selected is a set of attributes, whereas a single one is expected by standard process mining techniques. Hence, a composition function $k$ from a set of values to a single one is needed (e.g. string concatenation). Finally, the log $L'$ is obtained, starting from $L$, with the execution of Algorithm 3.
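The conversion just described can be sketched as follows. A chain is represented here as a mapping from each activity to its selected attribute tuple, the composition function k is plain string concatenation with a separator, and the chain counter starts from 0; all three are assumptions of this illustration, as are the sample rows and the integer timestamps. The derived log is kept as a list rather than a set to preserve the input order.

```python
def compose(values, sep="|"):
    """Composition function k: join the selected attribute values into one case id."""
    return sep.join(str(v) for v in values)


def derive_log(log, chains):
    """Build (activity, timestamp, originator, caseid, processid) tuples."""
    derived = []
    for chain_no, chain in enumerate(chains):
        for row in log:
            pi = chain.get(row["activity"])
            if pi is None:               # activity does not occur in this chain
                continue
            case_id = compose(row[attr] for attr in pi)
            derived.append((row["activity"], row["timestamp"],
                            row["originator"], case_id, chain_no))
    return derived


# Hypothetical entries in the style of Table I.
log = [
    {"activity": "a1", "timestamp": 1, "originator": "Alice", "info1": "A", "info2": None},
    {"activity": "a2", "timestamp": 2, "originator": "Alice", "info1": "A", "info2": "B"},
    {"activity": "a3", "timestamp": 3, "originator": "Bob", "info1": "A", "info2": None},
    {"activity": "a1", "timestamp": 4, "originator": "Eve", "info1": "B", "info2": None},
]
# One chain relating a1 and a2 through info1.
chains = [{"a1": ("info1",), "a2": ("info1",)}]
derived = derive_log(log, chains)
```

Entries of activities that do not occur in the chain (a3 here) are left out of the derived log, and entries sharing the same composed value end up in the same case.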
Algorithm 3 Conversion of $L$ to $L'$
  $L' \leftarrow \emptyset$
  chainno $\leftarrow 0$
  for all $C \in H$ do
    $L_C \leftarrow \sigma_{activity \in A(C)}(L)$
    for all $l \in L_C$ do
      activity $\leftarrow \pi_{activity}(l)$
      timestamp $\leftarrow \pi_{timestamp}(l)$
      originator $\leftarrow \pi_{originator}(l)$
      caseid $\leftarrow k\left(\pi_{PI_{activity, C}}(l)\right)$
      processid $\leftarrow$ chainno
      $L' \leftarrow L' \cup \{$(activity, timestamp, originator, caseid, processid)$\}$
    end for
    chainno $\leftarrow$ chainno $+ 1$
  end for

Observe that the chain construction is sequential (i.e. it iteratively adds a component) and hence there is an implicit ordering among its elements. In order to build the log $L'$, once all the chains are complete (no longer extensible), it is possible to ignore the chains that are permutations of a given one. Thus, some chains computed by Algorithm 1 can be discarded. It is worthwhile observing, however, that maximal elements in the poset represent different processes. A conservative approach compels us to consider each maximal chain as defining a distinct process. The following example illustrates the reason why we chose this approach: let

$C_1 = \ldots, [a_{i-1}, PI_{a_{i-1}}], [a_i, PI_{a_i}], [a_{i+1}, PI_{a_{i+1}}], \ldots$
$C_2 = \ldots, [b_{j-1}, PI_{b_{j-1}}], [b_j, PI_{b_j}], [b_{j+1}, PI_{b_{j+1}}], \ldots$

be two maximal chains where $PI_{a_{i-1}} \neq PI_{b_{j-1}}$; $PI_{a_i} \neq PI_{b_j}$; $PI_{a_{i+1}} \neq PI_{b_{j+1}}$ and $a_i = b_j$. In other words, $C_1$
and $C_2$ contain the same activity $a_i$ but with different process instances. Considering $C_1$ and $C_2$ as belonging to the same process is not desirable, since it can lead to an inconsistent control flow reconstruction. Hence, each maximal chain defines a process and the domain expert is in charge of recognizing whether different chains belong to the same real process. During the conversion of $L$ to the process log $L'$, we assign a chain counter as process id.

$A \preceq B \iff \begin{cases} |A^{PI}_1| \ge |B^{PI}_1| & \text{if } A(A) = A(B) \wedge S(A) = S(B) \\ S(A) \le S(B) & \text{if } A(A) = A(B) \wedge S(A) \neq S(B) \\ A(A) \subseteq A(B) & \text{otherwise} \end{cases}$    (1)

VI. EXPERIMENTAL RESULTS

As explained in the introduction, this paper is connected to a real business need, and it presents a generalization of a research activity carried out by Siav S.p.A. The existing implementation is limited to process instances constituted by a single attribute (i.e. $|PI_i| = 1$), due both to a-priori knowledge about the domain and to computational requirements. In particular, all the preprocessing steps that reduce the search space are implemented as Oracle stored procedures, written in PL/SQL. The chain building algorithms are implemented in C#. Moreover, to improve performance, we do not compute the heuristics on the whole log, but on a random sample of its entries.

We tested our implementation on logs coming from a document management system; the source log is reduced to the form described in Section IV after undergoing some preprocessing steps. We applied the algorithms to real logs, obtaining concrete results that were validated by domain experts. Table II summarizes the main information: please notice that the experts' chains are always within the set of maximal chains (computed by the algorithm), because the experts selected them from that set. Figure 4 shows how chains evolve when the log cardinality scales up: in particular, notice that the number of chains tends to increase, while the poset structure cuts down the number of chains we presented to the domain experts.
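The filtering effect of the poset can be illustrated on the chains of Example 2. The sketch below encodes one reading of the ordering: with equal activity sets and equal sharing, the chain with fewer attributes per component is preferred; with equal activity sets and different sharing, the higher sharing is preferred; otherwise chains are compared by inclusion of their activity sets. The direction of the comparisons is an assumption of this sketch, and the sharing value used for C6 is not stated in the text but derived from the ϕ values of Example 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    name: str
    activities: frozenset
    sharing: float      # average value sharing S(C)
    pi_size: int        # number of attributes per chain component


def precedes(a, b):
    """True when a comes before (or ties with) b in the assumed ordering."""
    if a.activities == b.activities:
        if a.sharing == b.sharing:
            return a.pi_size >= b.pi_size   # fewer attributes is better
        return a.sharing <= b.sharing       # higher sharing is better
    return a.activities <= b.activities     # more activities is better


def maximal(chains):
    """Chains that are not strictly dominated by any other chain."""
    return [c for c in chains
            if not any(precedes(c, d) and not precedes(d, c) for d in chains)]


A124 = frozenset({"a1", "a2", "a4"})
H = [
    Chain("C1", A124, 3, 1),
    Chain("C2", A124, 3, 1),
    Chain("C3", frozenset({"a1", "a2", "a3"}), 2, 1),
    Chain("C4", A124, 2, 1),
    Chain("C5", A124, 2, 1),
    Chain("C6", frozenset({"a1", "a2", "a3", "a4"}), 7 / 3, 1),  # derived value
    Chain("C7", frozenset({"a2", "a4"}), 2, 2),
]
best = [c.name for c in maximal(H)]
```

Under these assumptions, the seven chains of Example 2 reduce to a single maximal chain, which is the only one an expert would need to inspect.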
Figure 5 plots the processing time: it is a function of the log cardinality, of the number of activities in the log (i.e. the number of possible chain components), and of the number of decorative attributes (i.e. the number of possible ways of chaining two components).

The knowledge of the application domain gave us the opportunity to implement some heuristics, as explained in Sections V-A and V-B1. The following criteria were selected in order to reduce the search space:
- a candidate attribute must have a string type (e.g. we discard timestamps and numeric types, which in our case mostly represent prices);
- the values of a candidate attribute must fulfill these requirements: maximum average length: 2 characters, minimum average length: 3 characters, maximum variation w.r.t. the average length: 1.

Finally, we relax the intersection operator in $\varphi$, requiring value identity up to the first leading character and up to 2 trailing characters.

The experiments were carried out on an Intel Core2 Quad at 2.4 GHz, equipped with 4 GB of RAM. The DBMS where the logs were stored was local to the machine, thus no network overhead has to be considered.

Figure 4. This figure plots the total number of chains the procedure identifies ($H$), the number of maximal chains and the number of chains the expert will select, given the size of the preprocessed log.

Figure 5. This figure represents the time (expressed in seconds) required for the extraction of the chains.

VII. CONCLUSION AND FUTURE WORK

This work presents a new approach for the identification of process instances on logs generated from systems that are not process-aware. Process instance information is guessed by exploiting information that is additional with respect to a standard process mining framework, but this information is typically available when dealing with software systems. Our work is a generalization of a real business case related to a document
8 Table II RESULTS SUMMARY. HORIZONTAL LINES SEPARATE DIFFERENT LOG SOURCES. L A(L ) I Time H Maximal chains Experts chains s s m 4s s s s m management system, where discovering the process instance means correlating different document sets. The procedure we described to solve the problem is entirely based on the information that decorates documents, and relies on a relational algebra approach. Moreover, we deem that our generalization can be fairly adoptable in a number of domains, with a reasonable effort. Eventually, observe that we aim at applying some process mining techniques on a log computed on statistical basis, thus error-robust mining algorithms must be used. There is still some important work that could improve the presented results. For example, we think it would be interesting to consider not only the value of the case id candidates, but to go deeper, to their semantic meaning (if any), which could act as a-priori knowledge. Moreover, a flexible framework for expressing and feeding the system with a-priori knowledge is desirable, in order to earn a higher level of generalization. Then, other refinements are domain-specific: dealing with documents, for instance, we could exploit their content in order to confirm or reject the findings of our algorithms, when the result confidence is low. Finally, before carrying on these new ideas, we are planning a wider test campaign on logs coming from other systems. From the implementation point of view, it is desirable to extend the current version of the software in order to consider more than one attribute per process instance. ACKNOWLEDGMENT This work was entirely supported by SIAV S.p.A. We thank Nicola Costantini from SIAV and Prof. Alessandro Sperduti, for their important suggestions and supports. [4] W. M. P. van Der Aalst and A. H. M. Ter Hofstede, YAWL: yet another workflow language, Information Systems, vol. 3, no. 4, pp , Jun. 25. [Online]. Available: [5] W. M. P. van Der Aalst, B. F. van Dongen, C. W. 
Günther, R. S. Mans, A. K. A. de Medeiros, A. Rozinat, V. Rubin, M. Song, H. M. W. Verbeek, and A. J. M. M. Weijters, ProM 4.: Comprehensive Support for Real Process Analysis, in Petri Nets and Other Models of Concurrency, J. Kleijn and A. Yakovlev, Eds. Springer, 27, pp [6] C. W. Günther, XES - Standard Definition, 29. [Online]. Available: [7] B. F. van Dongen, A. K. A. de Medeiros, H. M. W. Verbeek, A. J. M. M. Weijters, and W. M. P. van der Aalst, The ProM framework: A new era in process mining tool support, Application and Theory of Petri Nets, vol. 3536, pp , 25. [8] A. Burattin and A. Sperduti, Heuristics Miner for Time Intervals, in European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 21. [9] D. R. Ferreira and D. Gillblad, Discovering Process Models from Unlabelled Event Logs, in Business Process Management, 7th International Conference, BPM 29, ser. Lecture Notes in Computer Science, U. Dayal, H. A. Reijers, J. Eder, and J. Koehler, Eds., vol Ulm, Germany: Springer, 29, pp [1] J. Espen Ingvaldsen and J. Atle Gulla, Preprocessing Support for Large Scale Process Mining of SAP Transactions, in Business Process Management Workshops, A. H. M. Ter Hofstede, B. Benatallah, and H.- Y. Paik, Eds. Springer Berlin / Heidelberg, 28, pp [Online]. Available: [11], Semantic Business Process Mining of SAP Transactions, 1st ed. Business Science Reference, 21, ch. 17, pp [12] M. Walicki and D. R. Ferreira, Mining Sequences for Patterns with Non-Repeating Symbols, in IEEE Congress on Evolutionary Computation 21, Barcelona, Spain, 21, pp [Online]. Available: all.jsp?arnumber= [13] D. R. Ferreira, M. Zacarias, M. Malheiros, and P. Ferreira, Approaching Process Mining with Sequence Clustering : Experiments and Findings, in Business Process Management, G. Alonso, P. Dadam, and M. Rosemann, Eds., no. 1. Springer Berlin / Heidelberg, 27, pp [14] R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 5th ed. Boston: Addison-Wesley, 26. [15] B. 
Schröder, Ordered Sets: An Introduction. Boston: Birkhäuser Boston, 22. REFERENCES [1] W. M. P. van der Aalst, Process Discovery: Capturing the Invisible, IEEE Computational Intelligence Magazine, vol. 5, no. 1, pp , 21. [Online]. Available: [2] W. M. P. van Der Aalst, H. A. Reijers, A. J. M. M. Weijters, B. F. van Dongen, A. K. A. de Medeiros, M. Song, and H. M. W. Verbeek, Business process mining: An industrial application, Information Systems, vol. 32, no. 5, pp , Jul. 27. [Online]. Available: [3] W. M. P. van Der Aalst, A. J. M. M. Weijters, B. F. van Dongen, J. Herbst, L. Maruster, and G. Schimm, Workflow mining: a survey of issues and approaches, Data & Knowledge Engineering, vol. 47, no. 2, pp , 23.