Estimation of Average Latent Waiting and Service Times of Activities from Event Logs

Takahide Nogayama 1 and Haruhisa Takahashi 2

1 Department of Cloud and Security, IBM Research - Tokyo, Japan, nogayama@jp.ibm.com
2 Department of Information and Communication Engineering, University of Electro-Communications, Tokyo, Japan, takaharuroka@uec.ac.jp

Abstract. Performance analysis is crucial in the redesign phase of business processes, where bottlenecks are identified from the average waiting and service times of activities and resources. However, such averages are not readily available in most event logs, which record only either the start or the completion times of events in activities. The transition times between events are then the only performance features available. This paper proposes a novel method of estimating the average latent waiting and service times from the transition times by maximizing the likelihood of a probabilistic model with the expectation-maximization (EM) algorithm. Our experimental results indicate that our method can estimate the average latent waiting and service times with sufficient accuracy for practical performance analysis.

Keywords: Process mining, performance analysis, latent waiting and service times, convolution of gamma distributions, EM algorithm

1 Introduction

The role of process mining has become much more important in the redesign phases of business processes. Recent developments in process mining have been built on Process-Aware Information Systems (PAIS) [5], including Workflow Management (WFM) and Business Process Management (BPM) systems, which record business events as event logs. Process models and performance data extracted from observed event logs play significant roles in business process improvement, redesign, re-engineering, and optimization.
Performance analysis, which deals with time and frequency, is especially crucial for improving business processes to reduce labor costs and increase customer satisfaction. Event logs typically recorded by PAIS contain the assignment, start, and end times of events in activities, from which useful statistics can be inferred. We can easily calculate the average waiting time and the average service time of individual activities or resources from such event logs to discover bottlenecks in business processes.
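The averages mentioned above are simple to compute when all three timestamps are present. A minimal sketch in Python (the log fragment and field names are hypothetical, not from any particular PAIS):

```python
from collections import defaultdict

# Hypothetical event log fragment: (case_id, activity, assign, start, end),
# timestamps in arbitrary time units.
events = [
    ("c1", "A1", 0.0, 10.0, 25.0),
    ("c1", "A2", 25.0, 40.0, 55.0),
    ("c2", "A1", 0.0, 5.0, 30.0),
]

waiting, service = defaultdict(list), defaultdict(list)
for _case, act, assign, start, end in events:
    waiting[act].append(start - assign)  # queued before work began
    service[act].append(end - start)     # time spent working on the activity

def avg(xs):
    return sum(xs) / len(xs)

for act in sorted(waiting):
    print(act, avg(waiting[act]), avg(service[act]))
```

Activities with unusually large averages are bottleneck candidates. The difficulty addressed in this paper arises when the assign or start columns are missing.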

The average of activity transition times, i.e., the differences between source end times and destination start times, indicates how much time is consumed outside of activities. However, in most cases only the inside of activities can be improved. Hence, the average transition time needs to be split into two parts, a source-related part and a destination-related part, so that we know how much of the transition time each activity consumes. This splitting matters in two cases:

1. When activity transition times tend to be very long, the source or destination activity must be a bottleneck.
2. When only a single timestamp is observed for each activity, the duration of activities cannot be known. Such event logs usually arise from non-PAIS; for example, a legacy BPM system may record only end times, and Kuo et al. [7] encountered logs with only start times in the clinical system of a hospital.

Such problems with incompleteness are more serious in non-PAIS. Nevertheless, analysis of such incomplete event logs is indeed required by BPM system vendors: when a vendor's customers consider migrating from a non-PAIS to a PAIS, demonstrating promising process improvements based on logs from the existing non-PAIS makes it much easier to persuade the customers to accept the migration proposal.

We modeled the latent average waiting and service times in event logs under two assumptions:

1. The probability distribution of a time duration depends only on the present state, not on the preceding sequence of events (the Markov property).
2. Transition times are composed of latent waiting and service times, and individual time durations are i.i.d. with gamma distributions.

We aimed at estimating the parameters of the gamma distributions of the latent waiting and service times by maximum likelihood estimation (MLE).
Since these are latent variables, the EM (expectation-maximization) algorithm was applied to infer the parameters. We evaluated our method with artificially generated data and with the teleclaim event log used by van der Aalst [1]. Performance analysis in process mining is discussed in Section 2. Observable and unobservable performance indicators are described in Section 3. The probabilistic model is defined in Section 4, and its parameters are estimated in Section 5. Finally, the experimental results are presented in Section 6.

2 Related Work

A number of studies and tools have been developed to summarize performance data from workflow or event logs. Ferreira [6] gathered the minimum, maximum, and average activity durations of health-care processes to analyze the lengths of stays by patients in a hospital. Lanz et al. [8] have precisely defined various time patterns that can be observed in event logs. Performance summary data can also be utilized for running processes, i.e., in so-called operational support. Prediction of the remaining lead time of running process instances has been presented by van der Aalst et al. [2, 3]. Sindhgatta et al. [11] assigned the tasks of a running process to appropriate workers by analyzing the average service times of activities. Rogge-Solti and Kasneci [9] used the average service times of activities to detect anomalies in running process instances.

Activities in business processes have also been analyzed with queueing models [10, 12] based on rich event logs that contain several types of events, such as Entry, Assign, Start, and End. In these models, the arrival processes and the number of servers (operators) must be specified beforehand. Difficulties arise because the arrival process can vary with the behavior of the preceding activities, and the number of available operators in a back office can vary from day to day and week to week, as well as with the time of day. Since such assumptions about arrival processes and the number of servers are usually not recorded in event logs, they are set based on the experience of data analysts. We tried to make our method work with as few assumptions as possible, to avoid estimation under incorrect assumptions. Our proposed method estimates latent waiting and service times without any assumptions about arrival processes or the number of servers. We only assume that the time intervals follow gamma distributions, a rich and flexible family.
3 Performance Indicators of Activities Derivable from Event Logs

Event sequences provide performance indicators of activities through differences between pairs of event timestamps. When an event sequence with timestamps (e.g., Fig. 1) is given, some indicators can be derived explicitly, such as the waiting time, service time, activity duration, and transition time.

Fig. 1. Observable and unobservable performance indicators in an event log.

The timestamps at which one activity passes control to another are not explicitly recorded. In this research, we assumed that the latent service time, i.e., the first part of the transition time, is controlled by the source activity, and the latent waiting time, i.e., the last part of the transition time, is controlled by the destination activity.

4 Probabilistic Model

We used the gamma distribution as the probability density function of the latent waiting and service times because it is flexible enough to model durations that are, in practice, composed of several small activities. For example, the underwriting activity in the insurance industry includes understanding incoming applications, measuring risk exposures, and determining the premiums that need to be charged to insure the client against risk. The probability density function over a random variable X > 0 is defined by

p(x; q, α) = x^{q−1} e^{−x/α} / (Γ(q) α^q)

with a shape parameter q > 0 and a scale parameter α > 0, where Γ(q) is the gamma function Γ(q) = ∫_0^∞ z^{q−1} e^{−z} dz. We write this as X ~ G(q, α).

We assumed that the transition times from one source to different destinations share the same latent service time at the transition source. In addition, the transition times from different sources to one destination share the same latent waiting time at that transition destination. For example, Fig. 2 illustrates the transition times and their latent waiting and service times on an XOR-split gateway. We expect that the latent times can be estimated by solving these relations simultaneously as an inverse problem. Let A be the set of activities and T the set of transitions. Further, let T_ij be the random variable of the transition time from i ∈ A to j ∈ A, S_i the random variable of the latent service time of activity i ∈ A, and W_j the random variable of the latent waiting time of activity j ∈ A.
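The shared-service-time assumption on an XOR split can be illustrated with a small simulation: both observable transition times contain the same latent service sample. The parameter values below are those of the synthetic experiment in Section 6; this is a sketch in Python rather than the paper's Octave:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# One shared latent service time at the split source A1, and two latent
# waiting times at the destinations A2 and A3.
s1 = rng.gamma(shape=3.0, scale=5.0, size=n)  # S_1 ~ G(3, 5), mean 15
w2 = rng.gamma(shape=2.0, scale=4.0, size=n)  # W_2 ~ G(2, 4), mean 8
w3 = rng.gamma(shape=4.0, scale=6.0, size=n)  # W_3 ~ G(4, 6), mean 24
t12 = s1 + w2  # observable transition time A1 -> A2
t13 = s1 + w3  # observable transition time A1 -> A3

# The observable means satisfy E[T_ij] = q_i * alpha_i + r_j * beta_j.
print(t12.mean(), t13.mean())
```

Only t12 and t13 would appear in an event log; the estimation problem of Section 5 is to recover the three latent components from them.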
From the assumptions in Section 1, S_i ~ G(q_i, α_i), W_j ~ G(r_j, β_j), and T_ij = S_i + W_j for all (i, j) ∈ T (see Fig. 3).

Fig. 2. A pair of transition times with two latent waiting times and a single shared latent service time on an XOR-split gateway.

Fig. 3. Probabilistic model focusing on activity i.

The probability density function of T_ij can be obtained from the convolution p(s_i; q_i, α_i) * p(w_j; r_j, β_j) as:

p(t_ij; θ_ij) = e^{−t_ij/β_j} / (Γ(q_i) Γ(r_j) α_i^{q_i} β_j^{r_j}) · f(t_ij, θ_ij)   (1)

where we define θ_ij as the vector (q_i, α_i, r_j, β_j) and

f(x, θ) = ∫_0^x z^{q−1} (x − z)^{r−1} e^{−(1/α − 1/β) z} dz.

This probabilistic model can describe the Business Process Model and Notation (BPMN) elements on the left of Fig. 4. However, it cannot adequately describe the AND-join gateway and the timer event on the right of Fig. 4. Another limitation of the model is that the transition time of a stand-alone sequential process, like that at the top of Fig. 5, is always split into the same two distributions. However, it is possible to generate XOR-split/join gateways by changing the definition of the activity identifier from name to name + resource.

Fig. 4. Examples of BPMN notations. The time periods of the notations on the left can be considered as the sum of two latent times following gamma distributions; the time periods of the notations on the right do not match our model well.

Fig. 5. Sequential process (top) and XOR-split/join gateways (bottom) generated by changing the activity identifier from name to name + resource.

5 Estimation of Parameters

Let Θ be the vector (q_1, ..., α_1, ..., r_1, ..., β_1, ...) of all shape and scale parameters. Assuming that the transition times t_ijk ((i, j) ∈ T, k = 1, ..., n_ij) are observed independently and identically distributed, we aim to find an estimator Θ̂ that is as close to the true value Θ_0 as possible by MLE. The estimators give the average latent waiting and service times of activity i as r̂_i β̂_i and q̂_i α̂_i, respectively. The log-likelihood function is:

log L = Σ_{(i,j)∈T} Σ_{k=1}^{n_ij} log p(t_ijk; θ_ij)   (2)

However, the parameters that maximize this function cannot be obtained in closed form.
The EM algorithm [4] is a general technique for finding maximum-likelihood solutions of probabilistic models that have latent variables. The observed variables here are T_ij and the latent variables are S_i; note that W_j vanishes with W_j = T_ij − S_i. The EM algorithm iteratively maximizes the likelihood by alternating expectation and maximization steps. In the expectation step, we construct the function

Q(Θ′, Θ) = Σ_{(i,j)∈T} Σ_{k=1}^{n_ij} ∫_0^{t_ijk} p(s_i | t_ijk; θ_ij) log p(t_ijk, s_i; θ′_ij) ds_i

to approximate the log-likelihood (2), where the joint probability is

p(T, S; θ) = p(S; q, α) p(T − S; r, β) = S^{q−1} (T − S)^{r−1} / (Γ(q) Γ(r) α^q β^r) · e^{−T/β} e^{−(1/α − 1/β) S}

and the posterior probability is

p(S | T; θ) = p(T, S; θ) / p(T; θ) = S^{q−1} (T − S)^{r−1} e^{−(1/α − 1/β) S} / f(T, θ).

In the maximization step, we find a new parameter Θ′ that maximizes Q while holding the old parameter Θ fixed. The optimum satisfies ∂Q/∂Θ′ = 0. By substituting ∂Q/∂α_i = 0 into ∂Q/∂q_i = 0, and ∂Q/∂β_j = 0 into ∂Q/∂r_j = 0, we obtain:

ψ(q_i) − log(q_i) − (1/N_i) Σ_{j∈A} Σ_{k=1}^{n_ij} g(t_ijk, θ_ij)/f(t_ijk, θ_ij) + log[ (1/N_i) Σ_{j∈A} Σ_{k=1}^{n_ij} f_1(t_ijk, θ_ij)/f(t_ijk, θ_ij) ] = 0   (3)

ψ(r_j) − log(r_j) − (1/N_j) Σ_{i∈A} Σ_{k=1}^{n_ij} h(t_ijk, θ_ij)/f(t_ijk, θ_ij) + log[ (1/N_j) Σ_{i∈A} Σ_{k=1}^{n_ij} f_2(t_ijk, θ_ij)/f(t_ijk, θ_ij) ] = 0   (4)

where N_i = Σ_{j∈A} n_ij and N_j = Σ_{i∈A} n_ij, and we define

g(x, θ) = ∫_0^x z^{q−1} (x − z)^{r−1} e^{−(1/α − 1/β) z} log z dz,
h(x, θ) = ∫_0^x z^{q−1} (x − z)^{r−1} e^{−(1/α − 1/β) z} log(x − z) dz,
f_1(x, θ) = f(x, q + 1, α, r, β),   f_2(x, θ) = f(x, q, α, r + 1, β),

and ψ(x) is the digamma function d/dx ln Γ(x). The optimal shape parameters q̂_i and r̂_j are given by solving these one-variable nonlinear equations. Then, by substituting the shape parameters into ∂Q/∂α_i = 0 and ∂Q/∂β_j = 0, we obtain the optimal scale parameters:

α̂_i = (1/(q̂_i N_i)) Σ_{j∈A} Σ_{k=1}^{n_ij} f_1(t_ijk, θ_ij)/f(t_ijk, θ_ij),   β̂_j = (1/(r̂_j N_j)) Σ_{i∈A} Σ_{k=1}^{n_ij} f_2(t_ijk, θ_ij)/f(t_ijk, θ_ij).   (5)

6 Experimental Results

We implemented our method and the synthetic log generator in GNU Octave; the programming language needed to support numerical integration and a root-finding algorithm.

6.1 Synthetic Log

We evaluated our method on artificially generated random numbers drawn from three gamma distributions with known parameters in the business process of Fig. 6.
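For intuition, the E- and M-steps above can be sketched for the simplest case of a single transition pair (i, j): the posterior expectations f_1/f, f_2/f, g/f, h/f are computed by numerical quadrature, and the shape equations (3) and (4) are solved with a scalar root finder. This is a simplified Python illustration, not the authors' Octave implementation; the initial values are arbitrary:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.special import digamma

def post_moments(t, q, a, r, b):
    """E[S|t], E[W|t], E[log S|t], E[log W|t] under the posterior
    p(s|t) proportional to s^(q-1) (t-s)^(r-1) e^(-(1/a-1/b)s),
    i.e. the ratios f1/f, f2/f, g/f, h/f of the paper."""
    c = 1.0 / a - 1.0 / b
    w = lambda s: s**(q - 1) * (t - s)**(r - 1) * np.exp(-c * s)
    f = quad(w, 0.0, t)[0]
    es = quad(lambda s: s * w(s), 0.0, t)[0] / f
    ew = quad(lambda s: (t - s) * w(s), 0.0, t)[0] / f
    els = quad(lambda s: np.log(s) * w(s), 0.0, t)[0] / f
    elw = quad(lambda s: np.log(t - s) * w(s), 0.0, t)[0] / f
    return es, ew, els, elw

def em(ts, q, a, r, b, iters=10):
    for _ in range(iters):
        m = np.array([post_moments(t, q, a, r, b) for t in ts])
        es, ew, els, elw = m.mean(axis=0)
        # M-step: each shape solves psi(x) - log x = mean E[log .] - log mean E[.]
        # (the single-pair versions of (3) and (4)); scales follow eq. (5).
        q = brentq(lambda x: digamma(x) - np.log(x) - els + np.log(es), 1e-2, 1e3)
        r = brentq(lambda x: digamma(x) - np.log(x) - elw + np.log(ew), 1e-2, 1e3)
        a, b = es / q, ew / r
    return q, a, r, b

rng = np.random.default_rng(0)
ts = rng.gamma(3.0, 5.0, 60) + rng.gamma(2.0, 4.0, 60)  # observed T = S + W
q, a, r, b = em(ts, 3.5, 4.0, 1.5, 5.0)  # asymmetric initial guess
print(q * a, r * b)  # estimated average service and waiting times
```

Note the asymmetric initial guess: with a symmetric start (q = r, α = β) the posterior of S given t is symmetric about t/2, so both components receive identical updates and the iteration stays symmetric forever, which is consistent with the stand-alone sequential-process behavior reported below.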
We generated random numbers s_1^1, ..., s_1^{100} as instances of the latent service time S_1 ~ G(3, 5), w_2^1, ..., w_2^{100} as instances of the latent waiting time W_2 ~ G(2, 4), and w_3^1, ..., w_3^{100} as instances of the latent waiting time W_3 ~ G(4, 6). We then created the transition times t_12^1, ..., t_12^{100} and t_13^1, ..., t_13^{100} with t_12^i = s_1^i + w_2^i and t_13^i = s_1^i + w_3^i for each i = 1, ..., 100.

Fig. 6. XOR-split gateway (top) and stand-alone sequential process (bottom) used to generate synthetic logs.

Fig. 7. Estimators at each iteration with the synthetic log data generated from the process in Fig. 6 (top).

For the XOR-split gateway at the top of Fig. 6, two hundred transition times (from A1 to A2 and from A1 to A3) were generated. Fig. 7 shows the history of the estimators over the iterations. The estimators converged to values close to the true average latent waiting and service times as the iterations progressed. For the sequential process at the bottom of Fig. 6, one hundred transition times from A1 to A2 were generated. The estimators of the average latent service time of A1 and the average latent waiting time of A2 converged to the same value, half the sample average of the transition times from A1 to A2, and not to the true values. This result confirms that our method cannot decompose latent times from stand-alone transition times, as discussed in Section 4.

6.2 Teleclaim Log

We evaluated our method on the teleclaim event log used by van der Aalst [1], which describes the handling of claims in an insurance company. It contains an instance of an XOR-split gateway from an Incoming Claim activity to B check and S check activities. The average transition times from Incoming Claim to B check and to S check were 2518 s and 936 s, respectively. In this experiment, the estimator of the average latent service time of Incoming Claim was 508 s.
The estimators of the average latent waiting times of B check and S check were 1415 s and 105 s, respectively. These results indicate that the Incoming Claim activity incurred a relatively heavy time cost after its completion.

7 Conclusion

We proposed three unobservable performance indicators: the latent waiting time, the latent service time, and the latent activity duration. These indicators are useful not only in performance analysis but also in predicting the times of a new sequence of activities that was not observed in the event logs. We also proposed a statistical algorithm to estimate the averages of these unobservable indicators under a few assumptions. We consider these assumptions reasonable in practice, so our method should be applicable to a wide range of event logs. In the experiments with artificially generated numbers, the estimators obtained with our method were in good agreement with the true average latent times.

Decomposition of transition times at AND-join gateways remains a key open issue. Experiments with event logs that include large numbers of events and gateways are also required as future work.

References

1. van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011)
2. van der Aalst, W.M.P., Pesic, M., Song, M.: Beyond process mining: from the past to present and future. In: Pernici, B. (ed.) CAiSE 2010. LNCS, vol. 6051. Springer, Heidelberg (2010)
3. van der Aalst, W.M.P., Schonenberg, M.H., Song, M.: Time prediction based on process mining. Information Systems 36(2) (2011)
4. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1), 1-38 (1977)
5. Dumas, M., van der Aalst, W.M.P., ter Hofstede, A.H.M.: Process-Aware Information Systems: Bridging People and Software through Process Technology. Wiley (2005)
6. Ferreira, D.R.: Performance analysis of healthcare processes through process mining. ERCIM News 89 (2012)
7. Kuo, Y.H., Leung, J.M.Y., Graham, C.A.: Simulation with data scarcity: Developing a simulation model of a hospital emergency department. In: Proceedings of the 2012 Winter Simulation Conference. IEEE (2012)
8. Lanz, A., Weber, B., Reichert, M.: Time patterns for process-aware information systems. Requirements Engineering 19(2) (2014)
9. Rogge-Solti, A., Kasneci, G.: Temporal anomaly detection in business processes. In: Sadiq, S., Soffer, P., Völzer, H. (eds.) BPM 2014. LNCS, vol. 8659. Springer, Heidelberg (2014)
10. Senderovich, A., Weidlich, M., Gal, A., Mandelbaum, A.: Queue mining - predicting delays in service processes. In: Jarke, M., Mylopoulos, J., Quix, C., Rolland, C., Manolopoulos, Y., Mouratidis, H., Horkoff, J. (eds.) CAiSE 2014. LNCS, vol. 8484. Springer, Heidelberg (2014)
11. Sindhgatta, R., Dasgupta, G.B., Ghose, A.: Analysis of operational data for expertise aware staffing. In: Sadiq, S., Soffer, P., Völzer, H. (eds.) BPM 2014. LNCS, vol. 8659. Springer, Heidelberg (2014)
12. Zerguini, L.: On the estimation of the response time of the business process. In: 17th UK Performance Engineering Workshop, University of Leeds (2001)

This is an author-created version of the research paper. The final publication [1] is available at link.springer.com.

@inproceedings{nogayama2015estimation,
  author    = {Nogayama, Takahide and Takahashi, Haruhisa},
  title     = {{Estimation of Average Latent Waiting and Service Times of Activities from Event Logs}},
  booktitle = {BPM 2015, LNCS},
  volume    = {9253},
  publisher = {Springer},
  address   = {Heidelberg},
  year      = {2015}
}

References

1. Nogayama, T., Takahashi, H.: Estimation of Average Latent Waiting and Service Times of Activities from Event Logs. In: BPM 2015. LNCS, vol. 9253. Springer, Heidelberg (2015)


More information

An Experimental Evaluation of Passage-Based Process Discovery

An Experimental Evaluation of Passage-Based Process Discovery An Experimental Evaluation of Passage-Based Process Discovery H.M.W. Verbeek and W.M.P. van der Aalst Technische Universiteit Eindhoven Department of Mathematics and Computer Science P.O. Box 513, 5600

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

The effect of probabilities of departure with time in a bank

The effect of probabilities of departure with time in a bank International Journal of Scientific & Engineering Research, Volume 3, Issue 7, July-2012 The effect of probabilities of departure with time in a bank Kasturi Nirmala, Shahnaz Bathul Abstract This paper

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Usability Extensions for the Worklet Service

Usability Extensions for the Worklet Service Usability Extensions for the Worklet Service Michael Adams Queensland University of Technology, Brisbane, Australia. mj.adams@qut.edu.au Abstract. The YAWL Worklet Service is an effective approach to facilitating

More information

Carbon-Aware Business Process Design in Abnoba

Carbon-Aware Business Process Design in Abnoba Carbon-Aware Business Process Design in Abnoba Konstantin Hoesch-Klohe and Aditya Ghose Decision Systems Lab (DSL), School of Computer Science and Software Engineering, University of Wollongong Abstract.

More information

Determining the number of components in mixture models for hierarchical data

Determining the number of components in mixture models for hierarchical data Determining the number of components in mixture models for hierarchical data Olga Lukočienė 1 and Jeroen K. Vermunt 2 1 Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

Uncovering the Latent Structures of Crowd Labeling

Uncovering the Latent Structures of Crowd Labeling Uncovering the Latent Structures of Crowd Labeling Tian Tian and Jun Zhu Presenter:XXX Tsinghua University 1 / 26 Motivation Outline 1 Motivation 2 Related Works 3 Crowdsourcing Latent Class 4 Experiments

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

An introduction to Variational calculus in Machine Learning

An introduction to Variational calculus in Machine Learning n introduction to Variational calculus in Machine Learning nders Meng February 2004 1 Introduction The intention of this note is not to give a full understanding of calculus of variations since this area

More information

Optimization Methods II. EM algorithms.

Optimization Methods II. EM algorithms. Aula 7. Optimization Methods II. 0 Optimization Methods II. EM algorithms. Anatoli Iambartsev IME-USP Aula 7. Optimization Methods II. 1 [RC] Missing-data models. Demarginalization. The term EM algorithms

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

A Diffusion Approximation for Stationary Distribution of Many-Server Queueing System In Halfin-Whitt Regime

A Diffusion Approximation for Stationary Distribution of Many-Server Queueing System In Halfin-Whitt Regime A Diffusion Approximation for Stationary Distribution of Many-Server Queueing System In Halfin-Whitt Regime Mohammadreza Aghajani joint work with Kavita Ramanan Brown University APS Conference, Istanbul,

More information

Lecture 6: April 19, 2002

Lecture 6: April 19, 2002 EE596 Pat. Recog. II: Introduction to Graphical Models Spring 2002 Lecturer: Jeff Bilmes Lecture 6: April 19, 2002 University of Washington Dept. of Electrical Engineering Scribe: Huaning Niu,Özgür Çetin

More information

The Variational Gaussian Approximation Revisited

The Variational Gaussian Approximation Revisited The Variational Gaussian Approximation Revisited Manfred Opper Cédric Archambeau March 16, 2009 Abstract The variational approximation of posterior distributions by multivariate Gaussians has been much

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Analysis of a Two-Phase Queueing System with Impatient Customers and Multiple Vacations

Analysis of a Two-Phase Queueing System with Impatient Customers and Multiple Vacations The Tenth International Symposium on Operations Research and Its Applications (ISORA 211) Dunhuang, China, August 28 31, 211 Copyright 211 ORSC & APORC, pp. 292 298 Analysis of a Two-Phase Queueing System

More information

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate

More information

ArcGIS. for Server. Understanding our World

ArcGIS. for Server. Understanding our World ArcGIS for Server Understanding our World ArcGIS for Server Create, Distribute, and Manage GIS Services You can use ArcGIS for Server to create services from your mapping and geographic information system

More information

Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively Hybrid Censoring Scheme

Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively Hybrid Censoring Scheme Applied Mathematical Sciences, Vol. 12, 2018, no. 18, 879-891 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ams.2018.8691 Parameter Estimation of Power Lomax Distribution Based on Type-II Progressively

More information

Adaptive Crowdsourcing via EM with Prior

Adaptive Crowdsourcing via EM with Prior Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and

More information

Week 3: The EM algorithm

Week 3: The EM algorithm Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent

More information

Study Notes on the Latent Dirichlet Allocation

Study Notes on the Latent Dirichlet Allocation Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection

More information

Comparison between conditional and marginal maximum likelihood for a class of item response models

Comparison between conditional and marginal maximum likelihood for a class of item response models (1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia

More information

Information Processing Letters

Information Processing Letters Information Processing Letters 111 2011) 135 141 Contents lists available at ScienceDirect Information Processing Letters www.elsevier.com/locate/ipl An entropy-based uncertainty measure of process models

More information

Graphical Models for Collaborative Filtering

Graphical Models for Collaborative Filtering Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,

More information

Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier

Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier Finite Mixture Model of Bounded Semi-naive Bayesian Networks Classifier Kaizhu Huang, Irwin King, and Michael R. Lyu Department of Computer Science and Engineering The Chinese University of Hong Kong Shatin,

More information

The Expectation Maximization Algorithm

The Expectation Maximization Algorithm The Expectation Maximization Algorithm Frank Dellaert College of Computing, Georgia Institute of Technology Technical Report number GIT-GVU-- February Abstract This note represents my attempt at explaining

More information

Queueing Theory and Simulation. Introduction

Queueing Theory and Simulation. Introduction Queueing Theory and Simulation Based on the slides of Dr. Dharma P. Agrawal, University of Cincinnati and Dr. Hiroyuki Ohsaki Graduate School of Information Science & Technology, Osaka University, Japan

More information

Note 1: Varitional Methods for Latent Dirichlet Allocation

Note 1: Varitional Methods for Latent Dirichlet Allocation Technical Note Series Spring 2013 Note 1: Varitional Methods for Latent Dirichlet Allocation Version 1.0 Wayne Xin Zhao batmanfly@gmail.com Disclaimer: The focus of this note was to reorganie the content

More information

Probabilistic Time Series Classification

Probabilistic Time Series Classification Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign

More information

Chapter 8 Mining Additional Perspectives

Chapter 8 Mining Additional Perspectives Chapter 8 Mining Additional Perspectives prof.dr.ir. Wil van der Aalst www.processmining.org Overview Chapter 1 Introduction Part I: Preliminaries Chapter 2 Process Modeling and Analysis Chapter 3 Data

More information

Latent Dirichlet Allocation

Latent Dirichlet Allocation Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology

More information

STATS 306B: Unsupervised Learning Spring Lecture 3 April 7th

STATS 306B: Unsupervised Learning Spring Lecture 3 April 7th STATS 306B: Unsupervised Learning Spring 2014 Lecture 3 April 7th Lecturer: Lester Mackey Scribe: Jordan Bryan, Dangna Li 3.1 Recap: Gaussian Mixture Modeling In the last lecture, we discussed the Gaussian

More information

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours

More information

Delay Announcements. Predicting Queueing Delays for Delay Announcements. IEOR 4615, Service Engineering, Professor Whitt. Lecture 21, April 21, 2015

Delay Announcements. Predicting Queueing Delays for Delay Announcements. IEOR 4615, Service Engineering, Professor Whitt. Lecture 21, April 21, 2015 Delay Announcements Predicting Queueing Delays for Delay Announcements IEOR 4615, Service Engineering, Professor Whitt Lecture 21, April 21, 2015 OUTLINE Delay Announcements: Why, What and When? [Slide

More information

Weighted Finite-State Transducers in Computational Biology

Weighted Finite-State Transducers in Computational Biology Weighted Finite-State Transducers in Computational Biology Mehryar Mohri Courant Institute of Mathematical Sciences mohri@cims.nyu.edu Joint work with Corinna Cortes (Google Research). 1 This Tutorial

More information

Quiz Queue II. III. ( ) ( ) =1.3333

Quiz Queue II. III. ( ) ( ) =1.3333 Quiz Queue UMJ, a mail-order company, receives calls to place orders at an average of 7.5 minutes intervals. UMJ hires one operator and can handle each call in about every 5 minutes on average. The inter-arrival

More information

CS Lecture 18. Expectation Maximization

CS Lecture 18. Expectation Maximization CS 6347 Lecture 18 Expectation Maximization Unobserved Variables Latent or hidden variables in the model are never observed We may or may not be interested in their values, but their existence is crucial

More information

EM for ML Estimation

EM for ML Estimation Overview EM for ML Estimation An algorithm for Maximum Likelihood (ML) Estimation from incomplete data (Dempster, Laird, and Rubin, 1977) 1. Formulate complete data so that complete-data ML estimation

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

Handout 1: Introduction to Dynamic Programming. 1 Dynamic Programming: Introduction and Examples

Handout 1: Introduction to Dynamic Programming. 1 Dynamic Programming: Introduction and Examples SEEM 3470: Dynamic Optimization and Applications 2013 14 Second Term Handout 1: Introduction to Dynamic Programming Instructor: Shiqian Ma January 6, 2014 Suggested Reading: Sections 1.1 1.5 of Chapter

More information

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models.

Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models. Speech Recognition Lecture 8: Expectation-Maximization Algorithm, Hidden Markov Models. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.com This Lecture Expectation-Maximization (EM)

More information

Process Mining. Knut Hinkelmann. Prof. Dr. Knut Hinkelmann MSc Business Information Systems

Process Mining. Knut Hinkelmann. Prof. Dr. Knut Hinkelmann MSc Business Information Systems Knut Hinkelmann Prof. r. Knut Hinkelmann MSc usiness Information Systems Learning Objective Topic: Learning Process knowledge from experience learning a process/decision model ase-ased Reasoning (R) reusing

More information

Mixtures of Gaussians. Sargur Srihari

Mixtures of Gaussians. Sargur Srihari Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Queuing Networks. - Outline of queuing networks. - Mean Value Analisys (MVA) for open and closed queuing networks

Queuing Networks. - Outline of queuing networks. - Mean Value Analisys (MVA) for open and closed queuing networks Queuing Networks - Outline of queuing networks - Mean Value Analisys (MVA) for open and closed queuing networks 1 incoming requests Open queuing networks DISK CPU CD outgoing requests Closed queuing networks

More information

CSC321 Lecture 7 Neural language models

CSC321 Lecture 7 Neural language models CSC321 Lecture 7 Neural language models Roger Grosse and Nitish Srivastava February 1, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 7 Neural language models February 1, 2015 1 / 19 Overview We

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization

More information

Hidden Markov Models

Hidden Markov Models CS769 Spring 2010 Advanced Natural Language Processing Hidden Markov Models Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu 1 Part-of-Speech Tagging The goal of Part-of-Speech (POS) tagging is to label each

More information

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Chapters 13-15 Stat 477 - Loss Models Chapters 13-15 (Stat 477) Parameter Estimation Brian Hartman - BYU 1 / 23 Methods for parameter estimation Methods for parameter estimation Methods

More information

A CUSUM approach for online change-point detection on curve sequences

A CUSUM approach for online change-point detection on curve sequences ESANN 22 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges Belgium, 25-27 April 22, i6doc.com publ., ISBN 978-2-8749-49-. Available

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

2009), URL

2009), URL Title A queue with multiple stable regions Author(s) Li, GL; Li, VOK Citation The 25th UK Performance Engineering Workshop (UKPEW 29), Leeds, U.K., 6-7 July 29. Issued Date 29 URL http://hdl.handle.net/1722/129698

More information

The Performance Impact of Delay Announcements

The Performance Impact of Delay Announcements The Performance Impact of Delay Announcements Taking Account of Customer Response IEOR 4615, Service Engineering, Professor Whitt Supplement to Lecture 21, April 21, 2015 Review: The Purpose of Delay Announcements

More information