Advanced Techniques for Mining Structured Data: Process Mining Frequent Pattern Discovery /Event Forecasting Dr A. Appice Scuola di Dottorato in Informatica e Matematica XXXII
Problem definition
1. Given a set T of examples, each relating the characteristics of the event at time t (target variables) to the (numeric and discrete) characteristics of the events observed in the window t-w, t-w+1, ..., t-1 (descriptor variables),
2. learn a forecasting model F(T) to forecast the characteristics of the next event:
- Regression (for numeric variables)
- Classification (for categorical variables)
Applications
Use F(T) to:
- check conformance
- recommend appropriate actions to enterprise users
Event forecasting service
Off-line step: sliding window model + event log of full traces, in order to learn a forecasting model F(T)
On-line step: recent events in a running trace + F(T) generated off-line, in order to forecast the next event of the running trace
Event Forecasting Service Deployed in OPENNESS (PON VINCENTE)
Sliding window model
Temporal correlation between events of a case: the future event is correlated to the events observed in the recent past.
[Figure: a window sliding over events 1 to 10 of a case.]
The timestamp is transformed into the time (in seconds) elapsed since the beginning of the case.
When an optional characteristic is missing from an event, the associated variable takes the value "none" in the training example.
Sliding window model: example (1)
Descriptive space X: none, none, none, 0, none, none, none, 0
Predictive space Y: UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0

Event log:
Id  Activity  Class Name                                             User  Timestamp
1   UPDATE    com.liferay.portlet.documentlibrary.model.dlfileentry  Paul  2014-11-24 11:42:22.0
1   UPDATE    com.liferay.portlet.documentlibrary.model.dlfileentry  Paul  2014-11-24 17:21:25.0
1   DELETE    com.liferay.portlet.documentlibrary.model.dlfileentry  Paul  2014-11-24 17:49:55.0
1   CREATE    com.liferay.portlet.blogs.model.blogsentry             Paul  2014-11-24 18:22:00.0
1   UPDATE    com.liferay.portlet.blogs.model.blogsentry             Paul  2014-11-24 18:32:00.0
2   CREATE    com.liferay.portlet.blogs.model.blogsentry             Mary  2014-11-25 12:12:12.0
...
Sliding window model: example (2)
Descriptive space X: none, none, none, 0, UPDATE, com.liferay.portlet.documentlibrary.model.dlfileentry, Paul, 0
Predictive space Y: UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 20363
(Same event log as in example (1); the window has moved forward by one event of case 1.)
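The construction of the two training examples above can be sketched in a few lines. This is a minimal illustration, not the service's actual code: the function name `build_examples`, the tuple layout of an event, and the window size w=2 (read off the two padded events in example (1)) are assumptions.

```python
from datetime import datetime

NONE = "none"  # value used for events that fall before the start of the case

def build_examples(trace, w):
    """Turn one case into (descriptor, target) pairs with a window of w events.

    trace: list of (activity, class_name, user, timestamp), oldest first.
    """
    start = trace[0][3]
    # Replace each timestamp by the seconds elapsed since the case began.
    events = [(a, c, u, int((ts - start).total_seconds())) for a, c, u, ts in trace]
    padding = (NONE, NONE, NONE, 0)
    examples = []
    for i, target in enumerate(events):
        # The w events before position i; positions before the case start are padded.
        window = [events[j] if j >= 0 else padding for j in range(i - w, i)]
        descriptor = [value for event in window for value in event]
        examples.append((descriptor, list(target)))
    return examples

fmt = "%Y-%m-%d %H:%M:%S"
cls = "com.liferay.portlet.documentlibrary.model.dlfileentry"
trace = [
    ("UPDATE", cls, "Paul", datetime.strptime("2014-11-24 11:42:22", fmt)),
    ("UPDATE", cls, "Paul", datetime.strptime("2014-11-24 17:21:25", fmt)),
    ("DELETE", cls, "Paul", datetime.strptime("2014-11-24 17:49:55", fmt)),
]
examples = build_examples(trace, w=2)
for descriptor, target in examples:
    print(descriptor, "->", target)
```

Each case is processed separately, so a window never mixes events of two different cases.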
Alternatively: Landmark model
For each event, the Landmark goes from the starting time point of the case to the present time.
The descriptive characteristics are aggregated over the Landmark.
- A categorical characteristic is transformed into n numeric variables (one variable for each distinct value of the characteristic domain). Each aggregated variable measures the frequency of the value over the Landmark.
- A numeric characteristic (e.g. time) is transformed into a numeric variable that sums values in the
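A possible reading of the Landmark aggregation is sketched below. It assumes that "frequency" means an absolute count and that the numeric sum runs over the same Landmark span; the function name `landmark_descriptor` and the dictionary layout of an event are hypothetical.

```python
from collections import Counter

def landmark_descriptor(past_events, domains, numeric=("time",)):
    """Aggregate every event seen so far in a case into one descriptor.

    past_events: list of dicts, one per event, oldest first.
    domains: maps each categorical characteristic to its distinct values.
    numeric: names of the numeric characteristics to sum (assumed).
    """
    descriptor = {}
    for name, values in domains.items():
        counts = Counter(event[name] for event in past_events)
        for value in values:
            # One numeric variable per distinct value of the domain.
            descriptor[f"{name}={value}"] = counts[value]
    for name in numeric:
        descriptor[f"{name}_sum"] = sum(event[name] for event in past_events)
    return descriptor

past = [
    {"activity": "UPDATE", "time": 0},
    {"activity": "UPDATE", "time": 10},
    {"activity": "DELETE", "time": 25},
]
x = landmark_descriptor(past, {"activity": ["CREATE", "UPDATE", "DELETE"]})
print(x)
```

Unlike the sliding window, the descriptor length here depends on the domain sizes and not on a window size w.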
Forecasting model: how-to?
Predictive clustering tree (PCT): tree-structured predictive clustering models that generalize decision trees.
[Figure: example PCT. The root tests the membership of X1 in a set of values; one branch ends in a leaf, the other is split again on X2 (X2 <= 2 versus X2 > 2). Every leaf predicts Y1 = c1, ..., Yq = cq, so each root-to-leaf path reads as a rule of the form: (test on X1) and (test on X2) ; Y1 = c1, ..., Yq = cq.]
Predictive clusters
Each cluster is associated with:
- the description of the events grouped in the cluster, based on properties of events observed in the recent past,
- the values forecast for the properties of the next event in the case.
A cluster is a pair (S, f), where S is a symbolic description defined on X and f is a predictive function f: X -> Y.
Learning the forecasting model
At each internal node t, a test is selected by maximizing the (inter-cluster) variance reduction over the target space, defined as follows:
$$\Delta_Y(T(t), P) = \mathrm{Var}(T(t), Y) - \sum_{T(t_i) \in P} \frac{\#T(t_i)}{\#T(t)} \, \mathrm{Var}(T(t_i), Y),$$
where T(t) denotes the set of training examples falling in t and P defines a partition {T(t_1), T(t_2)} of T(t).
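As a small numeric check of the variance reduction, the sketch below evaluates it for a single numeric target on a toy node. The function name `variance_reduction` and the data are illustrative, and the population variance from Python's `statistics` module stands in for Var.

```python
from statistics import pvariance

def variance_reduction(parent, parts):
    """Var(parent) minus the size-weighted variances of the parts.

    parent: target values of the examples in the node T(t).
    parts: target values of the examples in each T(t_i) of the partition P.
    """
    n = len(parent)
    weighted = sum(len(p) / n * pvariance(p) for p in parts if p)
    return pvariance(parent) - weighted

parent = [1.0, 1.0, 5.0, 5.0]
# A test that separates low from high values removes all the variance.
good = variance_reduction(parent, [[1.0, 1.0], [5.0, 5.0]])
# A test that mixes them removes none.
bad = variance_reduction(parent, [[1.0, 5.0], [1.0, 5.0]])
print(good, bad)
```

The test selected at a node is the one with the largest reduction, here the first partition.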
Learning the forecasting model
The partition is defined according to a Boolean test on a descriptor variable in X.
A new partition is recursively found until a stopping criterion is satisfied: a node is a leaf when it hosts a number of examples that is smaller than 2 size(t), with size(t) the number of training examples.
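The recursive partitioning can be illustrated with a deliberately small tree learner. This is a sketch under simplifying assumptions, not the PCT implementation used in the lecture: descriptors are numeric, there is a single numeric target, tests have the form x[f] <= threshold, and the stopping threshold is passed in as a hypothetical parameter `min_examples`.

```python
from statistics import mean, pvariance

def best_split(X, y):
    """Return (gain, feature index, threshold) of the best test x[f] <= thr."""
    best = None
    for f in range(len(X[0])):
        # The largest value is skipped: it would leave the right part empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            gain = pvariance(y) - (len(left) * pvariance(left)
                                   + len(right) * pvariance(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, thr)
    return best

def grow(X, y, min_examples):
    """Recursively partition the examples; small or pure nodes become leaves."""
    split = best_split(X, y) if len(y) >= min_examples else None
    if split is None or split[0] <= 0:
        return {"leaf": mean(y)}  # predictive function: average target value
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": grow([X[i] for i in left], [y[i] for i in left], min_examples),
        "right": grow([X[i] for i in right], [y[i] for i in right], min_examples),
    }

X = [[1, 0], [2, 0], [3, 1], [4, 1]]
y = [10.0, 10.0, 20.0, 20.0]
tree = grow(X, y, min_examples=2)
print(tree)
```

The nested-dictionary representation is only a convenience for the example; any tree structure with a test per internal node and a prediction per leaf would do.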
Learning the forecasting model
In the multi-target context, the variance reduction is computed for each target variable. The total variance reduction is the average of the variance reductions taken over the set of target variables.
Learning the forecasting model
For a numeric target variable Y (i.e. Y belongs to the target space and is numeric), the variance function Var( ) returns the variance of Y over the examples in the partition T(t), whereas the predictive function is the average of the target values in a cluster (leaf node). The variance reduction is computed after scaling the real values of Y falling in T(t) to the interval [0,1].
For a categorical target variable Y (i.e. Y belongs to the target space and is categorical), the variance function Var( ) returns the Gini index of Y over the examples in the partition T(t), whereas the predictive function is the majority class of the target variable in the cluster.
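The two variance functions and the two predictive functions can be written down directly. The sketch below is illustrative: the names `gini`, `scaled_variance` and `leaf_prediction` are made up here, and the scaling bounds are passed in explicitly because the slides do not say how they are obtained.

```python
from collections import Counter
from statistics import mean, pvariance

def gini(labels):
    """Gini index of a categorical target: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def scaled_variance(values, lo, hi):
    """Variance of a numeric target after min-max scaling into [0, 1]."""
    if hi == lo:
        return 0.0
    return pvariance([(v - lo) / (hi - lo) for v in values])

def leaf_prediction(values, numeric):
    """Average for a numeric target, majority class for a categorical one."""
    return mean(values) if numeric else Counter(values).most_common(1)[0][0]

activities = ["UPDATE", "UPDATE", "DELETE", "CREATE"]
times = [0, 50, 100]
g = gini(activities)
v = scaled_variance(times, lo=0, hi=100)
# With two targets, the node's overall score is the average of the per-target values.
average = (g + v) / 2
print(g, v, average)
print(leaf_prediction(activities, numeric=False), leaf_prediction(times, numeric=True))
```

Scaling keeps a numeric target measured in seconds from dominating the average, since the Gini index is already bounded by 1.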
Learning the forecasting model
If a leaf node is found, a predictive cluster is added to the final model. The symbolic description of this predictive cluster is the conjunction of the Boolean tests along the path from the root to the current leaf. The predictive function is the one associated with the leaf, constructed for each target variable by considering the target values of the examples falling in the leaf partition.
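Reading the predictive clusters off a learned tree amounts to collecting the tests along each root-to-leaf path. The sketch below assumes a nested-dictionary tree with threshold tests only; the variable names (`elapsed_1`, `elapsed_2`) and all leaf values are hypothetical.

```python
def predictive_clusters(node, path=()):
    """Yield one (symbolic description, prediction) pair per leaf of the tree."""
    if "leaf" in node:
        # The description is the conjunction of the tests met on the way down.
        yield " and ".join(path) or "true", node["leaf"]
        return
    test = f"{node['feature']} <= {node['threshold']}"
    negated = f"{node['feature']} > {node['threshold']}"
    yield from predictive_clusters(node["left"], path + (test,))
    yield from predictive_clusters(node["right"], path + (negated,))

# Hypothetical tree with two target variables per leaf (activity and time).
tree = {
    "feature": "elapsed_1", "threshold": 600,
    "left": {"leaf": {"activity": "UPDATE", "time": 120.0}},
    "right": {
        "feature": "elapsed_2", "threshold": 30,
        "left": {"leaf": {"activity": "DELETE", "time": 900.0}},
        "right": {"leaf": {"activity": "CREATE", "time": 4000.0}},
    },
}
clusters = list(predictive_clusters(tree))
for description, prediction in clusters:
    print(description, ";", prediction)
```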
On-line phase
[Figure: the recent events of the running case are passed to the PCT, which outputs the forecast next event.]
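A minimal sketch of the on-line step follows: the last w events of a running case are turned into a descriptor, which is then routed down the tree to a leaf. The tree is hand-built and hypothetical, as are the variable naming scheme (`activity_1`, `time_2`, ...) and the two test types ("in" for categorical, "<=" for numeric variables).

```python
NONE = "none"  # value used when the window reaches before the start of the case

def running_case_descriptor(events, w):
    """Describe the last w events of a running case as named variables.

    events: list of (activity, class_name, elapsed_seconds), oldest first.
    Assumes w >= 1; position w is the most recent event.
    """
    recent = list(events[-w:])
    window = [(NONE, NONE, 0)] * (w - len(recent)) + recent
    x = {}
    for k, (activity, class_name, elapsed) in enumerate(window, start=1):
        x[f"activity_{k}"] = activity
        x[f"class_{k}"] = class_name
        x[f"time_{k}"] = elapsed
    return x

def forecast(node, x):
    """Walk from the root to a leaf and return that leaf's prediction."""
    while "leaf" not in node:
        op, variable, value = node["test"]
        passed = x[variable] in value if op == "in" else x[variable] <= value
        node = node["yes"] if passed else node["no"]
    return node["leaf"]

# Hypothetical tree; in practice it would be the model learned off-line.
tree = {
    "test": ("in", "activity_2", {"DELETE"}),
    "yes": {"leaf": {"activity": "CREATE", "time": 3600.0}},
    "no": {
        "test": ("<=", "time_2", 600),
        "yes": {"leaf": {"activity": "UPDATE", "time": 20000.0}},
        "no": {"leaf": {"activity": "DELETE", "time": 22000.0}},
    },
}
running = [("UPDATE", "dlfileentry", 0)]
next_event = forecast(tree, running_case_descriptor(running, w=2))
print(next_event)
```

The descriptor must be built exactly as in the off-line step (same window size, same padding), otherwise the tests in the tree refer to variables with a different meaning.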
Experiments
10-fold cross validation over the cases in a log, varying the window size between two and the maximum length of a case in the log
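The experimental protocol can be sketched as follows. The key point is that folds are formed over cases, so all the events of a case end up on the same side of the split. The function name `case_folds`, the seed, and the example maximum case length are assumptions made for the illustration.

```python
import random

def case_folds(case_ids, k=10, seed=0):
    """Yield (train_cases, test_cases) pairs for k-fold cross validation by case."""
    ids = sorted(set(case_ids))
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i, test_cases in enumerate(folds):
        train_cases = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train_cases, test_cases

# Window sizes range from two up to the longest case in the log (5 is a made-up value).
max_case_length = 5
window_sizes = range(2, max_case_length + 1)

folds = list(case_folds(range(1, 202)))
for w in window_sizes:
    for train_cases, test_cases in folds:
        # Here one would build the training examples with window w from
        # train_cases, learn the tree, and evaluate it on test_cases.
        pass
print(len(folds), [len(test) for _, test in folds])
```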
Accuracy averaged on the target space
Number of leaves
Learning time
Case study in VINCENTE: data
Daily routine of users of the OPENNESS platform belonging to a specific group (group id=13723) between September 1, 2014 and November 30, 2014
- 201 full traces
- 5477 events
- 3 characteristics (activity, class, timestamp)
Case study: experimental setup
- Off-line learning: 90% of randomly selected traces (180 traces)
- On-line learning: 10% of traces (21 traces)
Case study: off-line learning
Case study: on-line forecasting (1/2)
Trace  Number of Events  Activity type  Class name  Time (secs)
1      14                12.77%         100.00%     4.743
2      17                100.00%        100.00%     20.879
3      36                91.67%         87.50%      0.71
4      58                100.00%        80.00%      2.96
5      93                100.00%        100.00%     454.12
6      98                100.00%        66.67%      1950.00
7      101               50.00%         50.00%      928.72
8      105               100.00%        0.00%       14155.87
9      124               100.00%        50.00%      2388.25
10     125               58.33%         58.33%      9.39
11     127               100.00%        50.00%      2388.25
Case study: on-line forecasting (2/2)
Trace  Number of Events  Activity type  Class name  Time (secs)
12     129               100.00%        83.33%      2.48
13     132               100.00%        87.5%       2.15
14     138               92.86%         71.42%      1.62
15     139               100.00%        94.12%      1.47
16     141               100.00%        66.67%      1950.00
17     174               20.00%         80.00%      587.37
18     179               96.15%         61.54%      2.62
19     181               97.92%         64.58%      165.14
20     190               60.00%         80.00%      587.37
21     194               50.00%         50.00%      928.72
Avg                      82.37%         70.56%      755.23
Bibliography
- A. Appice, S. Pravilovic and D. Malerba, Process Mining to Forecast the Future of Running Cases, 2nd International Workshop on New Frontiers in Mining Complex Patterns, NFMCP@ECMLPKDD 2013
- A. Appice, D. Malerba, V. Morreale, G. Vella, Business Event Forecasting. In: IFKAD 2015, Bari, Italy, 10-12 June