Advanced Techniques for Mining Structured Data: Process Mining


Advanced Techniques for Mining Structured Data: Process Mining. Frequent Pattern Discovery / Event Forecasting. Dr A. Appice, Scuola di Dottorato in Informatica e Matematica, XXXII

Problem definition. 1. Given a set T of examples, each relating the characteristics of the events observed in the window t-w, t-w+1, ..., t-1 (descriptor variables) to the (numeric and categorical) characteristics of the event at time t (predictive variables). 2. Learn a forecasting model F(T) to forecast the characteristics of the next event: regression for numeric variables, classification for categorical variables.

Applications. Use F(T) to check conformance and to recommend appropriate actions to enterprise users.

Event forecasting service. Off-line step: the sliding window model is applied to an event log of full traces in order to learn a forecasting model F(T). On-line step: the recent events of a running trace are fed to the F(T) generated off-line in order to forecast the next event of the running trace.

Event forecasting service deployed in OPENNESS (PON VINCENTE).

Sliding window model. Temporal correlation between the events of a case: the future event is correlated with the events observed in the recent past. [Figure: a fixed-size window sliding over the events 1-10 of a case.] The timestamp is transformed into the time (in seconds) elapsed since the beginning of the case. When an optional characteristic is missing in the related event, the associated variable takes the value "none" in the training example.

Sliding window model: example (1), window size w = 1.

descriptive space X: (none, none, none, 0)
predictive space Y: (UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0)

Event log:
Id  Activity  Class Name                                              User  Timestamp
1   UPDATE    com.liferay.portlet.documentlibrary.model.DLFileEntry   Paul  2014-11-24 11:42:22.0
1   UPDATE    com.liferay.portlet.documentlibrary.model.DLFileEntry   Paul  2014-11-24 17:21:25.0
1   DELETE    com.liferay.portlet.documentlibrary.model.DLFileEntry   Paul  2014-11-24 17:49:55.0
1   CREATE    com.liferay.portlet.blogs.model.BlogsEntry              Paul  2014-11-24 18:22:00.0
1   UPDATE    com.liferay.portlet.blogs.model.BlogsEntry              Paul  2014-11-24 18:32:00.0
2   CREATE    com.liferay.portlet.blogs.model.BlogsEntry              Mary  2014-11-25 12:12:12.0
...

Sliding window model: example (2), window size w = 1 (same event log as above).

descriptive space X: (UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 0)
predictive space Y: (UPDATE, com.liferay.portlet.documentlibrary.model.DLFileEntry, Paul, 20363)
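To make the encoding concrete, here is a minimal Python sketch (an illustration under assumed field names and a dict-based event representation, not the authors' code) that turns one case of the log into sliding-window training pairs:

```python
def encode_case(events, w=1):
    """Turn one case (its events, oldest first) into (X, Y) training pairs.

    Each event is assumed to be a dict with keys activity, class_name, user
    and timestamp (a datetime). Timestamps become seconds elapsed since the
    case started; positions before the start of the case are padded with
    "none" (categorical) and 0 (numeric), as described on the earlier slide.
    """
    t0 = events[0]["timestamp"]

    def elapsed(e):
        return (e["timestamp"] - t0).total_seconds()

    pairs = []
    for i, nxt in enumerate(events):
        x = []
        for j in range(i - w, i):  # the w events preceding position i
            if j < 0:              # before the case began: pad
                x += ["none", "none", "none", 0]
            else:
                e = events[j]
                x += [e["activity"], e["class_name"], e["user"], elapsed(e)]
        y = [nxt["activity"], nxt["class_name"], nxt["user"], elapsed(nxt)]
        pairs.append((x, y))
    return pairs
```

With w = 1, the first two pairs produced for case 1 of the log correspond to examples (1) and (2) above.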

Alternatively: the Landmark model. For each event, the Landmark spans from the starting time point of the case to the present time. The descriptive characteristics are aggregated over the Landmark. A categorical characteristic is transformed into n numeric variables (one variable for each distinct value of the characteristic's domain); each aggregated variable measures the frequency of the value over the Landmark. A numeric characteristic (e.g. time) is transformed into a numeric variable that sums its values over the Landmark.
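A minimal sketch of this aggregation (the field names, the precomputed "elapsed" value, and the counting scheme are assumptions for illustration):

```python
from collections import Counter

def landmark_descriptor(past_events, activity_domain):
    """Aggregate the events from the start of the case up to the present time.

    Categorical characteristics (here: activity) become one frequency count
    per domain value; numeric characteristics (here: the assumed precomputed
    "elapsed" seconds) are summed over the Landmark.
    """
    counts = Counter(e["activity"] for e in past_events)
    x = [counts[v] for v in activity_domain]          # one variable per value
    x.append(sum(e["elapsed"] for e in past_events))  # numeric: summed
    return x
```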

Forecasting model: how-to? Predictive clustering tree (PCT): tree-structured predictive clustering models that generalize decision trees. [Figure: a PCT whose root tests X1 ∈ {v1,1, v1,2} and whose child node tests X2 ≤ θ2 versus X2 > θ2; every leaf predicts Y1 = c1, ..., Yq = cq. Each leaf is a cluster described by the conjunction of tests on its path: X1 ∈ {v1,1, v1,2}; Y1 = c1, ..., Yq = cq / X1 ∉ {v1,1, v1,2} and X2 ≤ θ2; Y1 = c1, ..., Yq = cq / X1 ∉ {v1,1, v1,2} and X2 > θ2; Y1 = c1, ..., Yq = cq.]

Predictive clusters. Each cluster (S, f) is associated with: the description of the events grouped in the cluster, based on the properties of the events observed in the recent past; and the values forecast for the properties of the next event in the case. S is a symbolic description defined on X; f is a predictive function f: X → Y.

Learning the forecasting model. At each internal node t, a test is selected by maximizing the (inter-cluster) variance reduction over the target space, defined as follows:

$\Delta_Y(T(t), P) = \mathrm{Var}(T(t), Y) - \sum_{T(t_i) \in P} \frac{\#T(t_i)}{\#T(t)}\,\mathrm{Var}(T(t_i), Y)$

where T(t) denotes the set of training examples falling in t and P defines a partition T(t1), T(t2) of T(t).
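In code, the score reads as follows; a minimal sketch for a single numeric target, with Var taken as the plain variance (the per-type definitions used by the method follow on the next slides):

```python
def variance(ys):
    """Plain variance of a list of numeric target values."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def variance_reduction(parent, partition):
    """Delta_Y(T(t), P): Var of the node minus the size-weighted Var of its parts.

    parent: target values of the examples falling in node t.
    partition: list of lists, the target values of each block T(t_i) of P.
    """
    n = len(parent)
    return variance(parent) - sum(len(p) / n * variance(p) for p in partition)

# e.g. variance_reduction([1, 2, 9, 10], [[1, 2], [9, 10]]) is large because
# the split separates the two groups of target values almost perfectly.
```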

Learning the forecasting model. The partition is defined according to a Boolean test on a predictor variable in X. New partitions are found recursively until a stopping criterion is satisfied: a node becomes a leaf when it hosts fewer examples than $\sqrt{\mathrm{size}(T)}$, with size(T) the number of training examples.

Learning the forecasting model. In the multi-target context, the variance reduction is computed for each target variable. The total variance reduction is the average of the per-target variance reductions, taken over the set of target variables.

Learning the forecasting model. For a numeric target variable Y ∈ Y, the variance function Var(·) returns the variance of Y over the examples in the partition T(t), and the predictive function is the average of the target values in a cluster (leaf node). The variance reduction is computed after scaling the real values of Y falling in T(t) to the interval [0, 1]. For a categorical target variable Y ∈ Y, the variance function Var(·) returns the Gini index of Y over the examples in the partition T(t), and the predictive function is the majority class of the target variable in the cluster.
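A sketch of the per-type heterogeneity and prediction functions just described, with the scaling bounds taken from the values falling in the node, as on the slide:

```python
from collections import Counter

def var_numeric(ys):
    """Variance of a numeric target after min-max scaling its values to [0, 1]."""
    lo, hi = min(ys), max(ys)
    scaled = [(y - lo) / (hi - lo) if hi > lo else 0.0 for y in ys]
    m = sum(scaled) / len(scaled)
    return sum((s - m) ** 2 for s in scaled) / len(scaled)

def var_categorical(ys):
    """Gini index of a categorical target."""
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())

def predict_numeric(ys):
    """Leaf prediction for a numeric target: the average of the target values."""
    return sum(ys) / len(ys)

def predict_categorical(ys):
    """Leaf prediction for a categorical target: the majority class."""
    return Counter(ys).most_common(1)[0][0]
```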

Learning the forecasting model. When a leaf node is reached, a predictive cluster is added to the final model. The symbolic description of this predictive cluster is the conjunction of the Boolean tests along the path from the root to the current leaf. The predictive function is the one associated with the leaf, constructed for each target variable by considering the target values of the examples falling in the leaf partition.
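Putting the pieces together, here is a compact, runnable sketch of the induction loop (an illustration under the assumptions above, not the authors' implementation; the square-root stopping rule follows the reconstruction on the earlier slide):

```python
import math
from collections import Counter

def heterogeneity(vals):
    """[0,1]-scaled variance for numeric targets, Gini index for categorical."""
    if isinstance(vals[0], (int, float)):
        lo, hi = min(vals), max(vals)
        s = [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in vals]
        m = sum(s) / len(s)
        return sum((u - m) ** 2 for u in s) / len(s)
    n = len(vals)
    return 1.0 - sum((c / n) ** 2 for c in Counter(vals).values())

def avg_variance_reduction(examples, holds):
    """Variance reduction of a Boolean test, averaged over all q targets."""
    yes = [e for e in examples if holds(e[0])]
    no = [e for e in examples if not holds(e[0])]
    if not yes or not no:
        return -1.0
    q = len(examples[0][1])
    total = 0.0
    for k in range(q):
        col = lambda part: [e[1][k] for e in part]
        total += heterogeneity(col(examples)) - sum(
            len(part) / len(examples) * heterogeneity(col(part))
            for part in (yes, no))
    return total / q  # averaged over the target space

def leaf_prediction(examples):
    """Per-target prediction: average (numeric) or majority class (categorical)."""
    preds = []
    for k in range(len(examples[0][1])):
        vals = [e[1][k] for e in examples]
        if isinstance(vals[0], (int, float)):
            preds.append(sum(vals) / len(vals))
        else:
            preds.append(Counter(vals).most_common(1)[0][0])
    return preds

def candidate_tests(examples):
    """One Boolean test per (descriptor variable, observed value)."""
    tests = []
    for j in range(len(examples[0][0])):
        for v in {e[0][j] for e in examples}:
            if isinstance(v, (int, float)):
                tests.append((f"X{j} <= {v}", lambda x, j=j, v=v: x[j] <= v))
            else:
                tests.append((f"X{j} = {v}", lambda x, j=j, v=v: x[j] == v))
    return tests

def grow_pct(examples, n_total, path=(), clusters=None):
    """Recursively grow the PCT; each leaf emits a predictive cluster (S, f)."""
    if clusters is None:
        clusters = []
    scored = [(avg_variance_reduction(examples, h), d, h)
              for d, h in candidate_tests(examples)]
    score, desc, holds = max(scored, key=lambda s: s[0])
    # Stopping rule: too few examples in the node, or no test reduces variance.
    if len(examples) < math.sqrt(n_total) or score <= 0.0:
        clusters.append((" and ".join(path) or "true", leaf_prediction(examples)))
        return clusters
    grow_pct([e for e in examples if holds(e[0])], n_total,
             path + (desc,), clusters)
    grow_pct([e for e in examples if not holds(e[0])], n_total,
             path + (f"not({desc})",), clusters)
    return clusters
```

Each returned cluster pairs a symbolic description (the conjunction of tests on the root-to-leaf path) with the per-target predictions of its leaf, matching the (S, f) view of the earlier slide.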

On-line phase. [Figure: the recent events of the running case are fed to the PCT, which forecasts the next event.]
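A minimal sketch of the on-line step, assuming the same event representation as in the off-line encoding; `pct_predict` is a placeholder name for dropping an example down the learned PCT:

```python
def forecast_next(running_case, w, pct_predict):
    """Encode the w most recent events of a running case and forecast the next.

    running_case: the events observed so far, oldest first, as dicts with the
    assumed keys activity, class_name, user, timestamp (a datetime).
    pct_predict: the learned model (placeholder callable).
    """
    t0 = running_case[0]["timestamp"]
    x = []
    for j in range(len(running_case) - w, len(running_case)):
        if j < 0:  # the case is younger than the window: pad as off-line
            x += ["none", "none", "none", 0]
        else:
            e = running_case[j]
            x += [e["activity"], e["class_name"], e["user"],
                  (e["timestamp"] - t0).total_seconds()]
    return pct_predict(x)  # forecast: (activity, class, user, elapsed time)
```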

Experiments: 10-fold cross validation over the cases in a log, varying the window size between two and the maximum length of a case in the log.
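A sketch of this protocol; the fold assignment and the `build_and_score` callback (which would train F(T) on the training folds with window size w and return an accuracy) are assumptions:

```python
import random

def evaluate(cases, build_and_score, folds=10, seed=0):
    """10-fold CV over the cases of a log, for window sizes 2..max case length."""
    cases = cases[:]
    random.Random(seed).shuffle(cases)
    results = {}
    for w in range(2, max(len(c) for c in cases) + 1):
        scores = []
        for k in range(folds):
            test = cases[k::folds]
            train = [c for i, c in enumerate(cases) if i % folds != k]
            scores.append(build_and_score(train, test, w))
        results[w] = sum(scores) / len(scores)  # accuracy averaged over folds
    return results
```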

Results (figures): accuracy averaged over the target space; number of leaves; learning time.

Case study in VINCENTE: data. Daily routine of the users of the OPENNESS platform belonging to a specific group (group id = 13723) between September 1, 2014 and November 30, 2014: 201 full traces, 5477 events, 3 characteristics (activity, class, timestamp).

Case study: experimental setup. Off-line learning: 90% of the traces, randomly selected (180 traces). On-line forecasting: the remaining 10% of the traces (21 traces).

Case study: off-line learning.

Case study: on-line forecasting (1/2)

Trace  Number of events  Activity type  Class name  Time (secs)
1       14                12.77%        100.00%         4.743
2       17               100.00%        100.00%        20.879
3       36                91.67%         87.50%         0.71
4       58               100.00%         80.00%         2.96
5       93               100.00%        100.00%       454.12
6       98               100.00%         66.67%      1950.00
7      101                50.00%         50.00%       928.72
8      105               100.00%          0.00%     14155.87
9      124               100.00%         50.00%      2388.25
10     125                58.33%         58.33%         9.39
11     127               100.00%         50.00%      2388.25

Case study: on-line forecasting (2/2)

Trace  Number of events  Activity type  Class name  Time (secs)
12     129               100.00%         83.33%         2.48
13     132               100.00%         87.50%         2.15
14     138                92.86%         71.42%         1.62
15     139               100.00%         94.12%         1.47
16     141               100.00%         66.67%      1950.00
17     174                20.00%         80.00%       587.37
18     179                96.15%         61.54%         2.62
19     181                97.92%         64.58%       165.14
20     190                60.00%         80.00%       587.37
21     194                50.00%         50.00%       928.72
Avg                       82.37%         70.56%       755.23

Bibliography
A. Appice, S. Pravilovic and D. Malerba, Process Mining to Forecast the Future of Running Cases. 2nd International Workshop on New Frontiers in Mining Complex Patterns, NFMCP@ECML-PKDD 2013.
A. Appice, D. Malerba, V. Morreale and G. Vella, Business Event Forecasting. IFKAD 2015, Bari, Italy, 10-12 June 2015.