Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive Learning and Mining for Data Streams and Frequent Patterns Albert Bifet Laboratory for Relational Algorithmics, Complexity and Learning LARCA Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya Ph.D. dissertation, 24 April 2009 Advisors: Ricard Gavaldà and José L. Balcázar LARCA

Future Data Mining: Structured data, Find Interesting Patterns, Predictions, On-line processing 2 / 59

Mining Evolving Massive Structured Data The basic problem: finding interesting structure in data Mining massive data Mining time-varying data Mining in real time Mining structured data The Disintegration of the Persistence of Memory, 1952-54, Salvador Dalí 3 / 59

Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Approximation algorithms Small error rate with high probability An algorithm (ε,δ)-approximates F if it outputs F̂ for which Pr[|F̂ − F| > εF] < δ. 4 / 59

Tree Pattern Mining "Trees are sanctuaries. Whoever knows how to listen to them, can learn the truth." (Herman Hesse) Given a dataset of trees, find the complete set of frequent subtrees Frequent Tree Pattern (FT): include all the trees whose support is no less than min_sup Closed Frequent Tree Pattern (CT): include no tree which has a super-tree with the same support CT ⊆ FT 5 / 59

Outline Mining Evolving Data Streams 1 Framework 2 ADWIN 3 Classifiers 4 MOA 5 ASHT Tree Mining 6 Closure Operator on Trees 7 Unlabeled Tree Mining Methods 8 Deterministic Association Rules 9 Implicit Rules Mining Evolving Tree Data Streams 10 Incremental Method 11 Sliding Window Method 12 Adaptive Method 13 Logarithmic Relaxed Support 14 XML Classification 6 / 59

Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 7 / 59

Data Mining Algorithms with Concept Drift Figure: two block diagrams. No Concept Drift: input → DM Algorithm with a static model and simple counters → output. Concept Drift: input → DM Algorithm in which each counter is replaced by an estimator and a change detector monitors the stream → output. 8 / 59

Time Change Detectors and Predictors (1) General Framework Problem Given an input sequence x_1, x_2, ..., x_t, ..., we want to output at instant t a prediction x̂_{t+1} minimizing the prediction error |x̂_{t+1} − x_{t+1}|, and an alert if change is detected, considering possible distribution changes over time. 9 / 59

Time Change Detectors and Predictors (1) General Framework Diagram: the input x_t feeds an Estimator equipped with Memory; the Estimator outputs the Estimation, and a Change Detector monitoring it raises an Alarm when change is detected. 10 / 59

Optimal Change Detector and Predictor (1) General Framework High accuracy Fast detection of change Low false positive and false negative rates Low computational cost: minimum space and time needed Theoretical guarantees No parameters needed Estimator with Memory and Change Detector 11 / 59

Algorithm ADaptive Sliding WINdow (2) ADWIN

ADWIN: ADAPTIVE WINDOWING ALGORITHM
1 Initialize Window W
2 for each t > 0
3   do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
4   repeat Drop elements from the tail of W
5   until |µ̂_W0 − µ̂_W1| < ε_c holds
6     for every split of W into W = W_0 · W_1
7 Output µ̂_W

Example: W = 101010110111111. ADWIN examines every split W = W_0 · W_1, growing W_0 element by element: W_0 = 1, 10, 101, 1010, 10101, 101010, 1010101, 10101011, ... At the split W_0 = 101010110, W_1 = 111111 we have |µ̂_W0 − µ̂_W1| ≥ ε_c: CHANGE DETECTED! Elements are dropped from the tail of W, leaving W = 01010110111111. 12 / 59
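A minimal Python sketch of the quadratic-time variant of this algorithm, which stores the window explicitly and tests every split after each insertion; the class name, the interface, and the exact form of the Hoeffding-style threshold eps_cut are illustrative assumptions rather than the MOA implementation (which uses a bucket structure, the Sliding Window Model of the next slide, to avoid storing the window):

```python
import math
from collections import deque

def eps_cut(n0, n1, delta, n):
    # Hoeffding-style threshold; this particular constant is an assumption,
    # the thesis derives the exact expression.
    m = 1.0 / (1.0 / n0 + 1.0 / n1)          # harmonic-mean-style term
    return math.sqrt((1.0 / (2 * m)) * math.log(4.0 * n / delta))

class SimpleAdwin:
    """O(|W|)-per-split sketch of ADWIN: keeps the window explicitly and
    checks every split W = W0 . W1 after each insertion."""

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = deque()          # oldest element on the left (the "tail")

    def update(self, x):
        """Insert x; return True if change was detected and the tail shrunk."""
        self.window.append(x)
        change, shrinking = False, True
        while shrinking and len(self.window) >= 2:
            shrinking = False
            n, total, s0 = len(self.window), sum(self.window), 0.0
            for i in range(1, n):              # W0 = first i elements, W1 = rest
                s0 += self.window[i - 1]
                n0, n1 = i, n - i
                mu0, mu1 = s0 / n0, (total - s0) / n1
                if abs(mu0 - mu1) >= eps_cut(n0, n1, self.delta, n):
                    self.window.popleft()       # drop one element from the tail
                    change, shrinking = True, True
                    break
        return change

    def estimate(self):
        return sum(self.window) / len(self.window) if self.window else 0.0
```

Feeding it a stream of bits whose mean jumps (for instance the example window above followed by a run of 1's) makes update eventually return True and shrink the window, mirroring the trace on the slide.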

Algorithm ADaptive Sliding WINdow (2) ADWIN Theorem At every time step we have: 1 (False positive rate bound). If µ_t remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ. 2 (False negative rate bound). Suppose that for some partition of W in two parts W_0 · W_1 (where W_1 contains the most recent items) we have |µ_W0 − µ_W1| > 2ε_c. Then with probability 1 − δ ADWIN shrinks W to W_1, or shorter. ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters. 13 / 59

Algorithm ADaptive Sliding WINdow (2) ADWIN ADWIN, using a Data Stream Sliding Window Model, can provide the exact counts of 1's in O(1) time per point. It tries O(log W) cutpoints, uses O((1/ε) log W) memory words, and the processing time per example is O(log W) (amortized and worst-case). Sliding Window Model: buckets 1010101 | 101 | 11 | 1 | 1 with Content: 4 2 2 1 1 and Capacity: 7 3 2 1 1. 14 / 59

K-ADWIN = ADWIN + Kalman Filtering (2) ADWIN Diagram: x_t → Kalman filter → Estimation; ADWIN serves both as change detector (Alarm) and as memory for the filter. R = W²/50 and Q = 200/W (theoretically justifiable), where W is the length of the window maintained by ADWIN. 15 / 59
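A minimal sketch, assuming a one-dimensional Kalman filter with identity state and measurement models, whose noise parameters are set from the ADWIN window length exactly as on the slide; the class and method names are illustrative, not the thesis code:

```python
class ScalarKalman:
    """1-D Kalman filter for tracking a slowly drifting mean,
    with R and Q tied to the current ADWIN window length."""

    def __init__(self):
        self.x = 0.0   # state estimate
        self.p = 1.0   # estimate covariance
        self.r = 1.0   # measurement noise
        self.q = 1.0   # process noise

    def set_noise_from_window(self, w_len):
        # Values suggested on the slide: R = W^2 / 50, Q = 200 / W.
        self.r = (w_len ** 2) / 50.0
        self.q = 200.0 / w_len

    def update(self, z):
        self.p += self.q                    # predict
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)          # correct with observation z
        self.p *= (1.0 - k)
        return self.x
```

In K-ADWIN the filter's output is the estimation, while ADWIN both supplies w_len and raises the change alarm; that wiring is omitted here.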

Classification (3) Mining Algorithms Definition Given n_C different classes, a classifier algorithm builds a model that predicts with accuracy, for every unlabeled instance I, the class C to which it belongs. Classification Mining Algorithms Naïve Bayes Decision Trees Ensemble Methods 16 / 59

Hoeffding Tree / CVFDT (3) Mining Algorithms Hoeffding Tree: VFDT Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000 With high probability, constructs a model identical to the one a traditional (greedy) method would learn With theoretical guarantees on the error rate Example tree: Contains "Money"? Yes → YES; No → Time? Day → NO, Night → YES. 17 / 59
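A minimal sketch of the split decision made at a Hoeffding tree leaf, assuming the standard Hoeffding bound and information gain as the split criterion; the function names and the default δ are illustrative:

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon such that the true mean of n observations with the given
    range is within epsilon of the sample mean with probability 1 - delta."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, delta=1e-7, n_classes=2):
    """Split when the best attribute's gain beats the runner-up by more
    than the Hoeffding bound computed from the n examples seen at the leaf."""
    value_range = math.log2(n_classes)     # range of information gain
    eps = hoeffding_bound(value_range, delta, n)
    return (gain_best - gain_second) > eps
```

A small tie-breaking threshold τ is normally added so that two nearly identical attributes cannot postpone the split indefinitely; it is omitted here for brevity.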

VFDT / CVFDT (3) Mining Algorithms Concept-adapting Very Fast Decision Trees: CVFDT G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001 It keeps its model consistent with a sliding window of examples It constructs alternative branches in preparation for changes If an alternative branch becomes more accurate, the tree switches to it 18 / 59

Decision Trees: CVFDT (3) Mining Algorithms Example tree: Contains "Money"? Yes → YES; No → Time? Day → NO, Night → YES. No theoretical guarantees on the error rate of CVFDT. CVFDT parameters: 1 W: the example window size. 2 T_0: number of examples used to check at each node if the splitting attribute is still the best. 3 T_1: number of examples used to build the alternate tree. 4 T_2: number of examples used to test the accuracy of the alternate tree. 19 / 59

Decision Trees: Hoeffding Adaptive Tree (3) Mining Algorithms Hoeffding Adaptive Tree: replaces frequency-statistics counters by estimators no window of examples needs to be stored, since the estimators maintain the required statistics alternate subtrees are checked and substituted using a change detector with theoretical guarantees Summary: 1 Theoretical guarantees 2 No parameters 20 / 59

What is MOA? {M}assive {O}nline {A}nalysis is a framework for online learning from data streams. It is closely related to WEKA. It includes a collection of offline and online algorithms as well as tools for evaluation: boosting and bagging; Hoeffding Trees with and without Naïve Bayes classifiers at the leaves. 21 / 59

Ensemble Methods (4) MOA for Evolving Data Streams http://www.cs.waikato.ac.nz/~abifet/moa/ New ensemble methods: ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added. Adaptive-Size Hoeffding Tree bagging 22 / 59
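A minimal sketch of the ADWIN bagging idea just described, assuming the usual online-bagging scheme of giving each ensemble member Poisson(1) weighted copies of every example, with one change detector per member monitoring its error; the learner and detector interfaces (learn_one, predict_one, update, estimate) are illustrative assumptions, and the detector can be any ADWIN-like object such as the SimpleAdwin sketch above:

```python
import math
import random

def poisson1():
    # Knuth's method for sampling Poisson(lambda = 1)
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

class AdwinBagging:
    """Sketch: online bagging with one change detector per ensemble member.
    When any detector signals change, the member with the highest monitored
    error is removed and replaced by a fresh classifier."""

    def __init__(self, make_learner, make_detector, n_members=10):
        self.make_learner, self.make_detector = make_learner, make_detector
        self.members = [make_learner() for _ in range(n_members)]
        self.detectors = [make_detector() for _ in range(n_members)]

    def learn_one(self, x, y):
        change = False
        for clf, det in zip(self.members, self.detectors):
            for _ in range(poisson1()):            # Poisson(1) re-weighting
                clf.learn_one(x, y)
            err = 0.0 if clf.predict_one(x) == y else 1.0
            change = det.update(err) or change     # detector tracks the error
        if change:
            worst = max(range(len(self.members)),
                        key=lambda i: self.detectors[i].estimate())
            self.members[worst] = self.make_learner()
            self.detectors[worst] = self.make_detector()

    def predict_one(self, x):
        votes = [clf.predict_one(x) for clf in self.members]
        return max(set(votes), key=votes.count)
```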

Adaptive-Size Hoeffding Tree (5) ASHT Ensemble of trees T_1, T_2, T_3, T_4 of different sizes: smaller trees adapt more quickly to changes, larger trees do better during periods with little change, and the mixture of sizes adds diversity 23 / 59

Adaptive-Size Hoeffding Tree (5) ASHT Figure: Kappa-Error diagrams (Kappa on the x-axis, Error on the y-axis) for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers. 24 / 59

Adaptive-Size Hoeffding Tree (5) ASHT Figure: Accuracy and size on dataset LED with three concept drifts. 25 / 59

Main contributions (i) Mining Evolving Data Streams 1 General Framework for Time Change Detectors and Predictors 2 ADWIN 3 Mining methods: Naive Bayes, Decision Trees, Ensemble Methods 4 MOA for Evolving Data Streams 5 Adaptive-Size Hoeffding Tree 26 / 59

Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 27 / 59

Mining Closed Frequent Trees Our trees are: Labeled and Unlabeled, Ordered and Unordered Our subtrees are: Induced, Top-down Figure: two different ordered trees that are the same unordered tree 28 / 59

A tale of two trees Consider D = {A, B}, two trees drawn on the slide, and let min_sup = 2; the slide first highlights the frequent subtrees of D and then its closed subtrees. 29 / 59

Mining Closed Unordered Subtrees (6) Unlabeled Closed Frequent Tree Method

CLOSED_SUBTREES(t, D, min_sup, T)
1 if not CANONICAL_REPRESENTATIVE(t)
2   then return T
3 for every t' that can be extended from t in one step
4   do if Support(t') ≥ min_sup
5     then T ← CLOSED_SUBTREES(t', D, min_sup, T)
6   do if Support(t') = Support(t)
7     then t is not closed
8 if t is closed
9   then insert t into T
10 return T

30 / 59

Example D = {A, B} with A = (0,1,2,3,2,1), B = (0,1,2,3,1,2,2) written as depth sequences, and min_sup = 2. The lattice on the slide contains the subtrees (0), (0,1), (0,1,1), (0,1,2), (0,1,2,1), (0,1,2,2), (0,1,2,2,1), (0,1,2,3), (0,1,2,3,1). 31 / 59
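A minimal sketch built on this natural (depth-sequence) representation: a rooted tree is written as the sequence of node depths in preorder, and the one-step extensions of a tree append a new node somewhere along the rightmost path. The recursion mirrors the CLOSED_SUBTREES pseudocode above; Support and the canonical-representative test are left as assumed callables, since the thesis defines them for unordered trees:

```python
def one_step_extensions(t):
    """All ordered trees obtained from t (a depth sequence, t[0] == 0)
    by adding one node as a child of some node on the rightmost path."""
    last_depth = t[-1]
    # a new rightmost node may hang at depth 1 .. last_depth + 1
    return [t + (d,) for d in range(1, last_depth + 2)]

def closed_subtrees(t, support, min_sup, is_canonical, closed):
    """Skeleton of the CLOSED_SUBTREES recursion over depth sequences.
    `support` and `is_canonical` are assumed callables; `closed` maps
    closed patterns to their support."""
    if not is_canonical(t):
        return closed
    t_is_closed = True
    for ext in one_step_extensions(t):
        s = support(ext)
        if s >= min_sup:
            closed_subtrees(ext, support, min_sup, is_canonical, closed)
        if s == support(t):
            t_is_closed = False      # a supertree with the same support exists
    if t_is_closed:
        closed[t] = support(t)
    return closed

# Start the search from the root-only tree:
# closed_subtrees((0,), support, 2, is_canonical, {})
```

For instance, one_step_extensions((0, 1, 2, 3, 2, 1)) yields (0, 1, 2, 3, 2, 1, 1) and (0, 1, 2, 3, 2, 1, 2), the two ordered trees A can grow into along its rightmost path.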


Experimental results (6) Unlabeled Closed Frequent Tree Method TreeNat: unlabeled trees, top-down subtrees, no occurrence counting. CMTreeMiner: labeled trees, induced subtrees, occurrence counting. 32 / 59

Closure Operator on Trees (7) Closure Operator D: the finite input dataset of trees; T: the (infinite) set of all trees. Definition We define the following Galois connection pair: For finite A ⊆ D, σ(A) is the set of subtrees of all the trees in A: σ(A) = {t ∈ T | ∀ t' ∈ A, t ⪯ t'}. For finite B ⊆ T, τ_D(B) is the set of trees of D that are supertrees of all the trees in B: τ_D(B) = {t' ∈ D | ∀ t ∈ B, t ⪯ t'}. Closure Operator The composition Γ_D = σ ∘ τ_D is a closure operator. 33 / 59

Galois Lattice of closed set of trees (7) Closure Operator Figure: the lattice of closed sets of trees, with nodes labeled 1, 2, 3, 12, 13, 23, 123. 34 / 59

Galois Lattice of closed set of trees Example (figure): for a set of trees B drawn on the slide, τ_D(B) is the set of trees of D that are supertrees of every tree of B, and Γ_D(B) = σ(τ_D(B)) is the resulting closed set, a tree together with its subtrees. 35 / 59

Mining Implications from Lattices of Closed Trees (8) Association Rules Problem Given a dataset D of rooted, unlabeled and unordered trees, find a basis: a set of rules that is sufficient to infer all the rules that hold in the dataset D. 36 / 59

Mining Implications from Lattices of Closed Trees Set of Rules: A → Γ_D(A). Antecedents are obtained through a computation akin to a hypergraph transversal; consequents follow from an application of the closure operators. (Lattice figure: 1 2 3 12 13 23 123.) 37 / 59


Association Rule Computation Example (8) Association Rules 1 2 3 12 13 23 123 38 / 59


Model transformation (8) Association Rules Intuition One propositional variable v_t is assigned to each possible subtree t. A set of trees A corresponds in a natural way to a model m_A. Let m_A be a model: we impose on m_A the constraint that if m_A(v_t') = 1 for a variable v_t', then m_A(v_t) = 1 for every variable v_t such that t is a subtree of the tree represented by v_t': R_0 = {v_t' → v_t | t ⪯ t', t ∈ U, t' ∈ U}. 39 / 59

Implicit Rules Definition (9) Implicit Rules Given three trees t_1, t_2, t_3, we say that t_1 ∧ t_2 → t_3 is an implicit Horn rule (an implicit rule, for short) if for every tree t it holds: t_1 ⪯ t and t_2 ⪯ t imply t_3 ⪯ t. Trees t_1 and t_2 have implicit rules if t_1 ∧ t_2 → t is an implicit rule for some t. The slide shows one rule of D that is implicit and one that is not. 40 / 59

Implicit Rules Definition (9) Implicit Rules NOT an implicit rule: the slide exhibits a supertree of the antecedents that is NOT a supertree of the consequents. 40 / 59

Implicit Rules Characterization (9) Implicit Rules Theorem All trees a, b such that a ⪯ b have implicit rules. Theorem Suppose that b has only one component. Then a and b have implicit rules if and only if a has a maximum component which is a subtree of the component of b. (The slide illustrates this with example rules over the components a_1, ..., a_n of a and the single component b_1 of b, e.g. a_i ∧ a_n → b_1 for all i < n.) 41 / 59

Main contributions (ii) Tree Mining 6 Closure Operator on Trees 7 Unlabeled Closed Frequent Tree Mining 8 A way of extracting high-confidence association rules from datasets consisting of unlabeled trees: antecedents are obtained through a computation akin to a hypergraph transversal; consequents follow from an application of the closure operators 9 Detection of some cases of implicit rules: rules that always hold, independently of the dataset 42 / 59

Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 43 / 59

Mining Evolving Tree Data Streams (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods Problem Given a data stream D of rooted, unlabeled and unordered trees, find frequent closed trees. We provide three algorithms, of increasing power: Incremental, Sliding Window, Adaptive 44 / 59

Relaxed Support (13) Logarithmic Relaxed Support Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng, Yunfeng Liu and Kunqing Xie. CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data Linear Relaxed Interval: the support space of all subpatterns can be divided into n = 1/ε_r intervals, where ε_r is a user-specified relaxed factor, and each interval can be denoted by I_i = [l_i, u_i), where l_i = (n − i) · ε_r ≥ 0, u_i = (n − i + 1) · ε_r − 1 and i ≤ n. t is a Linear Relaxed closed subpattern if and only if there exists no proper superpattern t' of t such that their supports belong to the same interval I_i. 45 / 59

Relaxed Support (13) Logarithmic Relaxed Support As the number of closed frequent patterns is not linear with respect to support, we introduce a new relaxed support: Logarithmic Relaxed Interval: the support space of all subpatterns can be divided into n = 1/ε_r intervals, where ε_r is a user-specified relaxed factor, and each interval can be denoted by I_i = [l_i, u_i), where l_i = c^i, u_i = c^{i+1} − 1 and i ≤ n. t is a Logarithmic Relaxed closed subpattern if and only if there exists no proper superpattern t' of t such that their supports belong to the same interval I_i. 45 / 59
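A minimal sketch of the logarithmic relaxed support idea: two support values are treated as equal when they fall in the same interval [c^i, c^{i+1} − 1]. The base c, the helper names, and the treatment of supports below 1 are illustrative assumptions:

```python
import math

def log_interval(support, c=2.0):
    """Index i of the logarithmic interval [c^i, c^(i+1) - 1] containing support."""
    return 0 if support < 1 else int(math.floor(math.log(support, c)))

def same_relaxed_support(sup_a, sup_b, c=2.0):
    """Under logarithmic relaxed support, a pattern is closed only if no
    proper superpattern has a support in the same interval."""
    return log_interval(sup_a, c) == log_interval(sup_b, c)

# Example: supports 5 and 7 share the interval [4, 7] for c = 2,
# while 7 and 8 do not.
```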

Algorithms (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods Algorithms Incremental: INCTREENAT Sliding Window: WINTREENAT Adaptive: ADATREENAT, which uses ADWIN to monitor change ADWIN An adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) On the rates of false positives and false negatives On the relation of the size of the current window and change rates 46 / 59

Experimental Validation: TN1 (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods Figure: Experiments on ordered trees with the TN1 dataset, plotting time (sec.) against dataset size (millions) for CMTreeMiner and INCTREENAT. 47 / 59

Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification Figure: A dataset example, four trees over node labels A, B, C, D, labeled CLASS1, CLASS2, CLASS1, CLASS2. 48 / 59

Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification Table: closed trees c_1 and c_2 (together with frequent trees that are not closed, drawn on the slide) and their 0/1 occurrence indicators over the four example trees: c_1 → 1 0 1 0, c_2 → 1 0 0 1. 49 / 59

Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification Table (slide 50/59): the first table lists, for each closed tree c_1 ... c_4, its frequent subtrees f and their occurrences in documents 1-4. The second table gives, per document Id, the feature vectors over closed and maximal trees:

Id | Closed c1 c2 c3 c4 | Maximal c1 c2 c3 | Class
 1 |        1  1  0  1  |         1  1  0  | CLASS1
 2 |        0  0  1  1  |         0  0  1  | CLASS2
 3 |        1  0  1  1  |         1  0  1  | CLASS1
 4 |        0  1  1  1  |         0  1  1  | CLASS2

Adaptive XML Tree Framework on evolving data streams (14) XML Tree Classification XML Tree Classification Framework Components An XML closed frequent tree miner A Data stream classifier algorithm, which we will feed with tuples to be classified online. 51 / 59
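A minimal sketch of how the two components described above can be wired together: each incoming XML tree is mapped to a binary tuple over the currently mined closed (or maximal) frequent trees, as in the tables above, and that tuple is fed to an online classifier. The miner, the subtree test, and the classifier interfaces are illustrative assumptions:

```python
def tree_to_tuple(tree, patterns, contains):
    """Binary feature vector: 1 if patterns[i] occurs in `tree`, else 0,
    mirroring the 0/1 tables on the previous slides."""
    return tuple(1 if contains(tree, p) else 0 for p in patterns)

def run_framework(stream, miner, classifier, contains):
    """stream yields (tree, label); miner.update returns the current set of
    closed/maximal frequent trees; classifier learns online."""
    for tree, label in stream:
        patterns = miner.update(tree)              # adaptive closed-tree mining
        features = tree_to_tuple(tree, patterns, contains)
        prediction = classifier.predict_one(features)
        classifier.learn_one(features, label)      # prequential: test, then train
        yield prediction, label
```

In practice the pattern set evolves as the miner adapts, so the downstream classifier must cope with a changing feature space.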

Adaptive XML Tree Framework on evolving data streams (14) XML Tree Classification

                       Maximal               Closed
           # Trees   Att.  Acc.   Mem.     Att.  Acc.   Mem.
CSLOG12      15483    84   79.64  1.2      228   78.12  2.54
CSLOG23      15037    88   79.81  1.21     243   78.77  2.75
CSLOG31      15702    86   79.94  1.25     243   77.60  2.73
CSLOG123     23111    84   80.02  1.7      228   78.91  4.18

Table: BAGGING on unordered trees. 52 / 59

Main contributions (iii) Mining Evolving Tree Data Streams 10 Incremental Method 11 Sliding Window Method 12 Adaptive Method 13 Logarithmic Relaxed Support 14 XML Classification 53 / 59

Outline 1 Introduction 2 Mining Evolving Data Streams 3 Tree Mining 4 Mining Evolving Tree Data Streams 5 Conclusions 54 / 59

Main contributions Mining Evolving Data Streams 1 Framework 2 ADWIN 3 Classifiers 4 MOA 5 ASHT Tree Mining 6 Closure Operator on Trees 7 Unlabeled Tree Mining Methods 8 Deterministic Association Rules 9 Implicit Rules Mining Evolving Tree Data Streams 10 Incremental Method 11 Sliding Window Method 12 Adaptive Method 13 Logarithmic Relaxed Support 14 XML Classification 55 / 59

Future Lines (i) Adaptive Kalman Filter An adaptive Kalman filter that computes Q and R without using the size of the ADWIN window. Extend the MOA framework: Support vector machines, Clustering, Itemset mining, Association rules 56 / 59

Future Lines (ii) Adaptive Deterministic Association Rules Deterministic association rules computed on evolving data streams General Implicit Rules Characterization Find a characterization of implicit rules with any number of components Non-Deterministic Association Rules Find a basis of association rules for trees with confidence lower than 100% 57 / 59

Future Lines (iii) Closed Frequent Graph Mining Mining methods to obtain closed frequent graphs. Not incremental Incremental Sliding Window Adaptive Graph Classification Classifiers of graphs using maximal and closed frequent subgraphs. 58 / 59

Relevant publications Albert Bifet and Ricard Gavaldà. Kalman filters and adaptive windows for learning in data streams. DS 06 Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. SDM 07 Albert Bifet and Ricard Gavaldà. Adaptive parameter-free learning from evolving data streams. Tech-Rep R09-9 A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà. New ensemble methods for evolving data streams. KDD 09 José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed unordered trees through natural representations. ICCS 07 José L. Balcázar, Albert Bifet, and Antoni Lozano. Subtree testing and closed tree mining through natural representations. DEXA 07 José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining implications from lattices of closed trees. EGC 2008 José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining Frequent Closed Rooted Trees. MLJ 09 Albert Bifet and Ricard Gavaldà. Mining adaptively frequent closed unlabeled rooted trees in data streams. KDD 08 Albert Bifet and Ricard Gavaldà. Adaptive XML Tree Classification on evolving data streams 59 / 59