CPT+: A Compact Model for Accurate Sequence Prediction

CPT+: A Compact Model for Accurate Sequence Prediction Ted Gueniche 1, Philippe Fournier-Viger 1, Rajeev Raman 2, Vincent S. Tseng 3 1 University of Moncton, Canada 2 University of Leicester, UK 3 National Chiao Tung University, Taiwan 2015-05-22 PAKDD 2015, Ho Chi Minh City, Vietnam

The problem of sequence prediction: A, B, C, ? Problem: given a set of training sequences, predict the next symbol of a sequence. Applications: webpage prefetching, analyzing the behavior of customers on websites, keyboard typing prediction, product recommendation, stock market prediction, etc.

General approach for this problem. Phase 1) Training: the training sequences are used to build a sequence prediction model. Phase 2) Prediction: given a sequence (e.g. A,B,C), a prediction algorithm uses the model to output a prediction (e.g. D).

Sequential pattern mining: discover patterns (e.g. with PrefixSpan and minsup = 33%), then use the patterns for prediction. Drawbacks: extracting patterns is time-consuming, patterns ignore rare cases, and updating the patterns is very costly.

Dependency Graph (DG). Training sequences S1: {A,B,C,A,C,B,D} and S2: {C,C,A,B,C,B,C,A}. The DG has one node per symbol and arcs counting how often one symbol is followed by another within a lookup window of size 2.

Dependency Graph (DG), continued. From the same training sequences, predictions are obtained from the arc counts, e.g. P(B|A) = 3 / SUP(A) = 3/4 and P(C|A) = 3 / SUP(A) = 3/4 (DG with a lookup window of size 2).
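
To make the DG example concrete, here is a minimal Python sketch (not the authors' implementation; the function and parameter names are illustrative only) that counts, within a lookup window, how often each symbol is followed by another, and scores candidates as P(Y|X) = count(X->Y) / SUP(X):

```python
from collections import defaultdict

def build_dg(sequences, window=2):
    """Count, for each symbol X, how often each symbol Y appears within
    `window` positions after X, plus the number of occurrences of X."""
    edge = defaultdict(int)      # (X, Y) -> co-occurrence count
    support = defaultdict(int)   # X -> occurrence count (SUP)
    for seq in sequences:
        for i, x in enumerate(seq):
            support[x] += 1
            for y in seq[i + 1: i + 1 + window]:
                edge[(x, y)] += 1
    return edge, support

def predict_dg(edge, support, last_symbol):
    """Score each candidate Y by P(Y | last_symbol) = count / SUP(last_symbol)."""
    sup = support.get(last_symbol, 0)
    if sup == 0:
        return None
    scores = {y: c / sup for (x, y), c in edge.items() if x == last_symbol}
    return max(scores, key=scores.get) if scores else None

s1, s2 = list("ABCACBD"), list("CCABCBCA")
edge, support = build_dg([s1, s2], window=2)
print(edge[("A", "B")], support["A"])  # 3 4  -> P(B|A) = 3/4, as on the slide
print(predict_dg(edge, support, "A"))  # 'B' (tied with 'C' at 3/4)
```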

PPM order 1 (Prediction by Partial Matching). Training sequences S1: {A,B,C,A,C,B,D} and S2: {C,C,A,B,C,B,C,A}. The model stores, for each symbol, how many times it occurs and the counts of the symbols that immediately follow it.

PPM order 1 (Prediction by Partial Matching), continued. Prediction uses the counts of the symbols following the last symbol of the sequence, e.g. P(B|A) = 2/4 and P(C|A) = 1/4.

PPM order 2. From the same training sequences, the model stores the counts of the symbols following each context of length 2 (AB, AC, BC, ...); for example, the context AB occurs twice and is followed by C both times. Predictions are inaccurate if there is noise.

All-K-Order Markov. Uses PPM models from order 1 to K for prediction. More accurate than a fixed-order PPM, but exponential size. Example with order 2: P(C|AB) = 2/2, P(B|AC) = 1/1, P(A|BC) = 2/3.
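
As a rough illustration of how PPM-style order-k models and All-K-Order Markov store their counts, here is a hedged Python sketch (names and prediction policy are my own, not the papers'): it records, for every context of length 1 to K, the symbols that follow it, and predicts using the longest matching context of the given suffix.

```python
from collections import defaultdict, Counter

def build_akom(sequences, K=2):
    """For each context of length 1..K, count which symbols follow it.
    Order 1 corresponds to PPM order 1; keeping every order up to K gives
    the All-K-Order Markov model (hence its rapid growth in size)."""
    counts = defaultdict(Counter)  # context tuple -> Counter of next symbols
    for seq in sequences:
        for k in range(1, K + 1):
            for i in range(len(seq) - k):
                counts[tuple(seq[i:i + k])][seq[i + k]] += 1
    return counts

def predict_akom(counts, suffix, K=2):
    """Predict with the longest context (up to K symbols) that was seen in training."""
    for k in range(min(K, len(suffix)), 0, -1):
        context = tuple(suffix[-k:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None

model = build_akom([list("ABCACBD"), list("CCABCBCA")], K=2)
print(model[("A", "B")])                  # Counter({'C': 2}) -> P(C|AB) = 2/2
print(predict_akom(model, list("AB")))    # 'C'
```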

Limitations. Several models assume that each event depends only on the immediately preceding event; otherwise, they often have an exponential complexity (e.g. All-K-Order Markov). There are some improvements to reduce the size of Markovian models, but little work to improve their accuracy. Several models are not noise tolerant. Some models are costly to update (e.g. sequential patterns). All the aforementioned models are lossy models.

CPT: COMPACT PREDICTION TREE Gueniche, T., Fournier-Viger, P., Tseng, V.-S. (2013). Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction. Proc. 9th International Conference on Advanced Data Mining and Applications (ADMA 2013) Part II, Springer LNAI 8347, pp. 177-188.

Goal: to provide more accurate predictions, a model having a reasonable size, and a model that is noise tolerant.

Hypothesis. Idea: build a lossless model (or a model where the loss of information can be controlled) and use all relevant information to perform each sequence prediction. Hypothesis: this would increase prediction accuracy.

Challenges: 1) define a structure that is space-efficient for storing sequences; 2) the structure must be incrementally updatable to add new sequences; 3) propose a prediction algorithm that offers accurate predictions and, if possible, is also time-efficient.

Our proposal: the Compact Prediction Tree (CPT). A tree structure to store training sequences, combined with an indexing mechanism. Each sequence is inserted one after the other into the CPT. An illustration follows.

Example. We will consider the five following training sequences: 1. ABC 2. AB 3. ABDC 4. BC 5. BDE

Example (construction). Initially the prediction tree contains only the root node, and the inverted index and lookup table are empty.

Example: inserting <A,B,C>. The first sequence, s1 = <A,B,C>, is inserted into the empty structures.

Example: inserting <A,B,C>. The prediction tree now contains the path root -> A -> B -> C. Inverted index (column s1): A = 1, B = 1, C = 1. Lookup table: s1 points to the node C.

Example: inserting <A,B>. The path root -> A -> B already exists, so no node is added. Inverted index (columns s1, s2): A = 1 1, B = 1 1, C = 1 0. Lookup table: s2 points to the node B.

Example: inserting <A,B,D,C>. A new branch D -> C is added under the node B of the path A -> B. Inverted index (columns s1, s2, s3): A = 1 1 1, B = 1 1 1, C = 1 0 1, D = 0 0 1. Lookup table: s3 points to the new node C.

Example: inserting <B,C>. A new branch B -> C is added under the root. Inverted index (columns s1 to s4): A = 1 1 1 0, B = 1 1 1 1, C = 1 0 1 1, D = 0 0 1 0. Lookup table: s4 points to its node C.

Example: inserting <B,D,E>. A branch D -> E is added under the root's node B. Inverted index (columns s1 to s5): A = 1 1 1 0 0, B = 1 1 1 1 1, C = 1 0 1 1 0, D = 0 0 1 0 1, E = 0 0 0 0 1. Lookup table: s5 points to the node E.

Insertion: linear complexity, O(m), where m is the sequence length. Insertion is a reversible operation (sequences can be recovered from the CPT), and the insertion order of sequences is preserved in the CPT.
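
The following is a minimal sketch of the three CPT structures and the O(m) insertion described above, written in Python rather than the authors' Java IPredict code; the inverted index is shown as sets of sequence ids instead of the bit vectors used in the paper, and the class and field names are illustrative.

```python
class Node:
    def __init__(self, symbol=None, parent=None):
        self.symbol = symbol
        self.parent = parent
        self.children = {}  # symbol -> Node

class CPT:
    """Prediction tree + inverted index + lookup table, as in the slides."""
    def __init__(self):
        self.root = Node()
        self.inverted_index = {}  # symbol -> set of sequence ids
        self.lookup_table = {}    # sequence id -> last node of the sequence

    def insert(self, seq_id, sequence):
        node = self.root
        for symbol in sequence:           # O(m) for a sequence of length m
            node = node.children.setdefault(symbol, Node(symbol, node))
            self.inverted_index.setdefault(symbol, set()).add(seq_id)
        self.lookup_table[seq_id] = node  # sequences are recoverable bottom-up

cpt = CPT()
for sid, seq in enumerate(["ABC", "AB", "ABDC", "BC", "BDE"], start=1):
    cpt.insert(f"s{sid}", list(seq))
print(sorted(cpt.inverted_index["A"]))  # ['s1', 's2', 's3']
```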

Space complexity. Size of the prediction tree, worst case: O(N * average sequence length), where N is the number of sequences. In general it is much smaller, because sequences overlap.

Space complexity (cont'd). Size of the inverted index: n x b, where n = sequence count and b = symbol count. Small in practice because it is encoded as bit vectors.

Space complexity (cont'd). Size of the lookup table: n pointers, where n is the sequence count.

PREDICTION

Predicting the symbol following <A,B>. The prediction uses the three structures built above: the prediction tree, the lookup table (s1 to s5) and the inverted index (A = 1 1 1 0 0, B = 1 1 1 1 1, C = 1 0 1 1 0, D = 0 0 1 0 1, E = 0 0 0 0 1).

Predicting the symbol following <A,B>. The logical AND of the bit vectors of A and B indicates that the sequences common to A and B are s1, s2 and s3.

Predicting the symbol following <A,B>. The lookup table allows the corresponding sequences to be traversed from the end to the start.

Predicting the symbol following <A,B>. Count table: C: 2 occurrences after {A,B}, D: 1 occurrence after {A,B}. The most frequent symbol of the count table, here C, is predicted.

Complexity of prediction. 1. Intersection of bit vectors: O(v), where v is the number of symbols. 2. Traversing the sequences: O(n), where n is the sequence count. 3. Creating the count table: O(x), where x is the number of symbols occurring after the target sequence. 4. Choosing the predicted symbol: O(y), where y is the number of distinct symbols in the count table.
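
Continuing the sketch above (it reuses the `CPT` instance `cpt` built there), the four prediction steps can be illustrated as follows. This is a simplified reading of the algorithm, for instance it counts the symbols after the last occurrence of the final prefix symbol, and is not the paper's exact procedure.

```python
from collections import Counter

def cpt_predict(cpt, prefix):
    """(1) intersect inverted-index entries, (2) walk matching sequences from
    their last node via the lookup table, (3) count symbols occurring after
    the prefix, (4) return the most frequent one."""
    # (1) sequences containing every symbol of the prefix (logical AND)
    matching = set.intersection(*(cpt.inverted_index.get(s, set()) for s in prefix))
    count_table = Counter()
    for seq_id in matching:
        # (2) recover the sequence by walking up from its last node
        node, symbols = cpt.lookup_table[seq_id], []
        while node.symbol is not None:
            symbols.append(node.symbol)
            node = node.parent
        symbols.reverse()
        # (3) count the symbols occurring after the last prefix symbol
        last = max(i for i, s in enumerate(symbols) if s == prefix[-1])
        count_table.update(symbols[last + 1:])
    # (4) predicted symbol = most frequent entry of the count table
    return count_table.most_common(1)[0][0] if count_table else None

print(cpt_predict(cpt, ["A", "B"]))  # 'C' (count table: C: 2, D: 1)
```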

EXPERIMENTAL EVALUATION

Experimental evaluation. Datasets: BMS, FIFA and Kosarak: sequences of clicks on webpages; SIGN: sentences in sign language; BIBLE: sequences of characters in a book.

Experimental evaluation (cont'd). Competitor algorithms: DG (lookup window = 4), All-K-Order Markov (order of 5), PPM (order of 1). Evaluation uses 10-fold cross-validation.

Experimental evaluation (cont'd). Measures: Accuracy = success count / sequence count; Coverage = prediction count / sequence count.

Experiment 1 - accuracy. CPT is the most accurate except for one dataset. PPM and DG perform well in some situations.

Experiment 1 - size. CPT is smaller than All-K-Order Markov but larger than DG and PPM.

Experiment 1 - time (cont'd). CPT's training time is at least 3 times lower than that of DG and AKOM, and similar to PPM's. CPT's prediction time is quite high (a trade-off for more accuracy).

Experiment 2 - scalability. CPT shows a trend similar to the other algorithms.

Experiment 3 - prefix size. The prefix size is the number of symbols used for making a prediction. For FIFA, the accuracy of CPT increases until a prefix size of around 8 (this depends on the dataset).

Optimisation #1 - RecursiveDivider. Example for the prefix {A,B,C,D}: Level 1 gives {B,C,D}, {A,C,D}, {A,B,D}, {A,B,C}; Level 2 gives {C,D}, {B,D}, {B,C}, {A,D}, {A,C}, {A,B}; Level 3 gives {D}, {C}, {B}, {A}. Accuracy and coverage increase with the level, while training time and prediction time remain more or less the same. Therefore, a high value for this parameter is better for all datasets. A code sketch follows below.
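
A possible reading of RecursiveDivider, sketched in Python under the assumption that level k simply enumerates the subsequences of the prefix with k symbols removed (order preserved), as in the {A,B,C,D} table above; the function name and parameters are illustrative.

```python
from itertools import combinations

def recursive_divider(prefix, max_level):
    """Yield the subsequences of the prefix obtained by removing
    1 .. max_level symbols while preserving the original order."""
    for level in range(1, max_level + 1):
        keep = len(prefix) - level
        if keep < 1:
            break
        for combo in combinations(prefix, keep):
            yield list(combo)

for sub in recursive_divider(list("ABCD"), 2):
    print(sub)   # the 3-symbol subsequences, then the 2-symbol ones
```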

Optimisation #2 - sequence splitting. Example: splitting the sequence {A,B,C,D,E,F,G} with split_length = 5 gives {C,D,E,F,G}. See the sketch below.
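
Sequence splitting itself is a one-liner; a sketch assuming split_length counts symbols from the end of the training sequence:

```python
def split_sequence(sequence, split_length=5):
    """Keep only the last split_length symbols of a training sequence."""
    return sequence[-split_length:]

print(split_sequence(list("ABCDEFG"), 5))  # ['C', 'D', 'E', 'F', 'G']
```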

Conclusion. CPT, a new model for sequence prediction: it allows fast incremental updates, compresses training sequences, integrates an indexing mechanism, and includes two optimizations. Results: in general more accurate than the compared models, but prediction time is greater (a trade-off); CPT is more than twice as small as AKOM; sequence insertion is more than 3 times faster than DG and AKOM.

CPT+: DECREASING THE TIME/SPACE COMPLEXITY OF CPT Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V. S. (2015). CPT+: Decreasing the time/space complexity of the Compact Prediction Tree. Proc. 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2015), Springer, LNAI 9078, pp. 625-636.

Introduction. Two optimisations to reduce the size of the tree used by CPT: compressing frequent substrings, and compressing simple branches. A third optimisation improves prediction time and noise tolerance.

(1) Compressing frequent substrings. This strategy is applied during training: it identifies frequent substrings in the training sequences and replaces these substrings by new symbols. Discovering the substrings is done with a modified version of the PrefixSpan algorithm (parameters: minsup, minlength and maxlength). A code sketch of the idea follows.
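
A hedged sketch of the idea: a naive contiguous-substring enumeration stands in for the modified PrefixSpan used in the paper, and the dictionary codes, function names and parameter handling are illustrative only.

```python
from collections import Counter

def frequent_substrings(sequences, minsup=2, minlength=2, maxlength=4):
    """Return contiguous substrings of length minlength..maxlength that
    occur in at least minsup training sequences (naive enumeration)."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for length in range(minlength, maxlength + 1):
            for i in range(len(seq) - length + 1):
                seen.add(tuple(seq[i:i + length]))
        counts.update(seen)  # count each substring once per sequence
    return [list(s) for s, c in counts.items() if c >= minsup]

def compress(sequence, substrings, dictionary):
    """Replace each occurrence of a frequent substring by a new symbol."""
    seq = list(sequence)
    for sub in sorted(substrings, key=len, reverse=True):  # longest first
        code = dictionary.setdefault(tuple(sub), f"<{''.join(sub)}>")
        i = 0
        while i + len(sub) <= len(seq):
            if seq[i:i + len(sub)] == sub:
                seq[i:i + len(sub)] = [code]
            i += 1
    return seq

dictionary = {}
seqs = [list("ABCD"), list("ABCE"), list("XABC")]
subs = frequent_substrings(seqs, minsup=2, minlength=2, maxlength=3)
print(compress(list("ABCD"), subs, dictionary))  # ['<ABC>', 'D']
```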

(1) Compressing frequent substrings: illustration of the prediction tree, inverted index and lookup table before and after frequent substrings are replaced by new symbols.

(1) Compressing frequent substrings (cont'd). Time complexity: training has a non-negligible cost for discovering frequent substrings; during prediction, symbols are uncompressed on-the-fly in O(1) time. Space complexity: O(m), where m is the number of frequent substrings.

(2) Compressing simple branches. A second optimization to reduce the size of the tree. A simple branch is a branch where all nodes have a single child. Each simple branch is replaced by a single node representing the whole branch.

(2) Compressing simple branches: illustration of the prediction tree, inverted index and lookup table before and after each simple branch is replaced by a single node.

(2) Compressing simple branches (cont'd). Time complexity: very fast; after building the tree, we only need to traverse the branches from the bottom using the lookup table.
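
A bottom-up sketch of simple-branch compression, assuming the `Node` class and the `cpt` instance from the CPT sketch earlier; the paper instead starts from the lookup table, and updating the lookup-table pointers to the merged nodes is omitted here.

```python
def compress_simple_branches(node):
    """Merge every maximal chain in which each node has a single child
    into one node labelled with the list of merged symbols (a sketch,
    not the paper's exact procedure)."""
    for child in list(node.children.values()):
        compress_simple_branches(child)
    if node.symbol is not None and len(node.children) == 1:
        (child,) = node.children.values()
        if not child.children:  # the only child is (now) a leaf: merge it
            ours = node.symbol if isinstance(node.symbol, list) else [node.symbol]
            theirs = child.symbol if isinstance(child.symbol, list) else [child.symbol]
            node.symbol = ours + theirs
            node.children = {}

compress_simple_branches(cpt.root)
# e.g. the branch D -> C under A -> B becomes a single node labelled ['D', 'C']
```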

(3) Improved noise reduction. Recall that CPT removes symbols from the sequence to be predicted in order to be more noise tolerant. Improvements: only remove the less frequent symbols from sequences, assuming that they are more likely to be noise; consider a minimum number of sequences to perform a prediction; add a new parameter, the noise ratio (e.g. 20%), to determine how many symbols should be removed from sequences (e.g. the 20% most infrequent symbols). Thus, the amount of noise is assumed to be proportional to the length of sequences.
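
A sketch of the noise-ratio idea, under the assumption that it simply drops the noise_ratio least frequent symbols of the prefix; the real CPT+ procedure additionally retries until a minimum number of matching sequences is found, which is omitted here, and the names are illustrative.

```python
from collections import Counter

def remove_noise(prefix, symbol_frequency, noise_ratio=0.2):
    """Drop the noise_ratio least frequent symbols of the prefix,
    assuming rare symbols are the most likely to be noise."""
    k = int(len(prefix) * noise_ratio)
    if k == 0:
        return list(prefix)
    # prefix symbols ordered from least to most frequent in training
    by_freq = sorted(set(prefix), key=lambda s: symbol_frequency.get(s, 0))
    to_remove = set(by_freq[:k])
    return [s for s in prefix if s not in to_remove]

freq = Counter("ABCABCABX")        # 'X' is rare, so it is treated as noise
print(remove_noise(list("ABXCA"), freq, noise_ratio=0.2))  # ['A', 'B', 'C', 'A']
```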

Experiments. Competitor algorithms: DG, TDAG, PPM, LZ78, All-K-Order Markov.

Prediction accuracy by dataset. CPT+ is also up to 4.5 times faster than CPT in terms of prediction time.

Scalability: model size (in nodes) as a function of the sequence count, compared with PPM.

Conclusion. CPT(+): a novel sequence prediction model with fast training time, good scalability and high prediction accuracy. Future work: further compress the model; compare with other prediction models such as CTW and NN; data streams; user profiles; an open-source library for web prefetching. IPredict: https://github.com/tedgueniche/ipredict/tree/master/src/ca/ipredict