A New Algorithm for Generating Frequent Closed Itemsets (Un nouvel algorithme de génération des itemsets fermés fréquents)


A New Algorithm for Generating Frequent Closed Itemsets (Un nouvel algorithme de génération des itemsets fermés fréquents). Huaiguo Fu, CRIL-CNRS FRE2499, Université d'Artois - IUT de Lens, Rue de l'université SP 16, 62307 Lens cedex, France. E-mail: fu@cril.univ-artois.fr

Motivation P. 2/32
Develop data mining tools based on formal concept analysis: the concept lattice is an effective tool for data analysis.
Develop scalable concept lattice algorithms to reduce the expensive cost of mining very large data.
Problem: very large datasets remain troublesome.
Situation: applications of concept lattices suffer from the complexity of concept lattice algorithms on large datasets.
One solution: divide-and-conquer.

Outline P. 3/32 1 Introduction 2 Concept lattice 3 Mining frequent closed itemsets 4 PFC algorithm 5 Conclusion

❶ Introduction

❶ Introduction Concept lattice P. 5/32 The core of formal concept analysis, derived from mathematical order theory and lattice theory. It studies the generation and relations of formal concepts: how objects can be hierarchically grouped together according to their common attributes, and how knowledge rules can be generated from the concept lattice. Data → Concept lattice → Knowledge

❶ Introduction An effective tool P. 6/32 The concept lattice is an effective tool for: data analysis; knowledge discovery and data mining; information retrieval; software engineering...

❶ Introduction Concept lattice for Data mining (DM) P. 7/32 Association rule mining (ARM) and Frequent Closed Itemset (FCI) mining: CLOSE, A-Close, Titanic, CLOSET, CHARM, CLOSET+... Supervised classification: GRAND, LEGAL, GALOIS, RULEARNER, CIBLe, CLNN&CLNB... Clustering analysis based on concept lattices; data visualization...

❶ Introduction Concept lattice algorithms P. 8/32 Many concept lattice algorithms: Chein (1969), Norris (1978), Ganter (NextClosure) (1984), Bordat (1986), Godin (1991, 1995), Dowling (1993), Kuznetsov (1993), Carpineto and Romano (1993, 1996), Nourine (1999), Lindig (2000), Valtchev (2002)... However, large data impedes or limits the application of concept lattice algorithms. We need to develop scalable lattice-based algorithms for data mining.

❷ Concept lattice

❷ Concept lattice Formal context P. 10/32 Definition: A formal context is defined by a triple (O, A, R), where O and A are two sets and R is a relation between O and A. The elements of O are called objects or transactions; the elements of A are called items or attributes. Example: formal context D = (O, A, R) with O = {1, 2, 3, 4, 5, 6, 7, 8} and A = {a1, a2, a3, a4, a5, a6, a7, a8}. (The cross-table of the relation R is not recoverable from the transcription.)

❷ Concept lattice Formal concept P. 11/32 Definition: Two derivation operators are defined for O1 ⊆ O and A1 ⊆ A:
O1' := {a ∈ A | oRa for all o ∈ O1}
A1' := {o ∈ O | oRa for all a ∈ A1}
The compositions O1 ↦ O1'' and A1 ↦ A1'' are closure operators. Definition: A formal concept of (O, A, R) is a pair (O1, A1) with O1 ⊆ O, A1 ⊆ A, O1' = A1 and A1' = O1. O1 is called the extent, A1 the intent.
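As a minimal illustration of the two derivation operators and the formal-concept condition above (a sketch with hypothetical helper names, not the paper's code), the relation R can be stored as a set of (object, attribute) pairs:

```python
def prime_objects(O1, A, R):
    """O1' := {a in A | oRa for all o in O1}."""
    return {a for a in A if all((o, a) in R for o in O1)}

def prime_attrs(A1, O, R):
    """A1' := {o in O | oRa for all a in A1}."""
    return {o for o in O if all((o, a) in R for a in A1)}

def is_formal_concept(O1, A1, O, A, R):
    """(O1, A1) is a formal concept iff O1' = A1 and A1' = O1."""
    return prime_objects(O1, A, R) == A1 and prime_attrs(A1, O, R) == O1

# Toy context: objects 1..3, attributes a1, a2.
O = {1, 2, 3}
A = {'a1', 'a2'}
R = {(1, 'a1'), (2, 'a1'), (2, 'a2'), (3, 'a2')}
print(is_formal_concept({1, 2}, {'a1'}, O, A, R))  # True: {1,2}' = {'a1'} and {'a1'}' = {1,2}
```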

❷ Concept lattice Concept lattice P. 12/32 Definition: There is a hierarchical order between two formal concepts (O1, A1) and (O2, A2) if O1 ⊆ O2 (equivalently, A2 ⊆ A1). All formal concepts with this hierarchical order form a complete lattice, called the concept lattice. (Hasse diagram of the concept lattice of context D omitted; each node shows an extent, e.g. 1234, and an intent, e.g. a1a3.)

❸ Mining frequent closed itemsets

❸ Mining frequent closed itemsets Association rule mining P. 14/32 Finding frequent patterns, associations, correlations, or causal structures among itemsets or objects in databases. Two sub-problems: mining the frequent itemsets; generating association rules from the frequent itemsets. Basic concepts: for a formal context (O, A, R) and an itemset A1 ⊆ A:
support: support(A1) = |A1'|
frequent itemset: A1 is frequent if support(A1) ≥ minsupport, for a given minimum support threshold minsupport
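The support definition above can be sketched directly over a transaction database (transactions as sets of items; names are hypothetical):

```python
def support(A1, transactions):
    """support(A1) = |A1'| = number of transactions containing every item of A1."""
    return sum(1 for t in transactions if A1 <= t)

def is_frequent(A1, transactions, minsupport):
    return support(A1, transactions) >= minsupport

db = [{'a1', 'a2'}, {'a1', 'a3'}, {'a1', 'a2', 'a3'}, {'a2'}]
print(support({'a1'}, db))               # 3
print(is_frequent({'a1', 'a2'}, db, 2))  # True (support = 2)
```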

❸ Mining frequent closed itemsets Algorithms for mining frequent itemsets P. 15/32 Classic algorithms: Apriori and OCD [Agrawal et al., 96]. Improvements of Apriori: AprioriTid, AprioriHybrid, DHP (Direct Hashing and Pruning), Partition, Sampling, DIC (Dynamic Itemset Counting)... Other strategies: FP-Growth... Problem: too many frequent itemsets when minsupport is low. Solutions: maximal frequent itemset mining; frequent closed itemset mining.
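The levelwise generate-and-test strategy behind Apriori can be sketched compactly (this is a generic textbook sketch, not the paper's implementation):

```python
from itertools import combinations

def apriori(transactions, minsupport):
    """Levelwise frequent-itemset mining: frequent k-itemsets are joined into
    (k+1)-candidates, which are pruned via the anti-monotonicity of support."""
    def sup(s):
        return sum(1 for t in transactions if s <= t)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= minsupport]
    while level:
        for s in level:
            frequent[s] = sup(s)
        # join step: unions of two k-itemsets that give a (k+1)-itemset
        cands = {a | b for a, b in combinations(level, 2) if len(a | b) == len(a) + 1}
        # prune step: every k-subset must be frequent, then test support
        level = [c for c in cands
                 if all(c - {i} in frequent for i in c) and sup(c) >= minsupport]
    return frequent

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
freq = apriori(db, 2)
print(len(freq))  # 6: three singletons and three pairs; {'a','b','c'} has support 1
```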

❸ Mining frequent closed itemsets Closed itemset lattice P. 16/32 Definition: For a formal context (O, A, R), an itemset A1 ⊆ A is a closed itemset iff A1'' = A1. Definition: There is a hierarchical order between closed itemsets A1 and A2 if A1 ⊆ A2. All closed itemsets with this hierarchical order form a complete lattice, called the closed itemset lattice. (Diagram of the closed itemset lattice of context D, from a1 down to a1a2a3a4a5a6a7a8, omitted.)
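The closedness test A1'' = A1 can be sketched over a transaction database: the closure of A1 is the set of items common to every transaction containing A1 (hypothetical helper names):

```python
def closure(A1, transactions):
    """A1'' = items shared by every transaction that contains A1."""
    covering = [t for t in transactions if A1 <= t]
    return set.intersection(*map(set, covering)) if covering else set(A1)

def is_closed(A1, transactions):
    """A1 is a closed itemset iff A1'' = A1."""
    return closure(A1, transactions) == set(A1)

db = [{'a1', 'a2'}, {'a1', 'a2', 'a3'}, {'a1', 'a3'}]
print(is_closed({'a2'}, db))        # False: closure({'a2'}) = {'a1', 'a2'}
print(is_closed({'a1', 'a2'}, db))  # True
```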

❸ Mining frequent closed itemsets FCI mining algorithms P. 17/32 Apriori-like algorithms Close A-Close... Hybrid algorithms CHARM CLOSET CLOSET+ FP-Close DCI-Closed LCM...

❹ PFC algorithm

❹ PFC algorithm Analysis of search space P. 19/32 Search space of a formal context with 4 attributes: all nonempty subsets of {a_(m-3), a_(m-2), a_(m-1), a_m}, and its decomposition into the classes A_m, A_(m-1), A_(m-2), A_(m-3). (Lattice and decomposition diagrams omitted.)

❹ PFC algorithm Analysis of search space P. 20/32 Definition: Given an attribute a_i ∈ A of the context (O, A, R) and a set E with a_i ∉ E, define a_i ⊗ E = {{a_i} ∪ X for all X ∈ E}. Then A_k = {a_k} ∪ {a_k ⊗ A_j | k+1 ≤ j ≤ m}, where A_j is the class of a_j ∈ {a_(k+1), a_(k+2), ..., a_m}. Problems: how to balance the size of the partitions; whether each partition contains closed itemsets.
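The class construction A_k = {a_k} ∪ (a_k ⊗ A_j), built from the last attribute backwards, can be sketched as follows (a sketch of the decomposition rule only; names are hypothetical). The classes partition the 2^m − 1 nonempty itemsets:

```python
def build_classes(items):
    """A_k = {{a_k}} plus {a_k} ∪ X for each X in A_j, k+1 <= j <= m,
    built from the last attribute down to the first."""
    classes = {}
    for k in range(len(items) - 1, -1, -1):
        a = items[k]
        A_k = [{a}]
        for j in range(k + 1, len(items)):
            A_k += [{a} | X for X in classes[items[j]]]  # a_k ⊗ A_j
        classes[a] = A_k
    return classes

classes = build_classes(['a1', 'a2', 'a3', 'a4'])
print([len(classes[a]) for a in ['a1', 'a2', 'a3', 'a4']])  # [8, 4, 2, 1] -> 15 = 2^4 - 1
```

The class sizes are strongly unbalanced (the first attribute's class holds half of the whole search space), which is exactly the balancing problem the slide raises.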

❹ PFC algorithm A strategy of decomposition P. 21/32 Partitions (for the preceding example, formal context D): partition 1: A_8, A_7, A_6, A_5; partition 2: A_4, A_3; partition 3: A_2; partition 4: A_1. Results: not good! Nb_c(partition_1) = 0, Nb_c(partition_2) = 0, Nb_c(partition_3) = 0, Nb_c(partition_4) = 17. A solution: order the attributes. Then Nb_c(partition_1) = 6, Nb_c(partition_2) = 6, Nb_c(partition_3) = 4, Nb_c(partition_4) = 1.

❹ PFC algorithm Ordered data context P. 22/32 Definition: A formal context is called an ordered data context if its items are sorted by the number of objects per item, from smallest to largest, and items with identical object sets are merged into a single item. Example: the nine items a1, ..., a9 of the formal context become the eight ordered columns a5, a8, a6, a7, a4, a3, a2, a1 of the ordered data context. (The cross-tables are not recoverable from the transcription.)
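The construction of an ordered data context can be sketched on a context given column-wise (item → set of objects); the helper names and the '+'-label for merged items are hypothetical:

```python
def order_context(columns):
    """columns: item -> set of objects having that item.
    Returns (merged-item label, object set) pairs, sorted by ascending support."""
    merged = {}
    for item, objs in columns.items():
        merged.setdefault(frozenset(objs), []).append(item)  # merge identical columns
    ordered = sorted(merged.items(), key=lambda kv: len(kv[0]))
    return [('+'.join(sorted(names)), set(objs)) for objs, names in ordered]

ctx = {'a1': {1, 2, 3, 4}, 'a2': {1, 2}, 'a3': {1, 2}, 'a4': {3}}
print(order_context(ctx))  # a2 and a3 merge; columns run from rarest to most frequent
```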

❹ PFC algorithm Folding search sub-space P. 23/32 Definition: For an item a_i of a formal context (O, A, R), all subsets of {a_i, a_(i+1), ..., a_(m-1), a_m} that include a_i form a search sub-space (for closed itemsets) called the folding search sub-space (F3S) of a_i, denoted F3S_i. Remark: the search space of closed itemsets is covered by F3S_m, F3S_(m-1), F3S_(m-2), ..., F3S_i, ..., F3S_1.

❹ PFC algorithm An example P. 24/32
step 1: (O, A, R) is turned into its ordered data context, with items ordered a5, a8, a6, a7, a4, a3, a2, a1.
step 2: DP = 0.5 gives p1 = 8, p2 = 4, p3 = 2, p4 = 1, i.e. p1 ↦ a1, p2 ↦ a7, p3 ↦ a8, p4 ↦ a5 (cut points counted over the ordered columns a5 a8 | a6 a7 | a4 a3 a2 a1).
step 3: Partitions:
partition 1 = [a1, a7): choose a4, a3, a2, a1; remain a5, a8, a6, a7
partition 2 = [a7, a8): choose a6, a7; remain a5, a8
partition 3 = [a8, a5): choose a8; remain a5
partition 4 = [a5): choose a5; nothing remains
Search the closed itemsets in each partition separately:
[a1, a7): a1, a2a1, a3a1, a3a2a1, a4a1, a4a3a1
[a7, a8): a7a1, a7a2a1, a6a4a1, a6a4a2a1, a6a4a3a1, a6a4a3a2a1
[a8, a5): a8a7a1, a8a7a2a1, a8a7a3a1, a8a7a3a2a1
[a5): a5a4a3a1

❹ PFC algorithm The principle of PFC P. 25/32 The PFC algorithm has two steps: 1) determining the partitions; 2) generating, independently, all frequent closed itemsets of each partition (divide-and-conquer).

❹ PFC algorithm Determining the partitions P. 26/32 (O, A, R) is turned into its ordered data context, where A = {a1, a2, ..., a_i, ..., a_m}. Some items of A are chosen to form an ordered set P = {a_P_T, a_P_(T-1), ..., a_P_k, ..., a_P_1}, |P| = T, a_P_k ∈ A, with a_P_T < a_P_(T-1) < ... < a_P_k < ... < a_P_2 < a_P_1 = a_m. A parameter DP (0 < DP < 1) is used to choose the a_P_k: DP = |{a1, ..., a_P_k}| / |{a1, ..., a_P_(k-1)}|. The partitions are the intervals [a_P_k, a_P_(k+1)) and [a_P_T): [a_P_k, a_P_(k+1)) is the union of the F3S_i for P_k ≤ i < P_(k+1), and [a_P_T) = F3S_P_T. Interval [a_P_k, a_P_(k+1)) is the search space from item a_P_k to a_P_(k+1).
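A hypothetical reconstruction of the cut-point rule: with 0 < DP < 1, each cut position is DP times the previous one, starting from p1 = m, which matches the example on the slides (DP = 0.5, m = 8 → p = 8, 4, 2, 1). This is a sketch of one plausible reading, not the paper's exact procedure:

```python
import math

def cut_positions(m, dp):
    """p1 = m; each subsequent cut is dp times the previous one, down to 1."""
    positions = [m]
    while positions[-1] > 1:
        positions.append(max(1, math.floor(positions[-1] * dp)))
    return positions

print(cut_positions(8, 0.5))  # [8, 4, 2, 1]
```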

❹ PFC algorithm Generating FCIs in each partition P. 27/32 Definition: Given an infrequent itemset A1 ⊆ A, A1 = {b1, b2, ..., b_i, ..., b_k}, b_i ∈ A. The candidate next closed itemset of A1, noted CA1, is A1 ⊕ a_i = (A1 ∩ {a1, a2, ..., a_(i-1)}) ∪ {a_i}, where a_i < b_k, a_i ∉ A1, and a_i is the largest item of A with A1 < A1 ⊕ a_i in the order a1 < ... < a_i < ... < a_m. Partitions are independent and non-overlapping. For each partition, starting from an itemset A1: if support(A1) ≥ minsupport, search the next closure of A1; if support(A1) < minsupport, search CA1 directly; the closed itemsets between A1 and CA1 are skipped.

❹ PFC algorithm An example P. 28/32 Search FCIs with PFC. (Search-tree diagram over the itemsets built from a_(m-3), a_(m-2), a_(m-1), a_m omitted.)

❹ PFC algorithm Theoretical comparison P. 29/32 Extended comparison of [BenYahia and Mephu, 04]:
Algorithm | Format | Technique | Architecture | Parallelism
A-CLOSE | horizontal | generate-and-test | sequential | no
CLOSET+ | horizontal | hybrid, FP-tree | sequential | no
CHARM | vertical | hybrid, IT-tree, diffset | sequential | no
PFC | relational | hybrid, partition | distributed | yes

❹ PFC algorithm Experimentation P. 30/32 Many FCI algorithms were compared in [FIMI03]. To get a reference point for PFC's performance: CLOSET+ in C++ vs. PFC in Java, sequential architecture.
Data | Objects | Items | Minsupport | FCI | PFC (msec) | CLOSET+ (msec) | Ref.
audiology | 26 | 110 | 1 | 30401 | 1302 | 8563 | +
soybean-small | 47 | 79 | 1 | 3541 | 516 | 431 | -
lung-cancer | 32 | 228 | 1 | 186092 | 21381 | 279689 | +
promoters | 106 | 228 | 3 | 298772 | 120426 | 111421 | -
soybean-large | 307 | 133 | 1 | 806030 | 357408 | 364524 | +
dermatology | 366 | 130 | 50 | 192203 | 20204 | 18387 | -
breast-cancer-wis | 699 | 110 | 1 | 9860 | 3529 | 1131 | -
kr-vs-kp | 3196 | 75 | 1100 | 2770846 | 1823092 | 483896 | -
agaricus-lepiota | 8124 | 124 | 100 | 38347 | 34815 | 1462 | -
connect-4 | 67557 | 126 | 1000 | 2447136 | 1165806 | 65084 | -
Discussion: 3 contexts with good results (+); PFC is favorable on dense contexts or when |A| >> |O|.

❺ Conclusion

❺ Conclusion Conclusion P. 32/32 PFC: a scalable algorithm. PFC can be used to generate FCIs, and preliminary performance results show that it is suitable for large data. One limitation of PFC: the formal context must fit in memory. Perspectives: analyze context characteristics; optimize the algorithm's parameters; parallel and distributed environments; applications.