Formal Concept Analysis

Similar documents
Conceptual Knowledge Discovery. Data Mining with Formal Concept Analysis. Gerd Stumme. Tutorial at ECML/PKDD Tutorial Formal Concept Analysis

Formal Concept Analysis

Formal Concept Analysis

Formal Concept Analysis

Un nouvel algorithme de génération des itemsets fermés fréquents

Assignment 7 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran

CS 484 Data Mining. Association Rule Mining 2

D B M G Data Base and Data Mining Group of Politecnico di Torino

Association Rules. Fundamentals

D B M G. Association Rules. Fundamentals. Fundamentals. Elena Baralis, Silvia Chiusano. Politecnico di Torino 1. Definitions.

D B M G. Association Rules. Fundamentals. Fundamentals. Association rules. Association rule mining. Definitions. Rule quality metrics: example

Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci

.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

Error-Tolerant Direct Bases

Association Rule. Lecturer: Dr. Bo Yuan. LOGO

Attribute exploration with fuzzy attributes and background knowledge

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Towards a Navigation Paradigm for Triadic Concepts

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

Data Mining Concepts & Techniques

On the Intractability of Computing the Duquenne-Guigues Base

Sequential Pattern Mining

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

COMP 5331: Knowledge Discovery and Data Mining

DATA MINING LECTURE 3. Frequent Itemsets Association Rules

Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion

Data Mining Concepts & Techniques

Data Mining and Analysis: Fundamental Concepts and Algorithms

Lecture Notes for Chapter 6. Introduction to Data Mining

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University

Association Analysis. Part 1

CS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014

CS 584 Data Mining. Association Rule Mining 2

Mining Approximative Descriptions of Sets Using Rough Sets

DATA MINING - 1DL360

TRIAS An Algorithm for Mining Iceberg Tri-Lattices

Frequent Itemset Mining

Frequent Pattern Mining. Toon Calders University of Antwerp

Foundations of Knowledge Management Categorization & Formal Concept Analysis

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti

TRIAS An Algorithm for Mining Iceberg Tri-Lattices

DATA MINING LECTURE 4. Frequent Itemsets, Association Rules Evaluation Alternative Algorithms

Foundations of Knowledge Management Categorization & Formal Concept Analysis

BASIC MATHEMATICAL TECHNIQUES

DATA MINING - 1DL360

Mining Strong Positive and Negative Sequential Patterns

Machine Learning: Pattern Mining

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise

DATA MINING - 1DL105, 1DL111

732A61/TDDD41 Data Mining - Clustering and Association Analysis

Handling a Concept Hierarchy

Effective Elimination of Redundant Association Rules

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

Discovering Non-Redundant Association Rules using MinMax Approximation Rules

Frequent Itemset Mining

On the Complexity of Enumerating Pseudo-intents

Knowledge Discovery and Data Mining I

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

CSE-4412(M) Midterm. There are five major questions, each worth 10 points, for a total of 50 points. Points for each sub-question are as indicated.

Data Analytics Beyond OLAP. Prof. Yanlei Diao

Pushing Tougher Constraints in Frequent Pattern Mining

Pattern Space Maintenance for Data Updates. and Interactive Mining

Data Mining and Knowledge Discovery. Petra Kralj Novak. 2011/11/29

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Frequent Itemset Mining

Discovering Threshold-based Frequent Closed Itemsets over Probabilistic Data

Section Summary. Relations and Functions Properties of Relations. Combining Relations

Frequent Itemset Mining

Σοφια: how to make FCA polynomial?

Chapter 4: Frequent Itemsets and Association Rules

Introduction to Kleene Algebras

Introduction to Data Mining

CSE 5243 INTRO. TO DATA MINING

CHAPTER 1 SETS AND EVENTS

Computing minimal generators from implications: a logic-guided approach

Completing Description Logic Knowledge Bases using Formal Concept Analysis

Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

On Differentially Private Frequent Itemsets Mining

Implications from data with fuzzy attributes

Association Rule Mining

Data mining, 4 cu Lecture 7:

COMP 5331: Knowledge Discovery and Data Mining

Basic Data Structures and Algorithms for Data Profiling Felix Naumann

A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG

A generalized framework to consider positive and negative attributes in formal concept analysis.

MODEL ANSWERS TO HWK #7. 1. Suppose that F is a field and that a and b are in F. Suppose that. Thus a = 0. It follows that F is an integral domain.

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05

An Extended Branch and Bound Search Algorithm for Finding Top-N Formal Concepts of Documents

On Minimal Infrequent Itemset Mining

Association Analysis. Part 2

A Study on Monotone Self-Dual Boolean Functions

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Analysing Online Social Network Data with Biclustering and Triclustering

Unsupervised Learning. k-means Algorithm

Mining Free Itemsets under Constraints

EQUIVALENCE RELATIONS (NOTES FOR STUDENTS) 1. RELATIONS

Transcription:

Formal Concept Analysis
2 Closure Systems and Implications
4 Closure Systems

Concept intents as closed sets. [Figure: line diagram of the concept lattice of a small formal context with objects 1, 2, 3 and attributes a, b, c, e, shown together with its cross table.]

Next-Closure was developed by B. Ganter (1984). It can be used to determine the concept lattice, to determine the concept lattice together with the stem basis, or for interactive knowledge acquisition. It determines the concept intents in lectical order.

Let M = {1, ..., n}. A ⊆ M is lectically smaller than B ⊆ M if B ≠ A and the smallest element in which A and B differ belongs to B:
A < B :⇔ ∃ i ∈ B \ A : A ∩ {1, 2, ..., i−1} = B ∩ {1, 2, ..., i−1}
[Figure: the subsets of {1, 2, 3, 4, 5} arranged in lectical order.]
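The comparison can be checked directly from this definition. Below is a minimal Python sketch (not part of the original tutorial); the function name and the example sets are assumptions of the illustration.

```python
def lectically_smaller(A: set, B: set) -> bool:
    """A < B iff A != B and the smallest element in which A and B differ belongs to B."""
    diff = (A - B) | (B - A)              # elements on which A and B disagree
    return bool(diff) and min(diff) in B

# The sets first differ at 1, and 1 belongs to {1, 3}, so {2} is lectically smaller:
print(lectically_smaller({2}, {1, 3}))    # True
print(lectically_smaller({1, 3}, {2}))    # False
```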

We need the following:
A <_i B :⇔ i ∈ B \ A and A ∩ {1, 2, ..., i−1} = B ∩ {1, 2, ..., i−1}
A ⊕ i := (A ∩ {1, 2, ..., i−1}) ∪ {i}
Theorem: The smallest concept intent that is larger, according to the lectical order, than a given set A ⊆ M is (A ⊕ i)'', where i is the largest element of M with A <_i (A ⊕ i)''.

Algorithm Next-Closure for determining all concept intents:
1) The lectically smallest concept intent is ∅''.
2) If A is a concept intent, then we find the lectically next intent by checking all attributes i ∈ M \ A, starting with the largest and continuing in decreasing order, until A <_i (A ⊕ i)'' holds. Then (A ⊕ i)'' is the lectically next concept intent.
3) If (A ⊕ i)'' = M, then stop; else set A := (A ⊕ i)'' and go to 2).
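The following Python sketch illustrates the procedure on a small, hypothetical toy context. It is not the tutorial's or Ganter's code: the context, the attribute set M = {1, 2, 3, 4}, and the helper names closure, smaller_i, and next_intent are assumptions of the example.

```python
# Hypothetical toy formal context: objects -> sets of attributes from M = {1, 2, 3, 4}
context = {"g1": {2, 3}, "g2": {1, 2}, "g3": {3, 4}}
M = {1, 2, 3, 4}

def closure(A):
    """A'': attributes common to all objects that have every attribute of A."""
    result = set(M)                                 # an empty extent yields M, as required
    for attrs in context.values():
        if A <= attrs:
            result &= attrs
    return result

def smaller_i(A, B, i):
    """A <_i B: i lies in B \\ A and A, B agree on all attributes below i."""
    return i in B and i not in A and {x for x in A if x < i} == {x for x in B if x < i}

def next_intent(A):
    """Lectically next concept intent after A, or None if A = M."""
    for i in sorted(M - A, reverse=True):           # largest attribute first
        B = closure({x for x in A if x < i} | {i})  # (A ⊕ i)''
        if smaller_i(A, B, i):
            return B
    return None

A = closure(set())                                  # the lectically smallest intent is ∅''
while A is not None:
    print(sorted(A))                                # intents appear in lectical order
    A = next_intent(A)
```

On this toy context the sketch prints the seven concept intents in lectical order, starting with ∅'' and ending with M.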

Example (on the blackboard): [Cross table of a formal context with objects Sinus 44, Nokia 6110, T-Fax 301, T-Fax 360, PC and attributes Handy (1), Telefon (2), Fax (3), Fax w. n. paper (4).] The intents are computed step by step in a table with columns A | i | A ⊕ i | (A ⊕ i)'' | A <_i (A ⊕ i)''? | new concept intent.

Iceberg Concept Lattices. For minsupp = 85%, the seven most general of the 32,086 concepts of the Mushrooms database (http://kdd.ics.uci.edu) are shown.

Iceberg Concept Lattices. [Figures: iceberg concept lattices of the Mushrooms database for minsupp = 85% and for minsupp = 70%.]

With decreasing minimum support, the information gets richer. [Figure: iceberg concept lattice of the Mushrooms database for minsupp = 55%.]

The visualization as a nested line diagram indicates implications.

The support of a set X ⊆ M of attributes is given by supp(X) = |X'| / |G|.
Def.: The iceberg concept lattice of a formal context (G, M, I) for a given minimal support minsupp is the set { (A, B) ∈ B(G, M, I) | supp(B) ≥ minsupp }. It can be computed with TITANIC. [Stumme et al. 2001]

TITANIC computes the closure system of all (frequent) concept intents using the support function.
Def.: The support of an attribute set (itemset) X ⊆ M is given by supp(X) = |X'| / |G|. Only concepts with a support above a threshold minsupp ∈ [0, 1] are computed.
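In code, the support function is a single counting pass over the database. The following minimal sketch uses a hypothetical three-object toy context (not the Mushrooms data); the names context, supp, and minsupp are assumptions of the example.

```python
context = {1: {"a", "c"}, 2: {"b", "c", "e"}, 3: {"b", "e"}}   # object -> attribute set
G = set(context)

def supp(X):
    """supp(X) = |X'| / |G|: the fraction of objects having all attributes of X."""
    return sum(1 for g in G if X <= context[g]) / len(G)

minsupp = 0.3
print(supp({"b", "c"}))               # 1/3 in this toy context
print(supp({"b", "c"}) >= minsupp)    # True: the corresponding concept survives the iceberg cut
```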

TITANIC makes use of some simple facts about the support function:

[Figure: the running example, the concept lattice and cross table of a context with objects 1, 2, 3 and attributes a, b, c, e.]

TITANIC aims to answer the following three questions efficiently:
1. How can the closure of an itemset be determined based on supports only?
2. How can the closure system be computed while determining as few closures as possible?
3. How can as many supports as possible be derived from already known supports?

TITANIC, question 1: How can the closure of an itemset be determined based on supports only?
X'' = X ∪ { x ∈ M \ X | supp(X) = supp(X ∪ {x}) }
Example: {b, c}'' = {b, c, e}, since supp({b, c}) = 1/3, supp({b, c, e}) = 1/3, and supp({a, b, c}) = 0/3. [Cross table of the running example with objects 1, 2, 3 and attributes a, b, c, e.]
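A minimal sketch of this rule, on a hypothetical toy context chosen so that {b, c} has support 1/3 and closes to {b, c, e} as in the example above; the function name closure_from_supports is an assumption of the illustration.

```python
context = {1: {"a", "c"}, 2: {"b", "c", "e"}, 3: {"b", "e"}}
G, M = set(context), {"a", "b", "c", "e"}

def supp(X):
    return sum(1 for g in G if X <= context[g]) / len(G)

def closure_from_supports(X):
    """X'' = X ∪ { x in M \\ X : supp(X ∪ {x}) = supp(X) }."""
    return X | {x for x in M - X if supp(X | {x}) == supp(X)}

print(closure_from_supports({"b", "c"}))   # {'b', 'c', 'e'}: adding e keeps the support at 1/3
```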

TITANIC, question 2: How can the closure system be computed while determining as few closures as possible? We determine only the closures of the minimal generators. [Figure: the concept lattice and cross table of the running example.]

TITANIC, question 2: How can the closure system be computed while determining as few closures as possible? We determine only the closures of the minimal generators. A set is a minimal generator iff its support differs from the supports of all its lower covers. The minimal generators form an order ideal (i.e., if a set is not a minimal generator, then neither is any of its supersets). This allows an Apriori-like approach. In the example, TITANIC needs two runs (and Apriori four).
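The minimal-generator test compares a set's support with that of each of its lower covers. A minimal sketch on the same hypothetical toy context (the helper name is_minimal_generator is assumed):

```python
context = {1: {"a", "c"}, 2: {"b", "c", "e"}, 3: {"b", "e"}}
G = set(context)

def supp(X):
    return sum(1 for g in G if X <= context[g]) / len(G)

def is_minimal_generator(X):
    """X is a minimal generator iff supp(X) differs from supp(X \\ {x}) for every x in X."""
    return all(supp(X - {x}) != supp(X) for x in X)

print(is_minimal_generator({"b", "c"}))   # True:  supp = 1/3 differs from supp({b}) and supp({c})
print(is_minimal_generator({"b", "e"}))   # False: supp = 2/3 equals supp({b})
```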

TITANIC
1. How can the closure of an itemset be determined based on supports only? X'' = X ∪ { x ∈ M \ X | supp(X) = supp(X ∪ {x}) }
2. How can the closure system be computed while determining as few closures as possible? Approach à la Apriori.
3. How can as many supports as possible be derived from already known supports?

3. How can as many supports as possible be derived from already known supports?
Theorem: If X is not a minimal generator, then supp(X) = min { supp(K) | K is a minimal generator, K ⊆ X }.
Example: supp({a, b, c}) = min { 0/3, 1/3, 1/3, 2/3, 2/3 } = 0, since the set is not a minimal generator, and since supp({a, b}) = 0/3, supp({b, c}) = 1/3, supp({a}) = 1/3, supp({b}) = 2/3, supp({c}) = 2/3. [Cross table of the running example with objects 1, 2, 3 and attributes a, b, c, e.]
Remark: It is sufficient to check the maximal such generators K ⊆ X, i.e. here {a, b} and {b, c}.
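The following minimal sketch only verifies this identity on the same hypothetical toy context; in TITANIC itself the generator supports are already known from the counting runs, so no new database pass is needed. The helper names are assumptions of the illustration.

```python
from itertools import combinations

context = {1: {"a", "c"}, 2: {"b", "c", "e"}, 3: {"b", "e"}}
G = set(context)

def supp(X):
    return sum(1 for g in G if X <= context[g]) / len(G)

def is_minimal_generator(X):
    return all(supp(X - {x}) != supp(X) for x in X)

def derived_supp(X):
    """Support of a non-generator X as the minimum over its minimal generators."""
    gens = [set(K) for r in range(len(X) + 1) for K in combinations(sorted(X), r)
            if is_minimal_generator(set(K))]
    return min(supp(K) for K in gens)

print(derived_supp({"a", "b", "c"}), supp({"a", "b", "c"}))   # both 0.0: the identity holds here
```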

TITANIC
1. How can the closure of an itemset be determined based on supports only? X'' = X ∪ { x ∈ M \ X | supp(X) = supp(X ∪ {x}) }
2. How can the closure system be computed while determining as few closures as possible? Approach à la Apriori.
3. How can as many supports as possible be derived from already known supports? If X is not a minimal generator, then supp(X) = min { supp(K) | K is a minimal generator, K ⊆ X }.

[Flowchart: the TITANIC algorithm, an Apriori-like approach.]
i := 1; C_i := {singletons}.
Determine the support for all X ∈ C_i (à la Apriori): for potential minimal generators, count in the database; else supp(X) = min { supp(K) | K ⊆ X, K minimal generator }.
Determine the closures for all X ∈ C_{i−1}: X'' = X ∪ { x ∈ M \ X | supp(X) = supp(X ∪ {x}) }.
Prune non-minimal generators from C_i: keep X iff supp(X) ≠ supp(X \ {x}) for all x ∈ X.
i := i + 1; C_i := Generate_Candidates(C_{i−1}) (à la Apriori).
If C_i is empty, stop; otherwise continue with the support determination step.

TITANIC compared with Apriori. [Flowchart as before: i := 1; C_i := {singletons}; determine the support for all X ∈ C_i; determine the closures for all X ∈ C_{i−1}; prune non-minimal generators from C_i; i := i + 1; C_i := Generate_Candidates(C_{i−1}); stop when C_i is empty.] The differences to Apriori: a candidate is pruned if its support is too low or equal to the support of one of its lower covers, and we only generate candidates from minimal generators.
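Putting the three answers together gives the level-wise loop of the flowchart. The sketch below is an illustration only, not the original TITANIC implementation: the toy context is hypothetical, frequency pruning by minsupp is left out for brevity, and supports that were never counted are derived with the rule from question 3.

```python
from itertools import combinations

context = {1: {"a", "c"}, 2: {"b", "c", "e"}, 3: {"b", "e"}}   # hypothetical toy context
G, M = set(context), frozenset("abce")

def count_supp(X):                                   # one counting pass per level in TITANIC
    return sum(1 for g in G if X <= context[g]) / len(G)

supports = {frozenset(): 1.0}                        # support of the empty set
generators = [frozenset()]                           # the empty set is always a minimal generator
candidates = [frozenset({m}) for m in M]             # level 1 candidates: the singletons

while candidates:
    k = len(candidates[0])
    for X in candidates:                             # counting run for this level
        supports[X] = count_supp(X)
    level_gens = [X for X in candidates              # minimal-generator test (question 2)
                  if all(supports[X] != supports[X - {x}] for x in X)]
    generators += level_gens
    gen_set = set(level_gens)
    candidates = list({X | Y for X in level_gens for Y in level_gens   # à la Apriori
                       if len(X | Y) == k + 1
                       and all(frozenset(S) in gen_set
                               for S in combinations(X | Y, k))})

def derived_supp(X):                                 # question 3, for sets never counted
    return supports[X] if X in supports else min(
        supports[K] for K in generators if K <= X)

closed_intents = {K | {m for m in M - K                              # question 1: closures
                       if derived_supp(K | {m}) == supports[K]}
                  for K in generators}
for intent in sorted(closed_intents, key=sorted):
    print(sorted(intent))
```

On the toy context the sketch counts supports in two level-wise runs and prints the six closed itemsets (concept intents) as closures of the minimal generators.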


Example of TITANIC. [Figures: step-by-step worked example of TITANIC.]


TITANIC vs. Next-Closure
Next-Closure needs almost no memory.
Next-Closure can exploit known symmetries between attributes.
Next-Closure can be used for knowledge acquisition.
TITANIC has far better performance, especially on large data sets.