Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology, Department of Computer Science

Similar documents
1 Frequent Pattern Mining

CS 484 Data Mining. Association Rule Mining 2

Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion

Cal Poly CSC 466: Knowledge Discovery from Data. Alexander Dekhtyar

Assignment 7 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran

732A61/TDDD41 Data Mining - Clustering and Association Analysis

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti

Association Rule Mining on Web

Association Rule. Lecturer: Dr. Bo Yuan

Association Rules. Fundamentals

D B M G Data Base and Data Mining Group of Politecnico di Torino

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci

Handling a Concept Hierarchy

D B M G. Association Rules. Fundamentals. Association rules. Association rule mining. Definitions. Rule quality metrics: example

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

D B M G. Association Rules. Fundamentals. Elena Baralis, Silvia Chiusano. Politecnico di Torino. Definitions.

CS 584 Data Mining. Association Rule Mining 2

EFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS

Chapters 6 & 7, Frequent Pattern Mining

Data Mining: Concepts and Techniques. (3rd ed.) Chapter 6

FP-growth and PrefixSpan

Data Mining Concepts & Techniques

Association Rules. Acknowledgements. Some parts of these slides are modified from C. Clifton & W. Aref, Purdue University

CSE 5243 INTRO. TO DATA MINING

Data Mining and Analysis: Fundamental Concepts and Algorithms

Privacy-preserving Data Mining

Data Analytics Beyond OLAP. Prof. Yanlei Diao

DATA MINING LECTURE 3. Frequent Itemsets Association Rules

COMP 5331: Knowledge Discovery and Data Mining

Lecture Notes for Chapter 6. Introduction to Data Mining

COMP 5331: Knowledge Discovery and Data Mining

CSE-4412(M) Midterm. There are five major questions, each worth 10 points, for a total of 50 points. Points for each sub-question are as indicated.

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

DATA MINING - 1DL105, 1DL111

Data Warehousing & Data Mining

MIST.6060 Business Intelligence and Data Mining. Association Rules

Data Warehousing & Data Mining

Sequential Pattern Mining

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

CS4445 B10 Homework 4 Part I Solution

CS5112: Algorithms and Data Structures for Applications

A New Algorithm for Generating Frequent Closed Itemsets

Density-Based Clustering

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG

DATA MINING - 1DL360

Unit II Association Rules

Machine Learning: Pattern Mining

Discovering Threshold-based Frequent Closed Itemsets over Probabilistic Data

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Basic Data Structures and Algorithms for Data Profiling Felix Naumann

Data Warehousing. Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Association Analysis. Part 1

CS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014

Frequent Itemset Mining

Summary. 8. Business Intelligence. 8.1 BI Overview

Mining Infrequent Patterns

DATA MINING LECTURE 4. Frequent Itemsets, Association Rules Evaluation Alternative Algorithms

CS 412 Intro. to Data Mining

Meelis Kull - Autumn. MTAT Data Mining - Lecture 05

Quantitative Association Rule Mining on Weighted Transactional Data

Exercise 1. min_sup = 0.3. Items support Type a 0.5 C b 0.7 C c 0.5 C d 0.9 C e 0.6 F

Introduction to Data Mining, 2nd Edition, by Tan, Steinbach, Karpatne, Kumar

Zijian Zheng, Ron Kohavi, and Llew Mason. Blue Martini Software San Mateo, California

Association Analysis Part 2. FP Growth (Pei et al 2000)

Data mining, 4 cu Lecture 5:

DATA MINING LECTURE 4. Frequent Itemsets and Association Rules

Data Mining and Knowledge Discovery. Petra Kralj Novak. 2011/11/29

Association Rule Mining

An Improved Apriori Algorithm for Association Rules

Association Analysis: Basic Concepts. and Algorithms. Lecture Notes for Chapter 6. Introduction to Data Mining

Lecture Notes for Chapter 6. Introduction to Data Mining. (modified by Predrag Radivojac, 2017)

Association Rules. Jones & Bartlett Learning, LLC

Introduction to Data Mining

1. Data summary and visualization

Model-based probabilistic frequent itemset mining

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

Data Mining Concepts & Techniques

Association Analysis. Part 2

Unsupervised Learning. k-means Algorithm

Approximate counting: count-min data structure. Problem definition

Data mining, 4 cu Lecture 7:

Mining Positive and Negative Fuzzy Association Rules

Mining Frequent Itemsets in Correlated Uncertain Databases

Knowledge Discovery and Data Mining I

15 Introduction to Data Mining

Descriptive data analysis. Data Mining MTAT (6EAP)

The Market-Basket Model. Association Rules. Example. Support. Applications --- (1) Applications --- (2)

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise

Formal Concept Analysis

Data Mining and Analysis: Fundamental Concepts and Algorithms

Course Content. Association Rules Outline. Chapter 6 Objectives. Chapter 6: Mining Association Rules. Dr. Osmar R. Zaïane. University of Alberta

Università di Pisa A.A Data Mining II June 13th, < {A} {B,F} {E} {A,B} {A,C,D} {F} {B,E} {C,D} > t=0 t=1 t=2 t=3 t=4 t=5 t=6 t=7

Application of Apriori Algorithm in Open Experiment

A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border

Fast Approximation of Probabilistic Frequent Closed Itemsets

Removing trivial associations in association rule discovery

Frequent Itemset Mining

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Transcription:

Data Mining. Dr. Raed Ibraheem Hamed, University of Human Development, College of Science and Technology, Department of Computer Science, 2016-2017

Road map:
- The Apriori algorithm
- Step 1: Mining all frequent itemsets
- Definition of the Apriori algorithm
- Definition (contd.)
- Steps to perform the Apriori algorithm
- The Apriori algorithm: Example 1
- The Apriori algorithm: Example 2
- Step 1: Generating the 1-itemset frequent pattern
- Step 2: Generating the 2-itemset frequent pattern
- Step 3: Generating the 3-itemset frequent pattern
- Step 4: Generating the 4-itemset frequent pattern
- Step 5: Generating association rules from frequent itemsets

The Apriori algorithm. Key concepts:
1. Frequent itemsets: the sets of items that have minimum support (denoted Li for the i-th itemset).
2. Apriori property: any subset of a frequent itemset must itself be frequent.
3. Join operation: to find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself.

Definition (contd.) Apriori uses a "bottom-up" approach: frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
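The slides give no code, but the loop just described is compact enough to sketch. Below is a minimal, illustrative Python version; the names (apriori, min_sup_count) are ours, not from the lecture, and each transaction is assumed to be a (frozen)set of items.

```python
from itertools import combinations

def apriori(transactions, min_sup_count):
    """Bottom-up search: extend frequent itemsets one item at a time."""
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets with their support counts
    L = {frozenset([i]): sum(1 for t in transactions if i in t) for i in items}
    L = {s: c for s, c in L.items() if c >= min_sup_count}
    frequent = dict(L)
    k = 2
    while L:
        # Join: union pairs of frequent (k-1)-itemsets into k-itemsets
        cands = {a | b for a in L for b in L if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        cands = {c for c in cands
                 if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Test the surviving candidates against the data (one scan per level)
        counts = {c: sum(1 for t in transactions if c <= t) for c in cands}
        L = {s: n for s, n in counts.items() if n >= min_sup_count}
        frequent.update(L)
        k += 1
    return frequent
```

Rescanning the whole database at every level mirrors the slides' presentation; production implementations avoid repeated scans with hash trees or vertical data layouts.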

Steps to Perform the Apriori Algorithm (figure)

The Apriori Algorithm: Example. Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e., min_sup = 2/9 ≈ 22%) and the minimum confidence required is 70%. We first find the frequent itemsets using the Apriori algorithm; association rules are then generated from them using the minimum support and minimum confidence.

TID   List of Items
T100  I1, I2, I5
T200  I2, I4
T300  I2, I3
T400  I1, I2, I4
T500  I1, I3
T600  I2, I3
T700  I1, I3
T800  I1, I2, I3, I5
T900  I1, I2, I3
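To follow along in code, here is the same database as plain Python data. This representation (strings for items, a set per transaction) and the constant names are our choices, not the slides'.

```python
# The nine transactions of D; the trailing comments give the slide's TIDs.
D = [
    {"I1", "I2", "I5"},        # T100
    {"I2", "I4"},              # T200
    {"I2", "I3"},              # T300
    {"I1", "I2", "I4"},        # T400
    {"I1", "I3"},              # T500
    {"I2", "I3"},              # T600
    {"I1", "I3"},              # T700
    {"I1", "I2", "I3", "I5"},  # T800
    {"I1", "I2", "I3"},        # T900
]
MIN_SUP_COUNT = 2    # absolute minimum support count
MIN_CONF = 0.70      # minimum confidence of 70%
print(MIN_SUP_COUNT / len(D))  # 0.222... -> the 22% relative support
```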

Step 1: Generating the 1-itemset Frequent Pattern

Scan D for the count of each candidate 1-itemset (C1):

Itemset  Sup. count
{I1}     6
{I2}     7
{I3}     6
{I4}     2
{I5}     2

In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The set of frequent 1-itemsets, L1, consists of the candidates whose support count meets the minimum support count; here every candidate qualifies, so L1 = C1.
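This step is a single counting pass over D. A small sketch (names ours) whose output matches the table above:

```python
from collections import Counter

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]

# C1: one scan of D counts every individual item
c1 = Counter(item for t in D for item in t)
# L1: keep the candidates meeting the minimum support count of 2
l1 = {item: n for item, n in c1.items() if n >= 2}
print(sorted(l1.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```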

Step 2: Generating the 2-itemset Frequent Pattern

Generate C2 candidates from L1 (L1 join L1):
{I1, I2}, {I1, I3}, {I1, I4}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}, {I3, I4}, {I3, I5}, {I4, I5}

Scan D for the count of each candidate (C2):

Itemset   Sup. count
{I1, I2}  4
{I1, I3}  4
{I1, I4}  1
{I1, I5}  2
{I2, I3}  4
{I2, I4}  2
{I2, I5}  2
{I3, I4}  0
{I3, I5}  1
{I4, I5}  0

Compare each candidate's support count with the minimum support count (L2):

Itemset   Sup. count
{I1, I2}  4
{I1, I3}  4
{I1, I5}  2
{I2, I3}  4
{I2, I4}  2
{I2, I5}  2

Step 2: Generating the 2-itemset Frequent Pattern [cont.] To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 join L1 to generate a candidate set of 2-itemsets, C2. Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is accumulated (the middle table above). The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support. Note: we have not used the Apriori property yet.
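The join-scan-filter sequence just described, written out as a sketch (variable names ours):

```python
from itertools import combinations

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]
l1 = ["I1", "I2", "I3", "I4", "I5"]   # frequent 1-itemsets from Step 1

# L1 join L1: every pair of frequent items is a candidate 2-itemset (C2)
c2 = [frozenset(p) for p in combinations(l1, 2)]
# Scan D once, accumulating each candidate's support count
support = {c: sum(1 for t in D if c <= t) for c in c2}
# L2: candidates whose count meets the minimum support count of 2
l2 = {c: n for c, n in support.items() if n >= 2}
print({tuple(sorted(c)): n for c, n in l2.items()})
# {('I1','I2'): 4, ('I1','I3'): 4, ('I1','I5'): 2,
#  ('I2','I3'): 4, ('I2','I4'): 2, ('I2','I5'): 2}
```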

Step 3: Generating the 3-itemset Frequent Pattern

The generation of the set of candidate 3-itemsets, C3, involves use of the Apriori property. To find C3, we compute L2 join L2:

C3 = L2 join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}

The join step is now complete, and the prune step is used to reduce the size of C3; pruning helps avoid heavy computation due to a large Ck. After pruning (next slide), D is scanned for the count of each remaining candidate:

Itemset       Sup. count
{I1, I2, I3}  2
{I1, I2, I5}  2

Both candidates meet the minimum support count, so L3 = {{I1, I2, I3}, {I1, I2, I5}}.
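The join can be reproduced directly: unioning pairs of frequent 2-itemsets and keeping only the unions of size 3 yields exactly the six candidates listed. A sketch, assuming l2 holds L2 as frozensets:

```python
l2 = [frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]]
# L2 join L2: unions of two frequent 2-itemsets that form a 3-itemset
joined = {a | b for a in l2 for b in l2 if len(a | b) == 3}
print(sorted(tuple(sorted(s)) for s in joined))
# [('I1','I2','I3'), ('I1','I2','I5'), ('I1','I3','I5'),
#  ('I2','I3','I4'), ('I2','I3','I5'), ('I2','I4','I5')]
```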

Step 3: Generating the 3-itemset Frequent Pattern [cont.] Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that four candidates cannot possibly be frequent. How? For example, take {I1, I2, I3}. Its 2-item subsets are {I1, I2}, {I1, I3}, and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3. Now take {I2, I3, I5}, which shows how pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5}, and {I3, I5}. But {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori property, so we remove {I2, I3, I5} from C3. Therefore, after checking every member of the join result, C3 = {{I1, I2, I3}, {I1, I2, I5}}. The transactions in D are then scanned to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
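The prune test described above is mechanical enough to code. A sketch; has_infrequent_subset is our own helper name, not from the slides:

```python
from itertools import combinations

# L2 from Step 2, as frozensets
l2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}

def has_infrequent_subset(candidate, prev_frequent, k):
    """Apriori prune: a k-candidate survives only if every one of its
    (k-1)-subsets is frequent."""
    return any(frozenset(s) not in prev_frequent
               for s in combinations(candidate, k - 1))

# The six 3-itemsets produced by L2 join L2, as listed above
joined = [{"I1", "I2", "I3"}, {"I1", "I2", "I5"}, {"I1", "I3", "I5"},
          {"I2", "I3", "I4"}, {"I2", "I3", "I5"}, {"I2", "I4", "I5"}]
c3 = [c for c in joined if not has_infrequent_subset(c, l2, 3)]
print([sorted(c) for c in c3])
# [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']] -- the other four are pruned
```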

Step 4: Generating the 4-itemset Frequent Pattern. The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned because its subset {I2, I3, I5} is not frequent. Thus C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm. What's next? These frequent itemsets are used to generate strong association rules, i.e., rules satisfying both minimum support and minimum confidence.
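Step 5 (rule generation) appears in the road map but is not detailed on these slides; the following sketch shows the standard procedure. For each frequent itemset we emit every rule A -> (itemset minus A) whose confidence sup(itemset)/sup(A) meets the 70% threshold from the example; the helper names are ours.

```python
from itertools import combinations

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]

def sup(itemset):
    """Support count of an itemset in D."""
    return sum(1 for t in D if itemset <= t)

def rules_from(itemset, min_conf):
    """Yield A -> (itemset - A) for every non-empty proper subset A
    whose confidence sup(itemset) / sup(A) reaches min_conf."""
    for r in range(1, len(itemset)):
        for a in map(frozenset, combinations(itemset, r)):
            conf = sup(itemset) / sup(a)
            if conf >= min_conf:
                yield sorted(a), sorted(itemset - a), conf

for lhs, rhs, conf in rules_from(frozenset({"I1", "I2", "I5"}), 0.70):
    print(f"{lhs} -> {rhs}  (confidence {conf:.0%})")
```

Run on the frequent itemset {I1, I2, I5}, this prints the three rules that reach 100% confidence: {I5} -> {I1, I2}, {I1, I5} -> {I2}, and {I2, I5} -> {I1}.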

The Apriori Algorithm: Example 2 (min support = 2)

Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D for C1:
Itemset  Sup.
{1}      2
{2}      3
{3}      3
{4}      1
{5}      3

C1 to L1 ({4} is dropped):
Itemset  Sup.
{1}      2
{2}      3
{3}      3
{5}      3

L1 join L1 gives C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D for C2 counts:
Itemset  Sup.
{1 2}    1
{1 3}    2
{1 5}    1
{2 3}    2
{2 5}    3
{3 5}    2

C2 to L2:
Itemset  Sup.
{1 3}    2
{2 3}    2
{2 5}    3
{3 5}    2

L2 join L2 gives C3: {2 3 5}

Scan D for L3:
Itemset  Sup.
{2 3 5}  2
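The end result of this walkthrough can be spot-checked with two one-line support counts (a quick sanity check of ours, not part of the slide):

```python
D2 = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
# The final frequent 3-itemset {2, 3, 5} occurs in transactions 200 and 300:
print(sum(1 for t in D2 if {2, 3, 5} <= t))  # 2 -> meets min support = 2
# Item 4 occurs only in transaction 100, so it is dropped from L1:
print(sum(1 for t in D2 if 4 in t))          # 1 -> infrequent
```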

Apriori algorithm example (figure)
