Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion

Similar documents
Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science

DATA MINING LECTURE 3. Frequent Itemsets Association Rules

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti

CS 484 Data Mining. Association Rule Mining 2

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

732A61/TDDD41 Data Mining - Clustering and Association Analysis

Data Analytics Beyond OLAP. Prof. Yanlei Diao

The Market-Basket Model. Association Rules. Example. Support. Applications --- (1) Applications --- (2)

CSE 5243 INTRO. TO DATA MINING

Association Rules. Fundamentals

Association Rule Mining on Web

Association Rule. Lecturer: Dr. Bo Yuan. LOGO

Associa'on Rule Mining

CS5112: Algorithms and Data Structures for Applications

D B M G Data Base and Data Mining Group of Politecnico di Torino

Data Mining Concepts & Techniques

D B M G. Association Rules. Fundamentals. Fundamentals. Association rules. Association rule mining. Definitions. Rule quality metrics: example

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

D B M G. Association Rules. Fundamentals. Fundamentals. Elena Baralis, Silvia Chiusano. Politecnico di Torino 1. Definitions.

Introduction to Data Mining

CSE 5243 INTRO. TO DATA MINING

Machine Learning: Pattern Mining

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci

Chapters 6 & 7, Frequent Pattern Mining

Lecture Notes for Chapter 6. Introduction to Data Mining

COMP 5331: Knowledge Discovery and Data Mining

Frequent Itemset Mining

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG

Association Rules. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION

DATA MINING LECTURE 4. Frequent Itemsets and Association Rules

.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

10/19/2017 MIST.6060 Business Intelligence and Data Mining 1. Association Rules

EFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS

Basic Data Structures and Algorithms for Data Profiling Felix Naumann

1 Frequent Pattern Mining

DATA MINING LECTURE 4. Frequent Itemsets, Association Rules Evaluation Alternative Algorithms

Un nouvel algorithme de génération des itemsets fermés fréquents

DATA MINING - 1DL360

Assignment 7 (Sol.) Introduction to Data Analytics Prof. Nandan Sudarsanam & Prof. B. Ravindran

DATA MINING - 1DL360

12 Count-Min Sketch and Apriori Algorithm (and Bloom Filters)

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05

Handling a Concept Hierarchy

Density-Based Clustering

Association Analysis. Part 1

CS 584 Data Mining. Association Rule Mining 2

Unit II Association Rules

Frequent Itemset Mining

Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Data Warehousing & Data Mining

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

Data Mining and Analysis: Fundamental Concepts and Algorithms

COMP 5331: Knowledge Discovery and Data Mining

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #11: Frequent Itemsets

Data Warehousing & Data Mining

Data preprocessing. DataBase and Data Mining Group 1. Data set types. Tabular Data. Document Data. Transaction Data. Ordered Data

Sequential Pattern Mining

Data mining, 4 cu Lecture 5:

Association Analysis: Basic Concepts. and Algorithms. Lecture Notes for Chapter 6. Introduction to Data Mining

Lecture Notes for Chapter 6. Introduction to Data Mining. (modified by Predrag Radivojac, 2017)

Mining Infrequent Patter ns

CS 412 Intro. to Data Mining

sor exam What Is Association Rule Mining?

Mining Positive and Negative Fuzzy Association Rules

Course Content. Association Rules Outline. Chapter 6 Objectives. Chapter 6: Mining Association Rules. Dr. Osmar R. Zaïane. University of Alberta 4

Frequent Itemset Mining

Statistical Privacy For Privacy Preserving Information Sharing

CPDA Based Fuzzy Association Rules for Learning Achievement Mining

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

OPPA European Social Fund Prague & EU: We invest in your future.

Knowledge Discovery and Data Mining I

1. Data summary and visualization

Association Analysis Part 2. FP Growth (Pei et al 2000)

DATA MINING - 1DL105, 1DL111

Data Warehousing. Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Descrip9ve data analysis. Example. Example. Example. Example. Data Mining MTAT (6EAP)

Mining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany

NetBox: A Probabilistic Method for Analyzing Market Basket Data

Summary. 8.1 BI Overview. 8. Business Intelligence. 8.1 BI Overview. 8.1 BI Overview 12/17/ Business Intelligence

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Inter-transactional association rules for multi-dimensional contexts for prediction and their application to studying meteorological data

FP-growth and PrefixSpan

Descrip<ve data analysis. Example. E.g. set of items. Example. Example. Data Mining MTAT (6EAP)

Privacy-preserving Data Mining

Descriptive data analysis. E.g. set of items. Example: Items in baskets. Sort or renumber in any way. Example

Temporal Data Mining

Quantitative Association Rule Mining on Weighted Transactional Data

Approximate counting: count-min data structure. Problem definition

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

STUDY ON SCALE TRANSFORMATION IN SPATIAL DATA MINING

BAYESIAN MIXTURE MODELS FOR FREQUENT ITEMSET MINING

Chapter 4: Frequent Itemsets and Association Rules

CSE-4412(M) Midterm. There are five major questions, each worth 10 points, for a total of 50 points. Points for each sub-question are as indicated.

Algorithmic Methods of Data Mining, Fall 2005, Course overview 1. Course overview

Transcription:

Outline Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Introduction Algorithm Apriori Algorithm AprioriTid Comparison of Algorithms Conclusion Presenter: Dan Li Discussion: Joel Lanir 1 2 Data Mining What is data mining? Tell me what I want to know. Example: The legend of diaper and beer Applications of Data Mining Market basket analysis Market segmentation Customer churn Fraud detection Direct marketing Interactive marketing Trend analysis 3 4 Association Rule Example 90% of transactions that purchase diapers also purchase beer. 10% of all transactions purchase diaper and/or beer. Formal Definition X Y, where X I, Y I and X Y = Confidence c, c% of transactions that contain X also contain Y Support s, s% of all transactions contain X Y Discussion When looking for association rules, when would you consider support and when would you consider confidence? What considerations should be made when setting the value of both? For what applications would you set a high confidence value? A high support value? 5 6 1

Association Rule Mining The problem of finding association rules falls within the purview of database mining. Eg1: Find all rules that have Coke as consequent to boost the sale of Coke. Eg2: Find all rules relating items located on shelves A and B in the store. Finding association rules is valuable for Cross-marketing Catalog design Add-on sales Store layout and so on Problem Decomposition Mining association rules can be decomposed into two sub-problems: Discover large itemsets. Use the large itemsets to generate the desired rules. 7 8 Discovering Large Itemsets Algorithm Apriori Intuition: any subset of a large itemset must be large. Algorithms for discovering large itemsets make multiple passes over the data. In the first pass, determine which individual item is large. In each sequent pass, Previous large itemsets are used to generate candidate itemsets. Count actual support for the candidate itemsets. Determine which are the real large itemsets. This process continues until no new large itemsets are found. 9 10 Algorithm Apriori The apriori-gen function has two steps Join Prune Efficiently Implementing subset functions The hash tree Example for Apriori Database L1 C2 from apriori-gen TID Items Itemset Support Itemset Support 100 1 3 4 {1} 2 {1 2} 0 200 2 3 5 {2} 3 {1 3} 0 300 1 2 3 5 {3} 3 {1 5} 0 400 2 5 {5} 3 {2 3} 0 {2 5} 0 minsup = 2 {3 5} 0 11 12 2

Example for Apriori Example for Apriori t Ct C2 L2 {1 3 4} {1 3} Itemset Support Itemset Support {2 3 5} {2 3} {2 5} {3 5} {1 2} 1 {1 3} 2 {1 2 3 5} {1 2} {1 3} {1 5} {2 3} {2 5} {3 5} {1 3} 2 {2 3} 2 {2 5} {2 5} {1 5} 1 {2 5} 3 {2 3} 2 {3 5} 2 {2 5} 3 {3 5} 2 C3 L3 C4 L4 Itemset Support Itemset Support Itemset Support Itemset Support {3 5} 2 {1 3} 2 Answer = k L k 13 14 Algorithm AprioriTid ( {1 ( 3} {1 3} {3} {3} t, {1 t, 3} {1 3} {1} {1} t ) t ) 15 16 17 18 3

Answer = k L k 19 20 Comparison of Different Algorithms Comparison of Different Algorithms In the later passes, the size of C k-1 is smaller than the size of database. AprioriTid becomes more efficient than Apriori. Example: L 3 = {{1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4}} t = {1 2 3 4 5} D Apriori/AprioriTid: After join: C 4 = {{1 2 3 4}, {1 3 4 5}} After prune C 4 = {{1 2 3 4}} AIS/SETM C 4 = {{1 2 3 4}, {1 2 3 5}, {1 2 4 5}, {1 3 4 5}, {2 3 4 5}} AprioriHybrid: Run Apriori for a while, then switch to AprioriTid when the generated DB would fit in memory. 21 22 Performance Apriori/AprioriTid outperforms AIS/SETM AprioriHybrid matches Apriori and AprioriTid when either one wins. Later Work Parallel versions Quantitative association rules E.g., "10% of married people between age 50 and 60 have at least 2 cars." Online association rules 23 24 4

Conclusion Two new algorithms, Apriori and AprioriTid are discussed. These algorithms outperform AIS and SETM. Apriori and AprioriTid can be combined into AprioriHybird. AprioriHybrid matches Apriori and AprioriTid when either one wins. Discussion This paper has spun off more similar algorithms in the database world than any other data mining algorithm. (866 citations in CiteSeer) Why do you think this is the case? Is it the algorithm? The problem? The approach? Something else? 25 26 5