Data Warehousing & Data Mining


Data Warehousing & Data Mining
Wolf-Tilo Balke, Kinda El Maarry
Institut für Informationssysteme, Technische Universität Braunschweig

9. Business Intelligence
9.1 Business Intelligence Overview
9.2 Principles of Data Mining
9.3 Association Rule Mining

9.1 BI Overview
What is Business Intelligence (BI)?
- The process, technologies and tools needed to turn data into information, information into knowledge, and knowledge into plans that drive profitable business action
- BI comprises data warehousing, business analytic tools, and content/knowledge management

9.1 BI Overview
Typical BI applications are
- Customer segmentation
- Propensity to buy (customer disposition to buy)
- Customer profitability
- Fraud detection
- Customer attrition (loss of customers)
- Channel optimization (connecting with the customer)

9.1 BI Overview
Customer segmentation
- What market segments do my customers fall into, and what are their characteristics?
- Personalize customer relationships for higher customer satisfaction and retention

9.1 BI Overview
Propensity to buy
- Which customers are most likely to respond to my promotion?
- Target the right customers
- Increase campaign profitability by focusing on the customers most likely to buy

9.1 BI Overview
Customer profitability
- What is the lifetime profitability of my customer?
- Make individual business interaction decisions based on the overall profitability of customers

9.1 BI Overview
Fraud detection
- How can I tell which transactions are likely to be fraudulent?
- If your wife has just proposed to increase your life insurance policy, you should probably order pizza for a while
- Quickly detect fraud and take immediate action to minimize damage

9.1 BI Overview
Customer attrition
- Which customers are at risk of leaving?
- Prevent loss of high-value customers and let go of lower-value customers
Channel optimization
- What is the best channel to reach my customers in each segment?
- Interact with customers based on their preferences and your need to manage cost

9.1 BI Overview
Automated decision tools
- Rule-based systems that provide a solution, usually in one functional area, to a specific repetitive management problem in one industry
- E.g., automated loan approval, intelligent price setting
Business performance management (BPM)
- A framework for defining, implementing and managing an enterprise's business strategy by linking objectives with factual measures, i.e., key performance indicators

9.1 BI Overview
Dashboards
- Provide a comprehensive visual view of corporate performance measures, trends, and exceptions from multiple business areas
- Allow executives to see hot spots in seconds and explore the situation

9.2 Data Mining
What is data mining (knowledge discovery in databases)?
- Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

9.2 Applications
Market analysis
- Targeted marketing/customer profiling: find clusters of model customers who share the same characteristics: interests, income level, spending habits, etc.
- Determine customer purchasing patterns over time
- Cross-market analysis: associations/correlations between product sales; prediction based on the association of information

9.2 Applications
Corporate analysis and risk management
- Finance planning and asset evaluation: cash flow analysis and prediction; trend analysis, time series, etc.
- Resource planning: summarize and compare the resources and spending
- Competition: monitor competitors and market directions; group customers into classes and use a class-based pricing procedure; set a pricing strategy in a highly competitive market

9.2 Data Mining
Architecture of DM systems (layers, top to bottom):
- Graphical user interface
- Pattern evaluation
- Data mining engine (backed by a knowledge base)
- Database or data warehouse server (filtering)
- Data sources: databases and the data warehouse, fed via ETL

9.2 Data Mining Techniques
Association (correlation and causality)
- Multi-dimensional vs. single-dimensional association
- age(X, …) ∧ income(X, …) ⇒ buys(X, "PC") [support = 2%, confidence = 60%]
- contains(T, "computer") ⇒ contains(T, "software") [1%, 75%]
Classification and prediction
- Finding models (functions) that describe and distinguish classes or concepts for future predictions
- Presentation: decision tree, classification rules, neural network
- Prediction: predict some unknown or missing numerical values

9.2 Data Mining Techniques
Cluster analysis
- Class label is unknown: group data to form new classes, e.g., advertising based on client groups
- Clustering principle: maximize the intra-class similarity and minimize the inter-class similarity
Outlier analysis
- Outlier: a data object that does not comply with the general behavior of the data
- Can be considered noise or an exception, but is quite useful in fraud detection and rare-event analysis

9.3 Association Rule Mining
Association rule mining has the objective of finding all co-occurrence relationships (called associations) among data items
- Classical application: market basket analysis, which aims to discover how items are purchased by customers in a supermarket
- E.g., Cheese ⇒ Wine [support = 10%, confidence = 80%] means that 10% of the customers buy cheese and wine together, and 80% of the customers buying cheese also buy wine

9.3 Association Rule Mining
Basic concepts of association rules
- Let I = {i1, i2, …, im} be a set of items and T = {t1, t2, …, tn} a set of transactions, where each transaction ti is a set of items such that ti ⊆ I
- An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I and X ∩ Y = ∅

9.3 Association Rule Mining
Market basket analysis example
- I: set of all items sold in a store, e.g., i1 = Beef, i2 = Chicken, i3 = Cheese, …
- T: set of transactions, i.e., the contents of customers' baskets, e.g., t1: Beef, Chicken, Milk; t2: Beef, Cheese; t3: Cheese, Wine; t4: …
- An association rule might be Beef, Chicken ⇒ Milk, where {Beef, Chicken} is X and {Milk} is Y

9.3 Association Rule Mining
Rules can be weak or strong
- The strength of a rule is measured by its support and confidence
- The support of a rule X ⇒ Y is the percentage of transactions in T that contain both X and Y
- It can be seen as an estimate of the probability Pr(X ∪ Y ⊆ ti)
- With n as the number of transactions in T, the support of the rule X ⇒ Y is: support = |{i : X ∪ Y ⊆ ti}| / n

9.3 Association Rule Mining
The confidence of a rule X ⇒ Y is the percentage of transactions in T containing X that also contain Y
- It can be seen as an estimate of the probability Pr(Y ⊆ ti | X ⊆ ti)
- confidence = |{i : X ∪ Y ⊆ ti}| / |{j : X ⊆ tj}|
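The two measures can be computed directly from their definitions. A minimal sketch in Python, using a small made-up basket set (not the lecture's running example):

```python
def support(X, Y, transactions):
    """Fraction of transactions containing every item of X ∪ Y."""
    xy = set(X) | set(Y)
    return sum(1 for t in transactions if xy <= t) / len(transactions)

def confidence(X, Y, transactions):
    """Among transactions containing X, the fraction also containing Y."""
    n_x = sum(1 for t in transactions if set(X) <= t)
    n_xy = sum(1 for t in transactions if (set(X) | set(Y)) <= t)
    return n_xy / n_x if n_x else 0.0

# Illustrative baskets (assumed data, chosen for the example)
baskets = [
    {"Cheese", "Wine", "Bread"},
    {"Cheese", "Wine"},
    {"Cheese", "Milk"},
    {"Bread", "Milk"},
]

# Cheese ⇒ Wine: together in 2 of 4 baskets; 2 of 3 cheese baskets have wine
print(support({"Cheese"}, {"Wine"}, baskets))      # 0.5
print(confidence({"Cheese"}, {"Wine"}, baskets))   # 0.666...
```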

9.3 Association Rule Mining
How do we interpret support and confidence?
- If support is too low, the rule may just occur due to chance; acting on a rule with low support may not be profitable since it covers too few cases
- If confidence is too low, we cannot reliably predict Y from X
- The objective of mining association rules is to discover all association rules in T that have support and confidence greater than minimum thresholds (minsup, minconf)!

9.3 Association Rule Mining
Finding rules based on support and confidence thresholds
- Let minsup = 30% and minconf = 80%
- Transactions: T1: Beef, Chicken, Milk; T2: Beef, Cheese; T3: Cheese, Boots; T4: Beef, Chicken, Cheese; T5: Beef, Chicken, Clothes, Cheese, Milk; T6: Clothes, Chicken, Milk; T7: Chicken, Milk, Clothes
- Chicken, Clothes ⇒ Milk is valid [sup = 3/7 (42.86%), conf = 3/3 (100%)]
- Clothes ⇒ Milk, Chicken is also valid, and there are more

9.3 Association Rule Mining
This is a rather simplistic view of shopping baskets
- Some important information is not considered, e.g., the quantity of each item purchased, the price paid, …
There is a large number of rule mining algorithms
- They use different strategies and data structures
- Their resulting sets of rules are all the same

9.3 Association Rule Mining
Approaches in association rule mining
- Apriori algorithm
- Mining with multiple minimum supports
- Mining class association rules
The best known mining algorithm is the Apriori algorithm
- Step 1: find all frequent itemsets (sets of items with support ≥ minsup)
- Step 2: use the frequent itemsets to generate rules

9.3 Apriori Algorithm: Step 1
Step 1: frequent itemset generation
- The key is the apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset
- E.g., for minsup = 30%, if {Chicken, Clothes, Milk} is frequent, then so are {Chicken, Clothes}, {Chicken, Milk}, {Clothes, Milk}, {Chicken}, {Clothes} and {Milk}
- Transactions: T1: Beef, Chicken, Milk; T2: Beef, Cheese; T3: Cheese, Boots; T4: Beef, Chicken, Cheese; T5: Beef, Chicken, Clothes, Cheese, Milk; T6: Clothes, Chicken, Milk; T7: Chicken, Milk, Clothes

9.3 Apriori Algorithm: Step 1
Finding frequent items
- Find all 1-item frequent itemsets, then all 2-item frequent itemsets, etc.
- In each iteration k, only consider itemsets that contain a frequent (k-1)-itemset
Optimization: the algorithm assumes that items are sorted in lexicographic order
- The order is used throughout the algorithm: each itemset {w[1], w[2], …, w[k]} represents a k-itemset w consisting of items w[1], w[2], …, w[k], where w[1] < w[2] < … < w[k] according to the lexicographic order

9.3 Finding frequent items
- Initial step: find the frequent itemsets of size 1: F1
- Generalization, k ≥ 2:
  - Ck = candidates of size k: those itemsets of size k that could be frequent, given Fk-1
  - Fk = those candidates that are actually frequent, Fk ⊆ Ck (needs one scan of the database)

9.3 Apriori Algorithm: Step 1
Candidate generation uses Fk-1 as input and returns a superset (the candidates) of the set of all frequent k-itemsets. It has two steps:
- Join step: generate all possible candidate itemsets Ck of length k, i.e., Ik = join(Ak-1, Bk-1) with Ak-1 = {i1, i2, …, ik-2, ik-1} and Bk-1 = {i1, i2, …, ik-2, i'k-1}, where ik-1 < i'k-1; then Ik = {i1, i2, …, ik-2, ik-1, i'k-1}
- Prune step: remove those candidates in Ck that do not respect the downward closure property, i.e., that include non-frequent (k-1)-subsets

9.3 Apriori Algorithm: Step 1
Candidate generation, join-step example
- F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}}
- Try joining each pair of itemsets from F3 that share their first two items:
  - {1, 2, 3} join {1, 2, 4} gives {1, 2, 3, 4}
  - {1, 3, 4} join {1, 3, 5} gives {1, 3, 4, 5}
  - all other pairs do not share their first k-1 = 2 items and are not joined

9.3 Apriori Algorithm: Step 1
After the join, C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
Pruning, with F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}}:
- {1, 2, 3, 4}: its 3-subsets {1, 2, 3}, {1, 2, 4}, {1, 3, 4} and {2, 3, 4} are all in F3, so it is a good candidate
- {1, 3, 4, 5}: {1, 4, 5} and {3, 4, 5} are not in F3, so it is removed from C4
After pruning, C4 = {{1, 2, 3, 4}}

9.3 Apriori Algorithm: Step 1
Finding frequent items, example with minsup = 0.5
- Transactions: T100: 1, 3, 4; T200: 2, 3, 5; T300: 1, 2, 3, 5; T400: 2, 5
- First scan of T ({item}:count): C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
- F1: {1}:2, {2}:3, {3}:3, {5}:3; {4} has a support of 1/4 < 0.5, so it does not belong to the frequent items
- C2 = prune(join(F1)); join: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}; pruning removes nothing (all 1-subsets belong to F1), so C2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}

9.3 Apriori Algorithm: Step 1
- Second scan of T: C2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
- F2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
- Join: {1,3} could only be joined with {1,4} or {1,5}, but they are not in F2; the only possible join in F2 is {2,3} with {2,5}, resulting in {2,3,5}
- prune({2,3,5}): {2,3}, {2,5}, {3,5} all belong to F2, hence C3: {2,3,5}
- Third scan of T: {2,3,5}:2, so sup({2,3,5}) = 50% and the minsup condition is fulfilled; thus F3: {2,3,5}
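The three scans above can be reproduced with a compact level-wise implementation. This is only a sketch, not the lecture's reference code; for brevity it joins candidates by pairwise union instead of sorted-prefix matching:

```python
from itertools import combinations

# The four transactions from the example (T100..T400), minsup = 0.5
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
minsup = 0.5

def frequent_itemsets(transactions, minsup):
    """Level-wise Apriori search: count candidates in one scan per level,
    keep the frequent ones, then join/prune to get the next candidates."""
    n = len(transactions)
    F = {}                                     # frozenset -> support
    level = [frozenset([i]) for i in set().union(*transactions)]
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        Fk = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
        if not Fk:
            break
        F.update(Fk)
        # join: unite pairs of frequent k-itemsets sharing k-1 items;
        # prune: every k-subset of a candidate must itself be frequent
        candidates = set()
        for a, b in combinations(Fk, 2):
            u = a | b
            if len(u) == k + 1 and all(
                    frozenset(s) in Fk for s in combinations(u, k)):
                candidates.add(u)
        level = list(candidates)
        k += 1
    return F

F = frequent_itemsets(transactions, minsup)
# F contains {2, 3, 5} with support 0.5, matching the example above
```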

9.3 Apriori Algorithm: Step 2
Step 2: generating rules from frequent itemsets
- Frequent itemsets are not the same as association rules
- One more step is needed to generate association rules: for each frequent itemset I and each proper nonempty subset X of I, let Y = I \ X; X ⇒ Y is an association rule if confidence(X ⇒ Y) ≥ minconf, where
- support(X ⇒ Y) = |{i : X ∪ Y ⊆ ti}| / n = support(I)
- confidence(X ⇒ Y) = |{i : X ∪ Y ⊆ ti}| / |{j : X ⊆ tj}| = support(I) / support(X)

9.3 Apriori Algorithm: Step 2
Rule generation example, minconf = 50%
- Suppose {2, 3, 5} is a frequent itemset with sup = 50%, as calculated in step 1
- Proper nonempty subsets: {2, 3}, {2, 5}, {3, 5}, {2}, {3}, {5}, with sup = 50%, 75%, 50%, 75%, 75%, 75% respectively
- These generate the following association rules:
  - 2, 3 ⇒ 5, confidence = 100% (sup(I) = 50%; sup({2,3}) = 50%; 50/50 = 1)
  - 2, 5 ⇒ 3, confidence = 67% (50/75)
  - 3, 5 ⇒ 2, confidence = 100% (50/50)
  - 2 ⇒ 3, 5, confidence = 67%
  - 3 ⇒ 2, 5, confidence = 67%
  - 5 ⇒ 2, 3, confidence = 67%
- All rules have support = support(I) = 50%
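Given the supports recorded during itemset generation, the six rules above can be enumerated mechanically. A sketch, where the support table is copied from the example and `rules_from_itemset` is a name chosen here, not from the lecture:

```python
from itertools import combinations

# Supports recorded during frequent-itemset generation (step 1)
sup = {
    frozenset({2}): 0.75, frozenset({3}): 0.75, frozenset({5}): 0.75,
    frozenset({2, 3}): 0.50, frozenset({2, 5}): 0.75, frozenset({3, 5}): 0.50,
    frozenset({2, 3, 5}): 0.50,
}

def rules_from_itemset(I, sup, minconf):
    """Emit X ⇒ I \\ X for every proper nonempty subset X of the frequent
    itemset I, keeping rules with confidence sup(I)/sup(X) >= minconf."""
    I = frozenset(I)
    rules = []
    for r in range(1, len(I)):
        for x in combinations(sorted(I), r):
            X = frozenset(x)
            conf = sup[I] / sup[X]
            if conf >= minconf:
                rules.append((X, I - X, sup[I], conf))
    return rules

rules = rules_from_itemset({2, 3, 5}, sup, minconf=0.5)
# All six candidate rules pass minconf = 50%; exactly two of them
# ({2,3} ⇒ {5} and {3,5} ⇒ {2}) have confidence 100%
```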

9.3 Apriori Algorithm: Step 2
Rule generation, summary
- In order to obtain X ⇒ Y, we need to know support(I) and support(X)
- All the information required for the confidence computation has already been recorded during itemset generation, so there is no need to read the transaction data again
- This step is not as time-consuming as frequent itemset generation

9.3 Apriori Algorithm
Apriori algorithm, summary
- If k is the size of the largest itemset, the algorithm makes at most k passes over the data (in practice, k is bounded, e.g., by 10)
- The mining exploits sparseness of data and high minsup and minconf thresholds
- A high minsup threshold makes it impossible to find rules involving rare items in the data; the solution is a mining with multiple minimum supports approach

9.3 Multiple Minimum Supports
Mining with multiple minimum supports
- A single minimum support assumes that all items in the data are of the same nature and/or have similar frequencies, which is incorrect
- In practice, some items appear very frequently in the data, while others rarely appear
- E.g., in a supermarket, people buy cooking pans much less frequently than they buy bread and milk

9.3 Multiple Minimum Supports
Rare item problem: if the frequencies of items vary significantly, we encounter two problems
- If minsup is set too high, rules that involve rare items will not be found
- To find rules that involve both frequent and rare items, minsup has to be set very low; this may cause a combinatorial explosion, because the frequent items will be associated with one another in all possible ways

9.3 Multiple Minimum Supports
Multiple minimum supports
- Each item can have its own minimum item support
- Different support requirements for different rules
- To prevent very frequent items and very rare items from appearing in the same itemset S, we introduce a support difference constraint φ: max{sup(i) : i ∈ S} − min{sup(i) : i ∈ S} ≤ φ, where 0 ≤ φ ≤ 1 is user specified

9.3 Multiple Minimum Supports
Minsup of a rule
- Let MIS(i) be the minimum item support (MIS) value of item i. The minsup of a rule R is the lowest MIS value of the items in the rule:
- Rule R: i1, i2, …, ik ⇒ ik+1, …, ir satisfies its minimum support if its actual support is ≥ min(MIS(i1), MIS(i2), …, MIS(ir))
- E.g., with user-specified MIS values MIS(bread) = 2%, MIS(shoes) = 0.1%, MIS(clothes) = 0.2%:
  - clothes ⇒ bread [sup = 0.15%, conf = 70%] does not satisfy its minsup (0.15% < 0.2%)
  - clothes ⇒ shoes [sup = 0.15%, conf = 70%] satisfies its minsup (0.15% ≥ 0.1%)
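The minsup test for a rule is just a comparison against the smallest MIS among its items. A sketch using the MIS values from the example, written as fractions:

```python
# User-specified minimum item supports, as fractions
MIS = {"bread": 0.02, "shoes": 0.001, "clothes": 0.002}

def satisfies_minsup(rule_items, actual_sup, MIS):
    """A rule satisfies its minimum support iff its actual support
    reaches the lowest MIS value of the items appearing in it."""
    return actual_sup >= min(MIS[i] for i in rule_items)

# clothes ⇒ bread, sup = 0.15%: the lowest MIS is 0.2%, so the rule fails
print(satisfies_minsup(["clothes", "bread"], 0.0015, MIS))   # False
# clothes ⇒ shoes, sup = 0.15%: the lowest MIS is 0.1%, so the rule holds
print(satisfies_minsup(["clothes", "shoes"], 0.0015, MIS))   # True
```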

9.3 Multiple Minimum Supports
The downward closure property is no longer valid
- E.g., consider four items 1, 2, 3 and 4 in a database, with minimum item supports MIS(1) = 10%, MIS(2) = 20%, MIS(3) = 5%, MIS(4) = 6%
- {1, 2} with a support of 9% is infrequent, since min(10%, 20%) > 9%, but {1, 2, 3} could be frequent if it had a support of, e.g., 7% ≥ min(10%, 20%, 5%)
- If applied, downward closure eliminates {1, 2}, so that {1, 2, 3} is never evaluated

9.3 Multiple Minimum Supports
How do we solve the downward closure property problem?
- Sort all items in I according to their MIS values (making it a total order)
- The order is used throughout the algorithm in each itemset
- Each itemset w is of the form {w[1], w[2], …, w[k]}, consisting of items w[1], w[2], …, w[k], where MIS(w[1]) ≤ MIS(w[2]) ≤ … ≤ MIS(w[k])

9.3 Multiple Minimum Supports
Multiple minimum supports mining is an extension of the Apriori algorithm
- Step 1: frequent itemset generation
  - Initial step: produce the seeds for generating candidate itemsets
  - Candidate generation for k = 2
  - Generalization for k > 2, where the pruning step differs from the Apriori algorithm
- Step 2: rule generation

9.3 Multiple Minimum Supports: Step 1
Step 1: frequent itemset generation
- E.g., I = {1, 2, 3, 4}, with given MIS(1) = 10%, MIS(2) = 20%, MIS(3) = 5%, MIS(4) = 6%, and n = 100 transactions
Initial step
- Sort I according to the MIS value of each item; let M represent the sorted items: M = {3, 4, 1, 2}
- Scan the data once to record the support count of each item, e.g., {3}:6, {4}:3, {1}:9 and {2}:25

9.3 Multiple Minimum Supports: Step 1
MIS(1) = 10%, MIS(2) = 20%, MIS(3) = 5%, MIS(4) = 6%, n = 100, counts {3}:6, {4}:3, {1}:9, {2}:25
- Go through the items in M to find the first item i that meets MIS(i); insert it into a list of seeds L
- For each subsequent item j in M (after i), if sup(j) ≥ MIS(i), then insert j into L
  - MIS(3) = 5%; sup({3}) = 6% ≥ MIS(3), so L = {3}
  - sup({4}) = 3% < MIS(3), so L remains {3}
  - sup({1}) = 9% ≥ MIS(3), so L = {3, 1}
  - sup({2}) = 25% ≥ MIS(3), so L = {3, 1, 2}
- Calculate F1 from L based on the MIS of each item in L: F1 = {{3}, {2}}, since sup({1}) = 9% < MIS(1)
- Why not eliminate {1} directly, i.e., why keep L and not only F1? Because, due to the multiple minimum supports, the downward closure property no longer holds over F
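The initial step above can be sketched as follows, with the MIS values and supports of the running example; note that L, not F1, seeds the next level:

```python
# MIS values and observed supports from the running example (n = 100)
MIS = {1: 0.10, 2: 0.20, 3: 0.05, 4: 0.06}
sup = {1: 0.09, 2: 0.25, 3: 0.06, 4: 0.03}

# Sort the items by ascending MIS
M = sorted(MIS, key=MIS.get)

# Seed list L: the first item i with sup(i) >= MIS(i) opens the list;
# every later item j joins if sup(j) >= MIS(i)
L = []
for item in M:
    if not L:
        if sup[item] >= MIS[item]:
            L.append(item)
    elif sup[item] >= MIS[L[0]]:
        L.append(item)

# F1 keeps only the seeds that meet their *own* MIS
F1 = [i for i in L if sup[i] >= MIS[i]]

print(M)    # [3, 4, 1, 2]
print(L)    # [3, 1, 2]
print(F1)   # [3, 2]
```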

9.3 Multiple Minimum Supports: Step 1
Candidate generation, k = 2; let φ = 10% (support difference), L = {3, 1, 2}
- Take each item (seed) from L in order; use L and not F1, due to the invalidity of the downward closure property!
- Test the chosen item against its MIS: sup({3}) ≥ MIS(3)?
  - If not, go to the next element in L
  - If true, e.g., sup({3}) = 6% ≥ MIS(3) = 5%, then try to form a level-2 candidate together with each of the next items in L, i.e., {3, 1}, then {3, 2}

9.3 Multiple Minimum Supports: Step 1
{3, 1} is a candidate if sup({1}) ≥ MIS(3) and |sup({3}) − sup({1})| ≤ φ
- sup({1}) = 9%; MIS(3) = 5%; sup({3}) = 6%; φ = 10%
- 9% ≥ 5% and |6% − 9%| ≤ 10%, thus C2 = {{3, 1}}
Now try {3, 2}
- sup({2}) = 25% ≥ 5%, but |6% − 25%| > 10%, so this candidate is rejected due to the support difference constraint

9.3 Multiple Minimum Supports: Step 1
Pick the next seed from L, i.e., 1 (needed to try {1, 2})
- sup({1}) < MIS(1), so we cannot use 1 as a seed!
- Candidate generation for k = 2 thus ends with C2 = {{3, 1}}
- Now read the transaction list and calculate the support of each itemset in C2; let's assume sup({3, 1}) = 6%, which is larger than min(MIS(3), MIS(1)) = 5%
- Thus F2 = {{3, 1}}
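The level-2 candidate generation with the support difference constraint can be sketched as:

```python
# Running example: MIS values, item supports, seed list L, and φ = 10%
MIS = {1: 0.10, 2: 0.20, 3: 0.05, 4: 0.06}
sup = {1: 0.09, 2: 0.25, 3: 0.06, 4: 0.03}
L = [3, 1, 2]
phi = 0.10

C2 = []
for idx, i in enumerate(L):
    if sup[i] < MIS[i]:
        continue                      # i cannot seed a candidate
    for j in L[idx + 1:]:
        # j must reach MIS(i), and the supports must stay within φ
        if sup[j] >= MIS[i] and abs(sup[i] - sup[j]) <= phi:
            C2.append((i, j))

print(C2)   # [(3, 1)] -- {3, 2} fails the support difference constraint
```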

9.3 Multiple Minimum Supports: Step 1
Generalization for k > 2 uses Lk-1 as input and returns a superset (the candidates) of the set of all frequent k-itemsets. It has two steps:
- Join step: same as in the case of k = 2; Ik = join(Ak-1, Bk-1) with Ak-1 = {i1, i2, …, ik-2, ik-1} and Bk-1 = {i1, i2, …, ik-2, i'k-1}, where ik-1 < i'k-1 and |sup(ik-1) − sup(i'k-1)| ≤ φ; then Ik = {i1, i2, …, ik-2, ik-1, i'k-1}
- Prune step: for each (k-1)-subset s of Ik, if s is not in Fk-1, then Ik can be removed from Ck (it is not a good candidate); there is however one exception to this rule, namely when s does not include the first item of Ik

9.3 Multiple Minimum Supports: Step 1
Generalization, k > 2 example: consider F3 = {{1, 2, 3}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {1, 4, 5}, {1, 4, 6}, {2, 3, 5}}
- After the join we obtain {1, 2, 3, 5}, {1, 3, 4, 5} and {1, 4, 5, 6} (not considering the support difference constraint)
- After pruning we get C4 = {{1, 2, 3, 5}, {1, 3, 4, 5}}
  - {1, 2, 3, 5} is OK
  - {1, 3, 4, 5} is not deleted although {3, 4, 5} ∉ F3, because MIS(3) > MIS(1); if MIS(3) = MIS(1), it could be deleted
  - {1, 4, 5, 6} is deleted because {1, 5, 6} ∉ F3
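The modified prune step, including the head-item exception, can be sketched as below. The MIS values here are made up, chosen only so that the itemsets are MIS-sorted and MIS(1) < MIS(3), as the example requires:

```python
from itertools import combinations

# Hypothetical MIS values: strictly increasing in the item label,
# so the itemsets above are already sorted by MIS
MIS = {i: 0.01 * i for i in range(1, 7)}

F3 = {frozenset(s) for s in [(1, 2, 3), (1, 2, 5), (1, 3, 4),
                             (1, 3, 5), (1, 4, 5), (1, 4, 6), (2, 3, 5)]}

def ms_keep(cand, F_prev, MIS):
    """MSapriori prune test for a candidate sorted by MIS: an infrequent
    (k-1)-subset kills the candidate, unless that subset omits the head
    item and MIS(head) differs from the MIS of the second item."""
    for s in combinations(cand, len(cand) - 1):
        if frozenset(s) in F_prev:
            continue
        if cand[0] in s or MIS[cand[0]] == MIS[cand[1]]:
            return False
    return True

C4 = [c for c in [(1, 2, 3, 5), (1, 3, 4, 5), (1, 4, 5, 6)]
      if ms_keep(c, F3, MIS)]
print(C4)   # [(1, 2, 3, 5), (1, 3, 4, 5)]
```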

9.3 Multiple Minimum Supports: Step 2
Step 2: rule generation
- The downward closure property is no longer valid; therefore we may have frequent k-itemsets that contain non-frequent (k-1)-sub-itemsets
- For those non-frequent sub-itemsets we have no recorded support value
- This problem arises when we form rules of the form A, B ⇒ C, where MIS(C) = min(MIS(A), MIS(B), MIS(C))
- conf(A, B ⇒ C) = sup({A, B, C}) / sup({A, B})
- We have the frequency of {A, B, C} because it is frequent, but we do not have the frequency needed to calculate the support of {A, B}, since it is not frequent by itself
- This is called the head-item problem

9.3 Multiple Minimum Supports: Step 2
Rule generation example, with Shoes having the lowest MIS of the three items
- {Shoes, Clothes, Bread} is a frequent itemset, since MIS({Shoes, Clothes, Bread}) = 0.1 < sup({Shoes, Clothes, Bread}) = 0.12
- However {Clothes, Bread} is not, since neither Clothes nor Bread can seed frequent itemsets
- So we cannot calculate the confidence of all rules depending on Shoes, i.e., the rules: Clothes, Bread ⇒ Shoes; Clothes ⇒ Shoes, Bread; Bread ⇒ Shoes, Clothes

9.3 Multiple Minimum Supports: Step 2
Head-item problem, e.g.: Clothes, Bread ⇒ Shoes; Clothes ⇒ Shoes, Bread; Bread ⇒ Shoes, Clothes
- If some item on the right side of a rule has the minimum MIS (e.g., Shoes), we may not be able to calculate the confidence without reading the data again

9.3 Multiple Minimum Supports
Advantages
- It is a more realistic model for practical applications
- The model enables us to find rare-item rules without producing a huge number of meaningless rules involving frequent items
- By setting the MIS values of some items to 100% (or more), we can effectively instruct the algorithms not to generate rules involving only these items

9.3 Association Rule Mining
Mining class association rules (CAR)
- Normal association rule mining has no target: it finds all possible rules that exist in the data, i.e., any item can appear as a consequent or a condition of a rule
- However, in some applications the user is interested in specific targets
- E.g., the user has a set of text documents from some known topics and wants to find out which words are associated or correlated with each topic

9.3 Class Association Rules
CAR example: a text document data set
- doc 1: Student, Teach, School : Education
- doc 2: Student, School : Education
- doc 3: Teach, School, City, Game : Education
- doc 4: Baseball, Basketball : Sport
- doc 5: Basketball, Player, Spectator : Sport
- doc 6: Baseball, Coach, Game, Team : Sport
- doc 7: Basketball, Team, City, Game : Sport
Let minsup = 20% and minconf = 60%. Examples of class association rules:
- Student, School ⇒ Education [sup = 2/7, conf = 2/2]
- Game ⇒ Sport [sup = 2/7, conf = 2/3]
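Support and confidence of a class association rule X ⇒ class can be checked directly against the labeled documents; a sketch over the document set above (`car_metrics` is a name chosen here):

```python
# The labeled documents from the example: (word set, class)
docs = [
    ({"Student", "Teach", "School"}, "Education"),
    ({"Student", "School"}, "Education"),
    ({"Teach", "School", "City", "Game"}, "Education"),
    ({"Baseball", "Basketball"}, "Sport"),
    ({"Basketball", "Player", "Spectator"}, "Sport"),
    ({"Baseball", "Coach", "Game", "Team"}, "Sport"),
    ({"Basketball", "Team", "City", "Game"}, "Sport"),
]

def car_metrics(X, cls, docs):
    """Support and confidence of the class association rule X ⇒ cls."""
    X = set(X)
    n_x = sum(1 for items, _ in docs if X <= items)
    n_xc = sum(1 for items, c in docs if X <= items and c == cls)
    return n_xc / len(docs), (n_xc / n_x if n_x else 0.0)

print(car_metrics({"Student", "School"}, "Education", docs))  # sup 2/7, conf 1.0
print(car_metrics({"Game"}, "Sport", docs))                   # sup 2/7, conf 2/3
```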

9.3 Class Association Rules
CAR can also be extended with multiple minimum supports
- The user can specify different minimum supports for different classes, which effectively assigns a different minimum support to the rules of each class
- E.g., for a data set with two classes, Yes and No, we may want rules of class Yes to have a minimum support of 5% and rules of class No to have a minimum support of 10%
- By setting the minimum support of a class to 100%, we can skip generating rules of that class

9.3 Association Rule Mining
Tools
- Open source projects: Weka, RapidMiner
- Commercial: Intelligent Miner (replaced by DB2 Data Warehouse Editions), PASW Modeler (developed by SPSS), Oracle Data Mining (ODM)

9.3 Association Rule Mining
Apriori algorithm on a car sales data set
- Class values: unacceptable, acceptable, good, very good
- And 6 attributes, e.g.:
  - Buying cost: vhigh, high, med, low
  - Maintenance costs: vhigh, high, med, low

9.3 Association Rule Mining
Apriori algorithm (tool screenshot: parameters include the number of rules, the support interval with upper and lower bound, the class index, and the confidence)

9.3 Association Rule Mining
Apriori algorithm, observations
- The largest frequent itemsets comprise 3 items
- The most powerful rules are simple rules
- Most people find 2-person cars unacceptable

9.3 Association Rule Mining
Lower-confidence rule (62%)
- If a 4-seat car is found unacceptable, then it is because it is unsafe (rule 30)

9.3 Association Rule Mining
Open source projects also have their limits
- Car accidents data set: … rows, 54 attributes

Summary
Business Intelligence overview
- Customer segmentation, propensity to buy, customer profitability, attrition, etc.
Data mining overview
- Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
Association rule mining
- Apriori algorithm, support, confidence, downward closure property
- Multiple minimum supports solve the rare-item problem but introduce the head-item problem

Next lecture
Data mining
- Time series data
- Trend and similarity search analysis
- Sequence patterns


Data Mining. Dr. Raed Ibraheem Hamed. University of Human Development, College of Science and Technology Department of Computer Science Data Mining Dr. Raed Ibraheem Hamed University of Human Development, College of Science and Technology Department of Computer Science 2016 2017 Road map The Apriori algorithm Step 1: Mining all frequent

More information

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci

Association Rules Information Retrieval and Data Mining. Prof. Matteo Matteucci Association Rules Information Retrieval and Data Mining Prof. Matteo Matteucci Learning Unsupervised Rules!?! 2 Market-Basket Transactions 3 Bread Peanuts Milk Fruit Jam Bread Jam Soda Chips Milk Fruit

More information

Association Rule. Lecturer: Dr. Bo Yuan. LOGO

Association Rule. Lecturer: Dr. Bo Yuan. LOGO Association Rule Lecturer: Dr. Bo Yuan LOGO E-mail: yuanb@sz.tsinghua.edu.cn Overview Frequent Itemsets Association Rules Sequential Patterns 2 A Real Example 3 Market-Based Problems Finding associations

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association

More information

Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion

Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion Outline Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Introduction Algorithm Apriori Algorithm AprioriTid Comparison of Algorithms Conclusion Presenter: Dan Li Discussion:

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan

More information

1 Frequent Pattern Mining

1 Frequent Pattern Mining Decision Support Systems MEIC - Alameda 2010/2011 Homework #5 Due date: 31.Oct.2011 1 Frequent Pattern Mining 1. The Apriori algorithm uses prior knowledge about subset support properties. In particular,

More information

EFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS

EFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS EFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS Arumugam G Senior Professor and Head, Department of Computer Science Madurai Kamaraj University Madurai,

More information

DATA MINING LECTURE 3. Frequent Itemsets Association Rules

DATA MINING LECTURE 3. Frequent Itemsets Association Rules DATA MINING LECTURE 3 Frequent Itemsets Association Rules This is how it all started Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases.

More information

Associa'on Rule Mining

Associa'on Rule Mining Associa'on Rule Mining Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata August 4 and 7, 2014 1 Market Basket Analysis Scenario: customers shopping at a supermarket Transaction

More information

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit: Frequent Itemsets and Association Rule Mining Vinay Setty vinay.j.setty@uis.no Slides credit: http://www.mmds.org/ Association Rule Discovery Supermarket shelf management Market-basket model: Goal: Identify

More information

COMP 5331: Knowledge Discovery and Data Mining

COMP 5331: Knowledge Discovery and Data Mining COMP 5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Tan, Steinbach, Kumar And Jiawei Han, Micheline Kamber, and Jian Pei 1 10

More information

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05 Meelis Kull meelis.kull@ut.ee Autumn 2017 1 Sample vs population Example task with red and black cards Statistical terminology Permutation test and hypergeometric test Histogram on a sample vs population

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar..

.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. .. Cal Poly CSC 4: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Mining Association Rules Examples Course Enrollments Itemset. I = { CSC3, CSC3, CSC40, CSC40, CSC4, CSC44, CSC4, CSC44,

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 04 Association Analysis Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Association Rule Mining Apriori algorithm,

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #12: Frequent Itemsets Seoul National University 1 In This Lecture Motivation of association rule mining Important concepts of association rules Naïve approaches for

More information

CS5112: Algorithms and Data Structures for Applications

CS5112: Algorithms and Data Structures for Applications CS5112: Algorithms and Data Structures for Applications Lecture 19: Association rules Ramin Zabih Some content from: Wikipedia/Google image search; Harrington; J. Leskovec, A. Rajaraman, J. Ullman: Mining

More information

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING, LIG M2 SIF DMV course 207/208 Market basket analysis Analyse supermarket s transaction data Transaction = «market basket» of a customer Find which items are

More information

Lecture Notes for Chapter 6. Introduction to Data Mining

Lecture Notes for Chapter 6. Introduction to Data Mining Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

Mining Infrequent Patter ns

Mining Infrequent Patter ns Mining Infrequent Patter ns JOHAN BJARNLE (JOHBJ551) PETER ZHU (PETZH912) LINKÖPING UNIVERSITY, 2009 TNM033 DATA MINING Contents 1 Introduction... 2 2 Techniques... 3 2.1 Negative Patterns... 3 2.2 Negative

More information

Chapters 6 & 7, Frequent Pattern Mining

Chapters 6 & 7, Frequent Pattern Mining CSI 4352, Introduction to Data Mining Chapters 6 & 7, Frequent Pattern Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining Chapters

More information

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti Apriori algorithm Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK Presentation 12.3.2008 Lauri Lahti Association rules Techniques for data mining and knowledge discovery in databases

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Handling a Concept Hierarchy

Handling a Concept Hierarchy Food Electronics Handling a Concept Hierarchy Bread Milk Computers Home Wheat White Skim 2% Desktop Laptop Accessory TV DVD Foremost Kemps Printer Scanner Data Mining: Association Rules 5 Why should we

More information

CS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014

CS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014 CS4445 Data Mining and Knowledge Discovery in Databases. B Term 2014 Solutions Exam 2 - December 15, 2014 Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute NAME: Prof.

More information

Frequent Itemset Mining

Frequent Itemset Mining ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team (PART I) IMAGINA 17/18 Webpage: http://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge

More information

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University Association Rules CS 5331 by Rattikorn Hewett Texas Tech University 1 Acknowledgements Some parts of these slides are modified from n C. Clifton & W. Aref, Purdue University 2 1 Outline n Association Rule

More information

The Market-Basket Model. Association Rules. Example. Support. Applications --- (1) Applications --- (2)

The Market-Basket Model. Association Rules. Example. Support. Applications --- (1) Applications --- (2) The Market-Basket Model Association Rules Market Baskets Frequent sets A-priori Algorithm A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small set

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining 13. Meta-Algorithms for Classification Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13.

More information

Density-Based Clustering

Density-Based Clustering Density-Based Clustering idea: Clusters are dense regions in feature space F. density: objects volume ε here: volume: ε-neighborhood for object o w.r.t. distance measure dist(x,y) dense region: ε-neighborhood

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Association Rule Mining Apriori

More information

CS 484 Data Mining. Association Rule Mining 2

CS 484 Data Mining. Association Rule Mining 2 CS 484 Data Mining Association Rule Mining 2 Review: Reducing Number of Candidates Apriori principle: If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due

More information

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization

More information

Unit II Association Rules

Unit II Association Rules Unit II Association Rules Basic Concepts Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set Frequent Itemset

More information

Data mining, 4 cu Lecture 5:

Data mining, 4 cu Lecture 5: 582364 Data mining, 4 cu Lecture 5: Evaluation of Association Patterns Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Evaluation of Association Patterns Association rule algorithms

More information

Association Analysis. Part 1

Association Analysis. Part 1 Association Analysis Part 1 1 Market-basket analysis DATA: A large set of items: e.g., products sold in a supermarket A large set of baskets: e.g., each basket represents what a customer bought in one

More information

15 Introduction to Data Mining

15 Introduction to Data Mining 15 Introduction to Data Mining 15.1 Introduction to principle methods 15.2 Mining association rule see also: A. Kemper, Chap. 17.4, Kifer et al.: chap 17.7 ff 15.1 Introduction "Discovery of useful, possibly

More information

CSE-4412(M) Midterm. There are five major questions, each worth 10 points, for a total of 50 points. Points for each sub-question are as indicated.

CSE-4412(M) Midterm. There are five major questions, each worth 10 points, for a total of 50 points. Points for each sub-question are as indicated. 22 February 2007 CSE-4412(M) Midterm p. 1 of 12 CSE-4412(M) Midterm Sur / Last Name: Given / First Name: Student ID: Instructor: Parke Godfrey Exam Duration: 75 minutes Term: Winter 2007 Answer the following

More information

10/19/2017 MIST.6060 Business Intelligence and Data Mining 1. Association Rules

10/19/2017 MIST.6060 Business Intelligence and Data Mining 1. Association Rules 10/19/2017 MIST6060 Business Intelligence and Data Mining 1 Examples of Association Rules Association Rules Sixty percent of customers who buy sheets and pillowcases order a comforter next, followed by

More information

COMP 5331: Knowledge Discovery and Data Mining

COMP 5331: Knowledge Discovery and Data Mining COMP 5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei And slides provide by Raymond

More information

Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Data Mining Chapter 5 Association Analysis: Basic Concepts Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar 2/3/28 Introduction to Data Mining Association Rule Mining Given

More information

Machine Learning: Pattern Mining

Machine Learning: Pattern Mining Machine Learning: Pattern Mining Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Wintersemester 2007 / 2008 Pattern Mining Overview Itemsets Task Naive Algorithm Apriori Algorithm

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - 1DL36 Fall 212" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht12 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology, Uppsala

More information

DATA MINING - 1DL105, 1DL111

DATA MINING - 1DL105, 1DL111 1 DATA MINING - 1DL105, 1DL111 Fall 2007 An introductory class in data mining http://user.it.uu.se/~udbl/dut-ht2007/ alt. http://www.it.uu.se/edu/course/homepage/infoutv/ht07 Kjell Orsborn Uppsala Database

More information

1. Data summary and visualization

1. Data summary and visualization 1. Data summary and visualization 1 Summary statistics 1 # The UScereal data frame has 65 rows and 11 columns. 2 # The data come from the 1993 ASA Statistical Graphics Exposition, 3 # and are taken from

More information

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on

More information

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Mining Molecular Fragments: Finding Relevant Substructures of Molecules Mining Molecular Fragments: Finding Relevant Substructures of Molecules Christian Borgelt, Michael R. Berthold Proc. IEEE International Conference on Data Mining, 2002. ICDM 2002. Lecturers: Carlo Cagli

More information

Sequential Pattern Mining

Sequential Pattern Mining Sequential Pattern Mining Lecture Notes for Chapter 7 Introduction to Data Mining Tan, Steinbach, Kumar From itemsets to sequences Frequent itemsets and association rules focus on transactions and the

More information

Are You Maximizing The Value Of All Your Data?

Are You Maximizing The Value Of All Your Data? Are You Maximizing The Value Of All Your Data? Using The SAS Bridge for ESRI With ArcGIS Business Analyst In A Retail Market Analysis SAS and ESRI: Bringing GIS Mapping and SAS Data Together Presented

More information

Data Mining and Knowledge Discovery. Petra Kralj Novak. 2011/11/29

Data Mining and Knowledge Discovery. Petra Kralj Novak. 2011/11/29 Data Mining and Knowledge Discovery Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2011/11/29 1 Practice plan 2011/11/08: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate test set,

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - DL360 Fall 200 An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht0 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology, Uppsala

More information

Exam III Review Math-132 (Sections 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 8.1, 8.2, 8.3)

Exam III Review Math-132 (Sections 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 8.1, 8.2, 8.3) 1 Exam III Review Math-132 (Sections 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 8.1, 8.2, 8.3) On this exam, questions may come from any of the following topic areas: - Union and intersection of sets - Complement of

More information

Multimedia Databases. Previous Lecture. 4.1 Multiresolution Analysis. 4 Shape-based Features. 4.1 Multiresolution Analysis

Multimedia Databases. Previous Lecture. 4.1 Multiresolution Analysis. 4 Shape-based Features. 4.1 Multiresolution Analysis Previous Lecture Multimedia Databases Texture-Based Image Retrieval Low Level Features Tamura Measure, Random Field Model High-Level Features Fourier-Transform, Wavelets Wolf-Tilo Balke Silviu Homoceanu

More information

Frequent Itemset Mining

Frequent Itemset Mining ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team IMAGINA 16/17 Webpage: h;p://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge Discovery

More information

Data mining, 4 cu Lecture 7:

Data mining, 4 cu Lecture 7: 582364 Data mining, 4 cu Lecture 7: Sequential Patterns Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Sequential Patterns In many data mining tasks the order and timing of events contains

More information

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig

Multimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 4 Previous Lecture Texture-Based Image Retrieval Low

More information

DATA MINING LECTURE 4. Frequent Itemsets, Association Rules Evaluation Alternative Algorithms

DATA MINING LECTURE 4. Frequent Itemsets, Association Rules Evaluation Alternative Algorithms DATA MINING LECTURE 4 Frequent Itemsets, Association Rules Evaluation Alternative Algorithms RECAP Mining Frequent Itemsets Itemset A collection of one or more items Example: {Milk, Bread, Diaper} k-itemset

More information

Algorithmic Methods of Data Mining, Fall 2005, Course overview 1. Course overview

Algorithmic Methods of Data Mining, Fall 2005, Course overview 1. Course overview Algorithmic Methods of Data Mining, Fall 2005, Course overview 1 Course overview lgorithmic Methods of Data Mining, Fall 2005, Course overview 1 T-61.5060 Algorithmic methods of data mining (3 cp) P T-61.5060

More information

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel

Lecture 2. Judging the Performance of Classifiers. Nitin R. Patel Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only

More information

FP-growth and PrefixSpan

FP-growth and PrefixSpan FP-growth and PrefixSpan n Challenges of Frequent Pattern Mining n Improving Apriori n Fp-growth n Fp-tree n Mining frequent patterns with FP-tree n PrefixSpan Challenges of Frequent Pattern Mining n Challenges

More information

Association Rules. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION

Association Rules. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION CHAPTER2 Association Rules 2.1 Introduction Many large retail organizations are interested in instituting information-driven marketing processes, managed by database technology, that enable them to Jones

More information

CS 584 Data Mining. Association Rule Mining 2

CS 584 Data Mining. Association Rule Mining 2 CS 584 Data Mining Association Rule Mining 2 Recall from last time: Frequent Itemset Generation Strategies Reduce the number of candidates (M) Complete search: M=2 d Use pruning techniques to reduce M

More information

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13 Indexes for Multimedia Data 13 Indexes for Multimedia

More information

Frequent Pattern Mining: Exercises

Frequent Pattern Mining: Exercises Frequent Pattern Mining: Exercises Christian Borgelt School of Computer Science tto-von-guericke-university of Magdeburg Universitätsplatz 2, 39106 Magdeburg, Germany christian@borgelt.net http://www.borgelt.net/

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Removing trivial associations in association rule discovery

Removing trivial associations in association rule discovery Removing trivial associations in association rule discovery Geoffrey I. Webb and Songmao Zhang School of Computing and Mathematics, Deakin University Geelong, Victoria 3217, Australia Abstract Association

More information

Mining Positive and Negative Fuzzy Association Rules

Mining Positive and Negative Fuzzy Association Rules Mining Positive and Negative Fuzzy Association Rules Peng Yan 1, Guoqing Chen 1, Chris Cornelis 2, Martine De Cock 2, and Etienne Kerre 2 1 School of Economics and Management, Tsinghua University, Beijing

More information

Multimedia Databases. 4 Shape-based Features. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis

Multimedia Databases. 4 Shape-based Features. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis. 4.1 Multiresolution Analysis 4 Shape-based Features Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 4 Multiresolution Analysis

More information

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH M. De Cock C. Cornelis E. E. Kerre Dept. of Applied Mathematics and Computer Science Ghent University, Krijgslaan 281 (S9), B-9000 Gent, Belgium phone: +32

More information

CS738 Class Notes. Steve Revilak

CS738 Class Notes. Steve Revilak CS738 Class Notes Steve Revilak January 2008 May 2008 Copyright c 2008 Steve Revilak. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

Data preprocessing. DataBase and Data Mining Group 1. Data set types. Tabular Data. Document Data. Transaction Data. Ordered Data

Data preprocessing. DataBase and Data Mining Group 1. Data set types. Tabular Data. Document Data. Transaction Data. Ordered Data Elena Baralis and Tania Cerquitelli Politecnico di Torino Data set types Record Tables Document Data Transaction Data Graph World Wide Web Molecular Structures Ordered Spatial Data Temporal Data Sequential

More information

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12)

Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Marketing Research Session 10 Hypothesis Testing with Simple Random samples (Chapter 12) Remember: Z.05 = 1.645, Z.01 = 2.33 We will only cover one-sided hypothesis testing (cases 12.3, 12.4.2, 12.5.2,

More information

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data

Multimedia Databases 1/29/ Indexes for Multimedia Data Indexes for Multimedia Data Indexes for Multimedia Data 1/29/2010 13 Indexes for Multimedia Data 13 Indexes for Multimedia Data 13.1 R-Trees Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

More information

NetBox: A Probabilistic Method for Analyzing Market Basket Data

NetBox: A Probabilistic Method for Analyzing Market Basket Data NetBox: A Probabilistic Method for Analyzing Market Basket Data José Miguel Hernández-Lobato joint work with Zoubin Gharhamani Department of Engineering, Cambridge University October 22, 2012 J. M. Hernández-Lobato

More information

CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS. Due to elements of uncertainty many problems in this world appear to be

CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS. Due to elements of uncertainty many problems in this world appear to be 11 CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS Due to elements of uncertainty many problems in this world appear to be complex. The uncertainty may be either in parameters defining the problem

More information

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig

Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Indexes for Multimedia Data 14 Indexes for Multimedia

More information

Constraint-Based Rule Mining in Large, Dense Databases

Constraint-Based Rule Mining in Large, Dense Databases Appears in Proc of the 5th Int l Conf on Data Engineering, 88-97, 999 Constraint-Based Rule Mining in Large, Dense Databases Roberto J Bayardo Jr IBM Almaden Research Center bayardo@alummitedu Rakesh Agrawal

More information

Assignment 2 SOLUTIONS

Assignment 2 SOLUTIONS MATHEMATICS 01-10-LW Business Statistics Martin Huard Fall 00 Assignment SOLUTIONS This assignment is due on Friday September 6 at the beginning of the class. Question 1 ( points) In a marketing research,

More information

Map your way to deeper insights

Map your way to deeper insights Map your way to deeper insights Target, forecast and plan by geographic region Highlights Apply your data to pre-installed map templates and customize to meet your needs. Select from included map files

More information

TRAITS to put you on the map

TRAITS to put you on the map TRAITS to put you on the map Know what s where See the big picture Connect the dots Get it right Use where to say WOW Look around Spread the word Make it yours Finding your way Location is associated with

More information

Knowledge Discovery and Data Mining I

Knowledge Discovery and Data Mining I Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 2018/19 Agenda 1. Introduction 2. Basics

More information

ASSOCIATION RULE MINING BASED ANALYSIS ON HOROSCOPE DATA A PERSPECTIVE STUDY

ASSOCIATION RULE MINING BASED ANALYSIS ON HOROSCOPE DATA A PERSPECTIVE STUDY International Journal of Computer Engineering & Technology (IJCET) Volume 8, Issue 3, May-June 2017, pp. 76 81, Article ID: IJCET_08_03_008 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=8&itype=3

More information

Temporal Data Mining

Temporal Data Mining Temporal Data Mining Christian Moewes cmoewes@ovgu.de Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge Processing and Language Engineering Zittau Fuzzy Colloquium

More information

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining

Data Mining: Data. Lecture Notes for Chapter 2. Introduction to Data Mining Data Mining: Data Lecture Notes for Chapter 2 Introduction to Data Mining by Tan, Steinbach, Kumar 1 Types of data sets Record Tables Document Data Transaction Data Graph World Wide Web Molecular Structures

More information