Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context.

Size: px
Start display at page:

Download "Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context."

Transcription

1 Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context. Murphy Choy Cally Claire Ong Michelle Cheong Abstract The rapid explosion in retail data calls for more effective and efficient discovery of association rules to develop relevant business strategies and rules.unlike online shopping sites, most brick and mortar retail shops are located in geographically and demographically diverse areas. This diversity presents a new challenge to the classical association rule model which assumes a homogenous group of customers behaving differently. The focus of this paper is centered on the discovery of association rules that were hidden as a result of a geographical and demographically diverse data. We will introduce a novel measure which incorporates the entropy measure with modified weighting for the detection of association rules not detected by the standard measures due to Simpson's paradox. The proposed measure is evaluated using a real-word case study involving a major retailer of fashion good in the context of traditional brick and mortar setting. Introduction Rapid advances in data management technologies has enabled large corporations to effectively store transactional data for retail stores at affordable rate. The sucess of such data collection exercises has spurned corporations that have access to huge data to attempt extracting important information from these sources to gain competitive advantage. The most common approach for analyzing huge transactional database is to apply market basket analysis, otherwise known as association rule mining. The technique allows for the discovery of customer purchasing behavior by examining the common occurrence of items in the transactional database. These discoveries can lead to interesting item bundling, service bundling as well as layout of retail stores. The association model was first introduced by Agrawal et. Al (1993). Ever since then, there have been extensive studies of the model which resulted in a proliferation of the model. The extensions that have been researched in recent years, includes algorithm improvements (Brin et. Al., 1997; Han et. Al., 2000; Liu et. Al., 2002; Rastogi and Shim, 2002), fuzzy rules enhancements (Ishibuchi et. Al., 2001), multi-level and generalized rules (Clementini and Koperski, 2000), spatial rules, interesting rules and temporal association rules (Ale et. Al., 2000; Lee et. Al., 2001). The simplicity of association rules and ease of understanding, there have been many successful business applications in areas such as finance, telecommunication, retailing, and web design analysis. The most commonly cited examples are mainly found in retail industry. Any major retailers is likely to have stores or branches in different geographical locations with a diverse demographic population. For such company with multiple stores, any purchasing behavior can be influenced by factors such as marketing, sales, service, and operation strategies at the company, local, and store levels. There are two challenges to the use of existing methods in any multi-store context. The first challenge comes from the dimension of time. As purchasing patterns can be seasonal even within a day, they can post difficulties in establishing the suitability and relevance of the rules without being able to determine the time period of relevance (Ale et. Al., 2000; Lee et. Al., 2001). This problem is addressed by temporal association models which are designed to find point-in-time behaviors or to search for time-invariant rules. For the models that are designed to detect point-in-time patterns, the period of relevance is incorporated into the computation of the support value. In this paper, we will not consider time as a factor due to the

2 continuation nature of fashion goods where an updated design will replace an older design. The second problem is found in problems involving the retailers having stores in diverse locations which differs both geographically as well as demographically (Chen et. Al., 2005). This new dimension can have a serious effects on the kind of product purchasing behavior. For example, a place with a well-to-do neighborhood is more likely to have purchases of more expensive categories of goods and brands compared to one found near rental housing area dominated by poorer individuals. At a location where most households have kids, the likelihood of toy and baby toiletries purchases are more likely than at a location more known as a retirement village. This makes it important to identify the rules which are location specific and can be influential in that area. At the same time, we should not penalize rules which are common in many areas. While this problem is considered to be related to spatial association rules, the focus is different as they focuses on the impact of geographical information rather than whether the rules are useful and whether they can be applied to multiple locations. The spread of the stores will be the focus of the paper where we will attempt identifying transactions which are important in different areas which are not identified by the traditional algorithm due to aggregation. The paper is organized as follows. We will define the market basket problem in Section 2. In Section 3, we will propose a modified support measure and compare it against a panel of measures widely used in market basket. In Section 4, we compare the rules selected from the use of each and evaluate them in the context of a real retailer operating in a multi-store environment. The conclusion and future direction will be discussed in Section 5. Association Model Given two products, X and Y, an association rule takes the form of X => Y which implies a purchase pattern where a purchase of X is likely to lead to a purchase of Y. Support and confidence are two measures generally employed to select the most relevant association rules. Support measures the frequency of transactional records in the database having both X and Y, while confidence measures the accuracy of the rule which is defined as the percentage of transactions with both X and Y to the transactions with X. These measures are used in the A-priori algorithm (Aggarwal at. Al., 1993) which is the most commonly implemented algorithm for association rule mining. Let us consider the problem in a mathematical form below. Let us assume a transactional database D that contains transactional records from multiple stores. We need to extract the association rules from the database. Let I =(I 1, I 2,..., I r ) be the set of product items that exists in D, where I k (1 k r) represent the k th item in D. The support for X that is composed of the relation I i I j is represented by support ( X, D), is the percentage of transactions containing X in database D, support ( X, D)=P( I i, I j ) For a specified support threshold ρ s support ( X, D) ρ s, X is a frequent itemset if The definitions of the support and the frequent itemset given above are used in the evaluation of traditional association rules. As seen in the formulation, the store information is not incorporated in the formula of the support of an item set. There have been extensive research in the measures used in market basket analysis (Tan et. Al., 2004). We will hereby list some of more common measures used in market basket analysis. The Interest measure is commonly used in market basket analysis to evaluate the interestingness of a rule. This measure is used as a measure of deviation from statistical independence. However, it is sensitive to the support of the items I i or I j. It is defined as,

3 Interest ( X, D)= P (I i, I j ) P (I i ) P (I j ) The Cosine Similarity Index is a simple index that is used to evaluate the association level between two items. It is closely related to the Interest Measure and is the geometric mean between interest measure and the support measure. The measure for pairs of items is also equivalent to the cosine measure commonly used as similarity measure for vector-space models. It is defined as, Cosine( X, D)= P( I i, I j ) P (I i ) P( I j ) The Jaccard measure is used extensively in information retrieval to measure the similarity between documents. The Jaccard measures estimates the similarity between sample sets of the items. It is defined as, P( I Jaccard ( X, D)= i, I j ) P( I i )+ P (I j ) P( I i, I j ) The Entropy measure is related to the variance of a probability distribution. The entropy of a flat distribution is large, whereas the entropy of a very skewed distribution is small. Given any two strongly associated items, the amount of reduction in entropy is high. The Entropy measure is defined as, Entropy( X, D)= P( I i, I j )log( P( I i, I j )) (1 P (I i, I j ))log(1 P( I i, I j )) In the next section, we will discuss about the Simpson's paradox in market basket analysis and how it affects the traditional market basket analysis. We will discuss about the use of the various measures and what are suitable cut off values for them. Simpson's Paradox and the Modified Entropy Measure In most data analysis, the association rule is usually deployed on the entire data set collected from different stores to extract key rules. However rules extracted this way may not be applicable in certain stores due to the geographical location or demographic factors. This nullification or even reversal of association is commonly known in Statistics as the Simpson's Paradox or Yules- Simpson's effect. One main reason why this effect is particularly obvious in market basket analysis is due mainly to the way the association rules are constructed. Most of the rules depends on confidence and support which are computed across stores. For example, good A is popular among the population but good B is usually bought with good A for a small group of people for a cluster of stores. Let P(A) = 0.4 and P(B) = 0.01 and P(A and B) = Support(A and B) = Interest = 1.25 From the support and interest value, we can see that it is not a strong association nor an interesting one. However, when we move down to the store cluster, the effect is very different. Let P(A X) = 0.3 and P(B X) = 0.1 and P(A and B X) = 0.07 Support(A and B X) = 0.07 Interest = 2.33 From the new cluster level support and interest value, we can see that association has improved in

4 terms of support and is considered an interesting one. The common approach to the problem is to adopt a small support threshold which has a side effect of an explosion in the number of rules generated. Selection of proper confidence threshold is even more complicated and subtle. Given that the measures have few commonality among them, it is very difficult to compare the values. The simplest measure which can be compared with the classic support measure is the entropy measure. The entropy is a measure of the information value that is contained in the data. The entropy measure used in this case is the binary entropy measure. A comparison of the support value and entropy values are listed below. Support(%) Entropy 1.00% 8.08% 2.00% 14.14% 3.00% 19.44% 4.00% 24.23% 5.00% 28.64% 10.00% 46.90% 20.00% 72.19% 30.00% 88.13% 40.00% 97.10% 50.00% % Table 1: Support and Entropy measure comparison One of the key problems with the entropy measure is the strong bias for small entropy values. This presents a problem for the rules as small entropy values tend to favor rules which has very small support. As the minimum support is usually set as 1%, the equivalence in entropy value will be around 8%. However, too high a support level tend to lead to rules which are uninteresting. Given this, the rule filtering will be capped with a maximum level of 10% which has an entropy equivalence of 46.90%. For Interest Measure, the common value used to determine whether rules are interesting is the cutoff of 3. Given the relation and similarities between Interest measure, Cosine measure and Jaccard measure, we will use a value of 5% for Cosine and 1% for Jaccard. However, for all the measures discussed so far, they do not incorporate the store level information. The author propose a modified entropy measure which combines a store measure with entropy function. The proposed function is as shown below, SL Entropy( X, D, X,Y )=log ( Y X )( P (I i, I j )log(p (I i, I j )) (1 P (I i, I j ))log(1 P (I i, I j ))) Where Y is the total number of stores and X being the number of stores where the rules is found. The new measure incorporated the store level distribution information together with the entropy measure. The measure increases the entropy values by inflating it with a measure that identifies the number of stores which contributed to the rules. If the rules is found in small clusters, the entropy will be improved to indicate a good support for it in those clusters. The measure also reward single store rules which have very entropy values that deserves a second look. The SL Entropy measure is similar in construct to the TD*IDF measure widely used in text mining. Both measures combine a frequency measure with an inverse coverage measure to establish appropriate measurement value for useful and interesting information. In the next section, we will test the measure in a fashion good retail context with a myriad of stores. Real World Case Study: Fashion Good Retailer

5 We evaluate our new association rule scoping measure using a multiple store transactional dataset provided by a fashion good retailer. The fashion good retailer is one of the leading fashion good provider in the country and has stores in various locations with different demographics. There are three different types of stores which are located at different malls and area. The first type of stores are located in super luxury malls patronized by the richer populace. The second type of stores are located in boutique shops dedicated to the brands. The final type of stores are located in departmental stores as display counters. We will first analyze the rules generated by the classic association model for each of the store type and compare them to the overall rules generated. Below is the preliminary result. Transaction T1 T2 T3 Raw A1=>A A1=>A A1=>A A1=>A A1=>A A1=>A A2=>A A2=>A A2=>A A2=>A A2=>A A2=>A A3=>A A5=>A A6=>A A6=>A A4=>A A4=>A A8=>A Sub Total A10=>A A11=>A A11=>A A11=>A A11=>A A11=>A A13=>A A13=>A A13=>A A14=>A A14=>A A15=>A A16=>A Sub Total Total Table 2: Store Level comparison From the results in table 2, we can observe that the type 1 and 2 stores are radically different in terms of the relations and that the classic model on the entire database indicated by raw has only detected 5 rules using the classic measures. This is poor as the bulk of the rules found in the three types of stores make up a total of 32 rules of which only 5 are detected at overall level. That is less than 50% of the rules discovered. Let us compare the performances of the other measures as well as the new SL Entropy measure.

6 Transaction T1 T2 T3 Raw IS Jaccard Entropy SL Entropy A1=>A A1=>A A1=>A A1=>A A1=>A A1=>A A2=>A A2=>A A2=>A A2=>A A2=>A A2=>A A3=>A A5=>A A6=>A A6=>A A4=>A A4=>A A8=>A Sub Total A10=>A A11=>A A11=>A A11=>A A11=>A A11=>A A13=>A A13=>A A13=>A A14=>A A14=>A A15=>A A16=>A Sub Total Total Table 2: New measure and alternate measure comparison From the new results in table 2, we can observe that the Entropy, Cosine and Jaccard measures have performed much better than the traditional support measure in detecting rules which are found at store level. Entropy achieved a detection rate of 68.75%, Jaccard achieved 71.88% while cosine achieved 53.12%. The new SL Entropy achieved 87.5% detection rate which is far above the others. The new measure is much more effective than the other measures as well as the traditional measure. Conclusion One critical challenge for store level association rule generation is the need to analyze data at overall level while compensating for the loss of granularity. We have proposed a novel measure of association rule for detection of rules which are at store level or dispersed across location. The new measure provides enhanced capabilities to retail companies to explore association rules which are based on clusters' behavior. We have evaluated the proposed measure in a real-world case study that focus on fashion good retail with brick and mortar stores. The measure was able to identify the rules identified at the store type level while analyzing the data at an overall level. For future work, we plan to extend the measure beyond a single factor. Only a handful of the measures discussed can be generalized to multi-level interactions. Analyzing such multi-layered associations is much more complicated due to the exponential increase in the number of factors to be considered. The increased dimensions will also cause the number of relations that is needed to be increased to a level that allows for proper analysis. A good measure must be able to identify the associations among the factors from the overall data. Careful study is needed to establish the most appropriate measure for this purpose.

7 References Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I., Fast discovery of association rules. In: Fayyad, G., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (Eds.), Advances in Knowledge Discovery and Data Mining. AAAI Press and The MIT Press, Menlo Park, CA, pp J.M. Ale, G.H. Rossi, An approach to discovering temporal association rules, Proceedings of the 2000 ACM Symposium on Applied Computing (Vol. 1), Villa Olmo, Como, Italy, 2000, pp Brin, S., Siverstein, C., Motwani, R., Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery 2, Chen, Y.-L., Tang, K., Hu, Y.-H., Market basket analysis in a multiple store environment. Decision Support Systems 40 (2), E. Clementini, P.D. Felice, K. Koperski, Mining multiple- level spatial association rules for objects with a broad boundary, Data and Knowledge Engineering 34 (3) (2000) Dimitriadou, E., Dolnicar, S., Weingessel, A., An examination of indexes for determining the number of clusters in binary data sets. Psychometrika 67, J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM-SIGMOD Int. Conf. on Management of Data, Dallas, TX, 2000, May. H. Ishibuchi, T. Nakashima, T. Yamamoto, Fuzzy association rules for handling continuous attributes, Proceedings of the IEEE International Symposium on Industrial Electronics, Pusan, Korea, 2001, pp C.M. Kuok, A.W. Fu, M.H. Wong, Mining fuzzy association rules in databases, SIGMOD Record 27 (1) (1998) K. Koperski, J. Han, Discovery of spatial association rules in geographic information databases, Proc. 4th International Symposium on Large Spatial Databases (SSD95), Portland, Maine, Aug. 1995, pp C.H. Lee, C.R. Lin, M.S. Chen, On mining general temporal association rules in a publication database, Proceedings of the 2001 IEEE International Conference on Data Mining, San Jose, California, 2001, pp J. Liu, Y. Pan, K. Wang, J. Han, Mining frequent item sets by opportunistic projection, Proceedings of the 2002 Int. Conf. on Knowledge Discovery in Databases, Edmonton, Canada, 2003, July. R. Rastogi, K. Shim, Mining optimized association rules with categorical and numeric attributes, IEEE Transactions on Knowledge and Data Engineering 14 (2002) P.-N. Tan, V. Kumar, and J. Srivastava, Selecting the right objective measure for association analysis, Inform. Syst., vol. 29, no. 4, pp , June 2004.

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan

More information

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

Association Rule. Lecturer: Dr. Bo Yuan. LOGO

Association Rule. Lecturer: Dr. Bo Yuan. LOGO Association Rule Lecturer: Dr. Bo Yuan LOGO E-mail: yuanb@sz.tsinghua.edu.cn Overview Frequent Itemsets Association Rules Sequential Patterns 2 A Real Example 3 Market-Based Problems Finding associations

More information

An Approach to Classification Based on Fuzzy Association Rules

An Approach to Classification Based on Fuzzy Association Rules An Approach to Classification Based on Fuzzy Association Rules Zuoliang Chen, Guoqing Chen School of Economics and Management, Tsinghua University, Beijing 100084, P. R. China Abstract Classification based

More information

Mining Positive and Negative Fuzzy Association Rules

Mining Positive and Negative Fuzzy Association Rules Mining Positive and Negative Fuzzy Association Rules Peng Yan 1, Guoqing Chen 1, Chris Cornelis 2, Martine De Cock 2, and Etienne Kerre 2 1 School of Economics and Management, Tsinghua University, Beijing

More information

Mining Class-Dependent Rules Using the Concept of Generalization/Specialization Hierarchies

Mining Class-Dependent Rules Using the Concept of Generalization/Specialization Hierarchies Mining Class-Dependent Rules Using the Concept of Generalization/Specialization Hierarchies Juliano Brito da Justa Neves 1 Marina Teresa Pires Vieira {juliano,marina}@dc.ufscar.br Computer Science Department

More information

Selecting a Right Interestingness Measure for Rare Association Rules

Selecting a Right Interestingness Measure for Rare Association Rules Selecting a Right Interestingness Measure for Rare Association Rules Akshat Surana R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad

More information

NetBox: A Probabilistic Method for Analyzing Market Basket Data

NetBox: A Probabilistic Method for Analyzing Market Basket Data NetBox: A Probabilistic Method for Analyzing Market Basket Data José Miguel Hernández-Lobato joint work with Zoubin Gharhamani Department of Engineering, Cambridge University October 22, 2012 J. M. Hernández-Lobato

More information

Chapter 4: Frequent Itemsets and Association Rules

Chapter 4: Frequent Itemsets and Association Rules Chapter 4: Frequent Itemsets and Association Rules Jilles Vreeken Revision 1, November 9 th Notation clarified, Chi-square: clarified Revision 2, November 10 th details added of derivability example Revision

More information

Relation between Pareto-Optimal Fuzzy Rules and Pareto-Optimal Fuzzy Rule Sets

Relation between Pareto-Optimal Fuzzy Rules and Pareto-Optimal Fuzzy Rule Sets Relation between Pareto-Optimal Fuzzy Rules and Pareto-Optimal Fuzzy Rule Sets Hisao Ishibuchi, Isao Kuwajima, and Yusuke Nojima Department of Computer Science and Intelligent Systems, Osaka Prefecture

More information

On Minimal Infrequent Itemset Mining

On Minimal Infrequent Itemset Mining On Minimal Infrequent Itemset Mining David J. Haglin and Anna M. Manning Abstract A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets

More information

Data Analytics Beyond OLAP. Prof. Yanlei Diao

Data Analytics Beyond OLAP. Prof. Yanlei Diao Data Analytics Beyond OLAP Prof. Yanlei Diao OPERATIONAL DBs DB 1 DB 2 DB 3 EXTRACT TRANSFORM LOAD (ETL) METADATA STORE DATA WAREHOUSE SUPPORTS OLAP DATA MINING INTERACTIVE DATA EXPLORATION Overview of

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

Mining Infrequent Patter ns

Mining Infrequent Patter ns Mining Infrequent Patter ns JOHAN BJARNLE (JOHBJ551) PETER ZHU (PETZH912) LINKÖPING UNIVERSITY, 2009 TNM033 DATA MINING Contents 1 Introduction... 2 2 Techniques... 3 2.1 Negative Patterns... 3 2.2 Negative

More information

COMP 5331: Knowledge Discovery and Data Mining

COMP 5331: Knowledge Discovery and Data Mining COMP 5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei And slides provide by Raymond

More information

Selecting the Right Interestingness Measure for Association Patterns

Selecting the Right Interestingness Measure for Association Patterns Selecting the Right ingness Measure for Association Patterns Pang-Ning Tan Department of Computer Science and Engineering University of Minnesota 2 Union Street SE Minneapolis, MN 55455 ptan@csumnedu Vipin

More information

Algorithms for Characterization and Trend Detection in Spatial Databases

Algorithms for Characterization and Trend Detection in Spatial Databases Published in Proceedings of 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) Algorithms for Characterization and Trend Detection in Spatial Databases Martin Ester, Alexander

More information

Data mining, 4 cu Lecture 5:

Data mining, 4 cu Lecture 5: 582364 Data mining, 4 cu Lecture 5: Evaluation of Association Patterns Spring 2010 Lecturer: Juho Rousu Teaching assistant: Taru Itäpelto Evaluation of Association Patterns Association rule algorithms

More information

732A61/TDDD41 Data Mining - Clustering and Association Analysis

732A61/TDDD41 Data Mining - Clustering and Association Analysis 732A61/TDDD41 Data Mining - Clustering and Association Analysis Lecture 6: Association Analysis I Jose M. Peña IDA, Linköping University, Sweden 1/14 Outline Content Association Rules Frequent Itemsets

More information

Bottom-Up Propositionalization

Bottom-Up Propositionalization Bottom-Up Propositionalization Stefan Kramer 1 and Eibe Frank 2 1 Institute for Computer Science, Machine Learning Lab University Freiburg, Am Flughafen 17, D-79110 Freiburg/Br. skramer@informatik.uni-freiburg.de

More information

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Claudio Lucchese Ca Foscari University of Venice clucches@dsi.unive.it Raffaele Perego ISTI-CNR of Pisa perego@isti.cnr.it Salvatore

More information

Associa'on Rule Mining

Associa'on Rule Mining Associa'on Rule Mining Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata August 4 and 7, 2014 1 Market Basket Analysis Scenario: customers shopping at a supermarket Transaction

More information

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH

FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH M. De Cock C. Cornelis E. E. Kerre Dept. of Applied Mathematics and Computer Science Ghent University, Krijgslaan 281 (S9), B-9000 Gent, Belgium phone: +32

More information

Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach

Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach Online Appendix for Discovery of Periodic Patterns in Sequence Data: A Variance Based Approach Yinghui (Catherine) Yang Graduate School of Management, University of California, Davis AOB IV, One Shields

More information

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data

Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional -Series Data Xiaolei Li, Jiawei Han University of Illinois at Urbana-Champaign VLDB 2007 1 Series Data Many applications produce time series

More information

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:

Frequent Itemsets and Association Rule Mining. Vinay Setty Slides credit: Frequent Itemsets and Association Rule Mining Vinay Setty vinay.j.setty@uis.no Slides credit: http://www.mmds.org/ Association Rule Discovery Supermarket shelf management Market-basket model: Goal: Identify

More information

Regression and Correlation Analysis of Different Interestingness Measures for Mining Association Rules

Regression and Correlation Analysis of Different Interestingness Measures for Mining Association Rules International Journal of Innovative Research in Computer Scien & Technology (IJIRCST) Regression and Correlation Analysis of Different Interestingness Measures for Mining Association Rules Mir Md Jahangir

More information

Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion

Outline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion Outline Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Introduction Algorithm Apriori Algorithm AprioriTid Comparison of Algorithms Conclusion Presenter: Dan Li Discussion:

More information

Mining Strong Positive and Negative Sequential Patterns

Mining Strong Positive and Negative Sequential Patterns Mining Strong Positive and Negative Sequential Patter NANCY P. LIN, HUNG-JEN CHEN, WEI-HUA HAO, HAO-EN CHUEH, CHUNG-I CHANG Department of Computer Science and Information Engineering Tamang University,

More information

Theory of Dependence Values

Theory of Dependence Values Theory of Dependence Values ROSA MEO Università degli Studi di Torino A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association

More information

Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries Data Mining and Knowledge Discovery, 7, 5 22, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency

More information

COMP 5331: Knowledge Discovery and Data Mining

COMP 5331: Knowledge Discovery and Data Mining COMP 5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Tan, Steinbach, Kumar And Jiawei Han, Micheline Kamber, and Jian Pei 1 10

More information

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05 Meelis Kull meelis.kull@ut.ee Autumn 2017 1 Sample vs population Example task with red and black cards Statistical terminology Permutation test and hypergeometric test Histogram on a sample vs population

More information

Discovery of Functional and Approximate Functional Dependencies in Relational Databases

Discovery of Functional and Approximate Functional Dependencies in Relational Databases JOURNAL OF APPLIED MATHEMATICS AND DECISION SCIENCES, 7(1), 49 59 Copyright c 2003, Lawrence Erlbaum Associates, Inc. Discovery of Functional and Approximate Functional Dependencies in Relational Databases

More information

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York

Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association

More information

Mining Interesting Infrequent and Frequent Itemsets Based on Minimum Correlation Strength

Mining Interesting Infrequent and Frequent Itemsets Based on Minimum Correlation Strength Mining Interesting Infrequent and Frequent Itemsets Based on Minimum Correlation Strength Xiangjun Dong School of Information, Shandong Polytechnic University Jinan 250353, China dongxiangjun@gmail.com

More information

Type of data Interval-scaled variables: Binary variables: Nominal, ordinal, and ratio variables: Variables of mixed types: Attribute Types Nominal: Pr

Type of data Interval-scaled variables: Binary variables: Nominal, ordinal, and ratio variables: Variables of mixed types: Attribute Types Nominal: Pr Foundation of Data Mining i Topic: Data CMSC 49D/69D CSEE Department, e t, UMBC Some of the slides used in this presentation are prepared by Jiawei Han and Micheline Kamber Data Data types Quality of data

More information

Association Rule Mining on Web

Association Rule Mining on Web Association Rule Mining on Web What Is Association Rule Mining? Association rule mining: Finding interesting relationships among items (or objects, events) in a given data set. Example: Basket data analysis

More information

SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM

SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM SPATIO-TEMPORAL PATTERN MINING ON TRAJECTORY DATA USING ARM S. Khoshahval a *, M. Farnaghi a, M. Taleai a a Faculty of Geodesy and Geomatics Engineering, K.N. Toosi University of Technology, Tehran, Iran

More information

Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries To appear in Data Mining and Knowledge Discovery, an International Journal c Kluwer Academic Publishers

More information

Preserving Privacy in Data Mining using Data Distortion Approach

Preserving Privacy in Data Mining using Data Distortion Approach Preserving Privacy in Data Mining using Data Distortion Approach Mrs. Prachi Karandikar #, Prof. Sachin Deshpande * # M.E. Comp,VIT, Wadala, University of Mumbai * VIT Wadala,University of Mumbai 1. prachiv21@yahoo.co.in

More information

Mining Temporal Patterns for Interval-Based and Point-Based Events

Mining Temporal Patterns for Interval-Based and Point-Based Events International Journal of Computational Engineering Research Vol, 03 Issue, 4 Mining Temporal Patterns for Interval-Based and Point-Based Events 1, S.Kalaivani, 2, M.Gomathi, 3, R.Sethukkarasi 1,2,3, Department

More information

Chapter 2 Quality Measures in Pattern Mining

Chapter 2 Quality Measures in Pattern Mining Chapter 2 Quality Measures in Pattern Mining Abstract In this chapter different quality measures to evaluate the interest of the patterns discovered in the mining process are described. Patterns represent

More information

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data

Exploring Spatial Relationships for Knowledge Discovery in Spatial Data 2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Exploring Spatial Relationships for Knowledge Discovery in Spatial Norazwin Buang

More information

The Market-Basket Model. Association Rules. Example. Support. Applications --- (1) Applications --- (2)

The Market-Basket Model. Association Rules. Example. Support. Applications --- (1) Applications --- (2) The Market-Basket Model Association Rules Market Baskets Frequent sets A-priori Algorithm A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small set

More information

CS5112: Algorithms and Data Structures for Applications

CS5112: Algorithms and Data Structures for Applications CS5112: Algorithms and Data Structures for Applications Lecture 19: Association rules Ramin Zabih Some content from: Wikipedia/Google image search; Harrington; J. Leskovec, A. Rajaraman, J. Ullman: Mining

More information

Mining Frequent Items in a Stream Using Flexible Windows (Extended Abstract)

Mining Frequent Items in a Stream Using Flexible Windows (Extended Abstract) Mining Frequent Items in a Stream Using Flexible Windows (Extended Abstract) Toon Calders, Nele Dexters and Bart Goethals University of Antwerp, Belgium firstname.lastname@ua.ac.be Abstract. In this paper,

More information

Machine Learning: Pattern Mining

Machine Learning: Pattern Mining Machine Learning: Pattern Mining Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Wintersemester 2007 / 2008 Pattern Mining Overview Itemsets Task Naive Algorithm Apriori Algorithm

More information

The Journal of Database Marketing, Vol. 6, No. 3, 1999, pp Retail Trade Area Analysis: Concepts and New Approaches

The Journal of Database Marketing, Vol. 6, No. 3, 1999, pp Retail Trade Area Analysis: Concepts and New Approaches Retail Trade Area Analysis: Concepts and New Approaches By Donald B. Segal Spatial Insights, Inc. 4938 Hampden Lane, PMB 338 Bethesda, MD 20814 Abstract: The process of estimating or measuring store trade

More information

Impact of Data Characteristics on Recommender Systems Performance

Impact of Data Characteristics on Recommender Systems Performance Impact of Data Characteristics on Recommender Systems Performance Gediminas Adomavicius YoungOk Kwon Jingjing Zhang Department of Information and Decision Sciences Carlson School of Management, University

More information

Quantitative Association Rule Mining on Weighted Transactional Data

Quantitative Association Rule Mining on Weighted Transactional Data Quantitative Association Rule Mining on Weighted Transactional Data D. Sujatha and Naveen C. H. Abstract In this paper we have proposed an approach for mining quantitative association rules. The aim of

More information

Standardizing Interestingness Measures for Association Rules

Standardizing Interestingness Measures for Association Rules Standardizing Interestingness Measures for Association Rules arxiv:138.374v1 [stat.ap] 16 Aug 13 Mateen Shaikh, Paul D. McNicholas, M. Luiza Antonie and T. Brendan Murphy Department of Mathematics & Statistics,

More information

Finding All Minimal Infrequent Multi-dimensional Intervals

Finding All Minimal Infrequent Multi-dimensional Intervals Finding All Minimal nfrequent Multi-dimensional ntervals Khaled M. Elbassioni Max-Planck-nstitut für nformatik, Saarbrücken, Germany; elbassio@mpi-sb.mpg.de Abstract. Let D be a database of transactions

More information

CS 584 Data Mining. Association Rule Mining 2

CS 584 Data Mining. Association Rule Mining 2 CS 584 Data Mining Association Rule Mining 2 Recall from last time: Frequent Itemset Generation Strategies Reduce the number of candidates (M) Complete search: M=2 d Use pruning techniques to reduce M

More information

Mining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany

Mining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany Mining Rank Data Sascha Henzgen and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {sascha.henzgen,eyke}@upb.de Abstract. This paper addresses the problem of mining rank

More information

10/19/2017 MIST.6060 Business Intelligence and Data Mining 1. Association Rules

10/19/2017 MIST.6060 Business Intelligence and Data Mining 1. Association Rules 10/19/2017 MIST6060 Business Intelligence and Data Mining 1 Examples of Association Rules Association Rules Sixty percent of customers who buy sheets and pillowcases order a comforter next, followed by

More information

Mining Molecular Fragments: Finding Relevant Substructures of Molecules

Mining Molecular Fragments: Finding Relevant Substructures of Molecules Mining Molecular Fragments: Finding Relevant Substructures of Molecules Christian Borgelt, Michael R. Berthold Proc. IEEE International Conference on Data Mining, 2002. ICDM 2002. Lecturers: Carlo Cagli

More information

DATA MINING LECTURE 3. Frequent Itemsets Association Rules

DATA MINING LECTURE 3. Frequent Itemsets Association Rules DATA MINING LECTURE 3 Frequent Itemsets Association Rules This is how it all started Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases.

More information

Interestingness Measures for Data Mining: A Survey

Interestingness Measures for Data Mining: A Survey Interestingness Measures for Data Mining: A Survey LIQIANG GENG AND HOWARD J. HAMILTON University of Regina Interestingness measures play an important role in data mining, regardless of the kind of patterns

More information

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on

More information

An Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers

An Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers An Overview of Alternative Rule Evaluation Criteria and Their Use in Separate-and-Conquer Classifiers Fernando Berzal, Juan-Carlos Cubero, Nicolás Marín, and José-Luis Polo Department of Computer Science

More information

Chapters 6 & 7, Frequent Pattern Mining

Chapters 6 & 7, Frequent Pattern Mining CSI 4352, Introduction to Data Mining Chapters 6 & 7, Frequent Pattern Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining Chapters

More information

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University Association Rules CS 5331 by Rattikorn Hewett Texas Tech University 1 Acknowledgements Some parts of these slides are modified from n C. Clifton & W. Aref, Purdue University 2 1 Outline n Association Rule

More information

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti

Apriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti Apriori algorithm Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK Presentation 12.3.2008 Lauri Lahti Association rules Techniques for data mining and knowledge discovery in databases

More information

Lecture Notes for Chapter 6. Introduction to Data Mining

Lecture Notes for Chapter 6. Introduction to Data Mining Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

Correlation Preserving Unsupervised Discretization. Outline

Correlation Preserving Unsupervised Discretization. Outline Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization

More information

Processing Count Queries over Event Streams at Multiple Time Granularities

Processing Count Queries over Event Streams at Multiple Time Granularities Processing Count Queries over Event Streams at Multiple Time Granularities Aykut Ünal, Yücel Saygın, Özgür Ulusoy Department of Computer Engineering, Bilkent University, Ankara, Turkey. Faculty of Engineering

More information

ASSOCIATION RULE MINING AND CLASSIFIER APPROACH FOR QUANTITATIVE SPOT RAINFALL PREDICTION

ASSOCIATION RULE MINING AND CLASSIFIER APPROACH FOR QUANTITATIVE SPOT RAINFALL PREDICTION ASSOCIATION RULE MINING AND CLASSIFIER APPROACH FOR QUANTITATIVE SPOT RAINFALL PREDICTION T.R. SIVARAMAKRISHNAN 1, S. MEGANATHAN 2 1 School of EEE, SASTRA University, Thanjavur, 613 401 2 Department of

More information

[11] [3] [1], [12] HITS [4] [2] k-means[9] 2 [11] [3] [6], [13] 1 ( ) k n O(n k ) 1 2

[11] [3] [1], [12] HITS [4] [2] k-means[9] 2 [11] [3] [6], [13] 1 ( ) k n O(n k ) 1 2 情報処理学会研究報告 1,a) 1 1 2 IT Clustering by Clique Enumeration and Data Cleaning like Method Abstract: Recent development on information technology has made bigdata analysis more familiar in research and industrial

More information

Electrical thunderstorm nowcasting using lightning data mining

Electrical thunderstorm nowcasting using lightning data mining Data Mining VII: Data, Text and Web Mining and their Business Applications 161 Electrical thunderstorm nowcasting using lightning data mining C. A. M. Vasconcellos 1, C. L. Curotto 2, C. Benetti 1, F.

More information

Mining Knowledge in Geographical Data. Krzysztof Koperski Jiawei Han Junas Adhikary. spatial access methods.

Mining Knowledge in Geographical Data. Krzysztof Koperski Jiawei Han Junas Adhikary. spatial access methods. Mining Knowledge in Geographical Data Krzysztof Koperski Jiawei Han Junas Adhikary Huge amounts of data have been stored in databases, data warehouses, geographic information systems, and other information

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #12: Frequent Itemsets Seoul National University 1 In This Lecture Motivation of association rule mining Important concepts of association rules Naïve approaches for

More information

Pattern Structures 1

Pattern Structures 1 Pattern Structures 1 Pattern Structures Models describe whole or a large part of the data Pattern characterizes some local aspect of the data Pattern is a predicate that returns true for those objects

More information

Ranking Sequential Patterns with Respect to Significance

Ranking Sequential Patterns with Respect to Significance Ranking Sequential Patterns with Respect to Significance Robert Gwadera, Fabio Crestani Universita della Svizzera Italiana Lugano, Switzerland Abstract. We present a reliable universal method for ranking

More information

Statistical Privacy For Privacy Preserving Information Sharing

Statistical Privacy For Privacy Preserving Information Sharing Statistical Privacy For Privacy Preserving Information Sharing Johannes Gehrke Cornell University http://www.cs.cornell.edu/johannes Joint work with: Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh

More information

NEGATED ITEMSETS OBTAINING METHODS FROM TREE- STRUCTURED STREAM DATA

NEGATED ITEMSETS OBTAINING METHODS FROM TREE- STRUCTURED STREAM DATA NEGATED ITEMSETS OBTAINING METHODS FROM TREE- STRUCTURED STREAM DATA JURYON PAIK Pyeongtaek University, Department of Digital Information and Statistics, Gyeonggi-do 17869, South Korea E-mail: jrpaik@ptu.ac.kr

More information

Spazi vettoriali e misure di similaritá

Spazi vettoriali e misure di similaritá Spazi vettoriali e misure di similaritá R. Basili Corso di Web Mining e Retrieval a.a. 2009-10 March 25, 2010 Outline Outline Spazi vettoriali a valori reali Operazioni tra vettori Indipendenza Lineare

More information

Two Novel Pioneer Objectives of Association Rule Mining for high and low correlation of 2-varibales and 3-variables

Two Novel Pioneer Objectives of Association Rule Mining for high and low correlation of 2-varibales and 3-variables Two Novel Pioneer Objectives of Association Rule Mining for high and low correlation of 2-varibales and 3-variables Hemant Kumar Soni #1, a, Sanjiv Sharma *2, A.K. Upadhyay #3 1,3 Department of Computer

More information

Measurement and Data. Topics: Types of Data Distance Measurement Data Transformation Forms of Data Data Quality

Measurement and Data. Topics: Types of Data Distance Measurement Data Transformation Forms of Data Data Quality Measurement and Data Topics: Types of Data Distance Measurement Data Transformation Forms of Data Data Quality Importance of Measurement Aim of mining structured data is to discover relationships that

More information

CS 412 Intro. to Data Mining

CS 412 Intro. to Data Mining CS 412 Intro. to Data Mining Chapter 6. Mining Frequent Patterns, Association and Correlations: Basic Concepts and Methods Jiawei Han, Computer Science, Univ. Illinois at Urbana -Champaign, 2017 1 2 3

More information

Frequent Itemset Mining

Frequent Itemset Mining ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team (PART I) IMAGINA 17/18 Webpage: http://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge

More information

Guaranteeing the Accuracy of Association Rules by Statistical Significance

Guaranteeing the Accuracy of Association Rules by Statistical Significance Guaranteeing the Accuracy of Association Rules by Statistical Significance W. Hämäläinen Department of Computer Science, University of Helsinki, Finland Abstract. Association rules are a popular knowledge

More information

Discovery of Frequent Word Sequences in Text. Helena Ahonen-Myka. University of Helsinki

Discovery of Frequent Word Sequences in Text. Helena Ahonen-Myka. University of Helsinki Discovery of Frequent Word Sequences in Text Helena Ahonen-Myka University of Helsinki Department of Computer Science P.O. Box 26 (Teollisuuskatu 23) FIN{00014 University of Helsinki, Finland, helena.ahonen-myka@cs.helsinki.fi

More information

Finding Maximal Fully-Correlated Itemsets in Large Databases

Finding Maximal Fully-Correlated Itemsets in Large Databases Finding Maximal Fully-Correlated Itemsets in Large Databases Lian Duan Department of Management Sciences The University of Iowa Iowa City, IA 52241 USA lian-duan@uiowa.edu W. Nick Street Department of

More information

Fuzzy Logic Controller Based on Association Rules

Fuzzy Logic Controller Based on Association Rules Annals of the University of Craiova, Mathematics and Computer Science Series Volume 37(3), 2010, Pages 12 21 ISSN: 1223-6934 Fuzzy Logic Controller Based on Association Rules Ion IANCU and Mihai GABROVEANU

More information

CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS. Due to elements of uncertainty many problems in this world appear to be

CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS. Due to elements of uncertainty many problems in this world appear to be 11 CHAPTER 2: DATA MINING - A MODERN TOOL FOR ANALYSIS Due to elements of uncertainty many problems in this world appear to be complex. The uncertainty may be either in parameters defining the problem

More information

On Objective Measures of Rule Surprisingness.

On Objective Measures of Rule Surprisingness. On Objective Measures of Rule Surprisingness. Alex A. Freitas CEFET-PR (Federal Center of Technological Education), DAINF (Dept. of Informatics) Av. Sete de Setembro, 3165, Curitiba - PR, 80230-901, Brazil

More information

Leverage Sparse Information in Predictive Modeling

Leverage Sparse Information in Predictive Modeling Leverage Sparse Information in Predictive Modeling Liang Xie Countrywide Home Loans, Countrywide Bank, FSB August 29, 2008 Abstract This paper examines an innovative method to leverage information from

More information

Association Rules. Fundamentals

Association Rules. Fundamentals Politecnico di Torino Politecnico di Torino 1 Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket counter Association rule

More information

Constraint-Based Rule Mining in Large, Dense Databases

Constraint-Based Rule Mining in Large, Dense Databases Appears in Proc of the 5th Int l Conf on Data Engineering, 88-97, 999 Constraint-Based Rule Mining in Large, Dense Databases Roberto J Bayardo Jr IBM Almaden Research Center bayardo@alummitedu Rakesh Agrawal

More information

Using Image Moment Invariants to Distinguish Classes of Geographical Shapes

Using Image Moment Invariants to Distinguish Classes of Geographical Shapes Using Image Moment Invariants to Distinguish Classes of Geographical Shapes J. F. Conley, I. J. Turton, M. N. Gahegan Pennsylvania State University Department of Geography 30 Walker Building University

More information

Mining Strongly Correlated Intervals with Hypergraphs

Mining Strongly Correlated Intervals with Hypergraphs Mining Strongly Correlated Intervals with Hypergraphs Hao Wang 1(B),DejingDou 1, Yue Fang 2, and Yongli Zhang 2 1 Computer and Information Science, University of Oregon, Eugene, OR, USA {csehao,dou}@cs.uoregon.edu

More information

Sequential Pattern Mining

Sequential Pattern Mining Sequential Pattern Mining Lecture Notes for Chapter 7 Introduction to Data Mining Tan, Steinbach, Kumar From itemsets to sequences Frequent itemsets and association rules focus on transactions and the

More information

A Multistage Modeling Strategy for Demand Forecasting

A Multistage Modeling Strategy for Demand Forecasting Paper SAS5260-2016 A Multistage Modeling Strategy for Demand Forecasting Pu Wang, Alex Chien, and Yue Li, SAS Institute Inc. ABSTRACT Although rapid development of information technologies in the past

More information

Removing trivial associations in association rule discovery

Removing trivial associations in association rule discovery Removing trivial associations in association rule discovery Geoffrey I. Webb and Songmao Zhang School of Computing and Mathematics, Deakin University Geelong, Victoria 3217, Australia Abstract Association

More information

Association Analysis. Part 1

Association Analysis. Part 1 Association Analysis Part 1 1 Market-basket analysis DATA: A large set of items: e.g., products sold in a supermarket A large set of baskets: e.g., each basket represents what a customer bought in one

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - 1DL36 Fall 212" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht12 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology, Uppsala

More information

Association Analysis 1

Association Analysis 1 Association Analysis 1 Association Rules 1 3 Learning Outcomes Motivation Association Rule Mining Definition Terminology Apriori Algorithm Reducing Output: Closed and Maximal Itemsets 4 Overview One of

More information