On Minimal Infrequent Itemset Mining
David J. Haglin and Anna M. Manning

Abstract: A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets include statistical disclosure risk assessment, bioinformatics, and fraud detection. This is the first algorithm designed specifically for finding these rare itemsets. Many itemset properties used implicitly in the algorithm are proved, the problem is shown to be NP-complete, and experimental results are presented.

I. INTRODUCTION

Because of its importance to finding association rules, much attention has been given to the problem of finding itemsets that appear frequently in a dataset (cf. [1]-[4]). There have been hundreds of papers as well as workshops at conferences (e.g. FIMI'03 and FIMI'04 at IEEE ICDM 2003 and IEEE ICDM 2004) devoted to this subject. The definition of "frequent" usually involves an integer threshold parameter, τ, separating those itemset patterns considered frequent (appearing at least τ times in the dataset) from those considered infrequent. While it is possible to consider very small values of τ, the focus is most often placed on finding very frequent patterns.

Relatively little attention has been paid to infrequent itemsets. Yet they have many potential applications, including: 1) statistical disclosure risk assessment, where rare patterns in anonymized census data can lead to statistical disclosure; 2) bioinformatics, where rare patterns in microarray data may suggest genetic disorders; and 3) fraud detection, where rare patterns in financial or tax data may suggest unusual activity associated with fraudulent behavior.

In this paper we present a new algorithm for finding minimal infrequent patterns, the first algorithm designed specifically for this task. It is based upon the SUDA2 algorithm developed for finding minimal unique itemsets (unique itemsets with no unique proper subsets) [8], [9].
We then show that the minimal infrequent itemset problem is NP-complete. Finally, experimental results are presented.

(David J. Haglin is with the Department of Computer and Information Sciences, Minnesota State University, Mankato, MN 56001, USA, david.haglin@mnsu.edu. Anna M. Manning is with the School of Computer Science, University of Manchester, Oxford Rd., Manchester, M13 9PL, UK, anna@manchester.ac.uk.)

II. PROBLEM SPECIFICATION

Let I = {i_1, i_2, ..., i_L} be a set of items. An itemset is a subset I ⊆ I. The cardinality of I, denoted |I|, is the number of items in the itemset. As a shorthand, we will write c-itemset to mean an itemset of cardinality c. A dataset, D = {t_1, t_2, ..., t_R}, is a collection of R transactions (sometimes called records) of the form t_i = (tid_i, T_i), where tid_i is the transaction identifier (tid) and T_i ⊆ I. We denote by |D| the number of transactions in the dataset.

Given an itemset I, a transaction t_i is said to contain I if I ⊆ T_i. The support set of an itemset I with respect to the dataset D is D(I) = {t_i ∈ D : I ⊆ T_i}. The support of an itemset I in dataset D is the cardinality of its support set; that is, Supp_D(I) = |D(I)|. The relative support of an itemset, defined as Supp_D(I)/|D|, is a number between 0 and 1 inclusive.

Given a dataset D and an integer threshold τ, we say an itemset I is:
τ-occurrent if |D(I)| = τ;
τ-frequent if |D(I)| ≥ τ;
τ-infrequent if |D(I)| < τ.

To describe an itemset as unique, we can either say it is 1-occurrent or it is 2-infrequent. In addition, we say an itemset is:
minimal τ-occurrent if it is τ-occurrent and all of its proper subsets are (τ + 1)-frequent;
minimal τ-infrequent if it is τ-infrequent and all of its proper subsets are τ-frequent;
maximal τ-occurrent if it is τ-occurrent and all of its proper supersets are τ-infrequent; and
maximal τ-frequent if it is τ-frequent and all of its proper supersets are τ-infrequent.
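These definitions translate directly into code. The following sketch is ours, not the authors'; the function and variable names are our own convention, and the tiny dataset is hypothetical:

```python
def support_set(dataset, itemset):
    """D(I): the tids of the transactions that contain every item of I."""
    return [tid for tid, items in dataset.items() if set(itemset) <= items]

def support(dataset, itemset):
    """Supp_D(I) = |D(I)|."""
    return len(support_set(dataset, itemset))

def classify(dataset, itemset, tau):
    """Label an itemset as tau-occurrent / tau-frequent / tau-infrequent."""
    s = support(dataset, itemset)
    labels = []
    if s == tau:
        labels.append("occurrent")
    if s >= tau:
        labels.append("frequent")
    if s < tau:
        labels.append("infrequent")
    return labels

# A hypothetical three-transaction dataset: tid -> set of items.
EXAMPLE = {1: {"a", "b"}, 2: {"a"}, 3: {"b", "c"}}
```

Note that "occurrent" and "frequent" are not exclusive: a τ-occurrent itemset is also τ-frequent, matching the definitions above.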
Since there are datasets known to produce exponentially many τ-frequent itemsets [7], many strategies to compress the output have been considered. One obvious strategy is to find only maximal τ-frequent itemsets. Similarly, for τ-infrequent itemsets, it may be enough to find only minimal τ-infrequent itemsets.

We assume that the input is given in binary matrix form, where the number of rows is R, the number of columns is L, and the entry at (x, y) is a 1 if and only if the transaction whose tid is x contains item y.

III. ALGORITHM MINIT

We recently introduced an algorithm called SUDA2 [8], [9], which finds minimal unique itemsets (MUIs) in datasets with different properties than those defined above. With certain parameter settings both SUDA2 and our new algorithm should find MUIs. However, the input datasets differ enough to render comparing running times between the two algorithms meaningless.
A. Dataset differences between MINIT and SUDA2

The easiest way to describe the differences in dataset properties is to consider the matrix form. For traditional itemset mining, the matrix consists of binary entries, but for SUDA2 the matrix entries can contain any integer. We can transform a SUDA2-type matrix into a binary matrix by enumerating all of the <column, value> pairs. For each of these pairs, a column is created in the transformed binary matrix. For every value in a column of the SUDA2-type input matrix, the corresponding <column, value> location in the transformed binary matrix is given a one. For example, if the first column of the SUDA2-type matrix contains integers in the range 0 to 2, then the transformed matrix will have three columns in place of that first column, with a 1 in the specific column indicating the integer value in the SUDA2-type matrix. The essential difference between the SUDA2-type matrix and the traditional binary matrix datasets is the added constraint that, among such collections of columns (e.g. the three columns corresponding to the first SUDA2-type column of 0-to-2 values), there is exactly one 1 value in every row and the rest are 0 values.

B. The MINIT algorithm

Our new algorithm can be adapted to handle the more traditional dataset definition and to find minimal τ-infrequent itemsets (MIIs). We call this adaptation MINIT, for MINimal Infrequent iTemsets. Initially, a ranking of items is prepared by computing the support of each item and creating a list of items in ascending order of support. Minimal τ-infrequent itemsets are discovered by considering each item i_j in rank order, recursively calling MINIT on the support set of the dataset with respect to i_j considering only those items with higher rank than i_j, and then checking each candidate MII against the original dataset.
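The <column, value> enumeration described in Section III-A can be sketched as a one-hot encoding. This is our own rendering, not code from the paper, and the pair-naming scheme is an assumption:

```python
def to_binary(matrix):
    """One-hot encode: one output column per distinct <column, value> pair."""
    ncols = len(matrix[0])
    # Enumerate the distinct values seen in each original column.
    values = [sorted({row[c] for row in matrix}) for c in range(ncols)]
    pairs = [(c, v) for c in range(ncols) for v in values[c]]
    # A cell equal to v in column c turns on the (c, v) binary column.
    binary = [[1 if row[c] == v else 0 for (c, v) in pairs] for row in matrix]
    return pairs, binary
```

Each original column with k distinct values yields k binary columns, exactly one of which holds a 1 per row, which is precisely the extra constraint SUDA2-type data obeys.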
One mechanism that can be used to consider only higher-ranking items in the recursion is to maintain a liveness vector indicating which items remain viable at each level of the recursion. The initial call to the recursive algorithm presented in Algorithm 1 is MINIT(D, V[1:L], maxc). The liveness vector, V[1:L], is initialized to all true values and must be passed by value to lower levels of the recursion, as a unique copy of this vector is required at every node in the recursion tree. For those inputs that require prohibitively large running times, supplying a limit for maxc may yield enough useful information within a reasonable amount of time. For easier datasets, setting maxc to L will find all MIIs.

A significant part of MINIT's computational effort is searching the dataset for transactions that hold a specific item. To make this search fast, we pre-process the dataset by building linked lists of tids for each item. Essentially, we pre-compute D({i_j}) for 1 ≤ j ≤ L by creating linked lists of pointers to the transactions. We also arrange the items in ascending order by support, which is required for MINIT to work correctly.
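The pre-processing step can be sketched with an inverted index standing in for the paper's linked lists of tids. This is a simplification of ours, not the authors' implementation:

```python
from collections import defaultdict

def build_tid_lists(dataset):
    """Precompute D({i}) for every item i as an item -> [tid, ...] index."""
    index = defaultdict(list)
    for tid, items in sorted(dataset.items()):
        for item in items:
            index[item].append(tid)
    return index

def rank_items(index, n_transactions):
    """Viable items in ascending order of support (ties broken by item).

    Items appearing in every transaction are dropped: they cannot be part
    of any MII (see the Uniform Support Property below).
    """
    viable = [i for i, tids in index.items() if len(tids) < n_transactions]
    return sorted(viable, key=lambda i: (len(index[i]), i))
```

The ascending-support ordering produced by rank_items is exactly the rank order the recursion consumes.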
Note that some of the items may be discarded (considered non-viable) in this pre-processing phase, for reasons such as having support equal to |D| (an item appearing in every transaction cannot be part of any MII).

Algorithm 1 MINIT(D, V[1:L], maxc)
1: Input: D = input dataset with N rows and L columns of binary numbers
2: Input: V[1:L] = a boolean vector indicating viability of each item
3: Input: maxc = upper bound on cardinality of MII to find in the search
4: Returns: A listing of all MIIs for dataset D with cardinality at most maxc
5: compute R ← list of all viable items in D in ascending order of support
6: if maxc == 1 /* stopping condition */ then
7:   Return all items of R that appear fewer than τ times in D
8: else
9:   M ← ∅
10:  for each item i_j ∈ R do
11:    D_j ← D({i_j})
12:    V[i_j] ← false
13:    C_j ← recursive call to MINIT(D_j, V, maxc − 1)
14:    for each candidate itemset I ∈ C_j do
15:      if I ∪ {i_j} is an MII in D then
16:        M ← M ∪ {I ∪ {i_j}}
17:      end if
18:    end for
19:  end for
20:  Return M
21: end if

Whenever MINIT descends one level of the recursion, a new sub-dataset is built to represent the support set D({i_j}). In the interest of memory efficiency, MINIT maintains only one copy of the dataset; a sub-dataset is represented by a linked list of the tids of its transactions. To support the recursion, the new sub-dataset must be pre-processed. Three tasks are required: 1) computing the support of each item, which is needed to produce a rank-ordering of the viable items by support within D({i_j}); 2) determining the viability of each item (i.e., pruning some of the items from consideration); and 3) computing the support set of each viable item, resulting in a memory-efficient representation of the lists of tids for each support set D({i_j, i_k}) for 1 ≤ k ≤ L, j ≠ k. Observe that the support set of {i_k} within D({i_j}) is the same as D({i_j, i_k}).
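Algorithm 1 can be rendered as a compact Python sketch. This is a simplified reading of ours, not the paper's implementation: supports are recomputed naively rather than maintained via shared tid-list structures, and the candidate test at line 15 uses the definition of minimality directly instead of the support-row test developed later. The example dataset mirrors Table I(a) from Section VI:

```python
from itertools import combinations

def support_set(dataset, itemset):
    """D(I): tids of the transactions that contain every item of I."""
    return [tid for tid, items in dataset.items() if itemset <= items]

def is_minimal_infrequent(dataset, itemset, tau):
    """Direct check: I is tau-infrequent and every proper subset is tau-frequent."""
    if len(support_set(dataset, itemset)) >= tau:
        return False
    # By monotonicity of support it suffices to check the (c-1)-subsets.
    return all(len(support_set(dataset, frozenset(sub))) >= tau
               for sub in combinations(itemset, len(itemset) - 1))

def minit(dataset, tau, live, maxc):
    """Return the minimal tau-infrequent itemsets of cardinality <= maxc.

    dataset: dict mapping tid -> frozenset of items; live: viable items.
    """
    supp = {i: len(support_set(dataset, frozenset({i}))) for i in live}
    ranked = sorted(live, key=lambda i: (supp[i], i))  # ascending support
    # 1-itemsets that are already infrequent are MIIs themselves.
    mii = [frozenset({i}) for i in ranked if supp[i] < tau]
    if maxc == 1:
        return mii
    remaining = set(live)
    for a in ranked:
        remaining.discard(a)  # the recursion sees only higher-ranked items
        if supp[a] < tau:
            continue  # {a} is an MII; no superset containing a is minimal
        sub = {tid: dataset[tid]
               for tid in support_set(dataset, frozenset({a}))}
        for cand in minit(sub, tau, set(remaining), maxc - 1):
            merged = cand | {a}
            if is_minimal_infrequent(dataset, merged, tau):
                mii.append(merged)
    return mii

# Table I(a) from Section VI, written as tid -> frozenset of items.
TABLE_I = {1: frozenset({1, 2, 4, 5, 6}), 2: frozenset({2, 3, 4, 5, 6}),
           3: frozenset({1, 2, 3, 4, 6}), 4: frozenset({1, 2, 3, 5, 6}),
           5: frozenset({1, 3, 4, 5, 6}), 6: frozenset({1, 3, 6}),
           7: frozenset({1, 2, 5})}
```

With τ = 3 and maxc = 3 on this dataset, the sketch reports, among others, the itemset {2, 4, 5}: it has support 2, while each of its pairs has support at least 3.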
In practice, MINIT can perform the first and third tasks concurrently to avoid two passes over the data: D({i_j, i_k}) is computed, and the size of that list is then used as the support of i_k in D({i_j}). The second task is then performed, and the data structures built for the support sets of pruned items are discarded. When most of the items remain viable, this technique of computing all support sets and deriving supports from them is very effective. However, if many of the items are discarded as non-viable, building the support set lists for these discarded items is wasted effort.

The statement at line 7 of Algorithm 1 can be modified to return all items of R that appear exactly τ times in D, which transforms MINIT into an algorithm that finds all minimal τ-occurrent rather than all minimal τ-infrequent itemsets.

IV. MINIMAL ITEMSET PROPERTIES

We present properties of MIIs that MINIT relies upon in order to correctly find all MIIs. These properties are adapted from those presented in [9] for the SUDA2 algorithm and dataset characteristics.

Consider a minimal τ-infrequent c-itemset I. By definition, I must have the following property.

Property 1 (Rareness Property): If itemset I is an MII, then Supp_D(I) < τ.

We note that if an itemset is minimal τ-infrequent, then it must be minimal δ-occurrent for some δ < τ. However, a minimal δ-occurrent c-itemset, I, is not necessarily minimal τ-infrequent for all τ > δ: there may be some size-(c − 1) subset of I with support ε, for δ < ε < τ.

The following theorem shows that certain itemsets must exist within the dataset, D, in order for an MII to exist. We call the transactions holding those certain itemsets support rows.

Theorem 2 (Support Row Property): Given a minimal τ-infrequent c-itemset I = {i_1, i_2, ..., i_c} with Supp_D(I) = δ, δ < τ, for each 1 ≤ j ≤ c there must exist τ − δ support rows in D containing itemset I \ {i_j} (but not item i_j).

Proof: Suppose for some j there exist fewer than τ − δ rows containing I′ = I \ {i_j} but not i_j. Since I′ also appears in the δ rows of D(I), its support is Supp_D(I′) < δ + (τ − δ) = τ. Then I′ is a τ-infrequent proper subset of I, and therefore I is not minimal τ-infrequent.
Observe that a support row works for only one item of I; thus there are at least c(τ − δ) support rows. The existence of c(τ − δ) support rows in D is both a necessary and a sufficient condition for I to be minimal, as seen by the previous theorem and the following lemma.

Lemma 3: Given an itemset I = {i_1, ..., i_c} that is τ-infrequent in D, with Supp_D(I) = δ for δ < τ, such that D has c(τ − δ) support rows covering the c subsets of I of size c − 1, then I is a minimal τ-infrequent itemset.

Proof: Suppose I is not minimal. Then there exists J ⊂ I that is also τ-infrequent within D. Clearly |J| ≤ c − 1, so it must be true that J ⊆ I′, where I′ is one of the c subsets of I of size c − 1. Since J ⊆ I, J appears in the δ rows holding I. Moreover, since J ⊆ I′, J also appears in the τ − δ support rows for I′. Thus, J appears in at least τ rows of D and so is not τ-infrequent, a contradiction.

This leads to the following observation as to the minimum support required of a single item in order to appear in a minimal τ-infrequent c-itemset.

Theorem 4 (Minimum Support Property): Given a fixed τ and itemset cardinality c, an item i must have support Supp_D({i}) ≥ c + τ − 2 in order for i to be part of a minimal τ-infrequent c-itemset I.

Proof: Let I be a minimal τ-infrequent itemset with |I| = c and Supp_D(I) = δ. If i ∈ I, then i must appear in at least the δ reference rows of I and the (c − 1)(τ − δ) support rows for the c − 1 subsets of I that contain i and have cardinality c − 1. Thus,

Supp_D({i}) ≥ δ + (c − 1)(τ − δ) = c(τ − δ) + (2δ − τ) = cτ − τ + 2δ − cδ.

Let δ = τ − r, for some integer r > 0. Then,

Supp_D({i}) ≥ cτ − τ + 2(τ − r) − c(τ − r) = τ + r(c − 2).

For a fixed c > 2 and τ, this is minimized when r = 1 (i.e., δ = τ − 1), giving Supp_D({i}) ≥ c + τ − 2.

The Minimum Support Property can give an efficient way to prune significant areas of the search space when several items have low support counts.
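The Minimum Support Property yields a one-line prune: when searching for minimal τ-infrequent c-itemsets, any item with support below c + τ − 2 can be discarded. A minimal sketch (the function name and the example supports are ours):

```python
def prune_by_min_support(item_supports, tau, c):
    """Keep only items that can belong to a minimal tau-infrequent c-itemset.

    By the Minimum Support Property, such an item needs support >= c + tau - 2.
    """
    bound = c + tau - 2
    return {i for i, s in item_supports.items() if s >= bound}
```

For example, with τ = 3 and c = 4 the bound is 5, so items of support 4 or less can never participate in a minimal 3-infrequent 4-itemset and are dropped before the recursion ever considers them.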
Corollary 5 (Uniform Support Property): Given a dataset D and an item i contained in every transaction of D, i cannot be contained in any minimal τ-infrequent itemset I.

Proof: If I is a minimal τ-infrequent itemset in D containing item i, then the Support Row Property ensures the existence of a row in D containing I \ {i} but not i. Since i appears in every transaction of D, we have a contradiction.

V. RECURSIVE ITEMSET PROPERTIES

Given dataset D and some item a (called an anchor item), we consider the itemset properties of D and D({a}) upon which the recursive Algorithm 1 is based.

Lemma 6: Given a minimal τ-infrequent itemset I = {i_1, ..., i_c} in D, for each anchor a = i_j, 1 ≤ j ≤ c, the itemset I_a = I \ {i_j} is a minimal τ-infrequent itemset in D({a}). Moreover, |I_a| = |I| − 1.

Proof: Without loss of generality, fix j in the range 1 to c inclusive and let a = i_j. Since every row of D({a}) contains item a, the only way I_a could appear at least τ times in D({a}) is if I appeared in at least τ rows of D. Therefore, I_a is τ-infrequent in D({a}). Similarly, if I_a is not minimal τ-infrequent in D({a}), then there exists I′_a ⊂ I_a that is also τ-infrequent in D({a}).
But then I′_a ∪ {a} would also be τ-infrequent in D and is a proper subset of I, contradicting the minimality of I. Hence, I_a must be minimal τ-infrequent in D({a}).

Unfortunately, it is not the case that all minimal τ-infrequent itemsets in D({a}) lead to minimal τ-infrequent itemsets in the original dataset D. However, the following theorem provides a method for finding those that do.

Theorem 7: Given a dataset D, an item a, and a minimal τ-infrequent itemset I_a in D({a}) with Supp_{D({a})}(I_a) = δ, the itemset I = I_a ∪ {a} is a minimal τ-infrequent itemset in D if and only if there exist τ − δ rows in D containing I_a but not containing item a.

Proof: If I is a minimal τ-infrequent itemset in D with Supp_D(I) = δ, then the Support Row Property ensures the existence of τ − δ rows in D containing I_a but not containing item a.

For the other direction, assume τ − δ such rows exist in D and that I_a is a minimal τ-infrequent itemset in D({a}). Note that Supp_D(I) = δ. All that remains to show is that I has the requisite c(τ − δ) support rows. Each of the (c − 1)(τ − δ) support rows in D({a}), augmented with item a, forms a support row in D for the itemset I. As all of these (c − 1)(τ − δ) support rows contain item a, the only other support rows needed are the τ − δ rows stated in the theorem. Since I is τ-infrequent and c(τ − δ) support rows exist in D, by Lemma 3, I must be a minimal τ-infrequent itemset in D.

This recursive property of MIIs helps define the boundaries of the search space by providing a clear indication of the maximum cardinality of candidate MIIs.

VI. EXAMPLE

To help understand the recursive algorithm and the pruning techniques, we present the following example. Consider the input dataset shown in Table I(a). We will follow the discovery of the minimal 2-occurrent itemset I = {2, 4, 5}, whose items have ranks 1, 2, and 4.
TABLE I: EXAMPLE DATASET

(a) Dataset (tid: items)
1: 1, 2, 4, 5, 6
2: 2, 3, 4, 5, 6
3: 1, 2, 3, 4, 6
4: 1, 2, 3, 5, 6
5: 1, 3, 4, 5, 6
6: 1, 3, 6
7: 1, 2, 5

(b) Rank order of items (rank: item, support)
1: item 4, support 4; 2: item 2, support 5; 3: item 3, support 5; 4: item 5, support 5; 5: item 1, support 6; 6: item 6, support 6

Algorithm 1 finds itemset I by first computing the rank order of items as shown in Table I(b). Lemma 6 indicates that any itemset I that is τ-occurrent in D and whose smallest-ranked item has rank 1 (i.e., item 4) will have I \ {4} as a τ-occurrent itemset in D({4}). So we compute D({4}) as shown in Table II(a). Note that we can ignore item 6 in D({4}) by Corollary 5. This brings us down one level in the recursion tree as we explore D({4}).

TABLE II: DATASET D({4})

(a) Dataset (tid: items)
1: 1, 2, 4, 5, 6
2: 2, 3, 4, 5, 6
3: 1, 2, 3, 4, 6
5: 1, 3, 4, 5, 6

(b) Rank order of viable items: items 1, 2, 3, and 5 each have support 3 in D({4})

As we enter Algorithm 1 recursively, we first construct a new rank list for the dataset at this recursion level (see Table II(b)). For the second iteration of the loop at line 10 of Algorithm 1, the anchor item will be 2. Descending the recursion tree for this anchor produces the dataset in Table III(a).

TABLE III: DATASET D({2, 4})

(a) Dataset (tid: items)
1: 1, 2, 4, 5, 6
2: 2, 3, 4, 5, 6
3: 1, 2, 3, 4, 6

(b) Viable items per row
1: 1, 5
2: 3, 5
3: 1, 3

Note that the viable items 1, 3, and 5 all have support at or below our threshold τ = 2, so this recursion tree node returns the itemset list {{1}, {3}, {5}} to the next higher recursion node. To determine that {2, 5} is a 2-occurrent itemset in D({4}), we need only find sufficient support rows as described in Theorem 7. Observe that row 5 in Table II(a) contains item 5 but does not contain item 2. This one support row, along with a support of 2 for {2, 5} in D({4}), is enough to conclude that {2, 5} is indeed a minimal 2-occurrent itemset in D({4}). It will therefore be included in the collection of itemsets passed up to the parent node of the recursion tree. At the root node of the recursion tree, the candidate itemset {2, 5} is merged with item 4.
This candidate {2, 4, 5} is then checked for qualification as a minimal 2-occurrent itemset in D, using Theorem 7. Since {2, 5} has support 2 in D({4}), one support row in D is sufficient; that support row is row 4. Since this is the top level of the recursion, we conclude that {2, 4, 5} is a minimal 2-occurrent itemset in D.

We note that for Algorithm 1 to find a minimal τ-infrequent c-itemset I, it must explore c levels of the recursion tree. Each level of the tree corresponds to one of the items in I. The item associated with the bottom level of the tree has support in that bottom-level dataset equal to the support of I in the original dataset D. In fact, this support remains the same at each level of the recursion tree.
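The walk-through above can be cross-checked by brute force. The sketch below (ours, not MINIT itself) enumerates every itemset of the Table I dataset up to a cardinality bound and keeps the minimal τ-occurrent ones, i.e. those with support exactly τ whose proper subsets all have support greater than τ:

```python
from itertools import combinations

# Table I(a), written as tid -> set of items.
DATASET = {
    1: {1, 2, 4, 5, 6},
    2: {2, 3, 4, 5, 6},
    3: {1, 2, 3, 4, 6},
    4: {1, 2, 3, 5, 6},
    5: {1, 3, 4, 5, 6},
    6: {1, 3, 6},
    7: {1, 2, 5},
}

def supp(itemset):
    """Support of an itemset in DATASET."""
    return sum(1 for items in DATASET.values() if set(itemset) <= items)

def minimal_occurrent(tau, maxc):
    """Minimal tau-occurrent itemsets of cardinality <= maxc, by enumeration."""
    universe = sorted(set().union(*DATASET.values()))
    found = []
    for c in range(1, maxc + 1):
        for cand in combinations(universe, c):
            # Checking the (c-1)-subsets suffices by monotonicity of support.
            if supp(cand) == tau and all(
                    supp(sub) > tau for sub in combinations(cand, c - 1)):
                found.append(frozenset(cand))
    return found
```

Running minimal_occurrent(2, 3) confirms that {2, 4, 5} is minimal 2-occurrent: its support is 2 while {2, 4}, {2, 5}, and {4, 5} have supports 3, 4, and 3.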
VII. COMPLEXITY OF MINIMAL τ-OCCURRENT ITEMSET MINING

A. Variations of the problem

For a problem such as finding minimal τ-occurrent itemsets, there are variations that have important implications for the complexity of the problem. We consider the following problem variations in increasing order of computational difficulty:

1) The simplest form is a decision problem: for a given input dataset, determine whether there exists any minimal τ-occurrent itemset and merely answer yes or no.

2) The next harder problem is a search problem: for a given input dataset, find one (any) minimal τ-occurrent itemset and print out the solution. There are two sub-variations of the search version: (i) find any solution for a specific record in the input dataset; and (ii) find any solution in any record of the input dataset. The computationally easier of the two is the less restrictive sub-variation (ii).

3) The objective of the counting version is, for a given input dataset, to determine the number of minimal τ-occurrent itemsets and print out that number. Even though there may be an exponential number of minimal τ-occurrent itemsets in a given dataset, it may be possible to count them in polynomial time and print a polynomial-size representation of the exponential count (e.g., in binary).

4) The most computationally challenging variation is, for a given input dataset, to find and print out all of the minimal τ-occurrent itemsets.

Some attention has recently been given to finding rare patterns in datasets [6], [8], [9]. From this perspective, it makes sense to search for minimal τ-occurrent itemsets. A special case of this problem is to set τ = 1, meaning only minimal unique itemsets are sought. Yang [7] provides a nice complexity analysis of the four variations of the maximal τ-occurrent problem.
The counting variation of the maximal τ-occurrent problem is #P-complete [7], whereas searching for a single solution can be done in polynomial time. We show that by seeking minimal rather than maximal τ-occurrent itemsets, even the simplest variation, the decision version, is NP-complete.

B. Computational complexity of minimal τ-occurrent itemsets

Our proof is based on a proof presented by Daishin Nakamura in [10], which addresses only the variation of searching a specific record for a 1-occurrent itemset (i.e., a minimal unique itemset). The proof is by a reduction from the Hitting Set problem (see [11] for NP-completeness proof techniques). An instance of the Hitting Set problem, H = (p, C, k), is defined as follows: given a collection C = {C_1, ..., C_q} of subsets of a finite set S = {1, ..., p} and a positive integer k ≤ p, determine whether there exists a subset S′ ⊆ S with |S′| ≤ k such that S′ contains at least one element from each set of C.

Theorem 8: Given a dataset and a fixed constant τ ≥ 1, to determine whether there exists any minimal τ-occurrent itemset in the dataset is NP-complete.

Proof: Given an instance of the Hitting Set problem H = (p, C, k), construct a q × p matrix M = [x_{i,j}], where x_{i,j} = 1 if j ∈ C_i and x_{i,j} = 0 otherwise. Observe that every subset of the columns of M corresponds to a subset S′ ⊆ S in the Hitting Set problem. Moreover, S′ is a hitting set in H if and only if the subset of columns whose index numbers are in S′ induces a matrix projection with at least one 1-entry in every row; this can be seen as each row i of M corresponds to C_i in H.

Now denote by Z a (τ × p)-matrix of zeroes and construct a dataset matrix D by stacking Z above τ + 1 copies of M:

D = [ Z; M; ...; M ].

Now find a minimal τ-occurrent itemset I in D. If I contained any 1-entries, then a record holding I would have to come from the M portion of D. However, since each row of M appears τ + 1 times in D, such an itemset could not be τ-occurrent.
So I consists of only zero-entries and appears in the first τ rows of zeroes in D. Let S′ be the set of columns associated with I. Since I is τ-occurrent, each row of M must contain a 1-entry in at least one of the columns of S′. This corresponds directly to a solution to H. Therefore, any algorithm for finding minimal τ-occurrent itemsets in a dataset, for any τ ≥ 1, can be used to solve the Hitting Set problem.

VIII. EXPERIMENTAL RESULTS

All of the experiments were run on Dual Core AMD Opteron 270 processors running at 2 GHz with 8 GB of RAM. The datasets we use come from a standard frequent itemset mining repository, and all are in the proper format for MINIT.

A. Mushroom Data

The mushroom dataset contains 8124 transactions and 119 items in its inventory. This particular dataset does not challenge MINIT: it runs within 5 seconds for any threshold 1 ≤ τ ≤ 8124 with no restriction on itemset cardinality. What we can see from this dataset is the number of minimal τ-infrequent and minimal τ-occurrent itemsets for varying values of the threshold τ (Fig. 1: Mushroom Dataset MII counts). We expect the number of τ-occurrent itemsets to be smaller than the number of τ-infrequent itemsets; Figure 1 shows just how drastically different they are. Since the τ-infrequent itemsets include all of the τ-occurrent ones, the remainder of this section focuses only on computing minimal τ-infrequent itemsets.

B. Chess Data

The chess dataset contains 3196 transactions and 75 items in its inventory. Although this dataset is much smaller than the mushroom dataset, it presents significantly more of a computational challenge to MINIT. We therefore imposed limits on the cardinality of the τ-infrequent itemsets: for each maxc of 6, 7, and 8, we ran trials varying the threshold τ. The running time for maxc = 8 is shown in Figure 2 (Chess Dataset computation time), and to see the growth pattern, the numbers of minimal τ-infrequent itemsets are shown for maxc of 6, 7, and 8 in Figure 3 (Chess Dataset MII counts).

C. T10I4D100K Data

The T10I4D100K dataset, generated by the software described in [12], contains 100,000 transactions and 870 items in its inventory, with an average of 10 items per transaction and an average (maximal) pattern size of 4. We ran trials with no limit on the cardinality of the MIIs, varying the threshold. It is interesting to see that the computation time (Figure 4: T10I4D100K Dataset computation time) drops faster than the number of MIIs (Figure 5).

D. Connect4 Data

The Connect-4 dataset contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet and in which the next move is not forced. There are 67,557 transactions and 43 columns (one for each of the 42 connect-4 squares, together with an outcome column: win, draw, or lose). Once this dataset is transformed into a binary format it contains 129 items, since each cell in the original dataset holds one of three possible values. This dataset presents the most computational challenge to MINIT, so we imposed a cardinality limit of maxc = 6.
The number of MIIs (Figure 7: Connect4 Dataset MII counts) declines substantially more slowly than the numbers for the previous datasets. The running time (Figure 6: Connect4 Dataset computation time) also declines slowly, even more slowly than the MII counts.

IX. CONCLUSIONS

We have presented a new algorithm, MINIT, for finding minimal τ-infrequent or minimal τ-occurrent itemsets. The computation times required on the four datasets presented suggest a correlation between the number of MIIs and the amount of computation time required. It would be interesting to see how well MINIT could run in a parallel or grid environment. It would also be useful to find other pruning strategies to improve the running time requirements.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation under an NSF CTS grant.

REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," in Proceedings of the 1993 International Conference on Management of Data (SIGMOD '93), May 1993.
[2] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo, "Fast discovery of association rules," in Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. Menlo Park: The AAAI Press, 1996.
[3] S. Brin, R. Motwani, J. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 1997.
[4] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New algorithms for fast discovery of association rules," in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1997.
[5] E. Boros, V. Gurvich, L. Khachiyan, and K. Makino, "On the complexity of generating maximal frequent and minimal infrequent sets," in Symposium on Theoretical Aspects of Computer Science, 2002. [Online]. Available: citeseer.ist.psu.edu/boros2complexity.html
[6] D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharma, "Discovering all most specific sentences," ACM Trans. Database Syst., vol. 28, no. 2, 2003.
[7] G. Yang, "Computational aspects of mining maximal frequent patterns," Theoretical Computer Science, vol. 362, 2006.
[8] A. M. Manning and D. J. Haglin, "A new algorithm for finding minimal sample uniques for use in statistical disclosure assessment," in IEEE International Conference on Data Mining (ICDM '05), Nov. 2005.
[9] A. M. Manning, D. J. Haglin, and J. A. Keane, "A recursive search algorithm for statistical disclosure assessment," Data Mining and Knowledge Discovery, 2007, conditionally accepted.
[10] A. Takemura, "Minimum unsafe and maximum safe sets of variables for disclosure risk assessment of individual records in a microdata set," Journal of the Japan Statistical Society, vol. 32, no. 1, 2002.
[11] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., 1979.
[12] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," in VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco: Morgan Kaufmann, 1994.
More informationFree-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries
Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries To appear in Data Mining and Knowledge Discovery, an International Journal c Kluwer Academic Publishers
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 6
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationA New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border
A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border Guimei Liu a,b Jinyan Li a Limsoon Wong b a Institute for Infocomm Research, Singapore b School of
More informationTransaction Databases, Frequent Itemsets, and Their Condensed Representations
Transaction Databases, Frequent Itemsets, and Their Condensed Representations Taneli Mielikäinen HIIT Basic Research Unit Department of Computer Science University of Helsinki, Finland Abstract. Mining
More informationBottom-Up Propositionalization
Bottom-Up Propositionalization Stefan Kramer 1 and Eibe Frank 2 1 Institute for Computer Science, Machine Learning Lab University Freiburg, Am Flughafen 17, D-79110 Freiburg/Br. skramer@informatik.uni-freiburg.de
More informationFree-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency Queries
Data Mining and Knowledge Discovery, 7, 5 22, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. Free-Sets: A Condensed Representation of Boolean Data for the Approximation of Frequency
More informationQuantitative Association Rule Mining on Weighted Transactional Data
Quantitative Association Rule Mining on Weighted Transactional Data D. Sujatha and Naveen C. H. Abstract In this paper we have proposed an approach for mining quantitative association rules. The aim of
More informationSelf-duality of bounded monotone boolean functions and related problems
Discrete Applied Mathematics 156 (2008) 1598 1605 www.elsevier.com/locate/dam Self-duality of bounded monotone boolean functions and related problems Daya Ram Gaur a, Ramesh Krishnamurti b a Department
More informationMining Positive and Negative Fuzzy Association Rules
Mining Positive and Negative Fuzzy Association Rules Peng Yan 1, Guoqing Chen 1, Chris Cornelis 2, Martine De Cock 2, and Etienne Kerre 2 1 School of Economics and Management, Tsinghua University, Beijing
More informationTheory of Dependence Values
Theory of Dependence Values ROSA MEO Università degli Studi di Torino A new model to evaluate dependencies in data mining problems is presented and discussed. The well-known concept of the association
More informationAn Efficient Implementation of a Joint Generation Algorithm
An Efficient Implementation of a Joint Generation Algorithm E. Boros 1, K. Elbassioni 1, V. Gurvich 1, and L. Khachiyan 2 1 RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway, NJ 08854-8003,
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.
More informationMaintaining Frequent Itemsets over High-Speed Data Streams
Maintaining Frequent Itemsets over High-Speed Data Streams James Cheng, Yiping Ke, and Wilfred Ng Department of Computer Science Hong Kong University of Science and Technology Clear Water Bay, Kowloon,
More informationMining Approximative Descriptions of Sets Using Rough Sets
Mining Approximative Descriptions of Sets Using Rough Sets Dan A. Simovici University of Massachusetts Boston, Dept. of Computer Science, 100 Morrissey Blvd. Boston, Massachusetts, 02125 USA dsim@cs.umb.edu
More informationAssociation Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University
Association Rules CS 5331 by Rattikorn Hewett Texas Tech University 1 Acknowledgements Some parts of these slides are modified from n C. Clifton & W. Aref, Purdue University 2 1 Outline n Association Rule
More informationD B M G Data Base and Data Mining Group of Politecnico di Torino
Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket
More informationFrequent Itemset Mining
ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team (PART I) IMAGINA 17/18 Webpage: http://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge
More informationRemoving trivial associations in association rule discovery
Removing trivial associations in association rule discovery Geoffrey I. Webb and Songmao Zhang School of Computing and Mathematics, Deakin University Geelong, Victoria 3217, Australia Abstract Association
More informationAssociation Rules. Fundamentals
Politecnico di Torino Politecnico di Torino 1 Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket counter Association rule
More informationD B M G. Association Rules. Fundamentals. Fundamentals. Elena Baralis, Silvia Chiusano. Politecnico di Torino 1. Definitions.
Definitions Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Itemset is a set including one or more items Example: {Beer, Diapers} k-itemset is an itemset that contains k
More informationD B M G. Association Rules. Fundamentals. Fundamentals. Association rules. Association rule mining. Definitions. Rule quality metrics: example
Association rules Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket
More informationDATA MINING LECTURE 3. Frequent Itemsets Association Rules
DATA MINING LECTURE 3 Frequent Itemsets Association Rules This is how it all started Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases.
More informationSummarizing Transactional Databases with Overlapped Hyperrectangles
Noname manuscript No. (will be inserted by the editor) Summarizing Transactional Databases with Overlapped Hyperrectangles Yang Xiang Ruoming Jin David Fuhry Feodor F. Dragan Abstract Transactional data
More informationApproximating a Collection of Frequent Sets
Approximating a Collection of Frequent Sets Foto Afrati National Technical University of Athens Greece afrati@softlab.ece.ntua.gr Aristides Gionis HIIT Basic Research Unit Dept. of Computer Science University
More informationGenerating Partial and Multiple Transversals of a Hypergraph
Generating Partial and Multiple Transversals of a Hypergraph Endre Boros 1, Vladimir Gurvich 2, Leonid Khachiyan 3, and Kazuhisa Makino 4 1 RUTCOR, Rutgers University, 640 Bartholomew Road, Piscataway
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 04 Association Analysis Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro
More informationReductions for Frequency-Based Data Mining Problems
Reductions for Frequency-Based Data Mining Problems Stefan Neumann University of Vienna Vienna, Austria Email: stefan.neumann@univie.ac.at Pauli Miettinen Max Planck Institute for Informatics Saarland
More informationFUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH
FUZZY ASSOCIATION RULES: A TWO-SIDED APPROACH M. De Cock C. Cornelis E. E. Kerre Dept. of Applied Mathematics and Computer Science Ghent University, Krijgslaan 281 (S9), B-9000 Gent, Belgium phone: +32
More informationModified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context.
Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context. Murphy Choy Cally Claire Ong Michelle Cheong Abstract The rapid explosion in retail data calls for more effective
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization
More informationOutline. Fast Algorithms for Mining Association Rules. Applications of Data Mining. Data Mining. Association Rule. Discussion
Outline Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Introduction Algorithm Apriori Algorithm AprioriTid Comparison of Algorithms Conclusion Presenter: Dan Li Discussion:
More informationDual-Bounded Generating Problems: Weighted Transversals of a Hypergraph
Dual-Bounded Generating Problems: Weighted Transversals of a Hypergraph E. Boros V. Gurvich L. Khachiyan K. Makino January 15, 2003 Abstract We consider a generalization of the notion of transversal to
More informationPattern Space Maintenance for Data Updates. and Interactive Mining
Pattern Space Maintenance for Data Updates and Interactive Mining Mengling Feng, 1,3,4 Guozhu Dong, 2 Jinyan Li, 1 Yap-Peng Tan, 1 Limsoon Wong 3 1 Nanyang Technological University, 2 Wright State University
More informationRegression and Correlation Analysis of Different Interestingness Measures for Mining Association Rules
International Journal of Innovative Research in Computer Scien & Technology (IJIRCST) Regression and Correlation Analysis of Different Interestingness Measures for Mining Association Rules Mir Md Jahangir
More informationLecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity
Universität zu Lübeck Institut für Theoretische Informatik Lecture notes on Knowledge-Based and Learning Systems by Maciej Liśkiewicz Lecture 5: Efficient PAC Learning 1 Consistent Learning: a Bound on
More informationAn Approach to Classification Based on Fuzzy Association Rules
An Approach to Classification Based on Fuzzy Association Rules Zuoliang Chen, Guoqing Chen School of Economics and Management, Tsinghua University, Beijing 100084, P. R. China Abstract Classification based
More informationRealization Plans for Extensive Form Games without Perfect Recall
Realization Plans for Extensive Form Games without Perfect Recall Richard E. Stearns Department of Computer Science University at Albany - SUNY Albany, NY 12222 April 13, 2015 Abstract Given a game in
More informationA Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms T. Vijayakumar 1, V.Nivedhitha 2, K.Deeba 3 and M. Sathya Bama 4 1 Assistant professor / Dept of IT, Dr.N.G.P College of Engineering
More informationReductionist View: A Priori Algorithm and Vector-Space Text Retrieval. Sargur Srihari University at Buffalo The State University of New York
Reductionist View: A Priori Algorithm and Vector-Space Text Retrieval Sargur Srihari University at Buffalo The State University of New York 1 A Priori Algorithm for Association Rule Learning Association
More informationOn Generating All Minimal Integer Solutions for a Monotone System of Linear Inequalities
On Generating All Minimal Integer Solutions for a Monotone System of Linear Inequalities E. Boros 1, K. Elbassioni 2, V. Gurvich 1, L. Khachiyan 2, and K. Makino 3 1 RUTCOR, Rutgers University, 640 Bartholomew
More informationAn Incremental RNC Algorithm for Generating All Maximal Independent Sets in Hypergraphs of Bounded Dimension
An Incremental RNC Algorithm for Generating All Maximal Independent Sets in Hypergraphs of Bounded Dimension E. Boros K. Elbassioni V. Gurvich L. Khachiyan Abstract We show that for hypergraphs of bounded
More informationSelecting a Right Interestingness Measure for Rare Association Rules
Selecting a Right Interestingness Measure for Rare Association Rules Akshat Surana R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad
More informationMining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany
Mining Rank Data Sascha Henzgen and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {sascha.henzgen,eyke}@upb.de Abstract. This paper addresses the problem of mining rank
More informationEditorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft
Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft Manuscript Number: Title: Summarizing transactional databases with overlapped hyperrectangles, theories and algorithms Article
More informationMachine Learning: Pattern Mining
Machine Learning: Pattern Mining Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim Wintersemester 2007 / 2008 Pattern Mining Overview Itemsets Task Naive Algorithm Apriori Algorithm
More informationCOMP 5331: Knowledge Discovery and Data Mining
COMP 5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei And slides provide by Raymond
More informationAbout the relationship between formal logic and complexity classes
About the relationship between formal logic and complexity classes Working paper Comments welcome; my email: armandobcm@yahoo.com Armando B. Matos October 20, 2013 1 Introduction We analyze a particular
More informationIntroduction to Complexity Theory
Introduction to Complexity Theory Read K & S Chapter 6. Most computational problems you will face your life are solvable (decidable). We have yet to address whether a problem is easy or hard. Complexity
More informationConstraint-Based Rule Mining in Large, Dense Databases
Appears in Proc of the 5th Int l Conf on Data Engineering, 88-97, 999 Constraint-Based Rule Mining in Large, Dense Databases Roberto J Bayardo Jr IBM Almaden Research Center bayardo@alummitedu Rakesh Agrawal
More informationMining Free Itemsets under Constraints
Mining Free Itemsets under Constraints Jean-François Boulicaut Baptiste Jeudy Institut National des Sciences Appliquées de Lyon Laboratoire d Ingénierie des Systèmes d Information Bâtiment 501 F-69621
More informationMeelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 05
Meelis Kull meelis.kull@ut.ee Autumn 2017 1 Sample vs population Example task with red and black cards Statistical terminology Permutation test and hypergeometric test Histogram on a sample vs population
More informationFrequent Itemsets and Association Rule Mining. Vinay Setty Slides credit:
Frequent Itemsets and Association Rule Mining Vinay Setty vinay.j.setty@uis.no Slides credit: http://www.mmds.org/ Association Rule Discovery Supermarket shelf management Market-basket model: Goal: Identify
More informationDiscovering Non-Redundant Association Rules using MinMax Approximation Rules
Discovering Non-Redundant Association Rules using MinMax Approximation Rules R. Vijaya Prakash Department Of Informatics Kakatiya University, Warangal, India vijprak@hotmail.com Dr.A. Govardhan Department.
More informationIntroduction. An Introduction to Algorithms and Data Structures
Introduction An Introduction to Algorithms and Data Structures Overview Aims This course is an introduction to the design, analysis and wide variety of algorithms (a topic often called Algorithmics ).
More informationAn Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases
An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases Takeaki Uno, Tatsuya Asai 2 3, Yuzo Uchida 2, and Hiroki Arimura 2 National Institute of Informatics, 2--2, Hitotsubashi,
More informationA Global Constraint for Closed Frequent Pattern Mining
A Global Constraint for Closed Frequent Pattern Mining N. Lazaar 1, Y. Lebbah 2, S. Loudni 3, M. Maamar 1,2, V. Lemière 3, C. Bessiere 1, P. Boizumault 3 1 LIRMM, University of Montpellier, France 2 LITIO,
More informationEfficient discovery of statistically significant association rules
fficient discovery of statistically significant association rules Wilhelmiina Hämäläinen Department of Computer Science University of Helsinki Finland whamalai@cs.helsinki.fi Matti Nykänen Department of
More informationEffective Elimination of Redundant Association Rules
Effective Elimination of Redundant Association Rules James Cheng Yiping Ke Wilfred Ng Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay,
More informationPushing Tougher Constraints in Frequent Pattern Mining
Pushing Tougher Constraints in Frequent Pattern Mining Francesco Bonchi 1 and Claudio Lucchese 2 1 Pisa KDD Laboratory, ISTI - C.N.R., Area della Ricerca di Pisa, Italy 2 Department of Computer Science,
More informationLevelwise Search and Borders of Theories in Knowledge Discovery
Data Mining and Knowledge Discovery 1, 241 258 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. Levelwise Search and Borders of Theories in Knowledge Discovery HEIKKI MANNILA
More informationA Study on Monotone Self-Dual Boolean Functions
A Study on Monotone Self-Dual Boolean Functions Mustafa Altun a and Marc D Riedel b a Electronics and Communication Engineering, Istanbul Technical University, Istanbul, Turkey 34469 b Electrical and Computer
More informationBoolean Analyzer - An Algorithm That Uses A Probabilistic Interestingness Measure to find Dependency/Association Rules In A Head Trauma Data
Boolean Analyzer - An Algorithm That Uses A Probabilistic Interestingness Measure to find Dependency/Association Rules In A Head Trauma Data Susan P. Imberman a, Bernard Domanski b, Hilary W. Thompson
More informationOn Differentially Private Frequent Itemsets Mining
On Differentially Private Frequent Itemsets Mining Chen Zeng University of Wisconsin-Madison zeng@cs.wisc.edu Jeffrey F. Naughton University of Wisconsin-Madison naughton@cs.wisc.edu Jin-Yi Cai University
More informationEFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS
EFFICIENT MINING OF WEIGHTED QUANTITATIVE ASSOCIATION RULES AND CHARACTERIZATION OF FREQUENT ITEMSETS Arumugam G Senior Professor and Head, Department of Computer Science Madurai Kamaraj University Madurai,
More informationFARMER: Finding Interesting Rule Groups in Microarray Datasets
FARMER: Finding Interesting Rule Groups in Microarray Datasets Gao Cong, Anthony K. H. Tung, Xin Xu, Feng Pan Dept. of Computer Science Natl. University of Singapore {conggao,atung,xuxin,panfeng}@comp.nus.edu.sg
More informationDATA MINING LECTURE 4. Frequent Itemsets, Association Rules Evaluation Alternative Algorithms
DATA MINING LECTURE 4 Frequent Itemsets, Association Rules Evaluation Alternative Algorithms RECAP Mining Frequent Itemsets Itemset A collection of one or more items Example: {Milk, Bread, Diaper} k-itemset
More informationDynamic Programming Approach for Construction of Association Rule Systems
Dynamic Programming Approach for Construction of Association Rule Systems Fawaz Alsolami 1, Talha Amin 1, Igor Chikalov 1, Mikhail Moshkov 1, and Beata Zielosko 2 1 Computer, Electrical and Mathematical
More informationComputer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms
Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds
More informationDiscovery of Frequent Word Sequences in Text. Helena Ahonen-Myka. University of Helsinki
Discovery of Frequent Word Sequences in Text Helena Ahonen-Myka University of Helsinki Department of Computer Science P.O. Box 26 (Teollisuuskatu 23) FIN{00014 University of Helsinki, Finland, helena.ahonen-myka@cs.helsinki.fi
More informationAssocia'on Rule Mining
Associa'on Rule Mining Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata August 4 and 7, 2014 1 Market Basket Analysis Scenario: customers shopping at a supermarket Transaction
More informationDetecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.
Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D. Sánchez 18th September 2013 Detecting Anom and Exc Behaviour on
More informationData Mining and Matrices
Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,
More informationThe Complexity of Mining Maximal Frequent Subgraphs
The Complexity of Mining Maximal Frequent Subgraphs Benny Kimelfeld IBM Research Almaden kimelfeld@us.ibm.com Phokion G. Kolaitis UC Santa Cruz & IBM Research Almaden kolaitis@cs.ucsc.edu ABSTRACT A frequent
More informationChapter 2 Quality Measures in Pattern Mining
Chapter 2 Quality Measures in Pattern Mining Abstract In this chapter different quality measures to evaluate the interest of the patterns discovered in the mining process are described. Patterns represent
More informationInferring minimal rule covers from relations
Inferring minimal rule covers from relations CLAUDIO CARPINETO and GIOVANNI ROMANO Fondazione Ugo Bordoni, Via B. Castiglione 59, 00142 Rome, Italy Tel: +39-6-54803426 Fax: +39-6-54804405 E-mail: carpinet@fub.it
More informationDUAL-BOUNDED GENERATING PROBLEMS: PARTIAL AND MULTIPLE TRANSVERSALS OF A HYPERGRAPH
DUAL-BOUNDED GENERATING PROBLEMS: PARTIAL AND MULTIPLE TRANSVERSALS OF A HYPERGRAPH ENDRE BOROS, VLADIMIR GURVICH, LEONID KHACHIYAN, AND KAZUHISA MAKINO Abstract. We consider two natural generalizations
More informationData Analytics Beyond OLAP. Prof. Yanlei Diao
Data Analytics Beyond OLAP Prof. Yanlei Diao OPERATIONAL DBs DB 1 DB 2 DB 3 EXTRACT TRANSFORM LOAD (ETL) METADATA STORE DATA WAREHOUSE SUPPORTS OLAP DATA MINING INTERACTIVE DATA EXPLORATION Overview of
More informationFrequent Itemset Mining
ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team IMAGINA 16/17 Webpage: h;p://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge Discovery
More informationChapters 6 & 7, Frequent Pattern Mining
CSI 4352, Introduction to Data Mining Chapters 6 & 7, Frequent Pattern Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining Chapters
More informationAssociation Analysis Part 2. FP Growth (Pei et al 2000)
Association Analysis art 2 Sanjay Ranka rofessor Computer and Information Science and Engineering University of Florida F Growth ei et al 2 Use a compressed representation of the database using an F-tree
More informationComplexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler
Complexity Theory Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität Wien 15 May, 2018 Reinhard
More informationSelecting the Right Interestingness Measure for Association Patterns
Selecting the Right ingness Measure for Association Patterns Pang-Ning Tan Department of Computer Science and Engineering University of Minnesota 2 Union Street SE Minneapolis, MN 55455 ptan@csumnedu Vipin
More informationOutline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.
Complexity Theory Complexity Theory Outline Complexity Theory VU 181.142, SS 2018 6. The Polynomial Hierarchy Reinhard Pichler Institut für Informationssysteme Arbeitsbereich DBAI Technische Universität
More informationConcepts of a Discrete Random Variable
Concepts of a Discrete Random Variable Richard Emilion Laboratoire MAPMO, Université d Orléans, B.P. 6759 45067 Orléans Cedex 2, France, richard.emilion@univ-orleans.fr Abstract. A formal concept is defined
More informationA An Overview of Complexity Theory for the Algorithm Designer
A An Overview of Complexity Theory for the Algorithm Designer A.1 Certificates and the class NP A decision problem is one whose answer is either yes or no. Two examples are: SAT: Given a Boolean formula
More informationCS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms
CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms Tim Roughgarden March 9, 2017 1 Preamble Our first lecture on smoothed analysis sought a better theoretical
More information