On Minimal Infrequent Itemset Mining

David J. Haglin and Anna M. Manning

Abstract

A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets include statistical disclosure risk assessment, bioinformatics, and fraud detection. This is the first algorithm designed specifically for finding these rare itemsets. Many itemset properties used implicitly in the algorithm are proved. The problem is shown to be NP-complete. Experimental results are then presented.

I. INTRODUCTION

Because of its importance to finding association rules, much attention has been given to the problem of finding itemsets that appear frequently in a dataset (cf. [1]-[4]). There have been hundreds of papers, as well as workshops at conferences (e.g., FIMI'03 and FIMI'04 at IEEE ICDM'03 and IEEE ICDM'04), devoted to this subject. The definition of "frequent" often includes an integer threshold parameter, τ, separating those itemset patterns considered frequent (appearing at least τ times in the dataset) from those considered infrequent. While it is possible to consider very small values of τ, the focus is most often placed on finding very frequent patterns.

Relatively less attention has been paid to infrequent itemsets. Yet they have many potential applications, including: 1) statistical disclosure risk assessment, where rare patterns in anonymized census data can lead to statistical disclosure; 2) bioinformatics, where rare patterns in microarray data may suggest genetic disorders; and 3) fraud detection, where rare patterns in financial or tax data may suggest unusual activity associated with fraudulent behavior.

In this paper we present a new algorithm for finding minimal infrequent patterns. This is the first algorithm designed specifically for finding minimal infrequent itemsets. It is based upon the SUDA2 algorithm developed for finding minimal unique itemsets (itemsets with no unique proper subsets) [8], [9].
We then show that the minimal infrequent itemset problem is NP-complete. Finally, experimental results are presented.

David J. Haglin is with the Department of Computer and Information Sciences, Minnesota State University, Mankato, MN 56001, USA (david.haglin@mnsu.edu). Anna M. Manning is with the School of Computer Science, University of Manchester, Oxford Rd., Manchester, M13 9PL, UK (anna@manchester.ac.uk).

II. PROBLEM SPECIFICATION

Let $\mathcal{I} = \{i_1, i_2, \ldots, i_L\}$ be a set of items. An itemset is a subset $I \subseteq \mathcal{I}$. The cardinality of $I$, denoted by $|I|$, is the number of items in the itemset. As a shorthand, we will write c-itemset to mean an itemset of cardinality c.

A dataset, $D = \{t_1, t_2, \ldots, t_R\}$, is a collection of $R$ transactions (sometimes called records) of the form $t_i = (i, T_i)$, where $i$ is the transaction identifier (TID) and $T_i \subseteq \mathcal{I}$. We denote by $|D|$ the number of transactions in the dataset. Given an itemset $I$, a transaction $T$ is said to contain $I$ if $I \subseteq T$. The support set of an itemset $I$ with respect to the dataset $D$ is $D(I) = \{t_i \in D : I \subseteq T_i\}$. The support of an itemset $I$ in dataset $D$ is the cardinality of the support set of $I$; that is, $\mathrm{Supp}_D(I) = |D(I)|$. The relative support of an itemset, defined as $\mathrm{Supp}_D(I)/|D|$, is a number between 0 and 1 inclusive.

Given a dataset $D$ and an integer threshold τ, we say an itemset $I$ is:
  τ-occurrent if $|D(I)| = \tau$;
  τ-frequent if $|D(I)| \ge \tau$;
  τ-infrequent if $|D(I)| < \tau$.

To describe an itemset as unique, we can say either that it is 1-occurrent or that it is 2-infrequent. In addition, we say an itemset is:
  minimal τ-occurrent if it is τ-occurrent and all of its proper subsets are (τ + 1)-frequent;
  minimal τ-infrequent if it is τ-infrequent and all of its proper subsets are τ-frequent;
  maximal τ-occurrent if it is τ-occurrent and all of its proper supersets are τ-infrequent; and
  maximal τ-frequent if it is τ-frequent and all of its proper supersets are τ-infrequent.
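The definitions of Section II can be made concrete with a short Python sketch. It is only an illustration: the function names and the toy dataset are hypothetical and do not come from the paper. The minimality test relies on the monotonicity of support, so it is enough to check the subsets that are one item smaller.

```python
def support(dataset, itemset):
    """Supp_D(I): number of transactions containing every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in dataset if items <= t)


def is_minimal_infrequent(dataset, itemset, tau):
    """True if `itemset` is tau-infrequent and all proper subsets are tau-frequent."""
    items = frozenset(itemset)
    if support(dataset, items) >= tau:
        return False  # tau-frequent, hence not tau-infrequent
    # Support never decreases when an item is removed, so checking the
    # subsets of size |I| - 1 covers every proper subset.
    return all(support(dataset, items - {i}) >= tau for i in items)


# Hypothetical toy dataset: each transaction is a set of item ids.
D = [frozenset(t) for t in ({1, 2}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3})]

print(support(D, {1, 2}))
print(is_minimal_infrequent(D, {1, 2, 3}, tau=2))
print(is_minimal_infrequent(D, {1, 2, 3}, tau=3))
```

In this toy dataset {1, 2, 3} is minimal 2-infrequent, but it is not minimal 3-infrequent because its subset {1, 3} already has support below 3.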
Since there are datasets known to produce exponentially many τ-frequent itemsets [7], many strategies to compress the output have been considered. One obvious strategy is to find only maximal τ-frequent itemsets. Similarly, for τ-infrequent itemsets, it may be enough to find only minimal τ-infrequent itemsets.

We assume that the input is given in binary matrix form where the number of rows is $R$, the number of columns is $L$, and the entry at $(x, y)$ is 1 if and only if the transaction whose TID is $x$ contains item $y$.

III. ALGORITHM MINIT

We recently introduced an algorithm called SUDA2 [8], [9], which finds minimal unique itemsets (MUIs) in a dataset whose properties differ from those defined above. With certain parameter settings both SUDA2 and our new algorithm should find MUIs. However, the input datasets differ enough to render a comparison of running times between these two algorithms meaningless.

A. Dataset differences between MINIT and SUDA2

The easiest way to describe the differences in dataset properties is to consider the matrix form. For traditional itemset mining, the matrix consists of binary entries. But for SUDA2, the matrix entries can contain any integer. We can transform a SUDA2-type matrix into a binary matrix by enumerating all of the <column, value> pairs. For each of these pairs, a column is created in the transformed binary matrix. For every value in a column in the SUDA2-type input matrix, the corresponding <column, value> location in the transformed binary matrix is given a one. For example, if the first column of the SUDA2-type matrix contains integers in the range of 0 to 2, then the transformed matrix will have three "first" columns, with a 1 in the specific column indicating the integer value in the SUDA2-type matrix. The essential difference between the SUDA2-type matrix and the traditional binary matrix datasets is the added constraint that among each such collection of columns (for example, the three columns corresponding to the first SUDA2-type column of 0 to 2 values) there is exactly one 1 value, and the rest are 0 values, in every row.

B. The MINIT algorithm

Our new algorithm can be adapted to handle the more traditional dataset definition and to handle finding minimal τ-infrequent itemsets (MIIs). We call this adaptation MINIT, for MINimal Infrequent iTemsets. Initially, a ranking of items is prepared by computing the support of each of the items and then creating a list of items in ascending order of support. Minimal τ-infrequent itemsets are discovered by considering each item $i_j$ in rank order, recursively calling MINIT on the support set of the dataset with respect to $i_j$ (considering only those items with higher rank than $i_j$), and then checking each candidate MII against the original dataset.
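The transformation from a SUDA2-type integer matrix to a binary matrix described in Section III-A can be sketched in Python as follows. This is a minimal illustration rather than the authors' code; the function name `to_binary` and the sample matrix are assumptions made for the example.

```python
def to_binary(matrix):
    """One-hot encode an integer matrix into a 0/1 matrix.

    Returns (binary_rows, columns), where columns[k] is the
    <column, value> pair represented by binary column k.
    """
    n_cols = len(matrix[0])
    # Enumerate every <column, value> pair that occurs, in a stable order.
    columns = sorted({(c, row[c]) for row in matrix for c in range(n_cols)})
    index = {pair: k for k, pair in enumerate(columns)}
    binary = []
    for row in matrix:
        out = [0] * len(columns)
        for c, value in enumerate(row):
            out[index[(c, value)]] = 1  # exactly one 1 per original column
        binary.append(out)
    return binary, columns


# Hypothetical SUDA2-type input: the first column takes values 0..2.
suda = [[0, 1], [2, 1], [1, 0]]
binary, columns = to_binary(suda)
for row in binary:
    print(row)
```

Each group of binary columns derived from one original column contains exactly one 1 in every row, which is the added constraint that distinguishes SUDA2-type data from general binary data.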
One mechanism that can be used to consider only higher-ranking items in the recursion is to maintain a "liveness" vector indicating which items remain viable at each level of the recursion. The initial call to the recursive algorithm presented in Algorithm 1 is MINIT(D, V[1:L], maxc). The liveness vector, V[1:L], is initialized to all true values and must be passed by value to lower levels of the recursion, as a unique copy of this vector is required at every node in the recursion tree. For those inputs that require prohibitively large running times, supplying a limit for maxc may yield enough useful information within a reasonable amount of time. For easier datasets, setting maxc to $L$ will find all MIIs.

A significant computational effort of MINIT is to search through the dataset for transactions that hold a specific item. To help this search occur quickly, we pre-process the dataset by building linked lists of TIDs for each item. Essentially, we pre-compute $D(\{i_j\})$ for $1 \le j \le L$ by creating linked lists of pointers to the transactions. We also arrange the items in ascending order by support, which is required in order for MINIT to work correctly.
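The pre-processing step just described (per-item TID lists plus a support-ordered ranking) might look like the following Python sketch. Plain Python lists stand in for the linked lists of the paper, ties in support are broken by item id as an assumption, and the helper names are hypothetical. The dataset is the seven-transaction example used later in Section VI.

```python
from collections import defaultdict


def tid_lists(dataset):
    """Map each item to the ascending list of TIDs of transactions holding it."""
    lists = defaultdict(list)
    for tid in sorted(dataset):
        for item in dataset[tid]:
            lists[item].append(tid)
    return dict(lists)


def rank_items(lists, n_transactions):
    """Items in ascending order of support (ties by item id, an assumption).

    Items present in every transaction are dropped as non-viable.
    """
    viable = [i for i, tids in lists.items() if len(tids) < n_transactions]
    return sorted(viable, key=lambda i: (len(lists[i]), i))


# Example dataset of Section VI, keyed by TID.
D = {
    1: {1, 2, 4, 5, 6}, 2: {2, 3, 4, 5, 6}, 3: {1, 2, 3, 4, 6},
    4: {1, 2, 3, 5, 6}, 5: {1, 3, 4, 5, 6}, 6: {1, 3, 6}, 7: {1, 2, 5},
}
lists = tid_lists(D)
print(lists[4])
print(rank_items(lists, len(D)))
```

The length of each TID list is the support of the item, so the ranking falls out of the same pass that builds the lists.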
Algorithm 1 MINIT(D, V[1:L], maxc)
 1: Input: D = input dataset with N rows and L columns of binary numbers
 2: Input: V[1:L] = a boolean vector indicating viability of each item
 3: Input: maxc = upper bound on cardinality of MII to find in the search
 4: Returns: A listing of all MIIs for dataset D with cardinality at most maxc
 5: compute R ← list of all items in D in ascending order of support
 6: if maxc == 1 /* stopping condition */ then
 7:     Return all items of R that appear fewer than τ times in D
 8: else
 9:     M ← ∅
10:     for each item i_j ∈ R do
11:         D_j ← D({i_j})
12:         V[i_j] ← false
13:         C_j ← recursive call to MINIT(D_j, V, maxc − 1)
14:         for each candidate itemset I ∈ C_j do
15:             if I ∪ {i_j} is an MII in D then
16:                 M ← M ∪ {I ∪ {i_j}}
17:             end if
18:         end for
19:     end for
20:     Return M
21: end if

Note that some of the items may be discarded (considered non-viable) in this pre-processing phase for reasons such as having a support equal to $|D|$ (an item appearing in every transaction cannot be part of any MII).

Whenever MINIT descends one level of the recursion, a new sub-dataset is built to represent the support set $D(\{i_j\})$. In the interest of memory efficiency, MINIT maintains only one copy of the dataset. A sub-dataset is represented by a linked list of TIDs for the transactions it contains. To support the recursion of MINIT, the new sub-dataset must be pre-processed. There are three tasks required to perform the pre-processing: 1) computing the support of each item, which is needed to produce a rank-ordering of the viable items by support within $D(\{i_j\})$; 2) determining the viability of each item (i.e., pruning some of the items from consideration); and 3) computing the support set of each viable item, resulting in a memory-efficient representation of the lists of TIDs for each support set $D(\{i_j, i_k\})$ for $1 \le k \le L$, $j \ne k$. Observe that the support set of $\{i_k\}$ within $D(\{i_j\})$ is the same as $D(\{i_j, i_k\})$.
In practice, MINIT can perform the first and third tasks concurrently to avoid two passes over the data. $D(\{i_j, i_k\})$ can be computed, and the size of that list is then used as the support of $i_k$ in $D(\{i_j\})$. The second task can then be performed, and the data structures built for the support sets of the pruned items are discarded. When most of the items remain viable, the technique of computing all support sets and deriving support from them is very effective. However, if many of the items are discarded as non-viable, building the support set lists for these discarded items is wasted effort.

The statement at line 7 of Algorithm 1 can be modified to return all items of R that appear exactly τ times in D, which transforms MINIT into an algorithm that finds all τ-occurrent rather than all τ-infrequent itemsets.

IV. MINIMAL ITEMSET PROPERTIES

We present properties of MIIs that MINIT relies upon in order to correctly find all MIIs. These properties are adapted from those presented in [9] for the SUDA2 algorithm and dataset characteristics. Consider a minimal τ-infrequent c-itemset $I$. By definition, $I$ must have the following property.

Property 1 (Rareness Property): If itemset $I$ is an MII, then $\mathrm{Supp}_D(I) < \tau$.

We note that if an itemset is minimal τ-infrequent, then it must be minimal δ-occurrent for some $\delta < \tau$. However, a minimal δ-occurrent c-itemset, $I$, is not necessarily minimal τ-infrequent for all $\tau > \delta$; there may be some subset of $I$ of size $c - 1$ with support $\epsilon$, for $\delta < \epsilon < \tau$.

The following theorem shows that certain itemsets must exist within the dataset, $D$, in order for an MII to exist. We call the transactions holding those itemsets support rows.

Theorem 2 (Support Row Property): Given a minimal τ-infrequent c-itemset $I = \{i_1, i_2, \ldots, i_c\}$ with $\mathrm{Supp}_D(I) = \delta$, $\delta < \tau$, for each $1 \le j \le c$ there must exist at least $\tau - \delta$ support rows in $D$ containing itemset $I \setminus \{i_j\}$ (but not item $i_j$).

Proof: Suppose that for some $j$ there exist fewer than $\tau - \delta$ rows containing $I' = I \setminus \{i_j\}$ and not $i_j$. Then $I'$, which also appears in the $\delta$ rows of $D(I)$, has support $\mathrm{Supp}_D(I') < \tau$, and therefore $I$ is not minimal τ-infrequent.
Observe that a support row works for only one item in $I$; thus there are at least $c(\tau - \delta)$ support rows. The existence of $c(\tau - \delta)$ support rows in $D$ is both a necessary and sufficient condition for $I$ to be minimal, as seen by the previous theorem and the following lemma.

Lemma 3: Given an itemset $I = \{i_1, \ldots, i_c\}$ that is τ-infrequent in $D$, with $\mathrm{Supp}_D(I) = \delta$ for $\delta < \tau$, if $D$ has $c(\tau - \delta)$ support rows containing the $c$ subsets of $I$ of size $c - 1$, then $I$ is a minimal τ-infrequent itemset.

Proof: Suppose $I$ is not minimal. Then there exists $J \subset I$ that is also τ-infrequent within $D$. Clearly, from Theorem 2, $|J| < c - 1$. However, it must be true that $J \subset I'$, where $I'$ is one of the $c$ subsets of $I$ of size $c - 1$. Since $J \subset I$, $J$ appears in the $\delta$ rows holding $I$. Moreover, since $J \subset I'$, Theorem 2 states that $J$ must appear in the $\tau - \delta$ support rows for $I'$. Thus, $J$ appears in at least τ rows in $D$, so it is not τ-infrequent.

This leads to the following observation as to the minimum support required of a single item in order to appear in a τ-infrequent c-itemset.

Theorem 4 (Minimum Support Property): Given a fixed τ and itemset cardinality $c$, an item $i$ must have support $\mathrm{Supp}_D(\{i\}) \ge c + \tau - 2$ in order for $i$ to be part of a minimal τ-infrequent c-itemset $I$.

Proof: Let $I$ be a minimal τ-infrequent itemset with $|I| = c$ and $\mathrm{Supp}_D(I) = \delta$. If $i \in I$, then $i$ must appear in at least the $\delta$ reference rows of $I$ and in $(c - 1)(\tau - \delta)$ support rows for the $c - 1$ subsets of $I$ that contain $i$ and have cardinality $c - 1$. Thus,

$$\mathrm{Supp}_D(\{i\}) \ge \delta + (c - 1)(\tau - \delta) = c(\tau - \delta) + (2\delta - \tau) = c\tau - \tau + 2\delta - c\delta.$$

Let $\delta = \tau - r$, for some integer $r > 0$. Then,

$$\mathrm{Supp}_D(\{i\}) \ge c\tau - \tau + 2(\tau - r) - c(\tau - r) = \tau + r(c - 2).$$

For a fixed $c > 2$ and τ, this is minimum when $r = 1$ (i.e., $\delta = \tau - 1$).

The Minimum Support Property can give an efficient way to prune significant areas of the search space if several items have low support counts.
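The support-row characterization (Theorem 2 with Lemma 3) and the bound of Theorem 4 can be checked numerically with a small Python sketch. The helper names are hypothetical, and the dataset is the seven-transaction example of Section VI; nothing here is taken from the authors' implementation.

```python
def support_rows(dataset, itemset, item):
    """Rows containing every item of `itemset` except `item`, and not `item`."""
    rest = frozenset(itemset) - {item}
    return [t for t in dataset if rest <= t and item not in t]


def is_mii_by_support_rows(dataset, itemset, tau):
    """Minimality test in the style of Theorem 2 and Lemma 3."""
    items = frozenset(itemset)
    delta = sum(1 for t in dataset if items <= t)
    if delta >= tau:
        return False  # not tau-infrequent
    # Each item needs at least tau - delta support rows of its own.
    return all(len(support_rows(dataset, items, i)) >= tau - delta
               for i in items)


def min_item_support(tau, c):
    """Lower bound of Theorem 4 for an item of a minimal tau-infrequent c-itemset."""
    return c + tau - 2


# Example dataset of Section VI (TIDs 1..7 in order).
D = [frozenset(t) for t in (
    {1, 2, 4, 5, 6}, {2, 3, 4, 5, 6}, {1, 2, 3, 4, 6}, {1, 2, 3, 5, 6},
    {1, 3, 4, 5, 6}, {1, 3, 6}, {1, 2, 5})]

print(is_mii_by_support_rows(D, {2, 4, 5}, tau=3))
print(len(support_rows(D, {2, 4, 5}, 5)))
print(min_item_support(tau=3, c=3))
```

For I = {2, 4, 5} and τ = 3 the support is δ = 2, so a single support row per item suffices, and every item of I has support at least c + τ − 2 = 4 in this dataset.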
Corollary 5 (Uniform Support Property): Given a dataset $D$ and an item $i$ contained in every transaction of $D$, $i$ cannot be contained in any minimal τ-infrequent itemset $I$.

Proof: If $I$ is a minimal τ-infrequent itemset in $D$ containing item $i$, then the Support Row Property ensures the existence of a row in $D$ containing $I \setminus \{i\}$ and not $i$. Since $i$ appears in every transaction in $D$, we have a contradiction.

V. RECURSIVE ITEMSET PROPERTIES

Given dataset $D$ and some item $a$ (called an anchor item), we consider the itemset properties of $D$ and $D(\{a\})$ based upon the recursive Algorithm 1.

Lemma 6: Given that $I = \{i_1, \ldots, i_c\}$ is a minimal τ-infrequent itemset in $D$, for each anchor $a = i_j$, $1 \le j \le c$, the itemset $I_a = I \setminus \{i_j\}$ is a minimal τ-infrequent itemset in $D(\{a\})$. Moreover, $|I_a| = |I| - 1$.

Proof: Without loss of generality, fix $j$ in the range of 1 to $c$ inclusive. Let $a = i_j$. Since every row in $D(\{a\})$ contains item $a$, the only way $I_a$ could appear at least τ times in $D(\{a\})$ is if $I$ appeared in at least τ rows of $D$. Therefore, $I_a$ is τ-infrequent in $D(\{a\})$. Similarly, if $I_a$ is not minimal τ-infrequent in $D(\{a\})$, then there exists $I'_a \subset I_a$ that is also τ-infrequent in $D(\{a\})$.

But $I'_a \cup \{a\}$ would also be τ-infrequent in $D$ and is a proper subset of $I$, contradicting the minimality of $I$. Hence, $I_a$ must be minimal τ-infrequent in $D(\{a\})$.

Unfortunately, it is not the case that all minimal τ-infrequent itemsets in $D(\{a\})$ lead to minimal τ-infrequent itemsets in the original dataset $D$. However, the following theorem provides a method for finding those that do.

Theorem 7: Given a dataset $D$, an item $a$, and $I_a$ as a minimal τ-infrequent itemset in $D(\{a\})$ with $\mathrm{Supp}_{D(\{a\})}(I_a) = \delta$, the itemset $I = I_a \cup \{a\}$ is a minimal τ-infrequent itemset in $D$ if and only if there exist $\tau - \delta$ rows in $D$ containing $I_a$ but not containing item $a$.

Proof: If $I$ is a minimal τ-infrequent itemset in $D$ with $\mathrm{Supp}_D(I) = \delta$, then the Support Row Property ensures the existence of $\tau - \delta$ rows in $D$ containing $I_a$ but not containing item $a$. For the other direction, assume $\tau - \delta$ rows exist in $D$ containing $I_a$ but not $a$, and that $I_a$ is a minimal τ-infrequent itemset in $D(\{a\})$. Note that $\mathrm{Supp}_D(I) = \delta$. All that is required to show is that $I$ has the requisite $c(\tau - \delta)$ support rows. Each of the $(c - 1)(\tau - \delta)$ support rows in $D(\{a\})$, augmented with item $a$, forms a support row in $D$ for the itemset $I$. As all of these $(c - 1)(\tau - \delta)$ support rows contain item $a$, the only other support rows needed to ensure $I$ is a minimal τ-infrequent itemset in $D$ are the $\tau - \delta$ rows stated in the theorem. Since $I$ is τ-infrequent and since $c(\tau - \delta)$ support rows exist in $D$, $I$ must be a minimal τ-infrequent itemset in $D$.

The Recursive Property of MIIs helps define the boundaries of the search space by providing a clear indication of the maximum cardinality of candidate MIIs.

VI. EXAMPLE

To help understand the recursive algorithm and pruning techniques, we present the following example. Consider the input dataset as shown in Table I(a). We will follow the discovery of the 2-occurrent itemset $I = \{2, 4, 5\}$, whose items have ranks 1, 2, and 4 (items 4, 2, and 5, respectively).
TABLE I: EXAMPLE DATASET

(a) Dataset
TID  Items
1    1, 2, 4, 5, 6
2    2, 3, 4, 5, 6
3    1, 2, 3, 4, 6
4    1, 2, 3, 5, 6
5    1, 3, 4, 5, 6
6    1, 3, 6
7    1, 2, 5

(b) Rank Items
Rank  Item  Support
1     4     4
2     2     5
3     3     5
4     5     5
5     1     6
6     6     6

Algorithm 1 finds itemset $I$ by first computing the rank order of items as shown in Table I(b). Lemma 6 indicates that any itemset $I$ that is τ-occurrent in $D$ and has smallest item rank 1 (i.e., item {4}) will have $I \setminus \{4\}$ as a τ-occurrent itemset in $D(\{4\})$. So we compute $D(\{4\})$ as shown in Table II(a). Note that we can ignore item 6 in $D(\{4\})$ by Corollary 5. This brings us down one level in the recursion tree as we explore $D(\{4\})$.

TABLE II: DATASET D({4})

(a) Dataset
TID  Items
1    1, 2, 4, 5, 6
2    2, 3, 4, 5, 6
3    1, 2, 3, 4, 6
5    1, 3, 4, 5, 6

(b) Rank Items
Rank  Item  Support
1     1     3
2     2     3
3     3     3
4     5     3

As we enter Algorithm 1 recursively, we first construct a new rank item list for the dataset at this recursion level (see Table II(b)). For the second iteration of the loop at line 10 of Algorithm 1, the anchor item will be 2. Descending the recursion tree for this anchor will produce the dataset in Table III(a).

TABLE III: DATASET D({2,4})

(a) Dataset
TID  Items
1    1, 2, 4, 5, 6
2    2, 3, 4, 5, 6
3    1, 2, 3, 4, 6

(b) Viable items
TID  Items
1    1, 5
2    3, 5
3    1, 3

Note that the viable items 1, 3, and 5 each have support at or below our threshold τ = 2. So this recursion tree node returns the itemset list {{1}, {3}, {5}} to the next higher recursion node. To determine that {2, 5} is a 2-occurrent itemset in $D(\{4\})$, we need only find sufficient support rows as described in Theorem 7. Observe that TID 5 in Table II(a) contains item 5 but does not contain item 2. This one support row, along with a support of 2 for item 5 in $D(\{2, 4\})$, is enough to conclude that {2, 5} is indeed a 2-occurrent itemset in $D(\{4\})$. It will therefore be included in the collection of itemsets passed up to the parent node of the recursion tree. At the root node of the recursion tree, the candidate itemset {2, 5} is merged with item {4}.
This candidate {2, 4, 5} is then checked for qualification as a 2-occurrent itemset in $D$, using Theorem 7. Since {2, 5} has support 2 in $D(\{4\})$, one support row in $D$ is sufficient. That support row is TID 4. Since this is the top level of the recursion, we conclude that {2, 4, 5} is a 2-occurrent itemset in $D$.

We note that for Algorithm 1 to find a minimal τ-infrequent c-itemset, $I$, it must explore $c$ levels of the recursion tree. Each level of the tree corresponds to one of the items in $I$. The item associated with the bottom level of the tree has support in that bottom-level dataset equal to the support of $I$ in the original dataset $D$. In fact, the support remains the same at each level of the recursion tree.
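The walk-through above can be reproduced with a compact Python sketch of the recursion. It follows the structure of Algorithm 1 but is not the authors' implementation: sub-datasets are copied rather than represented as TID lists, the liveness vector is replaced by passing the list of remaining higher-ranked items, single infrequent items are reported at every level, and the Theorem 7 test is written as a support check on the candidate. All function names are hypothetical; `brute_force` is included only as a reference for tiny inputs.

```python
from itertools import combinations


def supp(rows, itemset):
    """Support of `itemset` (a frozenset) in `rows`."""
    return sum(1 for t in rows if itemset <= t)


def minit(rows, viable, tau, maxc):
    """Minimal tau-infrequent itemsets over `viable` items, cardinality <= maxc."""
    count = {i: sum(1 for t in rows if i in t) for i in viable}
    # Rank by ascending support (ties by item id); an item that is frequent
    # and present in every row cannot be part of an MII (Corollary 5).
    ranked = sorted(
        (i for i in viable if count[i] < tau or count[i] < len(rows)),
        key=lambda i: (count[i], i))
    found = []
    for pos, item in enumerate(ranked):
        if count[item] < tau:
            found.append(frozenset([item]))  # infrequent on its own
            continue
        if maxc == 1:
            continue
        sub = [t for t in rows if item in t]  # support set D({item})
        for cand in minit(sub, ranked[pos + 1:], tau, maxc - 1):
            # Theorem 7: tau - delta rows with cand but without item exist
            # exactly when cand itself is tau-frequent in `rows`.
            if supp(rows, cand) >= tau:
                found.append(cand | {item})
    return found


def brute_force(rows, items, tau):
    """Reference answer by exhaustive enumeration (tiny inputs only)."""
    out = []
    for c in range(1, len(items) + 1):
        for combo in combinations(sorted(items), c):
            s = frozenset(combo)
            if supp(rows, s) < tau and all(
                    supp(rows, s - {i}) >= tau for i in s):
                out.append(s)
    return out


# Example dataset of Table I(a), TIDs 1..7 in order.
D = [frozenset(t) for t in (
    {1, 2, 4, 5, 6}, {2, 3, 4, 5, 6}, {1, 2, 3, 4, 6}, {1, 2, 3, 5, 6},
    {1, 3, 4, 5, 6}, {1, 3, 6}, {1, 2, 5})]
ITEMS = [1, 2, 3, 4, 5, 6]

miis = minit(D, ITEMS, tau=3, maxc=len(ITEMS))
print(sorted(sorted(s) for s in miis))
```

With τ = 3 the result includes {2, 4, 5}, the itemset traced in the example: its support is 2 and each of its 2-item subsets has support at least 3. Passing a smaller `maxc` truncates the recursion depth and therefore the cardinality of the itemsets returned.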

VII. COMPLEXITY OF MINIMAL τ-OCCURRENT ITEMSET MINING

A. Variations of the problem

For a problem such as finding minimal τ-occurrent itemsets, there are variations that have important implications for the complexity of the problem. We consider the following problem variations in increasing order of computational difficulty:

1) The simplest form of a minimal τ-occurrent problem is a decision problem where the objective is, for a given input dataset, to determine if there exists any minimal τ-occurrent itemset and merely answer "yes" or "no."

2) The next harder problem is a search problem where the objective is, for a given input dataset, to find one (any) minimal τ-occurrent itemset and print out the solution. There are actually two sub-variations of the search version of the problem: (i) find any solution for a specific record in the input dataset; and (ii) find any solution in any record in the input dataset. We note that the computationally easier form of the two sub-variations is the less restrictive search problem.

3) The objective of the counting version is, for a given input dataset, to determine the number of minimal τ-occurrent itemsets and to print out the number. Even though there may be an exponential number of minimal τ-occurrent itemsets in a given dataset, it may be possible to count them in polynomial time and print out a polynomial-size representation of the exponential count (e.g., in binary representation).

4) The most computationally challenging variation is, for a given input dataset, to find and print out all of the minimal τ-occurrent itemsets.

Some attention has recently been given to finding rare patterns in datasets [6], [8], [9]. From this perspective, it makes sense to search for minimal τ-occurrent itemsets. A special case of this problem is to set τ = 1, meaning that only minimal unique itemsets are sought. Yang provides a nice complexity analysis of the four variations of the maximal τ-occurrent problem.
The counting variation of the maximal τ-occurrent problem is #P-complete [7], whereas searching for a single solution is possible in polynomial time. We show that by seeking minimal rather than maximal τ-occurrent itemsets, even the simplest variation, the decision version, is NP-complete.

B. Computational complexity of minimal τ-occurrent itemsets

Our proof is based on a proof presented by Daishin Nakamura in [10], which addresses only the variation of searching in a specific record for a 1-occurrent itemset (i.e., a minimal unique itemset). The proof is by a reduction from the Hitting Set problem (see [11] for NP-completeness proof techniques). An instance of the Hitting Set problem, $H = (p, C, k)$, is defined as follows: given a collection $C = \{C_1, \ldots, C_q\}$ of subsets of a finite set $S = \{1, \ldots, p\}$ and a positive integer $k \le p$, determine whether there exists a subset $S' \subseteq S$ with $|S'| \le k$ such that $S'$ contains at least one element from each set of $C$.

Theorem 8: Given a dataset and a fixed constant $\tau \ge 1$, to determine if there exists any τ-occurrent itemset in the dataset is NP-complete.

Proof: Given an instance of the Hitting Set problem $H = (p, C, k)$, construct a $q \times p$ matrix:

$$M = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{q,1} & x_{q,2} & \cdots & x_{q,p} \end{pmatrix}$$

where $x_{i,j} = 1$ if $j \in C_i$ and $x_{i,j} = 0$ otherwise. Observe that every subset of the columns of $M$ corresponds to a subset $S' \subseteq S$ in the Hitting Set problem. Moreover, $S'$ is a hitting set in $H$ if and only if the subset of columns whose index numbers are in $S'$ induces a matrix projection with at least one 1-entry in every row. This can be seen as each row $i$ of $M$ corresponds to $C_i$ in $H$. Now denote by $Z$ a $(\tau \times p)$-matrix of zeroes. Construct a dataset matrix $D$:

$$D = \begin{pmatrix} Z \\ M \\ \vdots \\ M \end{pmatrix}$$

where the number of copies of $M$ is $\tau + 1$. Now find a minimum τ-occurrent itemset $I$ in $D$. If $I$ contained any 1-entries, then a record holding $I$ must come from the $M$ portion of $D$. However, since each row of $M$ appears $\tau + 1$ times in $D$, such an itemset could not be τ-occurrent.
So, $I$ consists of only zeros and appears in the first τ rows of zeros in $D$. Let $S'$ be the set of columns associated with $I$. Since $I$ is τ-occurrent, each row in $M$ must contain a 1-entry in at least one of the columns in $S'$. This corresponds directly to a solution to $H$. Therefore, any algorithm for finding τ-occurrent itemsets in a dataset, for any $\tau \ge 1$, can be used to solve the Hitting Set problem.

VIII. EXPERIMENTAL RESULTS

All of the experiments were run on Dual Core AMD Opteron Processor 270s running at 2GHz with 8GB of RAM. The datasets we use come from [URL elided]. All of the datasets are in the proper format for MINIT.

A. Mushroom Data

The mushroom dataset contains 8124 transactions and 119 items in its inventory. This particular dataset does not challenge MINIT, as it can run within 5 seconds for any threshold $1 \le \tau \le 8124$ with no restriction on itemset cardinality. What we can see from this dataset is the number of minimal τ-infrequent and minimal τ-occurrent itemsets for varying values of the threshold τ (Figure 1). We expect the number of minimal τ-occurrent itemsets to be smaller than the number of minimal τ-infrequent itemsets. Figure 1 shows just how drastically different they are. Since the τ-infrequent itemsets include all of the τ-occurrent itemsets, the remainder of this section focuses only on computing minimal τ-infrequent itemsets.

Fig. 1. Mushroom Dataset (number of MIIs; series: delta-infrequent, delta-occurrent).

B. Chess Data

The chess dataset contains 3196 transactions and 75 items in its inventory. Although this dataset is much smaller than the mushroom dataset, it presents significantly more of a computational challenge to MINIT.

We imposed limits on the cardinality of τ-infrequent itemsets. For each maxc of 6, 7, and 8, we ran trials varying the threshold τ. The running time for maxc == 8 is shown in Figure 2. To see the growth pattern, the numbers of minimal τ-infrequent itemsets are shown for maxc of 6, 7, and 8 in Figure 3.

Fig. 2. Chess Dataset Computation Time (seconds; maxc == 8).

Fig. 3. Chess Dataset MII Counts (number of MIIs; series: maxc == 6, 7, 8).

C. T10I4D100K Data

The T10I4D100K dataset, generated by the software described in [12], contains 100,000 transactions and 870 items in its inventory. It has an average of 10 items per transaction and an average support of 4 for each item.

We ran trials with no limit on the cardinality of the MIIs, varying the threshold upward from 1. It is interesting to see that the computation time (Figure 4) drops faster than the number of MIIs (Figure 5).

Fig. 4. T10I4D100K Dataset Computation Time (seconds).

D. Connect4 Data

The Connect-4 dataset contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced.
The dataset has 43 columns (one for each of the 42 connect-4 squares together with an outcome column: win, draw, or lose). Once this dataset is transformed into a binary format, it contains 129 items, since each cell in the original dataset holds one of three possible values. This dataset presents the greatest computational challenge to MINIT. We imposed a cardinality limit of maxc == 6.

The number of MIIs (Figure 7) has a substantially slower decline than the numbers for the previous datasets. The running time also declines more slowly, and its decline is even slower than that of the MII counts.

Fig. 5. T10I4D100K Dataset MII Counts (number of MIIs).

Fig. 6. Connect4 Dataset Computation Time (seconds).

Fig. 7. Connect4 Dataset MII Counts (number of MIIs).

IX. CONCLUSIONS

We present a new algorithm, MINIT, for finding minimal τ-infrequent or minimal τ-occurrent itemsets. The computation time required on the four datasets presented suggests a correlation between the number of MIIs and the amount of computation time required. It would be interesting to see how well MINIT could run in a parallel or grid environment. It would also be useful to find other pruning strategies to improve the running time requirements.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation under grant CTS

REFERENCES

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," in Proceedings of the 1993 International Conference on Management of Data (SIGMOD 93), May 1993.
[2] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo, "Fast discovery of association rules," in Advances in Knowledge Discovery and Data Mining, U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. The AAAI Press, Menlo Park, 1996.
[3] S. Brin, R. Motwani, J. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data," in Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. ACM Press, New York, NY, USA, 1997.
[4] M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, "New algorithms for fast discovery of association rules," in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1997.
[5] E. Boros, V. Gurvich, L. Khachiyan, and K. Makino, "On the complexity of generating maximal frequent and minimal infrequent sets," in Symposium on Theoretical Aspects of Computer Science, 2002. [Online]. Available: citeseer.ist.psu.edu/boros2complexity.html
[6] D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharma, "Discovering all most specific sentences," ACM Trans. Database Syst., vol. 28, no. 2, 2003.
[7] G. Yang, "Computational aspects of mining maximal frequent patterns," Theoretical Computer Science, vol. 362, 2006.
[8] A. M. Manning and D. J. Haglin, "A new algorithm for finding minimal sample uniques for use in statistical disclosure assessment," in IEEE International Conference on Data Mining (ICDM'05), Nov. 2005.
[9] A. M. Manning, D. J. Haglin, and J. A. Keane, "A recursive search algorithm for statistical disclosure assessment," Data Mining and Knowledge Discovery, 2007, conditionally accepted.
[10] A. Takemura, "Minimum unsafe and maximum safe sets of variables for disclosure risk assessment of individual records in a microdata set," Journal of the Japan Statistical Society, vol. 32, no. 1, 2002.
[11] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co.
[12] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules in large databases," in VLDB 94: Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1994.
