Basic Association Rules

Size: px
Start display at page:

Download "Basic Association Rules"

Transcription

1 Basic Association Rules Downloaded 11/21/17 to Redistribution subject to SIAM license or coyright; see htt:// Abstract Previous aroaches for mining association rules generate large sets of association rules. Such sets are difficult for users to understand and manage. Here, the concet of a restricted conditional robability distribution is used to exlain an association rule. Based on this concet, a new tye of association rules, called basic association rules, is defined. We roose the GenBR algorithm to generate the set of classes of basic association rules. Theoretical analysis shows that the search sace of the algorithm can be translated to an n- cube grah. The set of classes of basic association rules generated by GenBR is easy for users to understand and manage. Our exeriments on synthetic and real datasets show that GenBR is either faster than revious aroaches or generates fewer rules or both. Keywords: Association Rules, Probabilistic/Statistical Methods, Data Reduction/Prerocessing, Restricted Conditional Probability Distributions, Inference Systems, and Basic Association Rules. 1 Introduction Techniques for mining association rules [1,2] were originally devised for alication to market basket data, but they have also been alied in many other domains to erform tasks [21,23,26]. Market basket data describes the items urchased from retail stores groued into transactions. A transaction tyically consists of items bought together at the same oint of time, but it may consist of items bought by a customer over a eriod of time. An itemset is a set of items, and a frequent itemset X is an itemset whose frequency in transactions, also referred to as its suort, denoted as su(x), is greater than a user secified suort threshold, minsu. The main task of association rule discovery is to extract frequent itemsets from market basket data and to generate association rules from these frequent itemsets. An association rule r is an imlication of the form X Y, where X and Y are two disjoint itemsets. The suort of the rule is the suort of X Y, denoted as su(r), which is given by the observed robability P(X = 1,Y = 1). The confidence of the rule, denoted conf(r), is given by the conditional observed robability P(X =1,Y = 1) / P(X = 1), which is denoted as (xy) / (x) in this aer. If an association rule has suort at least as great as minsu and Guichong Li and Howard J. Hamilton Deartment of Comuter Science University of Regina Regina, SK, Canada, S4S 0A2 {liguicho, hamilton}@cs.uregina.ca confidence at least as great as the confidence threshold called minconf, it is referred to as a valid association rule. An association rule with confidence 100% is an exact association rule; all other association rules are aroximate association rules. The Ariori algorithm [2] was roosed to discover all frequent itemsets and to generate all valid association rules corresonding to these itemsets by a fast algorithm, called FastGenRules. Many algorithms have since been roosed that reduce the time and sace required to find the frequent itemsets [2,14]. After all frequent itemsets have been found, valid association rules are generated. A serious roblem in association rule discovery is that the set of association rules can grow to be unwieldy as the number of transactions increases, esecially if the suort and confidence thresholds are small. As the number of frequent itemsets increases, the number of rules resented to the user tyically increases roortionately. Many of these rules may be redundant. The definition of redundancy for association rules has varied in revious aroaches. Toivonen et al. roosed finding a structural rule cover, which describes the same database rows as the original set of association rules [28]. Therefore, those rules that are not in the cover are regarded as redundant. In [11,20,24,30], the definition of redundant rules is based on several inference rules or an inference system. Therefore, all association rules that can be derived from other rules by alying inference rules are regarded as redundant. We adot the latter tye of definition. To address the roblem of rule redundancy, four tyes of research on mining association rules have been erformed. First, rules have been extracted based on userdefined temlates or item constraints [3,27]. Secondly, researchers have develoed interestingness measures to select only interesting rules [16,19,18]. Thirdly, researchers have roosed inference rules or inference systems to rune redundant rules and thus resent smaller, and usually more understandable sets of association rules to the user [5,11,20,24,30]. Finally, new frameworks for mining association rule have been roosed that find association rules with different formats or roerties [7,8,9]. The main roblems with revious aroaches are that they still generate too many rules, and these rules may be

2 redundant. For examle, a valid association rule X Y that is generated by one these aroaches may in fact be derived from some simler rule X Y with the same confidence as X Y, where X X and Y Y. Inference rules roosed by these aroaches do not resemble Armstrong axioms on functional deendencies. As well, in some aroaches, inference rules cannot infer the confidence of rules without extra information. In our research, we are creating an inference system on association rules, consisting of a set of inference rules such as augmentation and transitivity, which resembles the Armstrong axioms on functional deendencies and which allows the inference of the confidences of rules. The remainder of this aer is organized as follows. In Section 2, we resent related work. In Section 3, we define the concet of a basic association rule, and roose a new algorithm called GenBR for generating the set BR of classes of basic association rules from a set of frequent itemsets. The comutational comlexity of GenBR is also described. A comarison of our aroach and other aroaches is resented in Section 4. Our exeriments comared the erformance of our algorithm with that of revious algorithms on synthetic datasets and real-life datasets, with resect to the number of rules and the elased running time. Conclusions and future work are described in Section 5. 2 Previous Work Previous research showed that relatively small sets of association rules can be resented to users instead of all valid association rules. As well, for some aroaches, inference rules were suggested that allowed additional association rules to be derived from such small sets of rules. In this section, we describe three aroaches. First, reresentative association rules (RR) are based on a cover oerator with which other non-reresentative association rules can be generated [20]. Suose we have an association rule X Y. A cover oerator C, denoted C(X Y), is given by C(X Y) = {X Z V Z, V Y Z V = V } The set of all reresentative association rules is a minimal set of rules that covers all association rules by means of the cover oerator. The FastGenReresentative algorithm was roosed to efficiently comute a RR [20]. Secondly, a kind of non-redundant association rules with minimal antecedents and maximal consequents, called minimal non-redundant association rules, has been identified as articularly useful and relevant [5]. An association rule r: X Y is a minimal non-redundant association rule iff there does not exist an association rule r : X Y with su(r) = su(r ), conf(r) = conf(r ), X X and Y Y. A small non-redundant generating set for all valid association rules is formed by combining a generic basis GB for exact association rules and an informative basis IB for aroximate association rules. RI is defined as a transitive reduction of the informative basis corresonding to IB. Given a closure oerator c of the Galois connection, a set FC of frequent closed itemsets, the set G of their generators, and a artial order (inclusion relation) on the set of itemsets, the definitions of GB, IB and RI are as follows. GB = {r: g (f \ g) f FC g G f g f} IB = {r: g (f \ g) f FC g G c(g) f} RI = {r: g (f \ g) f FC g G c(g) f / f c(g) f f} Bastide et al. have roven that GB and IB contain only minimal non-redundant association rules and all exact association rules and aroximate association rules can be derived from GB and IB, resectively [5]. The GB and RI algorithms were roosed to generate a generic basis and a transitive reduction of the informative basis, resectively. According to the definition of a minimal non-redundant association rule, the suort and confidence of any association rules inferred from the generating set are the same as the suort and confidence of the rules from which they were inferred. The authors claim that none of the Armstrong axioms hold in non-redundant association rules. A similar aroach has been roosed for discovering a small cover for association rules based on closed itemsets, which adats the Duquenne-Guigues basis for exact association rules and the Luxenburger results for aroximate association rules [24]. Thirdly, informative cover has been roosed together with a new inference rule [11]. Let r, r be two association rules, denoted X Y and X Y, such that X Y X Y. If su(x ) su(x), we say that r covers r, denoted r r. The goal is to find an informative cover that covers all other association rules. The CoverRules algorithm has been roosed to generate an informative cover for association rules [11]. The cover oerator in the informative cover aroach is similar to the cover oerator in the reresentative association rule aroach. The difference between them is that the cover oerator of the informative cover aroach does not require the antecedent of the resulting association rule to be included in the antecedent of the initial association rule. In addition, the inference rocedure is not urely syntactic [11], because it uses Table 2.1. A Binary Dataset. A B C D E

3 information about the suort of the antecedent of the resulting association rule. The inference rules in the two aroaches are sound, but neither aroach infers the confidence of an association rule. We found that the rules generated by these aroaches may be redundant. If X Y is generated by any of these aroaches, it may be ossible to derive it 3.1 Definitions Definition Given a dataset D with I as a set of items and T as a set of transactions, an association rule X Y over a relation R I T is said to be in canonical form if Y = 1. According to this definition, we only consider the case from simler valid association rules. of Y containing a single item. Before we introduce other Examle 2.1. Suose we have the dataset shown in new notions, let us discuss another concet related to Table 2.1, and that minsu is 3 and minconf is 6. The conditional robability. sets of rules generated by the revious aroaches are A conditional robability distribution (CPD) P(Y X) shown in Table 2.2. is defined as P(X, Y) / P(X), where X and Y are random Consider the rules in Table 2.2 from the ersective variables [10]. Y is conditionally indeendent of Z given of a user. For RR, CE AD can be derived from simler X, denoted as I(Y, Z X), if and only if P(Y X) = P(Y X, association rules CE A and CE D by right union as Z), where X, Y, and Z are three disjoint sets of random with Armstrong axioms, and variables. The statement I(X, Z X) is referred to as a conf(ce AD) = conditional indeendence statement (CIS) [10] conf(ce A) conf(ce D). For GB, AB D is Four roerties that are satisfied by any joint unnecessary, because B D is in GB. For C, B AD can be derived from B A and B D by right union, assuming B A and B D are valid association rules. Transitivity cannot be used between GB and RI. For examle, E D in GB and D A in RI cannot be used to infer a valid association rule by transitivity. These examles show that some association rules in these generating sets are not in the most desirable form. ڤ 3 Discovery of Basic Association Rules We roose a new aroach to solve the roblems mentioned in Section 2. Table 2.2. The Generated Association Rules Algorithm Set Rule #Rule FastGenRules AR 36 FastGenRR RR CE AD, C AD, D E, D C, D A, B AD, AC DE, C DE, AE CD, E CD, A CD GB GB and RI C D, E D, AB D, E CD, CoverRules RI C CE D, B D, AC D, A D CE AD, C AD, D E, D C, D A, B AD, AC DE, C DE, E CD, A CD E CD, D E, C AD, D C, B AD, CE AD, D A robability distribution (JPD) are symmetry, decomosition, weak union, and contraction [10]. For examle, for the decomosition roerty, if I(X, Y W Z), then I (X, Y Z) and I (X, W Z). We describe a CIS in another way in Lemma Lemma Given a subset X X, we say I(Y, X \ X X ), i.e., Y is conditionally indeendent of X \ X given X, if and only if P(Y X) = P(Y X ), where X and Y are two disjoint sets of variables. Proof: Suose Z = X \ X. Then X, Y, and Z are three disjoint sets of variables. Since X = X Z, P(Y X) = P(Y X, Z). Assuming P(Y X) = P(Y X ), we derive P(Y X, Z) = P(Y X ). Thus, we have I(Y, Z X ), i.e., I(Y, X \ X X ) according to the definition of CIS. For the converse, assuming I(Y, X \ X X ), then according to the definition of CIS, we have P(Y X, X \ X ) = P(Y X ), i.e., P(Y X) = P(Y X ). ڤ Furthermore, we have Lemma Lemma Given two disjoint sets of variables X and Y, and X X, if I(Y, X \ X X ), then Z X \ X, I(Y, Z X ), i.e., Z X \ X, P(Y X) = P(Y X ) = P(Y X, Z). Proof: If Z = X \ X, the roof follows immediately. The decomosition roerty of conditional indeendencies states that if I(X, Y W Z), then I (X, Y Z) and I (X, W Z). Because I(Y, X \ X X ), Z X \ X, we obtain I(Y, Z X ) and I(Y, {X \ X } \ Z X ). From I(Y, X \ X X ), we obtain P(Y X) = P(Y X, X \ X ) = P(Y X ), and from I(Y, Z X ), we obtain P(Y X ) = P(Y X, Z). Hence, P(Y X) = P(Y X ) = P(Y X, Z). ڤ To exlain this idea more fully, we first give a more general definition as follows. Definition Given two disjoint sets of variables X and Y, and a conditional robability distribution P(Y X), a restricted conditional robability distribution (RCPD),

4 denoted as Pˆ (Yˆ Xˆ ), is a subset of P(Y X) defined by secifying subsets Domain(Yˆ ) Domain(Y) and Domain( Xˆ ) Domain(X). For binary variables X {0,1} and Y {0,1}, with Xˆ {1} and Yˆ {1}, the confidence (y x) of the association rule r: X Y is a ositive conditional robability of the RCPD Pˆ (Yˆ Xˆ ). In the following discussion, Pˆ (Yˆ Xˆ ) is simly denoted as Pˆ (Y X). Given X X, if we have Pˆ (Y X) = Pˆ (Y X ), we cannot guarantee that Z X \ X, Pˆ (Y X) = Pˆ (Y X, Z). Hence, the decomosition roerty cannot be alied in a RCPD. Examle Suose that we have the transaction dataset in Table 2.1 and that minsu is 3 and minconf is 6. Consider two association rules ACD E and D E. Although conf(acd E) = conf(d E) = 2/3, conf(cde) = 3/4, i.e., if X = {A, C, D} and Y = {E}. If we choose X ={D}, then Z = X \ X = {A, C}, (y x) = (y x, z) = (y x ) = 2/3, i.e., Pˆ (Y X) = Pˆ (Y X ). Let Z = {C} X \ X, since (y x, z) = 3/4, then (y x) (y x, z) and (y x ) (y x, z), i.e., Pˆ (Y X) Pˆ (Y X, Z) and Pˆ (Y X ) Pˆ (Y X, Z). ڤ In the context of RCPDs, we hoe to find a minimal subset X of X with resect to Y such that Pˆ (Y X) = Pˆ (Y X ), and Z X \ X, Pˆ (Y X) = Pˆ (Y X, Z). Definition Let X and Y be two disjoint sets of variables. X is a minimal conditional subset (MCS) of X with resect to Y if X is a subset of X that satisfies the following conditions: (1) Pˆ (Y X) = Pˆ (Y X ). (2) Z X \ X, Pˆ (Y X) = Pˆ (Y X, Z). (3) / X X such that conditions (1) and (2) hold for X. If X is a minimal conditional subset of itself with resect to Y, then X is conditionally minimal with resect to Y. For examle, in Table 2.1, AC is conditionally minimal with resect to E. We also define the dual. Definition Suose X and Y are disjoint sets of variables. If X is a MCS of X with resect to Y, then the restricted conditional robability distribution Pˆ (Yˆ Xˆ ) is a minimal conditional robability distribution (MCPD) of the RCPD Pˆ (Yˆ Xˆ ) with resect to Yˆ. If X is conditionally minimal with resect to Y, then Pˆ (Yˆ Xˆ ) is a MCPD of itself. If Pˆ (Yˆ Xˆ ) is a MCPD of Pˆ (Yˆ Xˆ ), then a series of equations follow, i.e., Z X \ X, Pˆ (Yˆ Xˆ ) = Pˆ (Y ˆ Xˆ, Ẑ ). Because MCS and MCPD are duals of each other, we use whichever is convenient. In revious research, MCS and MCPD have not been defined or emhasized by researchers. In the context of mining association rules, we use a RCPD Pˆ (Y X) for inference on association rules. We do not consider how Pˆ (XY) and Pˆ (X Y) behave. Examle Suose we have the transaction dataset shown in Table 2.1. Let X = {A, C, D}, Y = {E}, X = {A, C}. Because the confidences of both X Y and X Y are 2/3, i.e., the ositive conditional robability (e acd) = (e ac), we see that Pˆ (Y X) = Pˆ (Y X ) and that Z X \ X = {D}, Pˆ (Y X) = Pˆ (Y X, Z). Because (e acd) (e a) = 2/4 and (e acd) (e c) = 3/4, there is no X X = {A, C} such that Pˆ (Y X) = Pˆ (Y X ). So X is a MCS of X with resect to Y. Pˆ (Y X ) is a MCPD of Pˆ (Y X) with resect to Y. Similarly, {A}, {C}, and {E} are three MCSs of {A, C, E} with resect to D. ڤ In the context of mining association rules, a CPD P(Y X) is always referred to as a RCPD Pˆ (Yˆ Xˆ ). We define a new notion of minimal association rules analogously to minimal functional deendencies [25]. Definition Suose we have a set I of items and a transaction dataset D. A canonical association rule X Y over D is a basic association rule if X is conditionally minimal with resect to Y, i.e., / X X such that (1) P(Y X) = P(Y X ) and (2) Z X \ X, P(Y X) = P(Y X, Z). For examle, given the transaction dataset shown in Table 2.1, AC E is a basic association rule, and AC is a MCS of ACD with resect to E while P(E AC) is a MCPD of P(E ACE) with resect to E. 3.2 Comuting MCPDs According to the definition of basic association rules, either a MCS X with resect to Y or a MCPD P(Y X) corresonds to a basic association rule X Y. The confidence of the rule is a ositive conditional robability of P(Y X). Therefore, the crucial task of finding basic association rules is the comutation of all MCPDs. Given a set L of frequent itemsets, X L, and minconf, our aroach for comuting MCPDs corresonding to X is divided into two stes. We first construct a set of RCPDs in canonical form from X, and then we comute their MCPDs, in which all ositive conditional robabilities are at least as great as minconf. Our aroach is similar to the aroach for discovering the minimal directed I-Ma of a joint robability distribution (JPD) [10]. Suose that a ermutation (ordering) Y = {Y 1,,Y n } of a set of variables X = {X 1,,X n }, and (x) is a JPD of X. This aroach comutes any minimal set of redecessors i with resect

5 to Y i, and i satisfies (y i b i ) = (y i π i ), where i B i = {Y 1,,Y i-1 }. Hence, a directed minimal I-ma of (x) is constructed by designating i as arents of Y i. The differences from comuting I-ma from a JPD is that we do not ermutate items of a frequent itemset, and the conditions (contexts) in the restricted conditional robabilities contain all items in the frequent itemset excet for a single test item. For examle, to find MCPDs corresonding to the frequent itemset X 1 X 2 X 3 X 4 X 5, first the following RCPDs are constructed: P(X 1 X 2 X 3 X 4 X 5 ), P(X 2 X 1 X 3 X 4 X 5 ), P(X 3 X 1 X 2 X 4 X 5 ), P(X 4 X 1 X 2 X 3 X 5 ), and P(X 5 X 1 X 2 X 3 X 4 ). For P(X 1 X 2 X 3 X 4 X 5 ), one can observe all MCSs of X 2 X 3 X 4 X 5 with resect to X 1, such as a minimal subset {X 2 X 3 X 4 X 5 }, and Z {X 2 X 3 X 4 X 5 } \, (x 1 x 2 x 3 x 4 x 5 ) = (x 1 π) = (x 1 π, z), where P(X 1 ) is a MCPD of P(X 1 X 2 X 3 X 4 X 5 ). If (x 1 x 2 x 3 x 4 x 5 ) is less than minconf, then we sto the comutation of MCPDs. Otherwise, is a MCS of X 2 X 3 X 4 X 5 with resect to X 1. The corresonding basic association rule is X 1, as exlained in Section 3.1. Similarly, we comute the MCPDs of P(X 2 X 1 X 3 X 4 X 5 ), P(X 3 X 1 X 2 X 4 X 5 ), P(X 4 X 1 X 2 X 3 X 5 ), and P(X 5 X 1 X 2 X 3 X 4 ). Finally, a set of MCPDs is obtained. Thus, for each frequent itemset, a set of basic association rules with resect to X can be found. Because RCPDs do not obey decomosition, to comute a MCPD P(Y X ) of P(Y X), we should examine all cases, i.e., Z X \ {Y X }, P(Y X ) = P(Y X, Z). This rocess goes from to to bottom, and is deicted as a semi-lattice [12] in Figure 3.1. The frequent itemset itself, such as ABCDE, is laced at the first level. At the second level, we construct a set of RCPDs, such as P(A BCDE), P(B ACDE), P(E ABCD), P(D ABCE), and P(E ABCD). All RCPDs at the third level are constructed by setting their contexts as maximal subsets of the contexts of the RCPDs at the second level. At the fourth level, all RCPDs are formed by setting their contexts as the intersections of the contexts of the RCPDs at the third level. The number k of itemsets being intersected at level d of the semi-lattice is related to the length l of the frequent itemset at the first level. We have the formula k = d 2, 4 d l. For examle, for frequent 4-itemsets, the number of itemsets being intersected at level 4 is 2. For frequent 5-itemsets, the number of itemsets being intersected at level 5 is 3. The deth of the semi-lattice equals the size of the corresonding frequent itemset. For examle, for the comutation of the MCPDs of P(E ABCD), we first examine P(E ABC), P(E ABD), P(E ACD), and P(E BCD). If P(E ABCD) = P(E ABC) and P(E ABC) is minimal, then P(E ABC) is already a MCPD of P(E ABCD), and we examine P(E ABD), P(E ACD), and P(E BCD). Similar cases arise ABCDE P(A BCDE) P(B ACDE) P(E ABCD) P(D ABCE) P(C ABDE) level 2... P(E ABC) P(E ABD) P(E ACD) P(E BCD)... P(E AB) P(E AC) P(E AD) P(E BC) P(E BD) P(E CD) P(E A) P(E B) P(E C) P(E D) level 1 level 3 level 4 level 5 Figure 3.1. A Possible Semi-lattice for the Frequent Itemset ABCDE. for P(E ABD), P(E ACD), and P(E BCD). If P(E AB) is a MCPD of P(E ABCD), we only require P(E ABCD) = P(E ABC) = P(E ABD). The context of P(E AB) is the intersection of the contexts of P(E ABC) and P(E ABD). If P(E A) is a MCPD of P(E ABCD), we require P(E AB) = P(E AC) = P(E AD) = P(E ABCD), where P(E AB), P(E AC), and P(E AD) have already been obtained. Therefore, according to Definition and 3.1.4, P(E A) is a MCPD of P(E ABCD). During extension of the semi-lattice, the RCPDs in new children (the nodes at the next level) are also called the candidate MCPDs (not unique). We reduce the number of candidate MCPDs of new children by intersecting itemsets in the contexts of their arents (nodes at the revious level), and checking whether the candidate MCPDs of the children are equal to those of their arents, and whether the ositive conditional robabilities of their MCPDs are at least as great as minconf. To find all basic association rules, we should check all frequent itemsets and comute the corresonding minimal conditional robabilities. Therefore, we define the following concets. Definition Suose we have a transaction dataset D, minsu, minconf, and a set L of frequent itemsets over D. A class of basic association rules for X L is a set of basic association rules r:x A, denoted as C r (X), such that (1) X A X (2) X is a MCS of X \ A with resect to A (3) conf(r) minconf Definition Suose we have a transaction database D, minsu, and minconf. A basic association rule system, denoted as BR(D, minsu, minconf), is defined as the set of k distinct classes of valid basic association rules, i.e., BR(D, minsu, minconf) = {C i i r C r is a class of basic

6 association rules, 1 i k such that for all j, 1 j k, j i i, C r C j i r, C r C j r }. BR(D, minsu, minconf) is also simly denoted as BR when D, minsu, and minconf are clear from context. Examle Given the dataset in Table 2.1, minsu = 3, and minconf = 6, in Figure 3.2, we show how to comute all MCPDs corresonding to the frequent itemset ACDE. Figure 3.2 shows the search sace used to find MCSs by comuting a set of MCPDs corresonding to the frequent itemset ACDE. This search sace consists of four semi-lattices, in which the single node at the to level corresonds to the frequent itemset, and other nodes corresond to its subsets, each of which includes a ositive conditional robability. Regardless of the data in the dataset, at the second level in the structure, we always have four ositive conditional robabilities for ACDE. Each of them corresonds to an item in ACDE, such as (a cde), etc. At the third level of the structure, the itemsets aearing in the contexts of ositive conditional robabilities are always maximal subsets of the itemsets aearing in the context of ositive conditional robabilities in their arents. For examle, along the branch containing (a cde), de, ce and cd are maximal subsets of cde. If the ositive conditional robability of a child is equal to the ositive conditional robability of its arent and both ositive conditional robabilities are at least as great as minconf, then in Figure 3.2, they are connected with a bold arrow; e.g., a bold arrow is shown from the node including (a cde) to the node including (a ce), because (a cde) = (a ce). If the ositive conditional robability of a arent is not equal to the ositive conditional robability of its child, but the ositive conditional robability of the child is at least as great as minconf, we connect the arent node and the child node with a narrow arrow; e.g., a narrow arrow is shown from the node including (a cde) to the node including (a cd). If the ositive conditional robability of a arent is not equal to the ositive conditional robability of its child or a ositive conditional robability is less than minconf, further comutation of minimal conditional ACDE (a cd e) (c ade) (d ace) (e acd) (a de) (a ce) (a cd ) (c de) (c ae) (c ad) robabilities along this ath is terminated; e.g., a dotted arrow is shown from the node including (a cde) to the node including (a de). Hence, P(A CE) is a MCPD of P(A CDE). Similarly, we comute all MCPDs of P(C ADE), P(D ACE) and P(E ACD). As a result, a set of MCPDs with resect to the frequent itemset ACDE is obtained, i.e., P(A CE), P(C AE), P(D E), P(D C), P(D A) and P(E AC), where P(A CE) is a MCPD of P(A CDE) with resect to A, and P(C AE) is a MCPD of P(C ADE) with resect to C, etc. ڤ From the MCPDs corresonding the frequent itemset X, we can readily obtain a class of basic association rules, C r (X). A class of basic association rules derived from one frequent itemset may be comletely included in another class of basic association rules derived from another frequent itemset. Hence, classes of basic association rules that are comletely contained in other classes of basic association rules have no more information than the classes containing them. They are called redundant classes and are discarded. Examle Given the transaction dataset in Table 2.1, minsu = 3, and minconf = 6, from Examle 3.2.1, we obtain C r (ACDE) = {CE A, AE C, E D, C D, A D, AC E}. Similarly, we also obtain another class of basic association rules corresonding to the frequent itemset ADE, C r (ADE) = {A D, E D}. Since C r (ADE) C r (ACDE), C r (ADE) is discarded. ڤ 3.3 The GenBR Algorithm We roose the GenBR algorithm for generating BR. Its goal is different from that of the second ste of the Ariori algorithm [1], which generates all association rules. Our aroach consists of two main stes. Given a set of frequent itemsets and minconf, GenBR generates all classes of basic association rules. Secondly, the algorithm generates BR by discarding all redundant classes. The GenBR algorithm, resented in Figure 3.3, generates BR from a set of frequent itemsets L. For each frequent itemset I in L, the GenBC algorithm is called to generate a class of basic association rules corresonding to I. All classes discovered by GenBC are collected into (d ce) (d ae) (d ac) (e cd ) (e ad) (e ac) (a e) (a d ) (a c) (c e) (c d) (c a) (d e) (d c) (d a ) (e d) (e c) (e a) Figure 3.2. The Comutation of MCPDs.

7 Algorithm GenBR(L) Algorithm MinimalSubsets(i, I ) Purose: generate BR from a set L of frequent itemsets Purose: comute a set S of minimal conditional subsets of I Inut: L, a set of frequent itemsets, where L k L is all itemsets with resect to i. in L containing k items. Inut: i, an item such that I i is a frequent itemset. Outut: BR, a set of classes of basic association rules. I, an itemset. BAR = Ø Outut: S, a set of minimal conditional subsets of foreach I L k, k 2 I with resect to i. BAR = BAR GenBC(I) S = Ø end = su(i {i}) / su(i ) BR = RemoveRedundantClass(BAR) if ( minconf) then return BR S = {I } end S = MaximalSubsets(I ) Figure 3.3. The GenBR Algorithm. k = 2 Algorithm GenBC(I) while (S Ø) Purose: generate the class of basic association rules corresonding to I. foreach s S Inut: I, a frequent itemset. Outut: R, a class of basic association rules corresonding to I. 1 = su(s {i}) / su(s) if ( 1 = ) then R = Ø S = S {s} foreach item i I else S = S \ {s} I = I \ {i} end S = MinimalSubsets(i, I ) foreach s S foreach s S DelSuerset(s, S) // remove all s S such that R = R {s i} end s s from S return R S = IntersectionSet(S, k) end k = k + 1 Figure 3.4. The GenBC Algorithm. end return S BAR. The RemoveRedundantClass algorithm removes end redundant classes from BAR to give BR. Figure 3.5. The MinimalSubsets Algorithm. The GenBC algorithm, resented in Figure 3.4, generates the class of basic association rules corresonding to the frequent itemset I. The main loo of the GenBC algorithm is reeated for each item i in the frequent itemset I. It calls the MinimalSubsets algorithm for comuting the MCSs of I with resect to i. From these MCSs, the algorithm forms a set of basic association rules corresonding to I. The MinimalSubsets algorithm, resented in Figure 3.5, comutes the set of MCSs of I with resect to i. First, the algorithm determines whether = (i I ) minconf. If so, the algorithm initializes the minimal conditional subset S = {I }. The MaximalSubsets function roduces a set S of all maximal subsets of I as a set of candidate MCSs of I with resect to i, e.g., S = MaximalSubsets(BDE) = {BD, BE, DE}. Because this function is straightforward, we omit it. In the while loo, the algorithm examines the validity of candidate MCSs in S. For s S, the algorithm comutes the conditional robability (i s), and then comares (i s) with. If (i s) =, then s is a valid candidate MCS of I with resect to i, and it is stored in S. If smaller valid candidate MCSs of I are found, then suersets of them are removed from S. The DelSuerset function (omitted) does this task. The IntersectionSet(S, k) function (omitted) generates all smaller candidate MCSs of I, which are the intersections of itemsets in S in terms of the deth k of loo. For examle, the intersection of the two itemsets AE and AC is equal to A, and it is regarded as a candidate MCS. Examle Given the dataset in Table 2.1, minsu = 3, and minconf = 6, we describe the rocess of generating BR using GenBR. In the while loo of GenBR, we assume I = {ACDE} is selected from L. The GenBC algorithm is called to comute the class of basic association rules corresonding to ACDE. The main loo of the GenBC algorithm is reeated for each item in a frequent itemset. Inside the main loo, the MinimalSubsets algorithm is first called to generate all MCSs of CDE with resect to A. I = CDE and i = A. Because the ositive conditional robability (a cde) minconf, the MinimalSubsets algorithm s comuting MCPDs of P(A CDE), i.e., a set S of MCSs of CDE with resect to A. Initially, S = {I }. MaximalSubsets roduces a set S of all maximal subsets of CDE as candidate MCSs of

8 CDE with resect to A, S = {DE, CE, CD}. In the while loo, itemsets in S are checked to see if they are candidate MCSs of CDE with resect to A. When CE is examined, (a ce) = (a cde), so we obtain S = {BDE, CE}. For DE and CD, because (a de) (a cde) and (a cd) (a cde), DE and CD are removed from S, and S = {CE}. After DelSuerset, S = {CE} and S = {CE}. Because S has only one itemset, S = IntersectionSet(S, k) = Ø. The while loo in the MinimalSubsets algorithm exits, and S = {CE} is returned to GenBC. In GenBC, the basic association rule {CE A} is laced in R. Similarly, from the comutation of P(C ADE), we have a new basic association rule AE C, and R = {CE A, AE C}. From the comutation of P(D ACE), we have new basic association rules E D, C D and A D, and they are added into R. From the comutation of P(E ACD), a new basic association rule AC E is found. Finally, the algorithm generates a class of basic association rules corresonding to ACDE, R = {CE A, AE C, E D, C D, A E, AC E}. The algorithm returns R to GenBR. At this oint, BR = {{DE A, A B, D B, E B, A E}}. GenBR continues until all frequent itemsets in L, with size of at least 2, have been rocessed. Consequently, all basic association rules and their classes are generated and resented, as shown in Table 3.1. ڤ We define the following terminology over BR: (1) C r denotes a class of basic association rules. C i is a set of items, which aear in C r. (2) The class in which a basic association rule X Y resides is denoted as C r (X Y). For examle, C r (C D) might be referred to as one of C r (CDE), C r (ACD), C r (ACDE) and C r (CD), as shown in Table 3.1. (3) C i (X Y) denotes a set of items, which aear in C r (X Y). e.g, E C i (C D) ={C, D, E}. All redundant classes in Table 3.1 are discovered and discarded by the RemoveRedundantClass algorithm. A Table 3.1. A Set of Classes of Basic Association Rules. No. Class Basic association rule Redundant 1 C r (AB) B A Yes 2 C r (BD) B D Yes 3 C r (DE) E D, D E 4 C r (AD) D A, A D Yes 5 C r (CE) E C, C E Yes 6 C r (AC) A C, C A Yes 7 C r (CD) D C, C D 8 C r (ABD) B D, B A, D A, A D 9 C r (ADE) E D, A D Yes 10 C r (ACE) CE A, AC E, AE C Yes 11 C r (CDE) E D, E C, C E, C D 12 C r (ACD) A D, A C, C D, C A 13 C r (ACDE) E D, CE A, AC E, A D, AE C, C D redundant class C r (I) contains no more information than the class C r (I ) containing C r (I). For examle, in Table 3.1, C r (AB) C r (ABD). If a C r (I 1 ) is redundant and C r (I 1 ) C r (I 2 ), then I 1 I 2. All frequent itemsets are arranged in a semi-lattice based on their inclusion relation in order to discover all redundant classes. For examle, given the dataset shown in Table 2.1 and minsu = 3, all frequent itemsets form the semi-lattice shown in Figure 3.6. To identify the redundant class C r (AB), RemoveRedundantClass comares C r (AB) with C r (ABD) because AB ABD. To check whether C r (DE) is redundant, the algorithm comares C r (DE) with C r (ADE) and C r (CDE). Because C r (DE) C r (ADE) and C r (DE) C r (CDE), the algorithm does not need to comare C r (DE) with C r (ACDE), and C r (DE) is non-redundant. That means that to check whether C r (I 1 ) is redundant, we only comare C r (I 1 ) with all C r (I 2 ), where I 1 and I 2 are frequent itemsets, I 1 I 2, and I 1 = I 2-1, as justified by Theorem Theorem Given three itemsets I 1, I 2, and I 3, and I 1 I 2 I 3, if C r (I 1 ) C r (I 2 ), then C r (I 1 ) C r (I 3 ). Proof: Suose the condition is satisfied. Hence, r: X A C r (I 1 ), and a MCP (a x), but r C r (I 2 ), i.e., Z I 2 \ {X, A}, (a x) (a x, z). Because I 2 \ {X, A} I 3 \ {X, A}, Z I 3 \ {X, A}, (a x) (a x, z), i.e., P(A X) is not a MCPD of P(A I 3 \ A). Hence, r C r (I 3 ). ڤ In [22], we roosed an inference system for basic association rules. It is called the C-inference system, because it ermits a rule s confidence to be inferred. We roved that the C-inference system holds on BR and that all association rules can be derived from BR by the alication of inference rules in the C-inference system. The C-inference system is summarized in Table 3.2, where and q signify the confidences of association rules. Rules that cannot be derived from other rules by the C- inference system are called non-redundant association rules. ABD ACE ACDE ACD ADE CDE AB AC AD AE BD CD CE DE A B C D E L Hashtable Figure 3.6. A Semi-Lattice for All Frequent Itemsets

9 Table 3.2. The C-inference System on Basic Association Rules Inference Rule Premise Conclusion Augmentation X Y C r, X C i \ {X, Y} XX Y Pseudo-Transitivity q q X Y C r, XW Z C r XW Z Contraction q q X Y AR, XY Z AR X YZ Additivity q q X Y C r, W Z C r XW YZ Right union q q X Y C r, X Z C r X YZ Left union X Z C r, Y Z C r XY Z 3.4 The Comutational Comlexity of GenBR Let us analyze the comutational comlexity of GenBR. From Figure 3.1, we observed that itemsets contained in the context of RCPDs may be used to construct a secific semi-lattice for comuting MCPDs. All nodes in the semi-lattice are subitemsets of an itemset. This construction corresonds to the roblem of generating subsets in mathematics, or creating a hyercube (or a k-cube), or a Q n grah in grah theory [15]. We analyze the comutational comlexity of our algorithm based on the binomial theorem [13] as follows. According to the binomial theorem, the number of subitemsets of a k-itemset (nodes of the semi-lattice), denoted NS, is given by: NS = C 1 k + +C k k = 2 k 1 (3.1) By counting edges from the to of the semi lattice for a k-itemset, as shown in Figure 3.7, to its bottom, the number of edges, denoted NE, of the semi-lattice is given by: k 1 k NE = C k + 2C 2 k 3 1 k + 3C k + +(k 1)C k k 1 k = C k + 2C 2 k 3 k + 3C k + +(k 1)C 1 k + kc 0 k 0 kc k k k k i ik! = ick k = k i!( k i)! i= 1 k i= 1 ( k 1)! = k k ( i 1)!( k i)! i=1 k k 1 k 1 i 1 2 = k C = k k (3.2) i= 1 From Equation (3.1), we obtain the number of association rules NR for a k-itemset generated by the FastGenRules algorithm [1] as follows: NR = 2 k 2 (3.3) According to Figure 3.1, to comute the MCSs of ABCD with resect to E, in the worst case, the rocess forms a semi-lattice of ABCD. One edge in the semilattice corresonds to one comarison. For the whole itemset ABCDE, there are five semi-lattices of this kind. Hence, from Equation (3.2), the comutational comlexity TC of GenBR for a k-itemset is determined by: TC = k(k 1)(2 k 2 1) k 2 2 k 2 = O(k 2 2 k 2 ) (3.4) Given that l is the length of the longest itemsets in the set of frequent itemsets, the ratio of the comlexity of GenBR to that of FastGenRules is: R = O(l 2 2 l 2 )/O(2 l+1 ) = O(l 2 / 2 3 ) = O(l 2 ) (3.5) Although the time comlexity of GenBR exonentially increases with l, we show that GenBR erforms very well over actual datasets in the following section. 4 Comarison and Exerimental Results The overall comarison between our aroach and four revious aroaches is shown in Table 4.1. Table 4.1. The Overall Evaluation Algorithm Features Fast- Rules Fast- RR GBRI Cover- Rules GenBR Canonical form Non-redundant Infer a rule s suort Infer a rule s confidence Armstrong axioms-like #Rules All Fewer Most Fewest Best Elased time Fastest Slower Slowest Medium Fast First, consider the form of rules generated. Only GenBR generates non-redundant rules in canonical form. We believe that this tye of rules is easy for users to understand. Rules generated by GenBR are nonredundant, i.e., these rules and their confidences cannot be derived from simler rules and their confidences by using the inference rules defined in any of the other aroaches. Secondly, consider whether an inference rule ermits the suort and confidence of rules to be inferred. None of the aroaches can infer a rule s suort, and only GBRI and GenBR can infer confidences. Thirdly, consider whether the inference system resembles Armstrong s axioms; only the C-inference system does so. Fourthly, consider the number of rules generated. CoverRules generates the fewest, but GenBR organizes the basic association rules into classes related to the frequent itemsets. GenBR also tends to generate the fewest rules when minconf is high. Finally, with resect to the elased time, FastGenRules is the fastest on all datasets. GenBR is more efficient than the other aroaches, excet for a few cases. As described in the remainder of this section, exeriments on several synthetic and real-life datasets were conducted to comare the erformance of GenBR and revious algorithms with resect to the number of rules and the elased running time.

10 4.1 Exerimental Design The exerimental environment was a PC with a 2.53 GHz Intel CPU, 512MB of RAM, and Microsoft Windows XP. All algorithms were imlemented in Microsoft Visual Java++. The datasets used are listed in Table 4.2. Synthetic datasets, T10I4D100K and T20I6D100K, were generated by running GenData [17], which is a synthetic data generator from IBM. Several well-known benchmark real-world datasets were chosen from [29]. CustInfo-5 was obtained from the CustInfo database [4] by joining five tables. The number of attributes shown for the realworld datasets reresents the number after multi-valued attributes were converted to several binary attributes. Table 4.2. Synthetic and Real-world Datasets Dataset # Attributes #Items Length of Record or Transactions # Transactions Chess Connect Mushroom Molecular CustInfo T10I4D100K T20I6D100K To comare the erformance of GenBR and revious aroaches with resect to the number of rules and the elased running time, we imlemented the FastGenRules for generating association rules [2] as well as the GB algorithm for generating generic basis for exact rules, and the RI algorithm for generating informative basis for aroximate association rules [5]. The GB and the RI algorithms were combined into the GBRI algorithm for our exeriments. The FastGenReresentatives (denoted FastGenRR) algorithm generates a set RR of reresentative association rules [20]. CoverRules generates an informative cover of cover rules [11]. The GenBR algorithm consists of two stes to generate basic association rules. The first ste generates all classes of basic association rules. The second ste removes redundant classes. The GenBR algorithm is more similar to the FastGenRules algorithm than to the other algorithms. To comare the GenBR with FastGenRules with resect to the elased running time, we only consider the first ste of GenBR, which generates the basic association rules, and we ignore the elased time required for the second ste. 4.2 Results We first comared the algorithms with regard to the number of rules generated. GenBR generates fewer rules than FastGenRules, and in some cases fewer rules than FastGenRR, GBRI, and CoverRules. Table 4.3. Chess, minsu = 80 % Algorithm / minconf Fast- #Rules Elased time Rules (ms) #Rules GBRI Elased time Fast- #Rules RR Elased time Cover- #Rules Rules Elased time GenBR #Classes #Rules Elased time For Chess with minsu = 80%, the number of rules generated by the algorithms and their elased times are shown in Table 4.3. GenBR generates significantly fewer rules than FastGenRules for all settings of minconf. For examle, when minconf = 10%, FastGenRules generates rules, while GenBR generates rules. Among all aroaches, CoverRules generates the fewest rules when minconf is 90% or less. For examle, in Table 4.3, with minconf = 10%, CoverRules generates 226 rules while GenBR generates rules. However, when minconf is 100%, GenBR generates the fewest rules. For examle, in Table 4.3, GenBR generates only 6 rules, while the other aroaches generate 2228 or more rules. Table 4.4. Comarison wrt the Number of Rules. Dataset Minsu Fast- Rules Fast- RR GBRI Cover- Rules GenBR Chess 80% Chess 90% Connect-4 95% Connect-4 97% Mushroom 30% Mushroom 40% Molecular 5% Molecular 10% CustInfo-5 5% CustInfo-5 10% T10I4D100K 05% T10I4D100K 1% T20I6D100K 2% T20I6D100K 3% For other datasets and different arameters, the exerimental results concerning the number of rules generated are similar to those shown in Table 4.3. We summarize these exerimental results in Table 4.4. Because users are generally concerned with association rules with high confidences, we only describe the number of rules with 100% confidence (minconf = 100%) for all datasets besides CustInfo-5. For CustInfo-5, since there is no exact association rule when minsu = 5% or 10%, we use minconf = 90%. Under these conditions, GenBR

11 generates fewer rules than the other algorithms, excet for a few cases when CoverRules generates the fewest. Although the comutation comlexity of GenBR is exonentially greater than that of FastGenRules (as mentioned in Section 3.4), in our exeriments, we observed that the elased time for GenBR is significantly less than that for FastGenRules, excet for cases where both algorithms have their fastest erformance, which corresond to values of minconf aroaching 100%. For examle, in Table 4.3, when minconf is 10%, the elased time for GenBR is ms while that of FastGenRules is ms. When minconf is 100%, the elased time of GenBR is ms while the elased time of FastGenRules is 969 ms. Across all tested settings of minconf from 10% to 100%, GenBR has the lowest maximum elased time (25593 ms). The number of exact association rules greatly affects the elased time of GenBR. If there are more exact association rules, there may be more functional deendencies, and consequently, GenBR sends more time. We summarize the exerimental results related to elased time by recording the ratios of the elased time of GenBR to revious algorithms in Table 4.5. We set minconf = 100%. The bold entries identify cases where the corresonding algorithm is faster than GenBR. The results show that GenBR is faster than all algorithms excet FastGenRules on most resented datasets. Table 4.5. Evaluation wrt Elased Time Dataset Minsu Fast Fast Cover- Rules RR GBRI Rules Chess 80% Chess 90% Connect-4 95% Connect-4 97% Mushroom 30% Mushroom 40% Molecular 5% Molecular 10% CustInfo-5 5% CustInfo-5 10% T10I4D100K 05% T10I4D100K 1% T20I6D100K 2% T20I6D100K 3% Conclusions and Future Work 5.1 Conclusions In this aer, we roosed a new tye of association rules, called basic association rules. First, by referring the relational database theory on functional deendencies, we develoed the new concets of a restricted conditional robability distribution (RCPD), a minimal conditional robability distribution (MCPD), a minimal conditional subset (MCS), and a basic association rule. We established the C-inference system on basic association rules, which is similar to Armstrong s axioms on functional deendencies. Secondly, we roosed the GenBR algorithm for generating a set of classes of basic association rules from a set of frequent itemsets. GenBR efficiently generates basic association rules as comared with the revious aroaches for generating small sets of association rules. GenBR also generates fewer rules than revious aroaches when minconf is high. Thirdly, arguably, users will find basic association rules to be more manageable and understandable than reviously roosed reduced sets of association rules. The rules are concise, the number of rules is small, redundancy among rules in a class of basic association rules has been eliminated, and inference involving confidence values is ossible on the rules. This oint is argued at greater length in [22]. Fourthly, we showed that the search sace of our algorithm to comute basic association rules is a hyercube (n-cube) or Q n grah. This insight aided in our theoretical analysis of the algorithm. 5.2 Future Work An oen roblem is to find a more efficient algorithm for discovering basic association rules from frequent itemsets. Finding a heuristic method to discover basic association rules without generating frequent itemsets will also be challenging. We roosed the idea of a restricted conditional robability distribution as a foundation for mining association rules, and we distinguished it from the traditional conditional robability distribution. It can be regarded as a more general concet than the contextsecific conditional robability distribution described in [6]. Further work on restricted conditional robability distributions may be romising. We established the C-inference system on association rules, which may be regarded as an extension of Armstrong s axioms on functional deendencies. This kind of inference system works on exact and aroximate rules simultaneously. Further exloration of the roerties of the C-inference system is needed. References [1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. SIGMOD 93, ages , Washington. DC, May [2] R. Agrawal and R. Srikant. Fast algorithm for mining association rules. In Proc. VLDB 94, Santiago, Chile, [3] E. Baralis and G. Psaila. Designing temlates for mining association rules. Journal of Intelligent Information Systems, 9(1):7-32, July [4] B. Barber and H. J. Hamilton. Extracting share frequent itemsets with infrequent subsets. Data

12 Mining and Knowledge Discovery, 7(2): , Aril [5] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proc. of the First International Conference on Comutational Logic, ages , 200 [6] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Koller. Context-secific indeendence in Bayesian networks. In Proc. 12 th Conf. on Uncertainty in Artificial Intelligence (UAI-96), ages , Portland, Oregon, [7] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlation. In Proc. SIGMOD 97, ages , May [8] S. Brin, R. Motwani, J. D. Ullman and S. Tsur. Dynamic itemset counting and imlication rules for market basket data. In Proc. SIGMOD 97, ages , Montreal, Canada, June [9] C. L. Carter, H. J. Hamilton, and N. Cercone. Share based measures for itemsets. In Proc. PKDD 97, ages 14-24, Trondheim, Norway, June [10] E. Castillo, J. M. Gutierrez, and A. S. Hadi, Exert Systems and Probabilistic Network Models, Sringer, [11] L. Cristofor and D. Simovici. Generating an informative cover for association rules. In Proc. of the IEEE International Conference on Data Mining, [12] B. A. Davey and H. A. Priestley. Introduction to Lattices and Order, Fourth edition. Cambridge University Press, [13] E. G. Goodaire and M. M. Parmenter. Discrete Mathematics with Grah Theory, Second Edition. Prentice Hall, Uer Saddle River, NJ, [14] J. Han, J. Pei, and Y. Yin. Mining frequent atterns without candidate generation. In Proc. SIGMOD 00, ages 1-12, Dallas, TX, May 200 [15] T. W. Haynes, S. T. Hedetniemi, and P. J. Slater. Fundamentals of Domination in Grahs. Marcel Dekker, [16] R. J. Hilderman and H. J. Hamilton. Knowledge Discovery and Interest Measures, Kluwer Academic, Boston, [17] IBM. GenData, [18] M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic rules. In Proc. KDD 96, ages , Portland, Oregon, [19] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivnen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. of the 3 rd CIKM Conference, ages , November [20] M. Kryszkiewicz. Reresentative association rules and minimum condition maximum consequence association rules. In Proc. PKDD 98. ages , Nantes, France, [21] W. Lee, S. J. Stolfo, and K. W. Mokl. A data mining framework for building intrusion detection models. In Proc. IEEE Symosium on Security and Privacy, [22] G. Li. Basic Association Rules, M.Sc. Thesis, Deartment of Comuter Science, University of Regina, [23] W. Lin, S. A. Alvarez, and C. Ruiz. Collaborative recommendation via adative association rules mining. In Proc. WEBKDD 2000, San Francisco, CA, Aug. 200 [24] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Closed set based discovery of small covers for association rules. In Proc. of the 15 th Conference on Advanced Databases, ages , [25] C. M. Ricardo. Database Systems: Princiles, Design, & Imlementation, Macmillan, 199 [26] K. Satou, G. Shibayama, T. Ono, Y. Yamamura, E. Furuichi, S. Kuhara, and T. Takagi. Finding association rules on heterogeneous genome data. In Proc. of the Pacific Symosium on Biocomuting 97 (PSB 97), ages , Hawaii, Jan [27] R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc. KDD 97, ages [28] H. Toivonen, M. Klemettinen, P. Ronkainen, K. Hatonen, and H. Mannila. Pruning and grouing discovered association rules. In Proc. ECML-95 Worksho on Statistics, Machine Learning, and Knowledge Discovery in Database, ages 47-52, Aril [29] UCI Machine Learning Reository. ft://ft.ics.uci.edu/ub/machine-learningdatabases/ [30] M. Zaki, Generating non-redundant association rules. In Proc. KDD 2000, ages August 200

Efficient discovery of statistically significant association rules

Efficient discovery of statistically significant association rules Efficient discovery of statistically significant association rules Wilhelmiina Hämäläinen Deartment of Comuter Science University of Helsinki Finland whamalai@cs.helsinki.fi ABSTRACT In this aer, we introduce

More information

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Guimei Liu 1,2 Jinyan Li 1 Limsoon Wong 2 Wynne Hsu 2 1 Institute for Infocomm Research, Singapore 2 School

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan

More information

Data Mining Association Rules: Advanced Concepts and Algorithms. Lecture Notes for Chapter 7. Introduction to Data Mining

Data Mining Association Rules: Advanced Concepts and Algorithms. Lecture Notes for Chapter 7. Introduction to Data Mining Data Mining Association Rules: Advanced Concets and Algorithms Lecture Notes for Chater 7 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/24 1

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar 15-859(M): Randomized Algorithms Lecturer: Anuam Guta Toic: Lower Bounds on Randomized Algorithms Date: Setember 22, 2004 Scribe: Srinath Sridhar 4.1 Introduction In this lecture, we will first consider

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border

A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border A New Concise and Lossless Representation of Frequent Itemsets Using Generators and A Positive Border Guimei Liu a,b Jinyan Li a Limsoon Wong b a Institute for Infocomm Research, Singapore b School of

More information

732A61/TDDD41 Data Mining - Clustering and Association Analysis

732A61/TDDD41 Data Mining - Clustering and Association Analysis 732A61/TDDD41 Data Mining - Clustering and Association Analysis Lecture 6: Association Analysis I Jose M. Peña IDA, Linköping University, Sweden 1/14 Outline Content Association Rules Frequent Itemsets

More information

Association Rule. Lecturer: Dr. Bo Yuan. LOGO

Association Rule. Lecturer: Dr. Bo Yuan. LOGO Association Rule Lecturer: Dr. Bo Yuan LOGO E-mail: yuanb@sz.tsinghua.edu.cn Overview Frequent Itemsets Association Rules Sequential Patterns 2 A Real Example 3 Market-Based Problems Finding associations

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket

More information

Mining Class-Dependent Rules Using the Concept of Generalization/Specialization Hierarchies

Mining Class-Dependent Rules Using the Concept of Generalization/Specialization Hierarchies Mining Class-Dependent Rules Using the Concept of Generalization/Specialization Hierarchies Juliano Brito da Justa Neves 1 Marina Teresa Pires Vieira {juliano,marina}@dc.ufscar.br Computer Science Department

More information

Association Rules. Fundamentals

Association Rules. Fundamentals Politecnico di Torino Politecnico di Torino 1 Association rules Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket counter Association rule

More information

D B M G. Association Rules. Fundamentals. Fundamentals. Elena Baralis, Silvia Chiusano. Politecnico di Torino 1. Definitions.

D B M G. Association Rules. Fundamentals. Fundamentals. Elena Baralis, Silvia Chiusano. Politecnico di Torino 1. Definitions. Definitions Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Itemset is a set including one or more items Example: {Beer, Diapers} k-itemset is an itemset that contains k

More information

D B M G. Association Rules. Fundamentals. Fundamentals. Association rules. Association rule mining. Definitions. Rule quality metrics: example

D B M G. Association Rules. Fundamentals. Fundamentals. Association rules. Association rule mining. Definitions. Rule quality metrics: example Association rules Data Base and Data Mining Group of Politecnico di Torino Politecnico di Torino Objective extraction of frequent correlations or pattern from a transactional database Tickets at a supermarket

More information

Linear diophantine equations for discrete tomography

Linear diophantine equations for discrete tomography Journal of X-Ray Science and Technology 10 001 59 66 59 IOS Press Linear diohantine euations for discrete tomograhy Yangbo Ye a,gewang b and Jiehua Zhu a a Deartment of Mathematics, The University of Iowa,

More information

An Analysis of Reliable Classifiers through ROC Isometrics

An Analysis of Reliable Classifiers through ROC Isometrics An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit

More information

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG

ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING. Alexandre Termier, LIG ASSOCIATION ANALYSIS FREQUENT ITEMSETS MINING, LIG M2 SIF DMV course 207/208 Market basket analysis Analyse supermarket s transaction data Transaction = «market basket» of a customer Find which items are

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

COMP 5331: Knowledge Discovery and Data Mining

COMP 5331: Knowledge Discovery and Data Mining COMP 5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Jiawei Han, Micheline Kamber, and Jian Pei And slides provide by Raymond

More information

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Claudio Lucchese Ca Foscari University of Venice clucches@dsi.unive.it Raffaele Perego ISTI-CNR of Pisa perego@isti.cnr.it Salvatore

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm Gabriel Noriega, José Restreo, Víctor Guzmán, Maribel Giménez and José Aller Universidad Simón Bolívar Valle de Sartenejas,

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

Frequent Itemset Mining

Frequent Itemset Mining ì 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM COCONUT Team (PART I) IMAGINA 17/18 Webpage: http://www.lirmm.fr/~lazaar/teaching.html Email: lazaar@lirmm.fr 2 Data Mining ì Data Mining (DM) or Knowledge

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

GIVEN an input sequence x 0,..., x n 1 and the

GIVEN an input sequence x 0,..., x n 1 and the 1 Running Max/Min Filters using 1 + o(1) Comarisons er Samle Hao Yuan, Member, IEEE, and Mikhail J. Atallah, Fellow, IEEE Abstract A running max (or min) filter asks for the maximum or (minimum) elements

More information

On the Toppling of a Sand Pile

On the Toppling of a Sand Pile Discrete Mathematics and Theoretical Comuter Science Proceedings AA (DM-CCG), 2001, 275 286 On the Toling of a Sand Pile Jean-Christohe Novelli 1 and Dominique Rossin 2 1 CNRS, LIFL, Bâtiment M3, Université

More information

Discovering Non-Redundant Association Rules using MinMax Approximation Rules

Discovering Non-Redundant Association Rules using MinMax Approximation Rules Discovering Non-Redundant Association Rules using MinMax Approximation Rules R. Vijaya Prakash Department Of Informatics Kakatiya University, Warangal, India vijprak@hotmail.com Dr.A. Govardhan Department.

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - 1DL36 Fall 212" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht12 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology, Uppsala

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

CS 584 Data Mining. Association Rule Mining 2

CS 584 Data Mining. Association Rule Mining 2 CS 584 Data Mining Association Rule Mining 2 Recall from last time: Frequent Itemset Generation Strategies Reduce the number of candidates (M) Complete search: M=2 d Use pruning techniques to reduce M

More information

Lecture Notes for Chapter 6. Introduction to Data Mining

Lecture Notes for Chapter 6. Introduction to Data Mining Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

Mining Free Itemsets under Constraints

Mining Free Itemsets under Constraints Mining Free Itemsets under Constraints Jean-François Boulicaut Baptiste Jeudy Institut National des Sciences Appliquées de Lyon Laboratoire d Ingénierie des Systèmes d Information Bâtiment 501 F-69621

More information

Introduction to Probability and Statistics

Introduction to Probability and Statistics Introduction to Probability and Statistics Chater 8 Ammar M. Sarhan, asarhan@mathstat.dal.ca Deartment of Mathematics and Statistics, Dalhousie University Fall Semester 28 Chater 8 Tests of Hyotheses Based

More information

DATA MINING LECTURE 3. Frequent Itemsets Association Rules

DATA MINING LECTURE 3. Frequent Itemsets Association Rules DATA MINING LECTURE 3 Frequent Itemsets Association Rules This is how it all started Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases.

More information

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i Comuting with Haar Functions Sami Khuri Deartment of Mathematics and Comuter Science San Jose State University One Washington Square San Jose, CA 9519-0103, USA khuri@juiter.sjsu.edu Fax: (40)94-500 Keywords:

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

The Graph Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule

The Graph Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule The Grah Accessibility Problem and the Universality of the Collision CRCW Conflict Resolution Rule STEFAN D. BRUDA Deartment of Comuter Science Bisho s University Lennoxville, Quebec J1M 1Z7 CANADA bruda@cs.ubishos.ca

More information

On Minimal Infrequent Itemset Mining

On Minimal Infrequent Itemset Mining On Minimal Infrequent Itemset Mining David J. Haglin and Anna M. Manning Abstract A new algorithm for minimal infrequent itemset mining is presented. Potential applications of finding infrequent itemsets

More information

Uncertainty Modeling with Interval Type-2 Fuzzy Logic Systems in Mobile Robotics

Uncertainty Modeling with Interval Type-2 Fuzzy Logic Systems in Mobile Robotics Uncertainty Modeling with Interval Tye-2 Fuzzy Logic Systems in Mobile Robotics Ondrej Linda, Student Member, IEEE, Milos Manic, Senior Member, IEEE bstract Interval Tye-2 Fuzzy Logic Systems (IT2 FLSs)

More information

A PROBABILISTIC POWER ESTIMATION METHOD FOR COMBINATIONAL CIRCUITS UNDER REAL GATE DELAY MODEL

A PROBABILISTIC POWER ESTIMATION METHOD FOR COMBINATIONAL CIRCUITS UNDER REAL GATE DELAY MODEL A PROBABILISTIC POWER ESTIMATION METHOD FOR COMBINATIONAL CIRCUITS UNDER REAL GATE DELAY MODEL G. Theodoridis, S. Theoharis, D. Soudris*, C. Goutis VLSI Design Lab, Det. of Electrical and Comuter Eng.

More information

FE FORMULATIONS FOR PLASTICITY

FE FORMULATIONS FOR PLASTICITY G These slides are designed based on the book: Finite Elements in Plasticity Theory and Practice, D.R.J. Owen and E. Hinton, 1970, Pineridge Press Ltd., Swansea, UK. 1 Course Content: A INTRODUCTION AND

More information

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University

Association Rules. Acknowledgements. Some parts of these slides are modified from. n C. Clifton & W. Aref, Purdue University Association Rules CS 5331 by Rattikorn Hewett Texas Tech University 1 Acknowledgements Some parts of these slides are modified from n C. Clifton & W. Aref, Purdue University 2 1 Outline n Association Rule

More information

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES

RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES RANDOM WALKS AND PERCOLATION: AN ANALYSIS OF CURRENT RESEARCH ON MODELING NATURAL PROCESSES AARON ZWIEBACH Abstract. In this aer we will analyze research that has been recently done in the field of discrete

More information

Construction of High-Girth QC-LDPC Codes

Construction of High-Girth QC-LDPC Codes MITSUBISHI ELECTRIC RESEARCH LABORATORIES htt://wwwmerlcom Construction of High-Girth QC-LDPC Codes Yige Wang, Jonathan Yedidia, Stark Draer TR2008-061 Setember 2008 Abstract We describe a hill-climbing

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 04 Association Analysis Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology Jamshoro

More information

On the capacity of the general trapdoor channel with feedback

On the capacity of the general trapdoor channel with feedback On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:

More information

MATH 2710: NOTES FOR ANALYSIS

MATH 2710: NOTES FOR ANALYSIS MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - DL360 Fall 200 An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/ht0 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology, Uppsala

More information

A Unified 2D Representation of Fuzzy Reasoning, CBR, and Experience Based Reasoning

A Unified 2D Representation of Fuzzy Reasoning, CBR, and Experience Based Reasoning University of Wollongong Research Online Faculty of Commerce - aers (Archive) Faculty of Business 26 A Unified 2D Reresentation of Fuzzy Reasoning, CBR, and Exerience Based Reasoning Zhaohao Sun University

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

CS 484 Data Mining. Association Rule Mining 2

CS 484 Data Mining. Association Rule Mining 2 CS 484 Data Mining Association Rule Mining 2 Review: Reducing Number of Candidates Apriori principle: If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due

More information

Bayesian System for Differential Cryptanalysis of DES

Bayesian System for Differential Cryptanalysis of DES Available online at www.sciencedirect.com ScienceDirect IERI Procedia 7 (014 ) 15 0 013 International Conference on Alied Comuting, Comuter Science, and Comuter Engineering Bayesian System for Differential

More information

A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search

A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search Eduard Babkin and Margarita Karunina 2, National Research University Higher School of Economics Det of nformation Systems and

More information

Topic 7: Using identity types

Topic 7: Using identity types Toic 7: Using identity tyes June 10, 2014 Now we would like to learn how to use identity tyes and how to do some actual mathematics with them. By now we have essentially introduced all inference rules

More information

Basic Data Structures and Algorithms for Data Profiling Felix Naumann

Basic Data Structures and Algorithms for Data Profiling Felix Naumann Basic Data Structures and Algorithms for 8.5.2017 Overview 1. The lattice 2. Apriori lattice traversal 3. Position List Indices 4. Bloom filters Slides with Thorsten Papenbrock 2 Definitions Lattice Partially

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Improved Capacity Bounds for the Binary Energy Harvesting Channel Imroved Caacity Bounds for the Binary Energy Harvesting Channel Kaya Tutuncuoglu 1, Omur Ozel 2, Aylin Yener 1, and Sennur Ulukus 2 1 Deartment of Electrical Engineering, The Pennsylvania State University,

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Decoding Linear Block Codes Using a Priority-First Search: Performance Analysis and Suboptimal Version

Decoding Linear Block Codes Using a Priority-First Search: Performance Analysis and Suboptimal Version IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 3, MAY 1998 133 Decoding Linear Block Codes Using a Priority-First Search Performance Analysis Subotimal Version Yunghsiang S. Han, Member, IEEE, Carlos

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Mining Positive and Negative Fuzzy Association Rules

Mining Positive and Negative Fuzzy Association Rules Mining Positive and Negative Fuzzy Association Rules Peng Yan 1, Guoqing Chen 1, Chris Cornelis 2, Martine De Cock 2, and Etienne Kerre 2 1 School of Economics and Management, Tsinghua University, Beijing

More information

Generation of Linear Models using Simulation Results

Generation of Linear Models using Simulation Results 4. IMACS-Symosium MATHMOD, Wien, 5..003,. 436-443 Generation of Linear Models using Simulation Results Georg Otte, Sven Reitz, Joachim Haase Fraunhofer Institute for Integrated Circuits, Branch Lab Design

More information

Research of PMU Optimal Placement in Power Systems

Research of PMU Optimal Placement in Power Systems Proceedings of the 5th WSEAS/IASME Int. Conf. on SYSTEMS THEORY and SCIENTIFIC COMPUTATION, Malta, Setember 15-17, 2005 (38-43) Research of PMU Otimal Placement in Power Systems TIAN-TIAN CAI, QIAN AI

More information

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands

A Special Case Solution to the Perspective 3-Point Problem William J. Wolfe California State University Channel Islands A Secial Case Solution to the Persective -Point Problem William J. Wolfe California State University Channel Islands william.wolfe@csuci.edu Abstract In this aer we address a secial case of the ersective

More information

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models Ketan N. Patel, Igor L. Markov and John P. Hayes University of Michigan, Ann Arbor 48109-2122 {knatel,imarkov,jhayes}@eecs.umich.edu

More information

Logistics Optimization Using Hybrid Metaheuristic Approach under Very Realistic Conditions

Logistics Optimization Using Hybrid Metaheuristic Approach under Very Realistic Conditions 17 th Euroean Symosium on Comuter Aided Process Engineering ESCAPE17 V. Plesu and P.S. Agachi (Editors) 2007 Elsevier B.V. All rights reserved. 1 Logistics Otimization Using Hybrid Metaheuristic Aroach

More information

Monopolist s mark-up and the elasticity of substitution

Monopolist s mark-up and the elasticity of substitution Croatian Oerational Research Review 377 CRORR 8(7), 377 39 Monoolist s mark-u and the elasticity of substitution Ilko Vrankić, Mira Kran, and Tomislav Herceg Deartment of Economic Theory, Faculty of Economics

More information

A Parallel Algorithm for Minimization of Finite Automata

A Parallel Algorithm for Minimization of Finite Automata A Parallel Algorithm for Minimization of Finite Automata B. Ravikumar X. Xiong Deartment of Comuter Science University of Rhode Island Kingston, RI 02881 E-mail: fravi,xiongg@cs.uri.edu Abstract In this

More information

arxiv: v2 [quant-ph] 2 Aug 2012

arxiv: v2 [quant-ph] 2 Aug 2012 Qcomiler: quantum comilation with CSD method Y. G. Chen a, J. B. Wang a, a School of Physics, The University of Western Australia, Crawley WA 6009 arxiv:208.094v2 [quant-h] 2 Aug 202 Abstract In this aer,

More information

Pretest (Optional) Use as an additional pacing tool to guide instruction. August 21

Pretest (Optional) Use as an additional pacing tool to guide instruction. August 21 Trimester 1 Pretest (Otional) Use as an additional acing tool to guide instruction. August 21 Beyond the Basic Facts In Trimester 1, Grade 8 focus on multilication. Daily Unit 1: Rational vs. Irrational

More information

Homework Set #3 Rates definitions, Channel Coding, Source-Channel coding

Homework Set #3 Rates definitions, Channel Coding, Source-Channel coding Homework Set # Rates definitions, Channel Coding, Source-Channel coding. Rates (a) Channels coding Rate: Assuming you are sending 4 different messages using usages of a channel. What is the rate (in bits

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST BASED ON THE WEIBULL DISTRIBUTION

A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST BASED ON THE WEIBULL DISTRIBUTION O P E R A T I O N S R E S E A R C H A N D D E C I S I O N S No. 27 DOI:.5277/ord73 Nasrullah KHAN Muhammad ASLAM 2 Kyung-Jun KIM 3 Chi-Hyuck JUN 4 A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

Frequent Itemset Mining

Frequent Itemset Mining 1 Frequent Itemset Mining Nadjib LAZAAR LIRMM- UM IMAGINA 15/16 2 Frequent Itemset Mining: Motivations Frequent Itemset Mining is a method for market basket analysis. It aims at finding regulariges in

More information

Lilian Markenzon 1, Nair Maria Maia de Abreu 2* and Luciana Lee 3

Lilian Markenzon 1, Nair Maria Maia de Abreu 2* and Luciana Lee 3 Pesquisa Oeracional (2013) 33(1): 123-132 2013 Brazilian Oerations Research Society Printed version ISSN 0101-7438 / Online version ISSN 1678-5142 www.scielo.br/oe SOME RESULTS ABOUT THE CONNECTIVITY OF

More information

and INVESTMENT DECISION-MAKING

and INVESTMENT DECISION-MAKING CONSISTENT INTERPOLATIVE FUZZY LOGIC and INVESTMENT DECISION-MAKING Darko Kovačević, č Petar Sekulić and dal Aleksandar Rakićević National bank of Serbia Belgrade, Jun 23th, 20 The structure of the resentation

More information

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem An Ant Colony Otimization Aroach to the Probabilistic Traveling Salesman Problem Leonora Bianchi 1, Luca Maria Gambardella 1, and Marco Dorigo 2 1 IDSIA, Strada Cantonale Galleria 2, CH-6928 Manno, Switzerland

More information

Detection Algorithm of Particle Contamination in Reticle Images with Continuous Wavelet Transform

Detection Algorithm of Particle Contamination in Reticle Images with Continuous Wavelet Transform Detection Algorithm of Particle Contamination in Reticle Images with Continuous Wavelet Transform Chaoquan Chen and Guoing Qiu School of Comuter Science and IT Jubilee Camus, University of Nottingham Nottingham

More information

Bayesian Networks Practice

Bayesian Networks Practice Bayesian Networks Practice Part 2 2016-03-17 Byoung-Hee Kim, Seong-Ho Son Biointelligence Lab, CSE, Seoul National University Agenda Probabilistic Inference in Bayesian networks Probability basics D-searation

More information

Matching Partition a Linked List and Its Optimization

Matching Partition a Linked List and Its Optimization Matching Partition a Linked List and Its Otimization Yijie Han Deartment of Comuter Science University of Kentucky Lexington, KY 40506 ABSTRACT We show the curve O( n log i + log (i) n + log i) for the

More information

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test) Chater 225 Tests for Two Proortions in a Stratified Design (Cochran/Mantel-Haenszel Test) Introduction In a stratified design, the subects are selected from two or more strata which are formed from imortant

More information

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H:

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H: Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 25, 2017 Due: November 08, 2017 A. Growth function Growth function of stum functions.

More information

Universal Finite Memory Coding of Binary Sequences

Universal Finite Memory Coding of Binary Sequences Deartment of Electrical Engineering Systems Universal Finite Memory Coding of Binary Sequences Thesis submitted towards the degree of Master of Science in Electrical and Electronic Engineering in Tel-Aviv

More information

Effective Elimination of Redundant Association Rules

Effective Elimination of Redundant Association Rules Effective Elimination of Redundant Association Rules James Cheng Yiping Ke Wilfred Ng Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay,

More information

Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries

Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries Free-sets : a Condensed Representation of Boolean Data for the Approximation of Frequency Queries To appear in Data Mining and Knowledge Discovery, an International Journal c Kluwer Academic Publishers

More information

GOOD MODELS FOR CUBIC SURFACES. 1. Introduction

GOOD MODELS FOR CUBIC SURFACES. 1. Introduction GOOD MODELS FOR CUBIC SURFACES ANDREAS-STEPHAN ELSENHANS Abstract. This article describes an algorithm for finding a model of a hyersurface with small coefficients. It is shown that the aroach works in

More information