Efficient discovery of statistically significant association rules


Wilhelmiina Hämäläinen
Department of Computer Science, University of Helsinki, Finland

ABSTRACT

In this paper, we introduce a new, effective algorithm for searching for non-redundant, statistically significant association rules. StatApriori is based on the classical Apriori algorithm, but it generates the best non-redundant association rule for each attribute set as soon as the set is produced. In addition, we estimate an upper bound for the significance of more special rules. If the upper bound is too low to produce any non-redundant, significant rules, the whole branch can be pruned. StatApriori guarantees that no significant rules are missed (type 2 error) and that the number of non-significant rules generated is minimal (type 1 error). In the classical Apriori, type 2 errors can be avoided by a sufficiently low minimum frequency threshold, at the cost of type 1 errors. In addition, a small frequency threshold causes an exponential explosion and the problem becomes intractable. StatApriori can tackle many of these difficult problems efficiently. According to empirical results, StatApriori needs only a small fraction of the time and memory required by the classical Apriori.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data mining

General Terms
association rules, statistical significance

1. INTRODUCTION

Traditional association rules [2] are rules of the form "if event $X = x$ occurs, then also event $A = a$ is likely to occur". The commonness of the rule is measured by the frequency $P(X = x, A = a)$ and the strength of the rule by the confidence $P(A = a \mid X = x)$. For computational purposes it is required that both frequency and confidence exceed some user-defined thresholds. The actual interestingness of the rule is usually decided afterwards, by some interestingness measure.

Often the associations are interpreted as dependencies between certain attribute value combinations. However, traditional association rules do not necessarily capture statistical dependencies: they can associate absolutely independent events while ignoring strong dependencies. As a solution, it is often suggested (following Piatetsky-Shapiro [13]) to measure the degree of dependence, $\frac{P(X = x, A = a)}{P(X = x)P(A = a)}$, instead of the confidence (e.g. [16, 17, 1]). This also produces statistically sounder results, since the statistical significance of a rule is a function of its frequency and degree of dependence. That is, these two quantities define the probability that the rule would have occurred by chance if the events were actually independent.

The main problem of traditional association rules is that they ignore statistical significance. Frequency-confidence-based discovery produces many spurious rules but misses statistically significant rules. In statistics, these two error types are called type 1 and type 2 errors. In the worst case, all discovered rules can be spurious [18, 19].

The problems of association rules, and especially of the frequency-confidence framework, are well known ([18, 19, 4, 1, 5, 12]), but there have been only a few attempts to solve them. The reason is that it is considered computationally infeasible to search for all statistically significant rules. However, statistically sound algorithms (e.g. [11, 12]) have been introduced for searching for decision rules with a fixed consequent. In addition, there are a number of approaches (e.g. [8, 11, 12]) which use statistical measures, like $\chi^2$, for selecting the best association rules. Unfortunately, none of these approaches is suitable for association rules, and they can cause both type 1 and type 2 errors [6].

In this paper, we introduce a new algorithm called StatApriori which solves the problem. StatApriori guarantees that no significant rules are missed, while the number of spurious rule candidates generated during the execution is kept minimal. In addition, it produces only minimal significant rules, pruning out all redundant rules.
Compared to a modification of the classical Apriori algorithm which also produces all significant association rules, StatApriori is very efficient. It can tackle problems which are impossible to compute with the classical Apriori. The secret of StatApriori lies in two pruning techniques:

First, for each generated attribute set $X$, StatApriori estimates an upper bound for the statistical significance of rules derived from the supersets of $X$. If the upper bound is lower than the required statistical significance, the whole branch can be pruned.

Second, StatApriori selects the best association rule immediately (footnote 1) when a potentially significant attribute set is generated. Thus, the statistical significance of all more common rules is known, and a branch can be pruned if it would produce only redundant rules.

The organization of the paper is the following: The basic definitions are given in Section 2 and the theoretical results in Section 3. The algorithm is presented in Section 4 and the experimental results in Section 5. The final conclusions are drawn in Section 6.

2. BASIC DEFINITIONS

In the following we give the basic definitions of the association rule, statistical dependence, statistical significance, and redundancy. The notations are introduced in Table 1.

Table 1: Basic notations.

Notation                                               Meaning
$A, B, C, \ldots$                                      binary attributes
$a, b, c, \ldots \in \{0, 1\}$                         attribute values
$R = \{A_1, \ldots, A_k\}$                             set of all attributes
$|R| = k$                                              number of attributes in $R$
$Dom(R) = \{0, 1\}^k$                                  attribute space
$X, Y, Z \subseteq R$                                  attribute sets
$Dom(X) = \{0, 1\}^l \subseteq Dom(R)$                 domain of $X$, $|X| = l$
$(X = x) = \{(A_1 = a_1), \ldots, (A_l = a_l)\}$       event, $|X| = l$
$t = \{A_1 = t(A_1), \ldots, A_k = t(A_k)\}$           row
$r = \{t_1, \ldots, t_n \mid t_i \in Dom(R)\}$         relation (data set)
$|r| = n$                                              size of relation $r$
$\sigma_{X=x}(r) = \{t \in r \mid t[X] = x\}$          set of rows where $X = x$
$m(X = x) = |\sigma_{X=x}(r)|$                         number of rows where $X = x$
$P(X = x) = \frac{m(X = x)}{n}$                        relative frequency of $X = x$

2.1 Association rules

Traditionally, association rules are defined in the frequency-confidence framework:

Definition 1 (Association rule). Let $R$ be a set of binary attributes and $r$ a relation according to $R$. Let $X \subseteq R$ and $Y \subseteq R \setminus X$ be attribute sets and $x \in Dom(X)$ and $y \in Dom(Y)$ their value combinations.
The confidence of rule $(X = x) \rightarrow (Y = y)$ is

$cf(X = x \rightarrow Y = y) = \frac{P(X = x, Y = y)}{P(X = x)} = P(Y = y \mid X = x)$

and the frequency of the rule is

$fr(X = x \rightarrow Y = y) = P(X = x, Y = y).$

Given user-defined thresholds $min_{cf}, min_{fr} \in [0, 1]$, rule $(X = x) \rightarrow (Y = y)$ is an association rule in $r$, if
(i) $cf(X = x \rightarrow Y = y) \geq min_{cf}$, and
(ii) $fr(X = x \rightarrow Y = y) \geq min_{fr}$.

Footnote 1: In addition to its statistical connotations, the word "stat" is an abbreviation of the Latin word statim, immediately.

The first condition requires that an association rule should be strong enough and the second condition requires that it should be common enough. In this paper, we call rules association rules even if no thresholds $min_{fr}$ and $min_{cf}$ are specified.

Usually, it is assumed that the consequent $Y = y$ contains just one attribute (i.e. $|Y| = 1$) and that the rule contains only positive attribute values ($A_i = 1$). Now, the rule can be expressed simply by listing the attributes, e.g. $A_1, A_3, A_5 \rightarrow A_2$.

From the statistical point of view, the direction of a rule ($X \rightarrow Y$ or $Y \rightarrow X$) is a matter of choice. In this paper, we follow a common convention and assign the single attribute to the consequent. If the rule contains just two attributes, the direction is selected randomly.

2.2 Statistical dependence

Statistical dependence is usually defined through statistical independence (e.g. [15, 9]):

Definition 2 (Independence and dependence). Let $X \subseteq R$ and $Y \subseteq R \setminus X$ be sets of binary attributes. Events $X = x$ and $Y = y$, $x \in Dom(X)$, $y \in Dom(Y)$, are mutually independent, if $P(X = x, Y = y) = P(X = x)P(Y = y)$. If the events are not independent, they are dependent.

The strength of a statistical dependency between $(X = x)$ and $(Y = y)$ can be measured by the degree of dependence (also called dependence [20], degree of independence [21], or interest [5]):

$\gamma(X = x, Y = y) = \frac{P(X = x, Y = y)}{P(X = x)P(Y = y)}.$   (1)

2.3 Statistical significance

The idea of statistical significance tests is to estimate the probability of the observed or a rarer phenomenon under some null hypothesis. When the objective is to test the significance of the dependency between $X = x$ and $Y = y$, the null hypothesis is the independence assumption: $P(X = x, Y = y) = P(X = x)P(Y = y)$. If the estimated probability $p$ is very small, we can reject the independence assumption and assume that the observed dependency is not due to chance, but significant at level $p$. The smaller $p$ is, the more significant the observation is.

The significance of the observed frequency $m(X, Y)$ can be estimated exactly by the binomial distribution. Each row in

relation $r$, $|r| = n$, corresponds to an independent Bernoulli trial, whose outcome is either 1 ($XY$ occurs) or 0 ($XY$ does not occur). All rows are mutually independent. Assuming the independence of attributes $X$ and $Y$, combination $XY$ occurs on a row with probability $P(X)P(Y)$. Now the number of rows containing $X, Y$ is a binomial random variable $M$ with parameters $P(X)P(Y)$ and $n$. The mean of $M$ is $\mu_M = nP(X)P(Y)$ and its variance is $\sigma_M^2 = nP(X)P(Y)(1 - P(X)P(Y))$. Probability $P(M \geq m(X, Y))$ gives the significance $p$:

$p = \sum_{i=m(X,Y)}^{n} \binom{n}{i} (P(X)P(Y))^i (1 - P(X)P(Y))^{n-i}.$   (2)

This can be approximated by the standard normal distribution, $p \approx 1 - \Phi(t(X, Y))$, where $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-u^2/2}\,du$ is the standard normal cumulative distribution function and $t(X, Y)$ is the standardized $m(X, Y)$:

$t(X, Y) = \frac{m(X, Y) - \mu_M}{\sigma_M} = \frac{m(X, Y) - nP(X)P(Y)}{\sqrt{nP(X)P(Y)(1 - P(X)P(Y))}}.$   (3)

The cumulative distribution function $\Phi(t)$ is quite difficult to calculate, but for association rule mining it is enough to know $t(X, Y)$. Since $\Phi(t)$ is monotonically increasing, probability $p$ is monotonically decreasing in terms of $t(X, Y)$. Thus, we can use $t$ as a measure function for ranking association rules according to their significance.

On the other hand, we know that according to Chebyshev's inequality (the proof is given e.g. in [10]):

$P\left(-K < \frac{M - \mu_M}{\sigma_M} < K\right) \geq 1 - \frac{1}{K^2}.$

I.e. $P(t \geq K) < \frac{1}{2K^2}$. For example, requirement $K \geq 2$ corresponds to statistical significance level $p = 0.125$.

2.4 Bonferroni adjustment

Usually the minimum requirement for any significance is $p = 0.05$. It means that there is a 5% chance that a spurious rule passes the significance test ("type 1 error"). If we test 10,000 rules, it is likely that we will find 500 spurious rules. This so-called multiple testing problem is inherent in knowledge discovery, where we often perform an exhaustive search over all possible patterns. As a solution, the more patterns we test, the stricter bounds for the significance we should use.
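The measures introduced so far can be illustrated with a short Python sketch. It computes the frequency, confidence, degree of dependence (Equation 1) and the t-value (Equation 3) from row counts, and compares the exact binomial tail of Equation 2 with the normal approximation. The function names and the toy counts are illustrative assumptions, not part of the original paper.

```python
import math

def rule_measures(n, m_x, m_y, m_xy):
    """Return (fr, cf, gamma, t) for rule X -> Y from row counts.

    n: number of rows, m_x / m_y: rows containing X / Y,
    m_xy: rows containing both X and Y.
    """
    p_x, p_y, p_xy = m_x / n, m_y / n, m_xy / n
    fr = p_xy                      # frequency P(X, Y)
    cf = p_xy / p_x                # confidence P(Y | X)
    gamma = p_xy / (p_x * p_y)     # degree of dependence, Equation 1
    q = p_x * p_y                  # P(X)P(Y): chance of XY on a row under independence
    t = (m_xy - n * q) / math.sqrt(n * q * (1 - q))  # Equation 3
    return fr, cf, gamma, t

def binomial_tail(n, m, q):
    """Exact P(M >= m) for M ~ Bin(n, q) with 0 < q < 1 (Equation 2).

    Terms are evaluated in log space to avoid overflow for large n.
    """
    total = 0.0
    for i in range(m, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(q) + (n - i) * math.log(1 - q))
        total += math.exp(log_term)
    return total

def normal_tail(t):
    """Normal approximation 1 - Phi(t) of the significance."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# Toy example (made-up counts): 1000 rows, X on 200, Y on 300, both on 90.
fr, cf, gamma, t = rule_measures(1000, 200, 300, 90)
print(fr, cf, gamma, t)
print(binomial_tail(1000, 90, 0.2 * 0.3), normal_tail(t))
```

Ranking rules by t alone avoids evaluating either tail probability, which is the reason the text uses t as the measure function.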
The most well-known method is Bonferroni adjustment [14], where the desired significance level $p$ is divided by the number of tests $m$. The same effect is achieved by using $\sqrt{m}K$ instead of $K$ as a minimum threshold for $t$. In association rule discovery, we can give an upper bound for the number of rules to be tested. However, this rule is so strict that there is a risk that we do not recognize all significant patterns ("type 2 error"). As a solution, we suggest that the testing (the search) is finished before the rules become too complex and $m$ increases so high that no statistical significance can be guaranteed.

2.5 Redundancy

A common goal in association rule discovery is to find the most general rules (containing the minimal number of attributes) which satisfy the given search criteria. There is no sense in outputting complex rules $X \rightarrow Y$, if their generalizations $Z \rightarrow Y$, $Z \subsetneq X$, are at least equally significant. Generally, the goal is to find minimal (or most general) interesting rules, and to prune out redundant rules [3]. In this paper we define redundancy as follows:

Definition 3 (Minimal and redundant rules). Given some interestingness measure $M$, rule $X \rightarrow Y$ is a minimal rule, if there does not exist any rule $X' \rightarrow Y'$ such that $X' \cup Y' \subsetneq X \cup Y$ and $M(X' \rightarrow Y') \geq M(X \rightarrow Y)$. If the rule is not minimal, then it is redundant.

In our case, function $t$ is used as the measure function, but in principle $M$ can be any function which increases with the interestingness. When redundancy reduction is applied in practice, we have to check two cases:

1. A rule is redundant, if there are at least equally good, more common rules.
2. A rule is not the most significant, minimal rule, if all its specializations are better.

3. THEORETICAL RESULTS

The statistical significance of rule $X \rightarrow Y$ is a function of its frequency $P(X, Y)$ and degree of dependence $\gamma(X, Y)$. Equation 3 can be presented equivalently as

$t(X \rightarrow Y) = \frac{\sqrt{nP(X, Y)}\,(\gamma - 1)}{\sqrt{\gamma - P(X, Y)}}.$   (4)

The higher the frequency is, the lower the degree of dependence can be, and vice versa.
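The step from Equation 3 to Equation 4 is not spelled out in the text. A short derivation, assuming only $P(X, Y) = m(X, Y)/n$ and $\gamma = P(X, Y)/(P(X)P(Y))$ from Equation 1, might run as follows:

```latex
\begin{align*}
t(X \rightarrow Y)
  &= \frac{m(X,Y) - nP(X)P(Y)}{\sqrt{nP(X)P(Y)\,(1 - P(X)P(Y))}}
   && \text{(Equation 3)} \\
  &= \frac{nP(X,Y) - nP(X,Y)/\gamma}
          {\sqrt{n\,\frac{P(X,Y)}{\gamma}\left(1 - \frac{P(X,Y)}{\gamma}\right)}}
   && \text{since } P(X)P(Y) = P(X,Y)/\gamma \\
  &= \frac{nP(X,Y)(\gamma - 1)/\gamma}
          {\sqrt{nP(X,Y)\,(\gamma - P(X,Y))}\,/\,\gamma} \\
  &= \frac{\sqrt{nP(X,Y)}\,(\gamma - 1)}{\sqrt{\gamma - P(X,Y)}}
   && \text{(Equation 4)}
\end{align*}
```

For $\gamma > 1$ the last expression grows with both $P(X, Y)$ and $\gamma$, which is why an upper bound on $\gamma$ together with the observed frequency gives an upper bound on $t$.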
The following theorem expresses the relationship between the frequency and the degree of dependence:

Theorem 1. When $t(X \rightarrow Y) = K$,

$P(X, Y) = \frac{K^2 \gamma}{n(\gamma - 1)^2 + K^2}.$

Proof. By solving $t(X \rightarrow Y) = \frac{\sqrt{nP(X, Y)}\,(\gamma - 1)}{\sqrt{\gamma - P(X, Y)}} = K$ for $P(X, Y)$.

This result can be used for pruning areas in the search space when an upper bound for $\gamma$ is known.

The simplest method to search for all statistically significant rules is to search for all frequent sets with a sufficiently small $min_{fr}$ and then select from each frequent set the rules with sufficient $t$. The following theorem gives a safe minimum frequency threshold for the whole data set. It guarantees that no significant rules are missed.

Theorem 2. Let $\mathit{min} = \min\{P(A_i) \mid A_i \in R\}$. Let $K \geq 2$ be the desired significance level. For all sets $X \subseteq R$ and any $A \in X$
(i) $\gamma(X \setminus A \rightarrow A) \leq \frac{1}{\mathit{min}}$ and
(ii) $X \setminus A \rightarrow A$ cannot be significant, unless

$P(X) \geq \frac{K^2\,\mathit{min}}{n(1 - \mathit{min})^2 + K^2\,\mathit{min}^2}.$

Proof. By solving

$t(X \setminus A \rightarrow A) = \frac{\sqrt{nP(X)}\,(\gamma - 1)}{\sqrt{\gamma - P(X)}} \geq K$

for $\gamma = \frac{1}{\mathit{min}}$.

This result enables the discovery of statistically significant association rules with the normal Apriori algorithm, followed by an appropriate rule generation phase. Unfortunately, the resulting $min_{fr}$ is so low that nearly all attribute sets are selected and the computation becomes infeasible. However, this approach can be used as a benchmark for evaluating the efficiency and correctness of other algorithms.

The theorem also gives the highest $K$-value which can be used while still finding rules. This is important information when the applicability of Bonferroni adjustment is decided. In the extreme case, it can mean that there is not enough data to draw any statistical conclusions.

Corollary 1. If $K > \sqrt{\frac{n(1 - \mathit{min})}{1 + \mathit{min}}}$, no attribute set can produce significant rules.

Better pruning can be achieved by using several minimum frequencies. The following theorem defines a minimum frequency for the given set $X$:

Theorem 3. Let $\mathit{min}(X) = \min\{P(A_i) \mid A_i \in X\}$ and

$P(X) < \frac{K^2\,\mathit{min}(X)}{n(1 - \mathit{min}(X))^2 + K^2\,\mathit{min}(X)^2}.$

Then for all $Y \supseteq X$ such that $\mathit{min}(Y) = \mathit{min}(X)$, all rules $Y \setminus A \rightarrow A$ are insignificant at level $K$.

Proof. Let $Y \supseteq X$ such that $\mathit{min}(Y) = \mathit{min}(X)$ and $A \in Y$. Now $\gamma(Y \setminus A \rightarrow A) \leq \frac{1}{\mathit{min}(X)}$, $P(Y) \leq P(X)$, and

$t(Y \setminus A \rightarrow A) \leq \frac{\sqrt{nP(X)}\,(1 - \mathit{min}(X))}{\sqrt{\mathit{min}(X)(1 - P(X)\,\mathit{min}(X))}},$

which is $< K$ according to the assumption.

The following corollary expresses the anti-monotonicity property used for pruning:

Corollary 2. No significant rules can be derived from $Y$, unless all subsets $X \subseteq Y$ with $\mathit{min}(X) = \mathit{min}(Y)$ have frequency

$P(X) \geq \frac{K^2\,\mathit{min}}{n(1 - \mathit{min})^2 + K^2\,\mathit{min}^2}.$

If set $X$ is sufficiently frequent, it is called potentially significant.

Efficient pruning can be achieved by the following strategy: First, all attributes are arranged into ascending order by their frequencies. Then the attribute sets are generated level by level, in a canonical order. From set $X = \{A_{i_1}, \ldots, A_{i_l}\}$, where $P(A_{i_1}) \leq \ldots \leq P(A_{i_l})$, we can generate sets $X \cup \{A_j\}$, where $j > i_l$. Now all supersets of $X$ have the same upper bound for the degree of dependence, $\gamma \leq \frac{1}{P(A_{i_1})}$. If $X$ is not potentially significant, then the whole branch can be pruned.

The upper bound for $t$ can be used for redundancy reduction, too. If $X$ or its subsets have already produced a rule with a higher or equal $t$ than the upper bound, then the branch can be pruned.

4. STATAPRIORI ALGORITHM

Next, we give the StatApriori algorithm, which implements the theoretical results. In addition, we give an analysis of the worst-case time complexity.

4.1 Algorithm

The StatApriori algorithm is given in Figure 1. The main algorithm is otherwise similar to the traditional Apriori, except that it generates the best rule for each attribute set immediately. This enables efficient pruning of non-productive search paths.

First, the 1-sets are initialized with the single attributes. If the supersets of an attribute cannot produce any significant rules, the attribute is pruned. After that the algorithm proceeds in a breadth-first manner, by generating new candidate sets (subroutine GenCands) and pruning attribute sets whose supersets cannot produce any minimal, significant rules (subroutine PruneCands). Subroutine SelectRules generates the best rules from the discovered attribute sets and checks their redundancy relative to more general rules.

For simplicity, the collections of attribute sets are represented as arrays and each attribute set is defined as a record with four fields:

    set     attributes
    fr      frequency
    cons    consequence of the best rule
    maxt    t-value of the rule or of its parent rule, if the rule is redundant

We note that in the implementation there is no need for separate collections; all collections can be stored in one indexing structure, like a trie.

Sets of $l$ attributes are called $l$-sets. The collection of all $l$-candidate sets is denoted by $C_l$ and the collection of all potentially significant $l$-sets by $PS_l$.

Figure 1: Algorithm StatApriori(R, r, K) for searching all minimal, significant association rules from data.

    Input:  set of attributes R, data set r, significance level K
    Output: minimal, significant rules
    Method:
      PS_1 = ∅
      for i = 0 to |R| − 1                        // initialize 1-sets
        PS_1[i].set = {R[i]}
      // database pass:
      count frequencies PS_1[i].fr from r
      for i = 0 to |R| − 1
        if (otmaxt(|r|, PS_1[i].fr, PS_1[i].fr) < K)
          prune PS_1[i]
      arrange PS_1 into ascending order by PS_1[i].fr
      l = 2
      while ((l ≤ |R|) and (|PS_{l−1}| ≥ l))
        Par = ∅
        (C_l, Par) = GenCands(PS_{l−1})            // Fig. 2
        (PS_l, Par) = PruneCands(C_l, Par, K)      // Fig. 3
        PS_l = SelectRules(PS_l, PS_{l−1} ∪ Par)   // Fig. 4
        l++
      // Output minimal, significant rules:
      for i = 2 to l − 1
        for j = 0 to |PS_i| − 1
          if ((PS_i[j].red == 0) and (PS_i[j].maxt ≥ K) and (not childrenbetter(i, j)))
            output best rule(s) of PS_i[j]

GenCands (Figure 2) generates $l$-candidate sets from potentially significant $(l-1)$-sets. Candidate set $X$ is generated only if it or any of its specializations $Z \supseteq X$ can produce a significant rule. The necessary condition is that all parent sets $Y \subset X$, $|Y| = l - 1$, containing the same minimum attribute (and thus having the same upper bound for $\gamma$) are sufficiently frequent and thus in $PS_{l-1}$. The only parent with a different minimum attribute does not have to be in $PS_{l-1}$. However, if it is missing, it is added to a temporary collection $Par$ for the frequency counting. Its frequency is needed later when selecting the best rule of $X$.
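The candidate generation condition described above can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: it assumes attributes are integers ranked by ascending frequency (so the smallest number in a set is its minimum attribute), it builds candidates by extending each set with a higher-ranked attribute as in the canonical order of Section 3, and the names `gen_cands`, `ps_prev`, `counted_prev` and `attributes` are hypothetical.

```python
def gen_cands(ps_prev, counted_prev, attributes):
    """Generate l-candidates from potentially significant (l-1)-sets.

    ps_prev:      set of frozensets, the potentially significant (l-1)-sets
    counted_prev: set of frozensets whose frequencies are already known
                  (assumption: the candidates of the previous level)
    attributes:   attribute ranks, ascending by frequency
    Returns (candidates, special_parents).
    """
    candidates, special_parents = set(), set()
    for x in ps_prev:
        for a in attributes:
            if a <= max(x):
                continue  # canonical order: only extend with higher-ranked attributes
            z = x | {a}
            lowest = min(z)  # minimum attribute, fixes the upper bound for gamma
            # All parents that keep the minimum attribute must be potentially significant.
            if all((z - {b}) in ps_prev for b in z if b != lowest):
                candidates.add(z)
                other = z - {lowest}  # the only parent with a different minimum attribute
                if other not in counted_prev:
                    special_parents.add(other)  # its frequency still has to be counted
    return candidates, special_parents

f = frozenset
level2 = {f({0, 1}), f({0, 2})}
print(gen_cands(level2, level2, [0, 1, 2]))
```

In the usage line, the 3-set {0, 1, 2} qualifies because both parents containing attribute 0 are potentially significant, while the remaining parent {1, 2} is returned as a special parent so that its frequency can be counted.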
PruneCands (Figure 3) calculates the frequencies of the $l$-candidate sets and the special parent sets from the database. Then it checks if a candidate set or any of its specializations can produce a significant rule. If the frequency is too low, the whole branch is pruned.

SelectRules (Figure 4) checks the redundancy of potentially significant sets $X$. If any of the parent rules (more common rules) have a higher or equal t-value than the upper bound for $t$ in $X$ and its children, then $X$ is pruned. If $X$ was saved, but some of its parents have a higher t-value, then $X$ is marked as redundant and its t-value is replaced by the parents' maximal t-value. In this way, it is always enough to check the immediate parents' t-values for redundancy checking.

The auxiliary functions used in the algorithm are defined in Figure 5.

Figure 2: Algorithm GenCands(PS_{l−1}, l) for generating l-candidate sets.

    Input:  potentially significant (l−1)-sets PS_{l−1}, size l
    Output: l-candidates C_l, special parents Par
    Method:
      C_l = ∅
      next = 0; next2 = 0
      for i = 0 to |PS_{l−1}| − 2
        for j = i + 1 to |PS_{l−1}| − 1
          if (commonmin(PS_{l−1}[i].set, PS_{l−1}[j].set) and
              (|PS_{l−1}[i].set ∩ PS_{l−1}[j].set| == l − 2))
            C_l[next].set = PS_{l−1}[i].set ∪ PS_{l−1}[j].set
            // all parents with the same minimum attribute
            // have to be potentially significant
            if (∃ Y ⊂ C_l[next].set ((|Y| == l − 1) and
                commonmin(Y, C_l[next].set) and (Y ∉ PS_{l−1})))
              discard C_l[next]                  // remove from C_l
            else
              X = C_l[next].set \ {minattr(C_l[next].set)}
              if (X ∉ C_{l−1})
                Par[next2].set = X               // add special parent
                next2++
              next++
      return (C_l, Par)

Figure 3: Algorithm PruneCands(C_l, l, K) for pruning l-candidate sets whose canonical supersets cannot be significant.

    Input:  l-candidates C_l, size l, significance level K
    Output: potentially significant l-sets PS_l, special parents Par
    Method:
      PS_l = ∅
      // database pass:
      count frequencies C_l[i].fr and Par[j].fr from r
      for i = 0 to |C_l| − 1
        min = min{P(A) | A ∈ C_l[i].set}
        if (otmaxt(|r|, C_l[i].fr, min) ≥ K)
          PS_l = PS_l ∪ C_l[i]
      return (PS_l, Par)
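The pruning test of Figure 3 can be sketched in Python. The sketch assumes relative frequencies in (0, 1] and takes the upper bound for t to be Equation 4 evaluated at the largest possible degree of dependence, $\gamma = 1/\mathit{min}$, as in Theorem 3. The names `t_value`, `max_t_bound`, `min_frequency` and `prune_cands` are illustrative and not taken from the paper.

```python
import math

def t_value(n, fr, gamma):
    """t of a rule with frequency fr and degree of dependence gamma (Equation 4)."""
    return math.sqrt(n * fr) * (gamma - 1) / math.sqrt(gamma - fr)

def max_t_bound(n, fr, p_min):
    """Upper bound for t in a branch: Equation 4 at gamma = 1 / p_min."""
    return math.sqrt(n * fr) * (1 - p_min) / math.sqrt(p_min * (1 - fr * p_min))

def min_frequency(n, k_level, p_min):
    """Frequency at which max_t_bound equals K (threshold of Theorem 2 and Corollary 2)."""
    return k_level ** 2 * p_min / (n * (1 - p_min) ** 2 + k_level ** 2 * p_min ** 2)

def prune_cands(cands, n, k_level):
    """Keep the (fr, p_min) pairs whose branch can still reach t >= K."""
    return [(fr, p_min) for fr, p_min in cands if max_t_bound(n, fr, p_min) >= k_level]

# Toy values (made up): 1000 rows, K = 3, least frequent attribute on 20% of rows.
n, k_level, p_min = 1000, 3.0, 0.2
threshold = min_frequency(n, k_level, p_min)
print(threshold, max_t_bound(n, threshold, p_min))
print(prune_cands([(0.1, 0.2), (0.001, 0.2)], n, k_level))
```

Comparing the bound with K and comparing the frequency with the threshold of Corollary 2 are two forms of the same test, so an implementation could use whichever is cheaper to evaluate.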

Figure 4: Algorithm SelectRules(PS_l, PS_{l−1}) for selecting the best, minimal rules from PS_l.

    Input:  potentially significant l-sets PS_l and (l−1)-sets PS_{l−1}
    Output: modified PS_l
    Method:
      for i = 0 to |PS_l| − 1
        // calculate an upper bound for t in the branch
        maxγ = max{γ(PS_l[i].set \ {A} → A) | A ∈ PS_l[i].set}
        PS_l[i].cons = A                         // consequent (the maximizing A)
        PS_l[i].maxt = t(n, PS_l[i].fr, maxγ)
        if ∃ j ((PS_{l−1}[j].set ⊂ PS_l[i].set) and (PS_{l−1}[j].maxt ≥ PS_l[i].maxt))
          PS_l[i].red = 1                        // rule is redundant
          PS_l[i].maxt = PS_{l−1}[j].maxt
      return PS_l

Figure 5: Functions commonmin, minattr, t, otmaxt, and childrenbetter.

    commonmin(X, Y)
      return ((∃ A ∈ X ∩ Y) and (∀ B ∈ X ∪ Y (P(B) ≥ P(A))))
    minattr(X)
      return A ∈ X (∀ B ∈ X: P(A) ≤ P(B))
    t(n, fr, γ)
      return sqrt(n·fr)·(γ − 1) / sqrt(γ − fr)
    otmaxt(n, fr, min)
      return sqrt(n·fr)·(1 − min) / sqrt(min·(1 − fr·min))
    childrenbetter(l, ind)
      return (∀ C_{l+1}[j] ((C_l[ind] ⊂ C_{l+1}[j]) ⇒
              (not (C_{l+1}[j].red) and (C_{l+1}[j].maxt > C_l[ind].maxt))))

4.2 Time complexity

It is known that the problem of searching for all frequent attribute sets is NP-hard in terms of the number of attributes, $k$ [7]. The worst case happens when the most significant association rule involves all $k$ attributes and all $2^k$ attribute sets are generated. The worst-case complexity of the algorithm is $O(\min\{k^2, n\}2^k)$.

Theorem 4. The worst-case time complexity of StatApriori is $O(\min\{k^2, n\}2^k)$, where $n$ is the number of rows and $k$ is the number of attributes.

Proof. The initialization (generation of 1-sets) takes $n \cdot k$ steps. Producing $l$-sets and their best rules takes $l^2|C_l| + 2n|C_l|\log k + 2|PS_l|\,l$ time steps. The first term is the time complexity of the candidate generation. Each candidate has $l$ parents and each parent can be found in the trie in $l - 1$ steps. The second term is the complexity of the pruning phase. The database is read ($n$ rows) and on each row $|C_l|$ candidates are checked. In the worst case, all candidates have an extra parent which has to be checked, too.
Checking takes in the worst case $\log k$ steps, when the data is stored as bit vectors and the inclusion test is implemented with logical bit operations. The third term is the complexity of the rule selection phase: for each of the $|PS_l|$ sets, all $l$ parents are checked. Checking is done at most twice: once for calculating the maximal t-value (selecting the best rule) and a second time for checking the redundancy. Each check can be implemented in constant time, if the parent pointers are stored in a temporary structure in the candidate generation phase. Since $|C_l| \geq |PS_l|$, the total complexity is

$\sum_{l=2}^{k} \min\{l^2, n\}\,|C_l| < \min\{k^2, n\} \sum_{l=2}^{k} \binom{k}{l} = O(\min\{k^2, n\}2^k).$

5. EXPERIMENTAL RESULTS

StatApriori was tested with real and simulated data sets which are often used for testing association rule discovery programs. All data sets were obtained from the FIMI repository for frequent itemset mining (http://fimi.cs.helsinki.fi/). Two of the data sets, mushroom and chess, were dense and known to be pathological cases for the traditional Apriori. The other two, T10I4D100K and T40I10D100K, were large but sparse sets, thus resembling typical market-basket data. In addition, we tried modified data sets (marked by an asterisk, *) which were produced from the original sets by randomly selecting half of the attributes. If a row became empty, it was also removed. All experiments were executed on Intel Core Duo processor T GHz with 1 GB RAM, 2 MB cache, and the Linux operating system.

The results are given in Table 2. The original idea was to compare StatApriori to the traditional Apriori. However, it turned out that the traditional Apriori ran out of memory (after a long execution) in most tests, because the minimum frequency threshold was so low. With higher thresholds, some significant rules could have been missed.

When Apriori was halted by the system, we have reported the last parameter values.

The only tests which Apriori passed successfully were with the modified mushroom data. With K = 90, StatApriori generated only 0.03% of the sets Apriori did, and with K = [...] only [...]%, i.e. less than one per mille. With the original mushroom data, Apriori was halted at level 6, but the number of potentially significant sets was still five times the number StatApriori generated during the whole execution.

It was also checked that all rules discovered by StatApriori were minimal significant rules, and that all extra rules produced by Apriori were redundant. Therefore, StatApriori produced a smaller number of rules and the maximal length of the rules was also smaller.

T10I4D100K and T40I10D100K were difficult even for StatApriori, even when the largest possible K-value was used. This was not a surprise, since the minimum frequency threshold was only [...]. StatApriori finished T10I4D100K in about 2 hours (the exact processor time was lost). T40I10D100K took over a night (intermediate results were lost) and produced rules with K = 315. With higher K values, some significant rules would have been lost.

We note that both the StatApriori and the traditional Apriori algorithms were implemented without any special optimizations. The discovered attribute sets were stored in a trie structure, but the data was stored in bit vectors without any indexing structure. Since the database pass is the main bottleneck, the performance could be improved with an efficient indexing structure.

To test the effect of indexing structures, we tried an Apriori implementation with a prefix tree for the data by C. Borgelt (FIMI repository, http://fimi.cs.helsinki.fi/src/). The prefix tree improved only the execution of those problems which were tractable anyway, but it could not solve any new problems. In addition, Borgelt's program lost some frequent (and potentially significant) 1-sets when the lowest frequency thresholds were used.
We also tested StatApriori with an on-line Bonferroni adjustment. In the beginning, the initial $K$ (e.g. $K = 2$) is multiplied by the square root of the number of 1-sets, typically $\sqrt{k}$. The $K$ value is updated after every $l$-collection to reflect the current number of tests. The resulting search was very efficient, but it suffers from one problem: if the next $K$-value, $K_l$, is too high for accepting any rules, we can backtrack to the previous value $K_i$, $i < l$, which actually produced significant rules. However, it is possible that the number of rules which are potentially significant at level $K_i$ is so large that $K$ should be updated. Thus, we should iterate between $K_i$ and $K_l$ until we find such $K$ that $|\{X \mid X \text{ is potentially significant at level } K\}|$ [...] $K$.

6. CONCLUSIONS

In this paper, we have introduced a new algorithm for searching for minimal, statistically significant association rules. As far as we know, this is the first algorithm for discovering all statistically significant association rules. The previous algorithms have solved the problem only for decision rules, or have used statistical measures which are not designed for association rules, thus producing type 1 or type 2 errors.

According to our experiments with real and simulated data, the algorithm is reasonably efficient. It searches only a small fraction of the search space explored by the traditional Apriori with a sufficiently small minimum frequency threshold. (In fact, if we allowed type 2 errors like other approaches, it would compete with the existing algorithms.) In addition, it produces only minimal association rules, without any further pruning step.

The main bottleneck is the same as with Apriori: the database pass. This can be improved with a suitable indexing structure. Such a structure will be developed in future research, ideally for any association rules, including negations.

7. REFERENCES

[1] C. Aggarwal and P. Yu. A new framework for itemset generation.
In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), pages 18-24, New York, USA. ACM Press.
[2] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In P. Buneman and S. Jajodia, editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D.C.
[3] Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the First International Conference on Computational Logic (CL '00), volume 1861 of Lecture Notes in Computer Science, London, UK. Springer-Verlag.
[4] F. Berzal, I. Blanco, D. Sánchez, and M. A. V. Miranda. A new framework to assess association rules. In Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis (IDA '01), volume 2189 of Lecture Notes in Computer Science, London, UK. Springer-Verlag.
[5] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. In J. Peckham, editor, Proceedings ACM SIGMOD International Conference on Management of Data. ACM Press.
[6] W. Hämäläinen. Assessing the statistical significance of association rules. Data Mining and Knowledge Discovery, submitted.
[7] C. Jermaine. Finding the most interesting correlations in a database: how hard can it be? Information Systems, 30(1):21-46.
[8] B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '99), New York, USA. ACM Press.
[9] R. Meo. Theory of dependence values. ACM Transactions on Database Systems, 25(3).
[10] J. Milton and J. Arnold. Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences. McGraw-Hill, New York, 4th edition.
[11] S. Morishita and A. Nakaya. Parallel branch-and-bound graph search for correlated association rules. In M. Zaki and C.-T. Ho, editors, Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD, volume 1759 of Lecture Notes in Computer Science, London, UK. Springer-Verlag.
[12] S. Morishita and J. Sese. Transversing itemset lattices with statistical metric pruning. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '00), New York, USA. ACM Press.
[13] G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro and W. Frawley, editors, Knowledge Discovery in Databases. AAAI/MIT Press.
[14] J. Shaffer. Multiple hypothesis testing. Annual Review of Psychology, 46.
[15] C. Silverstein, S. Brin, and R. Motwani. Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2(1):39-68.
[16] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD Record, 25(2):1-12.
[17] P. Tan and V. Kumar. Interestingness measures for association patterns: A perspective. Technical Report TR00-036, Department of Computer Science, University of Minnesota.
[18] G. Webb. Discovering significant rules. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06), New York, USA. ACM Press.
[19] G. I. Webb. Discovering significant patterns. Machine Learning, 68(1):1-33.
[20] X. Wu, C. Zhang, and S. Zhang. Efficient mining of both positive and negative association rules. ACM Transactions on Information Systems, 22(3).
[21] Y. Yao and N. Zhong. An analysis of quantitative measures associated with rules. In Proceedings of the Third Pacific-Asia Conference on Methodologies for Knowledge Discovery and Data Mining (PAKDD '99), London, UK. Springer-Verlag.

Table 2: Comparison of StatApriori and traditional Apriori with different data sets. The fields are the following: #sets gives the total number of potentially significant sets generated, #rules gives the number of significant rules produced, size the maximum number of attributes per rule, level the last level processed, and time the execution time in seconds. Most tests were intractable for the traditional Apriori; results before halting are reported.

Columns: Input set (n, k), K, min fr | StatApriori: #sets, #rules, size, level, time | Apriori: #sets, #rules, size, level, time

mushroom (8124, 119) > K = 87, min fr =
mushroom K = 90, min fr = ????
mushroom* () K = 90, min fr =
mushroom* K = 80, min fr =
chess* (3196, 74) K = 55, min fr =
chess* > K = 56, min fr =
T10I4D100K ( , 870) h K = 315, min fr =
T10I4D100K* (96 218, 298) > K = 310, min fr =
T40I10D100K ( , 942) ? ?? >8h K = 315, min fr =
T40I10D100K* ( , 470) K = 310, min fr =

Guaranteeing the Accuracy of Association Rules by Statistical Significance

Guaranteeing the Accuracy of Association Rules by Statistical Significance Guaranteeing the Accuracy of Association Rules by Statistical Significance W. Hämäläinen Department of Computer Science, University of Helsinki, Finland Abstract. Association rules are a popular knowledge

More information

Efficient discovery of statistically significant association rules

Efficient discovery of statistically significant association rules fficient discovery of statistically significant association rules Wilhelmiina Hämäläinen Department of Computer Science University of Helsinki Finland whamalai@cs.helsinki.fi Matti Nykänen Department of

More information

Computer arithmetic. Intensive Computation. Annalisa Massini 2017/2018

Computer arithmetic. Intensive Computation. Annalisa Massini 2017/2018 Comuter arithmetic Intensive Comutation Annalisa Massini 7/8 Intensive Comutation - 7/8 References Comuter Architecture - A Quantitative Aroach Hennessy Patterson Aendix J Intensive Comutation - 7/8 3

More information

Efficient search of association rules with Fisher s exact test

Efficient search of association rules with Fisher s exact test fficient search of association rules with Fisher s exact test W. Hämäläinen Abstract limination of spurious and redundant rules is an important problem in the association rule discovery. Statistical measure

More information

Basic Association Rules

Basic Association Rules Basic Association Rules Downloaded 11/21/17 to 37.44.199.215. Redistribution subject to SIAM license or coyright; see htt://www.siam.org/journals/ojsa.h Abstract Previous aroaches for mining association

More information

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK Comuter Modelling and ew Technologies, 5, Vol.9, o., 3-39 Transort and Telecommunication Institute, Lomonosov, LV-9, Riga, Latvia MATHEMATICAL MODELLIG OF THE WIRELESS COMMUICATIO ETWORK M. KOPEETSK Deartment

More information

An Analysis of Reliable Classifiers through ROC Isometrics

An Analysis of Reliable Classifiers through ROC Isometrics An Analysis of Reliable Classifiers through ROC Isometrics Stijn Vanderlooy s.vanderlooy@cs.unimaas.nl Ida G. Srinkhuizen-Kuyer kuyer@cs.unimaas.nl Evgueni N. Smirnov smirnov@cs.unimaas.nl MICC-IKAT, Universiteit

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

Approximating min-max k-clustering

Approximating min-max k-clustering Aroximating min-max k-clustering Asaf Levin July 24, 2007 Abstract We consider the roblems of set artitioning into k clusters with minimum total cost and minimum of the maximum cost of a cluster. The cost

More information

John Weatherwax. Analysis of Parallel Depth First Search Algorithms

John Weatherwax. Analysis of Parallel Depth First Search Algorithms Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel

More information

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data Quality Technology & Quantitative Management Vol. 1, No.,. 51-65, 15 QTQM IAQM 15 Lower onfidence Bound for Process-Yield Index with Autocorrelated Process Data Fu-Kwun Wang * and Yeneneh Tamirat Deartment

More information

GIVEN an input sequence x 0,..., x n 1 and the

GIVEN an input sequence x 0,..., x n 1 and the 1 Running Max/Min Filters using 1 + o(1) Comarisons er Samle Hao Yuan, Member, IEEE, and Mikhail J. Atallah, Fellow, IEEE Abstract A running max (or min) filter asks for the maximum or (minimum) elements

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi

More information

Feedback-error control

Feedback-error control Chater 4 Feedback-error control 4.1 Introduction This chater exlains the feedback-error (FBE) control scheme originally described by Kawato [, 87, 8]. FBE is a widely used neural network based controller

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning

Uncorrelated Multilinear Principal Component Analysis for Unsupervised Multilinear Subspace Learning TNN-2009-P-1186.R2 1 Uncorrelated Multilinear Princial Comonent Analysis for Unsuervised Multilinear Subsace Learning Haiing Lu, K. N. Plataniotis and A. N. Venetsanooulos The Edward S. Rogers Sr. Deartment

More information

A Parallel Algorithm for Minimization of Finite Automata

A Parallel Algorithm for Minimization of Finite Automata A Parallel Algorithm for Minimization of Finite Automata B. Ravikumar X. Xiong Deartment of Comuter Science University of Rhode Island Kingston, RI 02881 E-mail: fravi,xiongg@cs.uri.edu Abstract In this

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell February 10, 2010 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar

Topic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar 15-859(M): Randomized Algorithms Lecturer: Anuam Guta Toic: Lower Bounds on Randomized Algorithms Date: Setember 22, 2004 Scribe: Srinath Sridhar 4.1 Introduction In this lecture, we will first consider

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

Recent Developments in Multilayer Perceptron Neural Networks

Recent Developments in Multilayer Perceptron Neural Networks Recent Develoments in Multilayer Percetron eural etworks Walter H. Delashmit Lockheed Martin Missiles and Fire Control Dallas, Texas 75265 walter.delashmit@lmco.com walter.delashmit@verizon.net Michael

More information

Convex Optimization methods for Computing Channel Capacity

Convex Optimization methods for Computing Channel Capacity Convex Otimization methods for Comuting Channel Caacity Abhishek Sinha Laboratory for Information and Decision Systems (LIDS), MIT sinhaa@mit.edu May 15, 2014 We consider a classical comutational roblem

More information

On split sample and randomized confidence intervals for binomial proportions

On split sample and randomized confidence intervals for binomial proportions On slit samle and randomized confidence intervals for binomial roortions Måns Thulin Deartment of Mathematics, Usala University arxiv:1402.6536v1 [stat.me] 26 Feb 2014 Abstract Slit samle methods have

More information

On the Chvatál-Complexity of Knapsack Problems

On the Chvatál-Complexity of Knapsack Problems R u t c o r Research R e o r t On the Chvatál-Comlexity of Knasack Problems Gergely Kovács a Béla Vizvári b RRR 5-08, October 008 RUTCOR Rutgers Center for Oerations Research Rutgers University 640 Bartholomew

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

Finite Mixture EFA in Mplus

Finite Mixture EFA in Mplus Finite Mixture EFA in Mlus November 16, 2007 In this document we describe the Mixture EFA model estimated in Mlus. Four tyes of deendent variables are ossible in this model: normally distributed, ordered

More information

Parallelism and Locality in Priority Queues. A. Ranade S. Cheng E. Deprit J. Jones S. Shih. University of California. Berkeley, CA 94720

Parallelism and Locality in Priority Queues. A. Ranade S. Cheng E. Deprit J. Jones S. Shih. University of California. Berkeley, CA 94720 Parallelism and Locality in Priority Queues A. Ranade S. Cheng E. Derit J. Jones S. Shih Comuter Science Division University of California Berkeley, CA 94720 Abstract We exlore two ways of incororating

More information

Information collection on a graph

Information collection on a graph Information collection on a grah Ilya O. Ryzhov Warren Powell October 25, 2009 Abstract We derive a knowledge gradient olicy for an otimal learning roblem on a grah, in which we use sequential measurements

More information

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i Comuting with Haar Functions Sami Khuri Deartment of Mathematics and Comuter Science San Jose State University One Washington Square San Jose, CA 9519-0103, USA khuri@juiter.sjsu.edu Fax: (40)94-500 Keywords:

More information

The non-stochastic multi-armed bandit problem

The non-stochastic multi-armed bandit problem Submitted for journal ublication. The non-stochastic multi-armed bandit roblem Peter Auer Institute for Theoretical Comuter Science Graz University of Technology A-8010 Graz (Austria) auer@igi.tu-graz.ac.at

More information

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit

CHAPTER 5 STATISTICAL INFERENCE. 1.0 Hypothesis Testing. 2.0 Decision Errors. 3.0 How a Hypothesis is Tested. 4.0 Test for Goodness of Fit Chater 5 Statistical Inference 69 CHAPTER 5 STATISTICAL INFERENCE.0 Hyothesis Testing.0 Decision Errors 3.0 How a Hyothesis is Tested 4.0 Test for Goodness of Fit 5.0 Inferences about Two Means It ain't

More information

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS

DETC2003/DAC AN EFFICIENT ALGORITHM FOR CONSTRUCTING OPTIMAL DESIGN OF COMPUTER EXPERIMENTS Proceedings of DETC 03 ASME 003 Design Engineering Technical Conferences and Comuters and Information in Engineering Conference Chicago, Illinois USA, Setember -6, 003 DETC003/DAC-48760 AN EFFICIENT ALGORITHM

More information

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model Shadow Comuting: An Energy-Aware Fault Tolerant Comuting Model Bryan Mills, Taieb Znati, Rami Melhem Deartment of Comuter Science University of Pittsburgh (bmills, znati, melhem)@cs.itt.edu Index Terms

More information

Cryptanalysis of Pseudorandom Generators

Cryptanalysis of Pseudorandom Generators CSE 206A: Lattice Algorithms and Alications Fall 2017 Crytanalysis of Pseudorandom Generators Instructor: Daniele Micciancio UCSD CSE As a motivating alication for the study of lattice in crytograhy we

More information

Introduction to Probability and Statistics

Introduction to Probability and Statistics Introduction to Probability and Statistics Chater 8 Ammar M. Sarhan, asarhan@mathstat.dal.ca Deartment of Mathematics and Statistics, Dalhousie University Fall Semester 28 Chater 8 Tests of Hyotheses Based

More information

Hotelling s Two- Sample T 2

Hotelling s Two- Sample T 2 Chater 600 Hotelling s Two- Samle T Introduction This module calculates ower for the Hotelling s two-grou, T-squared (T) test statistic. Hotelling s T is an extension of the univariate two-samle t-test

More information

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling

Scaling Multiple Point Statistics for Non-Stationary Geostatistical Modeling Scaling Multile Point Statistics or Non-Stationary Geostatistical Modeling Julián M. Ortiz, Steven Lyster and Clayton V. Deutsch Centre or Comutational Geostatistics Deartment o Civil & Environmental Engineering

More information

Data Mining Association Rules: Advanced Concepts and Algorithms. Lecture Notes for Chapter 7. Introduction to Data Mining

Data Mining Association Rules: Advanced Concepts and Algorithms. Lecture Notes for Chapter 7. Introduction to Data Mining Data Mining Association Rules: Advanced Concets and Algorithms Lecture Notes for Chater 7 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/24 1

More information

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests 009 American Control Conference Hyatt Regency Riverfront, St. Louis, MO, USA June 0-, 009 FrB4. System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests James C. Sall Abstract

More information

Brownian Motion and Random Prime Factorization

Brownian Motion and Random Prime Factorization Brownian Motion and Random Prime Factorization Kendrick Tang June 4, 202 Contents Introduction 2 2 Brownian Motion 2 2. Develoing Brownian Motion.................... 2 2.. Measure Saces and Borel Sigma-Algebras.........

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST BASED ON THE WEIBULL DISTRIBUTION

A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST BASED ON THE WEIBULL DISTRIBUTION O P E R A T I O N S R E S E A R C H A N D D E C I S I O N S No. 27 DOI:.5277/ord73 Nasrullah KHAN Muhammad ASLAM 2 Kyung-Jun KIM 3 Chi-Hyuck JUN 4 A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST

More information

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014

Morten Frydenberg Section for Biostatistics Version :Friday, 05 September 2014 Morten Frydenberg Section for Biostatistics Version :Friday, 05 Setember 204 All models are aroximations! The best model does not exist! Comlicated models needs a lot of data. lower your ambitions or get

More information

A New Perspective on Learning Linear Separators with Large L q L p Margins

A New Perspective on Learning Linear Separators with Large L q L p Margins A New Persective on Learning Linear Searators with Large L q L Margins Maria-Florina Balcan Georgia Institute of Technology Christoher Berlind Georgia Institute of Technology Abstract We give theoretical

More information

Monopolist s mark-up and the elasticity of substitution

Monopolist s mark-up and the elasticity of substitution Croatian Oerational Research Review 377 CRORR 8(7), 377 39 Monoolist s mark-u and the elasticity of substitution Ilko Vrankić, Mira Kran, and Tomislav Herceg Deartment of Economic Theory, Faculty of Economics

More information

Universal Finite Memory Coding of Binary Sequences

Universal Finite Memory Coding of Binary Sequences Deartment of Electrical Engineering Systems Universal Finite Memory Coding of Binary Sequences Thesis submitted towards the degree of Master of Science in Electrical and Electronic Engineering in Tel-Aviv

More information

Multi-Operation Multi-Machine Scheduling

Multi-Operation Multi-Machine Scheduling Multi-Oeration Multi-Machine Scheduling Weizhen Mao he College of William and Mary, Williamsburg VA 3185, USA Abstract. In the multi-oeration scheduling that arises in industrial engineering, each job

More information

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise

Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Guimei Liu 1,2 Jinyan Li 1 Limsoon Wong 2 Wynne Hsu 2 1 Institute for Infocomm Research, Singapore 2 School

More information

DIFFERENTIAL evolution (DE) [3] has become a popular

DIFFERENTIAL evolution (DE) [3] has become a popular Self-adative Differential Evolution with Neighborhood Search Zhenyu Yang, Ke Tang and Xin Yao Abstract In this aer we investigate several self-adative mechanisms to imrove our revious work on [], which

More information

Analysis of some entrance probabilities for killed birth-death processes

Analysis of some entrance probabilities for killed birth-death processes Analysis of some entrance robabilities for killed birth-death rocesses Master s Thesis O.J.G. van der Velde Suervisor: Dr. F.M. Sieksma July 5, 207 Mathematical Institute, Leiden University Contents Introduction

More information

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA

Yixi Shi. Jose Blanchet. IEOR Department Columbia University New York, NY 10027, USA. IEOR Department Columbia University New York, NY 10027, USA Proceedings of the 2011 Winter Simulation Conference S. Jain, R. R. Creasey, J. Himmelsach, K. P. White, and M. Fu, eds. EFFICIENT RARE EVENT SIMULATION FOR HEAVY-TAILED SYSTEMS VIA CROSS ENTROPY Jose

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations PINAR KORKMAZ, BILGE E. S. AKGUL and KRISHNA V. PALEM Georgia Institute of

More information

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points.

Solved Problems. (a) (b) (c) Figure P4.1 Simple Classification Problems First we draw a line between each set of dark and light data points. Solved Problems Solved Problems P Solve the three simle classification roblems shown in Figure P by drawing a decision boundary Find weight and bias values that result in single-neuron ercetrons with the

More information

Fault Tolerant Quantum Computing Robert Rogers, Thomas Sylwester, Abe Pauls

Fault Tolerant Quantum Computing Robert Rogers, Thomas Sylwester, Abe Pauls CIS 410/510, Introduction to Quantum Information Theory Due: June 8th, 2016 Sring 2016, University of Oregon Date: June 7, 2016 Fault Tolerant Quantum Comuting Robert Rogers, Thomas Sylwester, Abe Pauls

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

8 STOCHASTIC PROCESSES

8 STOCHASTIC PROCESSES 8 STOCHASTIC PROCESSES The word stochastic is derived from the Greek στoχαστικoς, meaning to aim at a target. Stochastic rocesses involve state which changes in a random way. A Markov rocess is a articular

More information

Analysis of execution time for parallel algorithm to dertmine if it is worth the effort to code and debug in parallel

Analysis of execution time for parallel algorithm to dertmine if it is worth the effort to code and debug in parallel Performance Analysis Introduction Analysis of execution time for arallel algorithm to dertmine if it is worth the effort to code and debug in arallel Understanding barriers to high erformance and redict

More information

1-way quantum finite automata: strengths, weaknesses and generalizations

1-way quantum finite automata: strengths, weaknesses and generalizations 1-way quantum finite automata: strengths, weaknesses and generalizations arxiv:quant-h/9802062v3 30 Se 1998 Andris Ambainis UC Berkeley Abstract Rūsiņš Freivalds University of Latvia We study 1-way quantum

More information

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression

A Comparison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Journal of Modern Alied Statistical Methods Volume Issue Article 7 --03 A Comarison between Biased and Unbiased Estimators in Ordinary Least Squares Regression Ghadban Khalaf King Khalid University, Saudi

More information

Radial Basis Function Networks: Algorithms

Radial Basis Function Networks: Algorithms Radial Basis Function Networks: Algorithms Introduction to Neural Networks : Lecture 13 John A. Bullinaria, 2004 1. The RBF Maing 2. The RBF Network Architecture 3. Comutational Power of RBF Networks 4.

More information

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations Preconditioning techniques for Newton s method for the incomressible Navier Stokes equations H. C. ELMAN 1, D. LOGHIN 2 and A. J. WATHEN 3 1 Deartment of Comuter Science, University of Maryland, College

More information

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS

ON THE LEAST SIGNIFICANT p ADIC DIGITS OF CERTAIN LUCAS NUMBERS #A13 INTEGERS 14 (014) ON THE LEAST SIGNIFICANT ADIC DIGITS OF CERTAIN LUCAS NUMBERS Tamás Lengyel Deartment of Mathematics, Occidental College, Los Angeles, California lengyel@oxy.edu Received: 6/13/13,

More information

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &

More information

MATH 2710: NOTES FOR ANALYSIS

MATH 2710: NOTES FOR ANALYSIS MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite

More information

FE FORMULATIONS FOR PLASTICITY

FE FORMULATIONS FOR PLASTICITY G These slides are designed based on the book: Finite Elements in Plasticity Theory and Practice, D.R.J. Owen and E. Hinton, 1970, Pineridge Press Ltd., Swansea, UK. 1 Course Content: A INTRODUCTION AND

More information

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules. Introduction: The is widely used in industry to monitor the number of fraction nonconforming units. A nonconforming unit is

More information

A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search

A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search A New Method of DDB Logical Structure Synthesis Using Distributed Tabu Search Eduard Babkin and Margarita Karunina 2, National Research University Higher School of Economics Det of nformation Systems and

More information

Metrics Performance Evaluation: Application to Face Recognition

Metrics Performance Evaluation: Application to Face Recognition Metrics Performance Evaluation: Alication to Face Recognition Naser Zaeri, Abeer AlSadeq, and Abdallah Cherri Electrical Engineering Det., Kuwait University, P.O. Box 5969, Safat 6, Kuwait {zaery, abeer,

More information

Privacy via Pseudorandom Sketches

Privacy via Pseudorandom Sketches Privacy via Pseudorandom Sketches Nina Mishra Mark Sandler December 18, 2005 Abstract Imagine a collection of individuals who each ossess rivate data that they do not wish to share with a third arty. This

More information

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK Towards understanding the Lorenz curve using the Uniform distribution Chris J. Stehens Newcastle City Council, Newcastle uon Tyne, UK (For the Gini-Lorenz Conference, University of Siena, Italy, May 2005)

More information

Adaptive estimation with change detection for streaming data

Adaptive estimation with change detection for streaming data Adative estimation with change detection for streaming data A thesis resented for the degree of Doctor of Philosohy of the University of London and the Diloma of Imerial College by Dean Adam Bodenham Deartment

More information

SIMULATED ANNEALING AND JOINT MANUFACTURING BATCH-SIZING. Ruhul SARKER. Xin YAO

SIMULATED ANNEALING AND JOINT MANUFACTURING BATCH-SIZING. Ruhul SARKER. Xin YAO Yugoslav Journal of Oerations Research 13 (003), Number, 45-59 SIMULATED ANNEALING AND JOINT MANUFACTURING BATCH-SIZING Ruhul SARKER School of Comuter Science, The University of New South Wales, ADFA,

More information

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek

Use of Transformations and the Repeated Statement in PROC GLM in SAS Ed Stanek Use of Transformations and the Reeated Statement in PROC GLM in SAS Ed Stanek Introduction We describe how the Reeated Statement in PROC GLM in SAS transforms the data to rovide tests of hyotheses of interest.

More information

Notes on Instrumental Variables Methods

Notes on Instrumental Variables Methods Notes on Instrumental Variables Methods Michele Pellizzari IGIER-Bocconi, IZA and frdb 1 The Instrumental Variable Estimator Instrumental variable estimation is the classical solution to the roblem of

More information

Diverse Routing in Networks with Probabilistic Failures

Diverse Routing in Networks with Probabilistic Failures Diverse Routing in Networks with Probabilistic Failures Hyang-Won Lee, Member, IEEE, Eytan Modiano, Senior Member, IEEE, Kayi Lee, Member, IEEE Abstract We develo diverse routing schemes for dealing with

More information

On the capacity of the general trapdoor channel with feedback

On the capacity of the general trapdoor channel with feedback On the caacity of the general tradoor channel with feedback Jui Wu and Achilleas Anastasooulos Electrical Engineering and Comuter Science Deartment University of Michigan Ann Arbor, MI, 48109-1 email:

More information
