Guaranteeing the Accuracy of Association Rules by Statistical Significance


Guaranteeing the Accuracy of Association Rules by Statistical Significance

W. Hämäläinen
Department of Computer Science, University of Helsinki, Finland

Abstract. Association rules are a popular knowledge discovery method, even if the results can be highly misleading. The traditional frequency-confidence framework can produce mostly spurious rules, while skipping the most significant rules. Several goodness measures have been proposed to solve this problem, but none of them is optimal. In this paper, we analyze the accuracy of the most popular and promising measure functions from the statistical point of view. For each measure function, we analyze theoretically whether it can produce type 1 or type 2 errors. Finally, we report experiments which show that statistically significant rules can also produce more accurate predictions.

1 Introduction

Traditional association rules [1] are rules of the form "if event $X$ occurs, then event $A$ is also likely to occur". The commonness of the rule is measured by frequency $P(X, A)$ and the strength of the rule by confidence $P(A \mid X)$. For computational purposes it is required that both frequency and confidence exceed some user-defined thresholds. The actual interestingness of the rule is usually decided afterwards, by some interestingness measure.

Often the associations are interpreted as dependencies between certain attribute value combinations. However, traditional association rules do not necessarily capture statistical dependencies: they can associate absolutely independent events while ignoring strong dependencies. Even if the rules express dependencies (indicated by lift $\frac{P(A \mid X)}{P(A)} > 1$), they are not necessarily significant, but may be due to chance. In statistics, finding such spurious rules is called a type 1 error. Webb [9] has measured the number of spurious rules among frequent and strong rules, and found that in the worst case all discovered rules can be spurious. In practice, this means that the future data does not exhibit the discovered dependencies and the conclusions based on them are erroneous.

On the other hand, the traditional association rules can reject the most significant rules. In statistics, this is called a type 2 error. The impact of type 2 errors is harder to evaluate empirically, because the common search algorithms are based on frequency-based pruning. Several interestingness measures (see e.g. [4] for an overview) have been proposed to solve this problem. Some of them have their origins in statistics, but none of them tests the statistical significance of the dependency expressed in the rule.
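To make the frequency-confidence framework concrete, here is a minimal Python sketch; the transaction data and attribute names are hypothetical, not from the paper:

```python
# A tiny illustration of the frequency-confidence framework.
# Transactions are sets of the attributes that are "true" on a row.
data = [
    {"A1", "A3", "A5", "A2"},
    {"A1", "A3", "A2"},
    {"A1", "A5"},
    {"A3", "A5", "A2"},
]

def frequency(antecedent, consequent, rows):
    """fr(X => A) = P(X, A): fraction of rows containing both sides."""
    both = antecedent | consequent
    return sum(both <= row for row in rows) / len(rows)

def confidence(antecedent, consequent, rows):
    """cf(X => A) = P(A | X): frequency of the rule divided by P(X)."""
    p_x = sum(antecedent <= row for row in rows) / len(rows)
    return frequency(antecedent, consequent, rows) / p_x

print(frequency({"A1", "A3"}, {"A2"}, data))   # 0.5
print(confidence({"A1", "A3"}, {"A2"}, data))  # 1.0
```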

In this paper, we analyze how well the most common and statistically promising measures capture statistical significance. For each measure, we analyze theoretically whether it can produce type 1 or type 2 errors. In addition, we report experiments which test the effect of statistical significance on prediction accuracy.

The organization of the paper is the following: the basic definitions are given in Section 2, the theoretical analysis of common measures is given in Section 3, experiments are reported in Section 4, and the final conclusions are drawn in Section 5.

2 Basic Definitions

In the following we give the basic definitions of the association rule, statistical dependence, statistical significance, and redundancy. The notations are introduced in Table 1.

Table 1. Basic notations.

Notation                                              Meaning
$A, B, C, \ldots$                                     binary attributes
$a, b, c, \ldots \in \{0, 1\}$                        attribute values
$R = \{A_1, \ldots, A_k\}$                            set of all attributes
$|R| = k$                                             number of attributes in $R$
$Dom(R) = \{0, 1\}^k$                                 attribute space
$X, Y, Z \subseteq R$                                 attribute sets
$Dom(X) = \{0, 1\}^l \subseteq Dom(R)$                domain of $X$, $|X| = l$
$(X = x) = \{(A_1 = a_1), \ldots, (A_l = a_l)\}$      event, $|X| = l$
$t = \{A_1 = t(A_1), \ldots, A_k = t(A_k)\}$          row
$r = \{t_1, \ldots, t_n \mid t_i \in Dom(R)\}$        relation (data set)
$|r| = n$                                             size of relation $r$
$\sigma_{X=x}(r) = \{t \in r \mid t[X] = x\}$         set of rows where $X = x$
$m(X = x) = |\sigma_{X=x}(r)|$                        absolute frequency of $X = x$
$P(X = x) = m(X = x)/n$                               relative frequency of $X = x$
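The notation can be illustrated with a small sketch; the relation and the event below are hypothetical:

```python
# Relation r over R = {A, B, C}; each row maps every attribute to 0 or 1.
r = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 1, "C": 1},
]

def sigma(x_assignment, rows):
    """sigma_{X=x}(r): rows that agree with the event X = x."""
    return [t for t in rows
            if all(t[attr] == v for attr, v in x_assignment.items())]

event = {"A": 1, "B": 1}        # the event (A = 1, B = 1)
m = len(sigma(event, r))        # absolute frequency m(X = x)
p = m / len(r)                  # relative frequency P(X = x)
print(m, p)                     # 2 0.5
```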

2.1 Association Rules

Traditionally, association rules are defined in the frequency-confidence framework as follows:

Definition 1 (Association rule). Let $R$ be a set of binary attributes and $r$ a relation according to $R$. Let $X \subseteq R$, $A \in R \setminus X$, $x \in Dom(X)$, and $a \in Dom(A)$. The confidence of rule $(X = x) \Rightarrow (A = a)$ is

$$cf(X = x \Rightarrow A = a) = \frac{P(X = x, A = a)}{P(X = x)} = P(A = a \mid X = x),$$

and the frequency of the rule is

$$fr(X = x \Rightarrow A = a) = P(X = x, A = a).$$

(Often either frequency $P(X, Y)$ or absolute frequency $m(X, Y)$ is called support. In this paper we use the statistical term frequency, which is unambiguous.)

Given user-defined thresholds $min_{cf}, min_{fr} \in [0, 1]$, rule $(X = x) \Rightarrow (A = a)$ is an association rule in $r$, if (i) $cf(X = x \Rightarrow A = a) \ge min_{cf}$, and (ii) $fr(X = x \Rightarrow A = a) \ge min_{fr}$.

The first condition requires that an association rule should be strong enough and the second condition requires that it should be common enough. In this paper, we call rules association rules, even if no thresholds $min_{fr}$ and $min_{cf}$ are specified. Usually, it is assumed that the rule contains only positive attribute values ($A_i = 1$). Now the rule can be expressed simply by listing the attributes, e.g. $A_1, A_3, A_5 \Rightarrow A_2$.

2.2 Statistical Dependence

Statistical dependence is classically defined through statistical independence (e.g. [5]):

Definition 2 (Independence and dependence). Let $X \subseteq R$ and $A \subseteq R \setminus X$ be sets of binary attributes. Events $X = x$ and $A = a$, $x \in Dom(X)$, $a \in Dom(A)$, are mutually independent, if $P(X = x, A = a) = P(X = x)P(A = a)$. If the events are not independent, they are dependent.

The strength of the statistical dependence between $(X = x)$ and $(A = a)$ can be measured by lift or interest:

$$\gamma(X = x, A = a) = \frac{P(X = x, A = a)}{P(X = x)P(A = a)}. \quad (1)$$
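A minimal sketch of lift as defined in equation (1), assuming the three probabilities have already been estimated from a relation; the input values are made up:

```python
# gamma(X=x, A=a) = P(X=x, A=a) / (P(X=x) * P(A=a)).
# gamma > 1: positive dependence; gamma = 1: independence; gamma < 1: negative.
def lift(p_xa, p_x, p_a):
    return p_xa / (p_x * p_a)

print(lift(0.5, 0.5, 0.6))   # ~1.67: positively dependent events
print(lift(0.3, 0.5, 0.6))   # 1.0: the events are independent
```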

2.3 Statistical Significance

The idea of statistical significance tests is to estimate the probability of the observed or a rarer phenomenon, under some null hypothesis. When the objective is to test the significance of the dependency between $X = x$ and $A = a$, the null hypothesis is the independence assumption: $P(X = x, A = a) = P(X = x)P(A = a)$. If the estimated probability $p$ is very small, we can reject the independence assumption, and assume that the observed dependency is not due to chance, but significant at level $p$. The smaller $p$ is, the more significant the observation is.

The significance of the observed frequency $m(X, A)$ can be estimated exactly by the binomial distribution. Each row in relation $r$, $|r| = n$, corresponds to an independent Bernoulli trial, whose outcome is either 1 ($XA$ occurs) or 0 ($XA$ does not occur). All rows are mutually independent. Assuming the independence of attributes $X$ and $A$, combination $XA$ occurs on a row with probability $P(X)P(A)$. Now the number of rows containing $X, A$ is a binomial random variable $M$ with parameters $P(X)P(A)$ and $n$. The mean of $M$ is $\mu_M = nP(X)P(A)$ and its variance is $\sigma_M^2 = nP(X)P(A)(1 - P(X)P(A))$. Probability $P(M \ge m(X, A))$ gives the significance $p$:

$$p = \sum_{i=m(X,A)}^{m(X)} \binom{n}{i} (P(X)P(A))^i (1 - P(X)P(A))^{n-i}. \quad (2)$$

This can be approximated by the standard normal distribution: $p \approx 1 - \Phi(z(X, A))$, where $\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du$ is the standard normal cumulative distribution function and $z(X, A)$ is the standardized $m(X, A)$:

$$z(X, A) = \frac{m(X, A) - \mu_M}{\sigma_M} = \frac{m(X, A) - nP(X)P(A)}{\sqrt{nP(X)P(A)(1 - P(X)P(A))}}. \quad (3)$$

The cumulative distribution function $\Phi(z)$ is quite difficult to calculate, but for association rule mining it is enough to know $z(X, A)$. Since $\Phi(z)$ is monotonically increasing, probability $p$ is monotonically decreasing in terms of $z(X, A)$. Thus, we can use the z-score as a measure function for ranking association rules according to their significance. On the other hand, we know that according to Chebyshev's inequality (the proof is given e.g. in [6]):

$$P\left(-K < \frac{M - \mu_M}{\sigma_M} < K\right) \ge 1 - \frac{1}{K^2},$$

i.e. $P(z \ge K) < \frac{1}{2K^2}$. For example, requirement $K^2 \ge 10$ (i.e. $K \ge \sqrt{10} \approx 3.2$) corresponds to statistical significance level $p = 0.05$.
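Both the exact binomial tail (2) and the z-score (3) are easy to compute with the Python standard library; the following sketch uses hypothetical counts, and the function names are mine, not the paper's:

```python
import math

def z_score(m_xa, p_x, p_a, n):
    """Standardized frequency z(X, A) of equation (3)."""
    mu = n * p_x * p_a
    sigma = math.sqrt(n * p_x * p_a * (1 - p_x * p_a))
    return (m_xa - mu) / sigma

def binomial_p(m_xa, m_x, p_x, p_a, n):
    """Exact tail probability p = P(M >= m(X,A)) of equation (2)."""
    q = p_x * p_a
    return sum(math.comb(n, i) * q**i * (1 - q)**(n - i)
               for i in range(m_xa, m_x + 1))

def normal_approx_p(z):
    """p ~ 1 - Phi(z), computed via the error function."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# Hypothetical counts: n = 1000 rows, P(X)=0.3, P(A)=0.4, m(X,A)=150, m(X)=300.
n, p_x, p_a, m_xa, m_x = 1000, 0.3, 0.4, 150, 300
z = z_score(m_xa, p_x, p_a, n)              # ~2.92
print(z, binomial_p(m_xa, m_x, p_x, p_a, n), normal_approx_p(z))
```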

3 Analysis of Common Measures

In the following, we analyze common interestingness measures which could test statistical significance. The measures are the $\chi^2$-measure, the Pearson correlation coefficient $\phi$, and the J-measure.

3.1 The $\chi^2$-measure

Definition. The $\chi^2$-independence test may be the best-known statistical test for detecting dependencies between attributes. The idea of the $\chi^2$-test is to compare the observed frequencies $O(m(X))$ to the expected frequencies $E(m(X))$ by

$$\chi^2(X) = \sum_{x \in Dom(X)} \frac{(O(m(X = x)) - E(m(X = x)))^2}{E(m(X = x))}.$$

When the test variable is approximately normally distributed, the test measure follows the $\chi^2$-distribution. Usually this assumption holds for large $n$. As a rule of thumb, it is suggested (e.g. [6, p. 630]) that all of the expected frequencies should be at least 5.

When we test a dependency between two attribute sets, $X$ and $Y$, the test metric is

$$\chi^2(X, Y) = \sum_{i=0}^{1}\sum_{j=0}^{1} \frac{(m(X = i, Y = j) - nP(X = i)P(Y = j))^2}{nP(X = i)P(Y = j)} = \frac{n(P(X = 1, Y = 1) - P(X = 1)P(Y = 1))^2}{P(X = 1)P(X = 0)P(Y = 1)P(Y = 0)}.$$

If $\chi^2(X, Y)$ is less than the critical $\chi^2$ value at level $p$ with 1 degree of freedom, $X$ and $Y$ are statistically independent with probability $1 - p$. Otherwise, the dependency is significant at level $p$.

Applying $\chi^2$ in association rule discovery. The simplest way to use the $\chi^2$-measure in association rule discovery is to generate rules from frequent sets based on their $\chi^2$-values. For each frequent set $Y$, all rules of form $Y \setminus C \Rightarrow C$ with a sufficient $\chi^2$-value are selected (e.g. [3]). This approach does not find all rules which are significant in the $\chi^2$ sense. First, the rules are preselected according to their frequency. If the minimum frequency is set too high, some significant rules are missed. Second, it is possible that a weak rule ($P(C \mid Y) \le 0.5$) is selected, because its companion rules $Y \Rightarrow \neg C$, $\neg Y \Rightarrow C$, and/or $\neg Y \Rightarrow \neg C$ are significant. The rule confidence can be used to check that $P(C \mid Y) > P(\neg C \mid Y)$, but it does not guarantee that $Y \Rightarrow C$ is significant.
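A sketch of the closed-form 2x2 $\chi^2$ statistic above, with hypothetical inputs:

```python
def chi2(p_xy, p_x, p_y, n):
    """chi^2(X,Y) = n*(P(X,Y) - P(X)P(Y))^2 / (P(X)P(~X)P(Y)P(~Y))."""
    d = p_xy - p_x * p_y
    return n * d**2 / (p_x * (1 - p_x) * p_y * (1 - p_y))

# Hypothetical example: n = 1000, P(X)=0.3, P(Y)=0.4, P(X,Y)=0.15.
print(chi2(0.15, 0.3, 0.4, 1000))  # ~17.9, clearly above the 0.01-level cutoff 6.63
```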

The first problem would be solved, if we could search the rules directly with the $\chi^2$-measure. Unfortunately, this is not feasible, since the $\chi^2$-measure is not monotonic. For any rule $X \Rightarrow C$ and its generalization $Y \Rightarrow C$, $Y \subsetneq X$, it is unknown whether $\chi^2(X \Rightarrow C) > \chi^2(Y \Rightarrow C)$ or $\chi^2(X \Rightarrow C) \le \chi^2(Y \Rightarrow C)$.

Morishita et al. (e.g. [7]) have utilized the convexity of the $\chi^2$ function, when the consequent $C$ is fixed. The idea is to prune a branch containing rule $Y \Rightarrow C$ and all its specialization rules $X \Rightarrow C$, $Y \subsetneq X$, if $\max\{\chi^2(X \Rightarrow C)\} < min_{\chi^2}$ for the given cutoff value $min_{\chi^2}$. Because $\chi^2$ is convex, $\max\{\chi^2(X \Rightarrow C)\}$ can be bounded by

$$\chi^2(X \Rightarrow C) \le \max\left\{\frac{nP(Y, C)P(\neg C)}{(1 - P(Y, C))P(C)},\; \frac{nP(Y, \neg C)P(C)}{(1 - P(Y, \neg C))P(\neg C)}\right\}.$$

Now the frequency-based pruning is not necessary and it is possible to find all rules with a sufficient $\chi^2$-value, or the best rules in the $\chi^2$ sense. This approach works correctly, when the goal is to find full dependencies.

Analysis. The main problem of the $\chi^2$-independence test is that it is designed to measure dependencies between attributes. That is why it can fail to detect significant partial dependencies. On the other hand, the $\chi^2$-test can yield a high value, thus indicating a significant dependency, even if the tested events were nearly independent. Negative correlations can be pruned by an extra test, $P(X, Y) > P(X)P(Y)$, but it does not guarantee that the high $\chi^2$-value is due to $X \Rightarrow Y$.

Let us analyze the $\chi^2$-value, when $P(X, Y) = P(X)P(Y) + d$. Now $\chi^2$ can be defined in terms of $d$:

$$\chi^2(X, Y) = \frac{nd^2}{P(X)P(\neg X)P(Y)P(\neg Y)}.$$

$\chi^2$ is high, when $n$ and $d$ are large and $P(X)P(\neg X)P(Y)P(\neg Y)$ is small. The minimum value ($16nd^2$) is achieved when $P(X) = P(Y) = 0.5$, and the maximum when $P(X)$ and $P(Y)$ approach either 0 or 1. For example, if $P(X) = P(Y) = 0.01$, $\chi^2 \approx 10000nd^2$, and even a minimal $d$ suffices: if $n = 1000$, already $d \approx 0.0008$ suffices for level 0.01, corresponding to $P(X, Y) \approx 0.0009$.

The problem is that if $P(X)$ and/or $P(Y)$ are large, the relative difference $\frac{d}{P(X)P(Y)}$ is small and the partial dependency between $X$ and $Y$ is not significant. Still, the $\chi^2$-value can be large, because $\frac{d}{P(\neg X)P(\neg Y)}$ is large. Thus, the high $\chi^2$-value is due to the partial dependency $\neg X \Rightarrow \neg Y$, and $X \Rightarrow Y$ is a false discovery (type 1 error).
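The dependence of $\chi^2$ on the marginals is easy to reproduce; the following sketch reruns the text's $P(X) = P(Y) = 0.01$ example with the closed form above, using $n = 1000$ and $d = 0.00082$:

```python
# chi^2 = n*d^2 / (P(X)P(~X)P(Y)P(~Y)): the same deviation d weighs
# roughly 10000/16 times more at P(X)=P(Y)=0.01 than at P(X)=P(Y)=0.5.
n, d = 1000, 0.00082
for p in (0.5, 0.01):
    denom = p * (1 - p) * p * (1 - p)
    print(p, n * d**2 / denom)
# 0.5  -> ~0.011: nowhere near significant
# 0.01 -> ~6.86: just above the 0.01-level cutoff 6.63
```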

Example 1. Let $P(X) = P(Y) = 1 - \epsilon$ for arbitrarily small $\epsilon > 0$. Let $d$ be maximal, i.e. $d = P(X)(1 - P(Y)) = (1 - P(X))P(Y) = \epsilon(1 - \epsilon) < \epsilon$. (The relative difference is still very small, $\frac{d}{P(X)P(Y)} = \frac{\epsilon}{1 - \epsilon}$.) Now $\chi^2(X, Y)$ is very large, the same as the data size $n$:

$$\chi^2 = \frac{nd^2}{P(X)P(Y)(1 - P(X))(1 - P(Y))} = \frac{n\epsilon^2(1 - \epsilon)^2}{\epsilon^2(1 - \epsilon)^2} = n.$$

Still, rule $X \Rightarrow Y$ is insignificant, since

$$z(X \Rightarrow Y) = \frac{\sqrt{n}(1 - \epsilon)\epsilon}{(1 - \epsilon)\sqrt{1 - (1 - \epsilon)^2}} = \sqrt{\frac{n\epsilon}{2 - \epsilon}} \le \sqrt{n\epsilon} \to 0,$$

when $\epsilon \to 0$. The high $\chi^2$-value is due to the partial dependency $\neg X \Rightarrow \neg Y$, which has a high z-score:

$$z(\neg X \Rightarrow \neg Y) = \sqrt{\frac{n(1 - \epsilon)}{1 + \epsilon}} \to \sqrt{n},$$

when $\epsilon \to 0$. Rules $X \Rightarrow \neg Y$ and $\neg X \Rightarrow Y$ are meaningless, with

$$|z| = \sqrt{\frac{n\epsilon(1 - \epsilon)}{1 - \epsilon + \epsilon^2}} < \sqrt{\frac{n\epsilon(1 - \epsilon)}{1 - \epsilon}} = \sqrt{n\epsilon} \to 0.$$

The $\chi^2$-measure is less likely to cause type 2 errors. If an association rule is just sufficiently significant, it also passes the $\chi^2$-test. The reason is that the $\chi^2$-value of rule $X \Rightarrow Y$ increases quadratically in terms of its z-score:

Theorem 1. If $z(X \Rightarrow Y) = K$, then $\chi^2(X, Y) \ge K^2$.

Proof. Let $x = P(X)$ and $y = P(Y)$. If $z(X \Rightarrow Y) = K$, then $nd^2 = K^2 xy(1 - xy)$ and

$$\chi^2(X, Y) = \frac{nd^2}{xy(1 - x)(1 - y)} = \frac{K^2(1 - xy)}{(1 - x)(1 - y)} \ge K^2,$$

since $(1 - x)(1 - y) \le 1 - xy$ for all $x, y \in [0, 1]$.
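Example 1 can be verified numerically; a sketch with arbitrary $\epsilon$ and $n$:

```python
import math

n, eps = 10_000, 0.001
p_x = p_y = 1 - eps
p_xy = p_x * p_y + eps * (1 - eps)          # maximal d = eps*(1 - eps)

d = p_xy - p_x * p_y
chi2 = n * d**2 / (p_x * (1 - p_x) * p_y * (1 - p_y))

def z(p_joint, p_a, p_b):
    """z-score of equation (3), written with relative frequencies."""
    return (math.sqrt(n) * (p_joint - p_a * p_b)
            / math.sqrt(p_a * p_b * (1 - p_a * p_b)))

p_nxny = 1 - p_x - p_y + p_xy               # P(~X, ~Y)
print(chi2)                                  # = n = 10000: "significant"
print(z(p_xy, p_x, p_y))                     # ~sqrt(n*eps/2) ~ 2.2: weak
print(z(p_nxny, 1 - p_x, 1 - p_y))           # ~sqrt(n) ~ 100: the real dependency
```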

3.2 Correlation Coefficient

The Pearson correlation coefficient is sometimes used to rank association rules. Traditionally, the Pearson correlation coefficient is used to measure linear dependencies between numeric attributes. When the Pearson correlation coefficient $\phi$ is calculated for binary attributes, it reduces to the square root of $\chi^2/n$:

$$\phi(X, Y) = \frac{P(X, Y) - P(X)P(Y)}{\sqrt{P(X)P(\neg X)P(Y)P(\neg Y)}} = \sqrt{\frac{\chi^2(X, Y)}{n}}.$$

Like $\chi^2(X, Y)$, $\phi(X, Y) = 0$, when $P(X, Y) = P(X)P(Y)$ and the variables are mutually independent. Otherwise, the sign of $\phi$ tells whether the correlation is positive ($\phi > 0$) or negative ($\phi < 0$).

We will first show that a rule can be insignificant, even if the correlation coefficient is $\phi(X, Y) = 1$. This means that the $\phi$-measure can produce false discoveries (type 1 error).

Observation 1. When $P(X)$ and $P(Y)$ approach 1, it is possible that $\phi(X, Y) = 1$, even if $z(X, Y) < K$ for any $K > 0$.

Proof. Let $P(X) = P(Y) = 1 - \epsilon$ for arbitrarily small $\epsilon > 0$. Let $d$ be maximal, i.e. $d = P(X)(1 - P(Y)) = (1 - P(X))P(Y) = \epsilon(1 - \epsilon)$. Now the correlation coefficient is 1:

$$\phi(X, Y) = \frac{d}{\sqrt{P(X)(1 - P(X))P(Y)(1 - P(Y))}} = \frac{\epsilon(1 - \epsilon)}{\epsilon(1 - \epsilon)} = 1.$$

Still, for any $K > 0$, $z(X, Y) < K$:

$$z(X, Y) = \frac{\sqrt{n}(1 - \epsilon)\epsilon}{(1 - \epsilon)\sqrt{1 - (1 - \epsilon)^2}} = \sqrt{\frac{n\epsilon}{2 - \epsilon}} < K,$$

when $\epsilon < \frac{2K^2}{n + K^2}$.

On the other hand, it is possible that the $\phi$-measure rejects significant rules (type 2 error), especially when $n$ is large. The following observation shows that this can happen, when $P(X)$ and $P(Y)$ are relatively small. The smaller they are, the smaller $n$ suffices. That is why the correlation coefficient should be totally avoided as an interestingness measure for association rules.

Observation 2. It is possible that $\phi(X, Y) \to 0$, when $n \to \infty$, even if rule $X \Rightarrow Y$ is significant.

Proof. Let

$$z(X, Y) = \frac{nd}{\sqrt{nP(X)P(Y)(1 - P(X)P(Y))}} = K.$$

Then $d = K\sqrt{\frac{P(X)P(Y)(1 - P(X)P(Y))}{n}}$ and

$$\phi(X, Y) = \frac{K\sqrt{P(X)P(Y)(1 - P(X)P(Y))}}{\sqrt{nP(X)P(Y)(1 - P(X))(1 - P(Y))}} = K\sqrt{\frac{1 - P(X)P(Y)}{n(1 - P(X))(1 - P(Y))}}.$$

When $P(X) \le p$ and $P(Y) \le p$ for some $p < 1$,

$$\phi(X, Y) \le K\sqrt{\frac{1 + p}{n(1 - p)}} \to 0,$$

when $n \to \infty$.
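Observation 2 can also be illustrated numerically; the following sketch (marginals hypothetical) keeps the z-score fixed at $K = 3$ while $n$ grows:

```python
import math

K, p_x, p_y = 3.0, 0.1, 0.1
for n in (10**3, 10**5, 10**7):
    # Choose d so that z(X, Y) stays exactly K as n grows ...
    d = K * math.sqrt(p_x * p_y * (1 - p_x * p_y) / n)
    # ... then phi shrinks toward 0 although the rule stays significant.
    phi = d / math.sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    print(n, round(phi, 4))
# 1000 -> ~0.105, 100000 -> ~0.0105, 10000000 -> ~0.001
```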

3.3 J-measure

The J-measure [8] is often used to assess the interestingness of association rules, although it was originally designed for learning classification rules. The J-measure is an information-theoretic measure derived from mutual information. For decision rules $X \Rightarrow C$, the J-measure is defined as

$$J(C \mid X) = P(X, C)\log\frac{P(C \mid X)}{P(C)} + P(X, \neg C)\log\frac{P(\neg C \mid X)}{P(\neg C)} \in [0, \infty[.$$

The larger $J$ is, the more interesting the rule should be. On the other hand, $J(C \mid X) = 0$, when the variables $X$ and $C$ are mutually independent (assuming that $P(X) > 0$).

The J-measure contains two terms of the mutual information $MI$ between variables $X$ and $C$: $MI(X, C) = J(C \mid X) + J(C \mid \neg X)$. Thus, it measures the information gain of the two rules $X \Rightarrow C$ and $X \Rightarrow \neg C$. Rule $X \Rightarrow C$ has a high $J$-value, if its complement rule $X \Rightarrow \neg C$ has high confidence (type 1 error). In the extreme case, when $P(C \mid X) = 0$, $J(C \mid X) = P(X)\log\frac{1}{P(\neg C)}$.

A type 2 error (rejecting true discoveries) can also occur with a suitable distribution. One reason is that the J-measure omits $n$, which is crucial for statistical significance. It can easily be shown that $J \to 0$, when $P(X, Y) \to 0$ or $P(Y) \to 1$. In the latter case, rule $X \Rightarrow C$ cannot be significant, but it is possible that a rule is significant, even if its frequency is relatively small:

Example 2. Let $P(C \mid X) = 0.75$ and $P(C) = 0.5$. Now $J(C \mid X) = P(X)(0.75\log 1.5 + 0.25\log 0.5) \approx 0.19\,P(X)$ (with base-2 logarithms) and

$$z(X \Rightarrow C) = \frac{\sqrt{nP(X)}}{2\sqrt{2 - P(X)}}.$$

For example, when $P(X) = 0.25$, $z = \frac{\sqrt{n}}{2\sqrt{7}}$, which is high, when $n$ is high. Still $J(C \mid X) \approx 0.05$, which indicates that the rule is uninteresting.
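A sketch of the J-measure and of the z-score of Example 2 (the base of the logarithm is assumed to be 2):

```python
import math

def j_measure(p_x, p_c, conf):
    """J(C|X) = P(X,C) log(P(C|X)/P(C)) + P(X,~C) log(P(~C|X)/P(~C))."""
    total = 0.0
    for p_c_given_x, p_c_val in ((conf, p_c), (1 - conf, 1 - p_c)):
        if p_c_given_x > 0:
            total += p_x * p_c_given_x * math.log2(p_c_given_x / p_c_val)
    return total

# Example 2: P(C|X) = 0.75, P(C) = 0.5, P(X) = 0.25, and a large n.
p_x, p_c, conf, n = 0.25, 0.5, 0.75, 10_000
z = math.sqrt(n * p_x) / (2 * math.sqrt(2 - p_x))
print(j_measure(p_x, p_c, conf))  # ~0.047: looks uninteresting
print(z)                          # ~18.9: clearly significant
```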

3.4 Summary

In Table 2, we give a summary of the analyzed measures. For each measure, we report whether it can produce type 1 or type 2 errors, and list all rules which affect the measure in addition to the actually measured rule.

Table 2. Summary of measures $M$ for assessing association rules. The occurrence of type 1 (accepting spurious rules) and type 2 (rejecting true discoveries) errors is indicated by + (occurs) and - (does not occur). In addition, all rules which contribute to $M(X \Rightarrow Y)$ are listed. For all measures except fr&cf, the antecedent and consequent of each rule can be switched.

M            Type 1 error   Type 2 error   Rules
fr&cf        +              +              $X \Rightarrow Y$
fr&$\gamma$  -              -              $X \Rightarrow Y$
$\chi^2$     +              -              $X \Rightarrow Y$, $X \Rightarrow \neg Y$, $\neg X \Rightarrow Y$, $\neg X \Rightarrow \neg Y$
$\phi$       +              +              $X \Rightarrow Y$, $\neg X \Rightarrow \neg Y$
J            +              +              $X \Rightarrow Y$, $X \Rightarrow \neg Y$

4 Experimental Results

The goal of the experiments was to test how well the z-score works in practice. According to the theoretical analysis, the z-score outperforms $\chi^2$, the J-measure, and the correlation coefficient, when the goal is to find statistically significant association rules. This means that the discovered dependencies should hold also in unseen future data. If the rule is strong, the prediction error should also be low. An upper bound for the mean error on unseen data is

$$err \le (1 - p_\alpha)P(\neg A \mid X) + p_\alpha P(\neg A),$$

where $p_\alpha$ is the significance level (the probability that $X$ and $A$ are actually independent). Confidence $P(A \mid X)$ has a strong effect on the mean error, but when $P(A \mid X) = 1$, the upper bound for the error reduces to $p_\alpha P(\neg A) \le p_\alpha$.

To test the validity of this argument, only strong rules ($min_{cf} = 0.90$) were searched from the data sets. Each data set was divided into a training set and a test set. The rules were learnt from the training set and the prediction accuracy of the 500 best rules was tested on the test set; a simplified sketch of this evaluation is given after Table 3. The data sets are listed in Table 3. All of them were obtained from the FIMI repository for frequent itemset mining.

In the first experiment, the z-score was compared to $\chi^2$, the J-measure, frequency, and the certainty factor. The correlation coefficient $\phi$ was skipped, because it produces an ordering identical to that of $\chi^2$. On the other hand, we included the certainty factor, which has produced good results in other experiments (e.g. [2]). First, the association rules were searched in a normal frequency-based manner and the measure functions were used for pruning in the post-processing phase. All redundant rules were pruned, i.e. rules $X \Rightarrow A$ for which there is a more common rule $X' \Rightarrow A'$, $X' \cup \{A'\} \subsetneq X \cup \{A\}$, such that $M(X' \Rightarrow A') \ge M(X \Rightarrow A)$ for the given measure function $M$.

The results are shown in Figure 1. The test points (the numbers of best rules to be tested) were selected such that rules with the same measure function value are tested together. In this way, the order of rules with the same rank does not affect the results. The z-score and $\chi^2$ had quite similar performance: they performed best on mushroom, second best on T10I4D100K and T40I10D100K, and poorest on chess. The certainty factor performed best on all sets except mushroom. Mushroom was the only data set where the rules had relatively low certainty factors; in all other sets most rules had certainty factor 1.0. Redundancy reduction had a large (positive) impact on the prediction accuracy, especially with certainty factors.

Table 3. Data sets and minimum frequencies used in the first experiment.

data set       n   k   min_fr
mushroom
chess
T10I4D100K
T40I10D100K
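The evaluation protocol of the first experiment can be sketched roughly as follows; the rule representation and the data below are hypothetical, and the paper's actual search and pruning algorithms are not reproduced:

```python
def mean_prediction_error(rules, test_rows):
    """Mean error of rules X -> A on test rows where the antecedent X holds."""
    errors, fired = 0, 0
    for antecedent, consequent in rules:
        for row in test_rows:
            if antecedent <= row:
                fired += 1
                errors += consequent not in row
    return errors / fired if fired else 0.0

# Hypothetical top-ranked rules and test data.
rules = [({"A1", "A3"}, "A2"), ({"A5"}, "A2")]
test = [{"A1", "A3", "A2"}, {"A5"}, {"A1", "A3", "A5", "A2"}]
print(mean_prediction_error(rules, test))  # 0.25: 1 miss out of 4 firings
```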

Fig. 1. The mean errors of frequent and strong association rules, using different measure functions ($\chi^2$, J, z, certainty factor, frequency). The number of best rules is given on the x-axis and the mean error on the y-axis. The data sets are mushroom (top left), chess (top right), T10I4D100K (bottom left) and T40I10D100K (bottom right).

In the second experiment, association rules were searched directly with the z-score. The only requirement was that the rule holds on at least 5 rows. Unfortunately, the resulting $min_{fr}$ values were so low that the rules could not be searched with the other measure functions, lacking efficient algorithms. For computational efficiency, chess and T40I10D100K were pruned by selecting every second attribute from all rows. The resulting data sets are denoted chess* and T40I10D100K*. The results are given in Table 4.

Now the results were quite different: more significant, strong rules were found than in the first experiment. The prediction error was also extremely low, zero in most cases. Nearly all of the rules were different from those in the first experiment, i.e. the z-score was able to produce extremely accurate rules, when no $min_{fr}$ was specified.

Table 4. The most significant rules by z-score. $min_{fr}$ and $min_{cf}$ give the minimum frequency and confidence in the discovered rule set.

set            n   k   z-score   #rules   mean error   min_fr   min_cf
mushroom
chess*
T10I4D100K
T40I10D100K*

5 Conclusions

The theoretical results show that all the evaluated measures, $\chi^2$, the J-measure, and the correlation coefficient, can produce either type 1 or type 2 errors. None of them tests exactly the statistical significance of association rules. The only way to discover statistically significant rules is to use the z-score as a measure function. The empirical results suggest that statistical significance can also guarantee prediction accuracy, when the minimum frequency is not fixed. Since statistically significant association rules can be searched efficiently, they could also be used to search for significant attribute dependencies in the $\chi^2$ sense.

References

1. R. Agrawal, T. Imielinski, and A.N. Swami. Mining association rules between sets of items in large databases. In P. Buneman and S. Jajodia, editors, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207-216, Washington, D.C., 1993.
2. F. Berzal, I. Blanco, D. Sánchez, and M. Amparo Vila Miranda. A new framework to assess association rules. In Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis (IDA'01), London, UK, 2001. Springer-Verlag.
3. X. Dong, F. Sun, X. Han, and R. Hou. Study of positive and negative association rules based on multi-confidence and chi-squared test. In X. Li, O.R. Zaïane, and Z. Li, editors, Proceedings of the Second International Conference on Advanced Data Mining and Applications (ADMA), volume 4093 of Lecture Notes in Computer Science, Berlin / Heidelberg, 2006. Springer-Verlag.
4. L. Geng and H.J. Hamilton. Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3):9, 2006.
5. R. Meo. Theory of dependence values. ACM Transactions on Database Systems, 25(3), 2000.
6. J.S. Milton and J.C. Arnold. Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences. McGraw-Hill, New York, 4th edition.
7. S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'00), pages 226-236, New York, USA, 2000. ACM Press.
8. P. Smyth and R.M. Goodman. An information theoretic approach to rule induction from databases. IEEE Transactions on Knowledge and Data Engineering, 4(4):301-316, 1992.
9. G.I. Webb. Discovering significant patterns. Machine Learning, 68(1):1-33, 2007.
