AN EFFICIENT PROCEDURE FOR MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS. Predrag Stanišić and Savo Tomović

Size: px
Start display at page:

Download "AN EFFICIENT PROCEDURE FOR MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS. Predrag Stanišić and Savo Tomović"

Transcription

1 PUBLICATIONS DE L INSTITUT MATHÉMATIQUE Nouvelle série, tome 87(101) (2010), DOI: /PIM S AN EFFICIENT PROCEDURE FOR MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS Predrag Staišić ad Savo Tomović Commuicated by Žarko Mijajlović Abstract. We suggest the origial procedure for frequet itemsets geeratio, which is more efficiet tha the appropriate procedure of the well kow Apriori algorithm. The correctess of the procedure is based o a special structure called Rymo tree. For its implemetatio, we suggest a modified sort-merge-joi algorithm. Fially, we explai how the support measure, which is used i Apriori algorithm, gives statistically sigificat frequet itemsets. 1. Itroductio Associatio rules have received lots of attetio i data miig due to their may applicatios i marketig, advertisig, ivetory cotrol, ad umerous other areas. The motivatio for discoverig associatio rules has come from the requiremet to aalyze large amouts of supermarket basket data. A record i such data typically cosists of the trasactio uique idetifier ad the items bought i that trasactio. Items ca be differet products which oe ca buy i supermarkets or o-lie shops, or car equipmet, or telecommuicatio compaies services etc. A typical supermarket may well have several thousad items o its shelves. Clearly, the umber of subsets of the set of items is immese. Eve though a purchase by a customer ivolves a small subset of this set of items, the umber of such subsets is very large. For example, eve if we assume that o customer has more tha five items i her/his shoppig cart, there are 5 ( ) i=1 = possible cotets of this cart, which correspods to the subsets havig o more tha five items of a set that has 10,000 items, ad this is ideed a large umber. The supermarket is iterested i idetifyig associatios betwee item sets; for example, it may be iterestig to kow how may of the customers who bought 2010 Mathematics Subject Classificatio: Primary 03B70; Secodary 68T27, 68Q17. Key words ad phrases: data miig, kowledge discovery i databases, associatio aalysis, Apriori algorithm. 109 i

2 110 STANIŠIĆ AND TOMOVIĆ bread ad cheese also bought butter. This kowledge is importat because if it turs out that may of the customers who bought bread ad cheese also bought butter, the supermarket will place butter physically close to bread ad cheese i order to stimulate the sales of butter. Of course, such a piece of kowledge is especially iterestig whe there is a substatial umber of customers who buy all three items ad a large fractio of those idividuals who buy bread ad cheese also buy butter. For example, the associatio rule bread cheese butter [support = 20%, cofidece = 85%] represets facts: 20% of all trasactios uder aalysis cotai bread, cheese ad butter; 85% of the customers who purchased bread ad cheese also purchased butter. The result of associatio aalysis is strog associatio rules, which are rules satisfyig a miimal support ad miimal cofidece threshold. The miimal support ad the miimal cofidece are iput parameters for associatio aalysis. The problem of associatio rules miig ca be decomposed ito two subproblems [1]: Discoverig frequet or large itemsets. Frequet itemsets have a support greater tha the miimal support; Geeratig rules. The aim of this step is to derive rules with high cofidece (strog rules) from frequet itemsets. For each frequet itemset l oe fids all oempty subsets of l; for each a l a oe geerates the rule a l a, if support(l) support(a) > miimal cofidece. We do ot cosider the secod sub-problem i this paper, because the overall performaces of miig associatio rules are determied by the first step. Efficiet algorithms for solvig the secod sub-problem are exposed i [12]. The paper is orgaized as follows. Sectio 2 provides formalizatio of frequet itemsets miig problem. Sectio 3 describes Apriori algorithm. Sectio 4 presets our cadidate geeratio procedure. Sectio 5 explais how to use hypothesis testig to validate frequet itemsets. 2. Prelimiaries Suppose that I is a fiite set; we refer to the elemets of I as items. Defiitio 2.1. A trasactio dataset o I is a fuctio T : {1,..., } P (I). The set T (k) is the k-th trasactio of T. The umbers 1,..., are the trasactio idetifiers (TIDs). Give a trasactio data set T o the set I, we would like to determie those subsets of I that occur ofte eough as values of T. Defiitio 2.2. Let T : 1,..., P (I) be a trasactio data set o a set of items I. The support cout of a subset K of the set of items I i T is the umber suppcout T (K) give by: suppcout T (K) = {k 1 k K T (k)}. The support of a item set K (i the followig text istead of a item set K we will use a itemset K) is the umber: supp T (K) = suppcout T (K)/.

3 MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS 111 The followig rather straightforward statemet is fudametal for the study of frequet itemsets. Theorem 2.1. Let T : 1,..., P (I) be a trasactio data set o a set of items I. If K ad K are two itemsets, the K K implies supp T (K ) supp T (K). Proof. The theorem states that supp T for a itemset has the ati-mootoe property. It meas that support for a itemset ever exceeds the support for its subsets. For proof, it is sufficiet to ote that every trasactio that cotais K also cotais K. The statemet from the theorem follows immediately. Defiitio 2.3. A itemset K is μ-frequet relative to the trasactio data set T if supp T (K) mu. We deote by F μ T the collectio of all μ-frequet itemsets relative to the trasactio data set T ad by F μ T,r the collectio of μ-frequet itemsets that cotai r items for r 1 (i the followig text we will use r-itemset istead of itemset that cotais r items). Note that F μ T = r 1 F μ T,r. If μ ad T are clear from the cotext, the we will omit either or both adormets from this otatio. 3. Apiori algorithm We will briefly preset the Apriori algorithm, which is the most famous algorithm i the associatio aalysis. The Apriori algorithm iteratively geerates frequet itemsets startig from frequet 1-itemsets to frequet itemsets of maximal size. Each iteratio of the algorithm cosists of two phases: cadidate geeratio ad support coutig. We will briefly explai both phases i the iteratio k. I the cadidate geeratio phase potetially frequet k-itemsets or cadidate k-itemsets are geerated. The ati-mootoe property of the itemset support is used i this phase ad it provides elimiatio or pruig of some cadidate itemsets without calculatig its actual support (cadidate cotaiig at least oe ot frequet subset is prued immediately, before support coutig phase, see Theorem 2.1). The support coutig phase cosists of calculatig support for all previously geerated cadidates. I the support coutig phase, it is essetial to efficietly determie if the cadidates are cotaied i particular trasactio, i order to icrease their support. Because of that, the cadidates are orgaized i hash tree [2]. The cadidates, which have eough support, are termed as frequet itemsets. The Apriori algorithm termiates whe oe of the frequet itemsets ca be geerated. We will use C T,k to deote cadidate k-itemsets which are geerated i the iteratio k of the Apriori algorithm. Pseudocode for the Apriori algorithm comes ext. Algorithm: Apriori Iput: A trasactio dataset T ; Miimal support threshold μ Output: F μ T

4 112 STANIŠIĆ AND TOMOVIĆ Method: 1. C T,1 = {{i} i I} 2. F μ T,1 = {K C T,1 supp T (K) μ} 3. for (k =2; F μ T,k 1 ; k++) C T,k = apriori_ge(f μ T,k 1 ) F μ T,k = {K C T,k supp T μ} ed for 4. F μ T = r 1 F μ T,r The apriori_ge is a procedure for cadidate geeratio i the Apriori algorithm [2]. I the iteratio k, it takes as argumet F μ T,k 1 the set of all μ-frequet (k 1)-itemsets (which is geerated i the previous iteratio) ad returs the set C T,k. The set C T,k cotais cadidate k-itemsets (potetially frequet k-itemsets) ad accordigly it cotais the set F μ T,k the set of all μ-frequet k-itemsets. The procedure works as follows. First, i the joi step, we joi F μ T,k with itself: INSERT INTO C T,k SELECT R 1.item 1, R 1.item 2,..., R 1.item k 1, R 2.item k 1 FROM F μ T,k 1 AS R 1, F μ T,k 1 AS R 2 WHERE R 1.item 1 = R 2.item 1 R 1.item k 2 = R 2.item k 2 R 1.item k 1 < R 2.item k 1 I the previous query the set F μ T,k is cosidered as a relatioal table with (k 1) attributes: item 1, item 2,..., item k 1. The previous query requires k 2 equality comparisos ad is implemeted with ested-loop joi algorithm i the origial Apriori algorithm from [2]. The ested-loop joi algorithm is expesive, sice it examies every pair of tuples i two relatios: for each l 1 F μ T,k 1 for each l 2 F μ T,k 1 if l 1 [1] = l 2 [1] l 1 [k 2] = l 2 [k 2] l 1 [k 1] < l 2 [k 1] ew_cad ={l 1 [1], l 1 [2],..., l 1 [k 1], l 2 [k 1]} Cosider the cost of ested-loop joi algorithm. The umber of pairs of itemsets (tuples) to be cosidered is F μ T,k 1 * F μ T,k 1, where F μ T,k 1 deotes the umber of itemsets i F μ T,k 1. For each itemset from F μ T,k 1, we have to perform a complete sca o F μ T,k 1. I the worst case the buffer ca hold oly oe block of F μ T,k 1, ad a total of F μ T,k 1 * b + b block trasfers would be required, where deotes the umber of blocks cotaiig itemsets of F μ T,k 1. Next i the prue step, we delete all cadidate itemsets c C T,k such that some (k 1)-subset of c is ot i F μ T,k 1 (see Theorem 2.1): for each c C T,k for each (k -1)-subset s i c if s / F μ T,k 1 delete c from C T,k I the ext sectio we suggest a more efficiet procedure for cadidate geeratio phase ad a efficiet algorithm for its implemetatio.

5 MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS New cadidate geeratio procedure We assume that ay itemset K is kept sorted accordig to some relatio <, where for all x, y K, x < y meas that the object x is i frot of the object y. Also, we assume that all trasactios i the database T ad all subsets of K are kept sorted i lexicographic order accordig to the relatio <. For cadidate geeratio we suggest the origial method by which the set C T,k is calculated by joiig F μ T,k 1 with F μ T,k 2, i the iteratio k, for k 3. A cadidate k-itemset is created from oe large (k 1)-itemset ad oe large (k 2)-itemset i the followig way. Let X = {x 1,..., x k 1 } F μ T,k 1 ad Y = {y 1,..., y k 2 } F μ T,k 2. Itemsets X ad Y are joied if ad oly if the followig coditio is satisfied: (4.1) x i = y i, (1 i k 3) x k 1 < y k 2 producig the cadidate k-itemset {x 1,..., x k 2, x k 1, y k 2 }. We will prove the correctess of the suggested method. I the followig text we will deote this method by C T,k = F μ T,k 1 F μ T,k 2. Let I = i 1,..., i be a set of items that cotais elemets. Deote by G I = (P (I), E) the Rymo tree (see Appedix) of P (I). The root of the tree is. A vertex K = {i p1,..., i pk } with i p1<i p2 < <ip k has i pk childre K j, where i pk < j. Let S r be the collectio of itemsets that have r elemets. The ext theorem suggest a techique for geeratig S r startig from S r 1 ad S r 2. Theorem 4.1. Let G be the Ryma tree of P (I), where I = i 1,..., i. If W S r, where r 3, the there exists a uique pair of distict sets U S r 1 ad V S r 2 that has a commo immediate acestor T S r 3 i G such that U V S r 3 ad W = U V. Proof. Let u ad v ad p be the three elemets of W that have the largest, the secod-largest ad the third-largest subscripts, respectively. Cosider the sets U = W {u} ad V = W {v, p}. Note that U S r 1 ad V S r 2. Moreover, Z = U V belogs to S r 3 because it cosists of the first r 3 elemets of W. Note that both U ad V are descedats of Z ad that U V = W (for r = 3 we have Z = ). The pair (U, V ) is uique. Ideed, suppose that W ca be obtaied i the same maer from aother pair of distict sets U 1 is r 1 ad V 1 S r 2 such that U 1 ad V 1 are immediate descedats of a set Z 1 S r 3. The defiitio of the Rymo tree G I implies that U 1 = Z 1 {i m, i q } ad V 1 = Z 1 {i y }, where the letters i Z 1 are idexed by a umber smaller tha mi{m, q, y}. The, Z 1 cosists of the first r 3 symbols of W, so Z 1 = Z. If m < q < y, the m is the third-highest idex of a symbol i W, q is the secod-highest idex of a symbol i W ad y is the highest idex of a symbol i W, so U 1 = U ad V 1 = V. The followig theorem directly proves correctess of our method C T,k = F μ T,k 1 F μ T,k 2. Theorem 4.2. Let T be a trasactio data set o a set of items I ad let k N such that k > 2. If W is a μ-frequet itemset ad W = k, the there exists

6 114 STANIŠIĆ AND TOMOVIĆ a μ-frequet itemset Z ad two itemsets {i m, i q } ad {i y } such that Z = k 3, Z W, W = Z {i m, i q, i y } ad both Z {i m, i q } ad Z {i y } are μ-frequet itemsets. Proof. If W is a itemset such that W = k, tha we already kow that W is the uio of two subsets U ad V of I such that U = k 1, V = k 2 ad that Z = U V has k 3 elemets (it follows from Theorem 2). Sice W is a μ-frequet itemset ad Z, U, V are subsets of W, it follows that each of these sets is also a μ-frequet itemset (it follows from Theorem 1). We have see i the previous sectio that i the origial Apriori algorithm from [2], the joi procedure is suggested. It geerates cadidate k-itemset by joiig two large (k 1)-itemsets, if ad oly if they have first (k 2) items i commo. Because of that, each joi operatio requires (k 2) equality comparisos. If a cadidate k-itemset is geerated by the method C T,k = F μ T,k 1 F μ T,k 2 for k 3, it is eough (k 3) equality comparisos to process. The method C T,k = F μ T,k 1 F μ T,k 2 ca be represeted by the followig SQL query: INSERT INTO C T,k SELECT R 1.item 1,..., R 1.item k 1, R 2.item k 2 FROM F μ T,k 1 AS R 1, F μ T,k 2 AS R 2 WHERE R 1.item 1 = R 2.item 1 R 1.item k 3 = R 2.item k 3 R 1.item k 1 < R 2.item k 2 For the implemetatio of the joi C T,k = F μ T,k 1 F μ T,k 2 we suggest a modificatio of sort-merge-joi algorithm (ote that F μ T,k 1 ad F μ T,k 2 are sorted because of the way they are costructed ad lexicographic order of itemsets). By the origial sort-merge-joi algorithm [9], it is possible to compute atural jois ad equi-jois. Let r(r) ad s(s) be the relatios ad R S deote their commo attributes. The algorithm keeps oe poiter o the curret positio i relatio r(r) ad aother oe poiter o the curret positio i relatio s(s). As the algorithm proceeds, the poiters move through the relatios. It is supposed that the relatios are sorted accordig to joiig attributes, so tuples with the same values o the joiig attributes are i a cosecutive order. Thereby, each tuple eeds to be read oly oce, ad, as a result, each relatio is also read oly oce. The umber of block trasfers is equal to the sum of the umber of blocks i both sets F μ T,k 1 ad F μ T,k 2, b1 + b2. We have see that ested-loop joi requires F μ T,k 1 * b1 + b1 block trasfers ad we ca coclude that merge-joi is more efficiet ( F μ T,k 1 * b1 b2 ). The modificatio of sort-merge-joi algorithm we suggest refers to the elimiatio of restrictios that a joi must be atural or equi-joi. First, we separate the coditio (4.1): (4.2) (4.3) x i = y i, 1 i k 3 x k 1 < y k 2.

7 MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS 115 Joiig C T,k = F μ T,k 1 F μ T,k 2 is calculated accordig to the coditio (4.2), i other words we compute the atural joi. For this, the described sort-merge-joi algorithm is used, ad our modificatio is: before X = {x 1,..., x k 1 } ad Y = {y 1,..., y k 2 }, for which X F μ T,k 1 ad Y F μ T,k 2 ad x i = y i, 1 i k 3 is true, are joied, we check if coditio (4.3) is satisfied, ad after that we geerate a cadidate k-itemset {x 1,..., x k 2, x k 1, y k 2 }. The pseudocode of apriori_ge fuctio comes ext. FUNCTION apriori_ge(f μ T,k 1, F μ T,k 2 ) 1. i = 0 2. j = 0 3. while i F μ T,k 1 j F μ T,k 2 iset 1 = F μ T,k 1 [i + +] S = {iset 1 } doe = false while doe = false i F μ T,k 1 iset 1a = F μ T,k 1 [i + +] if iset 1a [w] = iset 1 [w], 1 w k 2 the S = S {iset 1a } i + + else doe = true ed if ed while iset 2 = F μ T,k 2 [j] while j F μ T,k 2 iset 2[1,..., k 2] iset 1 [1,..., k 2] iset 2 = F μ T,k 2 [j + +] ed while while j F μ T,k 2 iset 1[w] = iset 2 [w], 1 w k 2 for each s S if iset 1 [k 1] iset 2 [k 2] the c = {iset 1 [1],..., iset 1 [k 1], iset 2 [k 2]} if c cotais-ot-frequet-subset the DELETE c else C T,k = C T,k {c} ed if ed for j++ iset 2 = F μ T,k 2 [j] ed while ed while We implemeted the origial Apriori [2] ad the algorithm proposed here. Algorithms are implemeted i C i order to evaluate its performaces. Experimets are performed o PC with a CPU Itel(R) Core(TM)2 clock rate of 2.66GHz ad

8 116 STANIŠIĆ AND TOMOVIĆ with 2GB of RAM. Also, ru time used here meas the total executio time, i.e., the period betwee iput ad output istead of CPU time measured i the experimets i some literature. I the experimets dataset which ca be foud o is used. It cotais biary trasactios. The average legth of trasactios is 8. Table 1 shows that the origial Apriori algorithm from [2] is outperformed by the algorithm preseted here. Table 1. Executio time i secods mi sup(%) Apriori Rymo tree based implemetatio Validatig frequet itemsets I this sectio we use hypothesis testig to validate frequet itemsets which have bee geerated i the Apriori algorithm. Hypothesis testig is a statistical iferece procedure to determie whether a hypothesis should be accepted or rejected based o the evidece gathered from the data. Examples of hypothesis tests iclude verifyig the quality of patters extracted by may data miig algorithms ad validatig the sigificace of the performace differece betwee two classificatio models. I hypothesis testig, we are usually preseted with two cotrastig hypothesis, which are kow, respectively, as the ull hypothesis ad the alterative hypothesis. The geeral procedure for hypothesis testig cosists of the followig four steps: Formulate the ull ad alterative hypotheses to be tested. Defie a test statistic θ that determies whether the ull hypothesis should be accepted or rejected. The probability distributio associated with the test statistic should be kow. Compute the value of θ from the observed data. Use the kowledge of the probability distributio to determie a quatity kow as p-value. Defie a sigificace level a which cotrols the rage of θ values i which the ull hypothesis should be rejected. The rage of values for θ is kow as the rejectio regio. Now, we will back to miig frequet itemsets ad we will evaluate the quality of the discovered frequet itemsets from a statistical perspective. The criterio for judgig whether a itemset is frequet depeds o the support of the itemset. Recall that support measures the umber of trasactios i which the itemset is actually observed. A itemset X is cosidered frequet i the data set T, if supp T (X) > mi sup, where misup is a user-specified threshold.

9 MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS 117 The problem ca be formulated ito the hypothesis testig framework i the followig way. To validate if the itemset X is frequet i the data set T, we eed to decide whether to accept the ull hypothesis, H 0 : supp T (X) = mi sup, or the alterative hypothesis H 1 : supp T (X) > mi sup. If the ull hypothesis is rejected, the X is cosidered as frequet itemset. To perform the test, the probability distributio for supp T (X) must also be kow. Theorem 5.1. The measure supp T (X) for the itemset X i trasactio data set T has the biomial distributio with mea supp T (X) ad variace 1 (supp T (X)* (1 supp T (X))), where is the umber of trasactios i T. Proof. We will use measure suppcout T (X) ad calculate the mea ad the variace for it ad later derive mea ad variace for the measure supp T (X). The measure suppcout T (X) = X presets the umber of trasactios i T that cotai itemset X, ad supp T (X) = suppcout T (X)/ (Defiitio 2.2). The measure X is aalogous to determiig the umber of heads that shows up whe tossig cois. Let us calculate E(X ) ad D(X ). Mea is E(X ) = * p, where p is the probability of success, which meas (i our case) the itemset X appears i oe trasactio. Accordig to the Beroulli law it is: ε > 0, lim N P { X / p ε} = 1. Freely speakig, for large (we work with large databases so ca be cosidered large), we ca use relative frequecy istead of probability. So, we ow have: For variace we compute: E(X ) = p X D(X ) = p(1 p) X ( 1 X = X ) = X ( X ) Now we compute E(supp T (X)) ad D(supp T (X)). Recall that supp T (X) = X /. We have: ( X ) E(supp T (X)) = E = 1 E(X ) = X = supp T (X). ( X ) D(supp T (X)) = D = 1 2 D(X ) = 1 X ( X ) 2 = 1 supp T (X)(1 supp T (X)) The biomial distributio ca be further approximated usig a ormal distributio if is sufficietly large, which is typically the case i associatio aalysis. Regardig the previous paragraph ad Theorem 5.1, uder the ull hypothesis supp T (X) is assumed to be ormally distributed with mea mi sup ad variace 1 (mi sup(1 mi sup)). To test whether the ull hypothesis should be accepted or rejected, the followig statistic ca be used: (5.1) W N = supp T (X) mi sup (mi sup(1 mi sup))/.

10 118 STANIŠIĆ AND TOMOVIĆ The previous statistic, accordig to the Cetral Limit Theorem, has the distributio N(0, 1). The statistic essetially measures the differece betwee the observed support supp T (X) ad the mi sup threshold i uits of stadard deviatio. Let N = 10000, supp T (X) = 0.11, mi sup = 0.1 ad α = The last parameter is the desired sigificace level. It cotrols Type 1 error which is rejectig the ull hypothesis eve though the hypothesis is true. I the Apriori algorithm we compare supp T (X) = 0.11 > 0.1 = mi sup ad we declare X as frequet itemset. Is this validatio procedure statistically correct? Uder the hypothesis H 1 statistics W is positive ad for the rejectio regio we choose R = {(x 1,..., x ) w > k}, k > 0. Let us fid k = P H0 {W > k} P H0 {W > k} = k = 3.09 Now we compute w = ( ) ( 0.1*(1 0.1) ) 1/ = We ca see that w > k, so we are withi the rejectio regio ad H 1 is accepted, which meas the itemset X is cosidered statistically sigificat. 6. Appedix Rymo tree was itroduced i [8] i order to provide a uified search-based framework for several problems i artificial itelligece; the Rymo tree is also useful for data miig algorithms. I Defiitios 6.1 ad 6.2 we defie ecessary cocepts ad i Defiitio 6.3 we defie the Rymo tree. Defiitio 6.1. Let S be a set ad let d : S N be a ijective fuctio. The umber d(x) is the idex of x S. If P S, the view of P is the subset view(d, P ) = {s S d(s) > max p P d(p)}. Defiitio 6.2. A collectio of sets C is hereditary if U C ad W U implies W C. Defiitio 6.3. Let C be a hereditary collectio of subsets of a set S. The graph G = (C, E) is a Rymo tree for C ad the idexig fuctio d if: the root of the G is the childre of a ode P are the sets of the form P {s}, where s view(d, P ) If S = {s 1,..., s } ad d(s i ) = i for 1 i, we will omit the idexig fuctio from the defiitio of the Rymo tree for P (S). A key property of a Rymo tree is stated ext. Theorem 6.1. Let G be a Rymo tree for a hereditary collectio C of subsets of a set S ad a idexig fuctio d. Every set P of C occurs exactly oce i the tree. Proof. The argumet is by iductio o p = P. If p = 0, the P is the root of the tree ad the theorem obviously holds.

11 MINING STATISTICALLY SIGNIFICANT FREQUENT ITEMSETS 119 Suppose that the theorem holds for sets havig fewer tha p elemets, ad let P C be such that P = p. Sice C is hereditary, every set of the form P {x} with x P belogs to C ad, by the iductive hypothesis, occurs exactly oce i the tree. Let z be the elemet of P that has the largest value of the idex fuctio d. The view(p {z}) cotais z ad P is a child of the vertex P {z}. Sice the paret of P is uique, it follows that P occurs exactly oce i the tree. Note that i the Rymo tree of a collectio of the form P (S), the collectio of sets of S r that cosists of sets located at distace r from the root deotes all subsets of the size r of S. Refereces [1] R. Agrawal, R. Srikat, Fast algorithms for miig associatio rules, Proc. VLDB-94 (1994), [2] F. P. Coee, P. Leg, S. Ahmed, T-trees, vertical partitioi-g ad distributed associatio rule miig, Proc. It. Cof. Discr. Math (2003), [3] F. P. Coee, P. Leg, S. Ahmed, Data structures for associatio rule miig: T-trees ad P-trees, IEEE Tras. Data Kowledge Egi. 16(6), (2004), [4] F. P. Coee, P. Leg, G. Goulboure, Tree structures for miig associatio rules, J. Data Mi. Kowledge Discovery 8(1) (2004), [5] G. Goulboure, F. P. Coee, P. Leg, Algorithms for computig associatio rules usig a partial-support tree, J. Kowledge-based Syst. 13 (1999), [6] G. Grahe, J. Zhu, Efficietly usig prefix-trees i miig frequet itemsets, Proc. IEEE ICDM Workshop Frequet Itemset Miig Implemetatios (2003). [7] J. Ha, J. Pei, P.S. Yu, Miig frequet patters without cadidate geeratio, Proc. ACM SIGMOD Cof. Maagemet of Data, (2000), [8] R. Rymo, Search through systematic set eumeratio, Proc. 3rd It. Cof. Priciples of Kowledge Represetatio ad Reasoig, (1992), [9] A. Silberschatz, H. F. Korth, S. Sudarsha, Database System Cocepts, McGraw Hill, New York (2006) [10] A. D. Simovici, C. Djeraba, Mathematical tools for data miig (Set Theory, Partial Orders, Combiatorics), Spriger-Verlag, Lodo, [11] P. Staišić, S. Tomović, Apriori multiple algorithm for miig associatio rules, Iformatio Techology ad Cotrol 37(4) (2008), [12] P. N. Ta., M. Steibach, V. Kumar, Itroductio to Data Miig, Addiso Wesley, Departmet of Mathematics ad Computer Sciece (Received ) Uiversity of Moteegro (Revised ) Podgorica Moteegro pedjas@ac.me savotom@gmail.com

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018 CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

The Random Walk For Dummies

The Random Walk For Dummies The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Commutativity in Permutation Groups

Commutativity in Permutation Groups Commutativity i Permutatio Groups Richard Wito, PhD Abstract I the group Sym(S) of permutatios o a oempty set S, fixed poits ad trasiet poits are defied Prelimiary results o fixed ad trasiet poits are

More information

Chapter 6: Mining Frequent Patterns, Association and Correlations

Chapter 6: Mining Frequent Patterns, Association and Correlations Chapter 6: Miig Frequet Patters, Associatio ad Correlatios Basic cocepts Frequet itemset miig methods Costrait-based frequet patter miig (ch7) Associatio rules 1 What Is Frequet Patter Aalysis? Frequet

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 23 Daiel B. Rowe, Ph.D. Departmet of Mathematics, Statistics, ad Computer Sciece Copyright 2017 by D.B. Rowe 1 Ageda: Recap Chapter 9.1 Lecture Chapter 9.2 Review Exam 6 Problem Solvig Sessio. 2

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Sample Size Determination (Two or More Samples)

Sample Size Determination (Two or More Samples) Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie

More information

Chapter 8: Estimating with Confidence

Chapter 8: Estimating with Confidence Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig

More information

Application to Random Graphs

Application to Random Graphs A Applicatio to Radom Graphs Brachig processes have a umber of iterestig ad importat applicatios. We shall cosider oe of the most famous of them, the Erdős-Réyi radom graph theory. 1 Defiitio A.1. Let

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc)

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc) Classificatio of problem & problem solvig strategies classificatio of time complexities (liear, arithmic etc) Problem subdivisio Divide ad Coquer strategy. Asymptotic otatios, lower boud ad upper boud:

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Analysis of Algorithms. Introduction. Contents

Analysis of Algorithms. Introduction. Contents Itroductio The focus of this module is mathematical aspects of algorithms. Our mai focus is aalysis of algorithms, which meas evaluatig efficiecy of algorithms by aalytical ad mathematical methods. We

More information

1 Hash tables. 1.1 Implementation

1 Hash tables. 1.1 Implementation Lecture 8 Hash Tables, Uiversal Hash Fuctios, Balls ad Bis Scribes: Luke Johsto, Moses Charikar, G. Valiat Date: Oct 18, 2017 Adapted From Virgiia Williams lecture otes 1 Hash tables A hash table is a

More information

Lecture 2: April 3, 2013

Lecture 2: April 3, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,

More information

Math 152. Rumbos Fall Solutions to Review Problems for Exam #2. Number of Heads Frequency

Math 152. Rumbos Fall Solutions to Review Problems for Exam #2. Number of Heads Frequency Math 152. Rumbos Fall 2009 1 Solutios to Review Problems for Exam #2 1. I the book Experimetatio ad Measuremet, by W. J. Youde ad published by the by the Natioal Sciece Teachers Associatio i 1962, the

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

University of California, Los Angeles Department of Statistics. Hypothesis testing

University of California, Los Angeles Department of Statistics. Hypothesis testing Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Elemets of a hypothesis test: Hypothesis testig Istructor: Nicolas Christou 1. Null hypothesis, H 0 (claim about µ, p, σ 2, µ

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Math F215: Induction April 7, 2013

Math F215: Induction April 7, 2013 Math F25: Iductio April 7, 203 Iductio is used to prove that a collectio of statemets P(k) depedig o k N are all true. A statemet is simply a mathematical phrase that must be either true or false. Here

More information

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advaced

More information

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES Peter M. Maurer Why Hashig is θ(). As i biary search, hashig assumes that keys are stored i a array which is idexed by a iteger. However, hashig attempts to bypass

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

CS / MCS 401 Homework 3 grader solutions

CS / MCS 401 Homework 3 grader solutions CS / MCS 401 Homework 3 grader solutios assigmet due July 6, 016 writte by Jāis Lazovskis maximum poits: 33 Some questios from CLRS. Questios marked with a asterisk were ot graded. 1 Use the defiitio of

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

TEACHER CERTIFICATION STUDY GUIDE

TEACHER CERTIFICATION STUDY GUIDE COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

Lecture Overview. 2 Permutations and Combinations. n(n 1) (n (k 1)) = n(n 1) (n k + 1) =

Lecture Overview. 2 Permutations and Combinations. n(n 1) (n (k 1)) = n(n 1) (n k + 1) = COMPSCI 230: Discrete Mathematics for Computer Sciece April 8, 2019 Lecturer: Debmalya Paigrahi Lecture 22 Scribe: Kevi Su 1 Overview I this lecture, we begi studyig the fudametals of coutig discrete objects.

More information

Lecture XVI - Lifting of paths and homotopies

Lecture XVI - Lifting of paths and homotopies Lecture XVI - Liftig of paths ad homotopies I the last lecture we discussed the liftig problem ad proved that the lift if it exists is uiquely determied by its value at oe poit. I this lecture we shall

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider

More information

Skip Lists. Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 S 3 S S 1

Skip Lists. Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015 S 3 S S 1 Presetatio for use with the textbook, Algorithm Desig ad Applicatios, by M. T. Goodrich ad R. Tamassia, Wiley, 2015 Skip Lists S 3 15 15 23 10 15 23 36 Skip Lists 1 What is a Skip List A skip list for

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

Disjoint Systems. Abstract

Disjoint Systems. Abstract Disjoit Systems Noga Alo ad Bey Sudaov Departmet of Mathematics Raymod ad Beverly Sacler Faculty of Exact Scieces Tel Aviv Uiversity, Tel Aviv, Israel Abstract A disjoit system of type (,,, ) is a collectio

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 19 11/17/2008 LAWS OF LARGE NUMBERS II THE STRONG LAW OF LARGE NUMBERS MASSACHUSTTS INSTITUT OF TCHNOLOGY 6.436J/5.085J Fall 2008 Lecture 9 /7/2008 LAWS OF LARG NUMBRS II Cotets. The strog law of large umbers 2. The Cheroff boud TH STRONG LAW OF LARG NUMBRS While the weak

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Sequences of Definite Integrals, Factorials and Double Factorials

Sequences of Definite Integrals, Factorials and Double Factorials 47 6 Joural of Iteger Sequeces, Vol. 8 (5), Article 5.4.6 Sequeces of Defiite Itegrals, Factorials ad Double Factorials Thierry Daa-Picard Departmet of Applied Mathematics Jerusalem College of Techology

More information

Singular Continuous Measures by Michael Pejic 5/14/10

Singular Continuous Measures by Michael Pejic 5/14/10 Sigular Cotiuous Measures by Michael Peic 5/4/0 Prelimiaries Give a set X, a σ-algebra o X is a collectio of subsets of X that cotais X ad ad is closed uder complemetatio ad coutable uios hece, coutable

More information

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled 1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how

More information

Testing Statistical Hypotheses for Compare. Means with Vague Data

Testing Statistical Hypotheses for Compare. Means with Vague Data Iteratioal Mathematical Forum 5 o. 3 65-6 Testig Statistical Hypotheses for Compare Meas with Vague Data E. Baloui Jamkhaeh ad A. adi Ghara Departmet of Statistics Islamic Azad iversity Ghaemshahr Brach

More information

Comparison Study of Series Approximation. and Convergence between Chebyshev. and Legendre Series

Comparison Study of Series Approximation. and Convergence between Chebyshev. and Legendre Series Applied Mathematical Scieces, Vol. 7, 03, o. 6, 3-337 HIKARI Ltd, www.m-hikari.com http://d.doi.org/0.988/ams.03.3430 Compariso Study of Series Approimatio ad Covergece betwee Chebyshev ad Legedre Series

More information

Disjoint set (Union-Find)

Disjoint set (Union-Find) CS124 Lecture 7 Fall 2018 Disjoit set (Uio-Fid) For Kruskal s algorithm for the miimum spaig tree problem, we foud that we eeded a data structure for maitaiig a collectio of disjoit sets. That is, we eed

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

The Choquet Integral with Respect to Fuzzy-Valued Set Functions

The Choquet Integral with Respect to Fuzzy-Valued Set Functions The Choquet Itegral with Respect to Fuzzy-Valued Set Fuctios Weiwei Zhag Abstract The Choquet itegral with respect to real-valued oadditive set fuctios, such as siged efficiecy measures, has bee used i

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

UNIT 2 DIFFERENT APPROACHES TO PROBABILITY THEORY

UNIT 2 DIFFERENT APPROACHES TO PROBABILITY THEORY UNIT 2 DIFFERENT APPROACHES TO PROBABILITY THEORY Structure 2.1 Itroductio Objectives 2.2 Relative Frequecy Approach ad Statistical Probability 2. Problems Based o Relative Frequecy 2.4 Subjective Approach

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Weakly Connected Closed Geodetic Numbers of Graphs

Weakly Connected Closed Geodetic Numbers of Graphs Iteratioal Joural of Mathematical Aalysis Vol 10, 016, o 6, 57-70 HIKARI Ltd, wwwm-hikaricom http://dxdoiorg/101988/ijma01651193 Weakly Coected Closed Geodetic Numbers of Graphs Rachel M Pataga 1, Imelda

More information

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 5: Parametric Hypothesis Testing: Comparing Means. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 5: Parametric Hypothesis Testig: Comparig Meas GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review from last week What is a cofidece iterval? 2 Review from last week What is a cofidece

More information

Stat 319 Theory of Statistics (2) Exercises

Stat 319 Theory of Statistics (2) Exercises Kig Saud Uiversity College of Sciece Statistics ad Operatios Research Departmet Stat 39 Theory of Statistics () Exercises Refereces:. Itroductio to Mathematical Statistics, Sixth Editio, by R. Hogg, J.

More information

A representation approach to the tower of Hanoi problem

A representation approach to the tower of Hanoi problem Uiversity of Wollogog Research Olie Departmet of Computig Sciece Workig Paper Series Faculty of Egieerig ad Iformatio Scieces 98 A represetatio approach to the tower of Haoi problem M. C. Er Uiversity

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1 PH 425 Quatum Measuremet ad Spi Witer 23 SPIS Lab Measure the spi projectio S z alog the z-axis This is the experimet that is ready to go whe you start the program, as show below Each atom is measured

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018

HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018 HYPOTHESIS TESTS FOR ONE POPULATION MEAN WORKSHEET MTH 1210, FALL 2018 We are resposible for 2 types of hypothesis tests that produce ifereces about the ukow populatio mea, µ, each of which has 3 possible

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

A Note on the Symmetric Powers of the Standard Representation of S n

A Note on the Symmetric Powers of the Standard Representation of S n A Note o the Symmetric Powers of the Stadard Represetatio of S David Savitt 1 Departmet of Mathematics, Harvard Uiversity Cambridge, MA 0138, USA dsavitt@mathharvardedu Richard P Staley Departmet of Mathematics,

More information

MA238 Assignment 4 Solutions (part a)

MA238 Assignment 4 Solutions (part a) (i) Sigle sample tests. Questio. MA38 Assigmet 4 Solutios (part a) (a) (b) (c) H 0 : = 50 sq. ft H A : < 50 sq. ft H 0 : = 3 mpg H A : > 3 mpg H 0 : = 5 mm H A : 5mm Questio. (i) What are the ull ad alterative

More information

Topic 18: Composite Hypotheses

Topic 18: Composite Hypotheses Toc 18: November, 211 Simple hypotheses limit us to a decisio betwee oe of two possible states of ature. This limitatio does ot allow us, uder the procedures of hypothesis testig to address the basic questio:

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?

More information

Recursive Algorithm for Generating Partitions of an Integer. 1 Preliminary

Recursive Algorithm for Generating Partitions of an Integer. 1 Preliminary Recursive Algorithm for Geeratig Partitios of a Iteger Sug-Hyuk Cha Computer Sciece Departmet, Pace Uiversity 1 Pace Plaza, New York, NY 10038 USA scha@pace.edu Abstract. This article first reviews the

More information

General IxJ Contingency Tables

General IxJ Contingency Tables page1 Geeral x Cotigecy Tables We ow geeralize our previous results from the prospective, retrospective ad cross-sectioal studies ad the Poisso samplig case to x cotigecy tables. For such tables, the test

More information

The Growth of Functions. Theoretical Supplement

The Growth of Functions. Theoretical Supplement The Growth of Fuctios Theoretical Supplemet The Triagle Iequality The triagle iequality is a algebraic tool that is ofte useful i maipulatig absolute values of fuctios. The triagle iequality says that

More information