Mining Probabilistic Association Rules from Uncertain Databases with Pruning

Size: px
Start display at page:

Download "Mining Probabilistic Association Rules from Uncertain Databases with Pruning"

Transcription

1 Miig Probabilistic Associatio Rules from Ucertai Databases with Pruig Erich A. Peterso Departmet of Computer Sciece Uiversity of Arkasas at Little Rock Little Rock, AR 7224 Liag Zhag Departmet of Biological Statistics ad Computatioal Biology Corell Uiversity Peiyi Tag Departmet of Computer Sciece Uiversity of Arkasas at Little Rock Little Rock, AR Ithaca, NY Abstract-I this paper, we rigorously defie the problem of miig probabilistic associatio rules from ucertai databases. We further aalyze the probability distributio space of a cadidate probabilistic associatio rule, ad propose a efficiet miig algorithm with pruig to fid all probabilistic associatio rules from ucertai databases. I. INTRODUCTION I recet years, a great deal of iterest i miig probabilistic frequet itemsets (p-fis) from ucertai databases has bee geerated [1]-[6], [8]. Sice the ladmark work of Berecker et al. [3], almost all the research i miig p-fis uses a probabilistic model based o possible world sematics. While frequet item sets are useful, the most COlmo purpose for miig frequet itemsets is the to fid associatio rules from them. While geeratig associatio rules from frequet itemsets is straightforward for certai databases [7], to fid probabilistic associatio rules from probabilistic frequet itemsets is ot trivial. I this paper, we give a rigorous defiitio of a probabilistic associatio rule (p-ar), based o possible world sematics, usig two ratios: (1) a frequecy ratio ad (2) a cofidece ratio 'fi, both betwee ad 1. Usig two ratios istead of oe (cofidece ratio), as i [8], we foud that if'fl <, X =} Y is always a probabilistic associatio rule with respect to a miimum probability threshold miprob, give that X U Y is a probabilistic frequet itemset with respect to ad miprob. This observatio offers the meas of a extremely useful pruig techique whe performig probabilistic associatio rule miig. I other words, as log as 'fi < ad Z is a probabilistic frequet itemset with respect to, X =} Z - X is guarateed to be a probabilistic associatio rule for ay XcZ. We also derive the recursive equatio for the probability distributio of a p-ar ad show a dyamic prograrmig algorithm for miig p-ars. We further itroduce additioal pruig techiques i the miig algorithm based o the value of 'fi which reduces the computatio required to determie whether a cadidate is a probabilistic associatio rule. The rest of the paper is orgaized as follows: Sectio II itroduces the ucertai database model used i this paper, probabilistic support based o possible world sematics, ad the probabilistic frequet itemset. This sectio also icludes our defiitio of a probabilistic associatlo rule usig two ratios. Sectio III aalyzes the probability distributio of associatio rules ad presets a theorem of pruig whe 'fi <. Sectio IV presets the dyamic programmig algorithm to determie whether a cadidate is a probabilistic associatio rule ad a algorithm for eumeratig cadidates, both with pruig. Sectio V describes the results of our prelimiary evaluatio of our miig algorithm. Sectio VI provides a discussio of related ad future work. Sectio VII cocludes the paper. II. A. Ucertai Database Model PRELIMINARIES A ucertai database of size deoted as T is a set of trasactios, T = {t 1,..., t}. A trasactio t j (1 :s; j :s; ) is a set of items, each of which has a o-zero probability o < p :s; 1 that it exists i the trasactio tj. That is, p = Pr(x E tj) for ay item x cotaied i tj (x E tj). The items i a trasactio with a existetial probability betwee ad 1 are called ucertai items, whereas the itemset with existetial probability 1 are called certai items. Whe defiig probabilistic frequet itemsets ad associatio rules from ucertai databases, the priciples ad theories of possible world sematics must be used. Uder possible world sematics, a possible world is a certai database istatiated from the ucertai database by icludig or ot icludig every ucertai item. If the assumptio of idepedece betwee the trasactios i T, as well as the items i each trasactio holds, the probability of each possible world w ca be calculated as follows: Pr(w) = II (II Pr(x E t f ). II (1- Pr(x E t f ))) (1) tet(w) xet x<1-t where T( w) is the certai database of possible world w, t is a trasactio i T( w), t f is the correspodig trasactio i the ucertai database T from which t is istatiated, ad x is a ucertai item of t f. I the traditioal certai database model, a itemset is either cotaied i a trasactio or ot. I the ucertai database model, we caot say itemset X is cotaied i trasactio tj with certaity if X cotais at least oe ucertai item, eve though X <;;; tj. What we have is a probability that X is cotaied i trasactio tj, Pr(X <;;; tj). Accordig to

2 possible world sematics, this is the margial probability that X is cotaied i tj as follows Pr(w) (2) where W is the set of all possible worlds ad tj (w) is the j-th trasactio i the certai database of possible world w. From (1) ad (2), this margial probability ca be expressed as follows [3]: Pr(X <:;; tj) = II Pr(x E tj) (3) xex The probability that tj does ot cotai X thus is B. Probabilistic Support ad Frequet ltemsets Give a ucertai database of size, T, ad a itemset X, the probabilistic support of X i T, deoted as Sup(X, T), is a radom variable that ca have values, 1,...,. Accordig to possible world sematics, the probability that Sup(X, T) = i, ::; i ::;, deoted as Pi (X), is the margial probability (4) Pr(w) (5) wew,sup(x,t(w)) = i where Sup(X, T(w)) is the support of X i the certai database T( w) of possible world w, which is the umber of trasactios i T ( w) that cotai X. Pi (X) for i =,..., is the probability mass fuctio (pmt) of the support of X i T ad we have L:7=oPi(X) = l. Let S be a set of i trasactios from T. The probability that every trasactio i S cotais itemset X ad every trasactio i T -S does ot cotai X is II Pr(X <:;; t) II (1 -Pr(X <:;; t)) tes tet-s This is obtaied by sulmig the probability of each possible world i which every trasactio i S cotais itemset X ad every trasactio i T -S does ot cotai X, usig (1), (3) ad (4). Sice there are may differet subsets S <:;; T with size i, the probability of Sup(X, T) = i, Pi(X), is the summatio of the probabilities above over differet subsets S [3]: Pi(X) = L II Pr(X <:;; t) II (l -Pr(X <:;; t)) (6) S,; T, IISII i tes tet-s Give a miimum support ratio E (,1), a itemset X is a probabilistic frequet itemset (p-fi) if the probability that Sup(X, T) is greater tha or equal to IITII (i.e. ) is above a miimum probability threshold deoted as miprob. Sice Pi(X) is the pmf of Sup(X, T), the probability that Sup(X, T) is greater tha or equal to is L:7=Il Pi(X), Hece, we say that X is a frequet itemset if ad oly if L:7=Il Pi(X) ;::: miprob. Computig the pmf Pi(X) usig (6) is expesive. Berecker et al. proposed a dyamic programmig algorithm to calculate Pi (X) to mie probabilistic frequet itemsets efficietly i [3]. C. Probabilistic Associatio Rules (p-ars) Give a ucertai database T of size, a miimum frequecy E (,1), a miimum cofidece T) E (,1), ad a miimum probability threshold miprob, X =? Y is a probabilistic associatio rule (p-ar) if ad oly if X Y =, ad the probability that Sup(X U Y, T) ;::: ad Sup(XUY, T) ;::: T)Sup(X, T) is greater or equal to miprob. Accordig to possible worlds sematics, the probability that Sup(X UY, T) ;::: ad Sup(X UY, T) ;::: T)Sup(X, T), deoted as P(X, Y, C T), T), is the sum of the probability of all the small possible worlds w satisfyig Sup(XUY, T(w)) ;::: ad Sup(XUY, T(w)) T)Sup(X, T(w)), where T(w) is the certai database of small world w. That is, P(X, Y,, T), T) = Pr(w) we W, Sup(X U Y, T(w»), Sup(X U Y, T(w)) 2: "Sup(X, T(w)) (7) The problem of miig probabilistic associatio rules, is to fid all the probabilistic associatio rules from a ucertai base T, give the miimum frequecy, the miimum cofidece T), ad the miimum probability threshold miprob. III. PROBABILITY DISTRIBUTION FOR ASSOCIATION RULES Calculatig P(X, Y, C T), T) usig (7) is ifeasible, as the umber of possible worlds are too umerous. Thus, we eed to fid aother way to calculate it. Let us cosider the probability that a trasactio tj cotais both X ad Y with X Y = : Pr(X uy <:;; tj) II Pr(z E tj) zexuy II Pr(x E tj) II Pr(y E tj) xex The third equality above is due to X Y = ad the fourth follow (3). yey Pr(X <:;; tj )Pr(Y <:;; tj) (8) ad the secod The probability that a trasactio tj cotais X ad but ot Y is, thus, Pr(X <:;; tj 1\ Y tj) = Pr(X <:;; t)(l -Pr(Y <:;; t)) (9) because Pr(X <:;; tj 1\ Y t j) + Pr(X <:;; tj 1\ Y <:;; t = j) Pr(X <:;; tj). Cosider the possible worlds w where j trasactios cotai both X ad Y ad aother i trasactios cotai X but ot Y. These j + i trasactios are the trasactios cotaiig X ad the rest -(j +i) trasactios do ot cotai X. Thus, the support of X ad Xu Y i theses possible worlds are j + i ad j, respectively. Let S1 ad S2 be the sets of the trasactios that cotai X but ot Y, ad X ad Y, respectively ad we have IISIII = i ad IIS211 = j. Let Sl ad S2 also deote the same sets of trasactios i the ucertai database. The, the probability that the i trasactios of S1 cotai X but ot Y,

3 j j o (1 - 'I]) (a) whe rt > t; (b) whe rt :S t; Fig. 1. Probability Distributio of Pi,j (X, Y) the j trasactios of S2 cotai both X ad Y, ad the rest of -(j + i) trasactios do ot cotai X is as follows: II Pr(X <:;; t)(l -Pr(Y <:;; t)). II Pr(X <:;; t)pr(y <:;; t). II tet-5s-51 accordig to (9), (8) ad (4). (1 -Pr(X <:;; t)) Further, the probability that there are j trasactios i T cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, or equivaletly the Sup(X U Y, T) = j ad Sup(X, T) -Sup(X U Y, T) = i, deoted as Pi, j(x, Y), is give as follows: Pi, j(x, Y ) II Pr(X <:;; t)(l -Pr(Y <:;; t)). 31 C;; T, te = i, 32 C;; T, = j II Pr(X <:;; t)pr(y <:;; t). II (1 -Pr(X <:;; t)) (1) Obviously, j + i :s: ad :s: i,j :s:. Pi, j (X, Y) is actually the pmf of the umber of trasactios cotaiig X but ot Y ad the umber of trasactios cotaiig both X ad Y. Also, we have L Pi, j(x, Y) = 1. O::;i, j::;, j+i::; Therefore, the probability that Sup(X U Y, T)?: ad Sup(X U Y, T)?: T)Sup(X, T), (deoted as P(X, Y,, T), T» is the sum of those Pi, j (X, Y) satisfyig both j?: ad j?: T)( j + i). Sice j?: T)(j + i) is equivalet to j?: 1: 1) i, we have P(X, Y,, T), T) = ::; j ::;, j + i ::; j2:: i,i2:: Pi, j(x, Y) (11) Figure I(a) shows the poits where Pi, j (X, Y) is defied. The shaded area is the itersectio of the two triagles (1) j?:, i?: ad j + i :s: ad (2) j?: 1: 1) i, i?: ad j + i :s:. The summatio of the Pi, j(x, Y) of the poits i the shaded area gives P(X, Y, C T), T) as show i (11). X =} Y is a probabilistic associatio rule if ad oly if this P(X, Y,, T), T) is greater or equal to the miimum probability threshold miprob. Notice that the itersectio of lies j = ad j+i = -2ryi is (i, j) = ((1 -T)), T)). Thus, whe T) :s:, the itersectio of the two triagles is the first triagle j?:, i?: ad j + i :s: itself as illustrated i Figure I(b). The summatio of the Pi, j (X, Y) i this triagle is the probability that j = Sup(X U Y, T)?:. If X U Y is a p-fi with respect to defied i Sectio II-B ad T) :s:, the summatio of Pi, j (X, Y) i this triagle is greater tha or equal to miprob. Thus, P(X, Y, C T), T)?: miprob ad X =} Y is a p-ar with respect to ad T). We sulmarize this fidig i the followig theorem. Theorem 1: If itemset Z is a probabilistic frequet itemset with respect to the miimum frequecy ad the miimum probability threshold miprob ad the miimum cofidece T) is less tha or equal to, X =} Y is a probabilistic associatio rule with respect to, T) ad miprob for ay X C Z ad Y = Z -X. Proof If Z = X U Y is a probabilistic frequet itemset with respect to ad miprob, the the probability that Sup(X U Y, T)?:, where = IITII, is greater tha or equal to miprob. The probability that Sup(X U Y, T)?: is equal to L i2:,j+ is;, Pi, j(x,y). We thus have ::; j ::; L i >, j + i <, ::; j ::;- Pi, j(x, Y)?: miprob.

4 = We ow show that j?:, j + i ::; ad?: T) implies j?: I'/ First we have i::; - j::; - = (1 -O. Sice j?: ad?: T), we have t > _ = > _ T) i -(1 -O T) which gives j?: -2rii. Sice j?:, j + i::; ad?: T) implies j?: -2rii, we have i 2':, j + i, ' ::s:: j ::s::, j 2': -2ryi Pi,j(X, Y) 'i 2:::, j + i :-:;, :::; j :::; > miprob. Pi,j(X, Y) Thus, X =} Y is a probabilistic associatio rule with respect to, T) ad miprob. Theorem 1 tells us that if T) ::; ad Z is a p-fi, we ca simply eumerate all the subset X of Z ad every X =} Z -X is a p-ar; thus, we eed oly mie p-ars whe T) >. IV. MINING PROBABILISTIC ASSOCIATION RULES A. Dyamic Programmig Algorithm The complexity of computig Pi,j(X, Y) directly by (1) is expoetial with. We ca use dyamic programmig method to reduce it to O(3). Let Tk be a ucertai database made of the first k trasactios {tl,..., td from T. That is, To = ad Tk = Tk-I U {tk} for k = 1,...,. Let the probability that there are j trasactios i Tk cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, or equivaletly Sup(XUY, Tk l = j ad Sup(X, Tk) -SUp(XUY, Tk) = i, be deoted as p (X, Y). Obviously, i + j::; k ad ::; i,j::; k i p (X, Y) ad we have P (k i,j ) (X, Y) = 1. ""5. i, j ""5. k,j+i""5. k (12) That is, all the p (X, Y) outside of the triagle of ::; i, j::; k,j+i::; k are zeros. Notice that P)(X, Y) is the Pi,j(X, Y) give i (1). Sice Tk = Tk-I U {td, p(x, Y) ca be calculated from p -I )(X, Y) usig the followig recursio: P (k i, j ) (X, Y) p,(x, Y)Pr(X tk)(1 -Pr(Y tk)) p-=- I )(X, Y)Pr(X tk)pr(y tk) + p -I )(X, Y)(I -Pr(X tk)) (13) with pl!'i)x, Y) = ad P2I (X, Y) = due to (12). This is because trasactio tk either cotais both X ad Y, or cotais X but ot Y, or does ot cotai X at all. The recursive equatio (13) above is derived from the Law of Total Probability ad the fact that trasactio tk is idepedet from all the trasactios i Tk-I. I particular, whe the case is that Tk has j trasactios cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, oly oe of the followig situatios will be true: (1) tj does ot cotai X ad Tk-I has j trasactios cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, (2) tj cotais both X ad Y ad Tk-I has j -1 trasactios cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, or (3) tj cotais X but ot Y ad Tk-I has j trasactios cotaiig both X ad Y ad i -I trasactios cotaiig X but ot Y. Note that the values of p -I \X, Y) are oly used to calculate the values of p (X, Y). Oce used, they do ot eed to exist. Thus, we ca use two 2-dimesioal data arrays ( fo[i,j]) ad (h[i,j]) to store values of p(x, Y) for eve ad odd k, respectively. The followig triple-ested loop will calculate ad leave p)(x, Y) i f mod 2[i,j]: { 1 if i = Iitialize array fo with fo [i, j j] = o' therwlse Iitialize array h with zero for all elemets; for k = for i =, for j = l, k, k -i tmp = fk-i mod 2[i,j](1 -pr); if (i > ) tmp += f(k-i ) mod 2[i-1,j]pr(1-pk); if (j > ) tmp += f(k-i ) mod 2[i,j -l]prpk; fk mod 2[i,j] = tmp; where pr ad Pk are the shorthads for Pr(X tk) ad Pr(Y tk), respectively. The ested loop above is the dyamic programmig method to calculate p (X, Y) for k = 1,..., usig (13). The if statemets i the loop body are to deal with boudary coditios for p (X, Y), where i = or j = which require values PI!'IJ) (X, Y) ad p/) (X, Y). These values should be all zeros, because either of i ad j ca be egative. Iteratio k (k = 1,...,) of the outer-most loop calculates each p (X, Y) ad stores them i ik mod 2[i, j] i the triagle delieated by ::; i, j::; k, j+ i::; k. It uses the p -I )(X, Y) values calculated by the previous iteratio k - 1 ad stores them i f(k-i ) mod 2[i,j] i the triagle delieated by ::; i,j ::; k -l,j + i ::; k - 1 (whe k > 1), as well as the iitial values stored i the f(k-i ) mod 2[i,j] o the diagoal of j + i = k. The iitial values o the diagoal j + i = k of f(k-i ) mod 2[i,j] are p -I )(X, Y) with j + i = k > k -1. They should be zeros, because j + i caot exceed k -1 i p -I )(X, Y). (Also see (12». The iitial value of fo[o,o] is P6 6(X, Y). Accordig to (12), this value should be 1. Thus, the array fo should be iitialized as follows.. { I if i = j = fort,] otherwise ad h should be iitialized with all zero. B. Pruig To determie whether X =} Y is a p-ar with respect to,t), ad miprob (whe T) > O-we ca either sum the values of Pi, j (X, Y) of the poits i the shaded area as i Figure l(a) ad see if the sum is greater tha or equal to miprob, or sum the values of Pi, j (X, Y) of the poits i the =

5 = u-shaded area to see if the sum is less tha or equal to 1 - miprob. The itersectio of the lies j+i = ad j = l rj i is (i,j) = ((I-7]),7]) as show i Figure lea). To compute the sum of the values of Pi,j (X, Y) i the shaded area, we oly eed to calculate the values of Pi,j (X, Y) i the trapezoid bouded by i =, i = (1-7]), j =, ad j = - i. If we wat to calculate the sum of the values OfPi,j(X, Y) i the ushaded area, we oly eed to calculate the values of Pi,j (X, Y) i the trapezoid bouded by j =, j = 7],, ad i = i = - j. We wat to chose the smaller trapezoid. The areas of trapezoids for computig the shaded ad u-shaded areas are ( l- rj2) 2 d ( 2 rj - rj2) 2. I 2 a 2 respective y. Th us, I 'f 2 7] > _ 1, l.e.. ' 7]?:.5, we chose to calculate the trapezoid for the shaded area. Otherwise, we calculate the trapezoid for the u-shaded area. The dyamic programmig algorithm with pruig to determie whether X =} Y is a p-ar is show i Figure 2. It is assumed that fuctio p-ar(x, Y, C 7], miprob) is called oly whe X Y = ad X U Y is a p-fi with respect to ad miprob. C. Fidig Associatio Rules For X =} Y with X Y = to be a p-ar with respect to, 7] ad miprob, X U Y has to be a p-fi with respect to ad miprob. We use the apriori set eumeratio [7] ad Berecker's dyamic programmig algorithm [3] to fid probabilistic frequet itemsets (p-fis) Z with respect to ad miprob. For each p-fi Z, we use a algorithm to eumerate the cadidates X, Y with Xu Y = Z ad X Y =. Figure 3 shows the eumeratio tree of p-ar cadidates X, Y for p-fi Z = abed. At each iteratio of the eumeratio, a item is moved from atecedet X which is less tha all the items i cosequet Y ad add it to Y, to be a child of X, Y. Formally, let X = Xl... X ad Y = YI... Ym ad items i X ad Y are all ordered accordig to the total order of the items, i.e. Xi < Xj ad Yi < Yj for i < j. The X, Y has oly those childre Xl... Xi-IXi+1... X, XiYI... Ym such that Xi < YI Notice that the cosequets i each of the descedats of X, Y are supersets of Y. Accordig to the ati-mootoicity priciple proved by [8], if X =} Y is ot a p-ar, oe of the descedats of X, Y would be a p-ar ad, thus, the etire subtree rooted at ode X, Y ca be prued out. V. PERFORMANCE EVALUATION Here some prelimiary tests are dissemiated, whose results were culled from a implemetatio of the p-ar miig algorithm described above. The ucertai database we used was coverted from the certai database che s s from the Frequet Itelset Miig Dataset Repository fimi.ua.ac.be/data/. The trasformatio of the che s s database was performed as follows: for each item i each trasactio, we assig a radom probability P betwee o ad 1 draw from Beta distributio with ex = 1 ad (3 = 1, to be used as the item's existetial probability. This procedure was ru five times to produce five radom datasets. Each experimet (data poit i the performace evaluatio) was ru o all five datasets ad the results of each averaged. To reduce the amout of time eeded to ru each experimet, the total umber of uique items was reduced to 14, oly the boolea p-ar(x, Y,, 7], miprob) { } if (7] :s; ) retur true; Iitialize array fo with fo [i, j] = { I if i = j. o th erwlse Iitialize array h with zero for all elemets; float sum = ; if (7]?:.5) for k = for i = for j = l, O,mi(k, l(1-7])j), k - i tmp = fk-l mod 2[i,j](1 -p; if (i > ) tmp =+ f ( k-l ) mod 2[i-l,j]p,; (1 -p; if (j > ) tmp =+ f ( k-l ) mod 2[i,j -1]p';Pk; fk mod 2 [i, j] = tmp; if (k = /\ j?: i7] l/\ j?: i2ry i l ) sum += fk mod 2[i,j]; if (sum?: miprob) retur true; ed if retur false; else for k = for i = for j = l,, O,mi(k -i, l7]j) tmp = fk-l mod 2[i,j](1 -p,;); if (i > ) tmp =+ f ( k-l ) mod 2[i-l,j]p,; (1 -p; if (j > ) tmp =+ f ( k-l ) mod 2[i,j -1]p';Pk; fk mod 2 [i, j] = tmp; if (k = /\ (j < l7] J V j < l2ry ij ) sum += fk mod 2[i,j]; if (sum> 1 - miprob) retur false; ed if retur true; ed if Fig. 2. Fig. 3. Dyamic Programmig Algorithm abed, abc, d abd, e aed, b bcd, a ab, cd ae, bd be, ad ad, be bd, ae cd, ab a, bed b, aed e, abd d, abc, abed Eumeratio of AR Cadidates

6 TABLE I. AVERAGE TIME IN p-aro (MS) ' first 1 trasactios were used, ad oly rules X =} Y of total legth 6 (legth of X plus the legth of Y) or less were cosidered as possible p-ars. The aforemetioed restrictios were ecessary because miig probabilistic associatio rules is extremely time-expesive: to fid all probabilistic frequet itemsets is already expoetial. The, to eumerate allocatio rule cadidates for each frequet itemset as show i Figure 3 is expoetial with respect to the legth of the frequet itemset. Berecker et al.'s dyamic programmig algorithm [3] is used to fid probabilistic frequet itemsets. For each probabilistic frequet itemset foud, we follow the eumeratio i Figure 3 with apriori pruig ad call fuctio p-aro i Figure 2 to determie whether a cadidate is a p-ar. We measured two times i our aalysis: (1) the total time of miig all probabilistic associatio rules ad (2) the average time of ruig fuctio p-aro i Figure 2. The tests are ru o a Dell Precisio T75 machie with Itel Xeo CPU (E562) 2.4 GHz wi 4 Cores ad 48GB RAM. I all experimets we fix miprob =.9, ad vary (from.1 to.9 i.1 icremets) ad 'T} (from.1 to.95 i.5 icremets). I Table I the average time i millisecods eeded to determie whether a cadidate is a probabilistic associatio rule (i.e., the average time take to execute the fuctio p AR()) for various values of 'T} ad. I the table, oe sees the drastic effect of pruig accordig to the value of 'T} with respect to. I particular, oe sees that whe 'T} <, that the time is virtually zero, as the fuctio p-aro simply returs t rue. Oe also sees the effect of the differig areas of the trapezoid beig computed whe 'T} >. For =.1,.2,.3 ad.4, times all peeked at 'T} =.45. We see that the time is decreasig as 'T} decreases from.45 to. This is because the trapezoid coverig the ushaded area, which is computed whe 'T} <.5, is decreasig as 'T} decreases. The time is also decreasig as 'T} icreases from.5 to.95. This is because the trapezoid coverig the shaded area, which is computed whe 'T} 2:.5, is decreasig as 'T} icreases. We also see the time is decreasig as 'T} icreases from to.95 for =.5,...,.9 for the same reaso. Table II shows the total average time of miig probabilistic frequet itemsets (p-fis) ad probabilistic associatio rules (p-ars) i secods. Oe ca see that whe 'T} is held costat, TABLE II. AVERAGE TOTAL EXECUTION TIME (SECS) ' I I I I the total miig time decreases over icreasig values of, because for lower values of far more p-fis are geerated. Whe is held costat, we see agai that for values of > 'T}, the total time is small because of the ear zero time eeded to retur t rue from p-aro; however, we see that as 'T} icreases past, total time decreases as the umber of cadidate p-ars decreases. For example, for =.1, the total miig time decreases as 'T} icreases from.25 to.95. This is because with a higher 'T} fewer cadidate p-ars are geerated. The low total time for 'T} =.1 is because the ruig time of fuctio p-aro is very ear zero. Similarly for =.4, we see that the total time decreases as 'T} icreases from.45 to.95. The total times for.1 'T}.45 are low; agai, because of the ear zero time eeded to retur true from p-aro. VI. RELATED AND FUTURE WORK Most work i miig ucertai data based o possible world sematics are cetered aroud fidig probabilistic frequet itemsets [3], [6], [8] or probabilistic frequet closed itemsets [4], [5]. I [8], Su et al. preseted a algorithm for miig probabilistic associatio rules. The algorithm dissemiated i [8], uses parameter misup, a iteger betwee

7 o ad the size of database for the ummum support, ad micof (7] i our paper), a real umber betwee ad 1 for the miimum cofidece, to defie a probabilistic associatio rule, whereas we use two ratios (miimum frequecy) ad 7] (miimum cofidece), both betwee ad 1. As a cosequece, we are able to fid the mathematical relatioship betwee the two parameters (Theorem 1) by aalyzig the probability distributio space, ad use it to prue the p-ar search space sigificatly whe 7]. We further prue the search space by choosig the smaller area of the probability distributio space depedig o the value of 7], whereas [8] always calculates the probability distributio i the shaded area o matter how small 7] is. The associatio rule probability distributio i [8] is calculated from itemset support probability distributio, usig de-covolutio through Fast Fourier Trasformatio. I this paper, we use the two-dimesioal dyamic programmig approach to calculate a cadidate associatio rule's probability distributio directly. The theoretical time complexity of the algorithm i [8] is ( 2 Jog), where ours is (3) ad ( 1), whe 7] > ad 7], respectively. Eve whe 7] >, sice our algorithm has deep pruig with 7] close to 1 or, it is our belief that our algorithm will perform well, particularly whe 7] is close to 1 or O. Oe future work, is to compare the performace betwee the two algorithms usig practical experimets-usig databases of differet sizes ad usig varyig thresholds. [8] Liwe Su, Reyold Cheg, David W. Cheug, ad Jiefeg Cheg. Miig ucertai data with probabilistic guaratees. I Proceedigs of the 16th ACM SIGKDD Iteratioal Coferece o Kowledge Discovery ad Data Miig, KDD ' 1, pages , New York, NY, USA, 21. ACM. VII. CONCLUSION I this paper, we defied probabilistic associatio rules (p-ars) usig two miimum ratios, miimum frequecy ad miimum cofidece 7]. We thoroughly aalyzed the probability distributio space for a cadidate associatio rule, ad proposed a efficiet dyamic programmig algorithm with pruig for miig probabilistic associatio rules. [l] REFERENCES CK Chui, B Kao, ad E Hug. Miig frequet itemsets from ucertai data. Advaces i Kowledge Discovery ad Data Miig, pages 47-58, 27. [2] C Chui ad B Kao. A decremetai approach for miig frequet itemsets from ucertai data. PAKDD '8, Proceedigs of the 12th Pacific-Asia coferece o Advaces i kowledge discovery ad data miig, pages 64-75, Ja 28. [3] Thomas Bemecker, Has-Peter Kriegel, Matthias Rez, Floria Verhei, ad Adreas Zuefte. Probabilistic frequet itemset miig i ucertai databases. I Proceedigs of the 15th ACM SIGKDD iteratioal coriferece o Kowledge discovery ad data miig, KDD '9, pages , New York, NY, USA, 29. ACM. [4] Peiyi Tag ad Erich A. Peterso. Miig probabilistic frequet closed itemsets i ucertai databases. I Proceedigs of the 49-th Aual Associatio for Computig Machiery Southeast Coferece (ACMSE'll), pages 86-91, Keesaw, GA, USA, March 211. [5] Yogxi Tog, Lei Che, ad Boli Dig. Discoverig threshold-based frequet closed itemsets over probabilistic data. I Data Egieerig (ICDE), 212 IEEE 28th Iteratioal Coferece o, pages , 212. [6] Liag Wag, D.W.-L. Cheug, R. Cheg, Sau-Da Lee, ad X.S. Yag. Efficiet miig of frequet item sets o large ucertai databases. Kowledge ad Data Egieerig, ieee Trasactios o, 24(12): , 212. [7] Rakesh Agrawal ad Ramakrisha Srikat. Fast algorithms for miig associatio rules. I Proceedigs of the 2th Iteratioal Coferece o Very Large Data Bases, pages , 1994.

Recursive Algorithm for Generating Partitions of an Integer. 1 Preliminary

Recursive Algorithm for Generating Partitions of an Integer. 1 Preliminary Recursive Algorithm for Geeratig Partitios of a Iteger Sug-Hyuk Cha Computer Sciece Departmet, Pace Uiversity 1 Pace Plaza, New York, NY 10038 USA scha@pace.edu Abstract. This article first reviews the

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Exercises Advanced Data Mining: Solutions

Exercises Advanced Data Mining: Solutions Exercises Advaced Data Miig: Solutios Exercise 1 Cosider the followig directed idepedece graph. 5 8 9 a) Give the factorizatio of P (X 1, X 2,..., X 9 ) correspodig to this idepedece graph. P (X) = 9 P

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

4.3 Growth Rates of Solutions to Recurrences

4.3 Growth Rates of Solutions to Recurrences 4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis

Recursive Algorithms. Recurrences. Recursive Algorithms Analysis Recursive Algorithms Recurreces Computer Sciece & Egieerig 35: Discrete Mathematics Christopher M Bourke cbourke@cseuledu A recursive algorithm is oe i which objects are defied i terms of other objects

More information

1 Hash tables. 1.1 Implementation

1 Hash tables. 1.1 Implementation Lecture 8 Hash Tables, Uiversal Hash Fuctios, Balls ad Bis Scribes: Luke Johsto, Moses Charikar, G. Valiat Date: Oct 18, 2017 Adapted From Virgiia Williams lecture otes 1 Hash tables A hash table is a

More information

CS / MCS 401 Homework 3 grader solutions

CS / MCS 401 Homework 3 grader solutions CS / MCS 401 Homework 3 grader solutios assigmet due July 6, 016 writte by Jāis Lazovskis maximum poits: 33 Some questios from CLRS. Questios marked with a asterisk were ot graded. 1 Use the defiitio of

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

Kinetics of Complex Reactions

Kinetics of Complex Reactions Kietics of Complex Reactios by Flick Colema Departmet of Chemistry Wellesley College Wellesley MA 28 wcolema@wellesley.edu Copyright Flick Colema 996. All rights reserved. You are welcome to use this documet

More information

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES

OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES Peter M. Maurer Why Hashig is θ(). As i biary search, hashig assumes that keys are stored i a array which is idexed by a iteger. However, hashig attempts to bypass

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory 1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.

More information

Analysis of Algorithms. Introduction. Contents

Analysis of Algorithms. Introduction. Contents Itroductio The focus of this module is mathematical aspects of algorithms. Our mai focus is aalysis of algorithms, which meas evaluatig efficiecy of algorithms by aalytical ad mathematical methods. We

More information

Metric Space Properties

Metric Space Properties Metric Space Properties Math 40 Fial Project Preseted by: Michael Brow, Alex Cordova, ad Alyssa Sachez We have already poited out ad will recogize throughout this book the importace of compact sets. All

More information

A Block Cipher Using Linear Congruences

A Block Cipher Using Linear Congruences Joural of Computer Sciece 3 (7): 556-560, 2007 ISSN 1549-3636 2007 Sciece Publicatios A Block Cipher Usig Liear Cogrueces 1 V.U.K. Sastry ad 2 V. Jaaki 1 Academic Affairs, Sreeidhi Istitute of Sciece &

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Lecture 9: Hierarchy Theorems

Lecture 9: Hierarchy Theorems IAS/PCMI Summer Sessio 2000 Clay Mathematics Udergraduate Program Basic Course o Computatioal Complexity Lecture 9: Hierarchy Theorems David Mix Barrigto ad Alexis Maciel July 27, 2000 Most of this lecture

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled

The picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled 1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how

More information

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.

Random Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A. Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

IP Reference guide for integer programming formulations.

IP Reference guide for integer programming formulations. IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more

More information

Chapter 4. Fourier Series

Chapter 4. Fourier Series Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,

More information

Confidence interval for the two-parameter exponentiated Gumbel distribution based on record values

Confidence interval for the two-parameter exponentiated Gumbel distribution based on record values Iteratioal Joural of Applied Operatioal Research Vol. 4 No. 1 pp. 61-68 Witer 2014 Joural homepage: www.ijorlu.ir Cofidece iterval for the two-parameter expoetiated Gumbel distributio based o record values

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Desig ad Aalysis of Algorithms Probabilistic aalysis ad Radomized algorithms Referece: CLRS Chapter 5 Topics: Hirig problem Idicatio radom variables Radomized algorithms Huo Hogwei 1 The hirig problem

More information

Axioms of Measure Theory

Axioms of Measure Theory MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that

More information

Lecture 2: April 3, 2013

Lecture 2: April 3, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,

More information

Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value

More information

Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

On forward improvement iteration for stopping problems

On forward improvement iteration for stopping problems O forward improvemet iteratio for stoppig problems Mathematical Istitute, Uiversity of Kiel, Ludewig-Mey-Str. 4, D-24098 Kiel, Germay irle@math.ui-iel.de Albrecht Irle Abstract. We cosider the optimal

More information

Rank Modulation with Multiplicity

Rank Modulation with Multiplicity Rak Modulatio with Multiplicity Axiao (Adrew) Jiag Computer Sciece ad Eg. Dept. Texas A&M Uiversity College Statio, TX 778 ajiag@cse.tamu.edu Abstract Rak modulatio is a scheme that uses the relative order

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Problem Set 2 Solutions

Problem Set 2 Solutions CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S

More information

The Random Walk For Dummies

The Random Walk For Dummies The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli

More information

Monte Carlo Integration

Monte Carlo Integration Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Zeros of Polynomials

Zeros of Polynomials Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree

More information

A New Recursion for Space-Filling Geometric Fractals

A New Recursion for Space-Filling Geometric Fractals A New Recursio for Space-Fillig Geometric Fractals Joh Shier Abstract. A recursive two-dimesioal geometric fractal costructio based upo area ad perimeter is described. For circles the radius of the ext

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

TEACHER CERTIFICATION STUDY GUIDE

TEACHER CERTIFICATION STUDY GUIDE COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra

More information

Bertrand s Postulate

Bertrand s Postulate Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a

More information

ROLL CUTTING PROBLEMS UNDER STOCHASTIC DEMAND

ROLL CUTTING PROBLEMS UNDER STOCHASTIC DEMAND Pacific-Asia Joural of Mathematics, Volume 5, No., Jauary-Jue 20 ROLL CUTTING PROBLEMS UNDER STOCHASTIC DEMAND SHAKEEL JAVAID, Z. H. BAKHSHI & M. M. KHALID ABSTRACT: I this paper, the roll cuttig problem

More information

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information

Markov Decision Processes

Markov Decision Processes Markov Decisio Processes Defiitios; Statioary policies; Value improvemet algorithm, Policy improvemet algorithm, ad liear programmig for discouted cost ad average cost criteria. Markov Decisio Processes

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

4.1 SIGMA NOTATION AND RIEMANN SUMS

4.1 SIGMA NOTATION AND RIEMANN SUMS .1 Sigma Notatio ad Riema Sums Cotemporary Calculus 1.1 SIGMA NOTATION AND RIEMANN SUMS Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each

More information

Math 609/597: Cryptography 1

Math 609/597: Cryptography 1 Math 609/597: Cryptography 1 The Solovay-Strasse Primality Test 12 October, 1993 Burt Roseberg Revised: 6 October, 2000 1 Itroductio We describe the Solovay-Strasse primality test. There is quite a bit

More information

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

Sequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet

More information

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc)

Classification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc) Classificatio of problem & problem solvig strategies classificatio of time complexities (liear, arithmic etc) Problem subdivisio Divide ad Coquer strategy. Asymptotic otatios, lower boud ad upper boud:

More information

PRACTICE FINAL/STUDY GUIDE SOLUTIONS

PRACTICE FINAL/STUDY GUIDE SOLUTIONS Last edited December 9, 03 at 4:33pm) Feel free to sed me ay feedback, icludig commets, typos, ad mathematical errors Problem Give the precise meaig of the followig statemets i) a f) L ii) a + f) L iii)

More information

Topic 5: Basics of Probability

Topic 5: Basics of Probability Topic 5: Jue 1, 2011 1 Itroductio Mathematical structures lie Euclidea geometry or algebraic fields are defied by a set of axioms. Mathematical reality is the developed through the itroductio of cocepts

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information

Analysis of Algorithms -Quicksort-

Analysis of Algorithms -Quicksort- Aalysis of Algorithms -- Adreas Ermedahl MRTC (Mälardales Real-Time Research Ceter) adreas.ermedahl@mdh.se Autum 2004 Proposed by C.A.R. Hoare i 962 Worst- case ruig time: Θ( 2 ) Expected ruig time: Θ(

More information

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018

Sequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018 CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical

More information

SRC Technical Note June 17, Tight Thresholds for The Pure Literal Rule. Michael Mitzenmacher. d i g i t a l

SRC Technical Note June 17, Tight Thresholds for The Pure Literal Rule. Michael Mitzenmacher. d i g i t a l SRC Techical Note 1997-011 Jue 17, 1997 Tight Thresholds for The Pure Literal Rule Michael Mitzemacher d i g i t a l Systems Research Ceter 130 Lytto Aveue Palo Alto, Califoria 94301 http://www.research.digital.com/src/

More information

subject to A 1 x + A 2 y b x j 0, j = 1,,n 1 y j = 0 or 1, j = 1,,n 2

subject to A 1 x + A 2 y b x j 0, j = 1,,n 1 y j = 0 or 1, j = 1,,n 2 Additioal Brach ad Boud Algorithms 0-1 Mixed-Iteger Liear Programmig The brach ad boud algorithm described i the previous sectios ca be used to solve virtually all optimizatio problems cotaiig iteger variables,

More information

Lecture 5: April 17, 2013

Lecture 5: April 17, 2013 TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 5: April 7, 203 Scribe: Somaye Hashemifar Cheroff bouds recap We recall the Cheroff/Hoeffdig bouds we derived i the last lecture idepedet

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

CSE 202 Homework 1 Matthias Springer, A Yes, there does always exist a perfect matching without a strong instability.

CSE 202 Homework 1 Matthias Springer, A Yes, there does always exist a perfect matching without a strong instability. CSE 0 Homework 1 Matthias Spriger, A9950078 1 Problem 1 Notatio a b meas that a is matched to b. a < b c meas that b likes c more tha a. Equality idicates a tie. Strog istability Yes, there does always

More information

Dynamic Programming. Sequence Of Decisions

Dynamic Programming. Sequence Of Decisions Dyamic Programmig Sequece of decisios. Problem state. Priciple of optimality. Dyamic Programmig Recurrece Equatios. Solutio of recurrece equatios. Sequece Of Decisios As i the greedy method, the solutio

More information

Dynamic Programming. Sequence Of Decisions. 0/1 Knapsack Problem. Sequence Of Decisions

Dynamic Programming. Sequence Of Decisions. 0/1 Knapsack Problem. Sequence Of Decisions Dyamic Programmig Sequece Of Decisios Sequece of decisios. Problem state. Priciple of optimality. Dyamic Programmig Recurrece Equatios. Solutio of recurrece equatios. As i the greedy method, the solutio

More information

Disjoint set (Union-Find)

Disjoint set (Union-Find) CS124 Lecture 7 Fall 2018 Disjoit set (Uio-Fid) For Kruskal s algorithm for the miimum spaig tree problem, we foud that we eeded a data structure for maitaiig a collectio of disjoit sets. That is, we eed

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first

More information

Mathematical Induction

Mathematical Induction Mathematical Iductio Itroductio Mathematical iductio, or just iductio, is a proof techique. Suppose that for every atural umber, P() is a statemet. We wish to show that all statemets P() are true. I a

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Sequences and Series of Functions

Sequences and Series of Functions Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Skip lists: A randomized dictionary

Skip lists: A randomized dictionary Discrete Math for Bioiformatics WS 11/12:, by A. Bocmayr/K. Reiert, 31. Otober 2011, 09:53 3001 Sip lists: A radomized dictioary The expositio is based o the followig sources, which are all recommeded

More information

Chapter 6: Mining Frequent Patterns, Association and Correlations

Chapter 6: Mining Frequent Patterns, Association and Correlations Chapter 6: Miig Frequet Patters, Associatio ad Correlatios Basic cocepts Frequet itemset miig methods Costrait-based frequet patter miig (ch7) Associatio rules 1 What Is Frequet Patter Aalysis? Frequet

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We

More information

Polynomial identity testing and global minimum cut

Polynomial identity testing and global minimum cut CHAPTER 6 Polyomial idetity testig ad global miimum cut I this lecture we will cosider two further problems that ca be solved usig probabilistic algorithms. I the first half, we will cosider the problem

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter 2 & Teachig

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Complex Stochastic Boolean Systems: Generating and Counting the Binary n-tuples Intrinsically Less or Greater than u

Complex Stochastic Boolean Systems: Generating and Counting the Binary n-tuples Intrinsically Less or Greater than u Proceedigs of the World Cogress o Egieerig ad Computer Sciece 29 Vol I WCECS 29, October 2-22, 29, Sa Fracisco, USA Complex Stochastic Boolea Systems: Geeratig ad Coutig the Biary -Tuples Itrisically Less

More information

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter & Teachig Material.

More information

( ) = p and P( i = b) = q.

( ) = p and P( i = b) = q. MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information