Mining Probabilistic Association Rules from Uncertain Databases with Pruning
|
|
- Jane Lucas
- 6 years ago
- Views:
Transcription
1 Miig Probabilistic Associatio Rules from Ucertai Databases with Pruig Erich A. Peterso Departmet of Computer Sciece Uiversity of Arkasas at Little Rock Little Rock, AR 7224 Liag Zhag Departmet of Biological Statistics ad Computatioal Biology Corell Uiversity Peiyi Tag Departmet of Computer Sciece Uiversity of Arkasas at Little Rock Little Rock, AR Ithaca, NY Abstract-I this paper, we rigorously defie the problem of miig probabilistic associatio rules from ucertai databases. We further aalyze the probability distributio space of a cadidate probabilistic associatio rule, ad propose a efficiet miig algorithm with pruig to fid all probabilistic associatio rules from ucertai databases. I. INTRODUCTION I recet years, a great deal of iterest i miig probabilistic frequet itemsets (p-fis) from ucertai databases has bee geerated [1]-[6], [8]. Sice the ladmark work of Berecker et al. [3], almost all the research i miig p-fis uses a probabilistic model based o possible world sematics. While frequet item sets are useful, the most COlmo purpose for miig frequet itemsets is the to fid associatio rules from them. While geeratig associatio rules from frequet itemsets is straightforward for certai databases [7], to fid probabilistic associatio rules from probabilistic frequet itemsets is ot trivial. I this paper, we give a rigorous defiitio of a probabilistic associatio rule (p-ar), based o possible world sematics, usig two ratios: (1) a frequecy ratio ad (2) a cofidece ratio 'fi, both betwee ad 1. Usig two ratios istead of oe (cofidece ratio), as i [8], we foud that if'fl <, X =} Y is always a probabilistic associatio rule with respect to a miimum probability threshold miprob, give that X U Y is a probabilistic frequet itemset with respect to ad miprob. This observatio offers the meas of a extremely useful pruig techique whe performig probabilistic associatio rule miig. I other words, as log as 'fi < ad Z is a probabilistic frequet itemset with respect to, X =} Z - X is guarateed to be a probabilistic associatio rule for ay XcZ. We also derive the recursive equatio for the probability distributio of a p-ar ad show a dyamic prograrmig algorithm for miig p-ars. We further itroduce additioal pruig techiques i the miig algorithm based o the value of 'fi which reduces the computatio required to determie whether a cadidate is a probabilistic associatio rule. The rest of the paper is orgaized as follows: Sectio II itroduces the ucertai database model used i this paper, probabilistic support based o possible world sematics, ad the probabilistic frequet itemset. This sectio also icludes our defiitio of a probabilistic associatlo rule usig two ratios. Sectio III aalyzes the probability distributio of associatio rules ad presets a theorem of pruig whe 'fi <. Sectio IV presets the dyamic programmig algorithm to determie whether a cadidate is a probabilistic associatio rule ad a algorithm for eumeratig cadidates, both with pruig. Sectio V describes the results of our prelimiary evaluatio of our miig algorithm. Sectio VI provides a discussio of related ad future work. Sectio VII cocludes the paper. II. A. Ucertai Database Model PRELIMINARIES A ucertai database of size deoted as T is a set of trasactios, T = {t 1,..., t}. A trasactio t j (1 :s; j :s; ) is a set of items, each of which has a o-zero probability o < p :s; 1 that it exists i the trasactio tj. That is, p = Pr(x E tj) for ay item x cotaied i tj (x E tj). The items i a trasactio with a existetial probability betwee ad 1 are called ucertai items, whereas the itemset with existetial probability 1 are called certai items. Whe defiig probabilistic frequet itemsets ad associatio rules from ucertai databases, the priciples ad theories of possible world sematics must be used. Uder possible world sematics, a possible world is a certai database istatiated from the ucertai database by icludig or ot icludig every ucertai item. If the assumptio of idepedece betwee the trasactios i T, as well as the items i each trasactio holds, the probability of each possible world w ca be calculated as follows: Pr(w) = II (II Pr(x E t f ). II (1- Pr(x E t f ))) (1) tet(w) xet x<1-t where T( w) is the certai database of possible world w, t is a trasactio i T( w), t f is the correspodig trasactio i the ucertai database T from which t is istatiated, ad x is a ucertai item of t f. I the traditioal certai database model, a itemset is either cotaied i a trasactio or ot. I the ucertai database model, we caot say itemset X is cotaied i trasactio tj with certaity if X cotais at least oe ucertai item, eve though X <;;; tj. What we have is a probability that X is cotaied i trasactio tj, Pr(X <;;; tj). Accordig to
2 possible world sematics, this is the margial probability that X is cotaied i tj as follows Pr(w) (2) where W is the set of all possible worlds ad tj (w) is the j-th trasactio i the certai database of possible world w. From (1) ad (2), this margial probability ca be expressed as follows [3]: Pr(X <:;; tj) = II Pr(x E tj) (3) xex The probability that tj does ot cotai X thus is B. Probabilistic Support ad Frequet ltemsets Give a ucertai database of size, T, ad a itemset X, the probabilistic support of X i T, deoted as Sup(X, T), is a radom variable that ca have values, 1,...,. Accordig to possible world sematics, the probability that Sup(X, T) = i, ::; i ::;, deoted as Pi (X), is the margial probability (4) Pr(w) (5) wew,sup(x,t(w)) = i where Sup(X, T(w)) is the support of X i the certai database T( w) of possible world w, which is the umber of trasactios i T ( w) that cotai X. Pi (X) for i =,..., is the probability mass fuctio (pmt) of the support of X i T ad we have L:7=oPi(X) = l. Let S be a set of i trasactios from T. The probability that every trasactio i S cotais itemset X ad every trasactio i T -S does ot cotai X is II Pr(X <:;; t) II (1 -Pr(X <:;; t)) tes tet-s This is obtaied by sulmig the probability of each possible world i which every trasactio i S cotais itemset X ad every trasactio i T -S does ot cotai X, usig (1), (3) ad (4). Sice there are may differet subsets S <:;; T with size i, the probability of Sup(X, T) = i, Pi(X), is the summatio of the probabilities above over differet subsets S [3]: Pi(X) = L II Pr(X <:;; t) II (l -Pr(X <:;; t)) (6) S,; T, IISII i tes tet-s Give a miimum support ratio E (,1), a itemset X is a probabilistic frequet itemset (p-fi) if the probability that Sup(X, T) is greater tha or equal to IITII (i.e. ) is above a miimum probability threshold deoted as miprob. Sice Pi(X) is the pmf of Sup(X, T), the probability that Sup(X, T) is greater tha or equal to is L:7=Il Pi(X), Hece, we say that X is a frequet itemset if ad oly if L:7=Il Pi(X) ;::: miprob. Computig the pmf Pi(X) usig (6) is expesive. Berecker et al. proposed a dyamic programmig algorithm to calculate Pi (X) to mie probabilistic frequet itemsets efficietly i [3]. C. Probabilistic Associatio Rules (p-ars) Give a ucertai database T of size, a miimum frequecy E (,1), a miimum cofidece T) E (,1), ad a miimum probability threshold miprob, X =? Y is a probabilistic associatio rule (p-ar) if ad oly if X Y =, ad the probability that Sup(X U Y, T) ;::: ad Sup(XUY, T) ;::: T)Sup(X, T) is greater or equal to miprob. Accordig to possible worlds sematics, the probability that Sup(X UY, T) ;::: ad Sup(X UY, T) ;::: T)Sup(X, T), deoted as P(X, Y, C T), T), is the sum of the probability of all the small possible worlds w satisfyig Sup(XUY, T(w)) ;::: ad Sup(XUY, T(w)) T)Sup(X, T(w)), where T(w) is the certai database of small world w. That is, P(X, Y,, T), T) = Pr(w) we W, Sup(X U Y, T(w»), Sup(X U Y, T(w)) 2: "Sup(X, T(w)) (7) The problem of miig probabilistic associatio rules, is to fid all the probabilistic associatio rules from a ucertai base T, give the miimum frequecy, the miimum cofidece T), ad the miimum probability threshold miprob. III. PROBABILITY DISTRIBUTION FOR ASSOCIATION RULES Calculatig P(X, Y, C T), T) usig (7) is ifeasible, as the umber of possible worlds are too umerous. Thus, we eed to fid aother way to calculate it. Let us cosider the probability that a trasactio tj cotais both X ad Y with X Y = : Pr(X uy <:;; tj) II Pr(z E tj) zexuy II Pr(x E tj) II Pr(y E tj) xex The third equality above is due to X Y = ad the fourth follow (3). yey Pr(X <:;; tj )Pr(Y <:;; tj) (8) ad the secod The probability that a trasactio tj cotais X ad but ot Y is, thus, Pr(X <:;; tj 1\ Y tj) = Pr(X <:;; t)(l -Pr(Y <:;; t)) (9) because Pr(X <:;; tj 1\ Y t j) + Pr(X <:;; tj 1\ Y <:;; t = j) Pr(X <:;; tj). Cosider the possible worlds w where j trasactios cotai both X ad Y ad aother i trasactios cotai X but ot Y. These j + i trasactios are the trasactios cotaiig X ad the rest -(j +i) trasactios do ot cotai X. Thus, the support of X ad Xu Y i theses possible worlds are j + i ad j, respectively. Let S1 ad S2 be the sets of the trasactios that cotai X but ot Y, ad X ad Y, respectively ad we have IISIII = i ad IIS211 = j. Let Sl ad S2 also deote the same sets of trasactios i the ucertai database. The, the probability that the i trasactios of S1 cotai X but ot Y,
3 j j o (1 - 'I]) (a) whe rt > t; (b) whe rt :S t; Fig. 1. Probability Distributio of Pi,j (X, Y) the j trasactios of S2 cotai both X ad Y, ad the rest of -(j + i) trasactios do ot cotai X is as follows: II Pr(X <:;; t)(l -Pr(Y <:;; t)). II Pr(X <:;; t)pr(y <:;; t). II tet-5s-51 accordig to (9), (8) ad (4). (1 -Pr(X <:;; t)) Further, the probability that there are j trasactios i T cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, or equivaletly the Sup(X U Y, T) = j ad Sup(X, T) -Sup(X U Y, T) = i, deoted as Pi, j(x, Y), is give as follows: Pi, j(x, Y ) II Pr(X <:;; t)(l -Pr(Y <:;; t)). 31 C;; T, te = i, 32 C;; T, = j II Pr(X <:;; t)pr(y <:;; t). II (1 -Pr(X <:;; t)) (1) Obviously, j + i :s: ad :s: i,j :s:. Pi, j (X, Y) is actually the pmf of the umber of trasactios cotaiig X but ot Y ad the umber of trasactios cotaiig both X ad Y. Also, we have L Pi, j(x, Y) = 1. O::;i, j::;, j+i::; Therefore, the probability that Sup(X U Y, T)?: ad Sup(X U Y, T)?: T)Sup(X, T), (deoted as P(X, Y,, T), T» is the sum of those Pi, j (X, Y) satisfyig both j?: ad j?: T)( j + i). Sice j?: T)(j + i) is equivalet to j?: 1: 1) i, we have P(X, Y,, T), T) = ::; j ::;, j + i ::; j2:: i,i2:: Pi, j(x, Y) (11) Figure I(a) shows the poits where Pi, j (X, Y) is defied. The shaded area is the itersectio of the two triagles (1) j?:, i?: ad j + i :s: ad (2) j?: 1: 1) i, i?: ad j + i :s:. The summatio of the Pi, j(x, Y) of the poits i the shaded area gives P(X, Y, C T), T) as show i (11). X =} Y is a probabilistic associatio rule if ad oly if this P(X, Y,, T), T) is greater or equal to the miimum probability threshold miprob. Notice that the itersectio of lies j = ad j+i = -2ryi is (i, j) = ((1 -T)), T)). Thus, whe T) :s:, the itersectio of the two triagles is the first triagle j?:, i?: ad j + i :s: itself as illustrated i Figure I(b). The summatio of the Pi, j (X, Y) i this triagle is the probability that j = Sup(X U Y, T)?:. If X U Y is a p-fi with respect to defied i Sectio II-B ad T) :s:, the summatio of Pi, j (X, Y) i this triagle is greater tha or equal to miprob. Thus, P(X, Y, C T), T)?: miprob ad X =} Y is a p-ar with respect to ad T). We sulmarize this fidig i the followig theorem. Theorem 1: If itemset Z is a probabilistic frequet itemset with respect to the miimum frequecy ad the miimum probability threshold miprob ad the miimum cofidece T) is less tha or equal to, X =} Y is a probabilistic associatio rule with respect to, T) ad miprob for ay X C Z ad Y = Z -X. Proof If Z = X U Y is a probabilistic frequet itemset with respect to ad miprob, the the probability that Sup(X U Y, T)?:, where = IITII, is greater tha or equal to miprob. The probability that Sup(X U Y, T)?: is equal to L i2:,j+ is;, Pi, j(x,y). We thus have ::; j ::; L i >, j + i <, ::; j ::;- Pi, j(x, Y)?: miprob.
4 = We ow show that j?:, j + i ::; ad?: T) implies j?: I'/ First we have i::; - j::; - = (1 -O. Sice j?: ad?: T), we have t > _ = > _ T) i -(1 -O T) which gives j?: -2rii. Sice j?:, j + i::; ad?: T) implies j?: -2rii, we have i 2':, j + i, ' ::s:: j ::s::, j 2': -2ryi Pi,j(X, Y) 'i 2:::, j + i :-:;, :::; j :::; > miprob. Pi,j(X, Y) Thus, X =} Y is a probabilistic associatio rule with respect to, T) ad miprob. Theorem 1 tells us that if T) ::; ad Z is a p-fi, we ca simply eumerate all the subset X of Z ad every X =} Z -X is a p-ar; thus, we eed oly mie p-ars whe T) >. IV. MINING PROBABILISTIC ASSOCIATION RULES A. Dyamic Programmig Algorithm The complexity of computig Pi,j(X, Y) directly by (1) is expoetial with. We ca use dyamic programmig method to reduce it to O(3). Let Tk be a ucertai database made of the first k trasactios {tl,..., td from T. That is, To = ad Tk = Tk-I U {tk} for k = 1,...,. Let the probability that there are j trasactios i Tk cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, or equivaletly Sup(XUY, Tk l = j ad Sup(X, Tk) -SUp(XUY, Tk) = i, be deoted as p (X, Y). Obviously, i + j::; k ad ::; i,j::; k i p (X, Y) ad we have P (k i,j ) (X, Y) = 1. ""5. i, j ""5. k,j+i""5. k (12) That is, all the p (X, Y) outside of the triagle of ::; i, j::; k,j+i::; k are zeros. Notice that P)(X, Y) is the Pi,j(X, Y) give i (1). Sice Tk = Tk-I U {td, p(x, Y) ca be calculated from p -I )(X, Y) usig the followig recursio: P (k i, j ) (X, Y) p,(x, Y)Pr(X tk)(1 -Pr(Y tk)) p-=- I )(X, Y)Pr(X tk)pr(y tk) + p -I )(X, Y)(I -Pr(X tk)) (13) with pl!'i)x, Y) = ad P2I (X, Y) = due to (12). This is because trasactio tk either cotais both X ad Y, or cotais X but ot Y, or does ot cotai X at all. The recursive equatio (13) above is derived from the Law of Total Probability ad the fact that trasactio tk is idepedet from all the trasactios i Tk-I. I particular, whe the case is that Tk has j trasactios cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, oly oe of the followig situatios will be true: (1) tj does ot cotai X ad Tk-I has j trasactios cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, (2) tj cotais both X ad Y ad Tk-I has j -1 trasactios cotaiig both X ad Y ad i trasactios cotaiig X but ot Y, or (3) tj cotais X but ot Y ad Tk-I has j trasactios cotaiig both X ad Y ad i -I trasactios cotaiig X but ot Y. Note that the values of p -I \X, Y) are oly used to calculate the values of p (X, Y). Oce used, they do ot eed to exist. Thus, we ca use two 2-dimesioal data arrays ( fo[i,j]) ad (h[i,j]) to store values of p(x, Y) for eve ad odd k, respectively. The followig triple-ested loop will calculate ad leave p)(x, Y) i f mod 2[i,j]: { 1 if i = Iitialize array fo with fo [i, j j] = o' therwlse Iitialize array h with zero for all elemets; for k = for i =, for j = l, k, k -i tmp = fk-i mod 2[i,j](1 -pr); if (i > ) tmp += f(k-i ) mod 2[i-1,j]pr(1-pk); if (j > ) tmp += f(k-i ) mod 2[i,j -l]prpk; fk mod 2[i,j] = tmp; where pr ad Pk are the shorthads for Pr(X tk) ad Pr(Y tk), respectively. The ested loop above is the dyamic programmig method to calculate p (X, Y) for k = 1,..., usig (13). The if statemets i the loop body are to deal with boudary coditios for p (X, Y), where i = or j = which require values PI!'IJ) (X, Y) ad p/) (X, Y). These values should be all zeros, because either of i ad j ca be egative. Iteratio k (k = 1,...,) of the outer-most loop calculates each p (X, Y) ad stores them i ik mod 2[i, j] i the triagle delieated by ::; i, j::; k, j+ i::; k. It uses the p -I )(X, Y) values calculated by the previous iteratio k - 1 ad stores them i f(k-i ) mod 2[i,j] i the triagle delieated by ::; i,j ::; k -l,j + i ::; k - 1 (whe k > 1), as well as the iitial values stored i the f(k-i ) mod 2[i,j] o the diagoal of j + i = k. The iitial values o the diagoal j + i = k of f(k-i ) mod 2[i,j] are p -I )(X, Y) with j + i = k > k -1. They should be zeros, because j + i caot exceed k -1 i p -I )(X, Y). (Also see (12». The iitial value of fo[o,o] is P6 6(X, Y). Accordig to (12), this value should be 1. Thus, the array fo should be iitialized as follows.. { I if i = j = fort,] otherwise ad h should be iitialized with all zero. B. Pruig To determie whether X =} Y is a p-ar with respect to,t), ad miprob (whe T) > O-we ca either sum the values of Pi, j (X, Y) of the poits i the shaded area as i Figure l(a) ad see if the sum is greater tha or equal to miprob, or sum the values of Pi, j (X, Y) of the poits i the =
5 = u-shaded area to see if the sum is less tha or equal to 1 - miprob. The itersectio of the lies j+i = ad j = l rj i is (i,j) = ((I-7]),7]) as show i Figure lea). To compute the sum of the values of Pi,j (X, Y) i the shaded area, we oly eed to calculate the values of Pi,j (X, Y) i the trapezoid bouded by i =, i = (1-7]), j =, ad j = - i. If we wat to calculate the sum of the values OfPi,j(X, Y) i the ushaded area, we oly eed to calculate the values of Pi,j (X, Y) i the trapezoid bouded by j =, j = 7],, ad i = i = - j. We wat to chose the smaller trapezoid. The areas of trapezoids for computig the shaded ad u-shaded areas are ( l- rj2) 2 d ( 2 rj - rj2) 2. I 2 a 2 respective y. Th us, I 'f 2 7] > _ 1, l.e.. ' 7]?:.5, we chose to calculate the trapezoid for the shaded area. Otherwise, we calculate the trapezoid for the u-shaded area. The dyamic programmig algorithm with pruig to determie whether X =} Y is a p-ar is show i Figure 2. It is assumed that fuctio p-ar(x, Y, C 7], miprob) is called oly whe X Y = ad X U Y is a p-fi with respect to ad miprob. C. Fidig Associatio Rules For X =} Y with X Y = to be a p-ar with respect to, 7] ad miprob, X U Y has to be a p-fi with respect to ad miprob. We use the apriori set eumeratio [7] ad Berecker's dyamic programmig algorithm [3] to fid probabilistic frequet itemsets (p-fis) Z with respect to ad miprob. For each p-fi Z, we use a algorithm to eumerate the cadidates X, Y with Xu Y = Z ad X Y =. Figure 3 shows the eumeratio tree of p-ar cadidates X, Y for p-fi Z = abed. At each iteratio of the eumeratio, a item is moved from atecedet X which is less tha all the items i cosequet Y ad add it to Y, to be a child of X, Y. Formally, let X = Xl... X ad Y = YI... Ym ad items i X ad Y are all ordered accordig to the total order of the items, i.e. Xi < Xj ad Yi < Yj for i < j. The X, Y has oly those childre Xl... Xi-IXi+1... X, XiYI... Ym such that Xi < YI Notice that the cosequets i each of the descedats of X, Y are supersets of Y. Accordig to the ati-mootoicity priciple proved by [8], if X =} Y is ot a p-ar, oe of the descedats of X, Y would be a p-ar ad, thus, the etire subtree rooted at ode X, Y ca be prued out. V. PERFORMANCE EVALUATION Here some prelimiary tests are dissemiated, whose results were culled from a implemetatio of the p-ar miig algorithm described above. The ucertai database we used was coverted from the certai database che s s from the Frequet Itelset Miig Dataset Repository fimi.ua.ac.be/data/. The trasformatio of the che s s database was performed as follows: for each item i each trasactio, we assig a radom probability P betwee o ad 1 draw from Beta distributio with ex = 1 ad (3 = 1, to be used as the item's existetial probability. This procedure was ru five times to produce five radom datasets. Each experimet (data poit i the performace evaluatio) was ru o all five datasets ad the results of each averaged. To reduce the amout of time eeded to ru each experimet, the total umber of uique items was reduced to 14, oly the boolea p-ar(x, Y,, 7], miprob) { } if (7] :s; ) retur true; Iitialize array fo with fo [i, j] = { I if i = j. o th erwlse Iitialize array h with zero for all elemets; float sum = ; if (7]?:.5) for k = for i = for j = l, O,mi(k, l(1-7])j), k - i tmp = fk-l mod 2[i,j](1 -p; if (i > ) tmp =+ f ( k-l ) mod 2[i-l,j]p,; (1 -p; if (j > ) tmp =+ f ( k-l ) mod 2[i,j -1]p';Pk; fk mod 2 [i, j] = tmp; if (k = /\ j?: i7] l/\ j?: i2ry i l ) sum += fk mod 2[i,j]; if (sum?: miprob) retur true; ed if retur false; else for k = for i = for j = l,, O,mi(k -i, l7]j) tmp = fk-l mod 2[i,j](1 -p,;); if (i > ) tmp =+ f ( k-l ) mod 2[i-l,j]p,; (1 -p; if (j > ) tmp =+ f ( k-l ) mod 2[i,j -1]p';Pk; fk mod 2 [i, j] = tmp; if (k = /\ (j < l7] J V j < l2ry ij ) sum += fk mod 2[i,j]; if (sum> 1 - miprob) retur false; ed if retur true; ed if Fig. 2. Fig. 3. Dyamic Programmig Algorithm abed, abc, d abd, e aed, b bcd, a ab, cd ae, bd be, ad ad, be bd, ae cd, ab a, bed b, aed e, abd d, abc, abed Eumeratio of AR Cadidates
6 TABLE I. AVERAGE TIME IN p-aro (MS) ' first 1 trasactios were used, ad oly rules X =} Y of total legth 6 (legth of X plus the legth of Y) or less were cosidered as possible p-ars. The aforemetioed restrictios were ecessary because miig probabilistic associatio rules is extremely time-expesive: to fid all probabilistic frequet itemsets is already expoetial. The, to eumerate allocatio rule cadidates for each frequet itemset as show i Figure 3 is expoetial with respect to the legth of the frequet itemset. Berecker et al.'s dyamic programmig algorithm [3] is used to fid probabilistic frequet itemsets. For each probabilistic frequet itemset foud, we follow the eumeratio i Figure 3 with apriori pruig ad call fuctio p-aro i Figure 2 to determie whether a cadidate is a p-ar. We measured two times i our aalysis: (1) the total time of miig all probabilistic associatio rules ad (2) the average time of ruig fuctio p-aro i Figure 2. The tests are ru o a Dell Precisio T75 machie with Itel Xeo CPU (E562) 2.4 GHz wi 4 Cores ad 48GB RAM. I all experimets we fix miprob =.9, ad vary (from.1 to.9 i.1 icremets) ad 'T} (from.1 to.95 i.5 icremets). I Table I the average time i millisecods eeded to determie whether a cadidate is a probabilistic associatio rule (i.e., the average time take to execute the fuctio p AR()) for various values of 'T} ad. I the table, oe sees the drastic effect of pruig accordig to the value of 'T} with respect to. I particular, oe sees that whe 'T} <, that the time is virtually zero, as the fuctio p-aro simply returs t rue. Oe also sees the effect of the differig areas of the trapezoid beig computed whe 'T} >. For =.1,.2,.3 ad.4, times all peeked at 'T} =.45. We see that the time is decreasig as 'T} decreases from.45 to. This is because the trapezoid coverig the ushaded area, which is computed whe 'T} <.5, is decreasig as 'T} decreases. The time is also decreasig as 'T} icreases from.5 to.95. This is because the trapezoid coverig the shaded area, which is computed whe 'T} 2:.5, is decreasig as 'T} icreases. We also see the time is decreasig as 'T} icreases from to.95 for =.5,...,.9 for the same reaso. Table II shows the total average time of miig probabilistic frequet itemsets (p-fis) ad probabilistic associatio rules (p-ars) i secods. Oe ca see that whe 'T} is held costat, TABLE II. AVERAGE TOTAL EXECUTION TIME (SECS) ' I I I I the total miig time decreases over icreasig values of, because for lower values of far more p-fis are geerated. Whe is held costat, we see agai that for values of > 'T}, the total time is small because of the ear zero time eeded to retur t rue from p-aro; however, we see that as 'T} icreases past, total time decreases as the umber of cadidate p-ars decreases. For example, for =.1, the total miig time decreases as 'T} icreases from.25 to.95. This is because with a higher 'T} fewer cadidate p-ars are geerated. The low total time for 'T} =.1 is because the ruig time of fuctio p-aro is very ear zero. Similarly for =.4, we see that the total time decreases as 'T} icreases from.45 to.95. The total times for.1 'T}.45 are low; agai, because of the ear zero time eeded to retur true from p-aro. VI. RELATED AND FUTURE WORK Most work i miig ucertai data based o possible world sematics are cetered aroud fidig probabilistic frequet itemsets [3], [6], [8] or probabilistic frequet closed itemsets [4], [5]. I [8], Su et al. preseted a algorithm for miig probabilistic associatio rules. The algorithm dissemiated i [8], uses parameter misup, a iteger betwee
7 o ad the size of database for the ummum support, ad micof (7] i our paper), a real umber betwee ad 1 for the miimum cofidece, to defie a probabilistic associatio rule, whereas we use two ratios (miimum frequecy) ad 7] (miimum cofidece), both betwee ad 1. As a cosequece, we are able to fid the mathematical relatioship betwee the two parameters (Theorem 1) by aalyzig the probability distributio space, ad use it to prue the p-ar search space sigificatly whe 7]. We further prue the search space by choosig the smaller area of the probability distributio space depedig o the value of 7], whereas [8] always calculates the probability distributio i the shaded area o matter how small 7] is. The associatio rule probability distributio i [8] is calculated from itemset support probability distributio, usig de-covolutio through Fast Fourier Trasformatio. I this paper, we use the two-dimesioal dyamic programmig approach to calculate a cadidate associatio rule's probability distributio directly. The theoretical time complexity of the algorithm i [8] is ( 2 Jog), where ours is (3) ad ( 1), whe 7] > ad 7], respectively. Eve whe 7] >, sice our algorithm has deep pruig with 7] close to 1 or, it is our belief that our algorithm will perform well, particularly whe 7] is close to 1 or O. Oe future work, is to compare the performace betwee the two algorithms usig practical experimets-usig databases of differet sizes ad usig varyig thresholds. [8] Liwe Su, Reyold Cheg, David W. Cheug, ad Jiefeg Cheg. Miig ucertai data with probabilistic guaratees. I Proceedigs of the 16th ACM SIGKDD Iteratioal Coferece o Kowledge Discovery ad Data Miig, KDD ' 1, pages , New York, NY, USA, 21. ACM. VII. CONCLUSION I this paper, we defied probabilistic associatio rules (p-ars) usig two miimum ratios, miimum frequecy ad miimum cofidece 7]. We thoroughly aalyzed the probability distributio space for a cadidate associatio rule, ad proposed a efficiet dyamic programmig algorithm with pruig for miig probabilistic associatio rules. [l] REFERENCES CK Chui, B Kao, ad E Hug. Miig frequet itemsets from ucertai data. Advaces i Kowledge Discovery ad Data Miig, pages 47-58, 27. [2] C Chui ad B Kao. A decremetai approach for miig frequet itemsets from ucertai data. PAKDD '8, Proceedigs of the 12th Pacific-Asia coferece o Advaces i kowledge discovery ad data miig, pages 64-75, Ja 28. [3] Thomas Bemecker, Has-Peter Kriegel, Matthias Rez, Floria Verhei, ad Adreas Zuefte. Probabilistic frequet itemset miig i ucertai databases. I Proceedigs of the 15th ACM SIGKDD iteratioal coriferece o Kowledge discovery ad data miig, KDD '9, pages , New York, NY, USA, 29. ACM. [4] Peiyi Tag ad Erich A. Peterso. Miig probabilistic frequet closed itemsets i ucertai databases. I Proceedigs of the 49-th Aual Associatio for Computig Machiery Southeast Coferece (ACMSE'll), pages 86-91, Keesaw, GA, USA, March 211. [5] Yogxi Tog, Lei Che, ad Boli Dig. Discoverig threshold-based frequet closed itemsets over probabilistic data. I Data Egieerig (ICDE), 212 IEEE 28th Iteratioal Coferece o, pages , 212. [6] Liag Wag, D.W.-L. Cheug, R. Cheg, Sau-Da Lee, ad X.S. Yag. Efficiet miig of frequet item sets o large ucertai databases. Kowledge ad Data Egieerig, ieee Trasactios o, 24(12): , 212. [7] Rakesh Agrawal ad Ramakrisha Srikat. Fast algorithms for miig associatio rules. I Proceedigs of the 2th Iteratioal Coferece o Very Large Data Bases, pages , 1994.
Recursive Algorithm for Generating Partitions of an Integer. 1 Preliminary
Recursive Algorithm for Geeratig Partitios of a Iteger Sug-Hyuk Cha Computer Sciece Departmet, Pace Uiversity 1 Pace Plaza, New York, NY 10038 USA scha@pace.edu Abstract. This article first reviews the
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationExercises Advanced Data Mining: Solutions
Exercises Advaced Data Miig: Solutios Exercise 1 Cosider the followig directed idepedece graph. 5 8 9 a) Give the factorizatio of P (X 1, X 2,..., X 9 ) correspodig to this idepedece graph. P (X) = 9 P
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More information4.3 Growth Rates of Solutions to Recurrences
4.3. GROWTH RATES OF SOLUTIONS TO RECURRENCES 81 4.3 Growth Rates of Solutios to Recurreces 4.3.1 Divide ad Coquer Algorithms Oe of the most basic ad powerful algorithmic techiques is divide ad coquer.
More informationAn Introduction to Randomized Algorithms
A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis
More information6.3 Testing Series With Positive Terms
6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial
More informationRandomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)
Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black
More informationCS284A: Representations and Algorithms in Molecular Biology
CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationRecursive Algorithms. Recurrences. Recursive Algorithms Analysis
Recursive Algorithms Recurreces Computer Sciece & Egieerig 35: Discrete Mathematics Christopher M Bourke cbourke@cseuledu A recursive algorithm is oe i which objects are defied i terms of other objects
More information1 Hash tables. 1.1 Implementation
Lecture 8 Hash Tables, Uiversal Hash Fuctios, Balls ad Bis Scribes: Luke Johsto, Moses Charikar, G. Valiat Date: Oct 18, 2017 Adapted From Virgiia Williams lecture otes 1 Hash tables A hash table is a
More informationCS / MCS 401 Homework 3 grader solutions
CS / MCS 401 Homework 3 grader solutios assigmet due July 6, 016 writte by Jāis Lazovskis maximum poits: 33 Some questios from CLRS. Questios marked with a asterisk were ot graded. 1 Use the defiitio of
More information1 Review of Probability & Statistics
1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5
More informationKinetics of Complex Reactions
Kietics of Complex Reactios by Flick Colema Departmet of Chemistry Wellesley College Wellesley MA 28 wcolema@wellesley.edu Copyright Flick Colema 996. All rights reserved. You are welcome to use this documet
More informationOPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES
OPTIMAL ALGORITHMS -- SUPPLEMENTAL NOTES Peter M. Maurer Why Hashig is θ(). As i biary search, hashig assumes that keys are stored i a array which is idexed by a iteger. However, hashig attempts to bypass
More informationBasics of Probability Theory (for Theory of Computation courses)
Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.
More informationACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory
1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.
More informationAnalysis of Algorithms. Introduction. Contents
Itroductio The focus of this module is mathematical aspects of algorithms. Our mai focus is aalysis of algorithms, which meas evaluatig efficiecy of algorithms by aalytical ad mathematical methods. We
More informationMetric Space Properties
Metric Space Properties Math 40 Fial Project Preseted by: Michael Brow, Alex Cordova, ad Alyssa Sachez We have already poited out ad will recogize throughout this book the importace of compact sets. All
More informationA Block Cipher Using Linear Congruences
Joural of Computer Sciece 3 (7): 556-560, 2007 ISSN 1549-3636 2007 Sciece Publicatios A Block Cipher Usig Liear Cogrueces 1 V.U.K. Sastry ad 2 V. Jaaki 1 Academic Affairs, Sreeidhi Istitute of Sciece &
More informationMath 155 (Lecture 3)
Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,
More informationLecture 9: Hierarchy Theorems
IAS/PCMI Summer Sessio 2000 Clay Mathematics Udergraduate Program Basic Course o Computatioal Complexity Lecture 9: Hierarchy Theorems David Mix Barrigto ad Alexis Maciel July 27, 2000 Most of this lecture
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationThe picture in figure 1.1 helps us to see that the area represents the distance traveled. Figure 1: Area represents distance travelled
1 Lecture : Area Area ad distace traveled Approximatig area by rectagles Summatio The area uder a parabola 1.1 Area ad distace Suppose we have the followig iformatio about the velocity of a particle, how
More informationRandom Walks on Discrete and Continuous Circles. by Jeffrey S. Rosenthal School of Mathematics, University of Minnesota, Minneapolis, MN, U.S.A.
Radom Walks o Discrete ad Cotiuous Circles by Jeffrey S. Rosethal School of Mathematics, Uiversity of Miesota, Mieapolis, MN, U.S.A. 55455 (Appeared i Joural of Applied Probability 30 (1993), 780 789.)
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More informationIP Reference guide for integer programming formulations.
IP Referece guide for iteger programmig formulatios. by James B. Orli for 15.053 ad 15.058 This documet is iteded as a compact (or relatively compact) guide to the formulatio of iteger programs. For more
More informationChapter 4. Fourier Series
Chapter 4. Fourier Series At this poit we are ready to ow cosider the caoical equatios. Cosider, for eample the heat equatio u t = u, < (4.) subject to u(, ) = si, u(, t) = u(, t) =. (4.) Here,
More informationConfidence interval for the two-parameter exponentiated Gumbel distribution based on record values
Iteratioal Joural of Applied Operatioal Research Vol. 4 No. 1 pp. 61-68 Witer 2014 Joural homepage: www.ijorlu.ir Cofidece iterval for the two-parameter expoetiated Gumbel distributio based o record values
More informationLecture 2. The Lovász Local Lemma
Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio
More informationDesign and Analysis of Algorithms
Desig ad Aalysis of Algorithms Probabilistic aalysis ad Radomized algorithms Referece: CLRS Chapter 5 Topics: Hirig problem Idicatio radom variables Radomized algorithms Huo Hogwei 1 The hirig problem
More informationAxioms of Measure Theory
MATH 532 Axioms of Measure Theory Dr. Neal, WKU I. The Space Throughout the course, we shall let X deote a geeric o-empty set. I geeral, we shall ot assume that ay algebraic structure exists o X so that
More informationLecture 2: April 3, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 2: April 3, 203 Scribe: Shubhedu Trivedi Coi tosses cotiued We retur to the coi tossig example from the last lecture agai: Example. Give,
More informationSeunghee Ye Ma 8: Week 5 Oct 28
Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More information(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3
MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special
More information62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +
62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of
More informationOn forward improvement iteration for stopping problems
O forward improvemet iteratio for stoppig problems Mathematical Istitute, Uiversity of Kiel, Ludewig-Mey-Str. 4, D-24098 Kiel, Germay irle@math.ui-iel.de Albrecht Irle Abstract. We cosider the optimal
More informationRank Modulation with Multiplicity
Rak Modulatio with Multiplicity Axiao (Adrew) Jiag Computer Sciece ad Eg. Dept. Texas A&M Uiversity College Statio, TX 778 ajiag@cse.tamu.edu Abstract Rak modulatio is a scheme that uses the relative order
More informationSimulation. Two Rule For Inverting A Distribution Function
Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump
More informationProblem Set 2 Solutions
CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S
More informationThe Random Walk For Dummies
The Radom Walk For Dummies Richard A Mote Abstract We look at the priciples goverig the oe-dimesioal discrete radom walk First we review five basic cocepts of probability theory The we cosider the Beroulli
More informationMonte Carlo Integration
Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationZeros of Polynomials
Math 160 www.timetodare.com 4.5 4.6 Zeros of Polyomials I these sectios we will study polyomials algebraically. Most of our work will be cocered with fidig the solutios of polyomial equatios of ay degree
More informationA New Recursion for Space-Filling Geometric Fractals
A New Recursio for Space-Fillig Geometric Fractals Joh Shier Abstract. A recursive two-dimesioal geometric fractal costructio based upo area ad perimeter is described. For circles the radius of the ext
More information1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable
More informationTEACHER CERTIFICATION STUDY GUIDE
COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra
More informationBertrand s Postulate
Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a
More informationROLL CUTTING PROBLEMS UNDER STOCHASTIC DEMAND
Pacific-Asia Joural of Mathematics, Volume 5, No., Jauary-Jue 20 ROLL CUTTING PROBLEMS UNDER STOCHASTIC DEMAND SHAKEEL JAVAID, Z. H. BAKHSHI & M. M. KHALID ABSTRACT: I this paper, the roll cuttig problem
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More informationMarkov Decision Processes
Markov Decisio Processes Defiitios; Statioary policies; Value improvemet algorithm, Policy improvemet algorithm, ad liear programmig for discouted cost ad average cost criteria. Markov Decisio Processes
More informationStat 421-SP2012 Interval Estimation Section
Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible
More information4.1 SIGMA NOTATION AND RIEMANN SUMS
.1 Sigma Notatio ad Riema Sums Cotemporary Calculus 1.1 SIGMA NOTATION AND RIEMANN SUMS Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each
More informationMath 609/597: Cryptography 1
Math 609/597: Cryptography 1 The Solovay-Strasse Primality Test 12 October, 1993 Burt Roseberg Revised: 6 October, 2000 1 Itroductio We describe the Solovay-Strasse primality test. There is quite a bit
More informationSequences A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece 1, 1, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet
More informationClassification of problem & problem solving strategies. classification of time complexities (linear, logarithmic etc)
Classificatio of problem & problem solvig strategies classificatio of time complexities (liear, arithmic etc) Problem subdivisio Divide ad Coquer strategy. Asymptotic otatios, lower boud ad upper boud:
More informationPRACTICE FINAL/STUDY GUIDE SOLUTIONS
Last edited December 9, 03 at 4:33pm) Feel free to sed me ay feedback, icludig commets, typos, ad mathematical errors Problem Give the precise meaig of the followig statemets i) a f) L ii) a + f) L iii)
More informationTopic 5: Basics of Probability
Topic 5: Jue 1, 2011 1 Itroductio Mathematical structures lie Euclidea geometry or algebraic fields are defied by a set of axioms. Mathematical reality is the developed through the itroductio of cocepts
More informationOptimization Methods MIT 2.098/6.255/ Final exam
Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short
More informationAnalysis of Algorithms -Quicksort-
Aalysis of Algorithms -- Adreas Ermedahl MRTC (Mälardales Real-Time Research Ceter) adreas.ermedahl@mdh.se Autum 2004 Proposed by C.A.R. Hoare i 962 Worst- case ruig time: Θ( 2 ) Expected ruig time: Θ(
More informationSequences, Mathematical Induction, and Recursion. CSE 2353 Discrete Computational Structures Spring 2018
CSE 353 Discrete Computatioal Structures Sprig 08 Sequeces, Mathematical Iductio, ad Recursio (Chapter 5, Epp) Note: some course slides adopted from publisher-provided material Overview May mathematical
More informationSRC Technical Note June 17, Tight Thresholds for The Pure Literal Rule. Michael Mitzenmacher. d i g i t a l
SRC Techical Note 1997-011 Jue 17, 1997 Tight Thresholds for The Pure Literal Rule Michael Mitzemacher d i g i t a l Systems Research Ceter 130 Lytto Aveue Palo Alto, Califoria 94301 http://www.research.digital.com/src/
More informationsubject to A 1 x + A 2 y b x j 0, j = 1,,n 1 y j = 0 or 1, j = 1,,n 2
Additioal Brach ad Boud Algorithms 0-1 Mixed-Iteger Liear Programmig The brach ad boud algorithm described i the previous sectios ca be used to solve virtually all optimizatio problems cotaiig iteger variables,
More informationLecture 5: April 17, 2013
TTIC/CMSC 350 Mathematical Toolkit Sprig 203 Madhur Tulsiai Lecture 5: April 7, 203 Scribe: Somaye Hashemifar Cheroff bouds recap We recall the Cheroff/Hoeffdig bouds we derived i the last lecture idepedet
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More informationA sequence of numbers is a function whose domain is the positive integers. We can see that the sequence
Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
More informationCSE 202 Homework 1 Matthias Springer, A Yes, there does always exist a perfect matching without a strong instability.
CSE 0 Homework 1 Matthias Spriger, A9950078 1 Problem 1 Notatio a b meas that a is matched to b. a < b c meas that b likes c more tha a. Equality idicates a tie. Strog istability Yes, there does always
More informationDynamic Programming. Sequence Of Decisions
Dyamic Programmig Sequece of decisios. Problem state. Priciple of optimality. Dyamic Programmig Recurrece Equatios. Solutio of recurrece equatios. Sequece Of Decisios As i the greedy method, the solutio
More informationDynamic Programming. Sequence Of Decisions. 0/1 Knapsack Problem. Sequence Of Decisions
Dyamic Programmig Sequece Of Decisios Sequece of decisios. Problem state. Priciple of optimality. Dyamic Programmig Recurrece Equatios. Solutio of recurrece equatios. As i the greedy method, the solutio
More informationDisjoint set (Union-Find)
CS124 Lecture 7 Fall 2018 Disjoit set (Uio-Fid) For Kruskal s algorithm for the miimum spaig tree problem, we foud that we eeded a data structure for maitaiig a collectio of disjoit sets. That is, we eed
More informationDefinitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.
Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22
CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first
More informationMathematical Induction
Mathematical Iductio Itroductio Mathematical iductio, or just iductio, is a proof techique. Suppose that for every atural umber, P() is a statemet. We wish to show that all statemets P() are true. I a
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationSequences and Series of Functions
Chapter 6 Sequeces ad Series of Fuctios 6.1. Covergece of a Sequece of Fuctios Poitwise Covergece. Defiitio 6.1. Let, for each N, fuctio f : A R be defied. If, for each x A, the sequece (f (x)) coverges
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationSkip lists: A randomized dictionary
Discrete Math for Bioiformatics WS 11/12:, by A. Bocmayr/K. Reiert, 31. Otober 2011, 09:53 3001 Sip lists: A radomized dictioary The expositio is based o the followig sources, which are all recommeded
More informationChapter 6: Mining Frequent Patterns, Association and Correlations
Chapter 6: Miig Frequet Patters, Associatio ad Correlatios Basic cocepts Frequet itemset miig methods Costrait-based frequet patter miig (ch7) Associatio rules 1 What Is Frequet Patter Aalysis? Frequet
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More informationSupplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate
Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We
More informationPolynomial identity testing and global minimum cut
CHAPTER 6 Polyomial idetity testig ad global miimum cut I this lecture we will cosider two further problems that ca be solved usig probabilistic algorithms. I the first half, we will cosider the problem
More informationQuick Review of Probability
Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter 2 & Teachig
More informationOptimally Sparse SVMs
A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but
More informationENGI 4421 Confidence Intervals (Two Samples) Page 12-01
ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly
More informationDouble Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution
Iteratioal Mathematical Forum, Vol., 3, o. 3, 3-53 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/.9/imf.3.335 Double Stage Shrikage Estimator of Two Parameters Geeralized Expoetial Distributio Alaa M.
More informationFrequentist Inference
Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for
More informationComplex Stochastic Boolean Systems: Generating and Counting the Binary n-tuples Intrinsically Less or Greater than u
Proceedigs of the World Cogress o Egieerig ad Computer Sciece 29 Vol I WCECS 29, October 2-22, 29, Sa Fracisco, USA Complex Stochastic Boolea Systems: Geeratig ad Coutig the Biary -Tuples Itrisically Less
More informationWHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT
WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? Harold G. Loomis Hoolulu, HI ABSTRACT Most coastal locatios have few if ay records of tsuami wave heights obtaied over various time periods. Still
More information4.1 Sigma Notation and Riemann Sums
0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas
More informationQuick Review of Probability
Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter & Teachig Material.
More information( ) = p and P( i = b) = q.
MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More information6. Sufficient, Complete, and Ancillary Statistics
Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary
More information