Tree Pattern Aggregation for Scalable XML Data Dissemination


Chee-Yong Chan, Wenfei Fan*, Pascal Felber†, Minos Garofalakis, Rajeev Rastogi
Bell Labs, Lucent Technologies

Abstract
With the rapid growth of XML-document traffic on the Internet, scalable content-based dissemination of XML documents to a large, dynamic group of consumers has become an important research challenge. To indicate the type of content that they are interested in, data consumers typically specify their subscriptions using some XML pattern specification language (e.g., XPath). Given the large volume of subscribers, system scalability and efficiency mandate the ability to aggregate the set of consumer subscriptions to a smaller set of content specifications, so as to both reduce their storage-space requirements and speed up the document-subscription matching process. In this paper, we provide the first systematic study of subscription aggregation where subscriptions are specified with tree patterns (an important subclass of XPath expressions). The main challenge is to aggregate an input set of tree patterns into a smaller set of generalized tree patterns such that: (1) a given space constraint on the total size of the subscriptions is met, and (2) the loss in precision (due to aggregation) during document filtering is minimized. We propose an efficient tree-pattern aggregation algorithm that makes effective use of document-distribution statistics in order to compute a precise set of aggregate tree patterns within the allotted space budget. As part of our solution, we also develop several novel algorithms for tree-pattern containment and minimization, as well as least-upper-bound computation for a set of tree patterns. These results are of interest in their own right, and can prove useful in other domains, such as XML query optimization. Extensive results from a prototype implementation validate our approach.
1 Introduction
XML (eXtensible Markup Language) [16] has become the dominant standard for data encoding and exchange on the Internet, including e-business transactions in both Business-to-Business (B2B) and Business-to-Consumer (B2C) applications. Given the rapid growth of XML traffic on the Internet, the effective and efficient delivery of XML documents has become an important issue. Consequently, there is growing interest in the area of XML content-based filtering and routing (e.g., [4]), which addresses the problem of effectively directing high volumes of XML-document traffic to interested consumers based on document contents. Unlike conventional routing, where packets are routed based on a limited, fixed set of attributes (e.g., source/destination IP addresses and port numbers), content-based routing is based on general patterns of the document contents, which is significantly more flexible and demanding. Consumers typically specify their subscriptions, indicating the type of XML content that they are interested in, using some XML pattern specification language (e.g., XPath [15]). For each incoming XML document, a content-based router matches the document contents against the set of subscriptions to identify the (sub)set of interested consumers, and then routes the document to them. Thus, in content-based routing, the destination of an XML document is generally unknown to the data producer, and is computed dynamically based on the document contents and the active set of subscriptions.

* Currently on leave from Temple University and supported in part by NSF Career Award IIS.
† Current affiliation: Institut EURECOM, Sophia Antipolis, France.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002
Effective support for scalable, content-based XML routing is crucial to enabling efficient and timely delivery of relevant XML documents to a large, dynamic group of consumers. Given the large volume of potential consumers, system scalability and efficiency mandate the ability to judiciously aggregate the set of consumer subscriptions to a smaller set of content specifications. The goal, of course, is to both reduce the subscriptions' storage space requirements (e.g., so that the routing table fits in main memory) and speed up the filtering of incoming XML traffic. For instance, a core router in a B2B application may choose to aggregate subscriptions based on geographical location, affiliation, or domain-specific information (e.g., telecommunications). Subscription aggregation essentially involves aggregating an initial set of subscriptions S into a smaller set A such that any document that matches some subscription in S also matches some subscription in A. However, since there is typically a loss of precision associated with such aggregation, the set of documents matched by the aggregate set A is, in general, a superset of those matched by the original set S. As a result, a document may be routed to consumers who have not subscribed to it, thus resulting in an increase in the amount of unwanted document traffic. In order to avoid such spurious forwarding of documents, it is desirable to minimize the number of such false matches (i.e., minimize the loss in precision) with respect to the given space constraint for the aggregate subscriptions.

Figure 1: Example Tree Patterns and XML Document Tree. Panels: (a) p_a, (b) p_b, (c) p_c, (d) p_d, (e) T.

So far, there has only been limited work on subscription aggregation, mainly for very simple subscription models. For example, in [12], each subscription is a set of attribute-predicate pairs (e.g., {issue = GE; price < 12; volume > 1}), and an aggregate subscription is allowed to contain wildcard values, indicating the entire set of domain values for certain attributes.¹ In this paper, we provide the first systematic study of the subscription aggregation problem where subscriptions are specified using the much more expressive model of tree patterns. Tree patterns represent an important subclass of XPath expressions that offers a natural means for specifying tree-structured constraints in XML and LDAP applications [3]. Compared to earlier work based on attribute/predicate-based subscriptions, effectively aggregating tree patterns poses a much more challenging problem, since subscriptions involve both content information (node labels) and structure information (parent-child and ancestor-descendant relationships). Briefly, our tree pattern aggregation problem can be stated as follows: Given an input set of tree patterns S and a space constraint, aggregate S into a smaller set of generalized tree patterns that meets the space constraint, and for which the loss in precision due to aggregation is minimized.

Example 1.1 Consider the two similar tree-pattern-based subscriptions p_a and p_b shown in Figure 1, where p_a matches any document with a root element labeled CD that has both a sub-element labeled SONY and a sub-element (with an arbitrary label) that in turn has a sub-element labeled Bach; and p_b matches any document that has some element labeled CD with a sub-element labeled Bach. Here the node label * (wildcard) matches any label, while the node label // (descendant) matches some (possibly empty) path.
The XML document T shown in Figure 1(e) matches (or satisfies) p_a but not p_b, because the sub-element labeled Bach in T does not have a parent element labeled CD. For efficiency reasons, one might want to aggregate the set of tree patterns {p_a, p_b} into a single tree pattern. Two examples of aggregate tree patterns for {p_a, p_b} are p_c and p_d (in Figure 1), since any document that satisfies p_a or p_b also satisfies both p_c and p_d. Although both p_c and p_d have the same number of nodes, p_c is intuitively more precise than p_d with respect to {p_a, p_b}, since p_c preserves the ancestor-descendant relationship between the CD and Bach elements as required by p_a and p_b. Indeed, any XML document that satisfies p_c also satisfies p_d (and thus we say that p_d contains p_c). □

To the best of our knowledge, our work is the first to address this timely subscription aggregation problem for XML data dissemination. Our main contributions can be summarized as follows.
• We study the properties of tree patterns and develop efficient algorithms for deciding tree pattern containment, minimizing a tree pattern, and computing the most precise aggregate (i.e., the least upper bound) for a set of patterns. Our results are not only interesting in their own right, but also provide solutions for special cases of our tree pattern aggregation problem.
• We propose a novel, efficient method that exploits coarse statistics on the underlying distribution of XML documents to compute a precise set of aggregate patterns within the allotted space budget. Specifically, our scheme employs the document statistics to estimate the selectivity of a tree pattern, which is also used as a measure of the pattern's preciseness. Thus, our aggregation problem reduces to that of finding a compact set of aggregate patterns with minimal loss in selectivity, for which we present a greedy heuristic.
• We demonstrate experimentally the effectiveness of our approach in computing a space-efficient and precise set of aggregate tree patterns.

¹ Due to space constraints, a more detailed overview of related work can be found in the appendix.
The usefulness of our results on tree patterns and their aggregation is not limited to content-based routing, but also extends to other application domains, such as the optimization of XML queries involving tree patterns and the processing/dissemination of subscription queries in a multicast environment [9] (where aggregation can be used to reduce server load and network traffic). Further, our work and results are complementary to recent work on efficient indexing structures for XPath expressions [2, 6]. The focus of this earlier research is to speed up document filtering with a given set of XPath subscriptions using appropriate indexing schemes. In contrast, our work focuses on effectively reducing the volume of subscriptions that need to be matched in order to ensure scalability given bounded storage resources for routing. Clearly, our techniques can be used as a pre-processing step for the indexes of [2, 6] when hard constraints on the size of the index must be met. Due to space limitations, the proofs of all theoretical results can be found in the full version of this paper [5].

2 Problem Formulation

2.1 Definitions
A tree pattern is an unordered node-labeled tree that specifies content and structure conditions on an XML document. More specifically, a tree pattern p has a set of nodes, denoted by Nodes(p), where each node v in Nodes(p) has a label, denoted by label(v), which can either be a tag name, * (a wildcard that matches any tag), or // (the descendant operator). In particular, the root node has the special label "/.". We use Subtree(v, p) to denote the subtree of p rooted at v, referred to as a sub-pattern of p. Some examples of tree patterns are depicted in Figure 2.

To define the semantics of a tree pattern p, we first give the semantics of a sub-pattern Subtree(v, p), where v is not the root node of p. Recall that XML documents are typically represented as node-labeled trees, referred to as XML trees. Let T be an XML tree and t be a node in T. We say that T satisfies Subtree(v, p) at node t, denoted by (T, t) ⊨ Subtree(v, p), if the following conditions hold: (1) if label(v) is a tag, then t has a child node t' labeled label(v) such that for each child node v' of v, (T, t') ⊨ Subtree(v', p); (2) if label(v) = *, then t has a child node t' labeled with an arbitrary tag such that for each child node v' of v, (T, t') ⊨ Subtree(v', p); and (3) if label(v) = //, then t has a descendant node t' (possibly t' = t) such that for each child v' of v, (T, t') ⊨ Subtree(v', p).

We next define the semantics of tree patterns. Let T be an XML tree with root t_root, and p be a tree pattern with root v_root. We say that T satisfies p, denoted by T ⊨ p, iff for each child node v of v_root: (1) if label(v) is a tag, then t_root is labeled with that tag and, for each child node v' of v, (T, t_root) ⊨ Subtree(v', p) (here label(v) specifies the tag of t_root); (2) if label(v) = *, then t_root may have any label and, for each child node v' of v, (T, t_root) ⊨ Subtree(v', p); (3) if label(v) = //, then t_root has a descendant node t (possibly t = t_root) such that T' ⊨ p', where T' is the subtree rooted at t, and p' is identical to Subtree(v, p) except that /. is the label of the root node v (instead of label(v)). Observe that v_root is treated differently from the rest of the nodes of p.
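To make the two-level semantics concrete, here is a small Python sketch that evaluates T ⊨ p over a document parsed with xml.etree.ElementTree. The PNode class and the functions sat_sub and satisfies are hypothetical names chosen for illustration; labels are plain strings, with "*", "//", and "/." playing the roles defined above. The sample document is made up (it is not Figure 1(e)), and p_a and p_b are encoded from their textual description in Example 1.1.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class PNode:
    label: str                                   # tag name, "*", "//", or "/." (root)
    children: list = field(default_factory=list)


def sat_sub(v, t):
    """(T, t) |= Subtree(v, p) for a non-root pattern node v and document node t."""
    if v.label == "//":
        # Some descendant-or-self t' of t satisfies every child sub-pattern of v.
        return any(all(sat_sub(c, t2) for c in v.children) for t2 in t.iter())
    # A tag or "*": some child t' of t with a matching tag satisfies all child sub-patterns.
    return any(v.label in ("*", t1.tag) and all(sat_sub(c, t1) for c in v.children)
               for t1 in t)


def satisfies(root, p):
    """T |= p, where root is the document's root element and p has root label "/."."""
    for v in p.children:
        if v.label == "//":
            # Re-root Subtree(v, p) at "/." and try every descendant-or-self of the root.
            ok = any(satisfies(t, PNode("/.", v.children)) for t in root.iter())
        else:
            # A tag (or "*") directly below "/." constrains the label of the root itself.
            ok = v.label in ("*", root.tag) and all(sat_sub(c, root) for c in v.children)
        if not ok:
            return False
    return True


if __name__ == "__main__":
    doc = ET.fromstring("<CD><SONY/><Classical><Bach/></Classical></CD>")
    # p_a and p_b as described in Example 1.1.
    p_a = PNode("/.", [PNode("CD", [PNode("SONY"), PNode("*", [PNode("Bach")])])])
    p_b = PNode("/.", [PNode("//", [PNode("CD", [PNode("Bach")])])])
    print(satisfies(doc, p_a), satisfies(doc, p_b))   # True False
```

The naive recursion re-scans subtrees and is meant only to mirror the definitions, not to be an efficient matcher.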
The motivation behind treating the root specially is illustrated by p_i in Figure 2, which specifies the following: for any XML tree T satisfying p_i, its root must be labeled with a particular tag and, moreover, T must contain two consecutive elements with a particular tag somewhere. This cannot be expressed without our special root label (as tree patterns do not allow the union operator).

Example 2.1 Consider the tree pattern p_? in Figure 2. An XML document T satisfies p_? if its root element satisfies all the following conditions: (1) its label is the tag specified in p_?; (2) it must have a child element with an arbitrary tag, which in turn has a child element with a specified tag; and (3) it must have a descendant element which has child elements with both of two specified tags. Thus, p_? essentially specifies (existential) conjunctive conditions on XML documents. It should be noted that documents satisfying p_? may have tags/subtrees not mentioned in p_?; for instance, the root element of T may have additional child elements, and elements of T may have additional descendant elements. □

A tree pattern p is said to be consistent if and only if there exists an XML document that satisfies p. We only consider consistent tree patterns in our work. Further, the tree patterns defined above can be naturally generalized to accommodate simple conditions and predicates (e.g., issue = GE and price < 1). To simplify the discussion, we do not consider such extensions in this paper. It is worth mentioning that a tree pattern can be easily converted to an equivalent XPath expression [15] in which each sub-pattern is expressed as a condition/qualifier [5]. Thus, our tree patterns are graph representations of a class of XPath expressions, which are similar to the tree patterns that have been studied for XML queries (e.g., [3, 17]). It is tempting to consider using a larger fragment of XPath to express subscription patterns. However, it turns out that even a mild generalization of our tree patterns (e.g., with the addition of union/disjunction operators) leads to much higher complexity (coNP-hard or beyond) for basic operations such as containment computation (e.g., see [1]). A tree pattern q is said to be contained in another tree pattern p, denoted by q ⊑ p, if and only if for any XML tree T, if T satisfies q then T also satisfies p.
If q ⊑ p, we refer to p as the container pattern and q as the contained pattern. We say that p and q are equivalent, denoted by p ≡ q, if p ⊑ q and q ⊑ p. This definition can be generalized to sets of tree patterns: a set of tree patterns S is contained in another set of tree patterns S', denoted by S ⊑ S', if for each p ∈ S, there exists p' ∈ S' such that p ⊑ p'. Containment for sub-patterns is defined similarly. The size of a tree pattern p, denoted by |p|, is simply the cardinality of its node set. For example, referring to Figure 2, |p_?| = 7 and |p_?| = ….

2.2 Problem Statement
The tree pattern aggregation problem that we investigate in this paper can now be stated as follows. Given a set of tree pattern subscriptions S and a space bound k on the total size of the aggregate subscriptions, compute a set of tree patterns S' that satisfies all of the following three conditions: (C1) S ⊑ S' (i.e., S' is at least as general as S); (C2) Σ_{p' ∈ S'} |p'| ≤ k (i.e., S' is concise); and (C3) S' is as precise as possible, in the sense that there does not exist another set of tree patterns S'' that satisfies the first two conditions and is strictly more precise than S' (i.e., S'' ⊑ S' but S' ⋢ S''). Clearly, the tree pattern aggregation problem may not necessarily have a unique solution, since it is possible to have two sets S' and S'' that satisfy the first two conditions but S' ⋢ S'' and S'' ⋢ S'. Therefore, we need to devise some measure to quantify the goodness of candidate solutions in terms of both their conciseness and their preciseness. With respect to conciseness, we are interested in minimal tree patterns that do not contain any redundant nodes. More precisely, we say that a tree pattern p is minimized if for any tree pattern p' such that p' ≡ p, it is the case that |p| ≤ |p'|. With respect to preciseness, it can be shown that the containment relationship ⊑ on the universe of tree patterns actually defines a lattice.

Figure 2: Examples of Tree Patterns. Panels: (a) p_a, (b) p_b, (c) p_c, (d) p_d, (e) p_e, (f) p_f, (g) p_g, (h) p_h, (i) p_i.

In particular, the notions of upper bound and least upper bound are relevant to the aggregation problem and, therefore, we define them formally here. An upper bound of two tree patterns p and q is a tree pattern u such that p ⊑ u and q ⊑ u, i.e., for any XML tree T, if T ⊨ p or T ⊨ q then T ⊨ u. The least upper bound (LUB) of p and q, denoted by p ⊔ q, is an upper bound u of p and q such that, for any upper bound u' of p and q, u ⊑ u'. Once again, we generalize the notion of LUBs to a set S of tree patterns. An upper bound of S is a tree pattern U, denoted by S ⊑ U, such that p ⊑ U for every p ∈ S. The LUB of S, denoted by ⊔S, is an upper bound U of S such that for any upper bound U' of S, U ⊑ U'. Clearly, if p is an aggregate tree pattern for a set of tree patterns S (i.e., S ⊑ p), then p is an upper bound of S. Observe that, if p is the LUB of S, then p is the most precise aggregate tree pattern for S. In fact, it can be shown that ⊔S exists and is unique up to equivalence for any set S of tree patterns [5]; thus, it is meaningful to talk about ⊔S as the most precise aggregate tree pattern.

Example 2.2 Consider again the tree patterns in Figure 2. Observe that p_? ≡ p_?; and since |p_?| > |p_?|, p_? is not a minimized pattern. In fact, except for p_?, all the tree patterns in Figure 2 are minimized patterns. Note that p_? ⋢ p_? because the root node of p_? does not have a child node with the required tag; and p_? ⋢ p_? because there exists no node in p_? that is the parent of both tag-nodes required by p_?. Observe that p_? ⊑ p_? and p_? ⊑ p_?; i.e., p_? is an upper bound of p_? and p_?. However, p_? ≠ p_? ⊔ p_?, since we have another tree pattern, p_e, which is an upper bound of p_? and p_? such that p_e ⊑ p_?. Indeed, p_e = p_? ⊔ p_? with |p_e| < |p_?| + |p_?|. Note, however, that the size of a LUB is not necessarily smaller than the total size of its constituent patterns. For example, p_h = p_? ⊔ p_f but |p_h| > |p_?| + |p_f|. Note that p_? is an upper bound of {p_?, p_?, p_?, p_e, p_f, p_g, p_h}.
□

We conclude this section by presenting some additional notation used in this paper. For a node v in a tree pattern p, we denote the set of child nodes of v in p by Child(v, p). We also define a partial ordering ≼ on node labels such that if x and x' are tag names, then (1) x ≼ * ≼ // and (2) x ≼ x' iff x = x'. Given two nodes v and w, MaxLabel(v, w) is defined to be the least upper bound of their labels label(v) and label(w), as follows:

$$
\mathrm{MaxLabel}(v, w) =
\begin{cases}
\mathrm{label}(v) & \text{if } \mathrm{label}(v) = \mathrm{label}(w);\\
\texttt{//} & \text{if } \mathrm{label}(v) = \texttt{//} \text{ or } \mathrm{label}(w) = \texttt{//};\\
\texttt{*} & \text{otherwise.}
\end{cases}
$$

For example, MaxLabel(a, b) = * and MaxLabel(*, //) = //. For notational convenience, we refer to a node v in a tree pattern as an ℓ-node if label(v) = ℓ, and refer to v as a tag-node if label(v) ∉ {/., *, //}.

3 Computing the Most Precise Aggregate
In this section, we consider a special case of our tree pattern aggregation problem, namely, when the aggregate set S' consists of a single tree pattern and there is no space constraint. For this case, we provide an algorithm to compute the most precise aggregate tree pattern (i.e., the LUB) for a set of tree patterns. Some of the algorithms given in this section are also key components of our solution for the general problem, which is presented in the next section.

Given two input tree patterns p and q, Algorithm LUB in Figure 3 computes the most precise aggregate tree pattern for {p, q} (i.e., the LUB of p and q). It traverses p and q top-down and computes the tightest container sub-patterns for each pair of sub-patterns p' = Subtree(v, p) and q' = Subtree(w, q) encountered, where v and w are nodes in p and q, respectively. The tightest container sub-patterns of p' and q' are a set R of sub-patterns such that: (1) R consists of container sub-patterns² of p' and q', i.e., for any XML document T and any element t in T, if (T, t) ⊨ p' or (T, t) ⊨ q' then (T, t) ⊨ r for each r ∈ R; and (2) R is tightest, in the sense that for any other set of container sub-patterns R' of p' and q' that satisfies condition (1), any XML document T, and any element t in T, if (T, t) ⊨ r for each r ∈ R then (T, t) ⊨ r' for all r' ∈ R'. Intuitively, R is a collection of conditions imposed by both p' and q' such that if T satisfies p' or q' at t, then T also satisfies the conjunction of these conditions at t. We now show how the LUB for p and q can be computed from the tightest container sub-patterns.

² Note that a container sub-pattern of tree patterns p' and q' is an upper bound of p' and q', and we use these two terms interchangeably.

    Algorithm LUB(p, q)
    Input: p and q are tree patterns.
    Output: A tree pattern representing the LUB of p and q.
    1)  if (q ⊑ p) then return p;
    2)  if (p ⊑ q) then return q;
    3)  Initialize TCSubPat[v, w] = ∅, ∀ v ∈ Nodes(p), ∀ w ∈ Nodes(q);
    4)  Let v_root and w_root denote the root nodes of p and q, resp.;
    5)  for each v ∈ Child(v_root, p) do
    6)      for each w ∈ Child(w_root, q) do
    7)          TCSubPat[v, w] = LUB_SUB(v, w, TCSubPat);
    8)  Create a tree pattern x with root node label /. and the set of child
        sub-patterns ⋃_{v ∈ Child(v_root, p), w ∈ Child(w_root, q)} TCSubPat[v, w];
    9)  return MINIMIZE(x);

    Algorithm LUB_SUB(v, w, TCSubPat)
    Input: v, w are nodes in tree patterns p, q (respectively); TCSubPat is a
        2-dimensional array such that TCSubPat[v, w] is the set of tightest
        container sub-patterns of Subtree(v, p) and Subtree(w, q).
    Output: TCSubPat[v, w].
    1)  if (TCSubPat[v, w] ≠ ∅) then
    2)      return TCSubPat[v, w];
    3)  else if (Subtree(w, q) ⊑ Subtree(v, p)) then
    4)      return {Subtree(v, p)};
    5)  else if (Subtree(v, p) ⊑ Subtree(w, q)) then
    6)      return {Subtree(w, q)};
    7)  else
    8)      Initialize R = ∅; R' = ∅; R'' = ∅;
    9)      for each v' ∈ Child(v, p) do
    10)         for each w' ∈ Child(w, q) do
    11)             R = R ∪ LUB_SUB(v', w', TCSubPat);
    12)     for each v' ∈ Child(v, p) do
    13)         R' = R' ∪ LUB_SUB(v', w, TCSubPat);
    14)     for each w' ∈ Child(w, q) do
    15)         R'' = R'' ∪ LUB_SUB(v, w', TCSubPat);
    16)     Let x be the pattern with root node label MaxLabel(v, w) and set of child subtree patterns R;
    17)     Let x' be the pattern with root node label // and set of child subtree patterns R';
    18)     Let x'' be the pattern with root node label // and set of child subtree patterns R'';
    19)     return TCSubPat[v, w] = {x, x', x''};

Figure 3: Least-Upper-Bound Computation Algorithm.
Let v_root and w_root be the roots of patterns p and q, respectively. Note that a document T that satisfies p also satisfies, for each v ∈ Child(v_root, p), the restriction of p to the root node and only Subtree(v, p). Consequently, a document T that satisfies p or q must also satisfy the pattern x consisting of a root node (with label /.) whose children are the tightest container sub-patterns for each pair Subtree(v, p) and Subtree(w, q), where v ∈ Child(v_root, p) and w ∈ Child(w_root, q). This pattern x is thus the LUB of p and q.

The main subroutine in our LUB computation (Algorithm LUB_SUB) computes the tightest container sub-patterns of p' and q' as follows. If q' ⊑ p' (resp. p' ⊑ q'), then p' (resp. q') is the tightest container sub-pattern; otherwise, the tightest container sub-patterns are a set {x, x', x''} of sub-patterns, which are defined in the following manner. The root node of x is labeled with MaxLabel(v, w), and the child subtrees of x are the tightest container sub-patterns of each child subtree of p' and each child subtree of q'. Intuitively, the root of x corresponds to the roots of p' and q' (with a label equal to the least upper bound of their labels). In other words, x preserves the positions of the corresponding nodes in p' and q'. However, this position-preserving generalization is not sufficient, since p' and q' may have common sub-patterns at different positions relative to their roots. For example, p_? and p_f in Figure 2 have a common sub-pattern rooted at a tag-node that has two specified child nodes, but this sub-pattern is located at different positions relative to the roots of p_? and p_f. To capture these off-position common sub-patterns, we need to compute x' and x''. The child subtrees of x' are the tightest container sub-patterns of q' itself and each child subtree of p'; and the label of the root node of x' is // to accommodate common sub-patterns at different positions relative to the roots of p' and q'. Similarly, the root node of x'' has label //, and the child subtrees of x'' are the tightest container sub-patterns of p' itself and each child subtree of q'. By computing the tightest container sub-patterns recursively, the algorithm computes the LUB of the input tree patterns p and q.
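The following Python sketch transcribes Figure 3 together with the pieces it relies on: the label order ≼ and MaxLabel from Section 2, the containment test of Figure 4 (CONTAINS_SUB), and a simplified MINIMIZE that applies only the sibling-deletion rule described later in this section. The PNode class, all function names, and the use of Python object identity as the TCSubPat key are illustrative assumptions rather than the authors' implementation; like Figure 4, it skips the canonical-form pre-processing for chains of //- and *-nodes, so it should be read as a sketch.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class PNode:
    label: str                                   # tag name, "*", "//", or "/." (root)
    children: list = field(default_factory=list)


def leq(a, b):
    """Label order x <= * <= // for tag names x; tags only compare to themselves."""
    return a == b or b == "//" or (b == "*" and a not in ("//", "/."))


def max_label(a, b):
    """MaxLabel: least upper bound of two labels under leq."""
    if a == b:
        return a
    if "//" in (a, b):
        return "//"
    return "*"


def contains_sub(v, w, memo):
    """CONTAINS_SUB (Figure 4): True iff Subtree(w, q) is contained in Subtree(v, p)."""
    key = (id(v), id(w))
    if key not in memo:
        if not v.children:
            res = leq(w.label, v.label)
        elif not leq(w.label, v.label):
            res = False
        else:
            res = all(any(contains_sub(vc, wc, memo) for wc in w.children)
                      for vc in v.children)
            if not res and v.label == "//":      # // mapped to an empty chain
                res = all(contains_sub(vc, w, memo) for vc in v.children)
            if not res and v.label == "//":      # // mapped to a nonempty chain
                res = any(contains_sub(v, wc, memo) for wc in w.children)
        memo[key] = res
    return memo[key]


def contains(p, q):
    """CONTAINS(p, q): True iff q is contained in p."""
    return not p.children or contains_sub(p, q, {})


def minimize(v):
    """Simplified MINIMIZE: drop any child subtree implied by a sibling, then recurse."""
    kept = []
    for c in v.children:
        if any(contains_sub(c, k, {}) for k in kept):           # k ⊑ c: c is redundant
            continue
        kept = [k for k in kept if not contains_sub(k, c, {})]  # c ⊑ k: k is redundant
        kept.append(c)
    return PNode(v.label, [minimize(c) for c in kept])


def lub_sub(v, w, memo):
    """LUB_SUB: tightest container sub-patterns of Subtree(v, p) and Subtree(w, q)."""
    key = (id(v), id(w))
    if key in memo:                                              # Steps 1-2
        return memo[key]
    if contains_sub(v, w, {}):                                   # Steps 3-4
        res = [v]
    elif contains_sub(w, v, {}):                                 # Steps 5-6
        res = [w]
    else:
        r = [x for vc in v.children for wc in w.children
             for x in lub_sub(vc, wc, memo)]                     # Steps 9-11
        r1 = [x for vc in v.children for x in lub_sub(vc, w, memo)]   # Steps 12-13
        r2 = [x for wc in w.children for x in lub_sub(v, wc, memo)]   # Steps 14-15
        res = [PNode(max_label(v.label, w.label), r),                 # x (position-preserving)
               PNode("//", r1), PNode("//", r2)]                      # x', x'' (off-position)
    memo[key] = res
    return res


def lub(p, q):
    """Algorithm LUB: a minimized tree pattern equivalent to the LUB of p and q."""
    if contains(p, q):
        return p
    if contains(q, p):
        return q
    memo = {}
    kids = [x for v in p.children for w in q.children for x in lub_sub(v, w, memo)]
    return minimize(PNode("/.", kids))


def lub_all(patterns):
    """Fold the binary LUB over a non-empty list of patterns."""
    return reduce(lub, patterns)


if __name__ == "__main__":
    p1 = PNode("/.", [PNode("a", [PNode("b")])])     # /a/b
    p2 = PNode("/.", [PNode("a", [PNode("c")])])     # /a/c
    print(lub(p1, p2))
```

For /a/b and /a/c the sketch returns the pattern /a/*; the off-position candidates x' and x'' produced along the way are all implied by it and are removed by minimize. Folding with functools.reduce relies on the LUB operator being commutative and associative, as noted after Example 3.1.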
By induction on the structures of p and q, we can show the following result [5].

Proposition 3.1: Given two tree patterns p and q, Algorithm LUB(p, q) computes p ⊔ q. □

Example 3.1 Given p_? and p_f in Figure 2, Algorithm LUB returns p_h, which is indeed p_? ⊔ p_f. To help explain the computation of p_h, we use the notation x_n to refer to the n-th node (in some tree pattern) that is labeled x, where each collection of nodes sharing the same label is ordered based on their pre-order sequence; for example, in p_h, we use //_1 and //_3 to refer to the leftmost and rightmost //-nodes, respectively. Algorithm LUB_SUB (invoked by Algorithm LUB) first extracts the position-preserving tightest container sub-patterns for Subtree(?_1, p_?) and Subtree(?, p_f), which yields the sub-pattern Subtree(?_1, p_h) (in Steps 9-11). Note that the root node of Subtree(?_1, p_h) is labeled with a tag (rather than * or //) because the root nodes of Subtree(?_1, p_?) and Subtree(?, p_f) carry the same tag. The sub-patterns Subtree(?_2, p_?) and Subtree(?, p_f), however, have quite different structures, and thus a position-preserving attempt to extract their common sub-patterns only yields Subtree(*_1, p_h). In particular, the common sub-pattern consisting of a tag-node with two specified child nodes is not captured by the above process, because it occurs at different positions relative to the root nodes of Subtree(?_2, p_?) and Subtree(?, p_f). To extract such off-position common sub-patterns, Algorithm LUB_SUB compares Subtree(?_1, p_?) with Subtree(?, p_f) and Subtree(?, p_f), and also compares Subtree(?, p_f) with Subtree(?_2, p_?) (in Steps 12-15). Indeed, this yields Subtree(//_3, p_h), which has a //-root since this common sub-pattern occurs at different positions relative to the root nodes of Subtree(?_1, p_?) and Subtree(?, p_f). It should be mentioned that both Subtree(//_1, p_h) and Subtree(//_2, p_h) are also produced by the off-position processing, as Algorithm LUB_SUB recursively processes the sub-pattern Subtree(?_2, p_?) with Subtree(?, p_f) and Subtree(?, p_f), respectively. Finally, the algorithm removes the redundant nodes in the resulting tree pattern by using a minimization algorithm (explained shortly) to generate the LUB p_h. □

It is straightforward to show that our LUB operator ⊔, considered as a binary operator, is commutative and associative, i.e., p_1 ⊔ p_2 = p_2 ⊔ p_1 and p_1 ⊔ (p_2 ⊔ p_3) = (p_1 ⊔ p_2) ⊔ p_3. As a result, Algorithm LUB can be naturally extended to compute the LUB of any set of tree patterns.

We next explain the details of the two auxiliary algorithms used in Algorithm LUB. Algorithm LUB needs to check the containment of tree patterns, which is implemented by Algorithm CONTAINS in Figure 4. Given two input tree patterns p and q, the algorithm determines whether q ⊑ p. It maintains a two-dimensional array Status, which is initialized with Status[v, w] = null to indicate that v ∈ Nodes(p) and w ∈ Nodes(q) have not been compared; otherwise, Status[v, w] ∈ {true, false}, such that Status[v, w] = true if and only if Subtree(w, q) ⊑ Subtree(v, p). Clearly, q ⊑ p if and only if Status[v_root, w_root] = true, where v_root and w_root denote the root nodes of p and q, respectively. The main subroutine in our containment algorithm is Algorithm CONTAINS_SUB.
Abstractly, CONTAINS_SUB traverses p and q top-down and updates Status[v, w] for each pair of nodes v ∈ Nodes(p) and w ∈ Nodes(q) visited, as follows. Let p' and q' denote Subtree(v, p) and Subtree(w, q), respectively. If Status[v, w] has already been computed (i.e., Status[v, w] ≠ null), then its value is returned. Otherwise, our algorithm determines whether q' ⊑ p' as follows. If label(v) ≠ //, then Status[v, w] = true iff label(w) ≼ label(v) and each child subtree of v contains some child subtree of w. Otherwise, if label(v) = //, two additional conditions need to be taken into account. This is because, unlike a *-node or a tag-name node, a //-node in a container tree pattern can also be mapped to a (possibly empty) chain of nodes in a contained tree pattern. For example, consider the tree patterns p_? and p_f in Figure 2. Note that p_f ⊑ p_?, and the //-node in p_? is not mapped to any node in p_f, in the sense that p_f would still be contained in p_? if the //-node in p_? were deleted.

    Algorithm CONTAINS(p, q)
    Input: p and q are two tree patterns.
    Output: Returns true if q ⊑ p; false otherwise.
    1)  Initialize Status[v, w] = null, ∀ v ∈ Nodes(p), ∀ w ∈ Nodes(q);
    2)  Let v_root and w_root denote the root nodes of p and q, resp.;
    3)  if (Child(v_root, p) = ∅) then
    4)      return true;
    5)  else
    6)      return CONTAINS_SUB(v_root, w_root, Status);

    Algorithm CONTAINS_SUB(v, w, Status)
    Input: v, w are nodes in tree patterns p, q (respectively); Status is a
        2-dimensional array such that each Status[v, w] ∈ {null, false, true}.
    Output: Status[v, w].
    1)  if (Status[v, w] ≠ null) then
    2)      return Status[v, w];
    3)  if (v is a leaf node in p) then
    4)      Status[v, w] = (label(w) ≼ label(v));
    5)  else if (label(w) ⋠ label(v)) then
    6)      Status[v, w] = false;
    7)  else
    8)      Status[v, w] = ⋀_{v' ∈ Child(v, p)} ( ⋁_{w' ∈ Child(w, q)} CONTAINS_SUB(v', w', Status) );
    9)      if (Status[v, w] = false) and (label(v) = //) then
    10)         Status[v, w] = ⋀_{v' ∈ Child(v, p)} CONTAINS_SUB(v', w, Status);
    11)     if (Status[v, w] = false) and (label(v) = //) then
    12)         Status[v, w] = ⋁_{w' ∈ Child(w, q)} CONTAINS_SUB(v, w', Status);
    13) return Status[v, w];

Figure 4: Tree-Pattern Containment Algorithm.
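The Python sketch below follows Figure 4 step by step and returns the Status table together with the answer, so one can check that each (v, w) pair is evaluated at most once (hence at most |p|·|q| entries). Pattern nodes are hypothetical PNode objects, and Status is a dict keyed by node identity in which a missing key stands for null; as in Figure 4, the canonical-form pre-processing for chains of //- and *-nodes is omitted. The sample patterns are made up: /a//c checked against /a/b/d/c exercises Step 12.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class PNode:
    label: str                                   # tag name, "*", "//", or "/." (root)
    children: list = field(default_factory=list)


def P(label, *children):
    """Shorthand constructor for pattern nodes."""
    return PNode(label, list(children))


def size(p):
    """|p|: number of nodes in the pattern, root included."""
    return 1 + sum(size(c) for c in p.children)


def leq(a, b):
    """Label order x <= * <= // for tag names x; tags only compare to themselves."""
    return a == b or b == "//" or (b == "*" and a not in ("//", "/."))


def contains(p, q):
    """CONTAINS(p, q) from Figure 4: returns (q is contained in p, Status table)."""
    status = {}  # (id(v), id(w)) -> bool; a missing key plays the role of null

    def contains_sub(v, w):
        key = (id(v), id(w))
        if key in status:                                # Steps 1-2
            return status[key]
        if not v.children:                               # Steps 3-4
            res = leq(w.label, v.label)
        elif not leq(w.label, v.label):                  # Steps 5-6
            res = False
        else:                                            # Step 8
            res = all(any(contains_sub(vc, wc) for wc in w.children)
                      for vc in v.children)
            if not res and v.label == "//":              # Steps 9-10: empty chain
                res = all(contains_sub(vc, w) for vc in v.children)
            if not res and v.label == "//":              # Steps 11-12: nonempty chain
                res = any(contains_sub(v, wc) for wc in w.children)
        status[key] = res
        return res

    if not p.children:                                   # Steps 3-4 of CONTAINS
        return True, status
    return contains_sub(p, q), status


if __name__ == "__main__":
    p = P("/.", P("a", P("//", P("c"))))                 # /a//c
    q = P("/.", P("a", P("b", P("d", P("c")))))          # /a/b/d/c
    ok, status = contains(p, q)
    print(ok, len(status), size(p) * size(q))
```

Python's short-circuiting all/any means some pairs are never visited at all, which only lowers the entry count below the |p|·|q| bound.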
In contrast to the p_?/p_f example, for the tree patterns p_? and p_g in Figure 2, p_g ⊑ p_? and the //-node in p_? is mapped to both the *- and ?-nodes in p_g, in the sense that Subtree(*, p_g) ⊑ Subtree(//, p_?) and Subtree(?, p_g) ⊑ Subtree(//, p_?). These two additional scenarios are handled by Steps 10 and 12 in Algorithm CONTAINS_SUB: Step 10 accounts for the case where a //-node (v itself) is mapped to an empty chain of nodes, and Step 12 for the case where a //-node (v itself) is mapped to a nonempty chain. Note that in Steps 8 and 12, the expression ⋁_{w' ∈ Child(w, q)} CONTAINS_SUB(x, w', Status) returns false if Child(w, q) = ∅. By induction on the structures of p and q, we can show the following result.

Proposition 3.2: Given two tree patterns p and q, Algorithm CONTAINS(p, q) determines whether q ⊑ p in O(|p| · |q|) time. □

The quadratic time complexity of our tree-pattern containment algorithm is due to, among other things, the fact that each pair of sub-patterns in p and q is checked at most once, because of the use of the Status array. To simplify the discussion, we have omitted from Algorithm CONTAINS certain subtle details that involve tree patterns with chains of //- and *-nodes. Such cases require some additional pre-processing to convert the tree pattern to a canonical form, but this does not increase our algorithm's time complexity.

To ensure that our tree patterns are concise, we need to identify and eliminate redundant nodes in them. Given a tree pattern p, a minimized tree pattern p' equivalent to p can be computed using a recursive algorithm MINIMIZE. Starting with the root of p, our minimization algorithm performs the following two steps to minimize the sub-pattern Subtree(v, p) rooted at node v in p: (1) for any two distinct nodes v', v'' ∈ Child(v, p), if Subtree(v', p) ⊑ Subtree(v'', p), then delete Subtree(v'', p) from Subtree(v, p); and (2) for each v' ∈ Child(v, p) that was not deleted in the first step, recursively minimize Subtree(v', p). The complete details can be found in [5].

Proposition 3.3: Algorithm MINIMIZE minimizes any tree pattern p in O(|p|²) time. □

Proposition 3.4: For any minimized tree patterns p and p', p ≡ p' iff p = p' (i.e., they are syntactically equal). □

Given the low computational complexities of CONTAINS and MINIMIZE, one might expect that this would also be the case for Algorithm LUB. Unfortunately, in the worst case, the size of the (minimized) LUB of two tree patterns can be exponentially large (see [5] for a detailed analysis). Our implementation results, however, demonstrate that our LUB algorithm exhibits reasonably low average-case complexity in practice.

4 Selectivity-based Aggregation Algorithm
While the LUB algorithm presented in the previous section can be used to compute a single, most precise aggregate tree pattern for a given set S of patterns, the size of the LUB may be too large and, therefore, may violate the specified space constraint k on the total size of the aggregate subscriptions (Section 2.2). Thus, in order to fit our aggregates within the allotted space budget, we relax the requirement of a single precise aggregate by permitting our solution to be a set S' = {p_1, p_2, ..., p_m} (instead of a single pattern), such that each pattern q ∈ S is contained in some pattern p_i ∈ S'.
Of course, we also require that S' provide the tightest containment for patterns in S for the given space constraint (Section 2.2); that is, the number of XML documents that satisfy some tree pattern in S' but not S is small. A simple measure of the preciseness of S' is its selectivity, which is essentially the fraction of filtered XML documents that satisfy some pattern in S'. Thus, our objective is to compute a set S' of aggregate patterns whose selectivity is very close to that of S.

Clearly, the selectivity of our tree patterns is highly dependent on the distribution of the underlying collection of XML documents (denoted by D). It is, however, infeasible to maintain the detailed distribution D of streaming XML documents for our aggregation: the space requirements would be enormous! Instead, our approach is based on building a concise synopsis of D on-line (i.e., as documents are streaming by), and using that synopsis to estimate (approximate) tree-pattern selectivities. At a high level, our aggregation algorithm iteratively computes a set S' that is both selective and satisfies the space constraint, starting with S' = S (i.e., the original set S of patterns), and performing the following sequence of steps in each iteration:
1. Generate a candidate set of aggregate tree patterns C consisting of patterns in S' and LUBs of similar pattern pairs in S'.
2. Prune each pattern p in C by deleting/merging nodes in p in order to reduce its size.
3. Choose a candidate p ∈ C to replace all patterns in S' that are contained in p. Our candidate-selection strategy is based on marginal gains [14]: the selected candidate p is the one that results in the minimum loss in selectivity per unit reduction in the size of S' (due to the replacement of patterns in S' by p).
Note that our pruning step (Step 2) above makes candidate aggregate patterns less selective (in addition to decreasing their size). Thus, by replacing patterns in S' by patterns in C, we are effectively trying to reduce the size of S' by giving up some of its selectivity. In the following subsections, we describe in more detail our algorithm for computing S'.
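The marginal-gain selection of Step 3, wrapped in the outer iteration, can be sketched in Python as below. As a simplifying assumption, selectivity is computed exactly from a hypothetical table `matches` that maps each pattern to the set of sample-document ids it matches, standing in for the document-tree estimates of Section 4.1; candidate generation (Step 1) and pruning (Step 2) are taken as given, and containment is supplied as a callback (for instance, a CONTAINS implementation). All names and the toy data are illustrative.

```python
def selectivity(patterns, matches, n_docs):
    """Fraction of the n_docs sample documents matched by at least one pattern."""
    docs = set()
    for p in patterns:
        docs |= matches[p]
    return len(docs) / n_docs


def choose_candidate(current, candidates, size, matches, contained_in, n_docs):
    """Step 3: the candidate with the least selectivity loss per unit of size saved."""
    base_sel = selectivity(current, matches, n_docs)
    base_size = sum(size[p] for p in current)
    best, best_ratio = None, float("inf")
    for c in candidates:
        # Replace every pattern of S' that c contains by c itself.
        new_set = [p for p in current if not contained_in(p, c)] + [c]
        saved = base_size - sum(size[p] for p in new_set)
        if saved <= 0:
            continue                      # c would not shrink S'
        loss = selectivity(new_set, matches, n_docs) - base_sel
        if loss / saved < best_ratio:
            best, best_ratio = c, loss / saved
    return best


def aggregate(patterns, candidates, size, matches, contained_in, n_docs, k):
    """Greedy loop: keep replacing patterns until the total size fits the budget k."""
    current = list(patterns)
    while sum(size[p] for p in current) > k:
        c = choose_candidate(current, candidates, size, matches, contained_in, n_docs)
        if c is None:
            break                         # no candidate reduces the size further
        current = [p for p in current if not contained_in(p, c)] + [c]
    return current


if __name__ == "__main__":
    size = {"p1": 4, "p2": 4, "u12": 5, "top": 2}
    matches = {"p1": {1, 2}, "p2": {2, 3}, "u12": {1, 2, 3}, "top": {1, 2, 3, 4, 5}}
    below = {"p1": {"p1"}, "p2": {"p2"}, "u12": {"p1", "p2", "u12"},
             "top": {"p1", "p2", "u12", "top"}}
    contained_in = lambda q, p: q in below[p]
    print(aggregate(["p1", "p2"], ["u12", "top"], size, matches, contained_in, 10, k=5))
```

With a budget of 5, the lossless upper bound u12 wins (ratio 0) over the smaller but less precise top; tightening the budget to 4 forces a second iteration that replaces u12 by top.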
We begin by presenting our approach for estimating the selectivity of tree patterns over the underlying document distribution, which is critical to choosing a good replacement candidate in Step 3 above.

4.1 Selectivity Estimation for Tree Patterns

The Document Tree Synopsis. As mentioned above, it is simply impossible to maintain the accurate document distribution $D$ (i.e., the full set of streaming documents) in order to obtain accurate selectivity estimates for our tree patterns. Instead, our approach is to approximate $D$ by a concise synopsis structure, which we refer to as the document tree. Our document tree synopsis for $D$, denoted by $DT$, captures path statistics for documents in $D$ and is built on-line as XML documents stream by. The document tree essentially has the same structure as an XML tree, except for two differences. First, the root node of $DT$ has a special label. Second, each non-root node $t$ in $DT$ has a frequency associated with it, which we denote by $\mathrm{freq}(t)$. Intuitively, if $l_1/l_2/\cdots/l_n$ is the sequence of tag names on nodes along the path from the root to $t$ (excluding the label for the root), then $\mathrm{freq}(t)$ represents the number of documents $T$ in $D$ that contain a path with tag sequence $l_1/l_2/\cdots/l_n$ originating at the root of $T$. The frequency for the root node of $DT$ is set to $N$, the number of documents in $D$.

As XML documents stream by, $DT$ is incrementally maintained as follows. For each arriving document $T$, we first construct the skeleton tree $T_s$ for document $T$. In the skeleton tree $T_s$, each node has at most one child with a given tag. $T_s$ is built from $T$ by simply coalescing two children of a node in $T$ if they share a common tag. Clearly, by traversing nodes in $T$ in a top-down fashion and coalescing
child nodes with common tags, we can construct $T_s$ from $T$ in a single pass (using an event-based XML parser). As an example, Figure 5(d) depicts the skeleton tree for the XML-document tree in Figure 5(a). Next, we use $T_s$ to update the statistics maintained in our document tree synopsis $DT$ as follows. For each path in $T_s$, with tag sequence, say, $l_1/l_2/\cdots/l_n$, let $t$ be the last node on the corresponding (unique) path in $DT$. We increment $\mathrm{freq}(t)$ by 1. Figure 5(e) shows the document tree (with node frequencies) for the XML trees $T_1$, $T_2$, and $T_3$ in Figures 5(a) to (c).

[Figure 5: Example Documents, Skeleton Tree, Document Tree, and Patterns. Panels: (a) T1, (b) T2, (c) T3, (d) Skeleton tree for T1, (e) Document Tree, (f) Compressed Document Tree, (g) p1, (h) p2, (i) p3.]

Note that it is possible to further compress $DT$ by using techniques similar in spirit to the methods employed by Aboulnaga et al. [1] for summarizing path trees. The key idea is to merge nodes with the lowest frequencies and store, with each merged node, the average of the original frequencies of the nodes in $DT$ that were merged. This is illustrated in Figure 5(f) for the document tree in Figure 5(e), with a special label used to indicate merged nodes. Due to space constraints, in the remainder of this subsection we only present solutions to the selectivity estimation problem using the uncompressed tree $DT$. However, our proposed methods can easily be extended to work even when $DT$ is compressed [5].

We should note here that our selectivity estimation problem for tree patterns differs from the work of Aboulnaga et al. [1] in two important respects. First, in [1], the authors consider the problem of estimating selectivity only for simple paths that consist of a //-node followed by tag nodes. In contrast, we estimate selectivities of general tree patterns with branches, and with *- or //-nodes arbitrarily distributed in the tree. Second, we are interested in selectivity at the granularity of documents, so our goal is to estimate the number of XML documents that match a tree pattern; instead, [1] addresses the selectivity problem at the granularity of individual document elements that are discovered by a path.
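The maintenance procedure for $DT$ can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the paper's implementation: documents are parsed with the standard-library `xml.etree.ElementTree` instead of an event-based parser, the helper names (`DTNode`, `skeleton`, `add_document`, `path_freq`) are hypothetical, and the synopsis is kept uncompressed. Building the skeleton first guarantees that each distinct root-anchored tag path of a document is counted once, however often it occurs in that document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DTNode:
    """A node of the document tree synopsis DT (hypothetical representation)."""
    freq: int = 0
    children: dict = field(default_factory=dict)  # tag -> DTNode


def skeleton(elems):
    """Coalesce the children of all elements in `elems` by tag, recursively.

    Returns a nested dict tag -> sub-skeleton, so every node has at most one
    child per tag, as in the skeleton tree T_s.
    """
    groups = defaultdict(list)
    for e in elems:
        for child in e:
            groups[child.tag].append(child)
    return {tag: skeleton(kids) for tag, kids in groups.items()}


def add_document(dt_root, xml_text):
    """Update DT with one arriving document: each distinct root-anchored tag
    path of the document increments the frequency of its DT node once."""
    root = ET.fromstring(xml_text)
    dt_root.freq += 1  # the root frequency counts documents (N)

    def walk(dt_node, sk):
        for tag, sub in sk.items():
            child = dt_node.children.setdefault(tag, DTNode())
            child.freq += 1
            walk(child, sub)

    walk(dt_root, {root.tag: skeleton([root])})


def path_freq(dt_root, tags):
    """Number of documents containing the root-anchored path l1/l2/.../ln."""
    node = dt_root
    for tag in tags:
        node = node.children.get(tag)
        if node is None:
            return 0
    return node.freq


# Usage on three tiny (made-up) documents; the first contains a/b twice
# but is counted only once for that path.
dt = DTNode()
for doc in ["<a><b/><b><c/></b></a>", "<a><b><c/></b></a>", "<x><b/></x>"]:
    add_document(dt, doc)
print(dt.freq, path_freq(dt, ["a", "b", "c"]))
```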
Estimating per-document selectivities of tree patterns and per-element selectivities of simple paths are thus two very different problems.

Selectivity Estimation Procedure. Recall that the selectivity of a tree pattern $p$ is the fraction of documents $T$ in $D$ that satisfy $p$. By construction, our $DT$ synopsis gives accurate selectivity estimates for tree patterns comprising a single chain of tag-nodes (i.e., with no * or //). However, obtaining accurate selectivity estimates for arbitrary tree patterns with branches, *, and // is, in general, not possible with $DT$ summaries. This is because, while $DT$ captures the number of documents containing a single path, it does not store document identities. As a result, for a pair of arbitrary paths in a tree pattern, it is impossible to determine the exact number of documents that contain both paths, or that contain one path but not the other. Our estimation procedure addresses this problem by making the following simplifying assumption: the distribution of each path in a tree pattern is independent of other paths. Under this assumption, we estimate the selectivity of a tree pattern containing no // or * labels simply as the product of the selectivities of its root-to-leaf paths. For patterns containing // or *, we consider all possible instantiations of // and * with element tags, and then choose as our pattern selectivity the maximum selectivity value over all instantiations. (This is similar to the definition of the fuzzy OR operator in fuzzy logic [13].) We illustrate our selectivity estimation methodology in the following example.

Example 4.1: Consider the problem of estimating the selectivities of the tree patterns shown in Figures 5(g) to (i) using the document tree shown in Figure 5(e). The total number of documents, $N$, is 3. Clearly, the number of documents satisfying pattern $p_1$, which consists of a single path, can be estimated accurately by following the path in $DT$ and returning the frequency of the node at the end of the path. Thus, the selectivity of $p_1$ is $2/3$, which is accurate since only documents $T_2$ and $T_3$ satisfy $p_1$. Estimating the number of documents containing pattern $p_2$, however, is somewhat more tricky.
This is because there are two paths in $DT$ that match $p_2$, both starting with x and corresponding to the two possible instantiations of its //-node (with x, and with x followed by one further tag). Summing the frequencies of the two nodes at the end of these paths gives an answer of 4, which over-estimates the number of documents satisfying $p_2$ (only documents $T_2$ and $T_3$ satisfy $p_2$). To avoid double-counting frequencies, we estimate the number of documents satisfying $p_2$ to be the maximum (and not the sum) of the frequencies over all paths in $DT$ that match $p_2$. Thus, the selectivity of $p_2$ is estimated as $2/3$. Finally, the selectivity of $p_3$ is computed by considering all possible instantiations of // and * and choosing the one with the maximum selectivity. The two instantiations of // that result in non-zero selectivities are x, and x followed by one further tag; for each of them, * can be instantiated with either of two tags. Choosing // = x, together with the appropriate instantiation of *, results in the maximum selectivity, since the product of the selectivities of the two resulting root-to-leaf paths is maximum and equals $(3/3) \cdot (2/3) = 2/3$. □

Algorithm SEL (depicted in Figure 6), invoked with input parameters $v = v_{root}$ (the root of pattern $p$) and $t = t_{root}$ (the root of $DT$), computes the selectivity for an arbitrary tree
pattern $p$ in $O(|DT| \cdot |p|)$ time.

    Algorithm SEL(v, t)
    Input: v is a node in tree pattern p, t is a node in DT.
    Output: SelSubPat[v, t].
    1)  if (SelSubPat[v, t] is already computed) then
    2)      return SelSubPat[v, t];
    3)  else if (label(t) ⋠ label(v)) then
    4)      return SelSubPat[v, t] = 0;
    5)  else if (v is a leaf) then
    6)      return freq(t)/N;
    7)  for each child v' ∈ Child(v, p) do
    8)      Sel_v' = max_{t' ∈ Child(t, DT)} {SEL(v', t')};
    9)  Sel = ∏_{v' ∈ Child(v, p)} Sel_v';
    10) if (label(v) = //) then
    11)     Sel_v = ∏_{v' ∈ Child(v, p)} SEL(v', t);
    12)     Sel = max{Sel, Sel_v};
    13)     Sel_v = max_{t' ∈ Child(t, DT)} {SEL(v, t')};
    14)     Sel = max{Sel, Sel_v};
    15) return SelSubPat[v, t] = Sel;

Figure 6: Tree Pattern Selectivity Estimation Algorithm.

In the algorithm, for nodes $v \in p$ and $t \in DT$, $\mathrm{SelSubPat}[v, t]$ stores the selectivity of the sub-pattern $\mathrm{Subtree}(v, p)$ with respect to the subtree of $DT$ rooted at node $t$. This selectivity is estimated similarly to the selectivity for pattern $p$, except that we now consider all instantiations of $\mathrm{Subtree}(v, p)$ (obtained by instantiating // and * with element tags), and the selectivity of each instantiation is computed with respect to $t$ as the root instead of the root of $DT$. For instance, suppose that $v$ is the node of $p_3$ (in Figure 5(i)) whose two children are the *-node and a tag-node, and $t$ is the corresponding child node of the x-node in $DT$ (in Figure 5(e)). Then, the selectivity of $\mathrm{Subtree}(v, p_3)$ with respect to $t$ is essentially the product of the selectivities of its two paths (through the *-node and through the tag-node) with respect to node $t$, which is $1 \cdot (2/3)$. Thus, $\mathrm{SelSubPat}[v, t] = 2/3$.

Our goal is to compute $\mathrm{SelSubPat}[v_{root}, t_{root}]$. For a pair of nodes $v$ and $t$, Algorithm SEL computes $\mathrm{SelSubPat}[v, t]$ from the $\mathrm{SelSubPat}[\cdot]$ values for the children of $v$ and $t$. Clearly, if $\mathrm{label}(t) \not\preceq \mathrm{label}(v)$ (Steps 3-4 of the algorithm), then every path in $\mathrm{Subtree}(v, p)$ begins with a label different from $\mathrm{label}(t)$, and thus the selectivity of each of the paths is 0. If $\mathrm{label}(t) \preceq \mathrm{label}(v)$ and $v$ is a leaf (Steps 5-6), then we simply instantiate $\mathrm{label}(v)$ (if $\mathrm{label}(v)$ is // or *) with $\mathrm{label}(t)$, giving a selectivity of $\mathrm{freq}(t)/N$.
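The recursion of Figure 6 can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: the classes `PNode` and `DTNode`, the use of "/" as the root label of both the pattern and $DT$, and memoization keyed on node identity are choices made for the example, and the tiny synopsis at the end is made-up data. A pattern label matches a $DT$ tag when the two are equal or the pattern label is * or //.

```python
from dataclasses import dataclass, field
from math import prod

WILDCARDS = {"*", "//"}


@dataclass(eq=False)
class PNode:
    """Tree-pattern node; label is a tag, '*', '//', or '/' for the root."""
    label: str
    children: list = field(default_factory=list)


@dataclass(eq=False)
class DTNode:
    """Document-tree node with its tag and document frequency."""
    label: str
    freq: int
    children: list = field(default_factory=list)


def matches(t_label, v_label):
    """label(t) ⪯ label(v): equal labels, or v is a wildcard."""
    return v_label in WILDCARDS or t_label == v_label


def sel(v, t, n_docs, memo=None):
    """Estimated SelSubPat[v, t] for Subtree(v, p) w.r.t. the DT node t."""
    if memo is None:
        memo = {}
    key = (id(v), id(t))
    if key in memo:                               # Steps 1-2
        return memo[key]
    if not matches(t.label, v.label):             # Steps 3-4
        memo[key] = 0.0
        return 0.0
    if not v.children:                            # Steps 5-6
        return t.freq / n_docs
    # Steps 7-9: each child of v takes its best child of t (independence).
    best = prod(max((sel(c, tc, n_docs, memo) for tc in t.children), default=0.0)
                for c in v.children)
    if v.label == "//":                           # Steps 10-14
        # '//' is null: the children of v are matched at t itself.
        best = max(best, prod(sel(c, t, n_docs, memo) for c in v.children))
        # '//' covers t and continues further down the synopsis.
        best = max(best, max((sel(v, tc, n_docs, memo) for tc in t.children),
                             default=0.0))
    memo[key] = best                              # Step 15
    return best


# Hypothetical synopsis for N = 3 documents:
#   /(3) -> a(2) -> b(2) -> c(2),  a(2) -> d(1),  /(3) -> x(1) -> b(1)
dt = DTNode("/", 3, [
    DTNode("a", 2, [DTNode("b", 2, [DTNode("c", 2)]), DTNode("d", 1)]),
    DTNode("x", 1, [DTNode("b", 1)]),
])
p_desc = PNode("/", [PNode("//", [PNode("c")])])              # //c
p_branch = PNode("/", [PNode("a", [PNode("b"), PNode("d")])])  # /a[b][d]
print(sel(p_desc, dt, 3), sel(p_branch, dt, 3))  # about 2/3 and 2/9
```

The branching pattern shows the independence assumption at work: its estimate is the product $(2/3) \cdot (1/3)$ of its two path selectivities, while the //-pattern takes the maximum over the ways of instantiating //.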
If $v$ is instead an internal node of $p$, then in addition to instantiating $\mathrm{label}(v)$ with $\mathrm{label}(t)$, we also need to compute, for every child $v'$ of $v$, the instantiation of $\mathrm{Subtree}(v', p)$ that has the maximum selectivity with respect to some child $t'$ of $t$. Since $\mathrm{SelSubPat}[v', t']$ is the selectivity of $\mathrm{Subtree}(v', p)$ with respect to $t'$, the product of $\max_{t' \in \mathrm{Child}(t, DT)} \mathrm{SelSubPat}[v', t']$ over the children $v'$ of $v$ gives the selectivity of $\mathrm{Subtree}(v, p)$ with respect to $t$ (Steps 7-9). Finally, if $\mathrm{label}(v) = //$, then // can simply be null, in which case the selectivity of $\mathrm{Subtree}(v, p)$ with respect to $t$ is computed as described in Step 11, or // is instantiated to a sequence consisting of $\mathrm{label}(t)$ followed by $\mathrm{label}(t')$, where $t'$ is the child of $t$ such that the selectivity of $\mathrm{Subtree}(v, p)$ with respect to $t'$ is maximized (Step 13). Observe that, in Steps 8 and 13, if $t$ has no children, then $\max_{t' \in \mathrm{Child}(t, DT)}\{\ldots\}$ evaluates to 0.

4.2 Tree Pattern Aggregation Algorithm

We are now ready to present our greedy heuristic algorithm for the tree pattern aggregation problem defined in Section 2.2 (which is, in general, an NP-hard clustering problem [5]). As described earlier, to aggregate an input set of tree patterns $S$ into a space-efficient and precise set, our algorithm (Algorithm AGGREGATE in Figure 7) iteratively prunes the tree patterns in $S$ by replacing a small subset of tree patterns with a more concise upper-bound aggregate pattern, until $S$ satisfies the given space constraint. During each iteration, our algorithm first generates a small set of potential candidate aggregate patterns $C$, and selects from these the (locally) best candidate pattern, i.e., the candidate that maximizes the gain in space while minimizing the expected loss in selectivity.

    Algorithm AGGREGATE(S, k)
    Input: S is a set of tree patterns, k is a space constraint.
    Output: A set of tree patterns S' such that S ⊑ S' and Σ_{p ∈ S'} |p| ≤ k.
    1) Initialize S' = S;
    2) while (Σ_{p ∈ S'} |p| > k) do
    3)     C1 = {x | x = PRUNE(p, |p| - 1), p ∈ S'};
    4)     C2 = {x | x = PRUNE(p ⊔ q, |p| + |q| - 1), p, q ∈ S'};
    5)     C = C1 ∪ C2;
    6)     Select x ∈ C such that Benefit(x) is maximum;
    7)     S' = (S' - {p | p ⊑ x, p ∈ S'}) ∪ {x};
    8) return S';

Figure 7: Tree Pattern Aggregation Algorithm.
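To make the control flow of Figure 7 concrete, the following sketch runs the same greedy loop on a deliberately simplified toy model rather than on real tree patterns. The modelling choices are ours and purely illustrative: a pattern is a set of required tags (a document matches it if it contains all of them), so $p \sqsubseteq q$ iff $q \subseteq p$; the LUB $p \sqcup q$ is the intersection; $|p|$ is the number of tags; a pattern's selectivity is the product of hypothetical per-tag document fractions (the independence assumption of Section 4.1); and `prune` keeps the rarest tags. The benefit follows the $\mathrm{Benefit}(x)$ ratio given under Candidate Selection; treating zero selectivity loss as infinite benefit and a non-positive space saving as no benefit are our own conventions.

```python
from itertools import combinations
from math import inf, prod

# Toy model (hypothetical): a pattern is a frozenset of required tags.


def contained(p, q):
    """p ⊑ q: every document matching p also matches q."""
    return q <= p


def size(p):
    return len(p)


def selectivity(p, freq):
    """Independence assumption: product of per-tag document fractions."""
    return prod(freq[t] for t in p)


def prune(p, n, freq):
    """Keep the n most selective (rarest) tags; the result contains p."""
    return frozenset(sorted(p, key=lambda t: (freq[t], t))[:max(n, 0)])


def benefit(x, current, freq):
    """Space saved per unit of selectivity lost if x replaces what it covers."""
    covered = [p for p in current if contained(p, x)]
    saved = sum(size(p) for p in covered) - size(x)
    if saved <= 0:
        return -inf
    loss = selectivity(x, freq) - max(selectivity(p, freq) for p in covered)
    return inf if loss <= 0 else saved / loss


def aggregate(patterns, k, freq):
    """Greedy loop of Figure 7 on the toy model."""
    current = set(patterns)
    while sum(size(p) for p in current) > k:
        c1 = {prune(p, size(p) - 1, freq) for p in current}
        c2 = {prune(p & q, size(p) + size(q) - 1, freq)
              for p, q in combinations(current, 2)}
        best = max(c1 | c2, key=lambda x: benefit(x, current, freq))
        current = {p for p in current if not contained(p, best)} | {best}
    return current


freq = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.2, "e": 0.3, "f": 0.4}
subs = [frozenset("abc"), frozenset("abd"), frozenset("ef")]
out = aggregate(subs, 5, freq)
print(sorted("".join(sorted(p)) for p in out))
```

Every candidate covers at least the pattern it was derived from, and a candidate pruned from a non-empty pattern always saves at least one unit of space, so in this toy setting the total size strictly decreases and the loop terminates.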
Candidate Generation. We now explain the process for generating the candidate set $C$ in Steps 3-5 of Algorithm AGGREGATE. To reduce the size of individual candidate patterns of the form $p$ or $p \sqcup q$, each candidate is pruned by invoking Algorithm PRUNE (details in [5]). Given an input pattern $p$ and a space constraint $n$, Algorithm PRUNE prunes $p$ to a smaller tree pattern $p'$ such that $p \sqsubseteq p'$ and $|p'| \le n$. The algorithm treats tag-nodes as more selective than *- and //-nodes, and therefore tries to prune away *- and //-nodes before the tag-nodes. Specifically, the algorithm first prunes the *- and //-nodes in $p$ by (1) replacing each adjacent pair of non-tag-nodes $v, w$ with a single //-node, if $w$ is the only child of $v$, and (2) eliminating subtrees that consist only of non-tag-nodes. If the tree pattern is still not small enough after the pruning of the non-tag-nodes, we start pruning the tag-nodes. There are two ways to reduce the size of a tree pattern $p$ by one node. The first is to delete some leaf node in $p$, and the second is to collapse two nodes $v$ and $w$ into a single //-node, where $\mathrm{label}(v) \ne /$ and $\mathrm{Child}(v, p) = \{w\}$. To help select a good leaf node to delete (or pair of nodes to collapse), we make use of the selectivity of the tag names. More specifically, we use our document tree synopsis $DT$ to estimate the total number of occurrences of each tag name in the document collection $D$, and then choose the tags with higher total frequencies (which are less selective) as candidates for pruning.
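A simplified version of the PRUNE step is sketched below. It is an illustration under our own assumptions rather than the algorithm of [5]: the `PNode` class and the `tag_freq` dictionary (standing in for the tag-occurrence totals estimated from $DT$) are hypothetical, and for tag-nodes only leaf deletion is implemented (collapsing a node pair into a //-node is omitted). Each transformation only generalizes the pattern, so the result $p'$ still satisfies $p \sqsubseteq p'$.

```python
import copy
from dataclasses import dataclass, field

WILD = {"*", "//"}


@dataclass(eq=False)
class PNode:
    """Tree-pattern node: a tag, '*', '//', or '/' for the root."""
    label: str
    children: list = field(default_factory=list)


def size(v):
    return 1 + sum(size(c) for c in v.children)


def collapse_wild_chains(v):
    """(1) Merge a non-tag node with its only child, if that is non-tag too, into '//'."""
    for c in v.children:
        collapse_wild_chains(c)
    while v.label in WILD and len(v.children) == 1 and v.children[0].label in WILD:
        v.label, v.children = "//", v.children[0].children


def only_wild(v):
    return v.label in WILD and all(only_wild(c) for c in v.children)


def drop_wild_subtrees(v):
    """(2) Remove subtrees made up solely of non-tag nodes."""
    v.children = [c for c in v.children if not only_wild(c)]
    for c in v.children:
        drop_wild_subtrees(c)


def leaves(v):
    """Yield (parent, leaf) pairs below v."""
    for c in v.children:
        if c.children:
            yield from leaves(c)
        else:
            yield v, c


def prune(p, n, tag_freq):
    """Return a generalization of p with at most n nodes (the root is kept)."""
    p = copy.deepcopy(p)
    collapse_wild_chains(p)
    drop_wild_subtrees(p)
    while size(p) > n and p.children:
        # Delete the leaf whose tag occurs most often, i.e. is least selective.
        parent, leaf = max(leaves(p), key=lambda pl: tag_freq.get(pl[1].label, 0))
        parent.children = [c for c in parent.children if c is not leaf]
        collapse_wild_chains(p)  # a deletion may expose new non-tag chains
        drop_wild_subtrees(p)
    return p


# /a[*/*/b][c] with hypothetical tag totals: */* becomes //, then c is dropped.
p = PNode("/", [PNode("a", [PNode("*", [PNode("*", [PNode("b")])]), PNode("c")])])
tag_freq = {"a": 10, "b": 3, "c": 8}
q = prune(p, 4, tag_freq)  # yields /a//b
```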

Candidate Selection. Once the set of candidate aggregate patterns has been generated, we need some criterion for selecting the best candidate to insert into $S'$. For this purpose, we associate a benefit value with each candidate aggregate pattern $x \in C$, denoted by $\mathrm{Benefit}(x)$, based on its marginal gain [14]; that is, we define $\mathrm{Benefit}(x)$ as the ratio of the savings in space to the loss in selectivity of using $x$ instead of $\{p \mid p \sqsubseteq x, p \in S'\}$. More formally, if $v_{x_{root}}$, $t_{root}$, and $v_{p_{root}}$ represent the root nodes of $x$, $DT$, and $p \in S'$, respectively, then $\mathrm{Benefit}(x)$ is equal to:

$$\mathrm{Benefit}(x) = \frac{\sum_{p \sqsubseteq x,\, p \in S'} |p| \;-\; |x|}{\mathrm{SEL}(v_{x_{root}}, t_{root}) \;-\; \max_{p \sqsubseteq x,\, p \in S'} \mathrm{SEL}(v_{p_{root}}, t_{root})}$$

Note that we compute the selectivity loss by comparing the selectivity of the candidate aggregate pattern $x$ with that of the least selective pattern contained in it. This gives a good approximation of the selectivity loss in cases where the patterns $p, q \in S'$ used to generate $x$ are similar and overlap in the document tree $DT$. The candidate aggregate pattern with the highest benefit value is chosen to replace the patterns contained in it in $S'$ (Steps 6-7).

5 Experimental Study

To verify the effectiveness of our tree pattern aggregation algorithms, we have conducted an extensive performance study using real-life DTDs and large numbers of tree patterns. Our results indicate that our proposed aggregation techniques achieve significant reductions in the number as well as the total size of tree patterns, with minimal loss in selectivity.

5.1 Experimental Testbed and Methodology

Our general methodology for evaluating the effectiveness of a pattern aggregation algorithm $A$ is as follows. Given a large input set of tree patterns $S$ and a space constraint $k$, we use $A$ to compute a set of aggregate patterns $S'$ for $S$, where $S \sqsubseteq S'$ and $\sum_{p \in S'} |p| \le k$ (our space constraint is expressed in terms of number of nodes, since patterns can be arbitrarily large). We then measure the loss in precision when using $S'$ instead of $S$ to filter XML documents. Observe that when $k = 1$, $S'$ contains a single container pattern (//). To measure the loss in precision of the aggregate set $S'$, we use a subset $D'$ of a representative set of XML documents, such that no document in $D'$ matches any tree pattern in our initial pattern set $S$.
The reason, of course, is that XML documents that match $S$ are also guaranteed to match $S'$, so they are unlikely to affect our precision-loss measurements. As $S'$ becomes less precise, some documents in $D'$ will be erroneously reported as matches. Let $\mathrm{Matches}(D', S')$ be the number of documents in $D'$ that match $S'$; the loss in precision of $S'$ over $S$ can then be estimated as $\mathrm{SelLoss}(S', S) = \mathrm{Matches}(D', S')/|D'|$. An aggregation algorithm is obviously more effective if $\mathrm{SelLoss}(S', S)$ remains small as $\sum_{p \in S'} |p|$ decreases.

XML Documents. We use two real-life DTDs to generate our XML document data set. The first one, the Extensible Hypertext Markup Language (XHTML) DTD [7], is a reformulation of HTML as an XML application and is arguably the document type most widely used over the Internet. The XHTML DTD (version 1.0) contains 77 elements with 1377 attributes. The second DTD, the News Industry Text Format (NITF) DTD [8], is supported by most of the world's major news agencies. The NITF DTD (version 2.5) contains 123 elements with 513 attributes. We generate our data set of XML documents using IBM's XML Generator tool [11]. Both the XHTML and NITF DTDs contain recursive structures, which can be nested to produce XML documents with an arbitrary number of levels. We added the option of generating documents skewed according to a Zipf distribution [18], where some tag names appear more frequently than others, as is generally the case with real-life data. For each DTD and each skew value $\theta_D \in \{0, 1, 2\}$, we generate two disjoint sets of 5 XML documents with approximately 1 nodes and 1 levels on average. The first set corresponds to the collection of XML documents used to construct the document tree $DT$ for selectivity estimation; the second set is used to measure the loss in precision of the aggregation algorithms. Both sets were generated with the same parameters and thus can be expected to have similar distributions. In each experiment, we use the combined XML documents for both the XHTML and NITF DTDs, i.e., we use a total of 1 documents for the document tree $DT$, and (a different) 1 documents for measuring the loss in precision.

XPath Expressions.
To generate the set of tree patterns $S$, we implemented an XPath expression generator that takes a DTD as input and creates a set of valid XPath expressions based on a set of parameters that control: (1) the maximum height $h$ of the tree patterns; (2) the probabilities $p_*$ and $p_{//}$ of having a wildcard * or descendant // operator at a node of a tree pattern; (3) the probability $p_h$ of having more than one child at a given node; and (4) the skew $\theta_S$ of the Zipf distribution used for selecting element tag names. For each DTD and each skew value $\theta_S \in \{0, 1, 2\}$, we generate a set of 5 tree patterns with $h = 1$ and $p_* = p_{//} = p_h = 0.1$. Each experiment was run with tree patterns from both the XHTML and NITF DTDs, i.e., 1 tree patterns, which amounted to more than 1 nodes.

Algorithms. We compare two different aggregation algorithms in our experiments. The first ("naive") algorithm, PRUNE, is based on simple node pruning and works as follows. At each iteration, it selects the tree pattern $p_{max}$ from $S'$ with the largest number of tag-nodes, collapses multiple *- and //-nodes, and deletes a prunable node (i.e., a leaf node or a node located next to //-nodes) with the highest frequency (i.e., the least selective) in the document tree $DT$. If there is already a tree pattern identical to the pruned pattern, then the duplicate is removed from $S'$. The algorithm iterates until the space constraint is satisfied. The second algorithm, AGGR, is our greedy tree pattern aggregation algorithm (from Figure 7) with both candidate generation and selection (based on maximizing the benefit). Our experiments were conducted on an 866 MHz Intel Pentium III
machine with 512 MB of main memory running Linux. Both algorithms complete the aggregation of 1 tree patterns in approximately 1 minutes.

[Figure 8: Evaluation of the Aggregation Algorithms. Panels: (a) Varying $\theta_D$ ($\theta_S = 0$), (b) Varying $\theta_S$ ($\theta_D = 0$), (c) Varying $\theta_S$ and $\theta_D$. Each panel plots Selectivity Loss (%) against Number of nodes (x1,000) for PRUNE and AGGR at skew values 0, 1, and 2.]

5.2 Experimental Results

We first compare the performance of the two aggregation algorithms by varying the skew of element tags in the XML documents and in the XPath expressions. We ran the experiments with no skew, with skewed XML documents, with skewed XPath expressions, and with skew in both the XML documents and the XPath expressions. In the last case, we skew the distributions of element names in opposite directions (applying the same skew to both the XML documents and the XPath expressions would yield results similar to those with no skew). The experimental results are shown in Figures 8(a), 8(b), and 8(c), where the space constraint, expressed in terms of the number of nodes, is varied along the x-axis, and the y-axis indicates the observed loss in selectivity for a given space constraint, i.e., the percentage of XML documents that are erroneously reported as matches. We also measure the benefits of aggregation in terms of filtering performance, using the XTrie matching algorithm described in [6]. Since the cost of filtering in XTrie grows linearly with the number of XPath expressions, we expect to observe a significant improvement in filtering speed as the cardinality of $S'$ decreases.

Non-skewed workload. When neither the XML data nor the tree patterns contain skew (i.e., $\theta_D = \theta_S = 0$), the AGGR algorithm can aggregate tree patterns down to 15% of their original size with only 25% loss in precision (the results for non-skewed data are reported in all graphs of Figure 8).
In contrast, the precision of the PRUNE algorithm starts to degrade much sooner, and the loss in precision reaches almost 1% at 25% of the initial space. The better performance of AGGR can be attributed to three main factors: (1) the upper-bound computation generates good candidates with few nodes and little loss in precision, (2) the selectivity-based heuristics help to detect and discard candidates that correspond to patterns with low selectivity (i.e., frequently occurring for a given DTD), and (3) the covering computation enables redundant tree patterns to be eliminated early.

Skewed XML documents. Real-world XML documents are generally not uniformly distributed among the valid XML data for a given DTD. When XML documents are skewed (Figure 8(a)), we observe that the effectiveness of the AGGR algorithm increases. The reason for this is that, as data becomes more skewed, the XML documents tend to form clusters, with documents within a cluster being more similar than those in different clusters; this, in turn, improves the accuracy of selectivity estimation. The PRUNE algorithm also benefits from the skew (although to a lesser extent) because of its frequency-based pruning heuristic.

Skewed tree patterns. We also observe a significant improvement in our aggregation algorithm when the element names of tree patterns are skewed (Figure 8(b)). Indeed, the skew induces clustering of patterns such that similar tree patterns are grouped into the same cluster, which consequently increases the proportion of patterns that develop containment relationships. This permits the aggregation algorithm to reduce the size of $S'$ with minimal loss of selectivity, by computing tighter upper-bound patterns and discarding covered patterns.

Skewed workload. The two aggregation algorithms perform best when both the XML data and the tree patterns are skewed in different directions (Figure 8(c)). With high skew values, there is little overlap between the element names of the XML documents and those of the tree patterns, and AGGR remains highly selective with only a few hundred nodes. The PRUNE algorithm also exhibits significant improvements and maintains 5% selectivity even after the original number of nodes is reduced to less than a third.

Filtering speed.
As mentioned previously, the cost of matching tree patterns against incoming XML documents is proportional to the number of tree patterns. Since AGGR generates candidates by computing upper bounds, the candidates cover more patterns and, as a result, the number of patterns in $S'$ shrinks faster with AGGR. Figure 9 shows that the average filtering time per document decreases faster (as space is increased) for AGGR than for the PRUNE algorithm. Our aggregation algorithm is therefore more effe-


More information

Lecture 3. XML Into RDBMS. XML and Databases. Memory Representations. Memory Representations. Traversals and Pre/Post-Encoding. Memory Representations

Lecture 3. XML Into RDBMS. XML and Databases. Memory Representations. Memory Representations. Traversals and Pre/Post-Encoding. Memory Representations Leture XML into RDBMS XML n Dtses Sestin Mneth NICTA n UNSW Leture XML Into RDBMS CSE@UNSW -- Semester, 00 Memory Representtions Memory Representtions Fts DOM is esy to use, ut memory hevy. in-memory size

More information

Factorising FACTORISING.

Factorising FACTORISING. Ftorising FACTORISING www.mthletis.om.u Ftorising FACTORISING Ftorising is the opposite of expning. It is the proess of putting expressions into rkets rther thn expning them out. In this setion you will

More information

Logic, Set Theory and Computability [M. Coppenbarger]

Logic, Set Theory and Computability [M. Coppenbarger] 14 Orer (Hnout) Definition 7-11: A reltion is qusi-orering (or preorer) if it is reflexive n trnsitive. A quisi-orering tht is symmetri is n equivlene reltion. A qusi-orering tht is nti-symmetri is n orer

More information

NON-DETERMINISTIC FSA

NON-DETERMINISTIC FSA Tw o types of non-determinism: NON-DETERMINISTIC FS () Multiple strt-sttes; strt-sttes S Q. The lnguge L(M) ={x:x tkes M from some strt-stte to some finl-stte nd ll of x is proessed}. The string x = is

More information

CIT 596 Theory of Computation 1. Graphs and Digraphs

CIT 596 Theory of Computation 1. Graphs and Digraphs CIT 596 Theory of Computtion 1 A grph G = (V (G), E(G)) onsists of two finite sets: V (G), the vertex set of the grph, often enote y just V, whih is nonempty set of elements lle verties, n E(G), the ege

More information

Finite State Automata and Determinisation

Finite State Automata and Determinisation Finite Stte Automt nd Deterministion Tim Dworn Jnury, 2016 Lnguges fs nf re df Deterministion 2 Outline 1 Lnguges 2 Finite Stte Automt (fs) 3 Non-deterministi Finite Stte Automt (nf) 4 Regulr Expressions

More information

A Disambiguation Algorithm for Finite Automata and Functional Transducers

A Disambiguation Algorithm for Finite Automata and Functional Transducers A Dismigution Algorithm for Finite Automt n Funtionl Trnsuers Mehryr Mohri Cournt Institute of Mthemtil Sienes n Google Reserh 51 Merer Street, New York, NY 1001, USA Astrt. We present new ismigution lgorithm

More information

Technische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution

Technische Universität München Winter term 2009/10 I7 Prof. J. Esparza / J. Křetínský / M. Luttenberger 11. Februar Solution Tehnishe Universität Münhen Winter term 29/ I7 Prof. J. Esprz / J. Křetínský / M. Luttenerger. Ferur 2 Solution Automt nd Forml Lnguges Homework 2 Due 5..29. Exerise 2. Let A e the following finite utomton:

More information

Welcome. Balanced search trees. Balanced Search Trees. Inge Li Gørtz

Welcome. Balanced search trees. Balanced Search Trees. Inge Li Gørtz Welome nge Li Gørt. everse tehing n isussion of exerises: 02110 nge Li Gørt 3 tehing ssistnts 8.00-9.15 Group work 9.15-9.45 isussions of your solutions in lss 10.00-11.15 Leture 11.15-11.45 Work on exerises

More information

Technology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework

Technology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework R-17 SASIMI 015 Proeeings Tehnology Mpping Metho for Low Power Consumption n High Performne in Generl-Synhronous Frmework Junki Kwguhi Yukihie Kohir Shool of Computer Siene, the University of Aizu Aizu-Wkmtsu

More information

Laboratory for Foundations of Computer Science. An Unfolding Approach. University of Edinburgh. Model Checking. Javier Esparza

Laboratory for Foundations of Computer Science. An Unfolding Approach. University of Edinburgh. Model Checking. Javier Esparza An Unfoling Approh to Moel Cheking Jvier Esprz Lbortory for Fountions of Computer Siene University of Einburgh Conurrent progrms Progrm: tuple P T 1 T n of finite lbelle trnsition systems T i A i S i i

More information

Surds and Indices. Surds and Indices. Curriculum Ready ACMNA: 233,

Surds and Indices. Surds and Indices. Curriculum Ready ACMNA: 233, Surs n Inies Surs n Inies Curriulum Rey ACMNA:, 6 www.mthletis.om Surs SURDS & & Inies INDICES Inies n surs re very losely relte. A numer uner (squre root sign) is lle sur if the squre root n t e simplifie.

More information

CS 573 Automata Theory and Formal Languages

CS 573 Automata Theory and Formal Languages Non-determinism Automt Theory nd Forml Lnguges Professor Leslie Lnder Leture # 3 Septemer 6, 2 To hieve our gol, we need the onept of Non-deterministi Finite Automton with -moves (NFA) An NFA is tuple

More information

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b CS 294-2 9/11/04 Quntum Ciruit Model, Solovy-Kitev Theorem, BQP Fll 2004 Leture 4 1 Quntum Ciruit Model 1.1 Clssil Ciruits - Universl Gte Sets A lssil iruit implements multi-output oolen funtion f : {0,1}

More information

Subsequence Automata with Default Transitions

Subsequence Automata with Default Transitions Susequene Automt with Defult Trnsitions Philip Bille, Inge Li Gørtz, n Freerik Rye Skjoljensen Tehnil University of Denmrk {phi,inge,fskj}@tu.k Astrt. Let S e string of length n with hrters from n lphet

More information

Durable Top-k Search in Document Archives

Durable Top-k Search in Document Archives Durle Top-k Serh in Doument Arhives Leong Hou U, Nikos Mmoulis, Klus Bererih, Sriknt Bethur Deprtment of Computer Siene, University of Hong Kong Pokfulm Ro, Hong Kong {hleongu, nikos}@s.hku.hk Mx-Plnk

More information

I 3 2 = I I 4 = 2A

I 3 2 = I I 4 = 2A ECE 210 Eletril Ciruit Anlysis University of llinois t Chigo 2.13 We re ske to use KCL to fin urrents 1 4. The key point in pplying KCL in this prolem is to strt with noe where only one of the urrents

More information

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version A Lower Bound for the Length of Prtil Trnsversl in Ltin Squre, Revised Version Pooy Htmi nd Peter W. Shor Deprtment of Mthemtil Sienes, Shrif University of Tehnology, P.O.Bo 11365-9415, Tehrn, Irn Deprtment

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

If the numbering is a,b,c,d 1,2,3,4, then the matrix representation is as follows:

If the numbering is a,b,c,d 1,2,3,4, then the matrix representation is as follows: Reltions. Solutions 1. ) true; ) true; ) flse; ) true; e) flse; f) true; g) flse; h) true; 2. 2 A B 3. Consier ll reltions tht o not inlue the given pir s n element. Oviously, the rest of the reltions

More information

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6 CS311 Computtionl Strutures Regulr Lnguges nd Regulr Grmmrs Leture 6 1 Wht we know so fr: RLs re losed under produt, union nd * Every RL n e written s RE, nd every RE represents RL Every RL n e reognized

More information

Arrow s Impossibility Theorem

Arrow s Impossibility Theorem Rep Fun Gme Properties Arrow s Theorem Arrow s Impossiility Theorem Leture 12 Arrow s Impossiility Theorem Leture 12, Slide 1 Rep Fun Gme Properties Arrow s Theorem Leture Overview 1 Rep 2 Fun Gme 3 Properties

More information

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE M. STISSING, C. N. S. PEDERSEN, T. MAILUND AND G. S. BRODAL Bioinformtis Reserh Center, n Dept. of Computer Siene, University

More information

INTRODUCTION TO AUTOMATA THEORY

INTRODUCTION TO AUTOMATA THEORY Chpter 3 INTRODUCTION TO AUTOMATA THEORY In this hpter we stuy the most si strt moel of omputtion. This moel els with mhines tht hve finite memory pity. Setion 3. els with mhines tht operte eterministilly

More information

Maximum size of a minimum watching system and the graphs achieving the bound

Maximum size of a minimum watching system and the graphs achieving the bound Mximum size of minimum wthing system n the grphs hieving the oun Tille mximum un système e ontrôle minimum et les grphes tteignnt l orne Dvi Auger Irène Chron Olivier Hury Antoine Lostein 00D0 Mrs 00 Déprtement

More information

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS The University of ottinghm SCHOOL OF COMPUTR SCIC A LVL 2 MODUL, SPRIG SMSTR 2015 2016 MACHIS AD THIR LAGUAGS ASWRS Time llowed TWO hours Cndidtes my omplete the front over of their nswer ook nd sign their

More information

POSITIVE IMPLICATIVE AND ASSOCIATIVE FILTERS OF LATTICE IMPLICATION ALGEBRAS

POSITIVE IMPLICATIVE AND ASSOCIATIVE FILTERS OF LATTICE IMPLICATION ALGEBRAS Bull. Koren Mth. So. 35 (998), No., pp. 53 6 POSITIVE IMPLICATIVE AND ASSOCIATIVE FILTERS OF LATTICE IMPLICATION ALGEBRAS YOUNG BAE JUN*, YANG XU AND KEYUN QIN ABSTRACT. We introue the onepts of positive

More information

arxiv: v2 [math.co] 31 Oct 2016

arxiv: v2 [math.co] 31 Oct 2016 On exlue minors of onnetivity 2 for the lss of frme mtrois rxiv:1502.06896v2 [mth.co] 31 Ot 2016 Mtt DeVos Dryl Funk Irene Pivotto Astrt We investigte the set of exlue minors of onnetivity 2 for the lss

More information

Compression of Palindromes and Regularity.

Compression of Palindromes and Regularity. Compression of Plinromes n Regulrity. Kyoko Shikishim-Tsuji Center for Lierl Arts Eution n Reserh Tenri University 1 Introution In [1], property of likstrem t t view of tse is isusse n it is shown tht

More information

CARLETON UNIVERSITY. 1.0 Problems and Most Solutions, Sect B, 2005

CARLETON UNIVERSITY. 1.0 Problems and Most Solutions, Sect B, 2005 RLETON UNIVERSIT eprtment of Eletronis ELE 2607 Swithing iruits erury 28, 05; 0 pm.0 Prolems n Most Solutions, Set, 2005 Jn. 2, #8 n #0; Simplify, Prove Prolem. #8 Simplify + + + Reue to four letters (literls).

More information

Geodesics on Regular Polyhedra with Endpoints at the Vertices

Geodesics on Regular Polyhedra with Endpoints at the Vertices Arnol Mth J (2016) 2:201 211 DOI 101007/s40598-016-0040-z RESEARCH CONTRIBUTION Geoesis on Regulr Polyher with Enpoints t the Verties Dmitry Fuhs 1 To Sergei Thnikov on the osion of his 60th irthy Reeive:

More information

Lesson 2: The Pythagorean Theorem and Similar Triangles. A Brief Review of the Pythagorean Theorem.

Lesson 2: The Pythagorean Theorem and Similar Triangles. A Brief Review of the Pythagorean Theorem. 27 Lesson 2: The Pythgoren Theorem nd Similr Tringles A Brief Review of the Pythgoren Theorem. Rell tht n ngle whih mesures 90º is lled right ngle. If one of the ngles of tringle is right ngle, then we

More information

Let s divide up the interval [ ab, ] into n subintervals with the same length, so we have

Let s divide up the interval [ ab, ] into n subintervals with the same length, so we have III. INTEGRATION Eonomists seem muh more intereste in mrginl effets n ifferentition thn in integrtion. Integrtion is importnt for fining the epete vlue n vrine of rnom vriles, whih is use in eonometris

More information

for all x in [a,b], then the area of the region bounded by the graphs of f and g and the vertical lines x = a and x = b is b [ ( ) ( )] A= f x g x dx

for all x in [a,b], then the area of the region bounded by the graphs of f and g and the vertical lines x = a and x = b is b [ ( ) ( )] A= f x g x dx Applitions of Integrtion Are of Region Between Two Curves Ojetive: Fin the re of region etween two urves using integrtion. Fin the re of region etween interseting urves using integrtion. Desrie integrtion

More information

6. Suppose lim = constant> 0. Which of the following does not hold?

6. Suppose lim = constant> 0. Which of the following does not hold? CSE 0-00 Nme Test 00 points UTA Stuent ID # Multiple Choie Write your nswer to the LEFT of eh prolem 5 points eh The k lrgest numers in file of n numers n e foun using Θ(k) memory in Θ(n lg k) time using

More information

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points:

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points: Eidgenössishe Tehnishe Hohshule Zürih Eole polytehnique fédérle de Zurih Politenio federle di Zurigo Federl Institute of Tehnology t Zurih Deprtement of Computer Siene. Novemer 0 Mrkus Püshel, Dvid Steurer

More information

Automata and Regular Languages

Automata and Regular Languages Chpter 9 Automt n Regulr Lnguges 9. Introution This hpter looks t mthemtil moels of omputtion n lnguges tht esrie them. The moel-lnguge reltionship hs multiple levels. We shll explore the simplest level,

More information

Lesson 2.1 Inductive Reasoning

Lesson 2.1 Inductive Reasoning Lesson 2.1 Inutive Resoning Nme Perio Dte For Eerises 1 7, use inutive resoning to fin the net two terms in eh sequene. 1. 4, 8, 12, 16,, 2. 400, 200, 100, 50, 25,, 3. 1 8, 2 7, 1 2, 4, 5, 4. 5, 3, 2,

More information

Eigenvectors and Eigenvalues

Eigenvectors and Eigenvalues MTB 050 1 ORIGIN 1 Eigenvets n Eigenvlues This wksheet esries the lger use to lulte "prinipl" "hrteristi" iretions lle Eigenvets n the "prinipl" "hrteristi" vlues lle Eigenvlues ssoite with these iretions.

More information

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic Chpter 3 Vetor Spes In Chpter 2, we sw tht the set of imges possessed numer of onvenient properties. It turns out tht ny set tht possesses similr onvenient properties n e nlyzed in similr wy. In liner

More information

Lesson 2.1 Inductive Reasoning

Lesson 2.1 Inductive Reasoning Lesson 2.1 Inutive Resoning Nme Perio Dte For Eerises 1 7, use inutive resoning to fin the net two terms in eh sequene. 1. 4, 8, 12, 16,, 2. 400, 200, 100, 50, 25,, 3. 1 8, 2 7, 1 2, 4, 5, 4. 5, 3, 2,

More information

Particle Physics. Michaelmas Term 2011 Prof Mark Thomson. Handout 3 : Interaction by Particle Exchange and QED. Recap

Particle Physics. Michaelmas Term 2011 Prof Mark Thomson. Handout 3 : Interaction by Particle Exchange and QED. Recap Prtile Physis Mihelms Term 2011 Prof Mrk Thomson g X g X g g Hnout 3 : Intertion y Prtile Exhnge n QED Prof. M.A. Thomson Mihelms 2011 101 Rep Working towrs proper lultion of ey n sttering proesses lnitilly

More information

A Primer on Continuous-time Economic Dynamics

A Primer on Continuous-time Economic Dynamics Eonomis 205A Fll 2008 K Kletzer A Primer on Continuous-time Eonomi Dnmis A Liner Differentil Eqution Sstems (i) Simplest se We egin with the simple liner first-orer ifferentil eqution The generl solution

More information

Statistics in medicine

Statistics in medicine Sttistis in meiine Workshop 1: Sreening n ignosti test evlution Septemer 22, 2016 10:00 AM to 11:50 AM Hope 110 Ftm Shel, MD, MS, MPH, PhD Assistnt Professor Chroni Epiemiology Deprtment Yle Shool of Puli

More information

Part I: Study the theorem statement.

Part I: Study the theorem statement. Nme 1 Nme 2 Nme 3 A STUDY OF PYTHAGORAS THEOREM Instrutions: Together in groups of 2 or 3, fill out the following worksheet. You my lift nswers from the reding, or nswer on your own. Turn in one pket for

More information

ANALYSIS AND MODELLING OF RAINFALL EVENTS

ANALYSIS AND MODELLING OF RAINFALL EVENTS Proeedings of the 14 th Interntionl Conferene on Environmentl Siene nd Tehnology Athens, Greee, 3-5 Septemer 215 ANALYSIS AND MODELLING OF RAINFALL EVENTS IOANNIDIS K., KARAGRIGORIOU A. nd LEKKAS D.F.

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE M. STISSING, C. N. S. PEDERSEN, T. MAILUND AND G. S. BRODAL Bioinformtis Reserh Center, n Dept. of Computer Siene, University

More information

Analysis of Temporal Interactions with Link Streams and Stream Graphs

Analysis of Temporal Interactions with Link Streams and Stream Graphs Anlysis of Temporl Intertions with n Strem Grphs, Tiphine Vir, Clémene Mgnien http:// ltpy@ LIP6 CNRS n Soronne Université Pris, Frne 1/23 intertions over time 0 2 4 6 8,,, n for 10 time units time 2/23

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Nondeterministic Automata vs Deterministic Automata

Nondeterministic Automata vs Deterministic Automata Nondeterministi Automt vs Deterministi Automt We lerned tht NFA is onvenient model for showing the reltionships mong regulr grmmrs, FA, nd regulr expressions, nd designing them. However, we know tht n

More information

Metaheuristics for the Asymmetric Hamiltonian Path Problem

Metaheuristics for the Asymmetric Hamiltonian Path Problem Metheuristis for the Asymmetri Hmiltonin Pth Prolem João Pero PEDROSO INESC - Porto n DCC - Fule e Ciênis, Universie o Porto, Portugl jpp@f.up.pt Astrt. One of the most importnt pplitions of the Asymmetri

More information

Situation Calculus. Situation Calculus Building Blocks. Sheila McIlraith, CSC384, University of Toronto, Winter Situations Fluents Actions

Situation Calculus. Situation Calculus Building Blocks. Sheila McIlraith, CSC384, University of Toronto, Winter Situations Fluents Actions Plnning gent: single gent or multi-gent Stte: complete or Incomplete (logicl/probbilistic) stte of the worl n/or gent s stte of knowlege ctions: worl-ltering n/or knowlege-ltering (e.g. sensing) eterministic

More information

Total score: /100 points

Total score: /100 points Points misse: Stuent's Nme: Totl sore: /100 points Est Tennessee Stte University Deprtment of Computer n Informtion Sienes CSCI 2710 (Trnoff) Disrete Strutures TEST 2 for Fll Semester, 2004 Re this efore

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

Monochromatic Plane Matchings in Bicolored Point Set

Monochromatic Plane Matchings in Bicolored Point Set CCCG 2017, Ottw, Ontrio, July 26 28, 2017 Monohromti Plne Mthings in Biolore Point Set A. Krim Au-Affsh Sujoy Bhore Pz Crmi Astrt Motivte y networks interply, we stuy the prolem of omputing monohromti

More information

Section 2.3. Matrix Inverses

Section 2.3. Matrix Inverses Mtri lger Mtri nverses Setion.. Mtri nverses hree si opertions on mtries, ition, multiplition, n sutrtion, re nlogues for mtries of the sme opertions for numers. n this setion we introue the mtri nlogue

More information

Mining Frequent Web Access Patterns with Partial Enumeration

Mining Frequent Web Access Patterns with Partial Enumeration Mining Frequent We Aess Ptterns with Prtil Enumertion Peiyi Tng Deprtment of Computer Siene University of Arknss t Little Rok 2801 S. University Ave. Little Rok, AR 72204 Mrkus P. Turki Deprtment of Computer

More information

Arrow s Impossibility Theorem

Arrow s Impossibility Theorem Rep Voting Prdoxes Properties Arrow s Theorem Arrow s Impossiility Theorem Leture 12 Arrow s Impossiility Theorem Leture 12, Slide 1 Rep Voting Prdoxes Properties Arrow s Theorem Leture Overview 1 Rep

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Bi-decomposition of large Boolean functions using blocking edge graphs

Bi-decomposition of large Boolean functions using blocking edge graphs Bi-eomposition of lrge Boolen funtions using loking ege grphs Mihir Chouhury n Krtik Mohnrm Deprtment of Eletril n Computer Engineering, Rie University, Houston {mihir,kmrm}@rie.eu Astrt Bi-eomposition

More information

Computing the Quartet Distance between Evolutionary Trees in Time O(n log n)

Computing the Quartet Distance between Evolutionary Trees in Time O(n log n) Computing the Qurtet Distne etween Evolutionry Trees in Time O(n log n) Gerth Stølting Brol, Rolf Fgererg Christin N. S. Peersen Mrh 3, 2003 Astrt Evolutionry trees esriing the reltionship for set of speies

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

= state, a = reading and q j

= state, a = reading and q j 4 Finite Automt CHAPTER 2 Finite Automt (FA) (i) Derterministi Finite Automt (DFA) A DFA, M Q, q,, F, Where, Q = set of sttes (finite) q Q = the strt/initil stte = input lphet (finite) (use only those

More information

A Study on the Properties of Rational Triangles

A Study on the Properties of Rational Triangles Interntionl Journl of Mthemtis Reserh. ISSN 0976-5840 Volume 6, Numer (04), pp. 8-9 Interntionl Reserh Pulition House http://www.irphouse.om Study on the Properties of Rtionl Tringles M. Q. lm, M.R. Hssn

More information