Tree Pattern Aggregation for Scalable XML Data Dissemination


Chee-Yong Chan, Wenfei Fan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
Bell Labs, Lucent Technologies

Abstract

With the rapid growth of XML-document traffic on the Internet, scalable content-based dissemination of XML documents to a large, dynamic group of consumers has become an important research challenge. To indicate the type of content that they are interested in, data consumers typically specify their subscriptions using some XML pattern specification language (e.g., XPath). Given the large volume of subscribers, system scalability and efficiency mandate the ability to aggregate the set of consumer subscriptions to a smaller set of content specifications, so as to both reduce their storage-space requirements as well as speed up the document-subscription matching process. In this paper, we provide the first systematic study of subscription aggregation where subscriptions are specified with tree patterns (an important subclass of XPath expressions). The main challenge is to aggregate an input set of tree patterns into a smaller set of generalized tree patterns such that: (1) a given space constraint on the total size of the subscriptions is met, and (2) the loss in precision (due to aggregation) during document filtering is minimized. We propose an efficient tree-pattern aggregation algorithm that makes effective use of document-distribution statistics in order to compute a precise set of aggregate tree patterns within the allotted space budget. As part of our solution, we also develop several novel algorithms for tree-pattern containment and minimization, as well as "least-upper-bound" computation for a set of tree patterns. These results are of interest in their own right, and can prove useful in other domains, such as XML query optimization. Extensive results from a prototype implementation validate our approach.
1 Introduction

XML (eXtensible Markup Language) [16] has become the dominant standard for data encoding and exchange on the Internet, including e-business transactions in both Business-to-Business (B2B) and Business-to-Consumer (B2C) applications. Given the rapid growth of XML traffic on the Internet, the effective and efficient delivery of XML documents has become an important issue. Consequently, there is growing interest in the area of XML content-based filtering and routing (e.g., [4]), which addresses the problem of effectively directing high volumes of XML-document traffic to interested consumers based on document contents. Unlike conventional routing, where packets are routed based on a limited, fixed set of attributes (e.g., source/destination IP addresses and port numbers), content-based routing is based on general patterns of the document contents, which is significantly more flexible and demanding. Consumers typically specify their subscriptions, indicating the type of XML content that they are interested in, using some XML pattern specification language (e.g., XPath [15]). For each incoming XML document, a content-based router matches the document contents against the set of subscriptions to identify the (sub)set of interested consumers, and routes the document to them. Thus, in content-based routing, the "destination" of an XML document is generally unknown to the data producer, and is computed dynamically based on the document contents and the active set of subscriptions. Effective support for scalable, content-based XML routing is crucial to enabling efficient and timely delivery of relevant XML documents to a large, dynamic group of consumers.

Author notes:
Currently on leave from Temple University and supported in part by NSF Career Award IIS […].
Current affiliation: Institut EURECOM, Sophia Antipolis, France.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002
Given the large volume of potential consumers, system scalability and efficiency mandate the ability to judiciously aggregate the set of consumer subscriptions to a smaller set of content specifications. The goal, of course, is to both reduce the subscriptions' storage space requirements (e.g., so that the routing table fits in main memory), as well as speed up the filtering of incoming XML traffic. For instance, a core router in a B2B application may choose to aggregate subscriptions based on geographical location, affiliation, or domain-specific information (e.g., telecommunications). Subscription aggregation essentially involves aggregating an initial set of subscriptions S into a smaller set S' such that any document that matches some subscription in S also matches some subscription in S'. However, since there is typically a "loss of precision" associated with such aggregation, the set of documents matched by the aggregated set S' is, in general, a superset of those matched by the original set S. As a result, a document may be routed to consumers who have not subscribed to it, thus resulting in an increase in the amount of unwanted document traffic. In order to avoid such spurious forwarding of documents, it is desirable to minimize the number of such "false matches" (i.e., minimize the loss in precision) with respect to the given space constraint for the aggregated subscriptions.

Figure 1: Example Tree Patterns and XML Document Tree. Panels: (a) p_a, (b) p_b, (c) p_c, (d) p_d, (e) T.

So far, there has only been limited work on subscription aggregation, mainly for very simple subscription models. For example, in [12], each subscription is a set of attribute-predicate pairs (e.g., […]), and an aggregated subscription is allowed to contain wildcard values, indicating the entire set of domain values for certain attributes (see footnote 1). In this paper, we provide the first systematic study of the subscription aggregation problem where subscriptions are specified using the much more expressive model of tree patterns. Tree patterns represent an important subclass of XPath expressions that offers a natural means for specifying tree-structured constraints in XML and LDAP applications [3]. Compared to earlier work based on attribute/predicate-based subscriptions, effectively aggregating tree patterns poses a much more challenging problem since subscriptions involve both content information (node labels) as well as structure information (parent-child and ancestor-descendant relationships). Briefly, our tree pattern aggregation problem can be stated as follows: Given an input set of tree patterns S and a space constraint, aggregate S into a smaller set of generalized tree patterns that meets the space constraint, and for which the loss in precision due to aggregation is minimized.

Example 1.1 Consider the two similar tree-pattern-based subscriptions p_a and p_b shown in Figure 1, where p_a matches any document with a root element labeled CD that has both a sub-element labeled SONY as well as a sub-element (with an arbitrary label) that in turn has a sub-element labeled Bach; and p_b matches any document that has some element labeled CD with a sub-element labeled Bach. Here the node labeled * (wildcard) matches any label, while the node labeled // (descendant) matches some (possibly empty) path.
The XML document T shown in Figure 1(e) matches (or satisfies) p_a but not p_b because the sub-element labeled Bach in T does not have a parent element labeled CD. For efficiency reasons, one might want to aggregate the set of tree patterns {p_a, p_b} into a single tree pattern. Two examples of aggregate tree patterns for {p_a, p_b} are p_c and p_d (in Figure 1), since any document that satisfies p_a or p_b also satisfies both p_c and p_d. Although both p_c and p_d have the same number of nodes, p_c is intuitively more precise than p_d with respect to {p_a, p_b} since p_c preserves the ancestor-descendant relationship between the CD and Bach elements as required by p_a and p_b. Indeed, any XML document that satisfies p_c also satisfies p_d (and thus we say that p_d contains p_c).

Footnote 1: Due to space constraints, a more detailed overview of related work can be found in the appendix.

To the best of our knowledge, our work is the first to address this timely subscription aggregation problem for XML data dissemination. Our main contributions can be summarized as follows.

- We study the properties of tree patterns and develop efficient algorithms for deciding tree pattern containment, minimizing a tree pattern, and computing the most precise aggregate (i.e., the "least upper bound") for a set of patterns. Our results are not only interesting in their own right, but also provide solutions for special cases of our tree pattern aggregation problem.
- We propose a novel, efficient method that exploits coarse statistics on the underlying distribution of XML documents to compute a precise set of aggregate patterns within the allotted space budget. Specifically, our scheme employs the document statistics to estimate the selectivity of a tree pattern, which is also used as a measure of the pattern's preciseness. Thus, our aggregation problem reduces to that of finding a compact set of aggregate patterns with minimal loss in selectivity, for which we present a greedy heuristic.
- We demonstrate experimentally the effectiveness of our approach in computing a space-efficient and precise set of aggregate tree patterns.
The usefulness of our results on tree patterns and their aggregation is not limited to content-based routing, but also extends to other application domains such as the optimization of XML queries involving tree patterns and the processing/dissemination of subscription queries in a multicast environment [9] (where aggregation can be used to reduce server load and network traffic). Further, our work and results are complementary to recent work on efficient indexing structures for XPath expressions [2, 6]. The focus of this earlier research is to speed up document filtering with a given set of XPath subscriptions using appropriate indexing schemes. In contrast, our work focuses on effectively reducing the volume of subscriptions that need to be matched in order to ensure scalability given bounded storage resources for routing. Clearly, our techniques can be used as a pre-processing step for the indexes of [2, 6] when hard constraints on the size of the index must be met. Due to space limitations, the proofs of all theoretical results can be found in the full version of this paper [5].

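2 Problem Formulation

2.1 Definitions

A tree pattern is an unordered node-labeled tree that specifies content and structure conditions on an XML document. More specifically, a tree pattern p has a set of nodes, denoted by Nodes(p), where each node v in Nodes(p) has a label, denoted by label(v), which can either be a tag name, a * (wildcard that matches any tag), or a // (the descendant operator). In particular, the root node has a special label /. (written "/." below). We use Subtree(v, p) to denote the subtree of p rooted at v, referred to as a sub-pattern of p. Some examples of tree patterns are depicted in Figure 2.

To define the semantics of a tree pattern p, we first give the semantics of a sub-pattern Subtree(v, p), where v is not the root node of p. Recall that XML documents are typically represented as node-labeled trees, referred to as XML trees. Let T be an XML tree and t be a node in T. We say that T satisfies Subtree(v, p) at node t, denoted by $(T, t) \models \mathrm{Subtree}(v, p)$, if the following conditions hold: (1) if label(v) is a tag, then t has a child node t' labeled label(v) such that for each child node v' of v, $(T, t') \models \mathrm{Subtree}(v', p)$; (2) if label(v) = *, then t has a child node t' labeled with an arbitrary tag such that for each child node v' of v, $(T, t') \models \mathrm{Subtree}(v', p)$; and (3) if label(v) = //, then t has a descendant node t' (possibly t' = t) such that for each child v' of v, $(T, t') \models \mathrm{Subtree}(v', p)$.

We next define the semantics of tree patterns. Let T be an XML tree with root $t_{root}$, and p be a tree pattern with root $v_{root}$. We say that T satisfies p, denoted by $T \models p$, if for each child node v of $v_{root}$: (1) if label(v) is a tag a, then $t_{root}$ is labeled with a and for each child node v' of v, $(T, t_{root}) \models \mathrm{Subtree}(v', p)$ (here a specifies the tag of $t_{root}$); (2) if label(v) = *, then $t_{root}$ may have any label and for each child node v' of v, $(T, t_{root}) \models \mathrm{Subtree}(v', p)$; and (3) if label(v) = //, then $t_{root}$ has a descendant node t' (possibly t' = $t_{root}$) such that $T' \models p'$, where T' is the subtree rooted at t', and p' is identical to Subtree(v, p) except that "/." is the label for the root node (instead of //). Observe that $v_{root}$ is treated differently from the rest of the nodes of p. The motivation behind this is illustrated by p_[…] in Figure 2, which specifies the following: for any XML tree T satisfying p_[…], its root must be labeled with […] and, moreover, it must contain two consecutive […] elements somewhere. This cannot be expressed without our special root label (as tree patterns do not allow a union operator).

Example 2.1 Consider the tree pattern p_[…] in Figure 2. An XML document T satisfies p_[…] if its root element satisfies all the following conditions: (1) its label is […]; (2) it must have a child element (with an arbitrary tag), which in turn has a child element with label […]; and (3) it must have a descendant element which has both a […]-child element and a […]-child element.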
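The satisfaction semantics defined above can be made concrete with a small sketch. The Python below is an illustration only, not code from the paper: the class name `Node`, the constants, and the functions `sat_sub` and `satisfies` are hypothetical names, and the sample document is an assumed tree shaped after the description of Example 1.1 rather than the exact tree of Figure 1(e).

```python
from dataclasses import dataclass, field
from typing import Iterator, List

# Assumed textual encodings of the three special labels.
ROOT, WILDCARD, DESCENDANT = "/.", "*", "//"


@dataclass
class Node:
    """A node of either an XML tree or a tree pattern (hypothetical type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def descendants_or_self(t: Node) -> Iterator[Node]:
    """Yield t and every node below it."""
    yield t
    for child in t.children:
        yield from descendants_or_self(child)


def sat_sub(t: Node, v: Node) -> bool:
    """True if the document satisfies the sub-pattern rooted at non-root v at node t."""
    if v.label == DESCENDANT:
        # Condition (3): some descendant t' (possibly t itself) satisfies all children of v.
        return any(all(sat_sub(d, c) for c in v.children)
                   for d in descendants_or_self(t))
    # Conditions (1) and (2): some child t' of t carries a matching label.
    return any((v.label == WILDCARD or k.label == v.label)
               and all(sat_sub(k, c) for c in v.children)
               for k in t.children)


def satisfies(doc_root: Node, pattern_root: Node) -> bool:
    """True if the XML tree rooted at doc_root satisfies the whole pattern."""
    for v in pattern_root.children:
        if v.label == DESCENDANT:
            # Re-root Subtree(v, p) with the special root label and try every subtree T'.
            rerooted = Node(ROOT, v.children)
            if not any(satisfies(d, rerooted) for d in descendants_or_self(doc_root)):
                return False
        else:
            if v.label != WILDCARD and doc_root.label != v.label:
                return False
            if not all(sat_sub(doc_root, c) for c in v.children):
                return False
    return True


# Assumed sample data in the spirit of Example 1.1.
doc = Node("CD", [Node("SONY"), Node("Classical", [Node("Bach")])])
p_a = Node(ROOT, [Node("CD", [Node("SONY"), Node(WILDCARD, [Node("Bach")])])])
p_b = Node(ROOT, [Node(DESCENDANT, [Node("CD", [Node("Bach")])])])
```

With these assumed inputs, `satisfies(doc, p_a)` holds while `satisfies(doc, p_b)` does not, because no CD element in the sample document has a Bach child.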
Thus, p_[…] essentially specifies (existential) conjunctive conditions on XML documents. It should be noted that documents satisfying p_[…] may have tags/subtrees not mentioned in p_[…]. For instance, the root element of T may have a […]-child element, and the […]-elements of T may have […]-descendant elements.

A tree pattern p is said to be consistent if and only if there exists an XML document that satisfies p. We only consider consistent tree patterns in our work. Further, the tree patterns defined above can be naturally generalized to accommodate simple conditions and predicates (e.g., […]). To simplify the discussion, we do not consider such extensions in this paper. It is worth mentioning that a tree pattern can be easily converted to an equivalent XPath expression [15] in which each sub-pattern is expressed as a condition/qualifier [5]. Thus, our tree patterns are graph representations of a class of XPath expressions, which are similar to the tree patterns that have been studied for XML queries (e.g., [3, 17]). It is tempting to consider using a larger fragment of XPath to express subscription patterns. However, it turns out that even a mild generalization of our tree patterns (e.g., with the addition of union/disjunction operators) leads to much higher complexity (coNP-hard or beyond) for basic operations such as containment computation (e.g., see [10]).

A tree pattern q is said to be contained in another tree pattern p, denoted by $q \sqsubseteq p$, if and only if for any XML tree T, if T satisfies q then T also satisfies p. If $q \sqsubseteq p$, we refer to p as the container pattern and q as the contained pattern. We say that p and q are equivalent, denoted by $p \equiv q$, if $p \sqsubseteq q$ and $q \sqsubseteq p$. This definition can be generalized to sets of tree patterns: a set of tree patterns S is contained in another set of tree patterns S', denoted by $S \sqsubseteq S'$, if for each $p \in S$, there exists $p' \in S'$ such that $p \sqsubseteq p'$. Containment for sub-patterns is defined similarly. The size of a tree pattern p, denoted by $|p|$, is simply the cardinality of its node set. For example, referring to Figure 2, $|p_{[\ldots]}| = [\ldots]$ and $|p_b| = [\ldots]$.

2.2 Problem Statement

The tree pattern aggregation problem that we investigate in this paper can now be stated as follows.
Given a set of tree pattern subscriptions S and a space bound k on the total size of the aggregated subscriptions, compute a set of tree patterns S' that satisfies all of the following three conditions:

(C1) $S \sqsubseteq S'$ (i.e., S' is at least as general as S),
(C2) $\sum_{p' \in S'} |p'| \le k$ (i.e., S' is concise), and
(C3) S' is as precise as possible, in the sense that there does not exist another set of tree patterns S'' that satisfies the first two conditions and $S'' \sqsubset S'$.

Clearly, the tree pattern aggregation problem may not necessarily have a unique solution since it is possible to have two sets S' and S'' that satisfy the first two conditions but $S' \not\sqsubseteq S''$ and $S'' \not\sqsubseteq S'$. Therefore, we need to devise some measure to quantify the goodness of candidate solutions in terms of both their conciseness as well as preciseness. With respect to conciseness, we are interested in minimal tree patterns that do not contain any redundant nodes. More precisely, we say that a tree pattern p is minimized if for any tree pattern p' such that $p' \equiv p$, it is the case that $|p| \le |p'|$. With respect to preciseness, it can be shown that the containment relationship $\sqsubseteq$ on the universe of tree patterns actually defines a lattice. In particular, the notions of upper bound and least upper bound are of relevance to the aggregation problem and, therefore, we define them formally here.

Figure 2: Examples of Tree Patterns. Panels: (a) p_a, (b) p_b, (c) p_c, (d) p_d, (e) p_e, (f) p_f, (g) p_g, (h) p_h, (i) p_i.

An upper bound of two tree patterns p and q is a tree pattern u such that $p \sqsubseteq u$ and $q \sqsubseteq u$, i.e., for any XML tree T, if $T \models p$ or $T \models q$ then $T \models u$. The least upper bound (LUB) of p and q, denoted by $p \sqcup q$, is an upper bound u of p and q such that, for any upper bound u' of p and q, $u \sqsubseteq u'$. Once again, we generalize the notion of LUBs to a set S of tree patterns. An upper bound of S is a tree pattern u such that $p \sqsubseteq u$ for every $p \in S$. The LUB of S, denoted by $\sqcup S$, is an upper bound u of S such that for any upper bound u' of S, $u \sqsubseteq u'$. Clearly, if p is an aggregate tree pattern for a set S of tree patterns (i.e., $S \sqsubseteq \{p\}$), p is an upper bound of S. Observe that, if p is the LUB of S, p is the most precise aggregate tree pattern for S. In fact, it can be shown that $\sqcup S$ exists and is unique up to equivalence for any set S of tree patterns [5]; thus, it is meaningful to talk about $\sqcup S$ as the most precise aggregate tree pattern.

Example 2.2 Consider again the tree patterns in Figure 2. Observe that $p_b \equiv p_c$; and since $|p_b| > |p_c|$, p_b is not a minimized pattern. In fact, except for p_b, all the tree patterns in Figure 2 are minimized patterns. Note that p_[…] is not contained in p_c because the root node of p_[…] does not have a tag-[…] child node; and p_c is not contained in p_[…] because there exists no node in p_c that is a parent node of both a tag-[…]-node and a tag-[…]-node. Observe that p_[…] and p_c are both contained in p_[…]; i.e., p_[…] is an upper bound of p_[…] and p_c. However, p_[…] is not their LUB, since we have another tree pattern, p_[…], which is an upper bound of p_[…] and p_c and is strictly contained in p_[…]. Indeed, p_[…] is the LUB of p_[…] and p_c, with […]. Note, however, that the size of an LUB is not necessarily always smaller than the size of its constituent patterns; for example, […]. Note that p_[…] is an upper bound of {p_[…], p_b, p_c, […]}.

We conclude this section by presenting some additional notation used in this paper. For a node v in a tree pattern p, we denote the set of child nodes of v in p by Child(v, p). We also define a partial ordering $\preceq$ on node labels such that if a and a' are tag names, (1) $a \preceq * \preceq //$ and (2) $a \preceq a'$ iff $a = a'$. Given two nodes v and w, the least upper bound of their labels label(v) and label(w), written $\mathrm{label}(v) \sqcup \mathrm{label}(w)$, is defined as follows:

$$
\mathrm{label}(v) \sqcup \mathrm{label}(w) =
\begin{cases}
\mathrm{label}(v) & \text{if } \mathrm{label}(v) = \mathrm{label}(w), \\
// & \text{if } \mathrm{label}(v) = // \text{ or } \mathrm{label}(w) = //, \\
* & \text{otherwise.}
\end{cases}
$$

For example, the least upper bound of two distinct tag names is *, and the least upper bound of * and // is //. For notational convenience, we refer to a node v in a tree pattern as an $\ell$-node if $\mathrm{label}(v) = \ell$, and refer to v as a tag-node if $\mathrm{label}(v) \notin \{*, //\}$.

3 Computing the Most Precise Aggregate

In this section, we consider a special case of our tree pattern aggregation problem, namely, when the aggregate set S' consists of a single tree pattern and there is no space constraint. For this case, we provide an algorithm to compute the most precise aggregate tree pattern (i.e., LUB) for a set of tree patterns. Some of the algorithms given in this section are also key components of our solution for the general problem, which is presented in the next section.

Given two input tree patterns p and q, Algorithm LUB in Figure 3 computes the most precise aggregate tree pattern for {p, q} (i.e., the LUB of p and q). It traverses p and q top-down and computes the tightest container sub-patterns for each pair of sub-patterns Subtree(v, p) and Subtree(w, q) encountered, where v and w are nodes in p and q, respectively. (Footnote 2: Note that a container sub-pattern of tree patterns p and q is an upper bound of p and q, and we use these two terms interchangeably.) The tightest container sub-patterns of Subtree(v, p) and Subtree(w, q) are a set R of sub-patterns such that:

(1) R consists of container sub-patterns of Subtree(v, p) and Subtree(w, q), i.e., for any XML document T and any element t in T, if $(T, t) \models \mathrm{Subtree}(v, p)$ or $(T, t) \models \mathrm{Subtree}(w, q)$ then $(T, t) \models r$ for each $r \in R$; and

(2) R is tightest in the sense that for any other set of container sub-patterns R' of Subtree(v, p) and Subtree(w, q) that satisfies condition (1), any XML document T and any element t in T, if $(T, t) \models r$ for each $r \in R$ then $(T, t) \models r'$ for all $r' \in R'$.

Intuitively, R is a collection of conditions imposed by both Subtree(v, p) and Subtree(w, q) such that if T satisfies Subtree(v, p) or Subtree(w, q) at t, T also satisfies the conjunction of these conditions at t.

Figure 3: Least-Upper-Bound Computation Algorithm. (The listing below is rebuilt from the damaged figure and the surrounding description; the array name TCSubPat and the pattern names x, x', x'' are reconstructed.)

  Algorithm LUB(p, q)
  Input: p and q are tree patterns.
  Output: A tree pattern representing the LUB of p and q.
  1) if q is contained in p, return p
  2) if p is contained in q, return q
  3) Initialize TCSubPat[v, w] to the empty set for all v in Nodes(p), w in Nodes(q)
  4) Let v_root and w_root denote the root nodes of p and q, resp.
  5) for each v in Child(v_root, p) do
  6)   for each w in Child(w_root, q) do
  7)     TCSubPat[v, w] = LUB_SUB(v, w, TCSubPat)
  8) Create a tree pattern x with root node label "/." and the set of child sub-patterns given by the union of TCSubPat[v, w] over v in Child(v_root, p), w in Child(w_root, q)
  9) return MINIMIZE(x)

  Algorithm LUB_SUB(v, w, TCSubPat)
  Input: v, w are nodes in tree patterns p, q (respectively); TCSubPat is a 2-dimensional array such that TCSubPat[v, w] is the set of tightest container sub-patterns of Subtree(v, p) and Subtree(w, q).
  Output: TCSubPat[v, w].
  1) if TCSubPat[v, w] is not empty
  2)   return TCSubPat[v, w]
  3) else if Subtree(w, q) is contained in Subtree(v, p)
  4)   return { Subtree(v, p) }
  5) else if Subtree(v, p) is contained in Subtree(w, q)
  6)   return { Subtree(w, q) }
  7) else
  8)   Initialize R = R' = R'' = empty set
  9)   for each v' in Child(v, p), w' in Child(w, q) do
  10-11) R = R union LUB_SUB(v', w', TCSubPat)
  12)  for each v' in Child(v, p) do
  13)    R' = R' union LUB_SUB(v', w, TCSubPat)
  14)  for each w' in Child(w, q) do
  15)    R'' = R'' union LUB_SUB(v, w', TCSubPat)
  16)  Let x be the pattern with root node label label(v) ⊔ label(w) and set of child subtree patterns R
  17)  Let x' be the pattern with root node label // and set of child subtree patterns R'
  18)  Let x'' be the pattern with root node label // and set of child subtree patterns R''
  19)  return { x, x', x'' }

We now show how the LUB for p and q can be computed from the tightest container sub-patterns. Let $v_{root}$ and $w_{root}$ be the roots of patterns p and q, respectively. Note that a document T that satisfies p also satisfies, for each $v \in \mathrm{Child}(v_{root}, p)$, the restriction of p to the root node and only Subtree(v, p).
Consequently, a document T that satisfies p or q must also satisfy the pattern consisting of a root node (with label "/.") whose children are the tightest container sub-patterns for each pair Subtree(v, p) and Subtree(w, q), where v is a child of the root of p and w is a child of the root of q. This pattern is thus an LUB of p and q.

The main subroutine in our LUB computation (Algorithm LUB_SUB) computes the tightest container sub-patterns of Subtree(v, p) and Subtree(w, q) as follows. If Subtree(w, q) is contained in Subtree(v, p) (resp. Subtree(v, p) is contained in Subtree(w, q)), then Subtree(v, p) (resp. Subtree(w, q)) is the tightest container sub-pattern; otherwise, the tightest container sub-patterns are a set {x, x', x''} of sub-patterns, which are defined in the following manner. The root node of x is labeled with the least upper bound of label(v) and label(w), and the child subtrees of x are the tightest container sub-patterns of each child subtree of Subtree(v, p) and each child subtree of Subtree(w, q). Intuitively, the root of x corresponds to the roots of Subtree(v, p) and Subtree(w, q) (with label equal to the least upper bound of that of v and w). In other words, x preserves the positions of the corresponding nodes in p and q. However, this position-preserving generalization is not sufficient since p and q may have common sub-patterns at different positions relative to their roots. For example, p_c and p_[…] in Figure 2 have a common sub-pattern rooted at a […]-node that has both a […]-child and a […]-child, but this pattern is located at different positions relative to the roots of p_c and p_[…]. To capture these off-position common sub-patterns, we need to compute x' and x''. The child subtrees of x' are the tightest container sub-patterns of Subtree(w, q) itself and each child subtree of Subtree(v, p); and the label of the root node of x' is // to accommodate common sub-patterns at different positions relative to the roots of p and q. Similarly, the root node of x'' has label //, and the child subtrees of x'' are the tightest container sub-patterns of Subtree(v, p) itself and each child subtree of Subtree(w, q). By computing the tightest container sub-patterns recursively, the algorithm computes the LUB of the input tree patterns p and q. By induction on the structures of p and q, we can show the following result [5].

Proposition 3.1: Given two tree patterns p and q, Algorithm LUB(p, q) computes $p \sqcup q$.

Example 3.1 Given p_c and p_[…] in Figure 2, Algorithm LUB returns p_[…], which is indeed the LUB of p_c and p_[…].
To help explain the computation of p_[…], we use the notation $\ell_i$ to refer to the $i$-th node (in some tree pattern) that is labeled $\ell$, where each collection of nodes sharing the same label is ordered based on their pre-order sequence; for example, in p_[…], we use $//_1$ and $//_3$ to refer to the leftmost and rightmost //-nodes, respectively. Algorithm LUB_SUB (invoked by Algorithm LUB) first extracts the position-preserving tightest container sub-patterns for the sub-patterns rooted at node […]$_1$ in each of the two input patterns, which yields the sub-pattern rooted at […]$_1$ in the result (in Steps 9 to 11). Note that the root node of this sub-pattern is labeled […] because both the root nodes of the two compared sub-patterns are labeled […]. The sub-patterns rooted at […]$_2$ in p_c and at […] in p_[…], however, have quite different structures and thus a position-preserving attempt to extract their common sub-patterns only yields […]$_1$. In particular, the common sub-pattern consisting of a […]-node with both a […]-child-node and a […]-child-node is not captured by the above process because they occur at different positions relative to the root nodes of the two compared sub-patterns. To extract such off-position common sub-patterns, Algorithm LUB_SUB compares each of these sub-patterns with the child subtrees of the other (in Steps 12 to 15). Indeed, this yields the sub-pattern rooted at $//_3$, which has a //-root since this common sub-pattern occurs at different positions relative to the root nodes of the two compared sub-patterns. It should be mentioned that both the sub-patterns rooted at $//_1$ and $//_2$ are also produced by the off-position processing, as Algorithm LUB_SUB recursively processes the sub-pattern rooted at […]$_2$ in p_c with […] and […], respectively. Finally, the algorithm removes the redundant nodes in the result tree pattern by using a minimization algorithm (which will be explained shortly) to generate the LUB.

It is straightforward to show that our LUB operator $\sqcup$, considered as a binary operator, is commutative and associative, i.e., $p_1 \sqcup p_2 = p_2 \sqcup p_1$ and $(p_1 \sqcup p_2) \sqcup p_3 = p_1 \sqcup (p_2 \sqcup p_3)$. As a result, Algorithm LUB can be naturally extended to compute the LUB of any set of tree patterns.

We next explain the details of the two auxiliary algorithms used in Algorithm LUB. Algorithm LUB needs to check the containment of tree patterns, which is implemented by Algorithm CONTAINS in Figure 4. Given two input tree patterns p and q, the algorithm determines if $q \sqsubseteq p$. It maintains a two-dimensional array Status, which is initialized with Status[v, w] = null to indicate that Subtree(v, p) and Subtree(w, q) have not been compared; otherwise, Status[v, w] is either true or false, such that Status[v, w] = true if and only if Subtree(w, q) is contained in Subtree(v, p). Clearly, $q \sqsubseteq p$ if and only if Status[$v_{root}$, $w_{root}$] = true, where $v_{root}$ and $w_{root}$ denote the root nodes of p and q, respectively. The main subroutine in our containment algorithm is Algorithm CONTAINS_SUB. Abstractly, CONTAINS_SUB traverses p and q top-down and updates Status[v, w] for each pair of nodes v in Nodes(p) and w in Nodes(q) visited as follows. Let $p_v$ and $q_w$ denote Subtree(v, p) and Subtree(w, q), respectively. If Status[v, w] has already been computed (i.e., Status[v, w] is not null), its value is returned. Otherwise, our algorithm determines whether $q_w \sqsubseteq p_v$ as follows. If label(v) is not //, Status[v, w] = true iff $\mathrm{label}(w) \preceq \mathrm{label}(v)$ and each child subtree of $p_v$ contains some child subtree of $q_w$. Otherwise, if label(v) = //, two additional conditions need to be taken into account. This is because unlike a *-node or a tag-name-node, a //-node in a container tree pattern can also be mapped to a (possibly empty) chain of nodes in a contained tree pattern. For example, consider the tree patterns p_[…] and p_[…] in Figure 2. Note that p_[…] is contained in p_[…], and the //-node in the container is not mapped to any node in the contained pattern, in the sense that the containment would still hold if the //-node in the container were deleted. On the other hand, for the tree patterns p_[…] and p_[…] in Figure 2, p_[…] is contained in p_[…] and the //-node in the container is mapped to both the *- and […]-nodes in the contained pattern, in the sense that $* \preceq //$ and $[\ldots] \preceq //$. These two additional scenarios are handled by Steps 10 and 12 in Algorithm CONTAINS_SUB: Step 10 accounts for the case where a //-node (itself) is mapped to an empty chain of nodes, and Step 12 for the case where a //-node (itself) is mapped to a nonempty chain. Note that in Steps 8 and 12, the disjunction over the child nodes of w returns false if Child(w, q) is empty.

Figure 4: Tree-Pattern Containment Algorithm. (The listing below is rebuilt from the damaged figure and the surrounding description; the array name Status is reconstructed, and the condition in Step 3 of CONTAINS could not be recovered.)

  Algorithm CONTAINS(p, q, Status)
  Input: p and q are two tree patterns.
  Output: Returns true if q is contained in p; false otherwise.
  1) Initialize Status[v, w] = null for all v in Nodes(p), w in Nodes(q)
  2) Let v_root and w_root denote the root nodes of p and q, resp.
  3) if […]
  4)   return […]
  5) else
  6)   return CONTAINS_SUB(v_root, w_root, Status)

  Algorithm CONTAINS_SUB(v, w, Status)
  Input: v, w are nodes in tree patterns p, q (respectively); Status is a 2-dimensional array such that each Status[v, w] is null, true, or false.
  Output: Status[v, w].
  1) if Status[v, w] is not null
  2)   return Status[v, w]
  3) if v is a leaf node in p
  4)   Status[v, w] = (label(w) ⪯ label(v))
  5) else if not (label(w) ⪯ label(v))
  6)   Status[v, w] = false
  7) else
  8)   Status[v, w] = AND over v' in Child(v, p) of ( OR over w' in Child(w, q) of CONTAINS_SUB(v', w', Status) )
  9)   if (Status[v, w] = false) and (label(v) = //) then
  10)    Status[v, w] = AND over v' in Child(v, p) of CONTAINS_SUB(v', w, Status)
  11)  if (Status[v, w] = false) and (label(v) = //) then
  12)    Status[v, w] = OR over w' in Child(w, q) of CONTAINS_SUB(v, w', Status)
  13) return Status[v, w]

By induction on the structures of p and q, we can show the following result.

Proposition 3.2: Given two tree patterns p and q, Algorithm CONTAINS(p, q, Status) determines if $q \sqsubseteq p$ in $O(|p| \cdot |q|)$ time.
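The three cases described above (position-preserving match, a //-node mapped to an empty chain, and a //-node mapped to a non-empty chain) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's implementation: the names `PNode`, `label_leq`, and `contains` are hypothetical, the memo dictionary stands in for the status array, the special root node is treated as an ordinary node whose label only matches itself, and the canonical-form pre-processing for chains of //- and *-nodes is omitted, so the test should be read as a sufficient check for containment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed textual encodings of the three special labels.
ROOT, WILDCARD, DESCENDANT = "/.", "*", "//"


@dataclass(eq=False)
class PNode:
    """A tree-pattern node (hypothetical type)."""
    label: str
    children: List["PNode"] = field(default_factory=list)


def label_leq(a: str, b: str) -> bool:
    """Partial order on labels: tag <= * <= //, and two tags compare only if equal."""
    if a == b:
        return True
    if ROOT in (a, b) or a == DESCENDANT:
        return False
    return b in (WILDCARD, DESCENDANT)


def contains(p: PNode, q: PNode) -> bool:
    """Sketch: True if the test finds that q is contained in p."""
    memo: Dict[Tuple[int, int], bool] = {}

    def sub(v: PNode, w: PNode) -> bool:
        key = (id(v), id(w))
        if key in memo:                      # each pair is evaluated at most once
            return memo[key]
        # Position-preserving case: labels compatible and every child
        # sub-pattern of v contains some child sub-pattern of w.
        ok = label_leq(w.label, v.label) and all(
            any(sub(vc, wc) for wc in w.children) for vc in v.children)
        if not ok and v.label == DESCENDANT:
            ok = (all(sub(vc, w) for vc in v.children)       # // mapped to an empty chain
                  or any(sub(v, wc) for wc in w.children))   # // mapped to a non-empty chain
        memo[key] = ok
        return ok

    return sub(p, q)
```

As a usage example under the same assumptions, take patterns shaped after the description in Example 1.1: with `p_a` as root/CD with children SONY and */Bach, and `p_c` as root//CD//Bach, `contains(p_c, p_a)` is true while `contains(p_a, p_c)` is false.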
The quadratic time complexity of our tree-pattern containment algorithm is due to, among other things, the fact that each pair of sub-patterns in p and q is checked at most once, because of the use of the Status array. To simplify the discussion, we have omitted from Algorithm CONTAINS certain subtle details that involve tree patterns with chains of //- and *-nodes. Such cases require some additional pre-processing to convert the tree pattern to some canonical form, but this does not increase our algorithm's time complexity.

To ensure that our tree patterns are concise, we need to identify and eliminate "redundant" nodes in them. Given a tree pattern p, a minimized tree pattern equivalent to p can be computed using a recursive algorithm MINIMIZE. Starting with the root of p, our minimization algorithm performs the following two steps to minimize the sub-pattern Subtree(v, p) rooted at node v in p: (1) for any two child nodes v' and v'' of v, if Subtree(v', p) is contained in Subtree(v'', p), delete Subtree(v'', p) from Subtree(v, p); and (2) for each child node v' of v (that was not deleted in the first step), recursively minimize Subtree(v', p). The complete details can be found in [5].

Proposition 3.3: Algorithm MINIMIZE minimizes any tree pattern p in […] time.

Proposition 3.4: For any minimized tree patterns p and p', $p \equiv p'$ iff $p = p'$ (i.e., they are syntactically equal).

Given the low computational complexities of CONTAINS and MINIMIZE, one might expect that this would also be the case for Algorithm LUB. Unfortunately, in the worst case, the size of the (minimized) LUB of two tree patterns can be exponentially large (see [5] for a detailed analysis). Our implementation results, however, demonstrate that our LUB algorithm exhibits reasonably low average-case complexity in practice.

4 Selectivity-based Aggregation Algorithm

While the LUB algorithm presented in the previous section can be used to compute a single, most precise aggregate tree pattern for a given set S of patterns, the size of the LUB may be too large and, therefore, may violate the specified space constraint k on the total size of the aggregated subscriptions (Section 2.2). Thus, in order to fit our aggregates within the allotted space budget, we relax the requirement of a single precise aggregate by permitting our solution to be a set $S' = \{p_1, p_2, \ldots\}$ (instead of a single pattern), such that each pattern $q \in S$ is contained in some pattern $p \in S'$. Of course, we also require that S' provide the "tightest" containment for patterns in S for the given space constraint (Section 2.2); that is, the number of XML documents that satisfy some tree pattern in S' but not S is small.
A simple measure of the preciseness of S' is its selectivity, which is essentially the fraction of filtered XML documents that satisfy some pattern in S'. Thus, our objective is to compute a set S' of aggregate patterns whose selectivity is very close to that of S. Clearly, the selectivity of our tree patterns is highly dependent on the distribution of the underlying collection of XML documents (denoted by D). It is, however, infeasible to maintain the detailed distribution D of streaming XML documents for our aggregation; the space requirements would be enormous! Instead, our approach is based on building a concise synopsis of D on-line (i.e., as documents are streaming by), and using that synopsis to estimate (approximate) tree-pattern selectivities. At a high level, our aggregation algorithm iteratively computes a set S' that is both selective and satisfies the space constraint, starting with S' = S (i.e., the original set of patterns), and performing the following sequence of steps in each iteration:

1. Generate a candidate set of aggregate tree patterns C consisting of patterns in S' and LUBs of similar pattern pairs in S'.
2. Prune each pattern p in C by deleting/merging nodes in p in order to reduce its size.
3. Choose a candidate $p \in C$ to replace all patterns in S' that are contained in p. Our candidate-selection strategy is based on marginal gains [14]: the selected candidate is the one that results in the minimum loss in selectivity per unit reduction in the size of S' (due to the replacement of patterns in S' by p).

Note that our pruning step (Step 2) above makes candidate aggregate patterns less selective (in addition to decreasing their size). Thus, by replacing patterns in S' by patterns in C, we are effectively trying to reduce the size of S' by giving up some of its selectivity. In the following subsections, we describe in more detail our algorithm for computing S'. We begin by presenting our approach for estimating the selectivity of tree patterns over the underlying document distribution, which is critical to choosing a good replacement candidate in Step 3 above.

4.1 Selectivity Estimation for Tree Patterns

The Document Tree Synopsis. As mentioned above, it is simply impossible to maintain the accurate document distribution D (i.e., the full set of streaming documents) in order to obtain accurate selectivity estimates for our tree patterns.
Instead, our approach is to approximate D by a concise synopsis structure, which we refer to as the document tree. Our document tree synopsis for D, denoted by DT, captures path statistics for documents in D, and is built on-line as XML documents stream by. The document tree essentially has the same structure as an XML tree, except for two differences. First, the root node of DT has the special label "/.". Second, each non-root node t in DT has a frequency associated with it, which we denote by freq(t). Intuitively, if $a_1/a_2/\ldots/a_n$ is the sequence of tag names on nodes along the path from the root to t (excluding the label for the root), freq(t) represents the number of documents T in D that contain a path with tag sequence $a_1/a_2/\ldots/a_n$ originating at the root of T. The frequency for the root node of DT is set to N, the number of documents in D.

As XML documents stream by, DT is incrementally maintained as follows. For each arriving document T, we first construct the skeleton tree $T_s$ for document T. In the skeleton tree $T_s$, each node has at most one child with a given tag. $T_s$ is built from T by simply coalescing two children of a node in T if they share a common tag. Clearly, by traversing nodes in T in a top-down fashion, and coalescing child nodes with common tags, we can construct $T_s$ from T in a single pass (using an event-based XML parser). As an example, Figure 5(d) depicts the skeleton tree for the XML-document tree in Figure 5(a). Next, we use $T_s$ to update the statistics maintained in our document tree synopsis DT as follows. For each path in $T_s$, with tag sequence say $a_1/a_2/\ldots/a_n$, let t be the last node on the corresponding (unique) path in DT. We increment freq(t) by 1. Figure 5(e) shows the document tree (with node frequencies) for the XML trees $T_1$, $T_2$, and $T_3$ in Figure 5(a) to (c).

Figure 5: Example Documents, Skeleton Tree, Document Tree, and Patterns. Panels: (a) T1, (b) T2, (c) T3, (d) Skeleton tree for T1, (e) Document Tree, (f) Compressed Document Tree, (g) p1, (h) p2, (i) p3.

Note that it is possible to further compress DT by using techniques similar in spirit to the methods employed by Aboulnaga et al. [1] for summarizing path trees. The key idea is to merge nodes with the lowest frequencies and store, with each merged node, the average of the original frequencies for nodes in DT that were merged. This is illustrated in Figure 5(f) for the document tree in Figure 5(e), with the label […] used to indicate merged nodes. Due to space constraints, in the remainder of this subsection, we only present solutions to the selectivity estimation problem using the uncompressed tree DT. However, our proposed methods can be easily extended to work even when DT is compressed [5].

We should note here that our selectivity estimation problem for tree patterns differs from the work of Aboulnaga et al. [1] in two important respects. First, in [1], the authors consider the problem of estimating selectivity for only simple paths that consist of a //-node followed by tag nodes. In contrast, we estimate selectivities of general tree patterns with branches, and *- or //-nodes arbitrarily distributed in the tree. Second, we are interested in selectivity at the granularity of documents, so our goal is to estimate the number of XML documents that match a tree pattern; instead, [1] addresses the selectivity problem at the granularity of individual document elements that are discovered by a path. It is easy to see that these are two very different estimation problems.
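The incremental maintenance of the document tree described in this subsection can be sketched as follows. This is a minimal in-memory illustration, assuming documents are already parsed into trees (the text instead mentions a single pass with an event-based parser); the names `XmlNode`, `SynNode`, `skeleton`, and `add_document`, as well as the sample documents, are hypothetical and are not taken from the paper or from Figure 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class XmlNode:
    """An element of a parsed XML document (hypothetical type)."""
    tag: str
    children: List["XmlNode"] = field(default_factory=list)


@dataclass
class SynNode:
    """A document-tree node: a frequency plus one child per distinct tag."""
    freq: int = 0
    children: Dict[str, "SynNode"] = field(default_factory=dict)


def skeleton(siblings: List[XmlNode]) -> Dict[str, dict]:
    """Coalesce same-tag siblings top-down; returns {tag: skeleton of merged children}."""
    merged: Dict[str, List[XmlNode]] = {}
    for node in siblings:
        merged.setdefault(node.tag, []).extend(node.children)
    return {tag: skeleton(kids) for tag, kids in merged.items()}


def add_document(synopsis: SynNode, doc_root: XmlNode) -> None:
    """Update the synopsis with one arriving document."""
    synopsis.freq += 1  # the root frequency counts documents seen so far

    def bump(syn: SynNode, skel: Dict[str, dict]) -> None:
        for tag, sub in skel.items():
            child = syn.children.setdefault(tag, SynNode())
            child.freq += 1  # each root-to-node tag path counts once per document
            bump(child, sub)

    bump(synopsis, skeleton([doc_root]))


# Assumed sample stream of two small documents.
synopsis = SynNode()
add_document(synopsis, XmlNode("a", [XmlNode("b"), XmlNode("b", [XmlNode("c")])]))
add_document(synopsis, XmlNode("a", [XmlNode("b", [XmlNode("c")]), XmlNode("d")]))
```

Because the skeleton merges the two b-children of the first document before counting, the path a/b is counted once per document, which is what keeps each frequency at the granularity of documents rather than elements.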
Recall that the selectivity of a tree pattern p is the fraction of documents T in D that satisfy p. By construction, our DT synopsis gives accurate selectivity estimates for tree patterns comprising a single chain of tag-nodes (i.e., with no * or //). However, obtaining accurate selectivity estimates for arbitrary tree patterns with branches, *, and // is, in general, not possible with DT summaries. This is because, while DT captures the number of documents containing a single path, it does not store document identities. As a result, for a pair of arbitrary paths in a tree pattern, it is impossible to determine the exact number of documents that contain both paths, or documents that contain one path but not the other. Our estimation procedure solves this problem by making the following simplifying assumption: the distribution of each path in a tree pattern is independent of other paths. Thus, we estimate the selectivity of a tree pattern containing no // or * labels simply as the product of the selectivities of each root-to-leaf path in the pattern. For patterns containing // or *, we consider all possible instantiations for // and * with element tags, and choose as our pattern selectivity the maximum selectivity value over all instantiations. (This is similar to the definition of the fuzzy OR operator in fuzzy logic [13].) We illustrate our selectivity estimation methodology in the following example.

Example 4.1 Consider the problem of estimating the selectivities of the tree patterns shown in Figures 5(g) to (i) using the document tree shown in Figure 5(e). The total number of documents, N, is 3. Clearly, the number of documents satisfying pattern p_1, which consists of a single path, can be estimated accurately by following the path in DT and returning the frequency for the node (at the end of the path) in DT. Thus, the selectivity of p_1 is 2/3, which is accurate since only documents T_2 and T_3 satisfy p_1. Estimating the number of documents containing pattern p_2, however, is somewhat more tricky. This is because there are two paths in DT, with tag sequences [...] and [...], that match p_2 (corresponding to two different instantiations of //).
Summing the frequencies for the two nodes at the end of these paths gives us an answer of 4, which over-estimates the number of documents satisfying p_2 (only documents T_2 and T_3 satisfy p_2). To avoid double-counting frequencies, we estimate the number of documents satisfying p_2 to be the maximum (and not the sum) of frequencies over all paths in DT that match p_2. Thus, the selectivity of p_2 is estimated as [...]. Finally, the selectivity of p_3 is computed by considering all possible instantiations for // and *, and choosing the one with the maximum selectivity. Two possible instantiations for // result in non-zero selectivities ([...] and [...]), and * can be instantiated with either [...] or [...]. The instantiation that is chosen is the one for which the product of the selectivities of the resulting paths is maximum; this product is equal to [...].

Algorithm SEL (depicted in Figure 6), invoked with input parameters v = root of pattern p and t = root of DT, computes the selectivity for an arbitrary tree pattern in [...] time.

[Figure 6: Tree Pattern Selectivity Estimation Algorithm. Algorithm SEL(v, t). Input: v is a node in tree pattern p, t is a node in DT. Output: SelSubPat[v, t]. The 15-step listing follows the case analysis given in the text.]

In the algorithm, for nodes v in p and t in DT, SelSubPat[v, t] stores the selectivity of the sub-pattern Subtree(v, p) with respect to the subtree of DT rooted at node t. This selectivity is estimated similarly to the selectivity for pattern p, except that we now consider all instantiations of Subtree(v, p) (obtained by instantiating // and * with element tags), and the selectivity of each instantiation is computed with respect to t as the root instead of the root of DT. For instance, suppose that v is a node in p_3 (in Figure 5(i)), and t is a node in DT (in Figure 5(e)). Then, the selectivity of Subtree(v, p_3) with respect to t is essentially the product of the selectivities of the paths below v with respect to node t, which is [...]. Our goal is to compute SelSubPat[root(p), root(DT)].

For a pair of nodes v and t, Algorithm SEL computes SelSubPat[v, t] from values for the children of v and t. Clearly, if the labels of v and t do not match (Steps 3-4 of the algorithm), every path in Subtree(v, p) begins with a label different from label(t), and thus the selectivity of each of the paths is 0. If v is a leaf (Steps 5-6), we simply instantiate label(v) (if it is // or *) with label(t), giving a selectivity of freq(t)/N. On the other hand, if v is an internal node of p, in addition to instantiating label(v) with label(t), we also need to compute, for every child v_c of v, the instantiation for Subtree(v_c, p) that has the maximum selectivity with respect to some child t_c of t. Since SelSubPat[v_c, t_c] is the selectivity of Subtree(v_c, p) with respect to t_c, the product, over the children v_c of v, of the maximum of SelSubPat[v_c, t_c] over the children t_c of t gives the selectivity of Subtree(v, p) with respect to t. Finally, if label(v) = //, then // can simply be empty, in which case the selectivity of Subtree(v, p) with respect to t is computed as described in Step 11, or // is instantiated to a sequence that begins with label(t) and continues below t_c, where t_c is the child of t such that the selectivity of Subtree(v, p) with respect to t_c is maximized (Step 13). Observe that, in Steps 8 and 13, if t has no children, the maximum evaluates to 0.

4.2 Tree Pattern Aggregation Algorithm

We are now ready to present our greedy heuristic algorithm for the tree pattern aggregation problem defined in Section 2.2 (which is, in general, an NP-hard clustering problem [5]). As described earlier, to aggregate an input set of tree patterns S into a space-efficient and precise set, our algorithm (Algorithm AGGREGATE in Figure 7) iteratively prunes the tree patterns in S by replacing a small subset of tree patterns with a more concise upper-bound aggregate pattern, until S satisfies the given space constraint. During each iteration, our algorithm first generates a small set of potential candidate aggregate patterns C, and selects from these the (locally) best candidate pattern, i.e., the candidate that maximizes the gain in space while minimizing the expected loss in selectivity.

[Figure 7: Tree Pattern Aggregation Algorithm. Algorithm AGGREGATE. Input: S is a set of tree patterns, k is a space constraint. Output: a set of tree patterns S' that contains S and whose total size is at most k. Steps 3-5 generate the candidate set C; Steps 6-7 select the candidate with maximum benefit and use it to replace the patterns it contains; Step 8 returns S'.]

Candidate Generation. We now explain the process for generating the candidate set C in Steps 3-5 of Algorithm AGGREGATE. To reduce the size of individual candidate patterns, each candidate is pruned by invoking Algorithm PRUNE (details in [5]). Given an input pattern p and a space constraint, Algorithm PRUNE prunes p to a smaller tree pattern p' such that p' contains p and the size of p' meets the space constraint. The algorithm treats tag-nodes as more selective than *- and //-nodes, and therefore tries to prune away *- and //-nodes before the tag-nodes. Specifically, the algorithm first prunes the *- and //-nodes in p by (1) replacing each adjacent pair of non-tag-nodes v, w with a single //-node, if w is the only child of v, and (2) eliminating subtrees that consist of only non-tag-nodes. If the tree pattern is still not small enough after the pruning of the non-tag-nodes, we start pruning the tag-nodes.
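The first pruning phase described above touches only *- and //-nodes. The following Python sketch illustrates rules (1) and (2) under stated assumptions: patterns are represented as nested `(label, children)` tuples with a root labeled `"/."`, and the helper names `prune_non_tags`, `only_non_tags`, and `size` are hypothetical, chosen for this illustration rather than taken from Algorithm PRUNE in [5].

```python
NON_TAG = {"*", "//"}  # wildcard and descendant labels


def only_non_tags(node):
    """True if the subtree rooted at `node` contains no tag-node at all."""
    label, kids = node
    return label in NON_TAG and all(only_non_tags(k) for k in kids)


def prune_non_tags(node):
    """Apply the two non-tag pruning rules bottom-up and return a new pattern."""
    label, kids = node
    # Rule (2): drop child subtrees that consist solely of * and // nodes.
    kids = [prune_non_tags(k) for k in kids if not only_non_tags(k)]
    # Rule (1): a non-tag node whose only child is also a non-tag node
    # is merged with that child into a single //-node.
    if label in NON_TAG and len(kids) == 1 and kids[0][0] in NON_TAG:
        return ("//", kids[0][1])
    return (label, kids)


def size(node):
    """Number of nodes in the pattern, root included."""
    return 1 + sum(size(k) for k in node[1])


# Toy pattern: /a[*/*/b][*//*][c]
pattern = ("/.", [("a", [
    ("*", [("*", [("b", [])])]),
    ("*", [("//", [("*", [])])]),
    ("c", []),
])])

pruned = prune_non_tags(pattern)
print(size(pattern), size(pruned))  # 9 5
print(pruned)
```

On the toy pattern, the chain `*/*` above `b` collapses into one //-node and the branch made only of non-tag-nodes disappears, so the pruned pattern is more general than the input while using fewer nodes.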
There are two ways to reduce the size of a tree pattern by one node. The first is to delete some leaf node in p, and the second is to collapse two nodes v and w into a single //-node, where [...]. To help select a good leaf node to delete (or pair of nodes to collapse), we make use of the selectivity of the tag names. More specifically, we use our document tree synopsis DT to estimate the total number of occurrences of a tag name in the document collection, and choose the tags with higher total frequencies (which are less selective) as candidates for pruning.
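Both this tag-frequency heuristic and the selectivity estimates of Section 4.1 are read off the document tree. The sketch below builds the synopsis from skeleton trees and estimates selectivity with a memoized recursion in the spirit of Algorithm SEL. It follows the prose description rather than the listing in Figure 6, and it rests on several assumptions made for illustration: documents and patterns are nested `(label, children)` tuples, the root label is written `"/."`, a //-node has exactly one child, at least one document has been added, and summing node frequencies per tag is used as the estimate of total tag occurrences. All names and the toy documents are hypothetical.

```python
class DTNode:
    """Document-tree node: a tag, a frequency, and children indexed by tag."""
    def __init__(self, label):
        self.label, self.freq, self.children = label, 0, {}


def skeleton(doc):
    """Coalesce same-tag siblings so every node has one child per tag."""
    tag, kids = doc
    merged = {}
    for kid_tag, kid_kids in kids:
        merged.setdefault(kid_tag, []).extend(kid_kids)
    return (tag, [skeleton((t, ks)) for t, ks in merged.items()])


def add_document(root, doc):
    """Fold one document into the synopsis (one increment per skeleton path)."""
    root.freq += 1  # root frequency is N, the number of documents seen

    def merge(dt_node, sk_node):
        tag, kids = sk_node
        child = dt_node.children.get(tag)
        if child is None:
            child = dt_node.children[tag] = DTNode(tag)
        child.freq += 1
        for kid in kids:
            merge(child, kid)

    merge(root, skeleton(doc))


def selectivity(pattern, root):
    """Estimated fraction of documents matching `pattern`."""
    n, memo = root.freq, {}

    def sel(v, t):
        key = (id(v), id(t))
        if key in memo:
            return memo[key]
        label, kids = v
        if label == "//":
            # Either // is empty (its child is matched at t), or it absorbs
            # label(t) and the search continues below t.
            result = sel(kids[0], t)
            for tc in t.children.values():
                result = max(result, sel(v, tc))
        elif label != "*" and label != t.label:
            result = 0.0  # tag mismatch
        elif not kids:
            result = t.freq / n  # leaf: path frequency over N
        else:
            result = 1.0  # independence assumption: multiply the branches
            for vc in kids:
                result *= max((sel(vc, tc) for tc in t.children.values()),
                              default=0.0)
        memo[key] = result
        return result

    return sel(pattern, root)


def tag_totals(root):
    """Sum of node frequencies per tag name (root label excluded)."""
    totals, stack = {}, list(root.children.values())
    while stack:
        node = stack.pop()
        totals[node.label] = totals.get(node.label, 0) + node.freq
        stack.extend(node.children.values())
    return totals


docs = [
    ("a", [("b", []), ("b", [("d", [])])]),
    ("a", [("b", [("d", [])]), ("c", [("d", [])])]),
    ("a", [("c", [("e", [])])]),
]
dt = DTNode("/.")
for d in docs:
    add_document(dt, d)

p_chain = ("/.", [("a", [("b", [("d", [])])])])      # /a/b/d
p_desc = ("/.", [("a", [("//", [("d", [])])])])      # /a//d
p_branch = ("/.", [("a", [("b", []), ("c", [])])])   # /a[b][c]
print(round(selectivity(p_chain, dt), 3))   # 0.667
print(round(selectivity(p_desc, dt), 3))    # 0.667
print(round(selectivity(p_branch, dt), 3))  # 0.444
print(tag_totals(dt))
```

In this toy collection the chain and descendant patterns are estimated exactly (two of three documents match), while the branching pattern is estimated at about 0.444 although only one document in three matches it, which shows the effect of the independence assumption.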

Candidate Selection. Once the set of candidate aggregate patterns has been generated, we need some criterion for selecting the best candidate to insert into S'. For this purpose, we associate a benefit value with each candidate aggregate pattern x in C, denoted by Benefit(x), based on its marginal gain [14]; that is, we define Benefit(x) as the ratio of the savings in space to the loss in selectivity of using x over the patterns it replaces. More formally, let p* be the least selective pattern in S' that is contained in x. Then Benefit(x) is equal to:

Benefit(x) = ( sum of |p| over all p in S' contained in x - |x| ) / ( SEL(root(x), root(DT)) - SEL(root(p*), root(DT)) )

Note that we compute the selectivity loss by comparing the selectivity of the candidate aggregate pattern with that of the least selective pattern contained in it. This gives a good approximation of the selectivity loss in cases when the patterns used to generate x are similar and overlap in the document tree DT. The candidate aggregate pattern with the highest benefit value is chosen to replace the patterns contained in it (Steps 6-7).

5 Experimental Study

To verify the effectiveness of our tree pattern aggregation algorithms, we have conducted an extensive performance study using real-life DTDs and large numbers of tree patterns. Our results indicate that our proposed aggregation techniques achieve significant reductions in the number as well as total size of tree patterns with minimal loss in selectivity.

5.1 Experimental Testbed and Methodology

Our general methodology for evaluating the effectiveness of a pattern aggregation algorithm is as follows. Given a large input set of tree patterns S and a space constraint k, we use the algorithm to compute a set of aggregate patterns S' for S, where S' contains S and the total size of S' is at most k (our space constraint is expressed in terms of number of nodes, since patterns can be arbitrarily large). We measure the loss in precision when using S' instead of S to filter XML documents. Observe that when k = [...], S' contains a single container pattern ([...]). To measure the loss in precision of the aggregated set S', we use a subset D' of a representative set of XML documents, such that no document in D' matches any tree pattern in our initial pattern set S. The reason, of course, is that XML documents that match S are also guaranteed to match S', so they are unlikely to affect our precision-loss measurements.
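The measurement setup can be sketched in a few lines of Python: keep only the documents that match none of the original patterns, then count how many of them the aggregated set reports. The matcher below is a deliberately simple stand-in (a pattern is a tuple of tags and `*` that must be a prefix of some root-to-leaf tag path of the document); it is an assumption made for this illustration and not the tree-pattern matching used in the experiments. The names `matches` and `precision_loss` are likewise hypothetical.

```python
def matches(doc_paths, pattern):
    """Toy matcher: `pattern` must be a prefix of some root-to-leaf tag path."""
    return any(
        len(path) >= len(pattern)
        and all(q == "*" or q == tag for q, tag in zip(pattern, path))
        for path in doc_paths
    )


def precision_loss(docs, original, aggregated, matcher):
    """Fraction of non-matching documents that the aggregated set reports."""
    # Documents that match none of the original patterns.
    negatives = [d for d in docs if not any(matcher(d, p) for p in original)]
    if not negatives:
        return 0.0
    # Among those, count the ones the aggregated patterns (wrongly) accept.
    false_positives = sum(
        any(matcher(d, q) for q in aggregated) for d in negatives
    )
    return false_positives / len(negatives)


# Each toy document is the set of its root-to-leaf tag paths.
docs = [
    {("a", "b")},
    {("a", "d")},
    {("e",)},
    {("a", "c", "x")},
]
original = [("a", "b"), ("a", "c")]
aggregated = [("a", "*")]  # contains both original patterns

loss = precision_loss(docs, original, aggregated, matches)
print(loss)  # 0.5
```

Two of the four toy documents match no original pattern, and the aggregate `a/*` wrongly accepts one of them, so the measured loss is 0.5. Passing the matcher in as a parameter keeps the measurement independent of how patterns are actually evaluated.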
As S' becomes less precise, some documents in D' will be erroneously reported as matches. Let m be the number of documents in D' that match S'; the loss in precision of S' over S can then be estimated as m/|D'|. An aggregation algorithm is obviously more effective if this loss remains small as the total size of S' decreases.

XML Documents. We use two real-life DTDs to generate our XML document data set. The first one, the Extensible Hypertext Markup Language (XHTML) DTD [7], is a reformulation of HTML as an XML application and is arguably the document type most widely used over the Internet. The XHTML DTD (version 1.0) contains [...] elements with [...] attributes. The second DTD, the News Industry Text Format (NITF) DTD [8], is supported by most of the world's major news agencies. The NITF DTD (version 2.5) contains [...] elements with [...] attributes. We generated our data set of XML documents using IBM's XML Generator tool [11]. Both the XHTML and NITF DTDs contain recursive structures, which can be nested to produce XML documents with an arbitrary number of levels. We added the option of generating documents skewed according to a Zipf distribution [18], where some tag names appear more frequently than others, as is generally the case with real-life data. For each DTD and each skew value, we generated two disjoint sets of XML documents with approximately [...] nodes and [...] levels on average. The first set corresponds to the collection of XML documents used to construct the document tree DT for selectivity estimation; the second set is used to measure the loss in precision of the aggregation algorithms. Both sets were generated with the same parameters, and thus can be expected to have similar distributions. In each experiment, we used the combined XML documents for both the XHTML and NITF DTDs, i.e., we used a total of [...] documents for the document tree DT, and (different) [...] documents for measuring the loss in precision.

XPath Expressions. To generate the set of tree patterns S, we implemented an XPath expression generator that takes a DTD as input and creates a set of valid XPath expressions based on a set of parameters that control: (1) the maximum height of the tree patterns; (2) the probabilities of having a wildcard * or descendant // operator at a node of a tree pattern; (3) the probability of having more than one child at a given node; and (4) the skew θ_S of the Zipf distribution used for selecting element tag names. For each DTD and each skew value θ_S, we generated a set of [...] tree patterns with [...]. Each experiment was run with tree patterns from both the XHTML and NITF DTDs, i.e., [...] tree patterns, which amounted to more than [...] nodes.

Algorithms. We compare two different aggregation algorithms in our experiments. The first (naive) algorithm, PRUNE, is based on simple node pruning and works as follows. At each iteration, it selects a tree pattern from S with the largest number of tag-nodes, collapses multiple *- and //-nodes, and deletes a prunable node (i.e., a leaf node or a node located next to //-nodes) with the highest frequency (i.e., least selective) in the document tree DT. If there is already a tree pattern identical to the pruned pattern, the duplicate is removed from S. The algorithm iterates until the space constraint is satisfied. The second algorithm, AGGR, is our greedy tree pattern aggregation algorithm (from Figure 7), with both candidate generation and selection (based on maximizing the benefit). Our experiments were conducted on an 866 MHz Intel Pentium III machine with [...] MB of main memory running Linux. Both algorithms completed the aggregation of [...] tree patterns in approximately [...] minutes.

[Figure 8: Evaluation of the Aggregation Algorithms. Panels: (a) varying θ_D, (b) varying θ_S, (c) varying θ_D and θ_S. x-axis: number of nodes (×1,000); y-axis: selectivity loss (%); curves: Prune and Aggr for skew values 0, 1, and 2.]

5.2 Experimental Results

We first compare the performance of the two aggregation algorithms by varying the skew for element tags in the XML documents and in the XPath expressions. We ran the experiments with no skew, with skewed XML documents, with skewed XPath expressions, and with skew in both the XML documents and XPath expressions. In the last case, we skewed the distribution for element names in the opposite direction (applying the same skew to both the XML documents and XPath expressions would yield similar results as with no skew). The experimental results are shown in Figures 8(a), 8(b), and 8(c), where the space constraint, expressed in terms of the number of nodes, is varied along the x-axis, and the y-axis indicates the observed loss in selectivity for a given space constraint, i.e., the percentage of XML documents that are erroneously reported as matches. We also measure the benefits of aggregation in terms of filtering performance, using the XTrie matching algorithm described in [6]. Since the cost of filtering in XTrie grows linearly with the number of XPath expressions, we expect to observe a significant improvement in filtering speed as the cardinality of S' decreases.

Non-skewed workload. When neither the XML data nor the tree patterns contain skew (i.e., θ_D = θ_S = 0), the AGGR algorithm can aggregate tree patterns up to [...] of their original size with only [...] loss in precision (the results for non-skewed data are reported in all graphs of Figure 8). In contrast, the precision of the PRUNE algorithm starts to degrade much sooner, and the loss in precision reaches almost [...] at [...] of the initial space. The better performance of AGGR can be attributed to three main factors: (1) the upper bound computation generates good candidates with few nodes and little loss in precision, (2) the selectivity-based heuristics help to detect and discard candidates that correspond to patterns with low selectivity (i.e., frequently occurring for a given DTD), and (3) the covering computation enables redundant tree patterns to be eliminated early.

Skewed XML documents. Real-world XML documents are generally not uniformly distributed among the valid XML data for a given DTD. When XML documents are skewed (Figure 8(a)), we observe that the effectiveness of the AGGR algorithm increases. The reason for this is that, as data becomes more skewed, the XML documents tend to form clusters, with documents within a cluster being more similar than those in different clusters; this, in turn, improves the accuracy of selectivity estimation. The PRUNE algorithm also benefits from the skew (although to a lesser extent) because of its frequency-based pruning heuristic.

Skewed tree patterns. We also observe a significant improvement in our aggregation algorithm when the element names of tree patterns are skewed (Figure 8(b)). Indeed, the skew induces a clustering of patterns such that similar tree patterns are grouped into the same cluster, which consequently increases the proportion of patterns that develop containment relationships. This permits the aggregation algorithm to reduce the size of S with minimal loss of selectivity, by computing tighter upper bound patterns and discarding covered patterns.

Skewed workload. The two aggregation algorithms perform best when both the XML data and the tree patterns are skewed in different directions (Figure 8(c)). With high skew values, there is little overlap between the element names of the XML documents and the tree patterns, and AGGR remains highly selective with only a few hundred nodes. The PRUNE algorithm also exhibits significant improvements and maintains selectivity even after the original number of nodes is reduced to less than a third.

Filtering speed.
As mentioned previously, the cost of matching tree patterns against incoming XML documents is proportional to the number of tree patterns. Since AGGR generates candidates by computing upper bounds, the candidates cover more patterns, and as a result, the number of patterns in S' shrinks faster with AGGR. Figure 9 shows that the average filtering time per document decreases faster (as space is increased) for AGGR than for the PRUNE algorithm. Our aggregation algorithm is therefore more effective both in terms of selectivity as well as filtering speed.

[Figure 9: Filtering speed. x-axis: number of nodes (×1,000); y-axis: filtering time (ms); curves: Prune and Aggr.]

6 Related Work

To the best of our knowledge, our tree pattern aggregation problem is a novel problem that has not been studied in earlier work. In contrast to the flat patterns previously studied in the context of aggregating attribute-predicate-based subscriptions [12], our paper focuses on hierarchical patterns, which are more complex (as tree patterns consist of both data contents and structure) and require more sophisticated aggregation techniques. A related area is the work on query merging to reduce data dissemination costs of query subscriptions in a multicast environment [9]. The motivation for query merging is to merge multiple similar queries into a single, more general query so as to reduce the workload of the server and possibly the amount of traffic between the server and its clients. However, the problem domain considered in [9] focuses on geographical queries (represented as rectangles); furthermore, the issue of space constraint is not relevant there.

Some forms of tree patterns have been studied as queries for XML data [3, 17]. In particular, minimization algorithms for these patterns have been developed in order to optimize pattern queries. The tree patterns in [3] differ from ours in two aspects. On the one hand, the tree patterns of [3] do not allow *-nodes (wildcards) which, as mentioned in Section 3, give rise to subtle problems in the presence of //-nodes (descendants) when containment of tree patterns is considered. On the other hand, they support selection of a set of document nodes as the result of a pattern query, which we do not consider, since what matters for our subscription aggregation context is whether or not a document matches a subscription; the actual set of document nodes that matches a subscription is not relevant. Because of these differences, the minimization algorithm of [3] has an O([...]) complexity, in contrast to our O([...]) complexity. Similarly, the work in [17] studies a different class of tree patterns, and their minimization algorithm is only known to be in polynomial time.
7 Conclusions

We have provided the first systematic study of tree pattern aggregation, an important problem in building next-generation, scalable XML dissemination systems. The main challenge is to aggregate an input set of tree patterns into a smaller set such that: (1) a given space constraint on the total size of the patterns is met, and (2) the loss in precision (due to aggregation) is minimized. We have proposed an efficient aggregation algorithm that makes effective use of document-distribution statistics in order to compute a precise set of aggregate tree patterns within the allotted space budget. Further, some of our algorithmic results are of interest in their own right, and can prove useful in other domains, such as XML query optimization. Extensive results from a prototype implementation have verified the effectiveness of our approach.

References

[1] A. Aboulnaga, A. Alameldeen, and J. Naughton. Estimating the Selectivity of XML Path Expressions for Internet Scale Applications. In Proc. 27th Intl. Conf. on Very Large Data Bases (VLDB 2001), September.
[2] M. Altinel and M.J. Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. In Proc. 26th Intl. Conf. on Very Large Data Bases (VLDB 2000), pages 53-64, September.
[3] S. Amer-Yahia, S. Cho, L.V.S. Lakshmanan, and D. Srivastava. Minimization of Tree Pattern Queries. In Proc. of SIGMOD, Santa Barbara, California, May.
[4] A. Carzaniga and A.L. Wolf. Content-based Networking: A New Communication Infrastructure. NSF Workshop on Infrastructure for Mobile and Wireless Systems, October.
[5] C.-Y. Chan, W. Fan, P. Felber, M. Garofalakis, and R. Rastogi. Tree Pattern Aggregation for Scalable XML Data Dissemination. Bell Labs Tech. Memorandum, February.
[6] C.-Y. Chan, P. Felber, M. Garofalakis, and R. Rastogi. Efficient Filtering of XML Documents with XPath Expressions. In Proc. of the 18th Intl. Conf. on Data Engineering, San Jose, California, February.
[7] World Wide Web Consortium. The SGML/XML Web Page. January.
[8] R. Cover. The SGML/XML Web Page. open.org/cover/sgml-xml.html, December.
[9] A. Crespo, O. Buyukkokten, and H. Garcia-Molina. Query Merging: Improving Query Subscription Processing in a Multicast Environment. IEEE Trans. on Knowledge and Data Engineering. To appear.
[10] A. Deutsch and V. Tannen. Containment of Regular Path Expressions under Integrity Constraints. In Proc. of Intl. Workshop on Knowledge Representation meets Databases (KRDB).
[11] A.L. Diaz and D. Lovell. XML Generator. alphaworks.ibm.com/tech/xmlgenerator, September.
[12] L. Opyrchal et al. Exploiting IP Multicast in Content-based Publish-Subscribe Systems. In Proc. of Intl. Conf. on Distributed Systems Platforms (Middleware).
[13] R. Fagin. Combining Fuzzy Information from Multiple Systems. In Proc. of the 15th ACM Symp. on Principles of Database Systems, Montreal, Quebec, June.
[14] B. Fox. Discrete Optimization Via Marginal Analysis. Management Science, 13(3), November.
[15] W3C. XML Path Language (XPath).
[16] W3C. Extensible Markup Language (XML).
[17] P. T. Wood. Minimizing Simple XPath Expressions. In Proc. of Intl. Workshop on the Web and Databases (WebDB), Santa Barbara, California, May.
[18] G.K. Zipf. Human Behaviour and Principle of Least Effort. Addison-Wesley, Cambridge, Massachusetts, 1949.


More information

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches Saad Mneimneh

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches Saad Mneimneh Computtionl Biology Leture 8: Genome rerrngements, finding miml mthes Sd Mneimneh We hve seen how to rerrnge genome to otin nother one sed on reversls nd the knowledge of the preserved loks or genes. Now

More information

CSC2542 State-Space Planning

CSC2542 State-Space Planning CSC2542 Stte-Spe Plnning Sheil MIlrith Deprtment of Computer Siene University of Toronto Fll 2010 1 Aknowlegements Some the slies use in this ourse re moifitions of Dn Nu s leture slies for the textook

More information

Solving the Class Diagram Restructuring Transformation Case with FunnyQT

Solving the Class Diagram Restructuring Transformation Case with FunnyQT olving the lss Digrm Restruturing Trnsformtion se with FunnyQT Tssilo Horn horn@uni-kolenz.e Institute for oftwre Tehnology, University Kolenz-Lnu, Germny FunnyQT is moel querying n moel trnsformtion lirry

More information

Lecture 8: Abstract Algebra

Lecture 8: Abstract Algebra Mth 94 Professor: Pri Brtlett Leture 8: Astrt Alger Week 8 UCSB 2015 This is the eighth week of the Mthemtis Sujet Test GRE prep ourse; here, we run very rough-n-tumle review of strt lger! As lwys, this

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

I 3 2 = I I 4 = 2A

I 3 2 = I I 4 = 2A ECE 210 Eletril Ciruit Anlysis University of llinois t Chigo 2.13 We re ske to use KCL to fin urrents 1 4. The key point in pplying KCL in this prolem is to strt with noe where only one of the urrents

More information

Boolean Algebra cont. The digital abstraction

Boolean Algebra cont. The digital abstraction Boolen Alger ont The igitl strtion Theorem: Asorption Lw For every pir o elements B. + =. ( + ) = Proo: () Ientity Distriutivity Commuttivity Theorem: For ny B + = Ientity () ulity. Theorem: Assoitive

More information

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version A Lower Bound for the Length of Prtil Trnsversl in Ltin Squre, Revised Version Pooy Htmi nd Peter W. Shor Deprtment of Mthemtil Sienes, Shrif University of Tehnology, P.O.Bo 11365-9415, Tehrn, Irn Deprtment

More information

Compression of Palindromes and Regularity.

Compression of Palindromes and Regularity. Compression of Plinromes n Regulrity. Kyoko Shikishim-Tsuji Center for Lierl Arts Eution n Reserh Tenri University 1 Introution In [1], property of likstrem t t view of tse is isusse n it is shown tht

More information

Mid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours

Mid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours Mi-Term Exmintion - Spring 0 Mthemtil Progrmming with Applitions to Eonomis Totl Sore: 5; Time: hours. Let G = (N, E) e irete grph. Define the inegree of vertex i N s the numer of eges tht re oming into

More information

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS The University of ottinghm SCHOOL OF COMPUTR SCIC A LVL 2 MODUL, SPRIG SMSTR 2015 2016 MACHIS AD THIR LAGUAGS ASWRS Time llowed TWO hours Cndidtes my omplete the front over of their nswer ook nd sign their

More information

Lecture 3. XML Into RDBMS. XML and Databases. Memory Representations. Memory Representations. Traversals and Pre/Post-Encoding. Memory Representations

Lecture 3. XML Into RDBMS. XML and Databases. Memory Representations. Memory Representations. Traversals and Pre/Post-Encoding. Memory Representations Leture XML into RDBMS XML n Dtses Sestin Mneth NICTA n UNSW Leture XML Into RDBMS CSE@UNSW -- Semester, 00 Memory Representtions Memory Representtions Fts DOM is esy to use, ut memory hevy. in-memory size

More information

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b

where the box contains a finite number of gates from the given collection. Examples of gates that are commonly used are the following: a b CS 294-2 9/11/04 Quntum Ciruit Model, Solovy-Kitev Theorem, BQP Fll 2004 Leture 4 1 Quntum Ciruit Model 1.1 Clssil Ciruits - Universl Gte Sets A lssil iruit implements multi-output oolen funtion f : {0,1}

More information

Geodesics on Regular Polyhedra with Endpoints at the Vertices

Geodesics on Regular Polyhedra with Endpoints at the Vertices Arnol Mth J (2016) 2:201 211 DOI 101007/s40598-016-0040-z RESEARCH CONTRIBUTION Geoesis on Regulr Polyher with Enpoints t the Verties Dmitry Fuhs 1 To Sergei Thnikov on the osion of his 60th irthy Reeive:

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic Chpter 3 Vetor Spes In Chpter 2, we sw tht the set of imges possessed numer of onvenient properties. It turns out tht ny set tht possesses similr onvenient properties n e nlyzed in similr wy. In liner

More information

Part I: Study the theorem statement.

Part I: Study the theorem statement. Nme 1 Nme 2 Nme 3 A STUDY OF PYTHAGORAS THEOREM Instrutions: Together in groups of 2 or 3, fill out the following worksheet. You my lift nswers from the reding, or nswer on your own. Turn in one pket for

More information

Eigenvectors and Eigenvalues

Eigenvectors and Eigenvalues MTB 050 1 ORIGIN 1 Eigenvets n Eigenvlues This wksheet esries the lger use to lulte "prinipl" "hrteristi" iretions lle Eigenvets n the "prinipl" "hrteristi" vlues lle Eigenvlues ssoite with these iretions.

More information

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6

CS311 Computational Structures Regular Languages and Regular Grammars. Lecture 6 CS311 Computtionl Strutures Regulr Lnguges nd Regulr Grmmrs Leture 6 1 Wht we know so fr: RLs re losed under produt, union nd * Every RL n e written s RE, nd every RE represents RL Every RL n e reognized

More information

CARLETON UNIVERSITY. 1.0 Problems and Most Solutions, Sect B, 2005

CARLETON UNIVERSITY. 1.0 Problems and Most Solutions, Sect B, 2005 RLETON UNIVERSIT eprtment of Eletronis ELE 2607 Swithing iruits erury 28, 05; 0 pm.0 Prolems n Most Solutions, Set, 2005 Jn. 2, #8 n #0; Simplify, Prove Prolem. #8 Simplify + + + Reue to four letters (literls).

More information

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points:

Algorithms & Data Structures Homework 8 HS 18 Exercise Class (Room & TA): Submitted by: Peer Feedback by: Points: Eidgenössishe Tehnishe Hohshule Zürih Eole polytehnique fédérle de Zurih Politenio federle di Zurigo Federl Institute of Tehnology t Zurih Deprtement of Computer Siene. Novemer 0 Mrkus Püshel, Dvid Steurer

More information

ANALYSIS AND MODELLING OF RAINFALL EVENTS

ANALYSIS AND MODELLING OF RAINFALL EVENTS Proeedings of the 14 th Interntionl Conferene on Environmentl Siene nd Tehnology Athens, Greee, 3-5 Septemer 215 ANALYSIS AND MODELLING OF RAINFALL EVENTS IOANNIDIS K., KARAGRIGORIOU A. nd LEKKAS D.F.

More information

Technology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework

Technology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework R-17 SASIMI 015 Proeeings Tehnology Mpping Metho for Low Power Consumption n High Performne in Generl-Synhronous Frmework Junki Kwguhi Yukihie Kohir Shool of Computer Siene, the University of Aizu Aizu-Wkmtsu

More information

Learning Partially Observable Markov Models from First Passage Times

Learning Partially Observable Markov Models from First Passage Times Lerning Prtilly Oservle Mrkov s from First Pssge s Jérôme Cllut nd Pierre Dupont Europen Conferene on Mhine Lerning (ECML) 8 Septemer 7 Outline. FPT in models nd sequenes. Prtilly Oservle Mrkov s (POMMs).

More information

Logic, Set Theory and Computability [M. Coppenbarger]

Logic, Set Theory and Computability [M. Coppenbarger] 14 Orer (Hnout) Definition 7-11: A reltion is qusi-orering (or preorer) if it is reflexive n trnsitive. A quisi-orering tht is symmetri is n equivlene reltion. A qusi-orering tht is nti-symmetri is n orer

More information

Chapter 4 State-Space Planning

Chapter 4 State-Space Planning Leture slides for Automted Plnning: Theory nd Prtie Chpter 4 Stte-Spe Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Motivtion Nerly ll plnning proedures re serh proedures Different

More information

Analysis of Temporal Interactions with Link Streams and Stream Graphs

Analysis of Temporal Interactions with Link Streams and Stream Graphs Anlysis of Temporl Intertions with n Strem Grphs, Tiphine Vir, Clémene Mgnien http:// ltpy@ LIP6 CNRS n Soronne Université Pris, Frne 1/23 intertions over time 0 2 4 6 8,,, n for 10 time units time 2/23

More information

INTRODUCTION TO AUTOMATA THEORY

INTRODUCTION TO AUTOMATA THEORY Chpter 3 INTRODUCTION TO AUTOMATA THEORY In this hpter we stuy the most si strt moel of omputtion. This moel els with mhines tht hve finite memory pity. Setion 3. els with mhines tht operte eterministilly

More information

Monochromatic Plane Matchings in Bicolored Point Set

Monochromatic Plane Matchings in Bicolored Point Set CCCG 2017, Ottw, Ontrio, July 26 28, 2017 Monohromti Plne Mthings in Biolore Point Set A. Krim Au-Affsh Sujoy Bhore Pz Crmi Astrt Motivte y networks interply, we stuy the prolem of omputing monohromti

More information

CS 360 Exam 2 Fall 2014 Name

CS 360 Exam 2 Fall 2014 Name CS 360 Exm 2 Fll 2014 Nme 1. The lsses shown elow efine singly-linke list n stk. Write three ifferent O(n)-time versions of the reverse_print metho s speifie elow. Eh version of the metho shoul output

More information

Welcome. Balanced search trees. Balanced Search Trees. Inge Li Gørtz

Welcome. Balanced search trees. Balanced Search Trees. Inge Li Gørtz Welome nge Li Gørt. everse tehing n isussion of exerises: 02110 nge Li Gørt 3 tehing ssistnts 8.00-9.15 Group work 9.15-9.45 isussions of your solutions in lss 10.00-11.15 Leture 11.15-11.45 Work on exerises

More information

Lesson 2: The Pythagorean Theorem and Similar Triangles. A Brief Review of the Pythagorean Theorem.

Lesson 2: The Pythagorean Theorem and Similar Triangles. A Brief Review of the Pythagorean Theorem. 27 Lesson 2: The Pythgoren Theorem nd Similr Tringles A Brief Review of the Pythgoren Theorem. Rell tht n ngle whih mesures 90º is lled right ngle. If one of the ngles of tringle is right ngle, then we

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Let s divide up the interval [ ab, ] into n subintervals with the same length, so we have

Let s divide up the interval [ ab, ] into n subintervals with the same length, so we have III. INTEGRATION Eonomists seem muh more intereste in mrginl effets n ifferentition thn in integrtion. Integrtion is importnt for fining the epete vlue n vrine of rnom vriles, whih is use in eonometris

More information

(Lec 9) Multi-Level Min III: Role of Don t Cares

(Lec 9) Multi-Level Min III: Role of Don t Cares Pge 1 (Le 9) Multi-Level Min III: Role o Don t Cres Wht you know 2-level minimiztion l ESPRESSO Multi-level minimiztion: Boolen network moel, Algeri moel or toring Retngle overing or extrtion Wht you on

More information

Maximum size of a minimum watching system and the graphs achieving the bound

Maximum size of a minimum watching system and the graphs achieving the bound Mximum size of minimum wthing system n the grphs hieving the oun Tille mximum un système e ontrôle minimum et les grphes tteignnt l orne Dvi Auger Irène Chron Olivier Hury Antoine Lostein 00D0 Mrs 00 Déprtement

More information

Automata and Regular Languages

Automata and Regular Languages Chpter 9 Automt n Regulr Lnguges 9. Introution This hpter looks t mthemtil moels of omputtion n lnguges tht esrie them. The moel-lnguge reltionship hs multiple levels. We shll explore the simplest level,

More information

Nondeterministic Automata vs Deterministic Automata

Nondeterministic Automata vs Deterministic Automata Nondeterministi Automt vs Deterministi Automt We lerned tht NFA is onvenient model for showing the reltionships mong regulr grmmrs, FA, nd regulr expressions, nd designing them. However, we know tht n

More information

A Primer on Continuous-time Economic Dynamics

A Primer on Continuous-time Economic Dynamics Eonomis 205A Fll 2008 K Kletzer A Primer on Continuous-time Eonomi Dnmis A Liner Differentil Eqution Sstems (i) Simplest se We egin with the simple liner first-orer ifferentil eqution The generl solution

More information

Total score: /100 points

Total score: /100 points Points misse: Stuent's Nme: Totl sore: /100 points Est Tennessee Stte University Deprtment of Computer n Informtion Sienes CSCI 2710 (Trnoff) Disrete Strutures TEST 2 for Fll Semester, 2004 Re this efore

More information

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE M. STISSING, C. N. S. PEDERSEN, T. MAILUND AND G. S. BRODAL Bioinformtis Reserh Center, n Dept. of Computer Siene, University

More information

CS261: A Second Course in Algorithms Lecture #5: Minimum-Cost Bipartite Matching

CS261: A Second Course in Algorithms Lecture #5: Minimum-Cost Bipartite Matching CS261: A Seon Course in Algorithms Leture #5: Minimum-Cost Biprtite Mthing Tim Roughgren Jnury 19, 2016 1 Preliminries Figure 1: Exmple of iprtite grph. The eges {, } n {, } onstitute mthing. Lst leture

More information

Factorising FACTORISING.

Factorising FACTORISING. Ftorising FACTORISING www.mthletis.om.u Ftorising FACTORISING Ftorising is the opposite of expning. It is the proess of putting expressions into rkets rther thn expning them out. In this setion you will

More information

Implication Graphs and Logic Testing

Implication Graphs and Logic Testing Implition Grphs n Logi Testing Vishwni D. Agrwl Jmes J. Dnher Professor Dept. of ECE, Auurn University Auurn, AL 36849 vgrwl@eng.uurn.eu www.eng.uurn.eu/~vgrwl Joint reserh with: K. K. Dve, ATI Reserh,

More information

Laboratory for Foundations of Computer Science. An Unfolding Approach. University of Edinburgh. Model Checking. Javier Esparza

Laboratory for Foundations of Computer Science. An Unfolding Approach. University of Edinburgh. Model Checking. Javier Esparza An Unfoling Approh to Moel Cheking Jvier Esprz Lbortory for Fountions of Computer Siene University of Einburgh Conurrent progrms Progrm: tuple P T 1 T n of finite lbelle trnsition systems T i A i S i i

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

for all x in [a,b], then the area of the region bounded by the graphs of f and g and the vertical lines x = a and x = b is b [ ( ) ( )] A= f x g x dx

for all x in [a,b], then the area of the region bounded by the graphs of f and g and the vertical lines x = a and x = b is b [ ( ) ( )] A= f x g x dx Applitions of Integrtion Are of Region Between Two Curves Ojetive: Fin the re of region etween two urves using integrtion. Fin the re of region etween interseting urves using integrtion. Desrie integrtion

More information

A Differential Approach to Inference in Bayesian Networks

A Differential Approach to Inference in Bayesian Networks Dierentil pproh to Inerene in Byesin Networks esented y Ynn Shen shenyn@mi.pitt.edu Outline Introdution Oeriew o lgorithms or inerene in Byesin networks (BN) oposed new pproh How to represent BN s multi-rite

More information

THE PYTHAGOREAN THEOREM

THE PYTHAGOREAN THEOREM THE PYTHAGOREAN THEOREM The Pythgoren Theorem is one of the most well-known nd widely used theorems in mthemtis. We will first look t n informl investigtion of the Pythgoren Theorem, nd then pply this

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

= state, a = reading and q j

= state, a = reading and q j 4 Finite Automt CHAPTER 2 Finite Automt (FA) (i) Derterministi Finite Automt (DFA) A DFA, M Q, q,, F, Where, Q = set of sttes (finite) q Q = the strt/initil stte = input lphet (finite) (use only those

More information

Subsequence Automata with Default Transitions

Subsequence Automata with Default Transitions Susequene Automt with Defult Trnsitions Philip Bille, Inge Li Gørtz, n Freerik Rye Skjoljensen Tehnil University of Denmrk {phi,inge,fskj}@tu.k Astrt. Let S e string of length n with hrters from n lphet

More information

Durable Top-k Search in Document Archives

Durable Top-k Search in Document Archives Durle Top-k Serh in Doument Arhives Leong Hou U, Nikos Mmoulis, Klus Bererih, Sriknt Bethur Deprtment of Computer Siene, University of Hong Kong Pokfulm Ro, Hong Kong {hleongu, nikos}@s.hku.hk Mx-Plnk

More information

Symmetrical Components 1

Symmetrical Components 1 Symmetril Components. Introdution These notes should e red together with Setion. of your text. When performing stedy-stte nlysis of high voltge trnsmission systems, we mke use of the per-phse equivlent

More information

A Study on the Properties of Rational Triangles

A Study on the Properties of Rational Triangles Interntionl Journl of Mthemtis Reserh. ISSN 0976-5840 Volume 6, Numer (04), pp. 8-9 Interntionl Reserh Pulition House http://www.irphouse.om Study on the Properties of Rtionl Tringles M. Q. lm, M.R. Hssn

More information

Determinants. x 1 y 2 z 3 + x 2 y 3 z 1 + x 3 y 1 z 2 x 1 y 3 z 2 + x 2 y 1 z 3 + x 3 y 2 z 1 = 0,

Determinants. x 1 y 2 z 3 + x 2 y 3 z 1 + x 3 y 1 z 2 x 1 y 3 z 2 + x 2 y 1 z 3 + x 3 y 2 z 1 = 0, 6 Determinnts One person s onstnt is nother person s vrile. Susn Gerhrt While the previous hpters h their ous on the explortion o the logi n struturl properties o projetive plnes this hpter will ous on

More information

MAT 403 NOTES 4. f + f =

MAT 403 NOTES 4. f + f = MAT 403 NOTES 4 1. Fundmentl Theorem o Clulus We will proo more generl version o the FTC thn the textook. But just like the textook, we strt with the ollowing proposition. Let R[, ] e the set o Riemnn

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Surds and Indices. Surds and Indices. Curriculum Ready ACMNA: 233,

Surds and Indices. Surds and Indices. Curriculum Ready ACMNA: 233, Surs n Inies Surs n Inies Curriulum Rey ACMNA:, 6 www.mthletis.om Surs SURDS & & Inies INDICES Inies n surs re very losely relte. A numer uner (squre root sign) is lle sur if the squre root n t e simplifie.

More information

Computing the Quartet Distance between Evolutionary Trees in Time O(n log n)

Computing the Quartet Distance between Evolutionary Trees in Time O(n log n) Computing the Qurtet Distne etween Evolutionry Trees in Time O(n log n) Gerth Stølting Brol, Rolf Fgererg Christin N. S. Peersen Mrh 3, 2003 Astrt Evolutionry trees esriing the reltionship for set of speies

More information

Discrete Structures Lecture 11

Discrete Structures Lecture 11 Introdution Good morning. In this setion we study funtions. A funtion is mpping from one set to nother set or, perhps, from one set to itself. We study the properties of funtions. A mpping my not e funtion.

More information

POSITIVE IMPLICATIVE AND ASSOCIATIVE FILTERS OF LATTICE IMPLICATION ALGEBRAS

POSITIVE IMPLICATIVE AND ASSOCIATIVE FILTERS OF LATTICE IMPLICATION ALGEBRAS Bull. Koren Mth. So. 35 (998), No., pp. 53 6 POSITIVE IMPLICATIVE AND ASSOCIATIVE FILTERS OF LATTICE IMPLICATION ALGEBRAS YOUNG BAE JUN*, YANG XU AND KEYUN QIN ABSTRACT. We introue the onepts of positive

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Math 32B Discussion Session Week 8 Notes February 28 and March 2, f(b) f(a) = f (t)dt (1)

Math 32B Discussion Session Week 8 Notes February 28 and March 2, f(b) f(a) = f (t)dt (1) Green s Theorem Mth 3B isussion Session Week 8 Notes Februry 8 nd Mrh, 7 Very shortly fter you lerned how to integrte single-vrible funtions, you lerned the Fundmentl Theorem of lulus the wy most integrtion

More information