Durable Top-k Search in Document Archives


Leong Hou U, Nikos Mamoulis, Klaus Berberich, Srikanta Bedathur

Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong
Max-Planck Institute for Informatics, Saarbrücken, Germany

SIGMOD'10, June 6-11, 2010, Indianapolis, Indiana, USA. Copyright 2010 ACM.

ABSTRACT

We propose and study a new ranking problem in versioned databases. Consider a database of versioned objects which have different valid instances along a history (e.g., documents in a web archive). Durable top-k search finds the set of objects that are consistently in the top-k results of a query (e.g., a keyword query) throughout a given time interval (e.g., from June 2008 to May 2009). Existing work on temporal top-k queries mainly focuses on finding the most representative top-k elements within a time interval. Such methods are not readily applicable to durable top-k queries. To address this need, we propose two techniques that compute the durable top-k result. The first is adapted from the classic top-k rank aggregation algorithm NRA. The second technique is based on a shared execution paradigm and is more efficient than the first approach. In addition, we propose a special indexing technique for archived data. The index, coupled with a space partitioning technique, improves performance even further. We use data from Wikipedia and the Internet Archive to demonstrate the efficiency and effectiveness of our solutions.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Search process

General Terms: Algorithms, Experimentation

Keywords: Document Archives, Top-k Search, Temporal Queries

1. INTRODUCTION

Consider a set of objects (e.g., web documents) and a sequence of different rankings of these objects. The rankings are ad-hoc (i.e., not pre-defined) and could be derived from a search operation (e.g., a keyword query). Assume that the objects are not static, but change over time (e.g., different versions of web pages), and that the search operation refers to a time interval. Then, the different rankings are sensitive to the change of documents during the query interval. The Internet Archive is a characteristic example of a document archive, where search on different versions of documents is possible. A given time interval (e.g., June 2008 to October 2009) and a set of keywords (e.g., "Welsh football player") define a sequence of rankings of all documents over time. The order of a document may change if the document is replaced by a newer version that has different relevance to the keywords.

This paper studies the problem of finding objects that are consistently in the top-k throughout the sequence of rankings defined by a time interval [t_b, t_e) and a set of keywords W. The main application is finding documents that are consistently relevant to a specific subject over a given time period. The result of this query has size 0 to k; queries can have empty results if k is small or the rankings change radically. Empty results can be avoided by relaxing the consistent top-k constraint of the query using a ratio variable r, 0 < r ≤ 1: we seek objects that are in the top-k for at least r · (t_e - t_b) time in the [t_b, t_e) interval. We call this problem durable top-k search.

Wikipedia is one of the systems where durable top-k search can be applied. A page in Wikipedia is typically modified by editors over time. For instance, Figure 1 shows how an entry about football player Ryan Giggs evolves; this page has been modified many times from 2004 to the present.

[Figure 1: Different versions of topic "Ryan Giggs" in Wikipedia. (a) in October 2007; (b) in May 2008; (c) in October 2009.]

As an example, consider five documents d1 to d5, having different versions over time, and a query defined by an interval [t_b, t_e) and a set of keywords. The (relevance) score of each document over time, within the interval [t_b, t_e), normalized to be within [0, 1], is shown in Figure 2(a). For instance, document d4 has four versions, with scores 1, 0.7, 0.4, and 0.25. Figure 2(b) shows the subintervals of [t_b, t_e) within which the ranking remains constant.
[Figure 2: Relevance of documents over time. (a) Variable score over time; (b) Ranking in sub-intervals.]

A crisp durable top-3 query with r = 1 has d2 as the only result. If the query is relaxed to r = 0.6, {d1, d2, d5} becomes the query result. This query is not only applicable to document archives, but in general to applications that need to merge ad-hoc rankings which are time-parametric. For example, consider the changing attributes of stocks over time (e.g., price, volume, etc.) and a consistent top-k query for an aggregate (e.g., average) of an ad-hoc subset of these attributes. Other applications include expert finding and finding information sources that one should subscribe to. For the former, consider a publication database and the query "column stores". Instead of the documents, the authors are ranked and their aggregate score is derived from the scores of their relevant publications. Our query finds people who have consistently produced relevant work: these are considered long-time experts on the topic. For the latter, consider information sources such as blogs (twitter users) that one typically subscribes to (follows). The user may want to subscribe to those that consistently include material relevant to a set of keywords.

Berberich et al. [6] introduced time-travel keyword queries in document archives. Given a time interval and an aggregate function (relevance model), a time-travel keyword query returns the documents most relevant to the keywords according to their aggregate scores computed over the time interval. Typical relevance models compute the maximum (MAX), minimum (MIN), or average (AVG) scores. For example, the MAX-aggregate scores of the 5 documents in Figure 2(a) are {0.6, 0.95, 0.5, 1, 0.75}. Although previously studied time-travel keyword queries share some similarity with durable top-k search, they cannot directly be used for this new query.

We use a real example to show the special nature of durable top-k search. We use a dataset from [22], which contains 11,328 URLs from Google Directory, and find their archived versions in 2004 at the Internet Archive. Consider a query with keywords "healthy" and "policy" and, as time interval, the third and fourth quarters of 2004. We set k = 20 and r = 0.5 (r is tuned to ensure that the durable query produces the same number of results as the other models). The number of different documents between the result of the durable top-k query and the relevance models MIN, MAX, and AVG is 5, 8, and 8, respectively. The results of the aggregate models are derived from the utmost/average scores, which are not directly related to consistency. Table 1 shows the different URLs computed by the MIN model and the durable query. DUR - MIN (MIN - DUR) contains the URLs which exist in the durable (MIN model) result but not in the MIN model (durable) result.
[Table 1: Example of durable top-k search results. Columns: DUR - MIN, MIN - DUR.]

Note that one URL reported only by the MIN model is a news magazine website that discussed "healthy" and "policy" at some time in 2004, but not consistently. Therefore, it is not in the result of the durable query. On the other hand, the durable top-k results are consistently relevant to the query within the given time interval, and exclude noisy outliers.

The main challenge in processing durable top-k queries is that they are based on an ad-hoc set of multiple keywords. This means that the rankings of the document versions that overlap with the query interval are not pre-defined and can only be determined from the inverted lists of the query keywords. For example, the content of Figure 2(a) is not pre-computed, but dynamically derived by intersecting the inverted lists of the query keywords (these contain the document versions that include the keywords), skipping document versions that are outside the query interval.

For the keyword queries we consider in this work, documents are typically ranked using a relevance model such as Pivoted Normalization [23], Okapi BM25 [20], variants of tf-idf [24], or language modeling approaches [19]. The obtained relevance scores can be represented as sums of keyword-specific contributions. Top-k aggregation techniques [11] are immediately applicable, if the relevance scores of document versions to each keyword are pre-computed and the versions are ordered in the corresponding inverted lists.

In this paper, we propose an efficient, specialized technique for durable top-k queries. Our method is based on a storage organization which sorts the contents of the inverted lists for each keyword in descending score order. While accessing the lists of the query keywords in parallel, our method maintains for each i, 1 ≤ i ≤ k, a band in the query timeline [t_b, t_e) capturing the scores of the current top-i results for each timestamp in the query interval. At the same time, we maintain a candidates band capturing, for each timestamp in [t_b, t_e), the best possible score of any object which is not currently in the top-k at that timestamp.
If, during retrieval, the k-th band is above the candidates band, then the top-k results at each timestamp are confirmed, which allows us to post-process them and identify the response set of the durable top-k query. We pair our method with several optimization techniques that minimize the expensive maintenance of the candidates band and accelerate time-travel search with the help of spatial indexing. In addition, we propose a transformed R-tree index for indexing the inverted lists on the disk. With the help of this transformation, we are able to decompose a durable query into a set of simple top-k queries and one durable query with a smaller search space. This way, not only the I/O but also the computational cost is reduced. More interestingly, our durable top-k search approach derives the top-k results at every timestamp before computing the durable result. Hence, durable results of different consistency can be found by progressively increasing the parameter r.

The rest of the paper is organized as follows. Section 2 describes work related to the problem under study, which is formally defined in Section 3. Section 4 presents a baseline approach to solve durable top-k queries. Our optimized technique is described in Section 5. Section 6 describes an indexing technique that improves performance. Section 7 empirically evaluates our proposed solution using Wikipedia data. Finally, Section 8 concludes the paper and discusses future research directions.

2. BACKGROUND AND RELATED WORK

In this section, we review previous work which is closely related to our problem, such as top-k search, indexing versioned documents, and time-travel queries in document databases.

2.1 Top-k Queries

Fagin et al. [11] propose and analyze methods for top-k merging of ranked lists, based on sorted and random accesses. In an Information Retrieval (IR) system, the relevance scores of a keyword to all documents (or document versions) are precomputed and stored in an inverted list. If the lists are ordered by score, we can apply the methods of [11] to find the documents (or versions) most relevant to a given set of keywords. As random accesses at inverted lists are significantly more expensive than sorted ones, we describe the no-random-accesses (NRA) algorithm (Algorithm 1), which computes top-k results using sorted accesses only. NRA iteratively retrieves objects o from the ranked inputs and maintains the upper bounds γ^u_o and lower bounds γ^l_o of their aggregate scores. Bounds γ^u_o and γ^l_o are the atomic scores of o seen so far plus the highest and lowest possible scores, respectively, from the lists in which o has not been seen. Let W_k be the set of the k objects with the largest γ^l_o. If the smallest value in W_k is at least the largest γ^u_o of any object o not in W_k, then W_k is reported as the top-k result and the algorithm terminates. LARA [18] is an efficient implementation of NRA, which manages the candidate results in a lattice and minimizes redundant bound computations and checks.

Algorithm 1 NRA Algorithm
NRA(sorted lists L)
  1: perform sequential accesses to each L_i;
  2: for each new object o update γ^l_o;  (lower bound)
  3: if less than k objects have been seen so far then goto Line 1;
  4: for each object o seen so far compute γ^u_o;  (upper bound)
  5: W_k := the k objects with the highest γ^l_o;
  6: l := min{γ^l_o : o ∈ W_k}
  7: u := max{γ^u_o : o ∉ W_k}
  8: if l ≥ u then return W_k; otherwise goto Line 1;

[16] is the most relevant piece of work to durable top-k search. Given a database of time-series (spanning the same history) and a time interval [t_b, t_e) (which is contained in the history), the problem is to find the time-series that are consistently in the top-k set for each timestamp of the query interval. For example, assuming a database of stock transactions, one might want to find stocks that are consistently in the top-2 by turnover during the first three months of 2009.
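Returning to Algorithm 1, the NRA loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scores are non-negative, the aggregate is a plain sum, lists are read round-robin one entry per list per round, and the function name `nra` and the toy data are hypothetical. The bound `u` also covers objects not seen in any list yet, whose best possible score is the sum of the last scores seen.

```python
from typing import Dict, List, Tuple


def nra(lists: List[List[Tuple[str, float]]], k: int) -> List[str]:
    """Top-k by summed score using sorted accesses only.

    Each input list holds (object, score) pairs in descending score order.
    """
    m = len(lists)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {list index: score}
    last = [lst[0][1] if lst else 0.0 for lst in lists]  # last score read per list
    max_depth = max((len(lst) for lst in lists), default=0)

    for depth in range(max_depth):
        # Line 1: one sorted access to every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen.setdefault(obj, {})[i] = score
                last[i] = score
            else:
                last[i] = 0.0  # exhausted list contributes nothing more
        if len(seen) < k:
            continue  # Line 3
        # Lines 2 and 4: lower and upper bounds of every seen object.
        lower = {o: sum(s.values()) for o, s in seen.items()}
        upper = {
            o: lower[o] + sum(last[i] for i in range(m) if i not in s)
            for o, s in seen.items()
        }
        # Lines 5-7.
        w_k = sorted(seen, key=lambda o: lower[o], reverse=True)[:k]
        l = min(lower[o] for o in w_k)
        u = max([upper[o] for o in seen if o not in w_k] + [sum(last)])
        if l >= u:  # Line 8
            return w_k

    # All lists exhausted: the lower bounds are the exact scores.
    totals = {o: sum(s.values()) for o, s in seen.items()}
    return sorted(totals, key=lambda o: totals[o], reverse=True)[:k]


# Hypothetical toy input: two keyword lists, three documents.
toy_lists = [
    [("d1", 0.8), ("d3", 0.5)],
    [("d2", 0.9), ("d1", 0.8)],
]
print(nra(toy_lists, 1))
```

In this sketch the bounds are recomputed from scratch in every round for readability; an implementation such as LARA avoids exactly this redundant work.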
There are certain differences between this problem and the durable top-k queries that we study here. First, the contents of the time-series in [16] are pre-defined (i.e., not ad-hoc), therefore the values on which the top-k function is applied are pre-computed and can be indexed. On the other hand, in our problem, ranking is defined based on an ad-hoc set of keywords. Although relevance with respect to a single keyword is pre-defined, pre-processing and indexing for an ad-hoc keyword combination is not possible. Second, the definition of [16] lacks the parameter r, which relaxes the definition of durability and avoids otherwise empty query results (i.e., for r = 1). Finally, the solution suggested in [16] does not scale well. For each object o, a sequence of ranks for o in the whole history (e.g., 1st at the 1st timestamp, 3rd at the 2nd timestamp, etc.) is pre-computed. To find consistent top-k objects during a time interval, we search within each object list for its ranks in the interval and we output only those documents whose ranks in all orderings in the interval are at most k. The cost of this method is proportional to the number of objects (i.e., a time-interval search first has to be applied at each object list, then the retrieved ranks have to be accessed and compared with k), so the method does not scale well with the number of objects.

Pruning strategies in IR, based on specially designed inverted files, were proposed in [1, 2, 26]. The contents of the inverted lists can be ordered by document ids or by scores. The first approach is the classic implementation, which enables multi-way merging when processing a query with multiple keywords. The sizes of inverted lists can be reduced by storing the difference (i.e., gap) between consecutive ids. If the inverted lists are stored in this way, a simple solution could be used to compute durable top-k queries. First, we traverse the complete inverted lists and put all entries that intersect the query interval into a temporary array. Next, each record in the temporary array is split into a start and an end event and all events are sorted by time. The durable top-k result is computed by running a simple scanline algorithm from computational geometry.
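The simple scanline solution just described can be illustrated with the following Python sketch. It assumes that the per-keyword scores of each document version have already been summed into one aggregate score, and it recomputes the top-k with `heapq.nlargest` between consecutive events instead of maintaining a size-k heap incrementally. The function name, the tuple layout, and the toy data are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Version = Tuple[str, float, float, float]  # (doc id, score, valid from, valid until)


def durable_topk_sweep(versions: List[Version], q_start: float, q_end: float,
                       k: int, r: float) -> Set[str]:
    """Documents that are in the top-k for at least r * (q_end - q_start) time."""
    events = []  # (time, kind, doc, score); kind 0 = end, 1 = start
    for doc, score, start, end in versions:
        start, end = max(start, q_start), min(end, q_end)  # clip to the query
        if start < end:
            events.append((start, 1, doc, score))
            events.append((end, 0, doc, score))
    events.sort()  # at equal times, end events are processed before start events

    active: Dict[str, float] = {}              # doc -> score of its valid version
    time_in_topk: Dict[str, float] = defaultdict(float)
    prev = q_start
    for time, kind, doc, score in events:
        if time > prev and active:
            # The ranking is constant on [prev, time): credit the current top-k.
            for d in heapq.nlargest(k, active, key=active.get):
                time_in_topk[d] += time - prev
        prev = time
        if kind == 1:
            active[doc] = score
        else:
            active.pop(doc, None)

    needed = r * (q_end - q_start)
    return {d for d, t in time_in_topk.items() if t >= needed}


# Hypothetical toy input with aggregate scores.
toy_versions = [("d1", 1.6, 1, 5), ("d2", 0.9, 5, 8), ("d3", 0.5, 1, 3)]
print(durable_topk_sweep(toy_versions, 1, 8, k=1, r=0.5))
```

Because the per-document durations are kept, the same pass can answer the query for several values of r by changing only the final filter.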
The scanline is run on a heap of size k maintaining the current top-k, with an additional data structure for elements that have been in the top-k but currently are not. Such an approach is not efficient in practice, due to the large overhead of generating, sorting, and scanning a potentially huge number of document versions.

The second approach, used in [1], keeps document entries in the inverted lists ordered by scores and uses Algorithm 1 to terminate list intersection early. However, the inverted lists cannot be easily compressed, since the scores may be in floating point format. In view of this, a hybrid approach is proposed in [2, 26], which uses a two-level structure. The documents in a list are decomposed into different segments based on their scores and the documents in one segment are sorted by ids to facilitate compression. NRA can also be used on this data representation, but this time all segments having the same bounds are accessed in a batch, and after each batch access the termination condition is verified.

2.2 Indexing versioned document collections

Indexing versioned document collections has been studied in [7, 25, 14, 13]. Broder et al. [7] propose a technique that exploits large content overlaps between documents to achieve a reduction in index size. Each version is partitioned into a set of fragments, e.g., an email is partitioned into two fragments, subject and body. The fragments between versions are organized in a tree structure and each child inherits the shared fragments from its parent. This solution makes strong assumptions about the structure of document overlaps. [25] uses a content-dependent partitioning technique [21] to partition a page into smaller fragments such that more fragments are common between versions. More recent approaches by Herscovici et al. [14] and He et al. [13] exploit arbitrary content overlaps between documents to reduce index size. [14] attempt to find subsets of terms that are contained in consecutive versions of a document. Each subset is stored into a virtual document and the total storage cost is optimized by minimizing the overall number and size of the virtual documents. [13] propose a two-level index compression that improves the query processing.
This approach groups similar union-documents into clusters, where a union-document contains all terms in the corresponding versions, and the terms are compressed locally for each cluster. This structure greatly reduces storage and still preserves the hierarchical relationship between documents and versions.

All these indexing methods primarily aim at reducing the space required for storing the versioned document collections, taking advantage of the similarity between versions. However, the durable top-k search problem that we study in this paper is CPU-intensive and does not benefit directly from such compression techniques, since all versions may have to be accessed and reconstructed from the compressed storage scheme.

2.3 Time Travel Queries in IR

There is a significant body of work on analyzing large text collections over time. Bansal and Koudas [4] describe a full-fledged system for searching the blogosphere. Among others, the system supports the detection of stable keyword clusters as described in [3]. Earlier work by Kleinberg [15] and Dubinko et al. [10] also focuses on the analysis of text and tag streams, respectively, to detect bursty keywords or tags. However, all three aforementioned approaches operate on the document collection as a whole and not on ad-hoc keyword query results.

Other work has investigated the use of temporal information as a means to obtain a better ranking of query results. Li and Croft [17], for instance, propose a language modeling approach that factors in the publication time of the document. Del Corso et al. [8] focus on news and propose ranking methods that take into account when a news article was published and linked to by other news articles.

Berberich et al. [5] proposed a temporal text indexing technique for web documents which supports time-travel queries. In a typical inverted file, each inverted list contains postings (d, s), where d is the document id and s is the relevance score of the term in document d. To support indexing of versioned documents, in [5] the inverted file is extended such that each posting includes a time interval λ. The temporal information characterizes the validity time interval of the indexed version of d. The objective of a time-travel query is to identify the top-k documents with the highest aggregate score during the query interval, as we explained in the Introduction.

[Figure 3: Comparison between one and multiple inverted lists. (a) single list; (b) multiple lists.]

Consider a set of postings for keyword w and a top-k time-travel query q on w with interval [t_2, t_3), as shown in Figure 3(a). The query can be processed by accessing the postings in decreasing score order, ignoring those that do not overlap with the query interval. While doing so, we can use upper bounds for the subsets of the query interval where postings have not been seen (i.e., run a version of NRA) and at some point confirm the k documents with the best aggregate scores. Although only four postings (a, b, c, and d) are valid in [t_2, t_3), the whole inverted list has to be read in the worst case. To tackle this problem, Berberich et al.
[5] propose a partitioning approach, which splits the inverted list with the entire posting set into smaller lists. For instance, we can partition the inverted list of Figure 3(a) into three sub-lists, as shown in Figure 3(b). Each posting is stored in all lists which it temporally intersects (e.g., a posting that spans the boundary between the first two sub-lists is stored in both IL_1 and IL_2). Now, query q temporally intersects only with list IL_2, therefore only five postings have to be read (instead of 13 if IL of Figure 3(a) is used). One strategy is to materialize sub-lists for all elementary time intervals. For instance, we could create 7 sub-lists for the data in Figure 3(a). This achieves excellent performance for queries with short time intervals, but a lot of space is wasted due to replicated storage of postings that intersect multiple list intervals. In addition, queries with long intervals access multiple lists with overlapping contents. In view of this, [5] study the optimization problem of splitting the lists into a suitable set of sub-lists, with or without a constraint on the space occupied by them.

3. PROBLEM DEFINITION

Problem 1 is a formal definition of the durable top-k query. Although this definition is tailored for temporal keyword search in archives of documents with versions, we can adapt it (and the solutions proposed in this paper) to apply to any type of data with different versions along arbitrary dimensions (e.g., document versions based on location, or blog items grouped by user).

PROBLEM 1. Let D be a set of n documents. Each d ∈ D has a number of versions, and each version v of d is characterized by a validity time interval [v.t_b, v.t_e). The time intervals of two different versions of the same document may not overlap. Let q be a query, consisting of a set of keywords q.W and a time interval [q.t_b, q.t_e). For a timestamp t ∈ [q.t_b, q.t_e), the relevance of document d ∈ D to q is defined by applying an IR relevance model on the version v of d for which t ∈ [v.t_b, v.t_e), using q.W. The relevance is zero if no such version exists. Given an integer k, 0 < k < n, and a real r, 0 < r ≤ 1, the durable top-k search problem finds all d ∈ D such that d appears in the set of top-k most relevant documents to q within [q.t_b, q.t_e) for at least r · (q.t_e - q.t_b) time.

4. PRELIMINARY SOLUTIONS

In this section, we describe some direct adaptations of NRA for solving the durable top-k search problem. For ease of discussion, we assume that all postings for each keyword are sorted by their scores and materialized into a single inverted list. The use of multiple inverted lists per keyword (as proposed in [5]) is orthogonal to the presented solutions and will be discussed later.

4.1 Brute-force method

Consider a query q with a set of keywords W and interval q.λ = [q.t_b, q.t_e). Let Λ denote a set of sub-intervals such that (i) no two intervals in Λ overlap, (ii) their union equals [q.t_b, q.t_e), and (iii) each document version either fully covers or does not overlap with any sub-interval in Λ. Then, finding the top-k results in each sub-interval suffices to compute the durable top-k result. Condition (iii) ensures the uniqueness of each document in a sub-interval, and conditions (i) and (ii) guarantee completeness. After collecting the top-k results from all sub-intervals, we intersect them, while measuring for each document d the total temporal length of the sub-intervals where d is in the top-k result. If this length is at least r · (t_e - t_b), d is a durable top-k result.

Algorithm 2 is a greedy method that finds Λ incrementally, by accessing the postings that intersect with q.λ. Initially, we set Λ = {q.λ} = {[q.t_b, q.t_e)}. For each posting p with time interval [p.t_b, p.t_e), we find the subset Λ' of Λ such that all intervals in Λ' intersect p (line 1). If the begin or end timestamp p.t of p is inside an interval [t_b, t_e) in Λ', this interval is split and replaced by the two intervals [t_b, p.t) and [p.t, t_e) (line 3). We can easily show that this algorithm computes a unique correct set of (max-length) sub-intervals that satisfy conditions (i), (ii), and (iii). We can then compute the top-k results within each interval and the durable top-k result.
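The incremental construction of Λ can be illustrated with a few lines of Python. This is a sketch under the assumption that intervals are half-open pairs `(start, end)` kept in a list; the name `maintain_is` and the toy postings are hypothetical. An interval that strictly contains both endpoints of the posting is split into three pieces, which is the result of applying the split rule once per endpoint.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open [start, end)


def maintain_is(intervals: List[Interval], p_start: float, p_end: float) -> List[Interval]:
    """Split every interval that strictly contains an endpoint of [p_start, p_end)."""
    result: List[Interval] = []
    for start, end in intervals:
        # Endpoints of the posting that fall strictly inside this interval.
        cuts = sorted(t for t in {p_start, p_end} if start < t < end)
        bounds = [start, *cuts, end]
        result.extend(zip(bounds, bounds[1:]))
    return result


# Hypothetical toy run: query interval [1, 8) and four posting intervals.
sub_intervals = [(1, 8)]
for posting in [(1, 5), (5, 8), (1, 3), (1, 5)]:
    sub_intervals = maintain_is(sub_intervals, *posting)
print(sub_intervals)
```

Intervals that the posting does not touch contain none of its endpoints, so scanning all of Λ instead of only Λ' gives the same result in this sketch, at the cost of a linear pass.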
Algorithm 2 Interval Set Maintenance
maintainIS(interval set Λ, new interval [p.t_b, p.t_e))
  1: Λ' is the subset of Λ in which each interval intersects with [p.t_b, p.t_e);
  2: if p.t_b or p.t_e is inside [t_b, t_e) ∈ Λ' then
  3:   replace [t_b, t_e) by two sub-intervals;

For example, assume that our query contains 1 keyword and the inverted list contains four postings, a, b, c, and d, as shown in Figure 4. Posting a splits the entire interval λ = [q.t_b, q.t_e) into two, λ_1 and λ_2 (λ_2 is the union of λ_3 and λ_4, shown in the figure). Next, posting b intersects interval λ_2 (i.e., Λ' = {λ_2}). λ_3 and λ_4 are created since an endpoint of b is inside λ_2. Next, posting c intersects intervals λ_1, λ_3, and λ_4. The starting point of c is not inside λ_1, so λ_1 is not split into sub-intervals. On the other hand, the endpoint of c is contained in λ_4, so λ_4 is replaced by new sub-intervals λ_5 and λ_6 (see Figure 4(b)). In turn, posting d splits λ_3 and λ_5, ending with 6 sub-intervals in total (not shown in the figure).

[Figure 4: Example of intervals maintenance. (a) after 2 accesses; (b) after 3 accesses.]

4.2 Dynamic Adaptive Algorithm

The brute-force solution is inefficient since (i) it reads all postings that intersect the query interval to create the set of sub-intervals and (ii) many sub-intervals are created, which require a large number of top-k queries. For instance, we have 6 sub-intervals after we process 4 postings in Figure 4, meaning that 6 top-k searches have to be executed before we can collect the durable top-k result. Note that some sub-intervals need not be computed at all if k is small. For instance, if k = 1, we can terminate our search after posting c is read, since we can already find the top-1 result for each interval.

As shown by Algorithm 2, the sub-intervals can be maintained incrementally. Thus, we can maintain the sub-intervals and execute top-k aggregation simultaneously. For each sub-interval, NRA is invoked to compute the top-k result. According to Algorithm 1, we have to keep the lower bound γ^l_o and the upper bound γ^u_o for every object o seen so far. Let Γ^l_λ and Γ^u_λ be the sets of lower and upper bounds in interval λ, respectively. The usage of these bound sets will be discussed shortly.

According to Algorithm 2, if more postings are read from the lists, more sub-intervals are created. Therefore, we propose a technique, called Dynamic Adaptive Algorithm (DAA), to terminate our search as early as possible (see Algorithm 3). This is possible if the postings in the lists are sorted in descending score order. First, we access the postings sequentially and the sub-intervals are maintained by Algorithm 2. If an existing interval is split into two new ones, because it contains one endpoint of the currently processed posting p, then for each of the two new intervals λ, Γ^l_λ and Γ^u_λ are replicated from the old split interval. After the sets Γ^l_λ and Γ^u_λ are created for the new interval λ, we use the score p.s of the current posting as the next input to NRA to update the bounds and the current top-k results (see footnote 2). If NRA confirms the top-k result in λ, we mark interval λ as finalized. That is, no further splits are performed on λ if new postings are found to intersect it later. If all intervals are marked as finalized, we merge their top-k results to compute the durable top-k set. Otherwise, we get the next posting from the inverted lists.

Algorithm 3 Dynamic Adaptive Algorithm
DAA(sorted posting lists L)
  1: p := access the next posting from L;
  2: maintainIS(Λ, [p.t_b, p.t_e));
  3: for all new λ ∈ Λ do
  4:   if λ is finalized, goto line 3;
  5:   create or duplicate Γ^l and Γ^u for interval λ
  6:   if intersects(λ, [p.t_b, p.t_e)) then
  7:     feed p to NRA for λ using Γ^l and Γ^u;
  8:     if NRA returns top-k result, mark λ as finalized;
  9: if all λ ∈ Λ are finalized then
  10:   compute durable top-k result;
  11: else
  12:   goto Line 1;

[Figure 5: An example of Dynamic Adaptive Algorithm. Inverted lists as (document, score, validity interval): IL_1 = (d1, 0.8, [1,5)), (d3, 0.5, [1,3)); IL_2 = (d2, 0.9, [5,8)), (d1, 0.8, [1,5)). Panels: (a) after 1st access; (b) after 2nd access; (c) after 3rd access; (d) after 4th access.]

Figure 5 demonstrates the Dynamic Adaptive Algorithm. The postings of two keywords are stored in two inverted lists IL_1 and IL_2, as shown at the top of the figure. Assume that k = 1. After the first access, Γ^l_λ1 and Γ^u_λ1 are created. d1 is currently in the top-k set W_k of λ_1 and u is 1.8 in λ_1 (note that u is the highest upper bound of any o ∉ W_k). Next, we get d2 from IL_2 and we create Γ^l_λ2 and Γ^u_λ2. u in both intervals is 1.7 now. In order to improve readability, we remove all data which are not in W_k in the subsequent figures. After the third access (d3 from IL_1), λ_1 is split into two intervals λ_3 and λ_4 and their Γ^u and Γ^l are duplicated from λ_1.
u in λ_3 is 1.4 since d3 has been seen in this interval with the upper bound γ^u_d3 (0.5 + 0.9 = 1.4); u in the other intervals is also 1.4, which is the highest aggregate score from all lists. Finally, after the fourth access, we update d1's γ^l to 1.6 in λ_3 and λ_4. In addition, u becomes 1.3 in all intervals. According to the NRA termination condition, intervals λ_3 and λ_4 return d1 as their top-1 result.

Footnote 2: In case of a split, the new posting is fed to only one of the two intervals: the one that intersects the posting.

5. THE BAND APPROACH

During the execution of DAA, many Γ^l and Γ^u sets are created. These negatively affect not only the execution time but also the memory usage. In this section, we introduce a new method that solves the durable top-k problem using the shared execution paradigm, based on the observation that two neighbor intervals usually have similar top-k results. In a nutshell, our method performs similar splits as DAA; however, we do not maintain Γ^l and Γ^u at each sub-interval. Instead, we maintain (in a compressed representation) for each i, 1 ≤ i ≤ k, the band (i.e., boundary) for the i-th worst-case score at each time unit in the query interval. In addition, we maintain in a candidates band the best-case score of all objects currently not in the top-k set for each time unit. If the candidates band drops below the k-th band at all time units, we can guarantee that the top-k results are found at all timestamps and we can terminate.

5.1 Top Bands Computation

First, we define the concept of a top-k band. Consider the finest granularity unit of the time dimension (e.g., days) and assume that a posting p spans u units of this granularity (i.e., the interval [p.t_s, p.t_e) includes u basic time units). Then, p can be modeled as a sequence of u postings p = τ^1_p, τ^2_p, ..., τ^u_p, each spanning a single time unit. Note that the other attributes (i.e., document id and score) are common to all unit-postings of p. Assuming a representation where each posting is replaced by its unit-postings, Definition 1 formally defines the top-k band.

DEFINITION 1. The top-k band is a sequence of unit-postings τ^t_p, one for each time unit t in [q.t_s, q.t_e), where τ^t_p is the unit-posting among all those valid at time t with the k-th highest score.

[Figure 6: An example of top-k bands. (a) 10 postings; (b) top-1 and top-2 bands.]

Figure 6 shows an example of top-k bands. Consider a keyword with 10 postings (a to j), as shown in Figure 6(a). The top-1 and top-2 bands are shown in Figure 6(b). The postings that are not related to the top-1 and top-2 bands are removed. For simplicity, we overload the notation τ^i_p to denote segments of consecutive unit-postings from the same posting p and use i to distinguish between segments from the same posting p. The top-1 band consists of 3 segments. Two of them are identical to whole postings, since there is no other segment better than these postings in their validity intervals. The third comes from a posting that is decomposed into three segments, of which only the second (τ^2) appears on the top-1 band. By removing all segments in the top-1 band, the top-2 band becomes the top-1 band of the remaining segments. This important property helps us to maintain the bands iteratively. Algorithm 4 shows a recursive method that maintains the set of top-k bands for single-keyword queries (multiple-keyword queries will be discussed shortly).
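Definition 1 can be made concrete with a brute-force Python sketch that literally expands each posting into unit-postings and ranks them per time unit. It only illustrates what the bands contain; it is not the incremental maintenance described next. The function name, the integer time units, and the three toy postings are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Posting = Tuple[str, float, int, int]  # (posting id, score, first unit, end unit exclusive)


def top_bands(postings: List[Posting], q_start: int, q_end: int,
              k: int) -> List[List[Optional[str]]]:
    """bands[i][t - q_start] is the posting with the (i+1)-th highest score at unit t."""
    bands: List[List[Optional[str]]] = [[None] * (q_end - q_start) for _ in range(k)]
    for t in range(q_start, q_end):
        # Unit-postings valid at time unit t, best score first.
        valid = sorted(
            ((score, pid) for pid, score, start, end in postings if start <= t < end),
            reverse=True,
        )
        for i, (_, pid) in enumerate(valid[:k]):
            bands[i][t - q_start] = pid
    return bands


# Hypothetical toy input: posting "b" ends up cut into three segments.
toy_postings = [("a", 0.9, 0, 2), ("b", 0.6, 1, 4), ("c", 0.8, 3, 5)]
for level, band in enumerate(top_bands(toy_postings, 0, 5, k=2), start=1):
    print(f"top-{level} band:", band)
```

In the toy data, posting "b" is on the top-2 band at units 1 and 3 and on the top-1 band at unit 2, which mirrors the three-segment decomposition discussed for Figure 6. Runs of equal ids in a band correspond to the segments of the compressed representation.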
T^top_i denotes the top-i band and it is modeled by a set of segments. Let T^ins be a set of new segments that should be used to update the top bands. Each τ ∈ T^ins is decomposed into a set of small segments τ = {τ^1, ..., τ^n}, according to τ's intersections with T^top_i. For instance, in Figure 7, the inserted posting is decomposed into 3 segments {τ^1, τ^2, τ^3}. After we collect the set of segments {τ^1, ..., τ^n}, we compare them with the top-i band T^top_i one at a time. If τ^i does not intersect with any segment in T^top_i (line 5), τ^i is inserted into T^top_i since it fills a current gap in the top-i band. Suppose that there is a τ^top intersecting with τ^i at time t. If the score of τ^top is not worse than the score of τ^i, τ^i is moved to T^next to be processed in the next band (line 7). For instance, the first segment τ^1 of the inserted posting in Figure 7(a) will be processed in the top-2 band. If τ^i has a higher score than τ^top, we decompose τ^top at time t, which is the endpoint of τ^i inside τ^top. Consider again the example of Figure 7(a). A segment of the current top-1 band intersects with the third segment τ^3 of the inserted posting and its score is lower than the score of τ^3. Therefore, the band segment is decomposed into two parts, one of which has the same validity interval as τ^3.

Algorithm 4 Top-k Bands Insertion
insertBand(top-i band T^top_i, set of segments T^ins, int k)
  1: set T^next := ∅
  2: for τ ∈ T^ins do
  3:   for each τ^top ∈ T^top_i, decompose τ if τ ∩ τ^top ≠ ∅, such that τ is decomposed into {τ^1, ..., τ^n}
  4:   for each τ^i do
  5:     if not intersect(τ^i, τ^top) then
  6:       insert τ^i into T^top_i
  7:     else if τ^top.s ≥ τ^i.s then
  8:       insert τ^i into T^next;
  9:     else  (intersect at time t)
  10:      use t to decompose τ^top into τ^top1 and τ^top2
  11:      # assuming that τ^top1.λ = τ^i.λ
  12:      replace τ^top by τ^top2 and τ^i
  13:      insert τ^top1 into T^next;
  14: T^ins := T^next;
  15: if i < k then
  16:   return insertBand(T^top_{i+1}, T^ins, k);
  17: else
  18:   return T^ins;

Based on their scores, τ^3 and the non-overlapped part of the band segment are inserted into the top-1 band, and the overlapped part is stored in T^next to be processed at the next call of the algorithm (line 16), which computes the next band.

[Figure 7: Insertion in top-k band. (a) insertion of a posting; (b) new top-1 band.]

This process guarantees that the top-k results at all timestamps are equivalent to the results of the top bands.
Once we collect all top bands, we proceed to compute the durable top-k result.

So far, we have discussed how to maintain the top bands for single-keyword queries. If we have multiple keywords, we read the postings from each keyword (in parallel) and insert each of them into the top bands using Algorithm 4. If the document version in the current posting has been seen in some other keyword list before and it is in the top-i band, we remove it from the band and reinsert it considering its updated relevance score (which is increased compared to its previous partial score). Note that the removal process does not require access to any data outside the top-k bands. More precisely, it does not require any change at the top-(i+1) to top-k bands, since the relevance score is only increasing.

5.2 Candidate Band Computation

The simplest way to compute top-k bands is to read all postings from each keyword once and exhaustively, while updating the bands at each reading. However, we would like to terminate the accesses to the inverted lists as early as possible. Therefore, we investigate an appropriate access plan and termination condition for our band approach. If we access the postings in decreasing order of their scores (i.e., like DAA), a termination condition can be derived with the help of a typical top-k aggregation approach. Before we discuss our solution, we define the concepts of candidate container C and candidate band in Definitions 2 and 3.

DEFINITION 2. The candidate container C stores the segments of document versions which have been seen so far but are not included in the top bands.

DEFINITION 3. The candidate band contains the segments that are the top-1 band of the candidate container C.

Note that the upper bound u in NRA is computed by the highest possible scores γ^u_o from the objects that are not in W_k (where W_k contains the k objects with the highest γ^l_o). Similarly, we store all segments that are not in our top bands to our candidate container C. For each segment τ_i in C, its score is set to be the upper bound γ^u of τ_i. We compute the top-1 band from C and this is our candidate band. After every access, if we have the top-k band and the candidate band, it is possible to terminate retrieval if the conditions of Lemma 1 are satisfied for every timestamp in the query interval.

LEMMA 1. The top-k result at timestamp t has been stored in the top bands if and only if the score of the k-th band at t is not worse than (1) the score of the candidate band of C at t and (2) the sum Ψ of the last seen scores at all lists. Condition (2) is checked if there is no element in C that intersects t.

Algorithm 5 shows the pseudocode of our band approach. We use the set operation ≥_λ to denote comparisons over interval λ. For instance, T^top ≥_λ T^cand means that at any timestamp t in λ the segments in T^top that are valid at t are not worse than the segments in T^cand that are valid at t. Similarly, T^top ≥_λ Ψ denotes that the segments in T^top are not worse than the line defined by Ψ. For the current posting, if a segment has been inserted into the top bands or the candidate container, we remove it and reinsert it with an updated score (lines 2-4). Then, subroutine insertBand is called (Algorithm 4), which updates the top bands incrementally by inserting T_ins and returns the already-seen set of segments T_ret that are outside the top bands (line 5). Next, we insert all segments in T_ret into the candidate container C and compute the candidate band (lines 6-9). Finally, if all timestamps meet the conditions of Lemma 1, we terminate our search and proceed to find the durable top-k result using the top bands.
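The per-timestamp test of Lemma 1 can be sketched as a small predicate. This is a minimal illustration under our own assumptions: the function name confirmed_at and its arguments are hypothetical, scores are plain floats, and None stands for "no segment is valid at this timestamp".

```python
def confirmed_at(kth_score, candidate_upper, last_seen):
    """Check the Lemma 1 conditions at one timestamp.

    kth_score:       lower-bound score of the k-th band here, or None
    candidate_upper: upper-bound score of the candidate band here, or None
    last_seen:       last seen score of each inverted list
    """
    if kth_score is None:
        # The k-th band still has a gap here, so nothing is confirmed.
        return False
    if candidate_upper is not None:
        # Condition (1): no already-seen candidate can overtake the band.
        return kth_score >= candidate_upper
    # Condition (2), checked only when no candidate intersects t:
    # no unseen document can exceed the sum of last seen scores.
    psi = sum(last_seen)
    return kth_score >= psi
```

Retrieval can stop once this predicate holds for every timestamp of the query interval, which is what line 10 of Algorithm 5 expresses with the interval comparisons.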
Algorithm 5 Band Based Algorithm
BBA(sorted lists L)
 1: p := access the next posting from L;
 2: for all τ_i ∈ p do
 3:   remove τ_i from top band / candidate container;
 4:   set τ_i.s to its γ^l and insert into T_ins;   # use lower bound
 5: T_ret := insertBand(T^top_1, T_ins, k);   # top bands
 6: for all τ_j ∈ T_ret do
 7:   insert τ_j into C;
 8: set τ_j.s to its γ^u, for all τ_j ∈ C;   # use upper bound
 9: T^cand_1 := compute top-1 band from C;   # candidate band
10: if T^top_k ≥_{q.λ} T^cand_1 and T^top_k ≥_{q.λ} Ψ then
11:   check durable top-k result;
12: else
13:   goto Line 1;

We use the data from Figure 5 to demonstrate our band approach, for k = 1. The first two postings (in round-robin order from the lists) are inserted into the top-1 band using line 5 and Ψ becomes 0.8 + 0.9 = 1.7. When we insert the third posting, it fails to enter the top-1 band since its lower bound γ^l = 0.5 is not better than the γ^l = 0.8 of τ^1_{d1}. Therefore, subroutine insertBand returns {τ^1_{d3}}. After that, we insert it into the candidate container C = {τ^1_{d3}}, and we set the score of τ^1_{d3} to its upper bound γ^u = 0.5 + 0.9 = 1.4. The candidate band is then computed, as shown in Figure 8(c). When the fourth posting is read, we remove τ^1_{d1} from the top-1 band since the score of τ^1_{d1} is updated by this access. We update its score to 0.8 + 0.8 = 1.6 and reinsert it into the top bands. Next, we recompute our candidate band, as the score of τ^1_{d3} becomes 0.5 + 0.8 = 1.3. Note that the top-1 result in interval [1, 5) has been confirmed, since the conditions (T^top ≥_{[1,5)} T^cand and T^top ≥_{[1,5)} Ψ) become true. The algorithm will terminate later, after the interval [5, 8) is confirmed.

Figure 8: Examples of Band Based Algorithm. (a) After 1st access; (b) after 2nd access; (c) after 3rd access; (d) after 4th access.

5.3 Optimizations

Note that the main difference between the top bands and the candidate band is that the lower bound γ^l is used in the top bands, whereas the upper bound γ^u is used in the candidate band. Unlike the segments in the top bands, the entire candidate band could change when a new posting is read from a list, due to the changes of upper bounds. Consequently, the candidate band is recomputed at each loop of Algorithm 5. The recomputation of the candidate band might be very expensive when container C becomes very large.
To reduce this cost, we use a powerset-based approach to support incremental updates at the candidate band. In addition, we propose a grid-based index to manage the data in the candidate container.

Lattice Based Containers

Similar to LARA [18], we create a set of containers C_x, one for each combination of the m inputs {L_1, ..., L_m} (recall that each input is the inverted list of a keyword). For each segment τ_i in C, τ_i is stored into C_x if τ_i has been seen exactly in the inputs of x. The practical difference between C and the collection of C_x's is that we maintain the lower bounds γ^l of the segments in each C_x, as opposed to maintaining the upper bounds of all segments in C. As for the top bands, the top-1 band T^{C_x}_1 for each C_x can be computed incrementally. That is, when we insert a new segment τ_i into C_x, if this segment has been seen by other lists, we remove it from its previous container C_y and from T^{C_y}_1. (Note that after the removal of τ_i from C_y, C_y should be updated appropriately. This will be discussed in the next subsection.) Finally, we insert τ_i with an updated score γ^l into C_x and call subroutine insertBand to maintain the top-1 band of C_x.

Now, let us see how we can derive the candidate band in C and use it in BBA. This can be easily done by merging the candidate bands of all C_x's. We concurrently traverse these bands in time order and, for each C_x, we add to the score of its top-band segment the sum of the last seen scores from the lists L_j where j ∉ x. In this way we dynamically derive an upper bound for the segments of C_x. The dynamically derived upper bounds for each C_x are compared to compute the globally highest upper bound in all C_x's at each timestamp. This is equivalent to the candidate band of C.

Grid-Based Segments Management

If we do not manage the segments in C_x properly, we might have to access all segments in C_x each time a segment from C_x has to be moved to another candidate set (because it has been seen at a new input) and the candidate band for C_x has to be updated. To perform this operation efficiently, we use a grid index to divide the two-dimensional time/score space for each C_x into cells. A segment is assigned to a cell if it intersects it. An example is shown in Figure 9(a). For instance, segment τ^1_{d3} is inserted into 6 cells of the first row. If segment τ is deleted from C_x, it is removed from the corresponding cells. If τ was part of C_x's top band, we must seek a replacement in the band. To do this, we first check the cells that intersect τ; if they are empty, we check the cells below, and so on.

Figure 9: Examples of different Grid Indices. (a) Rectangle Grid; (b) Stripe Grid.

In our implementation, in order to avoid the replication of segments to multiple cells, we use a grid with only horizontal stripes, such as that shown in Figure 9(b). As segments are horizontal, no segment is replicated. Updates are also more efficient in this case. In order to support fast search during top band updates at C_x, we order the segments within each cell in time order, so that the ones that overlap the removed segment can be located fast.

5.4 Optimized Band Based Algorithm

An optimized version of BBA, which includes all optimizations of Section 5.3, is shown in Algorithm 6. The main changes from BBA are lines 3-5 and the lines that maintain the containers C_x. Once segment τ_i is removed from candidate container C_y, we access the corresponding grid cell of C_y to update its band (lines 4-5). Line 11 calls subroutine insertBand to compute the top-1 band of C_x, where τ_i is inserted. Finally, the complete candidate band is computed from the set of candidate bands (line 12).
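The stripe grid of Section 5.3 can be illustrated with the following Python sketch. It is a simplification under our own assumptions: the class name StripeGrid, the fixed score range, and the tuple layout (begin, end, score, doc) are hypothetical, stripes hold unsorted lists rather than time-ordered ones, and the lookup returns a single best overlapping segment rather than a full replacement for every part of the removed interval.

```python
class StripeGrid:
    """Horizontal score stripes over the time/score plane (a sketch)."""

    def __init__(self, max_score, num_stripes):
        self.height = max_score / num_stripes
        self.stripes = [[] for _ in range(num_stripes)]

    def _row(self, score):
        # A horizontal segment falls in exactly one stripe,
        # so no segment is ever replicated.
        return min(int(score / self.height), len(self.stripes) - 1)

    def insert(self, seg):
        self.stripes[self._row(seg[2])].append(seg)

    def remove(self, seg):
        self.stripes[self._row(seg[2])].remove(seg)

    def best_overlapping(self, begin, end):
        """Highest-scoring segment overlapping [begin, end), or None."""
        # Scan from the top stripe downwards; lower stripes are
        # visited only when the stripes above contain no overlap.
        for row in reversed(self.stripes):
            hits = [s for s in row if s[0] < end and begin < s[1]]
            if hits:
                return max(hits, key=lambda s: s[2])
        return None
```

Keeping each stripe sorted by begin time, as the text describes, would replace the linear scan inside a stripe with a faster lookup; the sketch omits this for brevity.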
Algorithm 6 Optimized Band Based Algorithm
Input: sorted lists L
 1: p := access the next posting from L;
 2: for all τ_i ∈ p do
 3:   remove τ_i from top band / candidate band;
 4:   if τ_i is in candidate container C_y then
 5:     remove τ_i and replace interval of τ_i using C_y grid;
 6:   set τ_i.s to its γ^l and insert into T_ins;   # use lower bound
 7: T_ret := insertBand(T^top_1, T_ins, k);   # top bands
 8: for all τ_j ∈ T_ret do
 9:   set τ_j.s to its γ^l;   # use lower bound
10:   insert τ_j into C_x;
11: insertBand(T^{C_x}_1, T_ret, 1);   # top-1 band of C_x
12: compute T^cand_1 using {T^{C_1}_1, ..., T^{C_{2^m}}_1};   # candidate band
13: same as lines 10-13 of Algorithm 5;

6. POSTINGS MATERIALIZATION

Typical document archives cover a long time period (e.g., 10 years), while user queries may apply to a relatively short time period only (e.g., June 2005). If we use a single inverted list for each keyword, we might have to scan a large number of postings that are irrelevant to the query interval. To minimize the number of redundant accesses, in this section we propose a specially designed R-tree for materializing the postings in each inverted list, which outperforms other typical indexing techniques, as shown in our experiments. In addition, we propose a technique that decomposes a durable top-k query into multiple simple top-k queries and a simpler durable top-k query. This decomposition can further improve performance.

6.1 Transformed R-tree

There are different possible approaches for organizing the contents of an inverted list in order to minimize the access of postings which are irrelevant to the query interval, and still allow access in decreasing score order for the relevant ones (to facilitate top-k aggregation). These approaches include adaptations of interval indexes, such as the interval tree [9], the segment tree [9], and the R-tree [12], and data duplication with multiple inverted lists [5]. In this section, we propose an adaptation of the R-tree for this indexing problem. In Section 7, we compare our proposal to alternative indexes, including the proposal of [5]. The R-tree is a classic spatial access method, which divides the space with hierarchically nested, possibly overlapping, minimum bounding rectangles (MBRs).
Figure 10(a) shows an example of building an R-tree index for an inverted list containing time-relevant document versions. We have 4 leaf nodes (m_1 to m_4), 2 intermediate nodes (M_1, M_2), and a root node (M) in this tree, and each MBR looks like a flat rectangle. Note that it is hard to avoid a high degree of overlapping in the R-tree when we have line segments in 2D space (time-score). Because of the overlapping, there is a higher chance of false hits (footnote 3), which degrade performance. For instance, assume that we have a query, as shown in the shaded area of Figure 10(a). This query intersects all leaf MBRs of the tree, meaning that all postings have to be examined in this example, although only 60% of them are actually relevant to the query interval.

Footnote 3: A false hit is an accessed posting that does not intersect the query interval.

Figure 10: Different R-tree representations. (a) Raw data and classic R-tree; (b) transformed R-tree.

Since grouping line segments is not effective for time-travel queries, we transform our data into a begin/endpoint space that supports better grouping. Now, a point is the basic indexed unit, making the grouping of data into nodes more effective. For each posting, we use a point (t_b, t_e) to represent it in 2D space. We will explain how to handle the score ordering shortly. Figure 10(b) shows an example after the transformation. For instance, a posting with interval [5, 7) is represented by point (5, 7). After the transformation, our query result is located in the shaded area. Although the shaded area is larger than the one in Figure 10(a), it intersects only 6 nodes (M, M_1, M_2, m_1, m_3, and m_5) and there are fewer false hits.

To facilitate access of the postings that intersect the query interval in decreasing score order (as required by our aggregation algorithm), we pre-compute an aggregate s_max for each MBR, where s_max stores the maximum s_max over all child MBRs. In addition, if the MBR is a leaf node, s_max stores the maximum score of the postings inside. This scheme supports prioritized access of the MBRs that intersect the query interval q.λ in decreasing order of their aggregate scores. Starting from the root of the tree, each entry that intersects q.λ is inserted into a priority queue. The entry with the highest score is deheaped and the process is repeated for its children. When a posting (i.e., a leaf node entry) is deheaped, we know that it corresponds to the next posting that intersects q.λ and has the highest score among all remaining such postings. Thus, the transformed R-tree elegantly combines temporal search based on q.λ and decreasing-score access order of the results.

6.2 A Partitioning-Based Approach

In this section, we investigate a query decomposition technique that further improves the performance of the band approach using the transformed R-tree. This technique aims at reducing the number of band maintenance operations. Recall that the exact score of a document version d_i at time t is unknown until it has been seen |L| times from the inverted lists, where |L| is the number of keywords in the query. Assume that τ_i is the unit posting of d_i at timestamp t. For each new access of d_i at t, τ_i is first removed from the existing band structures (e.g., top bands, candidate band, candidate container) and then reinserted into the band structure with an updated score. Suppose that there is a method to determine the exact score of τ_i before the first insertion; then τ_i is processed only once instead of |L| times. Based on this idea, we can further improve the performance of the proposed algorithms.

Looking at the distribution of postings in the transformed R-tree, we observe that the score order of some postings can be computed easily using a simple NRA query. Figure 11(a) shows an example that decomposes the space, based on query interval [3, 6), into four areas: I, II, III, and IV. Note that area I contains all postings that fully cover the entire query interval (two postings in our example).
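The prioritized access of Section 6.1 can be sketched in Python as a best-first traversal driven by a heap. This is a minimal illustration, not the authors' index: the Node class, its fields b_min, e_max and s_max, and the tuple layout (t_begin, t_end, score, doc) are our assumptions, and the tree is built by hand rather than by an R-tree insertion algorithm.

```python
import heapq
from itertools import count


class Node:
    """Tree node over points (t_begin, t_end) with a max-score aggregate."""

    def __init__(self, children=None, postings=None):
        self.children = children or []
        self.postings = postings or []  # (t_begin, t_end, score, doc)
        items = self.postings + [
            (c.b_min, c.e_max, c.s_max, None) for c in self.children
        ]
        self.b_min = min(p[0] for p in items)  # smallest begin time below
        self.e_max = max(p[1] for p in items)  # largest end time below
        self.s_max = max(p[2] for p in items)  # aggregate maximum score


def postings_by_score(root, q_begin, q_end):
    """Yield postings intersecting [q_begin, q_end), best score first."""
    tie = count()  # tie-breaker so the heap never compares nodes
    heap = [(-root.s_max, next(tie), root)]
    while heap:
        _, _, item = heapq.heappop(heap)
        if not isinstance(item, Node):
            yield item  # a posting: nothing left can score higher
            continue
        for child in item.children:
            # Keep a child only if it may contain an intersecting point.
            if child.b_min < q_end and child.e_max > q_begin:
                heapq.heappush(heap, (-child.s_max, next(tie), child))
        for p in item.postings:
            if p[0] < q_end and p[1] > q_begin:
                heapq.heappush(heap, (-p[2], next(tie), p))


leaf1 = Node(postings=[(0, 5, 0.7, "a"), (1, 3, 0.5, "b")])
leaf2 = Node(postings=[(5, 8, 0.9, "c"), (7, 9, 0.95, "d")])
root = Node(children=[leaf1, leaf2])
```

Because the generator is lazy, a top-k aggregation can pull postings one at a time and stop as soon as its termination condition holds, without touching the rest of the tree.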
Figure 11: Example of partitioning-based approach. (a) Query decomposition; (b) next access.

Our partitioning-based approach excludes area I from the durable query; we only issue a constrained NRA query to compute the top-k result in this area. The results of this query should be merged with the results of the band approach in the remaining space. A trivial way to do this is to compute the exact top-k result of the constrained NRA, and then merge it with the partial durable top-k search in area Δ − I, where Δ represents the entire query area. While this approach guarantees correctness, it may perform poorly, as more postings may be accessed compared to a single durable query in the whole area. A better approach is to integrate the constrained NRA query with our band maintenance algorithm. In order to support such a partitioning-based approach, we revise Algorithm 6 along the following lines:

- New accessing approach: Instead of reading the next posting from the inverted lists (as in line 1 of Algorithm 6), it is read either from (i) the transformed R-tree excluding area I or (ii) the next result of the constrained NRA (using incremental search).
- Choosing the maximum last seen score: Note that we have two sets of last seen scores. For each keyword, the maximum last seen score is chosen from area Δ − I or area I.
- Revised termination condition: Corollary 1 is an extended version of Lemma 1.

COROLLARY 1. The top-k result at timestamp t is that stored in the top bands if and only if the score of the k-th band at t is not worse than items (1) and (2) in Lemma 1, and (3) the next result from the constrained NRA.

With the above modifications, we can access the postings arbitrarily from (i) or (ii) without affecting correctness. In order to minimize the number of accesses, in our implementation we define this order based on the best score (e.g., Ψ) of the partial durable query and the next element in the constrained NRA. We enrich the example in Figure 8(d) with more data to illustrate the partitioning-based approach.
Note that the last seen score of L_1 (L_2) is 0.7 = max{0.5, 0.7} (0.8 = max{0.8, 0.6}). Currently, Ψ is set to 1.5 (= 0.7 + 0.8). According to the information in Figure 11(b), we know that the next posting is d_4 with score 1.3 and interval [0, 8), which is computed by the constrained NRA. When we insert this posting into the top bands, it is split into two unit postings {τ^1_{d4}, τ^2_{d4}} with intervals [0, 5) and [5, 8) respectively. τ^1_{d4} fails to enter the top-1 band, but τ^2_{d4} succeeds in replacing τ^1_{d2} and enters the top-1 band. Therefore, subroutine insertBand returns {τ^1_{d4}, τ^1_{d2}}. In the next loop, the constrained NRA is called to find the next top posting in area I. Suppose that the constrained NRA only reads one posting from each index and the next posting is d_7 in area I. The last seen score of L_1 (L_2) becomes 0.5 = max{0.5, 0.3} (0.8 = max{0.8, 0}) and Ψ is updated to 1.3. After this update, we can terminate the search, since now our top band is not worse than (1) Ψ, (2) the candidate band, and (3) the next result of the constrained NRA (see Corollary 1). Note that d_4 is inserted only once into the top bands, while it would be inserted twice using the original band approach.

The partitioning-based approach can be extended to further reduce the area where the band approach is applied. Note that area II (III) contains all postings that intersect q.t_b (q.t_e) but do not intersect q.t_e (q.t_b). Based on our observation for area I, we can also add two constrained NRA queries in areas II and III. Finally, we decompose the original durable query into three constrained NRA queries plus one partial durable query only in area IV.

7. EXPERIMENTS

In this section we empirically evaluate the performance of our algorithms on the Wikipedia revision history, which is freely available. The total size of the dataset used in our experiments is 0.7 TBytes, containing the full editing history from January 2001 to December 2005 of the entire English Wikipedia. The compression technique proposed in [5] is used to group similar consecutive versions of the same document, reducing the total size of the data to 0.15 TBytes.
The resulting dataset contains a total of 892,255 documents (i.e., topics) with 13,976,915 versions, i.e., about 15.7 versions per document on average. Okapi BM25 [2] is used to normalize the term frequency with the length normalization parameter b = 1.2 and the tf-saturation parameter k_1 = 0.75. Inverted lists store postings of the form [doc-id, begin-time, end-time, score].

We selected the most frequent keyword queries (of 2 to 5 keywords) from a search engine log that yield a Wikipedia article as a web-search result. This guarantees that all keywords are relevant to Wikipedia articles. We classify a query q based on the total number of postings V(q) in the inverted lists of its keywords q.W and the correlation between the keywords. The correlation is defined by

R(q) = |∩_{w_i ∈ q.W} D(w_i)| / |∪_{w_i ∈ q.W} D(w_i)|,

where D(w_i) denotes the set of documents containing keyword w_i. For instance, if a query q has high V(q) and low R(q), it is classified as high volume and low correlation ("HL" class in short). Accordingly, we have 4 classes in total: HH, HL, LH, and LL. Some statistics for these classes are shown in Table 2: average number of postings per keyword, average interval length of postings, average number of distinct documents in the postings of a keyword, and number of queries |Q| in the class. The space of an (uncompressed) inverted list can be derived by multiplying the number of postings by the posting size (16 bytes). In addition, the average number of postings in class HH is 64K, 22K, 713K, and 1.98M in years 2001, 2002, 2003, and 2004 respectively: more versions are created in more recent years.

In the experiments, we evaluate the scalability of our algorithms, including the algorithm of Section 4, BBA (Section 5.2), the optimized BBA (Section 5.3), and the partitioning-based approach (PBA) (Section 6.2). We use LARA [18] (an optimized implementation of NRA) for the NRA computations in the algorithm of Section 4 and in PBA. Unless otherwise specified, in all experiments we select queries from the HH class (footnote 4). In each experimental instance, 5 queries from the chosen class are used and the results are averaged.

Table 2: Statistics of test queries in the four classes
class   avg. postings per w   |Q|   avg. length (days)   avg. doc
HH      2.95M
HL      3.32M
LH      0.92M
LL      0.77M

All methods were implemented in C++ and the experiments were performed on an Intel Core2Duo 2.66GHz CPU machine with 4 GBytes of memory, running Ubuntu 8.04. Table 3 shows the ranges of the investigated parameters, and their default values (in bold).
In each experiment, we vary a single parameter while setting the remaining ones to their default values.

Table 3: Ranges of parameter values
Parameter                    Values
Number of keywords |W|       2, 3, 4, 5
k                            2, 5, 1, 2, 4
Query length, λ (in days)    15, 30, 60, 120, 240
Query begin, t_b (in year)   2001, 2002, 2003, 2004
Query class                  HH, HL, LH, LL

Footnote 4: Typical searches include correlated keywords; we use keywords with a large number of postings to evaluate scalability.

7.1 Difference to Other Queries

First, we study how different the results produced by the durable top-k query are, compared to simpler aggregation models. Table 4 shows the percentage of the durable top-k results that are not generated by other aggregate queries. For example, DUR − MIN denotes the set difference between the results of the durable and the MIN aggregate query. The queries are selected from class HH (we found similar results when using other query classes) and we tested two query interval values (λ). The query length |W| varies from 2 to 5 keywords. There is a significant difference in the durable top-k results compared to other models, and the difference increases with λ, as larger intervals enclose more document versions. This shows that the durable query provides different and potentially more interesting results than simpler aggregation models.

Table 4: Result diversity in different queries
             λ = 60 days               λ = 120 days
|W|          2     3     4     5       2     3     4     5
DUR − MIN    1%    24%   6%    4%      26%   32%   12%   14%
DUR − MAX    14%   2%    1%    4%      2%    14%   24%   28%
DUR − AVG    34%   44%   1%    4%      36%   58%   16%   34%

7.2 Efficiency and Scalability

We now compare the durable top-k algorithms in terms of efficiency and scalability. Figure 12 shows the response time and peak memory usage with respect to the number of keywords |W|, when keyword queries are selected from two classes: HH and HL. The optimized band approach always outperforms the other two methods, being 1-2 orders of magnitude faster in most cases. All methods become more expensive when there are more keywords in the query. This is consistent with the observations in [18].
All methods perform better for queries in class HH than for queries in class HL, since the correlation between keywords in class HH is high; document versions with high scores in all keywords are found faster, assisting early termination of the search.

Figure 12: Effect of |W|. (a) Response time, HH; (b) response time, HL; (c) peak memory, HH; (d) peak memory, HL. Axes: number of keywords versus response time (sec) or peak memory (MBytes).

Note that we skipped the case |W| = 5 for the algorithm of Section 4 in Fig. 12. The reason is that in this case it exhausts the physical memory of our system. In Figures 12(c) and 12(d), we show the peak memory usage of the methods during query execution. The algorithm of Section 4 is more sensitive to the number of keywords in the queries than the band approach. The reason is that it runs O(m) NRA top-k queries simultaneously, where m is equal to the number of postings that have been read. Each query consumes O(m) space in the worst case [18], therefore its space requirements are huge. On the other hand, the band approach stores only k top bands, one candidate band, and one


Mid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours Mi-Term Exmintion - Spring 0 Mthemtil Progrmming with Applitions to Eonomis Totl Sore: 5; Time: hours. Let G = (N, E) e irete grph. Define the inegree of vertex i N s the numer of eges tht re oming into

More information

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE

COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE COMPUTING THE QUARTET DISTANCE BETWEEN EVOLUTIONARY TREES OF BOUNDED DEGREE M. STISSING, C. N. S. PEDERSEN, T. MAILUND AND G. S. BRODAL Bioinformtis Reserh Center, n Dept. of Computer Siene, University

More information

XML and Databases. Exam Preperation Discuss Answers to last year s exam. Sebastian Maneth NICTA and UNSW

XML and Databases. Exam Preperation Discuss Answers to last year s exam. Sebastian Maneth NICTA and UNSW XML n Dtses Exm Prepertion Disuss Answers to lst yer s exm Sestin Mneth NICTA n UNSW CSE@UNSW -- Semester 1, 2008 (1) For eh of the following, explin why it is not well-forme XML (is WFC or the XML grmmr

More information

Factorising FACTORISING.

Factorising FACTORISING. Ftorising FACTORISING www.mthletis.om.u Ftorising FACTORISING Ftorising is the opposite of expning. It is the proess of putting expressions into rkets rther thn expning them out. In this setion you will

More information

An Efficient R-Tree Implementation over Flash-Memory Storage Systems

An Efficient R-Tree Implementation over Flash-Memory Storage Systems An Effiient -Tree Implementtion over Flsh-Memory Storge Systems Chin-Hsien Wu, Li-Pin Chng, Tei-Wei Kuo Deprtment of Computer Siene n Informtion Engineering Ntionl Tiwn University, Tipei, Tiwn 16, OC Fx:

More information
