The String B-Tree: A New Data Structure for String Search in External Memory and its Applications

Paolo Ferragina, Dipartimento di Informatica, Università di Pisa
Roberto Grossi, Dipartimento di Sistemi e Informatica, Università di Firenze

April 1998

Abstract

We introduce a new text-indexing data structure, the String B-Tree, that can be seen as a link between some traditional external-memory and string-matching data structures. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search and update operations. Consequently, the String B-Tree overcomes the theoretical limitations of inverted files, B-trees, prefix B-trees, suffix arrays, compacted tries and suffix trees. String B-trees have the same worst-case performance as B-trees but they manage unbounded-length strings and perform much more powerful search operations such as the ones supported by suffix trees. String B-trees are also effective in main memory (RAM model) because they improve the online suffix tree search on a dynamic set of strings. They can also be successfully applied to database indexing and software duplication.

Keywords: B-tree, Patricia trie, compound attributes and database indexing, external-memory data structures, magnetic and optical disks, prefix and range searching, string searching and sorting, suffix array, suffix tree, text indexing.

AMS(MOS) subject classifications: 68P05, 68P10, 68P20, 68Q20, 68Q25.

The results described in this paper were presented at the ACM Symposium on Theory of Computing (1995), see [18]. The first author was supported by MURST of Italy and by a Post-Doctoral Fellowship at the Max-Planck-Institut für Informatik, Saarbrücken, Germany (ferragin@di.unipi.it). The second author was supported by MURST of Italy (grossi@dsi.unifi.it).

1 Introduction

Large-scale heterogeneous electronic text collections are more available now than ever before and range from published documents (e.g., electronic dictionaries and encyclopedias, libraries and archives, newspaper files, telephone directories, textbook materials, etc.) to private databases (e.g., marketing information, legal records, medical histories, etc.). A great number of texts are spread over the Internet every day in the form of electronic mail, bulletin boards, World Wide Web pages, etc. Online providers of legal and newswire texts already have hundreds of text gigabytes and will soon have terabytes. Many applications treat large text collections that change over time, such as data compression [49, 50, 14, 35], computer virus detection [28], genome data banks [21], telephone directory handling [12] and software maintenance [7]. Last but not least, databases can also be considered dynamic text collections because their records are essentially byte sequences that change over time.

In this context, indexing data structures and searching engines are fundamental tools for storing, updating and extracting useful information from data in external storage devices (e.g., disks or CD-ROMs). However, while main memory is a high-speed electronic device, external memory is essentially a low-speed mechanical device. Main-memory access times have decreased from 30 to 80 percent a year, while external-memory access times have not improved much at all over the past twenty years [38]. Nevertheless, we need external storage because we cannot build a main memory having an unbounded capacity and a single-cycle access time. Ongoing research is trying to improve the input/output subsystem by introducing some hardware mechanisms such as disk arrays, disk caches, etc. [38], and is investigating how to arrange data on disks by means of some efficient algorithms and data structures that minimize the number of external-memory accesses [45]. We therefore believe that the design and analysis of external-memory text-indexing data structures is very important from both a theoretical and a practical point of view.
Surprisingly enough, in the scientific literature, no good worst-case bounds have been obtained for algorithms and data structures manipulating arbitrarily long strings in external memory. As far as traditional external-memory data structures are concerned, inverted files [39], B-trees [9] and their variations, such as Prefix B-trees [10, 15], are well-known and ubiquitous tools for manipulating large data but their worst-case performance is not efficient enough when their keys are arbitrarily long. As far as string-matching data structures are concerned, suffix arrays [22, 33], Patricia tries [22, 36] and suffix trees [34, 48] are particularly effective in handling unbounded-length strings which are small enough to fit into main memory. However, they are no longer efficient when the text collection becomes large, changes over time and makes considerable use of external memory. Their worst-case inefficiency is mainly due to the fact that they have to be packed into the disk pages so that not too many pages remain almost empty after a few updates. In the worst case, this situation can seriously degenerate in external memory. In Section 5, we discuss in detail the properties and drawbacks of these tools.

As a result, the design of external-memory text-indexing data structures whose performance is provably good in the worst case is important. In this paper, we introduce a new data structure, the String B-Tree¹, which achieves this goal. In a short phrase, it is a combination of B-trees and Patricia tries for internal-node indices that is made more effective by adding extra pointers to speed up search and update operations. In a certain sense, String B-trees link external-memory data structures to string-matching data structures by overcoming the theoretical limitations of inverted files (modifiability and atomic keys), suffix arrays (modifiability and contiguous space), suffix trees (unbalanced tree topology) and prefix B-trees (bounded-length keys). The String B-tree is the first external-memory data structure that has the same worst-case performance as regular B-trees but handles unbounded-length strings and performs much more powerful search operations such as the ones supported by suffix trees.

¹ The original name of the data structure was SB-tree [18, 19]. Recently, Don Knuth pointed out the existence of a different data structure named "SB-tree" [37], where the "SB" stands for "sequential B".

We formalize our operations by means of two basic problems. We use standard terminology for an $s$-character string $X[1, s]$ by calling $X[1, i]$ a prefix, $X[j, s]$ a suffix and $X[i, j]$ a substring of $X$, for $1 \le i \le j \le s$. We say that there is an occurrence of a pattern string $P$ in $X$ if we can find a substring $X[i, i + |P| - 1]$ equal to $P$.

Problem 1 (Prefix Search and Range Query). Let $\Delta = \{\delta_1, \ldots, \delta_k\}$ be a set of text strings whose total length is $N$. We store $\Delta$ and keep it sorted in external memory under the insertion and deletion of individual text strings. We allow for the following two queries: (1) Prefix Search($P$) retrieves all of $\Delta$'s strings whose prefix is pattern $P$; (2) Range Query($K'$, $K''$) retrieves all of $\Delta$'s strings between $K'$ and $K''$ in lexicographic order. We let $occ$ denote the number of strings retrieved by a query.

Problem 1 represents the typical indexing problem solved by B-trees, here generalized to treat unbounded-length strings. For example, let us examine string set $\Delta$ = {'ace', 'aid', 'atlas', 'atom', 'attenuate', 'by', 'bye', 'car', 'cod', 'dog', 'fit', 'lid', 'patent', 'sun', 'zoo'}. Prefix Search('at') retrieves strings 'atlas', 'atom' and 'attenuate' (here, $occ = 3$), while Range Query('cap', 'left') retrieves strings 'car', 'cod', 'dog' and 'fit' (here, $occ = 4$).

Problem 2 (Substring Search). Let $\Delta = \{\delta_1, \ldots, \delta_k\}$ be a set of text strings whose total length is $N$. We store $\Delta$ in external memory and maintain it under the insertion and deletion of individual text strings. We allow for the query: Substring Search($P$) finds all of $P$'s occurrences in $\Delta$'s strings. We denote the number of such occurrences by $occ$.

Problem 2 extends Problem 1 because it deals with arbitrary substrings of $\Delta$'s strings. For example, Substring Search('at') retrieves the occurrences in 'atlas', 'atom', 'attenuate' and 'patent' (here, $occ = 5$, because 'at' occurs twice in 'attenuate'). This generalization inevitably complicates the update operations because, while updating $\Delta$ in Problem 1 only involves a single text string, in Problem 2 it involves all of its suffixes.

We investigate Problems 1 and 2 in the classical two-level memory model [16]. It assumes that there is a fast and small main memory (i.e., random access memory) and a slow and large external memory (i.e., secondary storage devices such as magnetic disks or CD-ROMs). The external memory is assumed to be partitioned into transfer blocks, called disk pages, each of which contains $B$ atomic items, like integers, characters and pointers. We call $B$ the disk page size and a disk page reading or writing operation a disk access. According to [16], we analyze and provide asymptotic bounds for: (a) the total number of disk accesses performed by the various operations; (b) the total number of disk pages occupied by the data structure.

In the scientific literature there are several indexing data structures that can be employed to efficiently solve Problems 1 and 2. We discuss them in detail in Section 5. We wish to point out here that Problem 1 can be solved by a plain combination of B-trees and Patricia tries for internal nodes. This takes $O(\frac{p}{B} \log_B k + \frac{occ}{B})$ disk accesses for Prefix Search($P$), and $O(\frac{m}{B} \log_B k)$ disk accesses for inserting or deleting a string of length $m$ in $\Delta$. Although interesting as $p/B < 1$ in practical cases, this combination does not achieve the optimal theoretical bounds as shown below. As far as Problem 2 is concerned, this combination takes $O(\frac{p}{B} \log_B N + \frac{occ}{B})$ disk accesses for Substring Search($P$), and $O((\frac{m}{B} + 1)\, m \log_B N)$ disk accesses for inserting or deleting (all the suffixes of) a string of length $m$ in $\Delta$. Notice that the latter bound is quadratic in $m$ because a string insertion or deletion might have to rescan all of its suffixes entirely from the beginning, thus examining $\Theta(m^2)$ characters overall. Another interesting solution is given by a single Patricia trie built on the whole set of suffixes of $\Delta$'s strings [13]. This achieves $O(\frac{h}{\sqrt{p}} + \log_p N)$ disk accesses for Substring Search($P$), where $h \le N$ is the Patricia trie's height. Inserting or deleting a string in $\Delta$ costs at least as much as searching for all of its suffixes individually. These two solutions are practically attractive but do not guarantee provably good performance in the worst case.

Our main contribution is to show that the data structure resulting from the plain combination of B-trees and Patricia tries can be further refined and made more effective by adding extra pointers and proving new structural properties that avoid the drawbacks previously mentioned. By means of String B-trees, we achieve the following results:

Problem 1:
- Prefix Search($P$) takes $O(\frac{p + occ}{B} + \log_B k)$ worst-case disk accesses, where $p = |P|$.
- Range Query($K'$, $K''$) takes $O(\frac{k' + k'' + occ}{B} + \log_B k)$ worst-case disk accesses, where $k' = |K'|$ and $k'' = |K''|$.
- Inserting or deleting a string of length $m$ in string set $\Delta$ takes $O(\frac{m}{B} + \log_B k)$ worst-case disk accesses.
- The space usage is $\Theta(\frac{k}{B})$ disk pages, while the space occupied by string set $\Delta$ is $\Theta(\frac{N}{B})$ disk pages.

Problem 2:
- Substring Search($P$) takes $O(\frac{p + occ}{B} + \log_B N)$ worst-case disk accesses, where $p = |P|$.
- Inserting or deleting a string of length $m$ in string set $\Delta$ takes $O(m \log_B (N + m))$ worst-case disk accesses.
- The space used by both the String B-tree and string set $\Delta$ is $\Theta(\frac{N}{B})$ disk pages.

The space usage of String B-trees in Problem 1 is proportional to the number $k$ of $\Delta$'s strings rather than to their total length $N$, because we represent the strings by their logical pointers. It turns out that the space occupied is asymptotically optimal in both Problems 1 and 2. The constants hidden in the big-Oh notation are small. Additionally, the String B-tree operations take asymptotically optimal CPU time, i.e., $O(Bd)$ time when our algorithms read or write $d$ disk pages, and they only need to keep a constant number of disk pages loaded in main memory at any time.

1.1 Further Results in External Memory

Let us examine the parameterized pattern matching problem, introduced by [7] for identifying duplication in a software system. The problem consists of finding the program fragments that are identical except for a systematic renaming of their parameters. In this case, the program fragments are represented by some parameterized strings, called p-strings. A suffix tree generalization, called p-suffix tree [7], allows us to search for p-strings online and to identify p-string duplications by ignoring parameter renaming. P-suffix trees and the other p-string algorithms [4, 26, 30] are designed to work in main memory and have to deal with the dynamic nature of parameter renaming. We can formulate Problems 1 and 2 for p-strings and then apply String B-trees to them by means of some minor algorithmic modifications. Consequently, the aforementioned theoretical results regarding strings can be extended to p-strings. Our search bound improves the one obtained in [7, 30] for large alphabets, even when the p-string set is static. We refer the interested reader to Section 6.1 for further details.
Let us now examine the databases that treat variable-length records (not necessarily textual databases), and in particular, their compound attribute organization [31] and [29, Sect. 6.5], in which the lexicographic order of some records' combinations is properly maintained. An example of this is indexing an employee database according to the string obtained by concatenating an employee's name, office and phone number. Prefix B-trees [9] are the most widely-used tool in managing compound attribute organizations. However, since they work by copying some parts of the key strings, they cause data duplication and space overhead. Conversely, String B-trees fully exploit the lexicographic order and take advantage of the prefix shared by any two (consecutive) key strings. As a consequence, we can use String B-trees to support this organization without having to copy the attributes in the data structure because we can interpret each variable-length record as a text string of arbitrary length and so use our solution to Problem 1. The space usage of String B-trees is proportional to the number of key strings and not to their total length; thus, String B-trees achieve a much better worst-case space saving with respect to prefix B-trees. We refer the interested reader to Section 6.2 for more details.

1.2 Results in Main Memory (RAM model)

Fixing $B = O(1)$, the String B-tree can be seen as an augmented 2-3-tree [2] that allows us to obtain some interesting results in the standard RAM model, due to its balanced tree topology.

- We improve the online search in suffix trees when they store a dynamic set of strings whose characters are taken from a large alphabet [3, 25]. Specifically, we reduce the searching time from $O(p \log N + occ)$ to $O(p + \log N + occ)$ by using our solution to Problem 2. This was previously achieved by [33] only for a static string set by means of suffix arrays.
- We implement dynamic suffix arrays [17] in linear $O(N)$ space without using the naming technique of [27]. We still obtain an alphabet-independent search and the updates run within the same time bounds as in [17]. We refer the reader to Section 6.3.
- We obtain a tight bound, i.e., $\Theta(N + k \log k)$, in the comparison model for the problem of sorting $\Delta$'s strings online. We start out with an empty String B-tree and then insert $\Delta$'s strings one at a time by means of the procedure used in Problem 1. This approach requires a total of $O(N + k \log k)$ comparisons. The lower bound $\Omega(N + k \log k)$ holds because we must examine all of the $N$ input characters and output a permutation of $k$ strings. A straightforward use of compacted tries [29] would require $O(N + k^2 \log k)$ comparisons in the worst case. A recent optimal approach based upon ternary search trees has been described in [11].

The rest of this paper is organized as follows. In Section 2, we introduce String B-trees and discuss their main properties and operations. We give a formal, detailed description of them in Sections 3 and 4. In Section 5, we review and discuss some previous work on the most important data structures for manipulating external-memory text collections with the aim of clarifying String B-trees' main properties and advantages. In Section 6, we study the applicability of String B-trees. We conclude the paper with some open problems and some suggestions for further research.
2 The String B-Tree Data Structure

We assume that each string in the input set $\Delta$ is stored in a contiguous sequence of disk pages and represent the strings by their logical pointers to the external-memory addresses of their first character, as shown in Figure 1. We can therefore locate the disk page containing the $i$-th character of a string by performing a constant number of simple arithmetical operations on its logical pointer. When managing keys in the form of logical pointers to arbitrarily long strings we are faced with two major difficulties that are not usually encountered in other fields, such as computational geometry [23]:

- We can group $\Theta(B)$ logical pointers to strings into a single disk page but, unfortunately, if we only read this page, we are not able to retrieve the strings' characters.
- We can compare any two strings character-by-character but this is extremely inefficient if repeated several times because its worst-case cost is proportional to the length of the two strings involved each time. We call this problem rescanning, due to the fact that the same input characters are (re)examined several times.
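As an illustration of the pointer arithmetic mentioned at the beginning of this section, the following Python sketch lays out the example string set in the order of Figure 1 and derives logical pointers and page numbers. The names `layout` and `page_of`, the 1-based addressing and the one-character endmarker after every string are assumptions made for this sketch, not part of the data structure's definition.

```python
B = 8          # disk page size used in Figure 1
END = "\0"     # stand-in for the special endmarker (an assumption of this sketch)

# Order in which the strings appear on disk in Figure 1.
FIGURE_1_ORDER = ["aid", "atom", "attenuate", "car", "patent", "zoo", "atlas",
                  "sun", "by", "fit", "dog", "ace", "lid", "cod", "bye"]


def layout(strings):
    """Store the strings back to back and return (disk, logical pointers)."""
    disk, pointer = [], {}
    for s in strings:
        pointer[s] = len(disk) + 1   # 1-based address of the first character
        disk.extend(s + END)         # one endmarker cell after each string
    return disk, pointer


def page_of(pointer, i, page_size=B):
    """0-based index of the disk page holding the i-th character (1-based)
    of the string whose logical pointer is `pointer`."""
    address = pointer + i - 1        # 1-based address of that character
    return (address - 1) // page_size


disk, pointer = layout(FIGURE_1_ORDER)
```

Under these assumptions, `pointer["fit"]` is 48 and the suffix 'nuate' of 'attenuate' starts at address `pointer["attenuate"] + 4`, that is 14, in agreement with the caption of Figure 1.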

[Figure 1: the disk drawn as a linear array of pages holding, from left to right, the strings 'aid', 'atom', 'attenuate', 'car', 'patent', 'zoo', 'atlas', 'sun', 'by', 'fit', 'dog', 'ace', 'lid', 'cod', 'bye', each followed by an endmarker; $B = 8$; $\Delta$ = {ace, aid, atlas, atom, attenuate, by, bye, car, cod, dog, fit, lid, patent, sun, zoo}.]

Figure 1. An example of storing string set $\Delta$ in external memory. The strings are not put in any particular order. The disk is represented by a linear array with disk page size $B = 8$. The logical pointers to $\Delta$'s strings are their starting positions in external memory. For example, 48 is the logical pointer to string 'fit' and 14 is the logical pointer to suffix 'nuate'. The black boxes in the disk pages denote special endmarkers that prevent two suffixes of $\Delta$'s strings from being equal.

Consequently, we believe that a proper organization of the strings and a method for avoiding rescanning are crucial to solve Problems 1 and 2 with provably good performance in the worst case, and we show how to do this in the rest of this section. We begin by describing a B-tree-like data structure that helps us to solve Problem 1 by handling keys which are logical pointers to arbitrarily long strings. Since the worst-case bounds obtained are not the ones claimed in the introduction, we perform another step and transform the B-tree-like data structure into a simplified version of the String B-tree by properly organizing the logical pointers inside its nodes by means of Patricia tries. This combination is described in Section 2.1, where we introduce new structural properties that allow us to design a search procedure which avoids the rescanning problem previously mentioned, thus showing how to solve Problem 1 efficiently. Finally, we show in Section 2.2 how to obtain the final version of the String B-tree for solving Problem 2 by adding some extra pointers and proving further properties that are crucial to achieve our bounds.

2.1 Prefix Search and Range Query (Problem 1)

We start out by describing a B-tree-like data structure which gives us an initial, rough solution to Problem 1. As previously stated, we represent strings by their logical pointers. The input is a string set $\Delta$ whose total number of characters is $N$.
We denote by $\mathcal{K} = \{K_1, \ldots, K_k\}$ the set of $\Delta$'s strings in lexicographic order, denoted by $\le_L$. We assume that strings $K_1, \ldots, K_k$ reside in the B-tree leaves, which are linked together to form a bidirectional list, and only some strings are copied in the internal nodes; we obtain the so-called B$^+$-tree [15]. We denote the ordered string set associated with a node $\pi$ by $S_\pi$, where $S_\pi \subseteq \mathcal{K}$, and denote $S_\pi$'s leftmost string by $L(\pi)$ and $S_\pi$'s rightmost string by $R(\pi)$. We store each node $\pi$ in a single disk page and put a constraint on the number of its strings: $b \le |S_\pi| \le 2b$, where $b = \Theta(B)$ is an even integer properly chosen to let a single node fit into a disk page. We allow the root to contain fewer than $b$ strings.

[Figure 2: an internal node $\pi$ with children $\sigma_1, \sigma_2, \ldots, \sigma_g$; $\pi$ stores $L(\sigma_1), R(\sigma_1), L(\sigma_2), R(\sigma_2), \ldots, L(\sigma_g), R(\sigma_g)$, with $L(\pi) = L(\sigma_1)$ and $R(\pi) = R(\sigma_g)$.]

Figure 2. The logical layout of a B-tree internal node $\pi$ having $g = n(\pi)$ children.

We distribute the strings among the B-tree nodes as follows: We partition $\mathcal{K}$ into groups of $b$ consecutive strings each, except for the last group which can contain from $b$ to $2b$ strings. We map each group to a leaf, say $\pi$, and form its string set $S_\pi$ in such a way that we can retrieve $\mathcal{K}$ by scanning the leaves rightwards and by concatenating their string sets. Each internal node $\pi$ has $n(\pi)$ children $\sigma_1, \ldots, \sigma_{n(\pi)}$ and its ordered string set $S_\pi = \{L(\sigma_1), R(\sigma_1), \ldots, L(\sigma_{n(\pi)}), R(\sigma_{n(\pi)})\}$ is obtained by copying the leftmost and rightmost strings contained in its children, as shown in Figure 2. (Actually, we could only copy one string from each child but this would make our algorithms more complex.) Since $n(\pi) = |S_\pi| / 2$, each node has from $b/2$ to $b$ children except for the root and the leaves, and the resulting number of B-tree levels is $H = O(\log_{b/2} k) = O(\log_B k)$. We call $H$ its height, and number these levels by starting from the root (level 1). See Figure 3 for an example.

Problem 1 can be solved by using the B-tree-like layout described above. We only discuss the Prefix Search($P$) operation in detail. It is based on an interesting observation introduced by Manber and Myers [33]: the strings having prefix $P$ occupy a contiguous part of $\mathcal{K}$. In the example described in Section 1, the strings having prefix $P$ = 'at' all range from string 'atlas' to string 'attenuate'. Consequently, we only have to retrieve $\mathcal{K}$'s leftmost and rightmost strings whose prefix is $P$ because the rest of the strings to be retrieved lie in $\mathcal{K}$ between these two strings. In our case, these strings occupy a contiguous sequence of B-tree leaves, i.e., the ones storing the logical pointers 35, 5 and 10 in Figure 3.
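The contiguity observation can be checked on the example of Section 1 with a short Python sketch. It ignores the B-tree layout and the disk pages altogether and only shows that, once $\mathcal{K}$ is sorted, both queries of Problem 1 reduce to locating positions in the sorted order. The function names are ours, and Python's built-in string comparison is assumed to play the role of $\le_L$.

```python
from bisect import bisect_left, bisect_right

K = sorted(["ace", "aid", "atlas", "atom", "attenuate", "by", "bye", "car",
            "cod", "dog", "fit", "lid", "patent", "sun", "zoo"])


def prefix_search(K, P):
    """All strings of the sorted list K having prefix P (a contiguous slice)."""
    lo = bisect_left(K, P)           # position of P in K: leftmost candidate
    hi = lo
    while hi < len(K) and K[hi].startswith(P):
        hi += 1                      # scan rightwards over the occ answers
    return K[lo:hi]


def range_query(K, low, high):
    """All strings of the sorted list K lying between low and high."""
    return K[bisect_left(K, low):bisect_right(K, high)]
```

On the example set, `prefix_search(K, "at")` yields 'atlas', 'atom' and 'attenuate', and `range_query(K, "cap", "left")` yields 'car', 'cod', 'dog' and 'fit', matching the values of $occ$ given in Section 1.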
In another observation of theirs, Manber and Myers identify the leftmost string whose prefix is $P$: this string is adjacent to $P$'s position in $\mathcal{K}$ according to the lexicographic order $\le_L$. In the example given in Section 1, if $P$ = 'at', its position in $\mathcal{K}$ is between strings 'aid' and 'atlas'; in fact, 'atlas' is the leftmost string we are looking for. A symmetrical observation holds for the rightmost string and so we do not discuss it here. Since $\mathcal{K}$ is a dynamic set partitioned among the B-tree leaves, we can use Manber and Myers' observations in our B-tree-like layout. We therefore answer Prefix Search($P$) by focusing on the retrieval of $P$'s position in $\mathcal{K}$. We represent $P$'s position in the whole set $\mathcal{K}$ by means of a pair $(\tau, j)$, such that $\tau$ is the leaf containing this position and $j - 1$ is the number of $S_\tau$'s strings lexicographically smaller than $P$, where $1 \le j \le |S_\tau| + 1$. We also say that $j$ is $P$'s position in set $S_\tau$. In our example for $P$ = 'at', $\tau$ is the leftmost leaf in Figure 3 and $j = 3$, where $S_\tau$ is made up of the strings pointed to by 56, 1, 35 and 5.

[Figure 3: the levels of the B-tree-like layout (upper part) drawn above the disk layout of Figure 1 (lower part).]

Figure 3. An example of a B-tree-like layout (upper part) and its input string set $\Delta$ (lower part). Set $\mathcal{K}$ = {'ace', 'aid', 'atlas', 'atom', 'attenuate', 'by', 'bye', 'car', 'cod', 'dog', 'fit', 'lid', 'patent', 'sun', 'zoo'} is obtained by sorting $\Delta$. The strings in $\mathcal{K}$ are stored in the B-tree leaves by means of their logical pointers 56, 1, 35, 5, 10, ..., 31.

In Figure 4, we illustrate the algorithmic scheme for identifying pair $(\tau, j)$, where we denote the procedure that determines $P$'s position in set $S_\pi$ by PT-Search($P$, $S_\pi$). We begin by checking the two trivial cases in which $P$ is either smaller than any other string in $\mathcal{K}$ (Step (1)) or larger than any other string in $\mathcal{K}$ (Step (2)). If both checks turn out to be false, we start out from $\pi$ = root in Step (3) and perform a downward B-tree traversal by maintaining the invariant: $L(\pi) <_L P \le_L R(\pi)$ for each node $\pi$ visited (Steps (4)-(8)). In visiting $\pi$, we load its disk page and apply procedure PT-Search in order to find $P$'s position $j$ in string set $S_\pi$, namely, we determine its two adjacent strings verifying $\widehat{K}_{j-1} <_L P \le_L \widehat{K}_j$. If $\pi$ is a leaf, we stop the traversal. If $\pi$ is an internal node, we have the following two cases:

(1) If strings $\widehat{K}_{j-1}$ and $\widehat{K}_j$ belong to two distinct children of $\pi$, say $\widehat{K}_{j-1} = R(\sigma')$ and $\widehat{K}_j = L(\sigma)$ for two children $\sigma'$ and $\sigma$, then the two strings are adjacent in the whole set $\mathcal{K}$ due to the B-tree's layout. This determines $P$'s position in $\mathcal{K}$. We therefore choose $\tau$ as the leftmost B-tree leaf that descends from $\sigma$ and conclude that $P$ is in the first position in $S_\tau$ because $L(\tau) = L(\sigma) = \widehat{K}_j$.

(2) If both $\widehat{K}_{j-1}$ and $\widehat{K}_j$ belong to the same child, say $\widehat{K}_{j-1} = L(\sigma)$ and $\widehat{K}_j = R(\sigma)$ for a child $\sigma$, then we set $\pi := \sigma$ in order to maintain the invariant and continue the B-tree traversal on the next level recursively.

    (1) if $P \le_L K_1$ then $\tau$ := leftmost leaf; $j$ := 1; return $(\tau, j)$;
    (2) if $P >_L K_k$ then $\tau$ := rightmost leaf; $j$ := $|S_\tau| + 1$; return $(\tau, j)$;
    (3) $\pi$ := root;
        while true do   /* Invariant: $L(\pi) <_L P \le_L R(\pi)$ */
    (4)     Load $\pi$'s page and let $S_\pi = \{\widehat{K}_1, \ldots, \widehat{K}_{2n(\pi)}\}$;
    (5)     $j$ := PT-Search($P$, $S_\pi$);   /* $\widehat{K}_{j-1} <_L P \le_L \widehat{K}_j$ */
    (6)     if $\pi$ is a leaf then $\tau$ := $\pi$; return $(\tau, j)$;
    (7)     if $\widehat{K}_j = L(\sigma)$, for a child $\sigma$ of $\pi$ then
                $\tau$ := $\sigma$'s leftmost descending leaf; $j$ := 1; return $(\tau, j)$;
    (8)     if $\widehat{K}_j = R(\sigma)$, for a child $\sigma$ of $\pi$ then $\pi$ := $\sigma$;
        endwhile

Figure 4. The pseudocode for identifying pair $(\tau, j)$ that represents $P$'s position in $\mathcal{K}$.

At the end of this traversal, we find the pair $(\tau_L, j_L)$ that represents the position of $\mathcal{K}$'s leftmost string having prefix $P$. In the same way, we can determine the pair $(\tau_R, j_R)$ that represents the position of $\mathcal{K}$'s rightmost string having prefix $P$. We go on to answer Prefix Search($P$) by scanning the linked sequence of B-tree leaves delimited by $\tau_L$ and $\tau_R$ (inclusive) and by listing all the strings from the $(j_L)$-th string in $S_{\tau_L}$ up to the $(j_R - 1)$-th string in $S_{\tau_R}$.

The search described so far is similar to the one used for regular B-trees, especially if we implement procedure PT-Search by performing a binary search of $P$ in set $S_\pi$ and examining $O(\log_2 |S_\pi|) = O(\log_2 B)$ strings. While this binary search does not cost anything more in regular B-trees, in this case, once we load $\pi$'s disk page, we have to pay $O(\frac{p}{B} + 1)$ disk accesses to load each string examined and compare it to $P$ because we represent the strings by their logical pointers.
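The scheme of Figure 4 can be sketched in Python as follows, with PT-Search implemented by binary search as just discussed. This is an in-memory illustration under our own assumptions: nodes hold the strings themselves rather than logical pointers, no disk accesses are modelled, the names `Node`, `chunks`, `build` and `position` are hypothetical, a short last group is simply joined to its predecessor, and Python's built-in string comparison stands in for $\le_L$.

```python
from bisect import bisect_left


class Node:
    def __init__(self, strings, children=None):
        self.strings = strings      # ordered string set S_pi
        self.children = children    # list of child nodes, or None for a leaf


def chunks(items, size):
    """Groups of `size` consecutive items; a short last group joins its predecessor."""
    out = [items[i:i + size] for i in range(0, len(items), size)]
    if len(out) > 1 and len(out[-1]) < size:
        out[-2:] = [out[-2] + out[-1]]
    return out


def build(K, b):
    """B-tree-like layout over the non-empty sorted list K; returns (root, leaves)."""
    assert b >= 4 and b % 2 == 0
    leaves = [Node(group) for group in chunks(K, b)]
    level = leaves
    while len(level) > 1:
        level = [
            # S_pi = L(s_1), R(s_1), ..., L(s_n), R(s_n) for the children s_1..s_n
            Node([s for c in group for s in (c.strings[0], c.strings[-1])], group)
            for group in chunks(level, b // 2)
        ]
    return level[0], leaves


def position(root, leaves, P):
    """Pair (leaf, j) representing P's position, following Steps (1)-(8)."""
    if P <= leaves[0].strings[0]:                        # Step (1)
        return leaves[0], 1
    if P > leaves[-1].strings[-1]:                       # Step (2)
        return leaves[-1], len(leaves[-1].strings) + 1
    node = root                                          # Step (3)
    while True:
        j = bisect_left(node.strings, P) + 1             # Step (5), binary search
        if node.children is None:                        # Step (6)
            return node, j
        child = node.children[(j - 1) // 2]
        if (j - 1) % 2 == 0:                             # Step (7): K_j = L(child)
            while child.children is not None:
                child = child.children[0]
            return child, 1
        node = child                                     # Step (8): K_j = R(child)
```

With `b = 4` and the example set of Section 1, `position(root, leaves, "at")` returns the leftmost leaf together with `j = 3`, as in the example discussed in the text.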
With binary search, a call to PT-Search therefore takes $O((\frac{p}{B} + 1) \log_2 B)$ disk accesses in the worst case. It follows that this simple approach for Prefix Search calls PT-Search $H$ times and thus takes a total of $O(H\,(\frac{p}{B} + 1) \log_2 B) = O((\frac{p}{B} + 1) \log_2 k)$ disk accesses plus $O(\frac{occ}{B})$ disk accesses for retrieving the strings delimited by leaves $\tau_L$ and $\tau_R$. This bound is the same as for the suffix array search without any auxiliary data structures [33], and worse than the one we claimed in the introduction. Nevertheless, the B-tree-like layout gives us a good starting point for finding an efficient implementation of Prefix Search.

We now carry out another step in the B-tree-like layout by plugging a Patricia trie [36] into each B-tree node in order to organize its strings properly and support searches that compare only one string of set $S_\pi$ in the worst case rather than the $\log_2 |S_\pi|$ ones required for binary search. We call the resulting data structure the simplified String B-tree.² Let us examine a node $\pi$ in the String B-tree and the Patricia trie $PT_\pi$ plugged into it. We can define $PT_\pi$ in two steps: (1) We build a compacted trie [29] on $S_\pi$'s strings (see Figure 5, left). (2) We label each compacted trie node by the length of the substring stored into it and we replace each substring labeling an arc by its first character only, called branching character (see Figure 5, right).

[Figure 5: a compacted trie (left) and the corresponding Patricia trie (right).]

Figure 5. The number labeling an internal node $u$ denotes the length of the string spelled out by the downward path from the root to $u$.

On one hand, the Patricia trie loses some information with respect to the compacted trie because we delete all the characters in each arc label except the branching character. On the other hand, the Patricia trie has two important features that we discuss below: (i) it fits $\Theta(B)$ strings into one B-tree node independently of their length; (ii) it allows us to perform lexicographic searches by branching out from a node without further disk accesses. It is worth noting that a compacted trie might satisfy feature (i) by representing the substrings labeling its arcs via pairs of pointers to their external-memory positions; however, feature (ii) would no longer be satisfied because of the pairs of pointers and this would increase the number of disk accesses taken by the search operation.

We now show how to exploit some new properties of Patricia tries for implementing the PT-Search procedure in two phases. Due to its features, hereafter we will call this search procedure blind search:

² We were not able to find any source in the research literature referring to a data structure based on B-trees and Patricia tries for internal nodes, and resembling the simplified String B-tree. Probably some programmers know such a data structure. Nonetheless, we highlight new structural properties that are crucial to achieve optimal worst-case bounds for Problem 1.

[Figure 6: two copies of the Patricia trie of Figure 5 searched for pattern $P$; the left copy marks the arcs traversed down to leaf $l$, the right copy marks the common prefix, the mismatch, the hit node and $P$'s correct position.]

Figure 6. (Left) An example of the first phase in blind search. The marked arcs are the traversed ones. (Right) An example of the second phase in blind search. The hit node is circled, and the marked arcs are the ones traversed to find $P$'s position in $S_\pi$.

- In the first phase, we trace a downward path in $PT_\pi$ to locate a leaf $l$, which does not necessarily identify $P$'s position in $S_\pi$. We start out from the root and only compare some of $P$'s characters with the branching characters found in the arcs traversed until we either reach a leaf, say $l$, or no further branching is possible. In the latter case, we choose $l$ to be a descending leaf from the last node traversed.
- In the second phase, we load $l$'s string and compare it to $P$ in order to determine their common prefix. We prove a useful property (Lemma 3.5): Leaf $l$ stores one of $S_\pi$'s strings that share the longest common prefix with $P$. We use this common prefix in two ways: we first determine $l$'s shallowest ancestor (the hit node) whose label is an integer equal to, or greater than, the common prefix length of $l$'s string and $P$. We then find $P$'s position by using $P$'s mismatching character to choose a proper Patricia trie leaf descending from the hit node.

We give an example of PT-Search($P$, $S_\pi$) in Figure 6, where $P$ = 'bcbabcba'. In particular, Figure 6 (left) depicts the first phase in which $l$ represents the rightmost leaf. It is worth noting that $l$ does not identify $P$'s position in $S_\pi$ because we do not compare $P$'s mismatching character (i.e., $P[4]$ = 'a') and thus we induce a "mistake." We determine $P$'s correct position in the second phase, illustrated in Figure 6 (right). We start out by determining the common prefix of $l$'s string and $P$ (i.e., 'bcb') and then we find $l$'s shallowest ancestor (the hit node) whose label is greater than |'bcb'| = 3. After that, we use mismatching character $P[4]$ = 'a' to identify $P$'s correct position $j = 4$ by traversing the marked arcs in Figure 6 (right). It is worth noting that we only load the disk pages that store prefix 'bcbc' in $l$'s string because the Patricia trie is stored in $\pi$'s disk page, thus making the branching characters available. In this way, we do not take more than $O(\frac{p}{B} + 1)$ disk accesses to execute PT-Search.

It is now clear that by putting Patricia tries and the previously described B-tree layout together, we avoid the binary search in the nodes traversed and thus reduce the overall complexity from $O((\frac{p}{B} + 1) \log_2 k)$ to $O((\frac{p}{B} + 1) H) = O((\frac{p}{B} + 1) \log_B k)$ disk accesses. However, this bound is not yet satisfactory and does not match the one claimed in the introduction. The reason is that at each visited node we are rescanning $P$ from the beginning. We avoid rescanning and obtain the final optimal bound by designing an improved PT-Search procedure that derives directly from the previous one but exploits the String B-tree layout and the Patricia trie properties better. It takes three input parameters $(P, S_\pi, \ell)$, where the additional input parameter $\ell$ satisfies the property that there is a string in $S_\pi$ whose first $\ell$ characters are equal to $P$'s. PT-Search($P$, $S_\pi$, $\ell$) returns a pair $(j, lcp)$, where $j$ is $P$'s position in $S_\pi$ (as before) and the additional output parameter $lcp$ is the common prefix length of $l$'s string and $P$ computed in the blind search. A comment is in order at this point. We can show that $lcp \ge \ell$ (see Lemma 3.6) and can therefore design a fast incremental PT-Search that compares $P$ to $l$'s string by only loading and examining the characters in positions $\ell + 1, \ldots, lcp + 1$. As a result, PT-Search now only takes $\lceil \frac{lcp - \ell}{B} \rceil + 1$ disk accesses (see Theorem 3.8).

We now go back to the algorithmic scheme for finding $P$'s position in the whole set $\mathcal{K}$. The above considerations allow us to modify the pseudocode in Figure 4 by adding instruction $\ell := 0$ to Step (3) and by replacing Step (5) with:

    (5) $(j, \ell)$ := PT-Search($P$, $S_\pi$, $\ell$)

We are now ready to analyze Prefix Search's complexity. As previously mentioned, we have to search for $\mathcal{K}$'s leftmost and rightmost strings having prefix $P$ by identifying the pairs $(\tau_L, j_L)$ and $(\tau_R, j_R)$.
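The two-phase blind search, together with the extra parameter $\ell$, can be sketched in Python as follows. This is a sketch under our own assumptions, not the implementation analysed in Section 3: the trie is built in main memory from a non-empty sorted list of distinct strings, each string is terminated by an endmarker `$` assumed smaller than every letter, the leaf ranges `lo` and `hi` are stored in the nodes for convenience, and the names `PTNode`, `build_trie` and `blind_search` are hypothetical. Reading `leaf_string` is the only step that would touch the disk pages of a string.

```python
END = "$"  # assumed endmarker, smaller than every letter


class PTNode:
    def __init__(self, label, lo, hi):
        self.label = label          # length of the string spelled out from the root
        self.lo, self.hi = lo, hi   # leaves below this node cover strings[lo:hi]
        self.children = {}          # branching character -> child; empty for a leaf


def build_trie(strings, lo=0, hi=None):
    """Patricia trie over a sorted list of distinct, endmarker-terminated strings."""
    hi = len(strings) if hi is None else hi
    if hi - lo == 1:
        return PTNode(len(strings[lo]), lo, hi)
    first, last = strings[lo], strings[hi - 1]
    d = 0
    while first[d] == last[d]:      # the two ends of a sorted range give its common prefix
        d += 1
    node = PTNode(d, lo, hi)
    start = lo
    for i in range(lo + 1, hi + 1):
        if i == hi or strings[i][d] != strings[start][d]:
            node.children[strings[start][d]] = build_trie(strings, start, i)
            start = i
    return node


def blind_search(root, strings, P, ell=0):
    """Return (j, lcp): P's 1-based position in `strings` and the common prefix
    length of P and leaf l's string. `ell` must satisfy the stated precondition:
    some string shares its first `ell` characters with P."""
    # Phase 1: follow branching characters only; no string is read.
    node = root
    while node.children and node.label < len(P) and P[node.label] in node.children:
        node = node.children[P[node.label]]
    leaf_string = strings[node.lo]  # leaf l: a leaf descending from the last node
    # Phase 2: compare P with l's string, skipping the first `ell` characters.
    lcp = ell
    while lcp < min(len(P), len(leaf_string)) and P[lcp] == leaf_string[lcp]:
        lcp += 1
    # Hit node: l's shallowest ancestor whose label is at least lcp.
    hit = root
    while hit.children and hit.label < lcp:
        hit = hit.children[leaf_string[hit.label]]
    if lcp == len(P):               # P is a prefix of every string below the hit node
        return hit.lo + 1, lcp
    c = P[lcp]                      # P's mismatching character
    if hit.children and hit.label == lcp:
        for ch, child in hit.children.items():   # children are in sorted order
            if ch > c:
                return child.lo + 1, lcp
        return hit.hi + 1, lcp
    # The mismatch falls inside the arc entering the hit node.
    if lcp < len(leaf_string) and c < leaf_string[lcp]:
        return hit.lo + 1, lcp
    return hit.hi + 1, lcp
```

On the example set of Section 1 stored in a single trie, `blind_search(root, S, "at")` returns position 3 with a common prefix of length 2; passing `ell=1` gives the same answer while skipping the first character comparison.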
We identify these pairs by means of our modified pseudocode, which traverses a sequence of nodes, say $\pi_1, \pi_2, \ldots, \pi_H$. The cost of examining $\pi_i$ is dominated by Step (5), which takes $d_i = \lceil \frac{\ell_i - \ell_{i-1}}{B} \rceil + 1 \leq \frac{\ell_i - \ell_{i-1}}{B} + O(1)$ disk accesses because we execute PT-Search with $\ell = \ell_{i-1}$ to compute $\mathit{lcp} = \ell_i$. The total cost of this traversal is $\sum_{i=1}^{H} d_i = \frac{\ell_H - \ell_0}{B} + O(H) = O(\frac{p}{B} + \log_B k)$ disk accesses. We use the fact that it is a telescopic sum, where $\ell_0 = 0$, $\ell_H \leq p$ and $H = O(\log_B k)$. Subsequently, we retrieve $\mathcal{K}$'s strings having prefix $P$ by examining the leaves of the String B-tree delimited by $\pi_L$ and $\pi_R$ in $O(\frac{occ}{B})$ disk accesses. The total cost of Prefix Search$(P)$ is therefore $O(\frac{p + occ}{B} + \log_B k)$ disk accesses. We refer the reader to Section 4.1 for a detailed, formal discussion of this result.

The simplified String B-tree layout has the considerable advantage of being dynamic without requiring any contiguous space. A new string $K$ can be inserted into $\Delta$ as in regular B-trees, that is, by inserting $K$ into $\mathcal{K}$ in lexicographic order. We identify $K$'s position in $\mathcal{K}$ by computing its pair $(\pi, j)$. We then insert $K$ into string set $S_\pi$ at position $j$. If $L(\pi)$ or $R(\pi)$ change in $\pi$, then we extend the change to $\pi$'s ancestors. After that, if $\pi$ gets full (i.e., it contains more than $2b$ strings), we say that a split occurs. We create a new leaf $\sigma$ and install it as an adjacent sibling of $\pi$. We then split string set $S_\pi$ into two roughly equal parts of at least $b$ strings each, in order to obtain $\pi$'s and $\sigma$'s new string sets. We copy strings $L(\pi)$, $R(\pi)$, $L(\sigma)$ and $R(\sigma)$ in their parent node in order to replace the old strings $L(\pi)$ and $R(\pi)$. If $\pi$'s parent also gets full because it has two more strings, we split it. In the worst case, the splitting can extend up to the String B-tree's root and the resulting String B-tree's height can increase by one.

The deletion of a string from $\Delta$ is similar to its insertion, except that we are faced with a leaf that gets half-full because it has fewer than $b$ strings. In this case, we say that a merge occurs and we join this leaf and an adjacent sibling leaf together: we merge their string sets and propagate the merging to their ancestors. In the worst case, the merging can extend up to the String B-tree's root and so the height can decrease by one. The cost for inserting or deleting a string is given by its searching cost plus the $O(\log_B k)$ rebalancing cost. We can prove the following result:

Theorem 2.1 (Problem 1). Let $\Delta$ be a set of $k$ strings whose total length is $N$. Prefix Search$(P)$ takes $O(\frac{p + occ}{B} + \log_B k)$ worst-case disk accesses, where $p = |P|$. Range Query$(K', K'')$ takes $O(\frac{k' + k'' + occ}{B} + \log_B k)$ worst-case disk accesses, where $k' = |K'|$ and $k'' = |K''|$. Inserting or deleting a string of length $m$ takes $O(\frac{m}{B} + \log_B k)$ worst-case disk accesses. The space occupied by the String B-tree built on $\Delta$ is $\Theta(\frac{k}{B})$ disk pages and the space required by string set $\Delta$ is $\Theta(\frac{N}{B})$ disk pages.

2.2 Substring Search (Problem 2)

We now show how to solve Problem 2, in which the input is a string set $\Delta = \{\delta_1, \ldots, \delta_k\}$ whose total number of characters is $N = \sum_{h=1}^{k} |\delta_h|$. We denote the suffix set by $SUF(\Delta) = \{\delta[i, |\delta|] : 1 \leq i \leq |\delta| \text{ and } \delta \in \Delta\}$, which therefore contains $N$ lexicographically ordered suffixes. As previously mentioned, Problem 2 concerns the more powerful Substring Search$(P)$ operation that searches for $P$'s occurrences in $\Delta$'s strings, i.e., it finds all the length-$p$ substrings equal to $P$. Each of these occurrences corresponds to a suffix whose prefix is $P$, i.e., $\delta[i, i + p - 1] = P$ if and only if $P$ is a prefix of $\delta[i, |\delta|] \in SUF(\Delta)$. Our problem is therefore to retrieve all of $SUF(\Delta)$'s strings having prefix $P$.
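The reduction from substring search to prefix search over the suffix set can be sketched in a few lines. This is an in-memory illustration on a small word set, not the String B-tree itself: a sorted Python list stands in for the ordered suffix set, and the names `suffix_set` and `substring_search` are hypothetical.

```python
from bisect import bisect_left


def suffix_set(strings):
    """Sorted list of (suffix, string index, 1-based start position)."""
    entries = [
        (s[i:], idx, i + 1)
        for idx, s in enumerate(strings)
        for i in range(len(s))
    ]
    entries.sort()
    return entries


def substring_search(entries, pattern):
    """Return (string index, start) for every occurrence of pattern."""
    keys = [suffix for suffix, _, _ in entries]
    lo = bisect_left(keys, pattern)  # leftmost suffix that is >= pattern
    hits = []
    # suffixes having prefix `pattern` form one contiguous run starting at lo
    for suffix, idx, start in entries[lo:]:
        if not suffix.startswith(pattern):
            break
        hits.append((idx, start))
    return hits


words = ["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"]
entries = suffix_set(words)
hits = substring_search(entries, "at")
print(len(entries), sorted(hits))
```

The suffixes that start with the pattern sit next to each other in sorted order, which is why one search for the left boundary followed by a scan of `occ` entries is enough.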
Substring Search$(P)$ on string set $\Delta$ thus becomes Prefix Search$(P)$ on suffix set $SUF(\Delta)$. For example, let us examine the String B-tree shown in Figure 7 and search for $P$ = `at'. We have to retrieve $occ = 5$ occurrences: one each in `atlas', `atom' and `patent', and two in `attenuate'. The suffixes having prefix $P$ and corresponding to these occurrences have their logical pointers (i.e., 16, 25, 35, 5 and 10) stored in a contiguous sequence of leaves in Figure 7. As a result, we can set the string set $\mathcal{K} = SUF(\Delta)$ and its size $k = N$ and execute Prefix Search$(P)$. The total cost of answering Substring Search$(P)$ is therefore $O(\frac{p + occ}{B} + \log_B N)$ worst-case disk accesses by Theorem 2.1.

Figure 7. An example of a String B-tree layout for solving Problem 2 on string set $\Delta$ = {`aid', `atlas', `atom', `attenuate', `car', `patent', `zoo'}. Here, $b = 4$ and $\mathcal{K}$ = {`aid', `ar', `as', ..., `uate', `zoo'}.

Although this transformation notably simplifies the search operation, it introduces some updating problems that represent the most challenging part of solving Problem 2. We wish to point out that the insertion of an individual string $Y$ into string set $\Delta$, where $m = |Y|$, consists of inserting all of its $m$ suffixes into suffix set $SUF(\Delta)$ in lexicographic order. Consequently, we could consider inserting one suffix at a time, say $Y[i, m]$, with $d_i = O(\frac{|Y[i,m]|}{B} + \log_B N)$ disk accesses by Theorem 2.1 with $\mathcal{K} = SUF(\Delta)$ and $k = N$. The total insertion cost would be $\sum_{i=1}^{m} d_i = O(m(\frac{m}{B} + 1) + m \log_B (N + m))$ disk accesses and this is worse than the $O(m \log_B (N + m))$ worst-case bound we claimed in the introduction. The problem here is that we treat the $m$ inserted suffixes like arbitrary strings and this causes the rescanning problem. The solution lies in the fact that they are all part of the same string.

Consequently, we augment the simplified String B-tree by introducing two types of auxiliary pointers which help us to avoid rescanning in the updating process. One type is the standard parent pointer defined for each node; the other is the succ pointer defined for each string in $SUF(\Delta)$ as follows. The succ pointer for $\delta[i, |\delta|] \in SUF(\Delta)$ leads to the String B-tree's leaf containing $\delta[i + 1, |\delta|]$. If $i = |\delta|$, then we let succ be a self-loop pointer to its own leaf, i.e., the leaf containing $\delta[i, |\delta|]$.

We only describe the logic behind $Y$'s insertion here because its deletion is simpler, and treat the subject formally in Sections 4.2-4.5. We insert $Y$'s suffixes into the String B-tree, which initially stores $SUF(\Delta)$, going from the longest to the shortest one. We proceed by induction on $i = 1, 2, \ldots, m$ and make sure that we satisfy the following two conditions after $Y[i, m]$'s insertion:

(a) Suffixes $Y[j, m]$ are stored in the String B-tree, for all $1 \leq j \leq i$, and $Y[i, m]$ shares its first $h_i$ characters with one of its adjacent strings in the String B-tree.

(b) All the succ pointers are correctly set for the strings in the String B-tree except for $Y[i, m]$. This means that succ$(Y[i, m])$ is the only dangling pointer, unless $i = m$, in which case it is a self-loop pointer to its own leaf.

We refer the reader to the self-explanatory pseudocode illustrated in Figure 8 for further details.

    procedure SB-Insert(Y);
      m := |Y|;
      for i = 1, 2, ..., m do
        (1) find the leaf $\pi_i$ that contains Y[i, m]'s position;
        (2) insert Y[i, m] into $\pi_i$;
        (3) if a split occurs then rebalance the String B-tree;
            redirect some succ and parent pointers;
        (4) succ(Y[i-1, m]) := leaf containing Y[i, m];
        (5) if i = m then succ(Y[i, m]) := succ(Y[i-1, m]); /* self-loop pointer */
      endfor

Figure 8. The insertion algorithm.

We assume that Conditions (a) and (b) are satisfied for $i - 1$. By executing Steps (1)-(5), we make succ$(Y[i, m])$ the new dangling pointer and satisfy Conditions (a) and (b) for $i$. We therefore go on by setting $i := i + 1$ and repeat the insertion for the next suffix of $Y$. The two main problems arising in the implementation of the insertion procedure are:

Step (1): We have to find $Y[i, m]$'s position without any rescanning.

Step (3): We have to rebalance the updated String B-tree by redirecting some succ and parent pointers efficiently.

We now examine the problem of finding $Y[i, m]$'s position (Step (1)). For $i = 1$, we find $Y[1, m]$'s position by traversing the String B-tree analogously to Prefix Search$(Y[1, m])$. We take a different approach for the rest of $Y$'s suffixes ($i > 1$) to avoid rescanning and inductively exploit Conditions (a) and (b) for $i - 1$. When finding $Y[i, m]$'s position, instead of starting out from the root, we traverse the String B-tree from the last leaf visited in the String B-tree (i.e., the one containing $Y[i - 1, m]$). Since $Y[i - 1, m] = Y[i - 1]\, Y[i, m]$, we would be tempted to use the succ$(Y[i - 1, m])$ pointer to identify $Y[i, m]$'s position directly, but cannot because the pointer is dangling by Condition (b). However, we know that $Y[i - 1, m]$ shares its first $h_{i-1}$ characters with one of its adjacent strings by Condition (a). We therefore take the succ pointer of this adjacent string, which is correctly set by Condition (b), and reach a leaf which verifies the following property: it contains a string that shares the first $\max\{0, h_{i-1} - 1\}$ characters with $Y[i, m]$ (Lemma 4.8). We continue the insertion by performing an upward and downward String B-tree traversal leading to leaf $\pi_i$, which contains $Y[i, m]$'s position.
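The property that the succ pointers exploit can be checked on a toy example. The sketch below is only an illustration under stated assumptions: a sorted Python list replaces the String B-tree leaves, no succ or parent pointers are modeled, and the names `adjacent_lcp` and `insert_suffixes` are hypothetical. It inserts the suffixes of $Y$ from longest to shortest into a suffix-closed set and records each $h_i$, the longest prefix shared with an adjacent string.

```python
from bisect import bisect_left, insort


def lcp(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return n


def adjacent_lcp(sorted_keys, y):
    """Longest prefix y shares with its would-be neighbours in sorted_keys."""
    j = bisect_left(sorted_keys, y)
    neighbours = sorted_keys[max(0, j - 1): j + 1]
    return max((lcp(y, x) for x in neighbours), default=0)


def insert_suffixes(sorted_keys, y):
    """Insert y's suffixes, longest first; return the list of h_i values."""
    h = []
    for i in range(len(y)):
        suffix = y[i:]
        h.append(adjacent_lcp(sorted_keys, suffix))
        insort(sorted_keys, suffix)
    return h


# Suffix-closed starting set: all suffixes of "atlas" and "car".
keys = sorted(w[i:] for w in ["atlas", "car"] for i in range(len(w)))
h = insert_suffixes(keys, "attenuate")
print(h)
```

In this run every $h_i$ is at least $\max\{0, h_{i-1} - 1\}$, so resuming the comparison from that offset never skips a character that still needs to be examined.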
Since we can prove that $h_i \geq \max\{0, h_{i-1} - 1\}$ (Corollary 4.9), our algorithm avoids rescanning by only examining $Y$'s characters in positions $i + \max\{0, h_{i-1} - 1\}, \ldots, i + h_i$. We show that this "double" String B-tree traversal correctly identifies $\pi_i$ with $\frac{h_i - \max\{0, h_{i-1} - 1\}}{B} + O(\log_B (N + m))$ disk accesses (Lemma 4.10).

After $Y[i, m]$'s insertion in its leaf, we have to rebalance the String B-tree if a split occurs (Step (3)). A straightforward handling of parent and succ pointers would take $O(B \log_B (N + m))$ worst-case disk accesses per inserted suffix because: (i) each node split operation can redirect $\Theta(B)$ of these pointers from possibly distinct nodes; (ii) there can be $H = O(\log_B (N + m))$ split operations per inserted suffix. In Section 4.5, we show how to obtain an $O(\log_B (N + m))$ amortized cost per suffix and then devise a general strategy based on node clusters to achieve $O(\log_B (N + m))$ in the worst case.

As far as the worst-case complexity of $Y[i, m]$'s insertion is concerned, we take $d_i = \frac{h_i - \max\{0, h_{i-1} - 1\}}{B} + O(\log_B (N + m))$ disk accesses, where $h_0 = 0$ and $h_i \leq m$. As a result, a total of $\sum_{i=1}^{m} d_i = O(\frac{m}{B} + m \log_B (N + m)) = O(m \log_B (N + m))$ disk accesses are required for inserting $Y$ into $\Delta$. It is worth noting that we achieve the same worst-case performance as for the insertion of $m$ integer keys into a regular B-tree; additionally, our bound is proportional to the number of inserted suffixes rather than their total length, which is bounded by $\Theta(m^2)$. We give a formal, detailed discussion of the update operations in Sections 4.2-4.5 and prove the following result:

Theorem 2.2 (Problem 2). Let $\Delta$ be a set of strings whose total length is $N$. Substring Search$(P)$ takes $O(\frac{p + occ}{B} + \log_B N)$ worst-case disk accesses, where $p = |P|$. Inserting a string of length $m$ in $\Delta$ or deleting it takes $O(m \log_B (N + m))$ worst-case disk accesses. The space occupied by both the String B-tree and the string set $\Delta$ amounts to $\Theta(\frac{N}{B})$ disk pages.

We begin our formal discussion with a technical description of the Patricia trie data structure and its operations (Section 3). We then give a technical description of the String B-tree data structure and discuss its operations in detail (Section 4).

3 A Technical Description of Patricia Tries

We let $\Sigma$ denote an ordered alphabet and $\leq_L$ denote the lexicographic order among the strings whose characters are taken from $\Sigma$. Given two strings $X$ and $Y$ that are not each other's prefix, we define $\mathit{lcp}(X, Y)$ to be their longest common prefix length, i.e., $\mathit{lcp}(X, Y) = k$ iff $X[1, k] = Y[1, k]$ and $X[k + 1] \neq Y[k + 1]$. This definition can be extended to the case in which $X$ is $Y$'s prefix (or vice versa) by appending a special endmarker to both strings. The following fact illustrates the relationship between the lexicographic order $\leq_L$ and the $\mathit{lcp}$ value:

Fact 3.1. For any strings $X_1, X_2, Y$ such that either $X_1 \leq_L X_2 \leq_L Y$ or $Y \leq_L X_2 \leq_L X_1$: $\mathit{lcp}(X_1, Y) \leq \mathit{lcp}(X_2, Y)$.

Let us now consider an ordered string set $S = \{X_1, \ldots, X_d\}$ and assume that any two strings in $S$ are not each other's prefix. We use the shorthand $\mathit{max\_lcp}(Y, S)$ to indicate the maximum among the $\mathit{lcp}$-values of $Y$ and $S$'s strings, i.e., $\mathit{max\_lcp}(Y, S) = \max_{X \in S} \mathit{lcp}(Y, X)$. We say that an integer $j$ is $Y$'s position in set $S$ if exactly $(j - 1)$ strings in $S$ are lexicographically smaller than $Y$, where $1 \leq j \leq d + 1$. The following fact illustrates the relationship between the $\mathit{max\_lcp}$ value and $S$'s strings near $Y$'s position:

Fact 3.2. If $j$ is $Y$'s position in $S$, then
\[
\mathit{max\_lcp}(Y, S) =
\begin{cases}
\mathit{lcp}(Y, X_1) & \text{if } j = 1 \\
\max\{\mathit{lcp}(X_{j-1}, Y),\ \mathit{lcp}(Y, X_j)\} & \text{if } 2 \leq j \leq d \\
\mathit{lcp}(X_d, Y) & \text{if } j = d + 1
\end{cases}
\]
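Fact 3.2 says that only the neighbours of $Y$'s position need to be compared against $Y$. A small sketch, assuming a sorted Python list for $S$ and using the hypothetical names `position` and `max_lcp_via_neighbours`, compares that shortcut with a brute-force maximum over the whole set.

```python
from bisect import bisect_left


def lcp(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return n


def position(y, s):
    """1-based position j: exactly j - 1 strings of s are smaller than y."""
    return bisect_left(s, y) + 1


def max_lcp_via_neighbours(y, s):
    """max_lcp(y, s) computed from at most two strings, as in Fact 3.2."""
    j, d = position(y, s), len(s)
    if j == 1:
        return lcp(y, s[0])
    if j == d + 1:
        return lcp(s[d - 1], y)
    return max(lcp(s[j - 2], y), lcp(y, s[j - 1]))


S = ["aid", "atlas", "atom", "car", "zoo"]  # sorted, none a prefix of another
for Y in ["aardvark", "attic", "zulu"]:
    brute = max(lcp(Y, X) for X in S)
    print(Y, position(Y, S), max_lcp_via_neighbours(Y, S), brute)
```

The three sample strings exercise the three cases of the fact: position 1, an interior position and position $d + 1$.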

We introduce a definition of Patricia tries that is slightly different from the one in [36], but it is suitable for our purposes. A Patricia trie $PT_S$ built on $S$ satisfies the following conditions (see Figure 5):

(1) Each arc is labeled by a branching character taken from $\Sigma$ and each internal node has at least two outgoing arcs labeled with different characters. The arcs are ordered according to their branching characters and only the root can have one child.

(2) There is a distinct leaf $v$ associated with each string in $S$. We denote this string by $W(v)$. Leaf $v$ also stores its string length $len(v) = |W(v)|$.

(3) If node $u$ is the lowest common ancestor of two leaves $l$ and $f$, then it is labeled by integer $len(u) = \mathit{lcp}(W(l), W(f))$ (and we let $len(root) = 0$). Specifically, leaf $l$ (resp., $f$) descends from $u$'s outgoing arc whose branching character is the $(len(u) + 1)$-st character in string $W(l)$ (resp., $W(f)$).

Let us now consider an internal node $u$ in $PT_S$ and denote $u$'s parent by $parent(u)$; we let $f$ be one of $u$'s descending leaves. Property (3) suggests that we denote the string implicitly stored in node $u$ by $W(u)$, that is, $W(u)$ is equal to the first $len(u)$ characters of $W(f)$. Arc $(parent(u), u)$ implicitly corresponds to a substring of length $(len(u) - len(parent(u)))$ having its first character equal to the branching character $W(f)[len(parent(u)) + 1]$ and the other characters equal to $W(f)$'s characters in positions $len(parent(u)) + 2, \ldots, len(u)$. We can now introduce the definition of hit node, which is the analog of the extended locus notion in compacted tries [34]:

Definition 3.3. The hit node for a pair $(f, \ell)$, such that $f$ is a leaf and $0 < \ell \leq len(f)$, is $f$'s ancestor $u$ satisfying: $len(u) \geq \ell > len(parent(u))$. If $\ell = 0$, the hit node is the root.

Patricia tries do not take up very much space: $PT_S$ has $d$ leaves and no more than $d$ internal nodes because only the root can have one child. Therefore, the total space required is $O(d)$ even if the total length of $S$'s strings can be much more than $d$.
3.1 Blind Searching in Patricia Tries: PT-Search procedure

We propose a search method that makes use of Patricia trie $PT_S$ to efficiently retrieve the position of an arbitrary string $P$ in an ordered set $S$. We stated the intuition and logic behind it in Section 2.1 (PT-Search procedure). PT-Search's input is a triplet $(P, S, \ell)$, where $\ell \leq \mathit{lcp}(P, X)$ for a string $X \in S$. The output is a pair $(j, \mathit{lcp})$ in which $j$ is $P$'s position in $S$ and $\mathit{lcp} = \mathit{max\_lcp}(P, S)$. Let us introduce a special character \$ smaller than any other character in $\Sigma$ and let us assume without any loss in generality that $P[i] = \$$ when $i > |P|$. We implicitly use the following fact to identify $S$'s leftmost string whose prefix is $P$ (we can also determine its rightmost one by letting \$ be larger than any other alphabet character).

Fact 3.4. There is a mismatch between $P$ and any other string and, if any of $S$'s strings have prefix $P[1, |P|]$, then $P$'s position in $S$ is to their immediate left.

There are two main phases in our procedure:

First Phase: Downward Traversal. We locate a leaf, say $l$, by traversing $PT_S$ downwards. We start out from its root and compare $P$'s characters with the branching characters of the arcs traversed. If $u$ is the currently visited node and has an outgoing arc $(u, v)$ whose branching character is equal to $P[len(u) + 1]$, then we move from $u$ to its child $v$ and set $u := v$. We go on like this until we either reach a leaf, which is $l$, or we cannot branch any further and then choose $l$ as one of $u$'s descending leaves. Leaf $l$ stores one of $S$'s strings that satisfy the following useful property:

Lemma 3.5. If we let $\mathit{lcp}$ denote $\mathit{lcp}(W(l), P)$, then $\mathit{lcp} = \mathit{max\_lcp}(P, S)$.

Proof: By way of contradiction, we assume that there is another string $X$ in $S$, such that $X \neq W(l)$ and $\mathit{lcp}(X, P) > \mathit{lcp}$, and show that we cannot reach leaf $l$. We have $\mathit{lcp}(W(l), X) = \mathit{lcp}$. Let $u$ denote the lowest common ancestor of $l$ and the leaf storing $X$. From Property (3) of the Patricia tries, it follows that $len(u) = \mathit{lcp}(W(l), X)$ and $P[len(u) + 1] = X[len(u) + 1] \neq W(l)[len(u) + 1]$ (because $\mathit{lcp}(X, P) > \mathit{lcp}$). Consequently, $W(u)$ is a proper prefix of $P$ and we can branch further out from $u$ to its child $v$ by matching $P[len(u) + 1]$ with branching character $X[len(u) + 1]$. Since the branching character is different from $W(l)[len(u) + 1]$, we obtain the contradiction that $v$ is not one of $l$'s ancestors and therefore $l$ cannot be reached at the end of the downward traversal.

It is worth noting that we retrieve leaf $l$ without performing any disk accesses because we only use the branching characters stored in $PT_S$'s disk page. Furthermore, $l$'s position does not necessarily correspond to $P$'s position in $S$ (see Figure 6(left)).

Second Phase: Retrieval of $P$'s position in $S$. We compute $\mathit{lcp} = \mathit{lcp}(W(l), P)$ and the two mismatching characters $c = P[\mathit{lcp} + 1]$ and $c' = W(l)[\mathit{lcp} + 1]$ (which are well-defined by Fact 3.4) by exploiting the following result:

Lemma 3.6. $\mathit{lcp} \geq \ell$.

Proof: We know that there is a string $X \in S$, such that $\mathit{lcp}(P, X) \geq \ell$. Moreover, $\mathit{max\_lcp}(P, S) \geq \mathit{lcp}(P, X)$ by definition. Since $\mathit{lcp} = \mathit{max\_lcp}(P, S)$ by Lemma 3.5, we deduce that $\mathit{lcp} \geq \ell$.
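The first phase and Lemma 3.5 can be tried out on a small in-memory Patricia trie. The sketch below is an illustration under explicit assumptions, not the paper's disk-based layout: the class `Node`, the helpers `build`, `patricia` and `blind_descent`, and the use of "$" as an endmarker appended to every string are all hypothetical choices made here. Only branching characters and $len$ values are consulted during the descent; the strings themselves are read once, at the end, to compute the lcp.

```python
END = "$"  # endmarker, assumed smaller than every alphabet character


class Node:
    def __init__(self, length, leaf_index=None):
        self.len = length             # len(u)
        self.children = {}            # branching character -> child node
        self.leaf_index = leaf_index  # index into S for a leaf, else None


def lcp(pattern, word):
    """Common prefix length; the pattern is read as if padded with END."""
    n = 0
    while n < len(word) and (pattern[n] if n < len(pattern) else END) == word[n]:
        n += 1
    return n


def build(strings, lo, hi):
    """Patricia trie over strings[lo:hi] (sorted, none a prefix of another)."""
    if hi - lo == 1:
        return Node(len(strings[lo]), lo)
    k = lcp(strings[lo], strings[hi - 1])  # lcp shared by the whole range
    node, start = Node(k), lo
    for i in range(lo + 1, hi + 1):
        if i == hi or strings[i][k] != strings[start][k]:
            node.children[strings[start][k]] = build(strings, start, i)
            start = i
    return node


def patricia(strings):
    top = build(strings, 0, len(strings))
    if top.leaf_index is None and top.len == 0:
        return top
    root = Node(0)  # len(root) = 0; only the root may have a single child
    root.children[strings[0][0]] = top
    return root


def blind_descent(root, pattern):
    """First phase: follow matching branching characters down to a leaf."""
    u = root
    while u.leaf_index is None:
        c = pattern[u.len] if u.len < len(pattern) else END
        child = u.children.get(c)
        if child is None:  # cannot branch: any descending leaf will do
            child = next(iter(u.children.values()))
        u = child
    return u.leaf_index


words = ["aid", "atlas", "atom", "attenuate", "car", "patent", "zoo"]
S = sorted(w + END for w in words)
root = patricia(S)
for P in ["attic", "atq", "cow", "b"]:
    leaf = blind_descent(root, P)
    print(P, S[leaf], lcp(P, S[leaf]), max(lcp(P, X) for X in S))
```

For each sample pattern the leaf reached blindly attains the maximum lcp over the whole set, even when that leaf is not adjacent to the pattern's position in sorted order.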
From Lemma 3.6, we deduce that the first $\ell$ characters in $P$ and $W(l)$ are definitely equal. We therefore compute $\mathit{lcp}$, $c$ and $c'$ by starting out from the $(\ell + 1)$-st characters in $P$ and $W(l)$ rather than from their beginning. Consequently, we only retrieve $\lceil \frac{\mathit{lcp} - \ell}{B} \rceil + 1$ disk pages, namely the ones storing substring $W(l)[\ell + 1, \mathit{lcp} + 1]$. We then detect the hit node, say $u$, for the pair $(l, \mathit{lcp})$ by traversing the Patricia trie upwards and find $P$'s position $j$ in $S$ by using the property that all of $S$'s strings having prefix $P[1, \mathit{lcp}]$ are stored in $u$'s descending leaves.

Lemma 3.7. We can compute $P$'s position $j$ without any further disk accesses.

Proof: We already have $\mathit{lcp}$, $c$ and $c'$ in main memory. We handle two cases on hit node $u$ and derive their correctness from the Patricia trie properties:

(1) Case $len(u) = \mathit{lcp}$. We let $c_1, \ldots, c_k$ be the branching characters in $u$'s outgoing arcs. None of them match character $c$. If $c <_L c_1$, then we move to $u$'s leftmost descending leaf $z$ and let $j - 1$ be the number of leaves to $z$'s left ($z$ excluded). If $c_k <_L c$, then we move to $u$'s rightmost descending leaf $z$ and let $j - 1$ be the number of leaves to $z$'s left ($z$ included). In all other cases, we determine two branching characters, say $c_i$ and $c_{i+1}$, such that $c_i <_L c <_L c_{i+1}$. We move to the leftmost leaf $z$ that is reachable through the arc labeled $c_{i+1}$ and let $j - 1$ be the number of leaves to $z$'s left ($z$ excluded).

(2) Case $len(u) > \mathit{lcp}$. We can infer that all the strings stored in $u$'s descending leaves share the same prefix of length $len(u)$ and we know that $len(u) > \mathit{lcp} > len(parent(u))$. The $(\mathit{lcp} + 1)$-st character of them all is equal to $c'$ because $l$ is one of $u$'s descending leaves. If $c <_L c'$, then we move to $u$'s leftmost descending leaf $z$ and let $j - 1$ be the number of leaves to $z$'s left ($z$ excluded). If $c' <_L c$, then we move to $u$'s rightmost descending leaf $z$ and let $j - 1$ be the number of leaves to $z$'s left ($z$ included).

It is worth noting that the computation of $\mathit{lcp}$, $c$ and $c'$ is the only expensive step in the second phase. We can therefore state the following, basic result:

Theorem 3.8. Let us assume that Patricia trie $PT_S$ is already in main memory and let $\ell$ be a non-negative integer such that $\ell \leq \mathit{lcp}(X, P)$ for a string $X \in S$. PT-Search$(P, S, \ell)$ returns the pair $(j, \mathit{lcp})$ in which $j$ is $P$'s position in $S$ and $\mathit{lcp} = \mathit{max\_lcp}(P, S)$. It does not cost more than $\lceil \frac{\mathit{lcp} - \ell}{B} \rceil + 1$ disk accesses.

Proof: The correctness follows from Lemmas 3.5 and 3.7. We now analyze the total number of disk accesses. In the first phase, we do not make any disk accesses and we perform no more than $2d$ character comparisons, as this is the number of branching characters in $PT_S$.
In the second phase, we do not require any more than $\lceil \frac{\mathit{lcp} - \ell}{B} \rceil + 1$ disk accesses to compute $\mathit{lcp}$, $c$, $c'$ and $O(\mathit{lcp} - \ell + 1)$ character comparisons. Finally, we do not have to make any more disk accesses or more than $d$ character comparisons to determine hit node $u$ and position $j$.

3.2 Dynamic operations on Patricia Tries

We now describe how to maintain Patricia tries under concatenate, split, insert and delete operations. These operations will be useful to us further on.

PT-Concatenate$(PT_{S_1}, PT_{S_2}, \mathit{lcp}, c, c')$

We let $S_1$ and $S_2$ be two ordered string sets, such that $S_1$'s strings are lexicographically smaller than $S_2$'s strings. If $X$ is $S_1$'s rightmost string and $Y$ is $S_2$'s leftmost string, then the last three input parameters must satisfy $\mathit{lcp} = \mathit{lcp}(X, Y)$, $c = X[\mathit{lcp} + 1]$ and $c' = Y[\mathit{lcp} + 1]$ (with $c <_L c'$). We use PT-Concatenate to concatenate Patricia tries $PT_{S_1}$ and $PT_{S_2}$ in order to create a single Patricia trie $PT_{S_1 \cup S_2}$ whose ordered set $S_1 \cup S_2$ is obtained by appending $S_2$'s strings to $S_1$'s. We build $PT_{S_1 \cup S_2}$ by merging $PT_{S_1}$'s rightmost path with $PT_{S_2}$'s leftmost path.

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.) CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts

More information

Finite Automata-cont d

Finite Automata-cont d Automt Theory nd Forml Lnguges Professor Leslie Lnder Lecture # 6 Finite Automt-cont d The Pumping Lemm WEB SITE: http://ingwe.inghmton.edu/ ~lnder/cs573.html Septemer 18, 2000 Exmple 1 Consider L = {ww

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter

Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostt.wisc.edu Gols for Lecture Key concepts how lrge-scle lignment differs from the simple cse the

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

expression simply by forming an OR of the ANDs of all input variables for which the output is

expression simply by forming an OR of the ANDs of all input variables for which the output is 2.4 Logic Minimiztion nd Krnugh Mps As we found ove, given truth tle, it is lwys possile to write down correct logic expression simply y forming n OR of the ANDs of ll input vriles for which the output

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9.

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9. Regulr Expressions, Pumping Lemm, Right Liner Grmmrs Ling 106 Mrch 25, 2002 1 Regulr Expressions A regulr expression descries or genertes lnguge: it is kind of shorthnd for listing the memers of lnguge.

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 utomt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Prolem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) nton Setzer (Bsed on ook drft y J. V. Tucker nd K. Stephenson)

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms Preview Greed Algorithms Greed Algorithms Coin Chnge Huffmn Code Greed lgorithms end to e simple nd strightforwrd. Are often used to solve optimiztion prolems. Alws mke the choice tht looks est t the moment,

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions CS 330 Forml Methods nd Models Dn Richrds, George Mson University, Spring 2016 Quiz Solutions Quiz 1, Propositionl Logic Dte: Ferury 9 1. (4pts) ((p q) (q r)) (p r), prove tutology using truth tles. p

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

The size of subsequence automaton

The size of subsequence automaton Theoreticl Computer Science 4 (005) 79 84 www.elsevier.com/locte/tcs Note The size of susequence utomton Zdeněk Troníček,, Ayumi Shinohr,c Deprtment of Computer Science nd Engineering, FEE CTU in Prgue,

More information

Lecture 3: Equivalence Relations

Lecture 3: Equivalence Relations Mthcmp Crsh Course Instructor: Pdric Brtlett Lecture 3: Equivlence Reltions Week 1 Mthcmp 2014 In our lst three tlks of this clss, we shift the focus of our tlks from proof techniques to proof concepts

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.
