Solving the String Statistics Problem in Time O(n log n)

Alcom-FT Technical Report Series ALCOMFT-TR

Gerth Stølting Brodal (1), Rune B. Lyngsø (3), Anna Östlin (1), and Christian N. S. Pedersen (1,2)

1 BRICS, Department of Computer Science, University of Aarhus, Ny Munkegade, DK-8000 Århus C, Denmark. E-mail: {gerth,annao,cstorm}@brics.dk
2 BiRC, University of Aarhus, Ny Munkegade, DK-8000 Århus C, Denmark.
3 Department of Statistics, Oxford University, Oxford OX1 3TG, UK. E-mail: lyngsoe@stats.ox.ac.uk

Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST (ALCOM-FT). Supported by the Carlsberg Foundation (contract number ANS-0257/20). Basic Research in Computer Science (BRICS), funded by the Danish National Research Foundation. Bioinformatics Research Center (BiRC), funded by Aarhus University Research Foundation.

Abstract

The string statistics problem consists of preprocessing a string of length n such that given a query pattern of length m, the maximum number of non-overlapping occurrences of the query pattern in the string can be reported efficiently. Apostolico and Preparata introduced the minimal augmented suffix tree (MAST) as a data structure for the string statistics problem, and showed how to construct the MAST in time $O(n \log^2 n)$ and how it supports queries in time O(m) for constant sized alphabets. A subsequent theorem by Fraenkel and Simpson stating that a string has at most a linear number of distinct squares implies that the MAST requires space O(n). In this paper we improve the construction time for the MAST to O(n log n) by extending the algorithm of Apostolico and Preparata to exploit properties of efficient joining and splitting of search trees together with a refined analysis.

1 Introduction

The string statistics problem consists of preprocessing a string S of length n such that given a query pattern α of length m, the maximum number of non-overlapping occurrences of α in S can be reported efficiently. Without preprocessing the maximum number of non-overlapping occurrences of α in S can be found in time O(n), by using a linear time string matching algorithm to find all occurrences of α in S, e.g. the algorithm by Knuth, Morris, and Pratt [14], and then in a greedy fashion from left-to-right compute the maximal number of non-overlapping occurrences.

Apostolico and Preparata in [3] described a data structure for the string statistics problem, the minimal augmented suffix tree MAST(S), with preprocessing time $O(n \log^2 n)$ and with query time O(m) for constant sized alphabets. In this paper we present an improved algorithm for constructing MAST(S) with preprocessing time O(n log n), and prove that MAST(S) requires space O(n), which follows from a recent theorem of Fraenkel and Simpson [9].

The basic idea of the algorithm of Apostolico and Preparata and our algorithm for constructing MAST(S) is to perform a traversal of the suffix tree of S while maintaining the leaf-lists of the nodes visited in appropriate data structures (see Section 1.1 for definition details). Traversing the suffix tree of a string to construct and examine the leaf-lists at each node is a general technique for finding regularities in a string, e.g. for finding squares in a string (or tandem repeats) [2,17], for finding maximal quasi-periodic substrings, i.e. substrings that can be covered by a shorter substring [1,6], and for finding maximal pairs with bounded gap [4]. All these problems can be solved using this technique in time O(n log n). Other applications are listed by Gusfield in [10, Chapter 7].

A crucial component of our algorithm is the representation of a leaf-list by a collection of search trees, such that the leaf-list of a node in the suffix tree of S can be constructed from the leaf-lists of the children by efficient merging. Hwang and Lin [13] described how to optimally merge two sorted lists of length $n_1$ and $n_2$, where $n_1 \le n_2$, with $O(n_1 \log \frac{n_1+n_2}{n_1})$ comparisons. Brown and Tarjan [7] described how to achieve the same number of comparisons for merging two AVL-trees in time $O(n_1 \log \frac{n_1+n_2}{n_1})$, and Huddleston and Mehlhorn [12] showed a similar result for level-linked (2,4)-trees. In our algorithm we will use a slightly extended version of level-linked (2,4)-trees where each element has an associated weight.

Due to lack of space proofs have been omitted. The omitted details can be found in [5].
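The merging bound quoted above can be made concrete with a small sketch. The code below is not the algorithm of Hwang and Lin [13] nor the tree-based merges of [7,12]; it is an illustrative galloping merge on plain Python lists, in which each element of the shorter list is located by an exponential search that starts where the previous search ended. The function name `gallop_merge` is hypothetical. Only the number of comparisons is kept small this way; copying list slices still costs time linear in the total length.

```python
from bisect import bisect_left

def gallop_merge(small, large):
    """Merge two sorted lists, locating each element of `small` in `large`
    by an exponential search that starts where the previous one ended."""
    merged = []
    pos = 0  # large[:pos] has already been copied to `merged`
    for x in small:
        # Double the step until large[pos + step - 1] >= x or the list ends.
        step = 1
        while pos + step <= len(large) and large[pos + step - 1] < x:
            step *= 2
        # The insertion point now lies in large[pos + step // 2 : pos + step].
        lo = pos + step // 2
        hi = min(pos + step, len(large))
        nxt = bisect_left(large, x, lo, hi)
        merged.extend(large[pos:nxt])
        merged.append(x)
        pos = nxt
    merged.extend(large[pos:])
    return merged
```

Searching from the previous position mirrors the finger searches that level-linked trees support, which is why a search costs only the logarithm of the distance travelled.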
1.1 Preliminaries

Some of the terminology and notation used in the following originates from [3], but with minor modifications. We let Σ denote a finite alphabet, and for a string $S \in \Sigma^*$ we let $|S|$ denote the length of S, S[i] the ith character in S, for $1 \le i \le |S|$, and $S[i..j] = S[i]S[i+1] \cdots S[j]$ the substring of S from the ith to the jth character, for $1 \le i \le j \le |S|$. The suffix $S[i..|S|]$ of S starting at position i will be denoted S[i..]. An integer p, for $1 \le p \le |S|$, is denoted a period of S if and only if the suffix S[p+1..] of S is also a prefix of S, i.e. $S[p+1..] = S[1..|S|-p]$. The shortest period p of S is denoted the period of S, and the string S is said to be periodic if and only if $p \le |S|/2$. A nonempty string S is a square, if S = αα for some string α.

In the rest of this paper S denotes the input string with length n and α a substring of S. A non-empty string α is said to occur in S at position i if $\alpha = S[i..i+|\alpha|-1]$ and $1 \le i \le n-|\alpha|+1$. E.g. in the string [...] the substring [...] occurs at positions 1 and 8. The maximum number of non-overlapping occurrences of a string α in a string S is the maximum number of

occurrences of α where no two occurrences overlap. E.g. the maximum number of non-overlapping occurrences of [...] in [...] is three, since the occurrences at positions 1, 5 and 9 do not overlap.

The suffix tree ST(S) of the string S is the compressed trie storing all suffixes of the string $S\$$ where $\$ \notin \Sigma$. Each leaf in ST(S) represents a suffix S[i..] of S and is annotated with the index i. Each edge in ST(S) is labeled with a nonempty substring of S, represented by the start and end positions in S, such that the path from the root to the leaf annotated with index i spells the suffix S[i..]. We refer to the substring of S spelled by the path from the root to a node v as the path-label of v and denote it L(v). We refer to the set of indices stored at the leaves of the subtree rooted at v as the leaf-list of v and denote it LL(v). Since LL(v) is exactly the set of start positions i where L(v) is a prefix of the suffix S[i..], we have Fact 1 below.

Fact 1. If v is an internal node of ST(S), then $LL(v) = \bigcup_{c \text{ child of } v} LL(c)$, and $i \in LL(v)$ if and only if L(v) occurs at position i in S.

The problem of constructing ST(S) has been studied intensively and several algorithms have been developed which for constant sized alphabets can construct ST(S) in time and space $O(|S|)$ [8,15,18,19]. For non-constant alphabet sizes the running time of the algorithms becomes $O(|S| \log |\Sigma|)$.

In the following we let the height of a tree T be denoted h(T) and be defined as the maximum number of edges in a root-to-leaf path in T, and let the size of T be denoted $|T|$ and be defined as the number of leaves of T. For a node v in T we let $T_v$ denote the subtree of T rooted at node v, and let $|v| = |T_v|$ and $h(v) = h(T_v)$. Finally, for a node v in a binary tree we let small(v) denote the child of v with smaller size (ties are broken arbitrarily).

The basic idea of our algorithm in Section 5 is to process the suffix tree of the input string bottom-up, such that we at each node v spend amortized time $O(|\mathrm{small}(v)| \log(|v|/|\mathrm{small}(v)|))$. Lemma 1 then states that the total time becomes O(n log n) [16, Exercise 35].

Lemma 1. Let T be a binary tree with n leaves.
If for every internal node v, $c_v = |\mathrm{small}(v)| \log(|v|/|\mathrm{small}(v)|)$, and for every leaf v, $c_v = 0$, then $\sum_{v \in T} c_v \le n \log n$.

2 The String Statistics Problem

Given a string S of length n and a pattern α of length m the following greedy algorithm will compute the maximum number of non-overlapping occurrences of α in S. Find all occurrences of α in S by using an exact string matching algorithm. Choose the leftmost occurrence. Continue to choose greedily the leftmost occurrence not overlapping with any so far chosen occurrence. This greedy algorithm will compute the maximum number of non-overlapping occurrences of α in S in time O(n), since all matchings can be found in time O(n), e.g. by the algorithm by Knuth, Morris, and Pratt [14].
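The greedy procedure just described can be sketched in a few lines. This is a minimal illustration, not the linear-time version: the helper `occurrences` uses a naive scan costing O(nm) rather than the algorithm of Knuth, Morris, and Pratt [14], and both function names are hypothetical. Positions are 1-based, as in the text.

```python
def occurrences(s, alpha):
    """All 1-based start positions of alpha in s (overlaps allowed)."""
    m = len(alpha)
    return [i + 1 for i in range(len(s) - m + 1) if s[i:i + m] == alpha]

def c_value(s, alpha):
    """Maximum number of non-overlapping occurrences of alpha in s,
    obtained by always taking the leftmost occurrence that is still free."""
    count = 0
    next_free = 1  # smallest position at which a new occurrence may start
    for i in occurrences(s, alpha):
        if i >= next_free:
            count += 1
            next_free = i + len(alpha)
    return count
```

For instance, `occurrences("abababab", "aba")` yields the positions 1, 3 and 5, of which the greedy scan keeps 1 and 5, so `c_value("abababab", "aba")` is 2.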

Figure 1. To the left is the suffix tree ST(S) of the string S = [...]. The node v has path-label L(v) = [...] and leaf-list LL(v) = {1, 3, 6, 9}. To the right is the minimal augmented suffix tree MAST(S) for the string S = [...]. Numbers in the internal nodes are the c-values.

In the string statistics problem we want to preprocess a string S such that queries of the following form are supported efficiently: Given a query string α, what is the maximum number of non-overlapping occurrences of α in S? The maximum number of non-overlapping occurrences of α is called the c-value of α, denoted c(α). The preprocessing will be to compute the minimal augmented suffix tree described below. Given the minimal augmented suffix tree, string statistics queries can be answered in time O(m).

For any substring α of S there is exactly one path from the root of ST(S) ending in a node or on an edge of ST(S) spelling out the string α. This node or edge is called the locus of α. In a suffix tree ST(S) the number of leaves in the subtree below the locus of α tells us the number of occurrences of α in S. These occurrences may overlap, hence the suffix tree is not immediately suitable for the string statistics problem. The minimal augmented suffix tree for S, denoted MAST(S), can be constructed from the suffix tree ST(S) as follows. A minimum number of new auxiliary nodes are inserted into ST(S) in such a way that all substrings with locus on an edge (u, v), where u is the parent of v, have c-value equal to c(L(v)), i.e. the c-value only changes at internal nodes along a path from a leaf to the root. Each internal node v in the augmented tree is then labeled by c(L(v)) to get the minimal augmented suffix tree. Figure 1 shows the suffix tree and the minimal augmented suffix tree for the string [...].

Fraenkel and Simpson in [9] prove that a string S contains fewer than $2|S|$ distinct squares, which implies the following lemma.

Lemma 2. The minimal augmented suffix tree for a string S has at most $3|S|$ internal nodes.

It follows that the space needed to store MAST(S) is O(n).

Figure 2. The grouping of occurrences in a string into chunks and necklaces. Occurrences are shown below the string. Thick lines are occurrences in chunks. The grouping into chunks and necklaces is shown above the string. Necklaces are shown using dashed lines. Note that a necklace can consist of a single occurrence.

3 String Properties

The lemma below gives a characterization of how the occurrences of a string α can appear in S (proof omitted).

Lemma 3. Let S be a string and α a substring of S. If the occurrences of α in S are at positions $i_1 < \cdots < i_k$, then for all $1 \le j < k$ either $i_{j+1} - i_j = p$ or $i_{j+1} - i_j > \max\{|\alpha| - p, p\}$, where p denotes the period of α.

A consequence of Lemma 3 is that if $p \ge |\alpha|/2$, then an occurrence of α in S at position $i_j$ can only overlap with the occurrences at positions $i_{j-1}$ and $i_{j+1}$. If $p < |\alpha|/2$, then two consecutive occurrences $i_j$ and $i_{j+1}$ either satisfy $i_{j+1} - i_j = p$ or $i_{j+1} - i_j > |\alpha| - p$.

Corollary 1. If $i_{j+1} - i_j \le |\alpha|/2$, then $i_{j+1} - i_j = p$ where p is the period of α.

Motivated by the above observations we group the occurrences of α in S into chunks and necklaces. Let p denote the period of α. Chunks can only appear if $p < |\alpha|/2$. A chunk is a maximal sequence of occurrences containing at least two occurrences and where all consecutive occurrences have distance p. The remaining occurrences are grouped into necklaces. A necklace is a maximal sequence of overlapping occurrences, i.e. only two consecutive occurrences overlap at a given position and the overlap of two occurrences is between one and $p - 1$ positions long. Figure 2 shows the occurrences of the string [...] in a string of length 55 grouped into chunks and necklaces. By definition two necklaces cannot overlap, but a chunk can overlap with another chunk or necklace at both ends. By Lemma 3 the overlap is at most $p - 1$ positions.

We now turn to the contribution of chunks and necklaces to the c-values. We first consider the case where chunks and necklaces do not overlap.
An isolated necklace or chunk is a necklace or chunk that does not overlap with other necklaces and chunks. Figure 3 gives an example of the contribution to the c-values by an isolated necklace and chunk. More formally, we have the following lemma, which we state without proof.

Lemma 4. An isolated necklace of k occurrences of α contributes to the c-value of α with $\lceil k/2 \rceil$. An isolated chunk of k occurrences of α contributes to the c-value of α with $\lceil k / \lceil |\alpha|/p \rceil \rceil$, where p is the period of α.
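The two formulas of Lemma 4 are easy to evaluate with integer arithmetic. The sketch below computes them and includes a small greedy counter so that the values can be compared against an actual string. The function names and the two test strings mentioned afterwards are illustrative assumptions, not taken from the paper.

```python
def nominal_necklace(k):
    """ceil(k / 2): contribution of an isolated necklace of k occurrences."""
    return (k + 1) // 2

def nominal_chunk(k, alpha_len, p):
    """ceil(k / ceil(|alpha| / p)): contribution of an isolated chunk."""
    step = -(-alpha_len // p)  # ceil(|alpha| / p) occurrences per chosen one
    return -(-k // step)

def greedy_count(s, alpha):
    """Non-overlapping occurrences of alpha in s, chosen from the left."""
    count, i, m = 0, 0, len(alpha)
    while i + m <= len(s):
        if s[i:i + m] == alpha:
            count += 1
            i += m      # skip past the chosen occurrence
        else:
            i += 1
    return count
```

With k = 9 the necklace formula gives 5, and with k = 8, |α| = 5 and p = 2 the chunk formula gives 3. As an assumed concrete instance, "aba" occurs 9 times in `"a" + "ba" * 9` as a single necklace, and "ababa" occurs 8 times in `"ab" * 9 + "a"` as a single chunk; `greedy_count` returns 5 and 3 on these strings.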

Figure 3. Examples of the contribution to the c-values by an isolated necklace (left; α = [...] and the contribution is $5 = \lceil 9/2 \rceil$) and an isolated chunk (right; α = [...], p = 2, and the contribution is $3 = \lceil 8/\lceil 5/2 \rceil \rceil$).

Motivated by Lemma 4, we define the nominal contribution of a necklace of k occurrences of α to be $\lceil k/2 \rceil$ and the nominal contribution of a chunk of k occurrences of α to be $\lceil k / \lceil |\alpha|/p \rceil \rceil$. The nominal contribution of a necklace or chunk of α's is the contribution to the c-value of α if the necklace or chunk appears isolated. If the necklace or chunk does not appear isolated, i.e. it overlaps with a neighboring necklace or chunk, then its actual contribution to the c-value of α is at most one less than its nominal contribution to the c-value of α. We define the excess of a necklace of k occurrences to be $(k - 1) \bmod 2$, and the excess of a chunk of k occurrences to be $(k - 1) \bmod \lceil |\alpha|/p \rceil$. The excess describes the number of occurrences of α[1..p] which are covered by the necklace or chunk, but not covered by the maximal sequence of non-overlapping occurrences.

We group the chunks and necklaces into a collection of chains C by the following two rules:
1. A chunk with excess at least two is a chain by itself.
2. A maximal sequence of overlapping necklaces and chunks with excess zero or one is a chain.

For a chain $c \in C$ we define $\#_0(c)$ to be the number of chunks and necklaces with excess zero in the chain. We are now ready to state our main lemma enabling the efficient computation of the c-values. The lemma gives an alternative to the characterization in [3, Proposition 2] (proof omitted).

Lemma 5. The maximum number of non-overlapping occurrences of α in S equals the sum of the nominal contributions of all necklaces and chunks minus $\sum_{c \in C} \lfloor \#_0(c)/2 \rfloor$.

4 Level-Linked (2,4)-Trees

In this section we consider how to maintain a set of sorted lists of elements as a collection of level-linked (2,4)-trees where the elements are stored at the leaves in sorted order from left-to-right, and each element can have an associated real valued weight. For a detailed treatment of level-linked (2,4)-trees see [12] and [16, Section III.5].
The operations we consider supported are:

NewTree(e, w): Creates a new tree T containing the element e with associated weight w.

Search(p, e): Searches for the element e starting the search at the leaf of a tree T that p points to. Returns a reference to the leaf in T containing e or the immediate predecessor or successor of e.
Insert(p, e, w): Creates a new leaf containing the element e with associated weight w and inserts the new leaf immediately next to the leaf pointed to by p in a tree T, provided that the sorted order is maintained.
Delete(p): Deletes the leaf and element that p is a pointer to in a tree T.
Join($T_1$, $T_2$): Concatenates two trees $T_1$ and $T_2$ and returns a reference to the resulting tree. It is required that all elements in $T_1$ are smaller than the elements in $T_2$ w.r.t. the total order.
Split(T, e): Splits the tree T into two trees $T_1$ and $T_2$, such that e is larger than all elements in $T_1$ and smaller than or equal to all elements in $T_2$. Returns references to the two trees $T_1$ and $T_2$.
Weight(T): Returns the sum of the weights of the elements in the tree T.

Theorem 1 (Hoffmann et al. [11, Section 3]). Level-linked (2,4)-trees support NewTree, Insert and Delete in amortized constant time, Search in time O(log d) where d is the number of elements in T between e and p, and Join and Split in amortized time $O(\log \min\{|T_1|, |T_2|\})$.

To allow each element to have an associated weight we extend the construction from [11, Section 3] such that we for all nodes v in a tree store the sum of the weights of the leaves in the subtree $T_v$, except for the nodes on the paths to the leftmost and rightmost leaves. These sums are straightforward to maintain while rebalancing a (2,4)-tree under node splittings and fusions, since the sum at a node is the sum of the weights at the children of the node. For each tree we also store the total weight of the tree.

Theorem 2. Weighted level-linked (2,4)-trees support NewTree and Weight in amortized constant time, Insert and Delete in amortized time $O(\log |T|)$, Search in time O(log d) where d is the number of elements in T between e and p, and Join and Split in amortized time $O(\log \min\{|T_1|, |T_2|\})$.
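To fix the semantics of the operations above, here is a deliberately simple stand-in backed by a sorted Python list. It is only a sketch of the interface: it does not use level-linked (2,4)-trees and does not meet the time bounds of Theorems 1 and 2, since Join and Split copy lists in linear time. The class and function names are hypothetical, and the pointer-based operations Search, Insert and Delete are left out.

```python
from bisect import bisect_left

class WeightedList:
    """Sorted (element, weight) pairs with a cached total weight."""
    def __init__(self, items=()):
        self.items = list(items)                  # sorted by element
        self.total = sum(w for _, w in self.items)

def new_tree(e, w):
    """NewTree(e, w): a structure holding the single element e."""
    return WeightedList([(e, w)])

def join(t1, t2):
    """Join(T1, T2): every element of t1 must be smaller than those of t2."""
    if t1.items and t2.items:
        assert t1.items[-1][0] < t2.items[0][0]
    return WeightedList(t1.items + t2.items)

def split(t, e):
    """Split(T, e): e is larger than all of T1 and at most all of T2."""
    keys = [x for x, _ in t.items]
    cut = bisect_left(keys, e)                    # first index with key >= e
    return WeightedList(t.items[:cut]), WeightedList(t.items[cut:])

def weight(t):
    """Weight(T): sum of the weights stored in t."""
    return t.total
```

Caching the total per structure corresponds to the remark that the total weight is stored with each tree, so `weight` needs no traversal.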
5 The Algorithm

In this section we describe the algorithm for constructing the minimal augmented suffix tree for a string S of length n.

Algorithm idea: The algorithm starts by constructing the suffix tree, ST(S), for S. The suffix tree is then augmented with extra nodes and c-values for all nodes to get the minimal augmented suffix tree, MAST(S), for S. The augmentation of ST(S) to MAST(S) starts at the leaves and the tree is processed in a bottom-up fashion. At each node v encountered on the way up the tree the c-value for the path-label L(v) is added to the tree, and at each edge new nodes and their c-values are added if there is a change in the c-value along the edge. To be able to efficiently compute the c-values and decide if new nodes should

be added along edges the indices in the leaf-list of v, LL(v), are stored in a data structure that keeps track of necklaces, chunks, and chains, as defined in Section 3.

Data structure: Let α be a substring of S. The data structure D(α) is a search tree for the indices of the occurrences of α in S. The leaves in D(α) are the leaves in LL(v), where v is the node in ST(S) such that the locus of α is the edge directly above v or the node v. The search tree, D(α), will be organized into three levels to keep track of chains, chunks, and necklaces. The top level in the search tree stores chains, the middle level chunks and necklaces, and the bottom level occurrences.

Top level: Unweighted (2,4)-tree (cf. Theorem 1) with the chains as leaves. The leftmost indices in each chain are the keys.
Middle level: One weighted (2,4)-tree (cf. Theorem 2) for each chain, with the chunks and necklaces as leaves. The leftmost indices in each chunk or necklace are the keys. The weight of a leaf is 1 if the excess of the chunk or necklace is zero, otherwise the weight is 0. The total weight of a tree on the middle level is $\#_0(c)$, where c denotes the chain represented by the tree.
Bottom level: One weighted (2,4)-tree for each chunk and necklace, with the occurrences in the chunk or necklace as the leaves. The weight of a leaf is one. The total weight of a tree is the number of occurrences in the chunk or the necklace.

Together with each of the 3-level search trees, D(α), some variables are stored. NCS(α) stores the sum of the nominal contributions for all chunks and necklaces, ZS(α) stores the sum $\sum_{c \in C} \lfloor \#_0(c)/2 \rfloor$, where C is the set of chains. By Lemma 5 the maximum number of non-overlapping occurrences of α is NCS(α) − ZS(α). We also store the total number of indices in D(α) and a list of all chunks denoted CHUNKLIST(α). Finally we store p(α), which is the smallest difference between the indices of two consecutive occurrences in D(α). Note that, by Corollary 1, p(α) is the period of α if there is at least one chunk.
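The quantities NCS(α) and ZS(α) can be illustrated without any search trees by recomputing them from scratch for one pattern. The sketch below follows one reading of the definitions in Section 3: occurrences are grouped into chunks and necklaces, chains are formed from overlapping groups with excess zero or one, and ZS is taken as the sum of ⌊#0(c)/2⌋ over the chains. All function names are hypothetical, the scan is linear per pattern rather than incremental, and `greedy` is included only as a reference for comparison.

```python
def period(a):
    """Shortest p >= 1 such that a[p:] is a prefix of a."""
    return next(p for p in range(1, len(a) + 1) if a[p:] == a[:len(a) - p])

def occurrences(s, a):
    """1-based start positions of a in s, overlaps included."""
    return [i + 1 for i in range(len(s) - len(a) + 1) if s.startswith(a, i)]

def greedy(occ, m):
    """Reference answer: pick occurrences greedily from the left."""
    count, free = 0, 0
    for i in occ:
        if i >= free:
            count, free = count + 1, i + m
    return count

def groups(occ, m, p):
    """Split sorted positions into ("chunk" | "necklace", positions) groups."""
    n, out, i = len(occ), [], 0

    def in_chunk(j):
        # Chunks need p < m/2 and a neighbour at distance exactly p.
        if 2 * p >= m:
            return False
        return (j > 0 and occ[j] - occ[j - 1] == p) or \
               (j + 1 < n and occ[j + 1] - occ[j] == p)

    while i < n:
        j = i
        if in_chunk(i):
            while j + 1 < n and occ[j + 1] - occ[j] == p:
                j += 1
            out.append(("chunk", occ[i:j + 1]))
        else:
            while j + 1 < n and occ[j + 1] - occ[j] < m and not in_chunk(j + 1):
                j += 1
            out.append(("necklace", occ[i:j + 1]))
        i = j + 1
    return out

def c_value(occ, m, p):
    """NCS - ZS, with ZS accumulated chain by chain."""
    q = -(-m // p)                       # ceil(m / p)
    ncs = zs = zeros = 0                 # zeros: excess-zero groups in open chain
    prev_end = None                      # last position covered by the open chain
    for kind, pos in groups(occ, m, p):
        k = len(pos)
        if kind == "chunk":
            nominal, excess = -(-k // q), (k - 1) % q
        else:
            nominal, excess = (k + 1) // 2, (k - 1) % 2
        ncs += nominal
        extends = excess <= 1 and prev_end is not None and pos[0] <= prev_end
        if not extends:                  # close the open chain
            zs += zeros // 2
            zeros = 0
        if excess <= 1:
            zeros += 1 if excess == 0 else 0
            prev_end = pos[-1] + m - 1
        else:                            # chunk with excess >= 2: chain by itself
            prev_end = None
    zs += zeros // 2
    return ncs - zs
```

As an assumed example, take α = "aabaabaa" (period 3) and S = `"aab" * 5 + "aa" + "abaabaa"`. The occurrences are at positions 1, 4, 7, 10 and 17: a chunk of four occurrences with excess zero overlapping a single-occurrence necklace with excess zero. The nominal contributions sum to 3 and the chain contributes 1 to ZS, so the sketch returns 2, matching the greedy count.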
To make our presentation more readable we will sometimes refer to the tree for a chain, chunk, or necklace just as the chain, chunk, or necklace. For the top level tree in D(α) we will use level-linked (2,4)-trees, according to Theorem 1, and for the middle and bottom level trees in D(α) we will use weighted level-linked (2,4)-trees, according to Theorem 2. In these trees predecessor and successor queries are supported in constant time. We denote by l(e) and r(e) the indices to the left and right of index e. To be able to check fast if there are overlaps between two consecutive trees on the middle and bottom levels we store the first and last index in each tree in the root of the tree. This can easily be kept updated when the trees are joined and split. We will now describe how the suffix tree is processed and how the data structures are maintained during this process.

Processing events: We want to process edges in the tree bottom-up, i.e. for decreasing length of α, so that new nodes are inserted if the c-value changes along

the edge, the c-values for nodes are added to the tree, and the data structure is kept updated. The following events can cause changes in the c-value and the chain, chunk, and necklace structure.

1. Excess change: When $|\alpha|$ becomes $i \cdot p(\alpha)$, for i = 2, 3, 4, ..., the excess and nominal contribution of chunks change and we have to update the data structure and possibly add a node to the suffix tree.
2. Chunks become necklaces: When $|\alpha|$ decreases and becomes 2p a chunk degenerates into a necklace. At this point we join all overlapping chunks and necklaces into one necklace and possibly add a node to the suffix tree.
3. Necklace and chain break-up: When $|\alpha|$ decreases two consecutive occurrences at some point no longer overlap. The result is that a necklace or a chain may split, and we have to update the necklace and chain structure and possibly add a node to the suffix tree.
4. Merging at internal nodes: At internal nodes in the tree the data structures for the subtrees below the node are merged into one data structure and the c-value for the node is added to the tree.

To keep track of the events we use an event queue, denoted EQ, that is a common priority queue of events for the whole suffix tree. The priority of an event in EQ is equal to the length of the string α when the event has to be processed. Events of type 1 and 2 store a pointer to any leaf in D(α). Events of type 3, i.e. that two consecutive overlapping occurrences with index $e_1$ and $e_2$, $e_1 < e_2$, cease to overlap, store a pointer to the leaf $e_1$ in the suffix tree. For the leaf $e_1$ in the suffix tree also a pointer to the event in EQ is stored. Events of type 4 store a pointer to the internal node in the suffix tree involved in the event. When the suffix tree is constructed all events of type 4 are inserted into EQ. For a node v in ST(S) the event has priority $|L(v)|$ and stores a pointer to v. The pointers are used to be able to decide which data structure to update. The priority queue EQ is implemented as a table with entries EQ[1] ... EQ[$|S|$]. All events with priority x are stored in a linked list in entry EQ[x].
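A table of buckets indexed by priority is enough to realise such a queue. The following sketch stores events in Python lists instead of linked lists and visits the buckets from the largest priority downward, ordering events of equal priority by their type number as in the list of event types above. The class name `EventQueue`, its methods, and the `(kind, payload)` representation of an event are assumptions made for illustration. Removal of a stored event, which the algorithm also needs, is omitted.

```python
class EventQueue:
    """Bucket table EQ[1..n]; bucket x holds the events with priority x."""
    def __init__(self, n):
        self.buckets = [[] for _ in range(n + 1)]   # index 0 is unused

    def insert(self, priority, kind, payload):
        """Store an event of type `kind` (1 to 4) under the given priority."""
        self.buckets[priority].append((kind, payload))

    def drain(self):
        """Yield (priority, kind, payload) in one scan from high to low.
        Events inserted during the scan are picked up as long as their
        priority does not exceed the priority currently being processed."""
        for x in range(len(self.buckets) - 1, 0, -1):
            bucket = self.buckets[x]
            while bucket:
                # Smallest type number first; the sort is stable.
                bucket.sort(key=lambda event: event[0])
                kind, payload = bucket.pop(0)
                yield x, kind, payload
```

Because `drain` is a generator, the code handling an event may call `insert` before the scan resumes, which matches the way events of type 1 schedule their successors at smaller priorities.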
Since the priorities of the events considered are monotonically decreasing, it is sufficient to consider the entries of EQ in a single scan starting at EQ[$|S|$]. The events are processed in order of priority, and events with the same priority are processed in the order of the event types listed above. Events of the same type and with the same priority are processed in arbitrary order. In the following we only look at one edge at a time when events of type 1, 2, and 3 are taken care of. Due to space limitations many algorithmic details are left out in the following. See [5] for a detailed description of the algorithm.

1. Excess change. The excess changes for all chunks at the same time, namely when $|\alpha| = i \cdot p(\alpha)$ for i = 2, 3, 4, .... For each chunk in CHUNKLIST(α) we will remove the chunk from D(α), recompute the excess and nominal contribution based on the number of occurrences in the chunk, update NCS(α), reinsert the chunk with the new excess and finally update ZS(α). This is done as follows: First decide which chain each chunk belongs to by searching the tree. Remove each chunk from its chain by splitting the tree for the chain. Recompute the

excess for each chunk and reconstruct the tree. In the new tree the chain structure may have changed. Chunks for which the excess increases to two will be separate chains, while chunks where the excess becomes less than two may join two or three chains into one chain. NCS(α) and ZS(α) are always kept updated during the processing of the event. If $|\alpha| = 2p(\alpha)$ then insert an event of type 2 with priority $2p(\alpha)$ into EQ, with a pointer to any leaf in D(α). If $|\alpha| = i \cdot p(\alpha) > 2p(\alpha)$, then insert an event of type 1 with priority $(i-1)p(\alpha)$ into EQ, with a pointer to any leaf in D(α).

2. Chunks become necklaces. When $|\alpha|$ decreases to 2p all chunks become necklaces at the same time. At this point all chunks and necklaces that overlap shall be joined into one necklace. Note that all chunks have excess 0 or 1 when $|\alpha| = 2p$ and since we first recompute the excess all overlapping chunks and necklaces are in the same chain. Hence, what we have to do is to join all chunks and necklaces from left to right, in each chain. This is done by first deciding for each chunk which chain it belongs to. Next, for each chain containing at least one chunk, join all chunks and necklaces from left to right. Update NCS(α) and ZS(α).

3. Necklace and chain break-up. When two consecutive occurrences of α with indices $e_1$ and $e_2$ cease to overlap this may cause a necklace or a chain to break up into two necklaces or chains. If $e_1$ and $e_2$ belong to the same chain then the chain breaks up into two chains. If $e_1$ and $e_2$ belong to the same necklace then split both the necklace and the chain between $e_1$ and $e_2$. If $e_1$ and $e_2$ belong to different necklaces or chunks in the chain then split the chain between the two subtrees including $e_1$ and $e_2$ respectively. Update NCS(α) and ZS(α).

4. Merging at internal nodes. Let α be a substring such that the locus of α is a node v in the suffix tree. Then the leaf-list, LL(v), for v is the union of the leaf-lists for the subtrees below v, hence at the nodes in the suffix tree the data structures for the subtrees should be merged into one. We assume that the edges below v are processed for α as described above.
Let $T_1, \ldots, T_t$ be the subtrees below v in the suffix tree. We never merge more than two data structures at a time. If there are more than two subtrees the merging is done in the following order: $T = \mathrm{Merge}(T, T_i)$, for i = 2, ..., t, where $T = T_1$ to start with. This can also be viewed as if the suffix tree is made binary by replacing all nodes of degree larger than 2 by a binary tree with edges without labels. From now on we will describe how to merge the data structures for two subtrees. The merging will be done by inserting all indices from the smaller of the two leaf-lists into the data structure for the larger one. Let T denote the 3-level search tree to insert new indices in and denote by $e_1, \ldots, e_m$ the indices to insert, where $e_i < e_{i+1}$. The insertion is done by first splitting the tree T at all positions $e_i$ for i = 1, ..., m. The tree is then reconstructed from left to right at the same time as the new indices are inserted in increasing order. Assume that the tree is

reconstructed for all indices, in both trees, smaller than $e_i$. The next step is to insert $e_i$ and all indices between $e_i$ and $e_{i+1}$. This is done as follows: Check if the occurrence with index $e_i$ overlaps any occurrences to the left, i.e. an occurrence in the tree reconstructed so far. Insert $e_i$ into the tree. If $e_i$ overlaps with an occurrence already in the tree then check in what way this affects the chain, chunk, and necklace structure and do the appropriate updates. Do the corresponding check and updates when the tree to the right of $e_i$ (the tree for indices between $e_i$ and $e_{i+1}$) is incorporated, i.e. check if $e_i$ will cause any further changes in the chain, chunk, and necklace structure due to overlaps to the right. Update NCS(α) and ZS(α). Every time, during the above described procedure, when two overlapping occurrences with indices $e_i$ and $e_j$, $e_i < e_j$, from different subtrees are encountered the event $(e_i, e_j)$ with priority $e_j - e_i$ is inserted into the event queue EQ and the previous event, if any, with a pointer to $e_i$ is removed from EQ. Update p(α) to $e_j - e_i$ if this is smaller than the current p(α) value. If $|\alpha| > 2p(\alpha)$ then insert an event of type 1 with priority $\lfloor |\alpha|/p(\alpha) \rfloor \, p(\alpha)$ into EQ, with a pointer to any leaf in D(α).

6 Analysis

Theorem 3. The minimal augmented suffix tree, MAST(S), for a string S of length n can be constructed in time O(n log n) and space O(n).

In the full version of the paper [5] we show that the running time of the algorithm in Section 5 is O(n log n). Here we only state the main steps of the proof. The proof uses an amortization argument, allowing each edge to be processed in amortized constant time, and each binary merge at a node (in the binary version) of ST(S) of two leaf-lists of sizes $n_1$ and $n_2$, with $n_1 \ge n_2$, in amortized time $O(n_2 \log \frac{n_1+n_2}{n_2})$. From Lemma 1 it then follows that the total time for processing the internal nodes and edges of ST(S) is O(n log n). Using Theorems 1 and 2 we can prove that:

Processing events of types 1 and 2 takes time $O(m \log \frac{|LL(v)|}{m})$, where $m = |\mathrm{CHUNKLIST}(\alpha)|$.
Processing an event of type 3 takes time $O(\log |c|)$, where c is the chain being split.
An event of type 4 has processing time $O(n_1 \log \frac{n_1+n_2}{n_1})$.

Let v be a node in the suffix tree and let α be a string with locus v or locus on the edge immediately above v. For the data structure D(α) we define a potential Φ(D(α)). Let C be the set of chains stored in D(α), and for a chain c let $|c|$ denote the number of occurrences of α in c. We define the potential of D(α) by $\Phi(D(\alpha)) = \Phi_1(\alpha) + \Phi_2(\alpha) + \sum_{c \in C} \Phi_3(c)$, where the rôle of $\Phi_1$, $\Phi_2$, and $\Phi_3$ is to account for the potential required to be able to process events of type 1, 2, and 3 respectively. For a chunk, with leftmost occurrence of α at position i, consider the substring S[i..j] with maximal j and S[i..j] having period p, where p = p(α) is the period of α. We denote the chunk green if and only if $|\alpha| \bmod p \le (j - i + 1) \bmod p$. Otherwise the chunk is red. Let k denote the number of chunks in D(α) and let g denote the number of green chunks in D(α).

We define $\Phi_1(\alpha) = 7g \log \frac{|v| e}{g}$, $\Phi_2(\alpha) = k \log \frac{|v| e}{k}$, and $\Phi_3(c) = 2|c| \log \frac{|c|}{2}$, with the exceptions that $\Phi_1(\alpha) = 0$ if g = 0, and $\Phi_2(\alpha) = 0$ if k = 0. We can prove that processing events of type 1, 2, and 3 releases sufficient potential to pay for the processing, while processing an event of type 4 increases the potential by $O(n_1 \log \frac{n_1+n_2}{n_1})$. By Lemma 1 the total amortized time for handling all events is O(n log n).

References

1. A. Apostolico and A. Ehrenfeucht. Efficient detection of quasiperiodicities in strings. Theoretical Computer Science, 119.
2. A. Apostolico and F. P. Preparata. Optimal off-line detection of repetitions in a string. Theoretical Computer Science, 22.
3. A. Apostolico and F. P. Preparata. Data structures and algorithms for the string statistics problem. Algorithmica, 15.
4. G. S. Brodal, R. Lyngsø, C. N. S. Pedersen, and J. Stoye. Finding maximal pairs with bounded gap. Journal of Discrete Algorithms, Special Issue of Matching Patterns, 1(1):77-104.
5. G. S. Brodal, R. B. Lyngsø, A. Östlin, and C. N. S. Pedersen. Solving the string statistics problem in time O(n log n). Technical Report RS-02-13, BRICS, Department of Computer Science, University of Aarhus.
6. G. S. Brodal and C. N. S. Pedersen. Finding maximal quasiperiodicities in strings. In Proc. 11th Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science. Springer Verlag, Berlin.
7. M. R. Brown and R. E. Tarjan. A fast merging algorithm. Journal of the ACM, 26(2).
8. M. Farach. Optimal suffix tree construction with large alphabets. In Proc. 38th Ann. Symp. on Foundations of Computer Science (FOCS).
9. A. S. Fraenkel and J. Simpson. How many squares can a string contain? Journal of Combinatorial Theory, Series A, 82(1).
10. D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press.
11. K. Hoffmann, K. Mehlhorn, P. Rosenstiehl, and R. E. Tarjan. Sorting Jordan sequences in linear time using level-linked search trees. Information and Control, 86(1-3).
12. S. Huddleston and K. Mehlhorn. A new data structure for representing sorted lists. Acta Informatica, 17.
13. F. K. Hwang and S. Lin. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal of Computing, 1(1):31-39.
14. D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal of Computing, 6.
15. E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2).
16. K. Mehlhorn. Sorting and Searching, volume 1 of Data Structures and Algorithms. Springer Verlag, Berlin.
17. J. Stoye and D. Gusfield. Simple and flexible detection of contiguous repeats using a suffix tree. Theoretical Computer Science, 270.
18. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14.
19. P. Weiner. Linear pattern matching algorithms. In Proc. 14th Symposium on Switching and Automata Theory, pages 1-11, 1973.


More information

Balanced binary search trees

Balanced binary search trees 02110 Inge Li Gørtz Overview Blnced binry serch trees: Red-blck trees nd 2-3-4 trees Amortized nlysis Dynmic progrmming Network flows String mtching String indexing Computtionl geometry Introduction to

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

On Suffix Tree Breadth

On Suffix Tree Breadth On Suffix Tree Bredth Golnz Bdkoeh 1,, Juh Kärkkäinen 2, Simon J. Puglisi 2,, nd Bell Zhukov 2, 1 Deprtment of Computer Science University of Wrwick Conventry, United Kingdom g.dkoeh@wrwick.c.uk 2 Helsinki

More information

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /j.jda

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /j.jda Strikovsky, T. A., & Vildhøj, H. W. (2015). A suffix tree or not suffix tree? Journl of Discrete Algorithms, 32, 14-23. DOI: 10.1016/j.jd.2015.01.005 Peer reviewed version Link to pulished version (if

More information

The size of subsequence automaton

The size of subsequence automaton Theoreticl Computer Science 4 (005) 79 84 www.elsevier.com/locte/tcs Note The size of susequence utomton Zdeněk Troníček,, Ayumi Shinohr,c Deprtment of Computer Science nd Engineering, FEE CTU in Prgue,

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

Dynamic Fully-Compressed Suffix Trees

Dynamic Fully-Compressed Suffix Trees Motivtion Dynmic FCST s Conclusions Dynmic Fully-Compressed Suffix Trees Luís M. S. Russo Gonzlo Nvrro Arlindo L. Oliveir INESC-ID/IST {lsr,ml}@lgos.inesc-id.pt Dept. of Computer Science, University of

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 utomt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Prolem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) nton Setzer (Bsed on ook drft y J. V. Tucker nd K. Stephenson)

More information

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute Victor Admchik Dnny Sletor Gret Theoreticl Ides In Computer Science CS 5-25 Spring 2 Lecture 2 Mr 3, 2 Crnegie Mellon University Deterministic Finite Automt Finite Automt A mchine so simple tht you cn

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

3 Regular expressions

3 Regular expressions 3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll

More information

First Midterm Examination

First Midterm Examination 24-25 Fll Semester First Midterm Exmintion ) Give the stte digrm of DFA tht recognizes the lnguge A over lphet Σ = {, } where A = {w w contins or } 2) The following DFA recognizes the lnguge B over lphet

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

Converting Regular Expressions to Discrete Finite Automata: A Tutorial

Converting Regular Expressions to Discrete Finite Automata: A Tutorial Converting Regulr Expressions to Discrete Finite Automt: A Tutoril Dvid Christinsen 2013-01-03 This is tutoril on how to convert regulr expressions to nondeterministic finite utomt (NFA) nd how to convert

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Chapter 2 Finite Automata

Chapter 2 Finite Automata Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht

More information

Where did dynamic programming come from?

Where did dynamic programming come from? Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/1526-5463-2002-50-01-0048.pdf

More information

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018 CS 301 Lecture 04 Regulr Expressions Stephen Checkowy Jnury 29, 2018 1 / 35 Review from lst time NFA N = (Q, Σ, δ, q 0, F ) where δ Q Σ P (Q) mps stte nd n lphet symol (or ) to set of sttes We run n NFA

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

Homework Solution - Set 5 Due: Friday 10/03/08

Homework Solution - Set 5 Due: Friday 10/03/08 CE 96 Introduction to the Theory of Computtion ll 2008 Homework olution - et 5 Due: ridy 10/0/08 1. Textook, Pge 86, Exercise 1.21. () 1 2 Add new strt stte nd finl stte. Mke originl finl stte non-finl.

More information

CSCI 340: Computational Models. Transition Graphs. Department of Computer Science

CSCI 340: Computational Models. Transition Graphs. Department of Computer Science CSCI 340: Computtionl Models Trnsition Grphs Chpter 6 Deprtment of Computer Science Relxing Restrints on Inputs We cn uild n FA tht ccepts only the word! 5 sttes ecuse n FA cn only process one letter t

More information

CHAPTER 1 Regular Languages. Contents

CHAPTER 1 Regular Languages. Contents Finite Automt (FA or DFA) CHAPTE 1 egulr Lnguges Contents definitions, exmples, designing, regulr opertions Non-deterministic Finite Automt (NFA) definitions, euivlence of NFAs nd DFAs, closure under regulr

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms Preview Greed Algorithms Greed Algorithms Coin Chnge Huffmn Code Greed lgorithms end to e simple nd strightforwrd. Are often used to solve optimiztion prolems. Alws mke the choice tht looks est t the moment,

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016 CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

The Knapsack Problem. COSC 3101A - Design and Analysis of Algorithms 9. Fractional Knapsack Problem. Fractional Knapsack Problem

The Knapsack Problem. COSC 3101A - Design and Analysis of Algorithms 9. Fractional Knapsack Problem. Fractional Knapsack Problem The Knpsck Prolem COSC A - Design nd Anlsis of Algorithms Knpsck Prolem Huffmn Codes Introduction to Grphs Mn of these slides re tken from Monic Nicolescu, Univ. of Nevd, Reno, monic@cs.unr.edu The - knpsck

More information

1.4 Nonregular Languages

1.4 Nonregular Languages 74 1.4 Nonregulr Lnguges The number of forml lnguges over ny lphbet (= decision/recognition problems) is uncountble On the other hnd, the number of regulr expressions (= strings) is countble Hence, ll

More information

1 From NFA to regular expression

1 From NFA to regular expression Note 1: How to convert DFA/NFA to regulr expression Version: 1.0 S/EE 374, Fll 2017 Septemer 11, 2017 In this note, we show tht ny DFA cn e converted into regulr expression. Our construction would work

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont.

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont. NFA DFA Exmple 3 CMSC 330: Orgniztion of Progrmming Lnguges NFA {B,D,E {A,E {C,D {E Finite Automt, con't. R = { {A,E, {B,D,E, {C,D, {E 2 Equivlence of DFAs nd NFAs Any string from {A to either {D or {CD

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

Name Ima Sample ASU ID

Name Ima Sample ASU ID Nme Im Smple ASU ID 2468024680 CSE 355 Test 1, Fll 2016 30 Septemer 2016, 8:35-9:25.m., LSA 191 Regrding of Midterms If you elieve tht your grde hs not een dded up correctly, return the entire pper to

More information

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages 5//6 Grmmr Automt nd Lnguges Regulr Grmmr Context-free Grmmr Context-sensitive Grmmr Prof. Mohmed Hmd Softwre Engineering L. The University of Aizu Jpn Regulr Lnguges Context Free Lnguges Context Sensitive

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

Regular expressions, Finite Automata, transition graphs are all the same!!

Regular expressions, Finite Automata, transition graphs are all the same!! CSI 3104 /Winter 2011: Introduction to Forml Lnguges Chpter 7: Kleene s Theorem Chpter 7: Kleene s Theorem Regulr expressions, Finite Automt, trnsition grphs re ll the sme!! Dr. Neji Zgui CSI3104-W11 1

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Looking for All Palindromes in a String

Looking for All Palindromes in a String Looking or All Plindromes in String Shih Jng Pn nd R C T Lee Deprtment o Computer Science nd Inormtion Engineering, Ntionl Chi-Nn University, Puli, Nntou Hsien,, Tiwn, ROC sjpn@lgdoccsiencnuedutw, rctlee@ncnuedutw

More information

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9.

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9. Regulr Expressions, Pumping Lemm, Right Liner Grmmrs Ling 106 Mrch 25, 2002 1 Regulr Expressions A regulr expression descries or genertes lnguge: it is kind of shorthnd for listing the memers of lnguge.

More information

Torsion in Groups of Integral Triangles

Torsion in Groups of Integral Triangles Advnces in Pure Mthemtics, 01,, 116-10 http://dxdoiorg/1046/pm011015 Pulished Online Jnury 01 (http://wwwscirporg/journl/pm) Torsion in Groups of Integrl Tringles Will Murry Deprtment of Mthemtics nd Sttistics,

More information

Homework 4. 0 ε 0. (00) ε 0 ε 0 (00) (11) CS 341: Foundations of Computer Science II Prof. Marvin Nakayama

Homework 4. 0 ε 0. (00) ε 0 ε 0 (00) (11) CS 341: Foundations of Computer Science II Prof. Marvin Nakayama CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 4 1. UsetheproceduredescriedinLemm1.55toconverttheregulrexpression(((00) (11)) 01) into n NFA. Answer: 0 0 1 1 00 0 0 11 1 1 01 0 1 (00)

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014 CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

Section 4: Integration ECO4112F 2011

Section 4: Integration ECO4112F 2011 Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic

More information

Bridging the gap: GCSE AS Level

Bridging the gap: GCSE AS Level Bridging the gp: GCSE AS Level CONTENTS Chpter Removing rckets pge Chpter Liner equtions Chpter Simultneous equtions 8 Chpter Fctors 0 Chpter Chnge the suject of the formul Chpter 6 Solving qudrtic equtions

More information

Lecture 2: January 27

Lecture 2: January 27 CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full

More information

Some Theory of Computation Exercises Week 1

Some Theory of Computation Exercises Week 1 Some Theory of Computtion Exercises Week 1 Section 1 Deterministic Finite Automt Question 1.3 d d d d u q 1 q 2 q 3 q 4 q 5 d u u u u Question 1.4 Prt c - {w w hs even s nd one or two s} First we sk whether

More information

1.3 Regular Expressions

1.3 Regular Expressions 56 1.3 Regulr xpressions These hve n importnt role in describing ptterns in serching for strings in mny pplictions (e.g. wk, grep, Perl,...) All regulr expressions of lphbet re 1.Ønd re regulr expressions,

More information

State Minimization for DFAs

State Minimization for DFAs Stte Minimiztion for DFAs Red K & S 2.7 Do Homework 10. Consider: Stte Minimiztion 4 5 Is this miniml mchine? Step (1): Get rid of unrechle sttes. Stte Minimiztion 6, Stte is unrechle. Step (2): Get rid

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

DFA minimisation using the Myhill-Nerode theorem

DFA minimisation using the Myhill-Nerode theorem DFA minimistion using the Myhill-Nerode theorem Johnn Högerg Lrs Lrsson Astrct The Myhill-Nerode theorem is n importnt chrcteristion of regulr lnguges, nd it lso hs mny prcticl implictions. In this chpter,

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Lecture 08: Feb. 08, 2019

Lecture 08: Feb. 08, 2019 4CS4-6:Theory of Computtion(Closure on Reg. Lngs., regex to NDFA, DFA to regex) Prof. K.R. Chowdhry Lecture 08: Fe. 08, 2019 : Professor of CS Disclimer: These notes hve not een sujected to the usul scrutiny

More information

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-*

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-* Regulr Expressions (RE) Regulr Expressions (RE) Empty set F A RE denotes the empty set Opertion Nottion Lnguge UNIX Empty string A RE denotes the set {} Alterntion R +r L(r ) L(r ) r r Symol Alterntion

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk out solving systems of liner equtions. These re prolems tht give couple of equtions with couple of unknowns, like: 6= x + x 7=

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations. Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

Fingerprint idea. Assume:

Fingerprint idea. Assume: Fingerprint ide Assume: We cn compute fingerprint f(p) of P in O(m) time. If f(p) f(t[s.. s+m 1]), then P T[s.. s+m 1] We cn compre fingerprints in O(1) We cn compute f = f(t[s+1.. s+m]) from f(t[s.. s+m

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Lecture 9: LTL and Büchi Automata

Lecture 9: LTL and Büchi Automata Lecture 9: LTL nd Büchi Automt 1 LTL Property Ptterns Quite often the requirements of system follow some simple ptterns. Sometimes we wnt to specify tht property should only hold in certin context, clled

More information

Riemann Sums and Riemann Integrals

Riemann Sums and Riemann Integrals Riemnn Sums nd Riemnn Integrls Jmes K. Peterson Deprtment of Biologicl Sciences nd Deprtment of Mthemticl Sciences Clemson University August 26, 203 Outline Riemnn Sums Riemnn Integrls Properties Abstrct

More information

Quadratic Forms. Quadratic Forms

Quadratic Forms. Quadratic Forms Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte

More information

Things to Memorize: A Partial List. January 27, 2017

Things to Memorize: A Partial List. January 27, 2017 Things to Memorize: A Prtil List Jnury 27, 2017 Chpter 2 Vectors - Bsic Fcts A vector hs mgnitude (lso clled size/length/norm) nd direction. It does not hve fixed position, so the sme vector cn e moved

More information

4. GREEDY ALGORITHMS I

4. GREEDY ALGORITHMS I 4. GREEDY ALGORITHMS I coin chnging intervl scheduling scheduling to minimize lteness optiml cching Lecture slides by Kevin Wyne Copyright 2005 Person-Addison Wesley http://www.cs.princeton.edu/~wyne/kleinberg-trdos

More information

Lecture 3: Equivalence Relations

Lecture 3: Equivalence Relations Mthcmp Crsh Course Instructor: Pdric Brtlett Lecture 3: Equivlence Reltions Week 1 Mthcmp 2014 In our lst three tlks of this clss, we shift the focus of our tlks from proof techniques to proof concepts

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information
