2. Udi Manber, Gene Myers: Suffix arrays: A new method for online string searching, SIAM Journal on Computing 22:935-48,1993


7 Suffix arrays

This exposition was developed by Clemens Gröpl and Knut Reinert. It is based on the following sources, which are all recommended reading:

1. Simon J. Puglisi, W. F. Smyth, and Andrew Turpin: A taxonomy of suffix array construction algorithms, ACM Computing Surveys, Vol. 39, Issue 2, to appear (2007). [PST07]
2. Udi Manber, Gene Myers: Suffix arrays: A new method for online string searching, SIAM Journal on Computing 22:935-48, 1993
3. Kasai, Lee, Arimura, Arikawa, Park: Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, CPM 2001
4. Mohamed Ibrahim Abouelhoda, Stefan Kurtz, Enno Ohlebusch: Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms 2 (2004)
5. Dan Gusfield: Algorithms on strings, trees and sequences, Cambridge, pages 94ff.

7.1 Introduction

Exact string matching is a basic step used by many algorithms in computational biology: Given a pattern P = P[1..m] and a text S = S[1..n], we want to find all occurrences of P in S. This can readily be done with exact string matching algorithms in time O(m + n). These algorithms perform some kind of preprocessing of the pattern. In this way it is often possible to exclude portions of the text from consideration (e.g. the Horspool algorithm can shift the search window by m positions if verification fails). But as long as m = O(1), the running time for this class of algorithms cannot be o(n).

In order to achieve a truly sublinear search time, we have to preprocess the text. Preprocessing the text is useful in scenarios where the text is relatively constant over time (e.g. a genome), and we will search for many different patterns. Even if the text is very long, we do not need to scan it completely for every query. The running time can be as low as O(m + p), where p is the number of occurrences. Here we will see algorithms that achieve a search time of O(m + p + log n). In practice, the extra log n term is counterbalanced by good caching behavior.

In this lecture we introduce one such preprocessing, namely the construction of a suffix array. Suffix arrays are closely related to suffix trees.
A good reference for suffix trees is the book of Gusfield. In 1990, Manber and Myers introduced suffix arrays as a space efficient alternative to suffix trees. Both suffix trees and suffix arrays require O(n) space, but a recent, tuned suffix tree implementation needs considerably more bytes per character (Kurtz, 1999), whereas for suffix arrays as few as 5 + o(1) bytes per character are sufficient (with some tricks).

Definition 1. Given a text S of length n, the suffix array for S, often denoted suftab, is an array of integers of range 1 to n specifying the lexicographic ordering of the suffixes of the string S.

It will be convenient to assume that S[n] = $, where $ is smaller than any other letter. That is, suftab[j] = i if and only if S[i..n] is the j-th suffix of S in ascending lexicographical order. We will write S_i := S[i..n].

We will assume that n fits into 4 bytes of memory. (That is, n < 2^32 = 4294967296.) Then the basic form of a suffix array needs only 4n bytes. The suffix array can be computed by sorting the suffixes, as illustrated in the following example.
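Definition 1 can be turned into a few lines of code almost literally. The sketch below is only an illustration of the definition, not an efficient construction: it sorts the start positions 1..n by the suffix that begins there. The function name `naive_suffix_array` is made up for this example, and it relies on `$` having a smaller character code than the letters, which holds for ASCII.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the 1-based suffix array of text by sorting all suffixes."""
    n = len(text)
    # Sort the start positions 1..n by the suffix S_i = S[i..n] beginning there.
    # Each comparison may inspect up to n symbols, so this is far from linear time.
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


# '$' (ASCII 36) sorts before the letters, as the definition assumes.
print(naive_suffix_array("banana$"))
```

Slicing out every suffix costs quadratic space in the worst case; a space-conscious version would compare suffixes in place, which is what the 4n-byte bound refers to.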

Suffix Arrays, by Clemens Gröpl, Knut Reinert, May 9, 2011, 13:45

7.2 Example

The text is S = abaababbabbb$, n = 13. The suffix array is:

  Suffixes                Ordered suffixes
   i  S_i                  i  suftab[i]  S_suftab[i]
   1  abaababbabbb$        1  13         $
   2  baababbabbb$         2  3          aababbabbb$
   3  aababbabbb$          3  1          abaababbabbb$
   4  ababbabbb$           4  4          ababbabbb$
   5  babbabbb$            5  6          abbabbb$
   6  abbabbb$             6  9          abbb$
   7  bbabbb$              7  12         b$
   8  babbb$               8  2          baababbabbb$
   9  abbb$                9  5          babbabbb$
  10  bbb$                10  8          babbb$
  11  bb$                 11  11         bb$
  12  b$                  12  7          bbabbb$
  13  $                   13  10         bbb$

It is tempting to confuse suftab[i] with S_suftab[i] since there is a one-to-one correspondence, but of course the two are completely different concepts.

Compare this to the suffix tree, which is obtained by merging common prefixes of the suffixes S_i in a trie. (Note: The string in the figure has no trailing $. Some suffixes are not numbered; their paths lead to internal nodes.)

[Figure: suffix tree of S]

7.3 Why another algorithm?

The suffix array can be constructed in (essentially) 4n space by sorting the suffix indices using any sorting algorithm. (Exercise: How much would a simple quicksort cost?) But such an approach fails to take advantage of the fact that we are sorting a collection of related suffixes. We cannot get an O(n) time algorithm in this way.

Alternatively, we could first build a suffix tree in linear time, then transform the suffix tree into a suffix array in linear time (exercise: work out the details), and finally discard the suffix tree. Of course, sufficient memory has to be available to construct the suffix tree. Thus this approach fails for large texts.

Over the last 15 years or so, there have been hundreds of research articles published on the construction

and application of suffix trees and suffix arrays. A recent survey on suffix array construction algorithms is [PST07]. In the introduction, Puglisi, Smyth, and Turpin write:

  "It has been shown that practical space-efficient suffix array construction algorithms (SACAs) exist that require worst-case time linear in string length; SACAs exist that are even faster in practice, though with supralinear worst-case construction time requirements; any problem whose solution can be computed using suffix trees is solvable with the same asymptotic complexity using suffix arrays. Thus suffix arrays have become the data structure of choice for many, if not all, of the string processing problems to which suffix tree methodology is applicable." [PST07]

In [PST07] running times for 17 SACAs are listed. Today the original construction proposed by Manber and Myers is about 30 times slower than the fastest SACA known so far. The race is not finished yet; new algorithms and implementations are being developed, and it is hard to predict where this will eventually lead. Therefore we will not discuss SACAs in full detail in this lecture but only mention a few basic ideas. One such idea is prefix doubling. It is the foundation of the original MM algorithm (1990). A modified version by Larsson and Sadakane (1999) is only a factor of 3 slower than the currently best one.

7.4 Prefix doubling

In order to construct the suffix array we have to cleverly sort the n suffixes S_1, ..., S_n. A prefix-doubling algorithm will not sort the suffixes completely in a single stage. Instead, it proceeds in ⌈log_2(n + 1)⌉ stages. In the first stage the suffixes are arranged into groups or buckets according to their first symbol. Thus they are ordered with respect to their prefixes of length 1. We say that the suffixes are in ≤_h-order if they are ordered lexicographically according to the first h letters (=_h and <_h are defined accordingly).
Inductively, the algorithm partitions the buckets of the preceding stage (≤_h) further by sorting according to twice the number of symbols (≤_2h). We will number the stages 1, 2, 4, 8, ... to indicate the number of affected symbols. After the h-th stage, the suffixes are sorted according to ≤_h order, and all suffixes in a bucket have a common prefix of length h.
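The staging idea can be sketched compactly if each stage is allowed to use a general-purpose sort. The code below is such a sketch, not the Manber-Myers bucket procedure: it orders suffix i by the pair (rank of S_i, rank of S_{i+h}) taken from the previous stage, which sorts by the first 2h symbols. Because it calls a comparison sort in every stage, it runs in O(n log^2 n) rather than O(n log n). Positions are 0-based, and the function name is hypothetical.

```python
def suffix_array_prefix_doubling(text: str) -> list[int]:
    """0-based suffix array via prefix doubling (sort-based sketch)."""
    n = len(text)
    if n == 0:
        return []
    # Stage h = 1: the rank of a suffix is given by its first symbol.
    rank = [ord(c) for c in text]
    sa = list(range(n))
    h = 1
    while True:
        # Key for the 2h-order: (h-rank of S_i, h-rank of S_{i+h}).
        # A suffix S_{i+h} that runs past the end counts as smallest (-1).
        def key(i: int) -> tuple[int, int]:
            return (rank[i], rank[i + h] if i + h < n else -1)

        sa.sort(key=key)
        # Renumber the buckets: suffixes with equal keys share one rank.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        # Stop once every bucket is a singleton (or all symbols are covered).
        if rank[sa[-1]] == n - 1 or h >= n:
            return sa
        h *= 2


print(suffix_array_prefix_doubling("banana"))
```

The `rank` list plays the role of the inverse table: it gives the position of a suffix in the order of the previous stage, up to ties inside a bucket.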

We are done when h ≥ n. Each stage takes O(n) time. Thus the total running time is O(n log n).

The key observation is: In order to refine the ordering of an h-bucket to ≤_2h-order, it suffices to look at the h positions following the (common) prefix of length h. These positions are the prefixes of other suffixes and have been ≤_h-sorted already. This technique has become known as prefix doubling. Let us summarize this idea:

Observation 2 (Karp, Miller, Rosenberg (1972)). Let S_i and S_j be two suffixes belonging to the same bucket after the h-th step, that is S_i =_h S_j. We need to compare the next h symbols. But the next h symbols of S_i (respectively, S_j) are exactly the first h symbols of S_{i+h} (respectively, S_{j+h}). By assumption we already know the relative order of S_{i+h} and S_{j+h} according to ≤_h.

For this approach to work it is necessary that we can access the ≤_h-rank of a suffix (i.e., its position according to the ≤_h-order). Therefore the inverse of the current suftab table is stored in another table sufinv. These two tables (suftab, sufinv) together amount to the 8n bytes required by the Manber-Myers algorithm.

7.5 Searching

After constructing our suffix array we have the table suftab which gives us in sorted order the suffixes of S. Suppose now we want to find all instances of a string P = p_1, ..., p_m of length m < n in S. Let

  L_P = min{k : P ≤_m S_suftab[k] or k = n + 1}

and

  R_P = max{k : S_suftab[k] ≤_m P or k = 0}.

Since suftab is in ≤_m order, it follows that P matches a suffix S_i if and only if i = suftab[k] for some k ∈ [L_P, R_P]. Hence a simple binary search can find L_P and R_P. Each comparison in the search needs O(m) character comparisons, and hence we can find all instances in the string in time O(m log n). This is the simple piece of code to search for L_P.
  if P ≤_m S_suftab[1]
    then L_P = 1;
    else if P >_m S_suftab[n]
      then L_P = n + 1;
      else
        (L, R) = (1, n);
        while R − L > 1 do
          M = ⌈(L + R)/2⌉;
          if P ≤_m S_suftab[M]
            then R = M;
            else L = M;
          fi
        od
        L_P = R;
    fi
  fi

For example, if we search for P = aca in the text S = acaaacatat$ then L_P = 3 and R_P = 4. We find the values L_P and R_P, respectively, by setting (L, R) to (1, n) and changing the borders of this interval based on the comparison with the suffix at position ⌈(L + R)/2⌉; e.g. we find L_P with the sequence (1, 11), (1, 6), (1, 4), (1, 3), (2, 3). Hence L_P = 3.

   k  S_suftab[k]
   1  aaacatat$
   2  aacatat$
   3  acaaacatat$
   4  acatat$
   5  atat$
   6  at$
   7  caaacatat$
   8  catat$
   9  tat$
  10  t$
  11  $

The binary searches each need O(log n) steps. In each step we need to compare up to m characters of the text and the pattern, which takes O(m) operations. This leads to a running time of O(m log n). Can we do better?

While the binary search continues, let L and R denote the left and right boundaries of the current search interval. At the start, L equals 1 and R equals n. Then in each iteration of the binary search a query is made at location M = ⌈(R + L)/2⌉ of suftab. We keep track of the longest prefixes of S_suftab[L] and S_suftab[R] that match a prefix of P. Let l and r denote the prefix lengths respectively and let mlr = min(l, r).

Then we can use the value mlr to accelerate the lexicographical comparison of P and the suffix S_suftab[M]. Since suftab is ordered, it is clear that all suffixes between L and R share the same prefix of length mlr. Hence we can start the first comparison at position mlr + 1. In practice this trick already brings the running time to O(m + log n) in most cases; however, one can construct examples that still need time O(m log n) (exercise).

We call an examination of a character of P redundant if that character has been examined before. The goal is to limit the number of redundant character comparisons to O(1) per iteration of the binary search. The use of mlr alone does not suffice: In the case that l ≠ r, all characters in P from mlr + 1 to max(l, r) will have already been examined. Thus all comparisons to these characters are redundant. We need a way to begin the comparisons at the maximum of l and r. To do this we introduce the following definition.

Definition 3. lcp(i, j) is the length of the longest common prefix of the suffixes specified in positions i and j of suftab.

For example, for S = aaacatat, lcp(1, 2) is the length of the longest common prefix of aaacatat and aacatat, which is 2.

With the help of the lcp information, we can achieve our goal of one redundant character comparison per iteration of the search. For now assume that we know lcp(i, j) for all i, j.
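Before the lcp values come into play, the simple binary search together with the mlr shortcut can be sketched as follows. This is an illustrative version under a few assumptions: positions are 0-based, the result is the half-open range of suftab positions instead of the pair (L_P, R_P), and the helper names (`lcp_from`, `first_index`, `find_occurrences`) are invented for the example. The variables l and r hold the match lengths at the current borders, and every comparison starts at min(l, r).

```python
def lcp_from(text: str, start: int, pattern: str, skip: int) -> int:
    """Common prefix length of text[start:] and pattern; the first `skip` symbols are known to match."""
    k = skip
    while k < len(pattern) and start + k < len(text) and text[start + k] == pattern[k]:
        k += 1
    return k


def first_index(text: str, sa: list[int], pattern: str, include_equal: bool) -> int:
    """Smallest k such that suffix sa[k] is not 'before' the pattern (len(sa) if none).

    include_equal=False: 'before' means suffix <_m pattern   (yields L_P).
    include_equal=True:  'before' means suffix <=_m pattern  (yields R_P + 1).
    """
    m, n = len(pattern), len(text)

    def before(start: int, k: int) -> bool:
        if k == m:               # suffix =_m pattern
            return include_equal
        if start + k == n:       # suffix is a proper prefix of the pattern
            return True
        return text[start + k] < pattern[k]

    if not sa:
        return 0
    l = lcp_from(text, sa[0], pattern, 0)
    if not before(sa[0], l):
        return 0
    r = lcp_from(text, sa[-1], pattern, 0)
    if before(sa[-1], r):
        return len(sa)
    L, R = 0, len(sa) - 1        # invariant: sa[L] is 'before', sa[R] is not
    while R - L > 1:
        M = (L + R) // 2
        # All suffixes between L and R share min(l, r) symbols with the pattern.
        k = lcp_from(text, sa[M], pattern, min(l, r))
        if before(sa[M], k):
            L, l = M, k
        else:
            R, r = M, k
    return R


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Sorted 0-based start positions of pattern in text."""
    lo = first_index(text, sa, pattern, include_equal=False)
    hi = first_index(text, sa, pattern, include_equal=True)
    return sorted(sa[lo:hi])


text = "acaaacatat$"
sa = sorted(range(len(text)), key=lambda i: text[i:])   # naive suffix array
print(find_occurrences(text, sa, "aca"))
```

Note that Python orders `$` before the letters, so the suffix array built here differs from the table above in where the `$`-terminated suffixes fall; the search itself only needs the array to be consistent with the comparison it uses.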
How do we use the lcp information? In the case of l = r we compare P to S_suftab[M] as before, starting from position mlr + 1, since in this case the minimum of l and r is also the maximum of the two and no redundant character comparisons are made. If l ≠ r, there are three cases. We assume w.l.o.g. l > r.

Case 1: lcp(L, M) > l. Then the common prefix of the suffixes S_suftab[L] and S_suftab[M] is longer than the common prefix of P and S_suftab[L]. Therefore, P agrees with the suffix S_suftab[M] up through character l. Or to put it differently, characters l + 1 of S_suftab[L] and S_suftab[M] are identical and lexically less than character l + 1 of P. Hence any possible starting position must start to the right of M in suftab. So in this case no examination of P is needed. L is set to M, and l and r remain unchanged.

Case 2: lcp(L, M) < l. Then the common prefix of the suffixes S_suftab[L] and S_suftab[M] is shorter than the common prefix of S_suftab[L] and P.

Therefore P agrees with S_suftab[M] up through character lcp(L, M). Characters lcp(L, M) + 1 of P and S_suftab[L] are identical and lexically less than character lcp(L, M) + 1 of S_suftab[M]. Hence any possible starting position must start to the left of M in suftab. So in this case again no examination of P is needed. R is set to M, r is changed to lcp(L, M), and l remains unchanged.

Case 3: lcp(L, M) = l. Then P agrees with S_suftab[M] up to character l. The algorithm then lexically compares P to S_suftab[M] starting from position l + 1. In the usual manner the outcome of the comparison determines which of L and R changes, along with the corresponding change of l and r.

Illustration of the three cases (here l = 5 and r = 3):

  case 1) lcp(L, M) > l
    P  =  a b c d e m n
    L ->  a b c d e f g ...
    M ->  a b c d e f g ...
    R ->  a b c w x y z ...

  case 2) lcp(L, M) < l
    P  =  a b c d e m n
    L ->  a b c d e f g ...
    M ->  a b c d g g ...
    R ->  a b c w x y z ...

  case 3) lcp(L, M) = l
    P  =  a b c d e m n
    L ->  a b c d e f g ...
    M ->  a b c d e g ...
    R ->  a b c w x y z ...

Then the following theorem holds:

Theorem 4. Using the lcp values, the search algorithm does at most O(m + log n) comparisons and runs in that time.

Proof: Exercise. Use the fact that neither l nor r decreases in the binary search, and find a bound for the number of redundant comparisons per iteration of the binary search.

7.6 Computing the lcp values

We now know how to search fast in a suffix array under the assumption that we know the lcp values for all pairs i, j. But how do we compute the lcp values? And which ones? Computing them all would require too much time and, worse, quadratic space!

We will now first discuss which lcp values we really need, and then how to compute them. For the computation we give in more detail a newer, simple O(n) algorithm to compute the lcp values given the suffix array suftab. In the appendix we also sketch Myers' proposal for computing the lcp values during the construction of the suffix array.

We first observe that indeed we only need the lcp values of L and R that we encounter in the binary search for L_P and R_P.
However, the set of pairs (i, j) which can be considered is contained in a binary search tree which does not depend on P, and has linear size.

Observation 5. Only O(n) many lcp values are needed for the lcp based search in a suffix array.

Example: n = 9

                        (1,9)
            (1,5)                   (5,9)
      (1,3)       (3,5)       (5,7)       (7,9)
   (1,2) (2,3) (3,4) (4,5) (5,6) (6,7) (7,8) (8,9)

We get those values in a two step procedure:

1. Compute the lcp values for pairs of suffixes adjacent in suftab using an array height of size n.
2. For the fixed binary search tree used in the search for L_P and R_P, compute the lcp values for its internal nodes using the array height. (exercise *)

(*) The value at an internal node is the minimum of its successors (why?)

Hence the essential thing to do is to compute the array height, i.e. the lcp values of adjacent suffixes in suftab.

7.7 The Kasai et al. algorithm

An elegant, short algorithm for computing the height array in linear time is due to Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park (presented at CPM 2001).

The array height is defined by

  height[k] = lcp(S_suftab[k−1], S_suftab[k]).

That is, it contains the lcp values of all adjacent suffixes in the suffix array suftab. We can compute the lcp values contained in the binary search tree in linear time and space, once we have the height values.

The Kasai et al. algorithm uses the inverse of the suffix array, that is, the array sufinv with the defining property sufinv[suftab[i]] = i. Clearly, sufinv can be computed in one linear scan over suftab, if it is not available yet.

It is important to keep the semantics of suftab and sufinv in mind. Perhaps the following diagram is useful:

  sufinv[i] = j :  suffix S_i has rank j in lexicographic order
  suftab[j] = i :  the j-th lowest suffix in lexicographic order is S_i

The algorithm computes the height values of the suffixes S_i in order of decreasing length. Thus the main loop runs over i = 1, ..., n.

Let p := sufinv[i]. The height value for S_i depends on S_i and its predecessor in suftab; we have

  height[p] = lcp(S_suftab[p−1], S_suftab[p]) = lcp(S_k, S_i)   with k := suftab[p − 1].

  i          ...  p−1  p  ...
  suftab[i]  ...  k    i  ...

The algorithm keeps track of the last height value computed, h. Initially, we have h = 0. Then height[p] is computed in the straightforward way:

  while S[i + h] = S[k + h] do
    h++;
  od
  height[sufinv[i]] = h;

From now on, we assume that the last h has been computed correctly.
Now the algorithm proceeds to S_{i+1}.

But in fact height[sufinv[i]] and height[sufinv[i + 1]] are closely related. Namely, if h = height[sufinv[i]] > 0, then

  lcp(S_i, S_suftab[sufinv[i]−1]) = h > 0

and hence

  lcp(S_{i+1}, S_{suftab[sufinv[i]−1]+1}) = h − 1.

Moreover,

  S_i >_lex S_suftab[sufinv[i]−1]   implies   S_{i+1} >_lex S_{suftab[sufinv[i]−1]+1},

because the first letters were the same.

Now, how does this relate to height[sufinv[i + 1]]? Let p' := sufinv[i + 1]. By the preceding observation, we have found a position q' < p' such that lcp(S_suftab[q'], S_suftab[p']) ≥ h − 1, namely q' := sufinv[suftab[sufinv[i] − 1] + 1]. But we cannot assert that q' is the immediate predecessor of p'.

  position  suftab
  p − 1     k        k = suftab[sufinv[i] − 1]
  p         i        p = sufinv[i]
  ...       ...
  q'        k + 1    q' = sufinv[suftab[sufinv[i] − 1] + 1]  (maybe q' < p' − 1)
  ...       ...
  p'        i + 1    p' = sufinv[i + 1]

Yet the following observation helps. We have

  h − 1 ≤ lcp(S_suftab[q'], S_suftab[p'])
        = min_{k ∈ [q', p'−1]} lcp(S_suftab[k], S_suftab[k+1])
        ≤ lcp(S_suftab[p'−1], S_suftab[p'])
        = height[p'].

In other words, the next height value to be computed (belonging to S_{i+1}) is at most one less than the preceding one (belonging to S_i).

7.8 The algorithm

The following algorithm computes the array height following the above discussion in time O(n):

  (1)  GetHeight(S, suftab)
  (2)  for i = 1 to n do
  (3)    sufinv[suftab[i]] = i;
  (4)  od
  (5)  h = 0;
  (6)  for i = 1 to n do
  (7)    if sufinv[i] > 1
  (8)      then
  (9)        k = suftab[sufinv[i] − 1];
  (10)       while S[i + h] = S[k + h] do
  (11)         h++;
  (12)       od
  (13)       height[sufinv[i]] = h;
  (14)       if h > 0 then h = h − 1; fi
  (15)   fi
  (16) od
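A direct transcription of GetHeight into Python might look like the sketch below. It uses 0-based positions (so the test `sufinv[i] > 1` becomes `sufinv[i] > 0`) and adds explicit bounds checks to the while loop, which the pseudocode can omit when the text ends in a unique `$`. Storing 0 in `height[0]`, which has no predecessor, is a convention chosen for this example.

```python
def height_array(text: str, suftab: list[int]) -> list[int]:
    """Compute height[k] = lcp of the suffixes at positions k-1 and k of suftab."""
    n = len(text)
    # Inverse suffix array: sufinv[i] is the rank of suffix S_i.
    sufinv = [0] * n
    for rank, start in enumerate(suftab):
        sufinv[start] = rank

    height = [0] * n          # height[0] is unused and left at 0
    h = 0
    for i in range(n):        # suffixes in order of decreasing length
        if sufinv[i] > 0:
            k = suftab[sufinv[i] - 1]     # predecessor of S_i in suftab
            # Extend the match; the first h symbols are already known to agree.
            while i + h < n and k + h < n and text[i + h] == text[k + h]:
                h += 1
            height[sufinv[i]] = h
            if h > 0:
                h -= 1        # the next height is at least h - 1
    return height


text = "acaaacatat$"
suftab = [2, 3, 0, 4, 6, 8, 1, 5, 7, 9, 10]   # 0-based, '$' treated as largest
print(height_array(text, suftab))
```

The function never compares suffixes as a whole, so it works with any suffix array that is given, regardless of whether `$` was ordered first or last when the array was built.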

The above algorithm uses only linear time. In the loop in line 6 we iterate from 1 to n. Inside this loop is a while loop in line 10 that increases the height (i.e. the lcp value of adjacent suffixes). Since the height is at most n and since in line 14 we decrease h by at most 1 per iteration of the main loop, it follows that the while loop can increase h at most 2n times in total.

7.9 Example

The example was prepared using Stefan Kurtz's programs mkvtree and vstree2tex. (Sorry for index shifts!)

The tables suftab and sufinv for the example are as follows; the height column is filled in step by step.

  i  suftab[i]  S_suftab[i]        i  sufinv[i]  S_i
  0  2          aaacatat$          0  2          acaaacatat$
  1  3          aacatat$           1  6          caaacatat$
  2  0          acaaacatat$        2  0          aaacatat$
  3  4          acatat$            3  1          aacatat$
  4  6          atat$              4  3          acatat$
  5  8          at$                5  7          catat$
  6  1          caaacatat$         6  4          atat$
  7  5          catat$             7  8          tat$
  8  7          tat$               8  5          at$

i = 0: sufinv[i] = 2, k = suftab[sufinv[i] − 1] = suftab[1] = 3. We compare S_0 and S_3 and get height[2] = 1.

i = 1: sufinv[i] = 6, k = suftab[sufinv[i] − 1] = suftab[5] = 8. We compare S_1 and S_8 and get height[6] = 0.

i = 2: sufinv[i] = 0. There is no height value in the first row.

i = 3: sufinv[i] = 1, k = suftab[sufinv[i] − 1] = suftab[0] = 2. We compare S_3 and S_2 and get height[1] = 2.

i = 4: sufinv[i] = 3, k = suftab[sufinv[i] − 1] = suftab[2] = 0. We compare S_4 and S_0. We start at h = lcp(S_3, S_2) − 1 = 1. Observe that S_4 > S_0 > S_3 in lex. order. We get height[3] = 3.

i = 5: sufinv[i] = 7. We can skip the first height[3] − 1 = 3 − 1 = 2 letters from comparison.

[...]

The final result is:

   i  suftab[i]  height[i]  S_suftab[i]
   0  2          -/-        aaacatat$
   1  3          2          aacatat$
   2  0          1          acaaacatat$
   3  4          3          acatat$
   4  6          1          atat$
   5  8          2          at$
   6  1          0          caaacatat$
   7  5          2          catat$
   8  7          0          tat$
   9  9          1          t$
  10  10         0          $

Our overall strategy for constructing and searching a suffix array could then be as follows:

1. Construct the suffix array for S in time O(n log n). (Linear time constructions are possible.)
2. Compute the height array (for adjacent positions) in linear time.
3. Precompute the search tree for the binary search and annotate its internal nodes with lcp values in time O(n). (exercise)
4. Support O(log n + m) queries by adapting the searches for L_P and R_P.
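The third step of this strategy can be sketched as a short recursion over the fixed binary search tree: a leaf interval (k−1, k) takes its value straight from height[k], and an internal node takes the minimum of its two successors. The sketch assumes 0-based positions and the split M = (L + R) div 2; the function name and the dictionary keyed by (L, R) are choices made for this illustration, where a real implementation would more likely use two flat arrays.

```python
def annotate_search_tree(height: list[int]) -> dict[tuple[int, int], int]:
    """Map every interval (L, R) of the binary search tree to lcp(L, R).

    height[k] is the lcp of the suffixes at positions k-1 and k (0-based).
    """
    n = len(height)
    lcp: dict[tuple[int, int], int] = {}

    def visit(L: int, R: int) -> int:
        if R - L == 1:
            value = height[R]                      # leaf: adjacent suffixes
        else:
            M = (L + R) // 2
            value = min(visit(L, M), visit(M, R))  # minimum of the two successors
        lcp[(L, R)] = value
        return value

    if n >= 2:
        visit(0, n - 1)
    return lcp


height = [0, 2, 1, 3, 1, 2, 0, 2, 0, 1, 0]   # height array of acaaacatat$
tree = annotate_search_tree(height)
print(tree[(0, 5)], tree[(5, 10)])
```

The tree has n − 1 leaves and n − 2 internal nodes, so the recursion does O(n) work in total, and its depth is only logarithmic in n.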

7.10 Summary

Suffix arrays are a space efficient alternative to suffix trees.
They can be built in time O(n log n).
Simple searches can be conducted in time O(m log n). However, the simple mlr heuristic already performs well in practice (time O(m + log n)).
The lcp values can be computed in linear time, given a suffix array.
Using the lcp values, the search in suffix arrays can be sped up to O(m + log n).

Appendix: Manber-Myers algorithm

The Manber-Myers algorithm stores the result in the table suftab and, in addition, uses another array Bh of boolean values to demarcate the partitioning of the suffix array into buckets. Each bucket initially holds the suffixes with the same first symbol.

The algorithm uses some auxiliary boolean tables. These are stored as higher-order bits in the other tables and thus do not require additional memory allocation. However, the range of feasible n is reduced to 2^31 if this implementation technique is used. The algorithm needs (essentially) 8n bytes and runs in O(n log n) time. [PST07]

Attention: There is an index shift in the following presentation; suftab starts at position 0.

If we look at the example acaaacatat$, we have the following after stage 1 (extra space separates the h-buckets):

   i  Bh[i]  suftab[i]
   0  1      0=acaaacatat$
   1  0      3=aacatat$
   2  0      4=acatat$
   3  0      6=atat$
   4  0      8=at$

   5  1      2=aaacatat$

   6  1      1=caaacatat$
   7  0      5=catat$

   8  1      7=tat$
   9  0      9=t$

  10  1      10=$

The idea is now the following: Let S_i be the first suffix in the first bucket (i.e. suftab[0] = i), and consider S_{i−h}. Since S_i starts with the smallest h-symbol string, S_{i−h} should be the first in its 2h-bucket. Hence we move S_{i−h} to the beginning of its bucket and mark this fact.

Remember this: The algorithm scans the suffixes S_i as they appear in ≤_h-order. For each S_i, it moves S_{i−h} to the next available place in its h-bucket.

Altogether, we maintain three integer arrays suftab, sufinv and count, and two boolean arrays Bh and B2h, all with n + 1 elements. At the start of stage h, suftab[i] contains the start position of the i-th smallest suffix (according to the first h symbols). sufinv[i] is the inverse of suftab, i.e. suftab[sufinv[i]] = i. Bh[i] is 1 iff suftab[i] contains the leftmost suffix of an h-bucket. (The actual implementation of count, Bh, and B2h uses bits and currently unused entries from suftab and sufinv.)

If we look at the example acaaacatat$, we have the following:

   i  Bh[i]  sufinv[i]  suftab[i]
   0  1      0          0=acaaacatat$
   1  0      6          3=aacatat$
   2  0      5          4=acatat$
   3  0      1          6=atat$
   4  0      2          8=at$
   5  1      7          2=aaacatat$
   6  1      3          1=caaacatat$
   7  0      8          5=catat$
   8  1      4          7=tat$
   9  0      9          9=t$
  10  1      10         10=$

In stage 2h we reset sufinv[i] to point to the leftmost cell of the h-bucket containing the i-th suffix, rather than to the suffix's precise place in the bucket. In our example we get:

   i  Bh[i]  sufinv[i]  suftab[i]
   0  1      0          0=acaaacatat$
   1  0      6          3=aacatat$
   2  0      5          4=acatat$
   3  0      0          6=atat$
   4  0      0          8=at$
   5  1      6          2=aaacatat$
   6  1      0          1=caaacatat$
   7  0      8          5=catat$
   8  1      0          7=tat$
   9  0      8          9=t$
  10  1      10         10=$

In each doubling step, suftab is scanned in increasing order, one bucket at a time. Let l and r mark the left and right boundary of the h-bucket currently being scanned. For every l ≤ i ≤ r, we do the following:

1. Let T_i := suftab[i] − h. (If T_i is negative we do nothing.)
2. Increment count[sufinv[T_i]].
3. Set sufinv[T_i] = sufinv[T_i] + count[sufinv[T_i]] − 1. Mark this by setting B2h[sufinv[T_i]] to 1.

Now sufinv[i] is correct with respect to ≤_2h. The old ≤_h-ordering is still available in suftab. The table suftab is updated at the end of the 2h stage.

In the following example, we show the future positions of the suffixes in the field new st (not used by the algorithm). We indicate the current position with *. The auxiliary array count is initialized to 0 for all i.

After the initialization: Nothing happens since 0 − 1 < 0.

[Table: Bh, B2h, count, sufinv, new st and suftab for i = 0..10; current position i = 0 marked with *]

[Table: same columns; current position i = 1 marked with *]

S_2 is moved to the front of its bucket, i.e. it stays where it is: T_1 = 2 and sufinv[2] = 5, hence increment count[5], set sufinv[2] = 5 + count[5] − 1 = 5, and B2h[5] = 1.

[Table: same columns; current position i = 2 marked with *]

S_3 is moved to the front of its bucket, i.e. to position 0: T_2 = 3 and sufinv[3] = 0, hence increment count[0], set sufinv[3] = 0 + count[0] − 1 = 0, and B2h[0] = 1.

[Table: same columns; current position i = 3 marked with *]

S_5 is moved to the front of its bucket, i.e. to position 6: T_3 = 5 and sufinv[5] = 6, hence increment count[6], set sufinv[5] = 6 + count[6] − 1 = 6, and B2h[6] = 1.

[Table: same columns; current position i = 4 marked with *]

S_7 is moved to the front of its bucket, i.e. to position 8: T_4 = 7 and sufinv[7] = 8, hence increment count[8], set sufinv[7] = 8 + count[8] − 1 = 8, and B2h[8] = 1.

Now we have scanned the first bucket. Next we scan the bucket again, find all moved suffixes in all buckets and update the B2h bitvector so that it points only to the leftmost positions of the 2h-buckets. To do that, B2h is set to false in the interval [sufinv[a] + 1, b − 1], where a is every position marked in B2h and

  b = min{ j : j > sufinv[a] and (Bh[j] or not B2h[j]) }.

It is clear that the left border preserves the leftmost bit set. The definition of the right border prevents us from resetting the border of an adjacent 2h-bucket, but ensures the cancelling of all unwanted bits. In our example nothing happens, since all moved suffixes were put at the beginning of a new bucket.

This scan updates the sufinv and B2h tables and makes them consistent with the ≤_2h order. At the end of each stage, after all buckets are scanned, we update the suftab array using the sufinv array: For all i: suftab[sufinv[i]] := i.

The next step shows that indeed the order of S_1 and S_5 is changed. S_5 was investigated during the scan of the first bucket and put to the beginning of its 2h-bucket. Also, the B2h vector changes now in the second scanning step.

[Table: Bh, B2h, count, sufinv, new st and suftab for i = 0..10; current position i = 5 marked with *]

Now S_1 is put at the second position of its bucket: T_5 = 1 and sufinv[1] = 6, hence increment count[6], set sufinv[1] = 6 + count[6] − 1 = 7, and B2h[7] = 1.

[Table: same columns; current position i = 6 marked with *]

Now S_0 is put at the second position of the first bucket: T_6 = 0 and sufinv[0] = 0, hence increment count[0], set sufinv[0] = 0 + count[0] − 1 = 1, and B2h[1] = 1.

[Table: same columns; current position i = 7 marked with *]

Now S_4 is put at the third position of the first bucket: T_7 = 4 and sufinv[4] = 0, hence increment count[0], set sufinv[4] = 0 + count[0] − 1 = 2, and B2h[2] = 1. The bucket is finished. We scan it again to update B2h. B2h[2] is set to 0, since two suffixes with second character c were moved, but B2h should only mark the beginning of the 2h-bucket.

The construction shown can clearly be done in time O(n log n) and O(n) space. In the original paper Myers describes a small modification of the construction phase which leads to an O(n) expected time algorithm at the expense of another n bytes. The idea is to store for all suffixes S_i their prefixes of length T = ⌊log_|Σ| n⌋ as T-digit radix-|Σ| numbers. Then instead of performing the radix sort on the first symbol of the suffixes, we perform it on this array, which can be done in time O(n) since our choice of T guarantees that all integers are less than n. Hence the base case of the sort has been extended from 1 to T.
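The encoding of the length-T prefixes can be illustrated with a small sketch. It assumes that the alphabet is taken to be the set of symbols occurring in the text, that digits follow the character order, and that positions past the end of the text are padded with the smallest digit; none of these details is fixed by the description above, and the function name is hypothetical. T is computed with integer arithmetic to avoid rounding problems in floating-point logarithms.

```python
def prefix_codes(text: str) -> tuple[int, list[int]]:
    """Return T = floor(log_|alphabet| n) and the radix codes of all length-T prefixes."""
    alphabet = sorted(set(text))
    sigma = len(alphabet)
    digit = {c: d for d, c in enumerate(alphabet)}
    n = len(text)

    # Largest T with sigma**T <= n, found without floating point.
    T = 0
    power = 1                      # invariant: power == sigma ** T
    while sigma > 1 and power * sigma <= n:
        power *= sigma
        T += 1

    codes = []
    for i in range(n):
        value = 0
        for j in range(T):
            # Assumption: pad with digit 0 once the suffix is exhausted.
            d = digit[text[i + j]] if i + j < n else 0
            value = value * sigma + d
        codes.append(value)        # 0 <= value < sigma**T <= n
    return T, codes


T, codes = prefix_codes("abaababbabbb$")
print(T, codes)
```

Since every code is smaller than n, the codes can be bucket-sorted in O(n) time, which is the point of choosing T this way. The inner loop recomputes each code from scratch; a rolling update from one position to the next would bring the encoding itself down to O(n) as well.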

It can be shown that in the expected case there is only a constant number of additional rounds that need to be performed.

Appendix: Computing the lcp values along with the MM algorithm

This can be done during the construction of the suffix array, without additional overhead, or alternatively in linear time with a scan over the suffix array.

In Myers' algorithm the computation of the lcp values can be done during the construction of the suffix array without additional time overhead and with an additional n + 1 integers. The key idea is the following. Assume that after stage h of the sort we know the lcp's between suffixes in adjacent buckets (after the first stage, the lcp's between suffixes in adjacent buckets are 0). At stage 2h the buckets are partitioned according to 2h symbols. Thus, the lcp's between suffixes in newly adjacent buckets must be at least h and at most 2h − 1. Furthermore, if S_p and S_q are in the same h-bucket, but in distinct 2h-buckets, then

  lcp(S_p, S_q) = h + lcp(S_{p+h}, S_{q+h})   and   lcp(S_{p+h}, S_{q+h}) < h.

The problem is that we only have the lcp's between suffixes in adjacent buckets, and S_{p+h} and S_{q+h} may not be in adjacent buckets. However, if S_suftab[i] and S_suftab[j] with i < j have an lcp less than h and suftab is in ≤_h order, then their lcp is the minimum of the lcp's of every adjacent pair of suffixes between suftab[i] and suftab[j]. That is,

  lcp(S_suftab[i], S_suftab[j]) = min_{k ∈ [i, j−1]} lcp(S_suftab[k], S_suftab[k+1]).

Using the above formula to compute the lcp values directly would require too much time. And maintaining the lcp for every pair of suffixes would require too much space. By using a balanced tree that records the minimum pairwise lcp's over a collection of intervals of the suffix array, we can determine the lcp between any two suffixes in O(log n) time (which is sufficient for Myers' online construction). Since there are only n internal nodes in the tree for which the lcp has to be computed, we spend a total of O(n log n) time to precompute the lcp values.


More information

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini

More information

Fingerprint idea. Assume:

Fingerprint idea. Assume: Fingerprint ide Assume: We cn compute fingerprint f(p) of P in O(m) time. If f(p) f(t[s.. s+m 1]), then P T[s.. s+m 1] We cn compre fingerprints in O(1) We cn compute f = f(t[s+1.. s+m]) from f(t[s.. s+m

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

On Suffix Tree Breadth

On Suffix Tree Breadth On Suffix Tree Bredth Golnz Bdkoeh 1,, Juh Kärkkäinen 2, Simon J. Puglisi 2,, nd Bell Zhukov 2, 1 Deprtment of Computer Science University of Wrwick Conventry, United Kingdom g.dkoeh@wrwick.c.uk 2 Helsinki

More information

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions CS 330 Forml Methods nd Models Dn Richrds, George Mson University, Spring 2016 Quiz Solutions Quiz 1, Propositionl Logic Dte: Ferury 9 1. (4pts) ((p q) (q r)) (p r), prove tutology using truth tles. p

More information

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont.

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont. NFA DFA Exmple 3 CMSC 330: Orgniztion of Progrmming Lnguges NFA {B,D,E {A,E {C,D {E Finite Automt, con't. R = { {A,E, {B,D,E, {C,D, {E 2 Equivlence of DFAs nd NFAs Any string from {A to either {D or {CD

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

10. AREAS BETWEEN CURVES

10. AREAS BETWEEN CURVES . AREAS BETWEEN CURVES.. Ares etween curves So res ove the x-xis re positive nd res elow re negtive, right? Wrong! We lied! Well, when you first lern out integrtion it s convenient fiction tht s true in

More information

Homework Solution - Set 5 Due: Friday 10/03/08

Homework Solution - Set 5 Due: Friday 10/03/08 CE 96 Introduction to the Theory of Computtion ll 2008 Homework olution - et 5 Due: ridy 10/0/08 1. Textook, Pge 86, Exercise 1.21. () 1 2 Add new strt stte nd finl stte. Mke originl finl stte non-finl.

More information

Lecture 3: Equivalence Relations

Lecture 3: Equivalence Relations Mthcmp Crsh Course Instructor: Pdric Brtlett Lecture 3: Equivlence Reltions Week 1 Mthcmp 2014 In our lst three tlks of this clss, we shift the focus of our tlks from proof techniques to proof concepts

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

4. GREEDY ALGORITHMS I

4. GREEDY ALGORITHMS I 4. GREEDY ALGORITHMS I coin chnging intervl scheduling scheduling to minimize lteness optiml cching Lecture slides by Kevin Wyne Copyright 2005 Person-Addison Wesley http://www.cs.princeton.edu/~wyne/kleinberg-trdos

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

CS 330 Formal Methods and Models

CS 330 Formal Methods and Models CS 330 Forml Methods nd Models Dn Richrds, George Mson University, Spring 2017 Quiz Solutions Quiz 1, Propositionl Logic Dte: Ferury 2 1. Prove ((( p q) q) p) is tutology () (3pts) y truth tle. p q p q

More information

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17

Discrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17 EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,

More information

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique?

How do we solve these things, especially when they get complicated? How do we know when a system has a solution, and when is it unique? XII. LINEAR ALGEBRA: SOLVING SYSTEMS OF EQUATIONS Tody we re going to tlk out solving systems of liner equtions. These re prolems tht give couple of equtions with couple of unknowns, like: 6= x + x 7=

More information

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9.

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9. Regulr Expressions, Pumping Lemm, Right Liner Grmmrs Ling 106 Mrch 25, 2002 1 Regulr Expressions A regulr expression descries or genertes lnguge: it is kind of shorthnd for listing the memers of lnguge.

More information

Section 4: Integration ECO4112F 2011

Section 4: Integration ECO4112F 2011 Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

Section 6: Area, Volume, and Average Value

Section 6: Area, Volume, and Average Value Chpter The Integrl Applied Clculus Section 6: Are, Volume, nd Averge Vlue Are We hve lredy used integrls to find the re etween the grph of function nd the horizontl xis. Integrls cn lso e used to find

More information

The size of subsequence automaton

The size of subsequence automaton Theoreticl Computer Science 4 (005) 79 84 www.elsevier.com/locte/tcs Note The size of susequence utomton Zdeněk Troníček,, Ayumi Shinohr,c Deprtment of Computer Science nd Engineering, FEE CTU in Prgue,

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations. Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one

More information

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives

Properties of Integrals, Indefinite Integrals. Goals: Definition of the Definite Integral Integral Calculations using Antiderivatives Block #6: Properties of Integrls, Indefinite Integrls Gols: Definition of the Definite Integrl Integrl Clcultions using Antiderivtives Properties of Integrls The Indefinite Integrl 1 Riemnn Sums - 1 Riemnn

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

Surface maps into free groups

Surface maps into free groups Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The

More information

Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter

Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostt.wisc.edu Gols for Lecture Key concepts how lrge-scle lignment differs from the simple cse the

More information

Where did dynamic programming come from?

Where did dynamic programming come from? Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/1526-5463-2002-50-01-0048.pdf

More information

C Dutch System Version as agreed by the 83rd FIDE Congress in Istanbul 2012

C Dutch System Version as agreed by the 83rd FIDE Congress in Istanbul 2012 04.3.1. Dutch System Version s greed y the 83rd FIDE Congress in Istnul 2012 A Introductory Remrks nd Definitions A.1 Initil rnking list A.2 Order See 04.2.B (Generl Hndling Rules - Initil order) For pirings

More information

Interpreting Integrals and the Fundamental Theorem

Interpreting Integrals and the Fundamental Theorem Interpreting Integrls nd the Fundmentl Theorem Tody, we go further in interpreting the mening of the definite integrl. Using Units to Aid Interprettion We lredy know tht if f(t) is the rte of chnge of

More information

Improper Integrals. The First Fundamental Theorem of Calculus, as we ve discussed in class, goes as follows:

Improper Integrals. The First Fundamental Theorem of Calculus, as we ve discussed in class, goes as follows: Improper Integrls The First Fundmentl Theorem of Clculus, s we ve discussed in clss, goes s follows: If f is continuous on the intervl [, ] nd F is function for which F t = ft, then ftdt = F F. An integrl

More information

Finite Automata-cont d

Finite Automata-cont d Automt Theory nd Forml Lnguges Professor Leslie Lnder Lecture # 6 Finite Automt-cont d The Pumping Lemm WEB SITE: http://ingwe.inghmton.edu/ ~lnder/cs573.html Septemer 18, 2000 Exmple 1 Consider L = {ww

More information

Name Ima Sample ASU ID

Name Ima Sample ASU ID Nme Im Smple ASU ID 2468024680 CSE 355 Test 1, Fll 2016 30 Septemer 2016, 8:35-9:25.m., LSA 191 Regrding of Midterms If you elieve tht your grde hs not een dded up correctly, return the entire pper to

More information

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms Preview Greed Algorithms Greed Algorithms Coin Chnge Huffmn Code Greed lgorithms end to e simple nd strightforwrd. Are often used to solve optimiztion prolems. Alws mke the choice tht looks est t the moment,

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

Designing Information Devices and Systems I Spring 2018 Homework 7

Designing Information Devices and Systems I Spring 2018 Homework 7 EECS 16A Designing Informtion Devices nd Systems I Spring 2018 omework 7 This homework is due Mrch 12, 2018, t 23:59. Self-grdes re due Mrch 15, 2018, t 23:59. Sumission Formt Your homework sumission should

More information

(e) if x = y + z and a divides any two of the integers x, y, or z, then a divides the remaining integer

(e) if x = y + z and a divides any two of the integers x, y, or z, then a divides the remaining integer Divisibility In this note we introduce the notion of divisibility for two integers nd b then we discuss the division lgorithm. First we give forml definition nd note some properties of the division opertion.

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n )

Computing the Optimal Global Alignment Value. B = n. Score of = 1 Score of = a a c g a c g a. A = n. Classical Dynamic Programming: O(n ) Alignment Grph Alignment Mtrix Computing the Optiml Globl Alignment Vlue An Introduction to Bioinformtics Algorithms A = n c t 2 3 c c 4 g 5 g 6 7 8 9 B = n 0 c g c g 2 3 4 5 6 7 8 t 9 0 2 3 4 5 6 7 8

More information

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17

Discrete Mathematics and Probability Theory Summer 2014 James Cook Note 17 CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking

More information

List all of the possible rational roots of each equation. Then find all solutions (both real and imaginary) of the equation. 1.

List all of the possible rational roots of each equation. Then find all solutions (both real and imaginary) of the equation. 1. Mth Anlysis CP WS 4.X- Section 4.-4.4 Review Complete ech question without the use of grphing clcultor.. Compre the mening of the words: roots, zeros nd fctors.. Determine whether - is root of 0. Show

More information

Section 6.1 INTRO to LAPLACE TRANSFORMS

Section 6.1 INTRO to LAPLACE TRANSFORMS Section 6. INTRO to LAPLACE TRANSFORMS Key terms: Improper Integrl; diverge, converge A A f(t)dt lim f(t)dt Piecewise Continuous Function; jump discontinuity Function of Exponentil Order Lplce Trnsform

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

New data structures to reduce data size and search time

New data structures to reduce data size and search time New dt structures to reduce dt size nd serch time Tsuneo Kuwbr Deprtment of Informtion Sciences, Fculty of Science, Kngw University, Hirtsuk-shi, Jpn FIT2018 1D-1, No2, pp1-4 Copyright (c)2018 by The Institute

More information

5: The Definite Integral

5: The Definite Integral 5: The Definite Integrl 5.: Estimting with Finite Sums Consider moving oject its velocity (meters per second) t ny time (seconds) is given y v t = t+. Cn we use this informtion to determine the distnce

More information

Algorithms in Computational. Biology. More on BWT

Algorithms in Computational. Biology. More on BWT Algorithms in Computtionl Biology More on BWT tody Plese Lst clss! don't forget to submit And by next (vi emil, repo ) implementtion week or shre prgectfltw get Not I would like reding overview! Discuss

More information

Lecture 2: January 27

Lecture 2: January 27 CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full

More information

Linear Inequalities. Work Sheet 1

Linear Inequalities. Work Sheet 1 Work Sheet 1 Liner Inequlities Rent--Hep, cr rentl compny,chrges $ 15 per week plus $ 0.0 per mile to rent one of their crs. Suppose you re limited y how much money you cn spend for the week : You cn spend

More information

Math 1B, lecture 4: Error bounds for numerical methods

Math 1B, lecture 4: Error bounds for numerical methods Mth B, lecture 4: Error bounds for numericl methods Nthn Pflueger 4 September 0 Introduction The five numericl methods descried in the previous lecture ll operte by the sme principle: they pproximte the

More information

Torsion in Groups of Integral Triangles

Torsion in Groups of Integral Triangles Advnces in Pure Mthemtics, 01,, 116-10 http://dxdoiorg/1046/pm011015 Pulished Online Jnury 01 (http://wwwscirporg/journl/pm) Torsion in Groups of Integrl Tringles Will Murry Deprtment of Mthemtics nd Sttistics,

More information

State Minimization for DFAs

State Minimization for DFAs Stte Minimiztion for DFAs Red K & S 2.7 Do Homework 10. Consider: Stte Minimiztion 4 5 Is this miniml mchine? Step (1): Get rid of unrechle sttes. Stte Minimiztion 6, Stte is unrechle. Step (2): Get rid

More information

Quadratic Forms. Quadratic Forms

Quadratic Forms. Quadratic Forms Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018

DATA Search I 魏忠钰. 复旦大学大数据学院 School of Data Science, Fudan University. March 7 th, 2018 DATA620006 魏忠钰 Serch I Mrch 7 th, 2018 Outline Serch Problems Uninformed Serch Depth-First Serch Bredth-First Serch Uniform-Cost Serch Rel world tsk - Pc-mn Serch problems A serch problem consists of:

More information

Chapter 0. What is the Lebesgue integral about?

Chapter 0. What is the Lebesgue integral about? Chpter 0. Wht is the Lebesgue integrl bout? The pln is to hve tutoril sheet ech week, most often on Fridy, (to be done during the clss) where you will try to get used to the ides introduced in the previous

More information

Parsing and Pattern Recognition

Parsing and Pattern Recognition Topics in IT Prsing nd Pttern Recognition Week Context-Free Prsing College of Informtion Science nd Engineering Ritsumeikn University this week miguity in nturl lnguge in mchine lnguges top-down, redth-first

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

Section 6.1 Definite Integral

Section 6.1 Definite Integral Section 6.1 Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot e determined

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Suppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2.

Suppose we want to find the area under the parabola and above the x axis, between the lines x = 2 and x = -2. Mth 43 Section 6. Section 6.: Definite Integrl Suppose we wnt to find the re of region tht is not so nicely shped. For exmple, consider the function shown elow. The re elow the curve nd ove the x xis cnnot

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

DFA minimisation using the Myhill-Nerode theorem

DFA minimisation using the Myhill-Nerode theorem DFA minimistion using the Myhill-Nerode theorem Johnn Högerg Lrs Lrsson Astrct The Myhill-Nerode theorem is n importnt chrcteristion of regulr lnguges, nd it lso hs mny prcticl implictions. In this chpter,

More information

Lecture Note 9: Orthogonal Reduction

Lecture Note 9: Orthogonal Reduction MATH : Computtionl Methods of Liner Algebr 1 The Row Echelon Form Lecture Note 9: Orthogonl Reduction Our trget is to solve the norml eution: Xinyi Zeng Deprtment of Mthemticl Sciences, UTEP A t Ax = A

More information

Some Theory of Computation Exercises Week 1

Some Theory of Computation Exercises Week 1 Some Theory of Computtion Exercises Week 1 Section 1 Deterministic Finite Automt Question 1.3 d d d d u q 1 q 2 q 3 q 4 q 5 d u u u u Question 1.4 Prt c - {w w hs even s nd one or two s} First we sk whether

More information

Math 8 Winter 2015 Applications of Integration

Math 8 Winter 2015 Applications of Integration Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl

More information

Lecture 1. Functional series. Pointwise and uniform convergence.

Lecture 1. Functional series. Pointwise and uniform convergence. 1 Introduction. Lecture 1. Functionl series. Pointwise nd uniform convergence. In this course we study mongst other things Fourier series. The Fourier series for periodic function f(x) with period 2π is

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

19 Optimal behavior: Game theory

19 Optimal behavior: Game theory Intro. to Artificil Intelligence: Dle Schuurmns, Relu Ptrscu 1 19 Optiml behvior: Gme theory Adversril stte dynmics hve to ccount for worst cse Compute policy π : S A tht mximizes minimum rewrd Let S (,

More information

Lecture 3. Introduction digital logic. Notes. Notes. Notes. Representations. February Bern University of Applied Sciences.
