Direct construction of compact Directed Acyclic Word Graphs


To cite this version: Maxime Crochemore, Renaud Vérin. Direct construction of compact Directed Acyclic Word Graphs. In A. Apostolico and J. Hein, editors, Combinatorial Pattern Matching (Aarhus, 1997), LNCS 1264, Springer-Verlag, 1997. Submitted to the HAL open archive on 13 Feb 2013.

Maxime Crochemore and Renaud Vérin
Institut Gaspard Monge, Université de Marne-La-Vallée, 2, rue de la Butte Verte, Noisy-Le-Grand, France.

Abstract. The Directed Acyclic Word Graph (DAWG) is an efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by DAWGs.

Keywords: pattern matching algorithm, suffix automaton, DAWG, Compact DAWG, suffix tree, index on text.

1 Introduction

In the classical string-matching problem for a word $w$ and a text $T$, we want to know if $w$ occurs in $T$, i.e., if $w$ is a factor of $T$. In many applications, the same text is queried several times. So, efficient solutions are based on data structures built on the text that serve as an index to look for any word $w$ in $T$. The typical running time of the various implementations of the search is $O(|w|)$ (on a fixed alphabet).

Among the implementations, the suffix tree ([13]) is the most popular. Its size and construction time are linear in the length of the text. It has been studied and used extensively. Apostolico [2] lists over 40 references on it, and Manber and Myers [12] mention several others. Many variants have been developed, like suffix arrays [12], PESTry [11], suffix cactus [10], or suffix binary search trees [9]. Besides, the suffix trie, the non-compact version of the suffix tree, has been refined to the suffix automaton (Directed Acyclic Word Graph, DAWG). This automaton is a good alternative to represent the whole set of factors of a text. It is the minimal automaton accepting this set. It has been fully exposed by Blumer [3] and Crochemore [7]. As for the suffix tree, its construction time and size are linear in the length of the text.

In the genome research field, DNA sequences can be viewed as words over the alphabet $\{a, c, g, t\}$. They become subjects for linguistic and statistical analysis. For this purpose, suffix automata are useful data structures. Indeed, the structure is fast to compute and easy to use.

Meanwhile, the length of sequences in databases grows rapidly and the bottleneck to using the above data structures is their size. Keeping the index in main memory is more and more difficult for large sequences. So, having a structure that uses as little space as possible is valuable for its construction as well as for its utilization.

Compression methods are of no use to reduce the memory space of such indexes because they eliminate the direct access to substrings. On the contrary, the Compact Directed Acyclic Word Graph (CDAWG) keeps the direct access while requiring less memory space. The structure has been introduced by Blumer et al. [4, 5]. The automaton is based on the concatenation of factors issued from a same context. This concatenation induces the deletion of all states of outdegree one and of their corresponding transitions, excepting terminal states. This saves 50% of memory space. At the same time, the reduction of the number of states (2/3 less) and transitions (about half less) makes the applications run faster. Both time and space are saved.

In this paper, we give an algorithm to build compact DAWGs. This direct construction avoids constructing the DAWG first, which makes it suitable for the actual DNA sequences (more than 1.5 million nucleotides for some of them). The compact DAWG makes it possible to apply standard treatments to sequences twice as long in a reasonable time (a few minutes).

In Section 2 we recall the basic notions on DAWGs. Section 3 introduces the compact DAWG, also called compact suffix automaton, with the bounds on its size. We show in Section 4 how to build the CDAWG from the DAWG in time linear in the size of this latter structure. The direct construction algorithm for the CDAWG is given in Section 5. A conclusion follows.

2 Definitions

Let $\Sigma$ be a nonempty alphabet and $\Sigma^*$ the set of words over $\Sigma$, with $\varepsilon$ as the empty word. If $w$ is a word in $\Sigma^*$, $|w|$ denotes its length, $w_i$ its $i$-th letter, and $w_{i..j}$ its factor (subword) $w_i w_{i+1} \ldots w_j$. If $w = xyz$ with $x, y, z \in \Sigma^*$, then $x$, $y$, and $z$ denote some factors or subwords of $w$, $x$ is a prefix of $w$, and $z$ is a suffix of $w$. $S(x)$ denotes the set of all suffixes of $x$ and $F(x)$ the set of its factors. For an automaton, the tuple $(p, \alpha, q)$ denotes a transition of label $\alpha$ starting at $p$ and ending at $q$.
A roman letter is used for mono-letter transitions, a greek letter for multi-letter transitions. Moreover, $(p, \alpha]$ denotes a transition from $p$ for which $\alpha$ is a prefix of its label.

Here, we recall the definition of the DAWG, and a theorem about its implementation and its size proved in [3] and [7].

Definition 1. The Suffix Automaton of a word $x$, denoted DAWG($x$), is the minimal deterministic automaton (not necessarily complete) that accepts $S(x)$, the (finite) set of suffixes of $x$.

For example, Figure 1 shows the DAWG of the word gtagtaaac. States which are double circled are terminal states.

Theorem 2. The size of the DAWG of a word $x$ is $O(|x|)$ and the automaton can be computed in time $O(|x|)$. The maximum number of states of the automaton is $2|x| - 1$, and the maximum number of edges is $3|x| - 4$.
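As an illustration of Definition 1 and Theorem 2, here is a minimal Python sketch of the classical on-line suffix automaton construction (the textbook algorithm, not necessarily the exact presentation of [3] or [7]). The names `build_dawg` and `is_factor` are illustrative assumptions. On the running example gtagtaaac it yields 12 states and 18 transitions, within the bounds $2|x| - 1 = 17$ and $3|x| - 4 = 23$, and a query for a word $w$ follows at most $|w|$ transitions.

```python
def build_dawg(x):
    """On-line construction of the suffix automaton (DAWG) of x.

    Returns (nxt, link, length): one transition dict per state, the
    suffix link of each state and its length_x value. State 0 is the
    initial state.
    """
    nxt, link, length = [{}], [-1], [0]
    last = 0
    for ch in x:
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        p = last
        # Add the new letter along the suffix path of the previous last state.
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[q] == length[p] + 1:
                # (p, ch, q) is a solid transition: q is the suffix link.
                link[cur] = q
            else:
                # Non-solid transition: duplicate q into a clone.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return nxt, link, length


def is_factor(nxt, w):
    """Return True if w is a factor of the indexed word (O(|w|) steps)."""
    state = 0
    for ch in w:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return True


x = "gtagtaaac"
nxt, link, length = build_dawg(x)
n_states = len(nxt)
n_edges = sum(len(t) for t in nxt)
print(n_states, n_edges)          # 12 18
print(is_factor(nxt, "agta"))     # True
print(is_factor(nxt, "gg"))       # False
```

The dictionaries play the role of adjacency lists; a transition matrix would replace each dict by a fixed array of $card(\Sigma)$ entries.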

[Fig. 1. DAWG(gtagtaaac)]

Recall that the right context of a factor $u$ of $x$ is $u^{-1}S(x)$. The syntactic congruence, denoted by $\equiv_{S(x)}$, associated with $S(x)$ is defined, for $x, u, v \in \Sigma^*$, by:

$$u \equiv_{S(x)} v \iff u^{-1}S(x) = v^{-1}S(x).$$

We call classes of factors the congruence classes of the relation $\equiv_{S(x)}$. The longest word of a class of factors is called the representative of the class. States of DAWG($x$) are exactly the classes of the relation $\equiv_{S(x)}$. Since this automaton is not required to be complete, the class of words not occurring in $x$, corresponding to the empty right context, is not a state of DAWG($x$).

Moreover, we induce a selection among the congruence classes that we call strict classes of factors of $\equiv_{S(x)}$ and that are defined as follows:

Definition 3. Let $u$ be a word of $C$, a class of factors of $\equiv_{S(x)}$. If at least two letters $a$ and $b$ of $\Sigma$ exist such that $ua$ and $ub$ are factors of $x$, then we say that $C$ is a strict class of factors of $\equiv_{S(x)}$.

We also introduce the function $endpos_x : F(x) \to \mathbb{N}$, defined, for every word $u$, by:

$$endpos_x(u) = \min\{|w| \mid w \text{ prefix of } x \text{ and } u \text{ suffix of } w\}$$

and the function $length_x$ defined on states of DAWG($x$) by:

$$length_x(p) = |u|, \text{ with } u \text{ representative of } p.$$

The word $u$ also corresponds to the concatenated labels of transitions of the longest path from the initial state to $p$ in DAWG($x$). The transitions that belong to the spanning tree of longest paths from the initial state are called solid transitions. Equivalently, for each transition $(p, a, q)$ we have the property:

$$(p, a, q) \text{ is solid} \iff length_x(q) = length_x(p) + 1.$$

The function $length_x$ works as well for multi-letter transitions, just replacing 1 in the above equivalence by the length of the label of the transition. This extends the notion of solid transitions to multi-letter transitions:

$$(p, \alpha, q) \text{ is solid} \iff length_x(q) = length_x(p) + |\alpha|.$$

In addition, we define the suffix link for a state of DAWG($x$) by:

Definition 4. Let $p$ be a state of DAWG($x$), different from the initial state, and let $u$ be a word of the equivalence class $p$. The suffix link of $p$, denoted by $s_x(p)$, is the state $q$ whose representative $v$ is the longest suffix $z$ of $u$ such that $u \not\equiv_{S(x)} z$.

Note that, as a consequence of this definition, we have $length_x(q) < length_x(p)$. Then, by iteration, suffix links induce suffix paths in DAWG($x$), which is an important notion used by the construction algorithm. Indeed, as a consequence of the above inequality, the sequence $(p, s_x(p), s_x^2(p), \ldots)$ is finite and ends at the initial state of DAWG($x$). This sequence is called the suffix path of $p$.

3 Compact Directed Acyclic Word Graphs

3.1 Definition

The compression of DAWGs is based on the deletion of some states and their corresponding transitions. This is possible using multi-letter transitions and the selection of strict classes of factors defined in the previous section (Definition 3). Thus, we define the Compact DAWG as follows.

Definition 5. The Compact Directed Acyclic Word Graph of a word $x$, denoted by CDAWG($x$), is the compaction of DAWG($x$) obtained by keeping only states that are either terminal states or strict classes of factors according to $\equiv_{S(x)}$, and by labeling transitions accordingly.

As a consequence of Definition 3, the strict classes of factors correspond to the states that have an outdegree greater than one. So, we can delete every state having outdegree exactly one, except terminal states. Note that initial and final states are terminal states too, so they are not deleted.

[Fig. 2. CDAWG(gtagtaaac)]

The construction of the DAWG of a word including some repetitions shows that many states have outdegree one only. For example, in Figure 1, the DAWG of the word gtagtaaac has 12 states, 7 of which have outdegree one; it has 18 transitions. Figure 2 displays the result after the deletion of these states, using multi-letter transitions. The resulting automaton has only 5 states and 11 edges.
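Definition 5 can be illustrated by a short Python sketch that builds the DAWG with the classical on-line algorithm and then deletes the non-terminal states of outdegree one. This is a simplified illustration, not the paper's Reduction function: it follows each chain of deleted states separately instead of memoizing, and the names `build_dawg` and `compact` are assumptions. On gtagtaaac it reproduces the 5 states and 11 edges quoted above.

```python
def build_dawg(x):
    """Classical on-line suffix automaton construction; state 0 is initial."""
    nxt, link, length, last = [{}], [-1], [0], 0
    for ch in x:
        cur = len(nxt)
        nxt.append({}); link.append(-1); length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[q] == length[p] + 1:
                link[cur] = q
            else:
                clone = len(nxt)
                nxt.append(dict(nxt[q])); link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    return nxt, link, last


def compact(x):
    """Return CDAWG(x) as {state: {first letter: (label, target)}}."""
    nxt, link, last = build_dawg(x)
    # Terminal states are the states on the suffix path of the last state.
    terminal = set()
    p = last
    while p != -1:
        terminal.add(p)
        p = link[p]
    # Keep terminal states and states whose outdegree differs from one.
    keep = {s for s in range(len(nxt)) if s in terminal or len(nxt[s]) != 1}
    cdawg = {}
    for s in keep:
        cdawg[s] = {}
        for ch, t in nxt[s].items():
            label = ch
            while t not in keep:              # skip deleted states
                (ch2, t2), = nxt[t].items()   # their single outgoing edge
                label += ch2
                t = t2
            cdawg[s][ch] = (label, t)
    return cdawg


cdawg = compact("gtagtaaac")
print(len(cdawg), sum(len(e) for e in cdawg.values()))   # 5 11
print(sorted(label for label, _ in cdawg[0].values()))   # ['a', 'c', 'gta', 'ta']
```

Labels are stored here as explicit strings for readability; a space-conscious implementation would store only positions and lengths.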

According to experiments on the construction of DAWGs of biological DNA sequences, considered as words over the alphabet $\Sigma = \{a, c, g, t\}$, more than 60% of states have outdegree one. So, the deletion of these states is worthwhile; it provides an important saving. The average analysis of the number of states and edges is done in [5] in a Bernoulli model of probability.

When a state $p$ is deleted, the deletion of outgoing edges is realized by adding the label of the outgoing edge of the deleted state to the labels of its incoming edges. For example, let $r$, $p$ and $q$ be states linked by transitions $(r, b, p)$ and $(p, a, q)$. We replace the edges $(r, b, p)$ and $(p, a, q)$ by the edge $(r, ba, q)$. By recursion, we extend this method to every multi-letter transition $(r, \alpha, p)$.

In the example (Figure 1), one can note that, inside the word gtagtaaac, occurrences of g are followed by ta, and those of t and gt by a. So, gta is the representative of state 3 and it is not necessary to create states for g and (gt or t). Then, we directly connect state I to state 3 with edges (I, gta, 3) and (I, ta, 3). States 1 and 2 are thus deleted.

The suffix links defined on states of DAWGs remain valid when we reduce them to CDAWGs because of the next lemma.

Lemma 6. If $p$ is a state of CDAWG($x$), then $s_x(p)$ is a state of CDAWG($x$).

3.2 Size bounds

By Theorem 2, the size of DAWG($x$) is linear in $|x|$. As we shall see below (Section 3.3), labels of multi-letter transitions are implemented in constant space. So, the size of CDAWG($x$) is also $O(|x|)$. Meanwhile, as we delete many states and edges, we review the exact bounds on the number of states and edges of CDAWG($x$). They are respectively denoted by States($x$) and Edges($x$).

Corollary 7. Given $x \in \Sigma^*$, if $|x| = 0$, then States($x$) = 1; if $|x| = 1$, then States($x$) = 2; else $|x| \geq 2$, then $2 \leq$ States($x$) $\leq |x| + 1$ and the upper bound is reached when $x$ is in the form $a^{|x|}$, where $a \in \Sigma$.

Corollary 8. Given $x \in \Sigma^*$, if $|x| = 0$, Edges($x$) = 0; if $|x| = 1$, Edges($x$) = 1; else $|x| \geq 2$, then Edges($x$) $\leq 2|x| - 2$ and this upper bound is reached when $x$ is in the form $a^{|x|-1}b$, where $a$ and $b$ are two different letters of $\Sigma$.
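The bounds of Corollaries 7 and 8 can be checked on small words with a brute-force Python sketch that works directly from the definitions: factors are grouped by their sets of end positions (two factors have the same right context exactly when they end at the same positions), and a class is kept when it contains a suffix of $x$ or can be followed by at least two letters. The function name `cdawg_size` is an assumption, and the code is cubic or worse, so it is only meant for short words.

```python
from itertools import product


def cdawg_size(x):
    """Return (States(x), Edges(x)) computed from the definitions."""
    n = len(x)
    factors = {x[i:k] for i in range(n + 1) for k in range(i, n + 1)}
    # One class of factors per distinct set of end positions.
    classes = {
        frozenset(j for j in range(n + 1) if x[:j].endswith(u))
        for u in factors
    }
    states = edges = 0
    for cls in classes:
        # Letters that can follow an occurrence of a word of this class.
        followers = {x[j] for j in cls if j < n}
        terminal = n in cls            # the class contains a suffix of x
        if terminal or len(followers) >= 2:
            states += 1
            edges += len(followers)    # deleted states never add edges
    return states, edges


print(cdawg_size("gtagtaaac"))   # (5, 11)
print(cdawg_size("aaaaa"))       # (6, 5): States reaches |x| + 1
print(cdawg_size("aaaab"))       # (5, 8): Edges reaches 2|x| - 2

# Exhaustive check of both upper bounds on short binary words.
for n in range(2, 7):
    for letters in product("ab", repeat=n):
        s, e = cdawg_size("".join(letters))
        assert 2 <= s <= n + 1 and e <= 2 * n - 2
```

Counting the outgoing letters of the kept classes is enough for Edges($x$), because each deleted state has a single outgoing edge and is absorbed into the label of an edge leaving a kept state.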
3.3 Implementation and Results

Transition matrices and adjacency lists are the classical implementations of automata. Their principal difference lies in the implementation of transitions. The first one gives direct access to transitions, but requires $O(States(x) \times card(\Sigma))$ space. The second one stores only the exact number of transitions in memory, but needs $O(\log card(\Sigma))$ time to access them. When the size of the alphabet is big and the transition matrix is sparse, adjacency lists are preferable. Otherwise, like for genomic sequences, the transition matrix is a better choice, as shown by the experiments below. So, we only consider here transition matrices to implement CDAWGs.

We now describe the exact implementation of states and edges. We do this on a four-letter alphabet, so characters take 0.25 byte. We use integers encoded with 4 bytes. For each state, to encode the target state of outgoing edges, transition matrices need a vector of 4 integers. Adjacency lists need, for each edge, 2 integers, one for the target state and another one for the pointer to the next edge.

The basic information required to construct the DAWG is composed of a table to implement the function $s_x$ and one boolean value (0.125 byte) for each edge to know if it is solid or not. For the CDAWG, in order to implement multi-letter transitions, we need one integer for the $endpos_x$ value of each state, and another integer for the label length of each edge. And that is all. Indeed, we can find the label of a transition by cutting off the length of this transition from the $endpos_x$ value of its ending state. Then, we get the position of the label in the source and its length. Keeping the source in memory is negligible considering the global size of the automaton (0.25 byte per character). This is quite a convenient solution, also used for suffix trees. Figure 3 displays how the states of CDAWG(gtagtaaac) are implemented.

[Fig. 3. Data structure of CDAWG(gtagtaaac): for each state number, the values $length_x$, $endpos_x$, $s_x$ and the outgoing transitions]

Then, respectively for transition matrices and adjacency lists, each state requires 20.5 and 17.13 bytes for the DAWG, and 40.5 and 41.21 bytes for the CDAWG. As a reference, suffix trees, as implemented by McCreight [13], need 28.25 and 20.25 bytes per state. Moreover, for CDAWGs and suffix trees the source has to be stored in main memory. Theoretical average numbers of states, calculated by Blumer et al. ([5]), are $0.54n$ for the CDAWG, $1.62n$ for the DAWG, and $1.62n$ for suffix trees, when $n$ is the length of $x$. This gives respective sizes in bytes per character of the source: 45.68 and 32.70 for suffix trees, 33.26 and 27.80 for DAWGs, and 22.40 and 22.78 for CDAWGs.

Considering the complete data structures required for applications, the function $endpos_x$ has to be added for the DAWG and the suffix tree. In addition, the occurrence number of each factor has to be stored in each state for all the structures. Therefore, the respective sizes in bytes per character of the source become: 58.66 and 45.68 for suffix trees, 46.24 and 40.78 for DAWGs, and 24.26 and 24.72 for CDAWGs.

Table 1. Statistics comparing DAWG and CDAWG.

                      Nb states / |x|   Nb transitions / |x|   Nb transitions / Nb states   Memory
Source x              dawg    cdawg     dawg    cdawg          dawg    cdawg                gain
chro                  1.64    0.54      2.54    1.44           1.55    2.66                 50.36%
coli                  1.64    0.54      2.54    1.44           1.53    2.66                 51.95%
bs                    1.66    0.50      2.50    1.34           1.50    2.66                 54.78%
bs                    1.64    0.54      2.54    1.44           1.55    2.66                 50.16%
random                1.62    0.55      2.54    1.47           1.57    2.68                 49.53%
random                1.62    0.55      2.55    1.47           1.57    2.68                 49.35%
random                1.62    0.54      2.54    1.46           1.56    2.68                 49.68%
random                1.62    0.54      2.54    1.46           1.56    2.68                 49.47%
theor. aver. ratios   1.63    0.54      2.54    1.46           1.56    2.67                 50.55%

Moreover, Table 1 compares sizes of DAWGs and CDAWGs meant for applications to DNA sequences. Sizes for random words of different lengths and $|\Sigma| = 4$ are also given. The DNA sequences are a Saccharomyces cerevisiae yeast chromosome (chro), a contig of an Escherichia coli DNA sequence (coli), and contigs 1 and 115 of a Bacillus subtilis DNA sequence (bs). The numbers of states and edges relative to the length of the source and the memory space gain are displayed. Theoretical average ratios are given, calculated from Blumer et al. ([5]).

First, we observe that there are 2/3 fewer states in the CDAWG, and nearly half as many edges. Second, the memory space saving is about 50%. Third, the number of edges per state goes up to 2.66. With a four-letter alphabet, this is interesting because the transition matrix becomes smaller than adjacency lists. At the same time, we keep direct access to transitions.
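The label encoding of Section 3.3 can be sketched in Python as follows. The state names (`I`, `S`, `A`, `AA`, `F`) and the edge list are hypothetical and written by hand for CDAWG(gtagtaaac); only the $endpos_x$ value of each state and the label length of each edge are stored, and positions are counted so that a label ends at index `endpos` of the source. The per-state byte counts at the end are one breakdown that is consistent with the 20.5 and 40.5 bytes quoted for transition matrices; the breakdown itself is an assumption.

```python
x = "gtagtaaac"

# endpos_x of each state (hypothetical names: S is the class of "gta",
# A the class of "a", AA the class of "aa").
ENDPOS = {"I": 0, "S": 3, "A": 3, "AA": 7, "F": 9}

# Each edge stores only (source, target, label length).
EDGES = [
    ("I", "S", 3), ("I", "S", 2), ("I", "A", 1), ("I", "F", 1),
    ("S", "F", 6), ("S", "F", 3),
    ("A", "F", 6), ("A", "AA", 1), ("A", "F", 1),
    ("AA", "F", 2), ("AA", "F", 1),
]


def label(edge):
    """Recover a label by cutting its length off the endpos of the target."""
    _, target, k = edge
    end = ENDPOS[target]
    return x[end - k:end]


labels = [label(e) for e in EDGES]
print(labels)
# ['gta', 'ta', 'a', 'c', 'gtaaac', 'aac', 'gtaaac', 'a', 'c', 'ac', 'c']

# Assumed per-state breakdown for transition matrices (4-letter alphabet).
INT, BOOL, SIGMA = 4, 0.125, 4
dawg_state = SIGMA * INT + INT + SIGMA * BOOL    # targets + s_x + solid bits
cdawg_state = dawg_state + INT + SIGMA * INT     # + endpos_x + label lengths
print(dawg_state, cdawg_state)                   # 20.5 40.5
```

The recovery works because every word that reaches a state is a suffix of the representative of that state, whose first occurrence ends at $endpos_x$.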
4 Constructing CDAWG from DAWG

The DAWG construction is fully exposed and demonstrated in [3] and [7]. As we show in this section, the CDAWG is easily derived from the DAWG.

Indeed, we just need to apply the definition of the CDAWG recursively. This is computed by the function Reduction, given below. Observe that, in this function, state$(p, a]$ denotes the state pointed to by the transition $(p, a]$.

The computation is done with a depth-first traversal of the automaton, and runs in time linear in the number of transitions of DAWG($x$). Then, by Theorem 2, the computation also runs in time linear in the length of the text. However, this method needs to construct the DAWG first, which spends time and memory space proportional to DAWG($x$), though CDAWG($x$) is significantly smaller. So, it is better to construct the CDAWG directly.

Reduction (state E) returns (ending state, length of redirected edge)
    If (E not marked) Then
        For all existing edge (E, a] Do
            (state(E, a], |label((E, a])|) ← Reduction(state(E, a]);
        mark(E) ← TRUE;
    If (E is of outdegree one) Then
        Let (E, a] this edge;
        Return (state(E, a], 1 + |label((E, a])|);
    Else
        Return (E, 1);

5 Direct Construction of CDAWG

In this section, we give the direct construction of CDAWGs and show that the running time is linear in the size of the input word $x$ on a fixed alphabet.

5.1 Algorithm

Since the CDAWG of $x$ is a minimization of its suffix tree, it is rather natural to base the direct construction on McCreight's algorithm [13]. Meanwhile, properties of the DAWG construction are also used, especially suffix links (a notion that is different from the suffix links of McCreight's algorithm), lengths, and positions, as explained in the previous section.

First, we introduce the notions used by the algorithm; some of them are taken from [13]. The algorithm constructs the CDAWG of the word $x$ of length $n$, noted $x_{0..n-1}$. The automaton is defined by a set of states and transitions, especially with I and F, the initial and final states. A partial path represents a connected sequence of edges between two states of the automaton. A path is a partial path that begins at I. The label of a path is the concatenation of the labels of the corresponding edges.

The locus, or exact locus, of a string is the end of the path labeled by the string. The contracted locus of a string $\alpha$ is the locus of the longest prefix of $\alpha$ whose locus is defined.
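The notions of locus and contracted locus can be made concrete with a small Python sketch on a hand-written copy of CDAWG(gtagtaaac). The state names (`I`, `S`, `A`, `AA`, `F`), the dictionary layout and the function `contracted_locus` are illustrative assumptions, not part of the paper.

```python
# CDAWG(gtagtaaac): state -> {first letter: (label, target)}.
# S is the class of "gta", A the class of "a", AA the class of "aa".
CDAWG = {
    "I": {"g": ("gta", "S"), "t": ("ta", "S"), "a": ("a", "A"), "c": ("c", "F")},
    "S": {"g": ("gtaaac", "F"), "a": ("aac", "F")},
    "A": {"g": ("gtaaac", "F"), "a": ("a", "AA"), "c": ("c", "F")},
    "AA": {"a": ("ac", "F"), "c": ("c", "F")},
    "F": {},
}


def contracted_locus(graph, word, start="I"):
    """Return (state, k): the locus of the longest prefix word[:k] of
    `word` whose locus is defined, i.e. whose path ends exactly on a state."""
    state, k = start, 0
    while k < len(word):
        edge = graph[state].get(word[k])
        if edge is None:
            break
        label, target = edge
        if not word.startswith(label, k):
            break                      # the path stops inside this edge
        state, k = target, k + len(label)
    return state, k


print(contracted_locus(CDAWG, "gta"))    # ('S', 3): exact locus of "gta"
print(contracted_locus(CDAWG, "gtaa"))   # ('S', 3): "gtaa" ends inside an edge
print(contracted_locus(CDAWG, "gt"))     # ('I', 0): only the empty prefix has a locus
```

A string has an exact locus precisely when the returned length equals the length of the string.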

Preliminary Algorithm

Basically, the algorithm to build the CDAWG inserts the paths corresponding to all the suffixes of $x$ from the longest to the shortest. We define $suf_i$ as the suffix $x_{i..n-1}$ of $x$. We denote by $A_i$ the automaton constructed after the insertion of all the $suf_j$ for $0 \leq j \leq i$.

[Fig. 4. Construction of CDAWG(aabbabbc): four steps, labeled A to D]

Figure 4 displays four steps of the construction of CDAWG(aabbabbc). In this figure (and the following ones), the dashed edges represent suffix links of states, which are used subsequently.

We initialize the automaton $A_\varepsilon$ with states I and F. At step $i$ ($i > 0$), the algorithm inserts a path corresponding to $suf_i$ in $A_{i-1}$ and produces $A_i$. The algorithm satisfies the following invariant properties:

P1: at the beginning of step $i$, all suffixes $suf_j$, $0 \leq j < i$, are paths in $A_{i-1}$.
P2: at the beginning of step $i$, the states of $A_{i-1}$ are in one-to-one correspondence with the longest common prefixes of pairs of suffixes longer than $suf_i$.

We define $head_i$ as the longest prefix of $suf_i$ which is also a prefix of $suf_j$ for some $j < i$. Equivalently, $head_i$ is the longest prefix of $suf_i$ which is also a path of $A_{i-1}$. We define $tail_i$ as $head_i^{-1} suf_i$. At step $i$, the preliminary algorithm has to insert $tail_i$ from the locus of $head_i$ in $A_{i-1}$ (see Figure 5).

[Fig. 5. Scheme of the insertion of $suf_i$ in $A_{i-1}$]

To do so, the contracted locus of $head_i$ in $A_{i-1}$ is found with the help of the function SlowFind, which compares letter by letter the right path of $A_{i-1}$ to $suf_i$. This is similar to the corresponding McCreight's procedure, except for what is explained below. Then, if necessary, a new state is created to split the last encountered edge; this state is the locus of $head_i$. The automaton B of Figure 4 displays the creation of state 1 during the insertion of $suf_1$ = abbabbc. Note that, if an already existing state matches the strict class of factors of $head_i$, the last encountered edge is split in the same way, but it is redirected to this state. Such a situation appears in the same example (case D): the insertion of $suf_5$ = bbc induces the redirection of the edge (2, babbc, F), which becomes (2, b, 3). Then, an edge labeled by $tail_i$ is created from the locus of $head_i$ to F. We can write the preliminary algorithm as follows:

Preliminary Algorithm
    For all $suf_i$ ($i \in [0..n-1]$) Do
        $(q, \gamma)$ ← SlowFind(I);
        If ($\gamma = \varepsilon$) Then
            insert $(q, tail_i, F)$;
        Else
            create $v$ locus of $head_i$ splitting $(q, \gamma]$ and insert $(v, tail_i, F)$;
            or redirect $(q, \gamma]$ onto $v$, the last created state;
    End For all;
    mark terminal states;

Note first that SlowFind returns the last encountered state. This keeps accessible the transition $(q, \gamma]$ that can be split if this state is not an exact locus. Second, as in the DAWG construction, if a non-solid edge is encountered during SlowFind, its target state has to be duplicated in a clone and the non-solid edge is redirected to this clone. But, if the clone has just been created at the previous step, the edge is redirected to this state. Note that, in the two cases, the redirected transition becomes solid. Finally, when $tail_i = \varepsilon$ at the end of the construction, terminal states are marked along the suffix path of F.

From the above discussion, a proof of the invariance of properties P1 and P2 can be derived. Thus, at the end of the algorithm all subwords of $x$ and only these words are labels of paths in the automaton (property P1). By property P2, states correspond to strict classes of factors (when the longest common prefix of a pair of suffixes is not equal to any of them) or to terminal states (when the contrary holds). This gives a sketch of the correctness of the algorithm.
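The quantities $head_i$ and $tail_i$ driving the preliminary algorithm can be computed by brute force, straight from their definition. The Python sketch below does so for the example word aabbabbc; it only illustrates the decomposition $suf_i = head_i \, tail_i$ and does not build the automaton. The function name `heads_and_tails` is an assumption.

```python
def heads_and_tails(x):
    """For each i, return (head_i, tail_i) where head_i is the longest
    prefix of suf_i that is also a prefix of some suf_j with j < i."""
    result = []
    for i in range(len(x)):
        suf = x[i:]
        best = 0
        for j in range(i):
            other = x[j:]
            k = 0
            # Length of the common prefix of suf_i and suf_j.
            while k < len(suf) and k < len(other) and suf[k] == other[k]:
                k += 1
            best = max(best, k)
        result.append((suf[:best], suf[best:]))
    return result


decomposition = heads_and_tails("aabbabbc")
for i, (head, tail) in enumerate(decomposition):
    print(i, repr(head), repr(tail))
# 0 '' 'aabbabbc'
# 1 'a' 'bbabbc'
# 2 '' 'bbabbc'
# 3 'b' 'abbc'
# 4 'abb' 'c'
# 5 'bb' 'c'
# 6 'b' 'c'
# 7 '' 'c'
```

Step 1 has $head_1$ = a, which matches the creation of state 1 in automaton B of Figure 4, and step 5 has $head_5$ = bb, the case where the split edge is redirected to an existing state.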

The running time of the preliminary algorithm is $O(|x|^2)$ (with an implementation by transition matrix), which is the order of the sum of the lengths of all suffixes of the word $x$.

Linear Algorithm

To get a linear-time algorithm, we use together properties of the DAWG construction and of the suffix tree construction. The main feature is the notion of suffix links. They are defined as for DAWGs in Section 2. They are the key to the linear running time of the algorithm. Three elements have to be pointed out about suffix links in the CDAWG. First, we do not need to initialize suffix links. Indeed, when $suf_0$ is inserted, $x_0$ is obviously a new letter, which directly induces $s_x(F) = I$. Note that $s_x(I)$ is never used, and so never defined. Second, traveling along the suffix path of a state $p$ does not necessarily end at state I. Indeed, with multi-letter transitions, if $s_x(p) = I$ we have to treat the suffix $a^{-1}\alpha$ ($a \in \Sigma$) where $\alpha$ is the representative of $p$. And third, suffix links induce the following invariant property satisfied at step $i$:

P3: at the beginning of step $i$, the suffix links are defined for each state of $A_{i-1}$ according to Definition 4.

The next remark allows redirections without having to search with SlowFind for existing states belonging to a same class of factors.

Remark. Let $\alpha$ have locus $p$ and assume that $q = s_x(p)$ is the locus of $\beta$. Then, $p$ is the locus of the suffixes of $\alpha$ whose lengths are greater than $|\beta|$.

The algorithm has to deal with suffix links each time a state is created. This happens when a state is duplicated, and when a state is created after the execution of SlowFind.

In the duplication, suffix links are updated as follows. Let $w$ be the clone of $q$. In regard to strict classes of factors and Definition 4, the class of $w$ is inserted between the ones of $q$ and $s_x(q)$. So, we update suffix links by setting $s_x(w) = s_x(q)$ and $s_x(q) = w$.

Moreover, the duplication has the same properties as in the DAWG construction. Let $(p, \gamma, q)$ be the transition redirected during the duplication of $q$. We can redirect all non-solid edges that end the partial path $\gamma$ and that start from a state of the suffix path of $p$. This is done until the first edge that is solid.
We are helped in this operation by the function FastFind, similar to the one used in McCreight's algorithm [13], that goes through transitions just comparing the first letters of their labels. This function returns the last encountered state and edge. Note that it is not necessary to find each time the partial path from a suffix of $p$; we just need to take the suffix link of the last encountered state and the label of the previous redirected transition. Let $\vartheta$ be the representative of a state of the suffix path of $p$. Observe that the corresponding redirection is equivalent to inserting $suf_{i+|\alpha|-|\vartheta|}$. Indeed, all operations done after this redirection will be the same as for the insertion of $suf_i$, since they go through the same path.
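The difference between the two search functions can be sketched in Python on a hand-written copy of CDAWG(gtagtaaac). The state names, the dictionary layout and the return convention (last state reached, part of the word read along the last, partially traversed edge) are assumptions made for this illustration. `slow_find` compares every letter, whereas `fast_find` assumes that the word is already known to label a path and only looks at the first letter and the length of each label.

```python
# CDAWG(gtagtaaac): state -> {first letter: (label, target)}.
CDAWG = {
    "I": {"g": ("gta", "S"), "t": ("ta", "S"), "a": ("a", "A"), "c": ("c", "F")},
    "S": {"g": ("gtaaac", "F"), "a": ("aac", "F")},
    "A": {"g": ("gtaaac", "F"), "a": ("a", "AA"), "c": ("c", "F")},
    "AA": {"a": ("ac", "F"), "c": ("c", "F")},
    "F": {},
}


def slow_find(graph, state, word):
    """Letter-by-letter search; stops at the first mismatch."""
    pos = 0
    while pos < len(word) and word[pos] in graph[state]:
        label, target = graph[state][word[pos]]
        k = 0
        while (k < len(label) and pos + k < len(word)
               and label[k] == word[pos + k]):
            k += 1
        if k < len(label):
            return state, label[:k]      # stopped inside the edge
        state, pos = target, pos + k
    return state, ""


def fast_find(graph, state, word):
    """Search for a word known to be a path from `state`: only the first
    letter of each label is compared, the rest of the label is skipped."""
    pos = 0
    while pos < len(word):
        label, target = graph[state][word[pos]]
        if pos + len(label) > len(word):
            return state, word[pos:]     # the word ends inside this edge
        state, pos = target, pos + len(label)
    return state, ""


print(slow_find(CDAWG, "I", "gtaa"))   # ('S', 'a')
print(fast_find(CDAWG, "I", "gtaa"))   # ('S', 'a')
print(slow_find(CDAWG, "I", "gtc"))    # ('I', 'gt'): mismatch on the third letter
print(fast_find(CDAWG, "S", "aac"))    # ('F', '')
```

On a word that is a path both functions agree, but `fast_find` performs a number of steps bounded by the number of edges traversed rather than by the number of letters.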

[Fig. 6. Scheme of the search using suffix links]

After the execution of SlowFind, if a state $v$ is created, we have to compute its suffix link. Let $\gamma$ be the label of the transition starting at $q$ and ending at $v$. To compute the suffix link, the algorithm goes through the path having label $\gamma$ from the suffix link of $q$, $s = s_x(q)$. The operation is repeated if necessary. Figure 6 displays a scheme of this search. The thick dashed edges represent paths in the automaton, and the thin dashed edge represents the suffix link of $q$. This search makes it possible to insert, as for the duplication, the suffixes $suf_j$, for $i < j < i + |head_i|$.

To travel along the path, we use again the function FastFind. Let $r$ and $(r, \gamma]$ be the last state and transition encountered by FastFind. If $r$ is the exact locus of $\gamma$, it is the wanted state, and we then set $s_x(v) = r$. Else, if $(r, \gamma]$ is a solid edge, then we have to create a new node $w$. The edge $(r, \gamma]$ is split, it becomes $(r, \gamma, w)$, and we insert the transition $(w, tail_i, F)$. Else, $(r, \gamma]$ is non-solid. Then, it is split and becomes $(r, \gamma, v)$. In the two last cases, since $s_x(v)$ is not found, we run FastFind again with $s_x(r)$ and $\gamma$, and this goes on until $s_x(v)$ is eventually found, that is, when $\gamma = \varepsilon$.

The discussion shows how suffix links are updated to ensure that property P3 is satisfied. The operations do not influence the correctness of the algorithm, sketched in the last section, but yield the following linear-time algorithm. Its time complexity is discussed in the next section.

Linear Algorithm
    $p$ ← I; $i$ ← 0;
    While not end of $x$ Do
        $(q, \gamma)$ ← SlowFind($p$);
        If ($\gamma = \varepsilon$) Then
            insert $(q, tail_i, F)$;
            $s_x(F)$ ← $q$;
            If ($q \neq I$) Then $p$ ← $s_x(q)$ Else $p$ ← I;
        Else
            create $v$ locus of $head_i$ splitting $(q, \gamma]$;
            insert $(v, tail_i, F)$;
            $s_x(F)$ ← $v$;
            find $r = s_x(v)$ with FastFind;
            $p$ ← $r$;
        update $i$;
    End While;
    mark terminal states;

5.2 Complexity

Theorem 9. The algorithm that builds the CDAWG of a word $x$ of $\Sigma^*$ can be implemented in time $O(|x|)$ and in space $O(|x| \times card(\Sigma))$ with a transition matrix, or in time $O(|x| \times \log card(\Sigma))$ and in space $O(|x|)$ with adjacency lists.

[Fig. 7. Positions of labels when $suf_i$ is inserted]

Sketch of the proof. It can be proved that each step of the algorithm strictly increases the variable $j$ or the variable $k$ in the generic situation displayed in Figure 7. These variables respectively represent the index of the current suffix being inserted, and a pointer on the text. These variables never decrease. Therefore, the total running time of the algorithm is linear in the length of $x$.

6 Conclusion

We have considered the Compact Directed Acyclic Word Graph, which is an efficient compact data structure to represent all suffixes of a word. There are many data structures representing this set. But this one allows an interesting space gain compared to the well-known DAWG, which is a reference. Indeed, on the one hand, the upper bounds are $|x| + 1$ states and $2|x| - 2$ transitions. This saves $|x|$ states and $|x|$ transitions of the DAWG, which leads to a faster utilisation. On the other hand, experiments on genomic DNA sequences and random strings display a memory space gain of 50% with respect to the DAWG. Moreover, when the size of the alphabet is small, transition matrices do not take more space than adjacency lists, keeping direct access to transitions. Thus, we can construct the data structure of strings twice as large, keeping them in main memory, which is actually important to get efficient treatments.

This work shows that the CDAWG can be constructed directly. The algorithm is linear in the length of the text. Of course, it is easier to compute, by reduction, the CDAWG from the DAWG. On the contrary, our algorithm saves time and space simultaneously.

References

1. A. Anderson and S. Nilsson. Efficient implementation of suffix trees. Software, Practice and Experience, 25(2):129-141, Feb.
2. A. Apostolico. The myriad virtues of subword trees. In A. Apostolico & Z. Galil, editors, Combinatorial Algorithms on Words, pages 85-95. Springer-Verlag.
3. A. Blumer, J. Blumer, D. Haussler, A. Ehrenfeucht, M.T. Chen, and J. Seiferas. The smallest automaton recognizing the subwords of a text. Theoret. Comput. Sci., 40:31-55.
4. A. Blumer, J. Blumer, D. Haussler, and R. McConnell. Complete inverted files for efficient text retrieval and analysis. Journal of the Association for Computing Machinery, 34(3):578-595, July.
5. A. Blumer, D. Haussler, and A. Ehrenfeucht. Average sizes of suffix trees and dawgs. Discrete Applied Mathematics, 24:37-45.
6. B. Clift, D. Haussler, R. McDonnell, T.D. Schneider, and G.D. Stormo. Sequence landscapes. Nucleic Acids Research, 4(1):141-158.
7. M. Crochemore. Transducers and repetitions. Theor. Comp. Sci., 45:63-86.
8. M. Crochemore and W. Rytter. Text Algorithms, chapter 5-6, pages 73-130. Oxford University Press, New York.
9. R. W. Irving. Suffix binary search trees. Technical report, Computing Science Department, University of Glasgow, April.
10. J. Kärkkäinen. Suffix cactus: a cross between suffix tree and suffix array. CPM, 937:191-204, July.
11. C. Lefevre and J-E. Ikeda. The position end-set tree: A small automaton for word recognition in biological sequences. CABIOS, 9(3):343-348.
12. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935-948, Oct.
13. E. McCreight. A space-economical suffix tree construction algorithm. Journal of the ACM, 23(2):262-272, Apr.
14. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14:249-260.


More information

18.06 Problem Set 4 Due Wednesday, Oct. 11, 2006 at 4:00 p.m. in 2-106

18.06 Problem Set 4 Due Wednesday, Oct. 11, 2006 at 4:00 p.m. in 2-106 8. Problem Set Due Wenesy, Ot., t : p.m. in - Problem Mony / Consier the eight vetors 5, 5, 5,..., () List ll of the one-element, linerly epenent sets forme from these. (b) Wht re the two-element, linerly

More information

Linear Algebra Introduction

Linear Algebra Introduction Introdution Wht is Liner Alger out? Liner Alger is rnh of mthemtis whih emerged yers k nd ws one of the pioneer rnhes of mthemtis Though, initilly it strted with solving of the simple liner eqution x +

More information

General Suffix Automaton Construction Algorithm and Space Bounds

General Suffix Automaton Construction Algorithm and Space Bounds Generl Suffix Automton Constrution Algorithm nd Spe Bounds Mehryr Mohri,, Pedro Moreno, Eugene Weinstein, Cournt Institute of Mthemtil Sienes 251 Merer Street, New York, NY 10012. Google Reserh 76 Ninth

More information

Hyers-Ulam stability of Pielou logistic difference equation

Hyers-Ulam stability of Pielou logistic difference equation vilble online t wwwisr-publitionsom/jns J Nonliner Si ppl, 0 (207, 35 322 Reserh rtile Journl Homepge: wwwtjnsom - wwwisr-publitionsom/jns Hyers-Ulm stbility of Pielou logisti differene eqution Soon-Mo

More information

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of:

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of: 22: Union Fin CS 473u - Algorithms - Spring 2005 April 14, 2005 1 Union-Fin We wnt to mintin olletion of sets, uner the opertions of: 1. MkeSet(x) - rete set tht ontins the single element x. 2. Fin(x)

More information

Discrete Structures Lecture 11

Discrete Structures Lecture 11 Introdution Good morning. In this setion we study funtions. A funtion is mpping from one set to nother set or, perhps, from one set to itself. We study the properties of funtions. A mpping my not e funtion.

More information

Linear choosability of graphs

Linear choosability of graphs Liner hoosility of grphs Louis Esperet, Mikel Montssier, André Rspud To ite this version: Louis Esperet, Mikel Montssier, André Rspud. Liner hoosility of grphs. Stefn Felsner. 2005 Europen Conferene on

More information

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version A Lower Bound for the Length of Prtil Trnsversl in Ltin Squre, Revised Version Pooy Htmi nd Peter W. Shor Deprtment of Mthemtil Sienes, Shrif University of Tehnology, P.O.Bo 11365-9415, Tehrn, Irn Deprtment

More information

, g. Exercise 1. Generator polynomials of a convolutional code, given in binary form, are g. Solution 1.

, g. Exercise 1. Generator polynomials of a convolutional code, given in binary form, are g. Solution 1. Exerise Genertor polynomils of onvolutionl ode, given in binry form, re g, g j g. ) Sketh the enoding iruit. b) Sketh the stte digrm. ) Find the trnsfer funtion T. d) Wht is the minimum free distne of

More information

2.4 Theoretical Foundations

2.4 Theoretical Foundations 2 Progrmming Lnguge Syntx 2.4 Theoretil Fountions As note in the min text, snners n prsers re se on the finite utomt n pushown utomt tht form the ottom two levels of the Chomsky lnguge hierrhy. At eh level

More information

Part 4. Integration (with Proofs)

Part 4. Integration (with Proofs) Prt 4. Integrtion (with Proofs) 4.1 Definition Definition A prtition P of [, b] is finite set of points {x 0, x 1,..., x n } with = x 0 < x 1

More information

Hybrid Systems Modeling, Analysis and Control

Hybrid Systems Modeling, Analysis and Control Hyrid Systems Modeling, Anlysis nd Control Rdu Grosu Vienn University of Tehnology Leture 5 Finite Automt s Liner Systems Oservility, Rehility nd More Miniml DFA re Not Miniml NFA (Arnold, Diky nd Nivt

More information

Lecture Notes No. 10

Lecture Notes No. 10 2.6 System Identifition, Estimtion, nd Lerning Leture otes o. Mrh 3, 26 6 Model Struture of Liner ime Invrint Systems 6. Model Struture In representing dynmil system, the first step is to find n pproprite

More information

Intermediate Math Circles Wednesday 17 October 2012 Geometry II: Side Lengths

Intermediate Math Circles Wednesday 17 October 2012 Geometry II: Side Lengths Intermedite Mth Cirles Wednesdy 17 Otoer 01 Geometry II: Side Lengths Lst week we disussed vrious ngle properties. As we progressed through the evening, we proved mny results. This week, we will look t

More information

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS

The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 2 MODULE, SPRING SEMESTER MACHINES AND THEIR LANGUAGES ANSWERS The University of ottinghm SCHOOL OF COMPUTR SCIC A LVL 2 MODUL, SPRIG SMSTR 2015 2016 MACHIS AD THIR LAGUAGS ASWRS Time llowed TWO hours Cndidtes my omplete the front over of their nswer ook nd sign their

More information

= state, a = reading and q j

= state, a = reading and q j 4 Finite Automt CHAPTER 2 Finite Automt (FA) (i) Derterministi Finite Automt (DFA) A DFA, M Q, q,, F, Where, Q = set of sttes (finite) q Q = the strt/initil stte = input lphet (finite) (use only those

More information

(a) A partition P of [a, b] is a finite subset of [a, b] containing a and b. If Q is another partition and P Q, then Q is a refinement of P.

(a) A partition P of [a, b] is a finite subset of [a, b] containing a and b. If Q is another partition and P Q, then Q is a refinement of P. Chpter 7: The Riemnn Integrl When the derivtive is introdued, it is not hrd to see tht the it of the differene quotient should be equl to the slope of the tngent line, or when the horizontl xis is time

More information

NON-DETERMINISTIC FSA

NON-DETERMINISTIC FSA Tw o types of non-determinism: NON-DETERMINISTIC FS () Multiple strt-sttes; strt-sttes S Q. The lnguge L(M) ={x:x tkes M from some strt-stte to some finl-stte nd ll of x is proessed}. The string x = is

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

INTEGRATION. 1 Integrals of Complex Valued functions of a REAL variable

INTEGRATION. 1 Integrals of Complex Valued functions of a REAL variable INTEGRATION NOTE: These notes re supposed to supplement Chpter 4 of the online textbook. 1 Integrls of Complex Vlued funtions of REAL vrible If I is n intervl in R (for exmple I = [, b] or I = (, b)) nd

More information

More Properties of the Riemann Integral

More Properties of the Riemann Integral More Properties of the Riemnn Integrl Jmes K. Peterson Deprtment of Biologil Sienes nd Deprtment of Mthemtil Sienes Clemson University Februry 15, 2018 Outline More Riemnn Integrl Properties The Fundmentl

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Discrete Structures, Test 2 Monday, March 28, 2016 SOLUTIONS, VERSION α

Discrete Structures, Test 2 Monday, March 28, 2016 SOLUTIONS, VERSION α Disrete Strutures, Test 2 Mondy, Mrh 28, 2016 SOLUTIONS, VERSION α α 1. (18 pts) Short nswer. Put your nswer in the ox. No prtil redit. () Consider the reltion R on {,,, d with mtrix digrph of R.. Drw

More information

Comparing the Pre-image and Image of a Dilation

Comparing the Pre-image and Image of a Dilation hpter Summry Key Terms Postultes nd Theorems similr tringles (.1) inluded ngle (.2) inluded side (.2) geometri men (.) indiret mesurement (.6) ngle-ngle Similrity Theorem (.2) Side-Side-Side Similrity

More information

Introduction to Olympiad Inequalities

Introduction to Olympiad Inequalities Introdution to Olympid Inequlities Edutionl Studies Progrm HSSP Msshusetts Institute of Tehnology Snj Simonovikj Spring 207 Contents Wrm up nd Am-Gm inequlity 2. Elementry inequlities......................

More information

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!)

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!) CMSC 330: Orgniztion of Progrmming Lnguges DFAs, nd NFAs, nd Regexps (Oh my!) CMSC330 Spring 2018 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All

More information

] dx (3) = [15x] 2 0

] dx (3) = [15x] 2 0 Leture 6. Double Integrls nd Volume on etngle Welome to Cl IV!!!! These notes re designed to be redble nd desribe the w I will eplin the mteril in lss. Hopefull the re thorough, but it s good ide to hve

More information

CS375: Logic and Theory of Computing

CS375: Logic and Theory of Computing CS375: Logic nd Theory of Computing Fuhu (Frnk) Cheng Deprtment of Computer Science University of Kentucky 1 Tble of Contents: Week 1: Preliminries (set lgebr, reltions, functions) (red Chpters 1-4) Weeks

More information

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA Common intervls of genomes Mthieu Rffinot CNRS LIF Context: omprtive genomis. set of genomes prtilly/totlly nnotte Informtive group of genes or omins? Ex: COG tse Mny iffiulties! iology Wht re two similr

More information

Line Integrals and Entire Functions

Line Integrals and Entire Functions Line Integrls nd Entire Funtions Defining n Integrl for omplex Vlued Funtions In the following setions, our min gol is to show tht every entire funtion n be represented s n everywhere onvergent power series

More information

Where did dynamic programming come from?

Where did dynamic programming come from? Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/1526-5463-2002-50-01-0048.pdf

More information

arxiv: v1 [math.ca] 21 Aug 2018

arxiv: v1 [math.ca] 21 Aug 2018 rxiv:1808.07159v1 [mth.ca] 1 Aug 018 Clulus on Dul Rel Numbers Keqin Liu Deprtment of Mthemtis The University of British Columbi Vnouver, BC Cnd, V6T 1Z Augest, 018 Abstrt We present the bsi theory of

More information

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic

Chapter 3. Vector Spaces. 3.1 Images and Image Arithmetic Chpter 3 Vetor Spes In Chpter 2, we sw tht the set of imges possessed numer of onvenient properties. It turns out tht ny set tht possesses similr onvenient properties n e nlyzed in similr wy. In liner

More information

Chapter 4 State-Space Planning

Chapter 4 State-Space Planning Leture slides for Automted Plnning: Theory nd Prtie Chpter 4 Stte-Spe Plnning Dn S. Nu CMSC 722, AI Plnning University of Mrylnd, Spring 2008 1 Motivtion Nerly ll plnning proedures re serh proedures Different

More information

8 THREE PHASE A.C. CIRCUITS

8 THREE PHASE A.C. CIRCUITS 8 THREE PHSE.. IRUITS The signls in hpter 7 were sinusoidl lternting voltges nd urrents of the so-lled single se type. n emf of suh type n e esily generted y rotting single loop of ondutor (or single winding),

More information

Non Deterministic Automata. Linz: Nondeterministic Finite Accepters, page 51

Non Deterministic Automata. Linz: Nondeterministic Finite Accepters, page 51 Non Deterministic Automt Linz: Nondeterministic Finite Accepters, pge 51 1 Nondeterministic Finite Accepter (NFA) Alphbet ={} q 1 q2 q 0 q 3 2 Nondeterministic Finite Accepter (NFA) Alphbet ={} Two choices

More information

Electromagnetism Notes, NYU Spring 2018

Electromagnetism Notes, NYU Spring 2018 Eletromgnetism Notes, NYU Spring 208 April 2, 208 Ation formultion of EM. Free field desription Let us first onsider the free EM field, i.e. in the bsene of ny hrges or urrents. To tret this s mehnil system

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design nd Anlysis LECTURE 5 Supplement Greedy Algorithms Cont d Minimizing lteness Ching (NOT overed in leture) Adm Smith 9/8/10 A. Smith; sed on slides y E. Demine, C. Leiserson, S. Rskhodnikov,

More information

Bisimulation, Games & Hennessy Milner logic

Bisimulation, Games & Hennessy Milner logic Bisimultion, Gmes & Hennessy Milner logi Leture 1 of Modelli Mtemtii dei Proessi Conorrenti Pweł Soboiński Univeristy of Southmpton, UK Bisimultion, Gmes & Hennessy Milner logi p.1/32 Clssil lnguge theory

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design nd Anlysis LECTURE 8 Mx. lteness ont d Optiml Ching Adm Smith 9/12/2008 A. Smith; sed on slides y E. Demine, C. Leiserson, S. Rskhodnikov, K. Wyne Sheduling to Minimizing Lteness Minimizing

More information

A Study on the Properties of Rational Triangles

A Study on the Properties of Rational Triangles Interntionl Journl of Mthemtis Reserh. ISSN 0976-5840 Volume 6, Numer (04), pp. 8-9 Interntionl Reserh Pulition House http://www.irphouse.om Study on the Properties of Rtionl Tringles M. Q. lm, M.R. Hssn

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 )

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 ) Neessry n suient onitions for some two vrile orthogonl esigns in orer 44 C. Koukouvinos, M. Mitrouli y, n Jennifer Seerry z Deite to Professor Anne Penfol Street Astrt We give new lgorithm whih llows us

More information

Metodologie di progetto HW Technology Mapping. Last update: 19/03/09

Metodologie di progetto HW Technology Mapping. Last update: 19/03/09 Metodologie di progetto HW Tehnology Mpping Lst updte: 19/03/09 Tehnology Mpping 2 Tehnology Mpping Exmple: t 1 = + b; t 2 = d + e; t 3 = b + d; t 4 = t 1 t 2 + fg; t 5 = t 4 h + t 2 t 3 ; F = t 5 ; t

More information

Compression of Palindromes and Regularity.

Compression of Palindromes and Regularity. Compression of Plinromes n Regulrity. Kyoko Shikishim-Tsuji Center for Lierl Arts Eution n Reserh Tenri University 1 Introution In [1], property of likstrem t t view of tse is isusse n it is shown tht

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Regular languages refresher

Regular languages refresher Regulr lnguges refresher 1 Regulr lnguges refresher Forml lnguges Alphet = finite set of letters Word = sequene of letter Lnguge = set of words Regulr lnguges defined equivlently y Regulr expressions Finite-stte

More information

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies

State space systems analysis (continued) Stability. A. Definitions A system is said to be Asymptotically Stable (AS) when it satisfies Stte spce systems nlysis (continued) Stbility A. Definitions A system is sid to be Asymptoticlly Stble (AS) when it stisfies ut () = 0, t > 0 lim xt () 0. t A system is AS if nd only if the impulse response

More information

Chapter 2 Finite Automata

Chapter 2 Finite Automata Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministi Finite utomt The Power of Guessing Tuesdy, Otoer 4, 2 Reding: Sipser.2 (first prt); Stoughton 3.3 3.5 S235 Lnguges nd utomt eprtment of omputer Siene Wellesley ollege Finite utomton (F)

More information

1 Online Learning and Regret Minimization

1 Online Learning and Regret Minimization 2.997 Decision-Mking in Lrge-Scle Systems My 10 MIT, Spring 2004 Hndout #29 Lecture Note 24 1 Online Lerning nd Regret Minimiztion In this lecture, we consider the problem of sequentil decision mking in

More information

Co-ordinated s-convex Function in the First Sense with Some Hadamard-Type Inequalities

Co-ordinated s-convex Function in the First Sense with Some Hadamard-Type Inequalities Int. J. Contemp. Mth. Sienes, Vol. 3, 008, no. 3, 557-567 Co-ordinted s-convex Funtion in the First Sense with Some Hdmrd-Type Inequlities Mohmmd Alomri nd Mslin Drus Shool o Mthemtil Sienes Fulty o Siene

More information

6.5 Improper integrals

6.5 Improper integrals Eerpt from "Clulus" 3 AoPS In. www.rtofprolemsolving.om 6.5. IMPROPER INTEGRALS 6.5 Improper integrls As we ve seen, we use the definite integrl R f to ompute the re of the region under the grph of y =

More information

Math 32B Discussion Session Week 8 Notes February 28 and March 2, f(b) f(a) = f (t)dt (1)

Math 32B Discussion Session Week 8 Notes February 28 and March 2, f(b) f(a) = f (t)dt (1) Green s Theorem Mth 3B isussion Session Week 8 Notes Februry 8 nd Mrh, 7 Very shortly fter you lerned how to integrte single-vrible funtions, you lerned the Fundmentl Theorem of lulus the wy most integrtion

More information

Lecture 6: Coding theory

Lecture 6: Coding theory Leture 6: Coing theory Biology 429 Crl Bergstrom Ferury 4, 2008 Soures: This leture loosely follows Cover n Thoms Chpter 5 n Yeung Chpter 3. As usul, some of the text n equtions re tken iretly from those

More information

Lecture Summaries for Multivariable Integral Calculus M52B

Lecture Summaries for Multivariable Integral Calculus M52B These leture summries my lso be viewed online by liking the L ion t the top right of ny leture sreen. Leture Summries for Multivrible Integrl Clulus M52B Chpter nd setion numbers refer to the 6th edition.

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Siene Deprtment Compiler Design Spring 7 Lexil Anlysis Smple Exerises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sienes Institute 47 Admirlty Wy, Suite

More information

Fast index for approximate string matching

Fast index for approximate string matching Fst index for pproximte string mthing Dekel Tsur Astrt We present n index tht stores text of length n suh tht given pttern of length m, ll the sustrings of the text tht re within Hmming distne (or edit

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

A Mathematical Model for Unemployment-Taking an Action without Delay

A Mathematical Model for Unemployment-Taking an Action without Delay Advnes in Dynmil Systems nd Applitions. ISSN 973-53 Volume Number (7) pp. -8 Reserh Indi Publitions http://www.ripublition.om A Mthemtil Model for Unemployment-Tking n Ation without Dely Gulbnu Pthn Diretorte

More information

Magnetically Coupled Coil

Magnetically Coupled Coil Mgnetilly Coupled Ciruits Overview Mutul Indutne Energy in Coupled Coils Liner Trnsformers Idel Trnsformers Portlnd Stte University ECE 22 Mgnetilly Coupled Ciruits Ver..3 Mgnetilly Coupled Coil i v L

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

ANALYSIS AND MODELLING OF RAINFALL EVENTS

ANALYSIS AND MODELLING OF RAINFALL EVENTS Proeedings of the 14 th Interntionl Conferene on Environmentl Siene nd Tehnology Athens, Greee, 3-5 Septemer 215 ANALYSIS AND MODELLING OF RAINFALL EVENTS IOANNIDIS K., KARAGRIGORIOU A. nd LEKKAS D.F.

More information

Engr354: Digital Logic Circuits

Engr354: Digital Logic Circuits Engr354: Digitl Logi Ciruits Chpter 4: Logi Optimiztion Curtis Nelson Logi Optimiztion In hpter 4 you will lern out: Synthesis of logi funtions; Anlysis of logi iruits; Tehniques for deriving minimum-ost

More information

LIP. Laboratoire de l Informatique du Parallélisme. Ecole Normale Supérieure de Lyon

LIP. Laboratoire de l Informatique du Parallélisme. Ecole Normale Supérieure de Lyon LIP Lortoire de l Informtique du Prllélisme Eole Normle Supérieure de Lyon Institut IMAG Unité de reherhe ssoiée u CNRS n 1398 One-wy Cellulr Automt on Cyley Grphs Zsuzsnn Rok Mrs 1993 Reserh Report N

More information

CS 491G Combinatorial Optimization Lecture Notes

CS 491G Combinatorial Optimization Lecture Notes CS 491G Comintoril Optimiztion Leture Notes Dvi Owen July 30, August 1 1 Mthings Figure 1: two possile mthings in simple grph. Definition 1 Given grph G = V, E, mthing is olletion of eges M suh tht e i,

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute Victor Admchik Dnny Sletor Gret Theoreticl Ides In Computer Science CS 5-25 Spring 2 Lecture 2 Mr 3, 2 Crnegie Mellon University Deterministic Finite Automt Finite Automt A mchine so simple tht you cn

More information

Section 1.3 Triangles

Section 1.3 Triangles Se 1.3 Tringles 21 Setion 1.3 Tringles LELING TRINGLE The line segments tht form tringle re lled the sides of the tringle. Eh pir of sides forms n ngle, lled n interior ngle, nd eh tringle hs three interior

More information

AP Calculus BC Chapter 8: Integration Techniques, L Hopital s Rule and Improper Integrals

AP Calculus BC Chapter 8: Integration Techniques, L Hopital s Rule and Improper Integrals AP Clulus BC Chpter 8: Integrtion Tehniques, L Hopitl s Rule nd Improper Integrls 8. Bsi Integrtion Rules In this setion we will review vrious integrtion strtegies. Strtegies: I. Seprte the integrnd into

More information

Data Structures and Algorithm. Xiaoqing Zheng

Data Structures and Algorithm. Xiaoqing Zheng Dt Strutures nd Algorithm Xioqing Zheng zhengxq@fudn.edu.n String mthing prolem Pttern P ours with shift s in text T (or, equivlently, tht pttern P ours eginning t position s + in text T) if T[s +... s

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

Learning Partially Observable Markov Models from First Passage Times

Learning Partially Observable Markov Models from First Passage Times Lerning Prtilly Oservle Mrkov s from First Pssge s Jérôme Cllut nd Pierre Dupont Europen Conferene on Mhine Lerning (ECML) 8 Septemer 7 Outline. FPT in models nd sequenes. Prtilly Oservle Mrkov s (POMMs).

More information

Chem Homework 11 due Monday, Apr. 28, 2014, 2 PM

Chem Homework 11 due Monday, Apr. 28, 2014, 2 PM Chem 44 - Homework due ondy, pr. 8, 4, P.. . Put this in eq 8.4 terms: E m = m h /m e L for L=d The degenery in the ring system nd the inresed sping per level (4x bigger) mkes the sping between the HOO

More information

How to simulate Turing machines by invertible one-dimensional cellular automata

How to simulate Turing machines by invertible one-dimensional cellular automata How to simulte Turing mchines by invertible one-dimensionl cellulr utomt Jen-Christophe Dubcq Déprtement de Mthémtiques et d Informtique, École Normle Supérieure de Lyon, 46, llée d Itlie, 69364 Lyon Cedex

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 Automt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Problem (II) Chpter II.6.: Push Down Automt Remrk: This mteril is no longer tught nd not directly exm relevnt Anton Setzer (Bsed

More information

Mid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours

Mid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours Mi-Term Exmintion - Spring 0 Mthemtil Progrmming with Applitions to Eonomis Totl Sore: 5; Time: hours. Let G = (N, E) e irete grph. Define the inegree of vertex i N s the numer of eges tht re oming into

More information

Generalization of 2-Corner Frequency Source Models Used in SMSIM

Generalization of 2-Corner Frequency Source Models Used in SMSIM Generliztion o 2-Corner Frequeny Soure Models Used in SMSIM Dvid M. Boore 26 Mrh 213, orreted Figure 1 nd 2 legends on 5 April 213, dditionl smll orretions on 29 My 213 Mny o the soure spetr models ville

More information

Lossless Compression Lossy Compression

Lossless Compression Lossy Compression Administrivi CSE 39 Introdution to Dt Compression Spring 23 Leture : Introdution to Dt Compression Entropy Prefix Codes Instrutor Prof. Alexnder Mohr mohr@s.sunys.edu offie hours: TBA We http://mnl.s.sunys.edu/lss/se39/24-fll/

More information

Lecture 1 - Introduction and Basic Facts about PDEs

Lecture 1 - Introduction and Basic Facts about PDEs * 18.15 - Introdution to PDEs, Fll 004 Prof. Gigliol Stffilni Leture 1 - Introdution nd Bsi Fts bout PDEs The Content of the Course Definition of Prtil Differentil Eqution (PDE) Liner PDEs VVVVVVVVVVVVVVVVVVVV

More information

Trigonometry and Constructive Geometry

Trigonometry and Constructive Geometry Trigonometry nd Construtive Geometry Trining prolems for M2 2018 term 1 Ted Szylowie tedszy@gmil.om 1 Leling geometril figures 1. Prtie writing Greek letters. αβγδɛθλµπψ 2. Lel the sides, ngles nd verties

More information

Regular expressions, Finite Automata, transition graphs are all the same!!

Regular expressions, Finite Automata, transition graphs are all the same!! CSI 3104 /Winter 2011: Introduction to Forml Lnguges Chpter 7: Kleene s Theorem Chpter 7: Kleene s Theorem Regulr expressions, Finite Automt, trnsition grphs re ll the sme!! Dr. Neji Zgui CSI3104-W11 1

More information

Scientific notation is a way of expressing really big numbers or really small numbers.

Scientific notation is a way of expressing really big numbers or really small numbers. Scientific Nottion (Stndrd form) Scientific nottion is wy of expressing relly big numbers or relly smll numbers. It is most often used in scientific clcultions where the nlysis must be very precise. Scientific

More information