
Title: Studies on Efficient Index Construction for Multiple and Repetitive Texts
Author(s): 髙木, 拓也
Issue Date:
DOI: /doctoral.k13077
Doc URL:
Type: theses (doctoral)
File Information: Takuya_Takagi.pdf
Instructions for use
Hokkaido University Collection of Scholarly and Academic Papers

Studies on Efficient Index Construction for Multiple and Repetitive Texts

Takuya Takagi

January 2018

Division of Computer Science and Information Technology
Graduate School of Information Science and Technology
Hokkaido University


Abstract

The text indexing problem is one of the fundamental problems in computer science, and the aim is to construct an efficient data structure that answers queries such as text pattern matching. For the last decades, there has been an increasing amount of multiple texts, such as data generated from multiple sensors, and of repetitive texts, such as genome sequence collections. For example, the GeoLife Project collects trajectories from GPS loggers that have a variety of sampling rates. These trajectories were recorded every 1 to 5 seconds or every 5 to 10 meters per point. For another example, the 1000 Genomes Project collects human genomes from various groups. Since the genomes are similar to each other, the same substructures appear repeatedly in this genome database. These projects aim at data analysis, information retrieval, and data mining for text information. For pattern matching, which is the most fundamental query for texts, we can answer queries by using basic text pattern matching algorithms such as the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore (BM) algorithm. Since these algorithms scan the texts for each query, each query requires at least linear time in the database size. In order to quickly process these data, preprocessing and indexing are important. For example, the suffix tree, one of the basic text indexes, can support pattern matching in time linear in the pattern length. Therefore, building an efficient index structure is the key to processing these large amounts of text information. In this thesis, we show efficient index construction algorithms for text data. For multiple texts and repetitive texts, there are several problems with indexing.

Since data grow constantly for multiple sensor data such as GPS trajectories, it is necessary for the index to support online construction for multiple texts. For repetitive texts, that is, similar text collections such as genome sequences, we should be able to build an index of a more compressed size. In order to solve these problems, we propose several new index structures and construction algorithms. In particular, this thesis deals with speeding up construction and operations of indexes, online construction of indexes for multiple texts, and construction of compressed indexes for texts including long repetitions. In Chapter 3, we propose a faster version of labeled trees (compact tries) called packed compact tries, by using a bit-parallel method. By doing this, we show faster construction of text indexes such as suffix trees and faster operations such as prefix search, insertion, and deletion. Since the compact trie is a widely used data structure, we can speed up several algorithms by using packed compact tries. In particular, we show that LZ-double factorization, which is one kind of text compression algorithm, is sped up. In Chapter 4, we first define the fully-online construction problem, a setting that allows a new input symbol to be added to an arbitrary string of the set of input strings. To solve this problem, we first show a fully-online construction algorithm of a DAG index called the directed acyclic word graph (DAWG). We also propose a fully-online construction algorithm for the suffix tree using the similarity between DAWGs and suffix trees. In Chapter 5, we propose a self-indexing method by combining an index called the compact directed acyclic word graph (CDAWG) with grammar compression, which is one of the compression methods. When the input text is compressible, the index can be held in a size smaller than the original text. In Chapter 6, we give conclusions and future work. Overall, we studied efficient algorithms for text index construction in this thesis.

Contents

1 Introduction
  1.1 Background
  1.2 Research goals
  1.3 Summary of the results
  1.4 Contributions of this thesis
2 Preliminaries
  2.1 Notations on Strings
  2.2 Notations on graphical indexes
  2.3 Suffix tries
  2.4 Suffix trees
  2.5 Directed acyclic word graphs (DAWGs)
  2.6 Duality of suffix trees and DAWGs
  2.7 Compact directed acyclic word graphs (CDAWGs)
3 Packed Compact Tries
  3.1 Background
    3.1.1 Related work
  3.2 Preliminaries
    3.2.1 Compact tries
    3.2.2 Dynamic predecessor data structures
  3.3 Packed dynamic compact tries
    3.3.1 Micro dynamic compact tries for short strings
    3.3.2 Packed dynamic compact tries for long strings
    3.3.3 Micro trie decomposition
    3.3.4 Speeding-up with hashing
  3.4 Applications to online string processing
  3.5 Preliminary experiments
  3.6 Conclusions of Chapter 3
4 Fully-online Construction of Suffix Trees for Multiple Texts
  4.1 Background
    4.1.1 Related work
  4.2 Preliminaries
    4.2.1 Suffix trees and DAWGs for multiple texts
    4.2.2 Fully-online text collection
  4.3 Fully-online version of DAWG and Weiner's suffix tree algorithm
    4.3.1 Semi-online construction of Weiner's suffix trees and DAWGs
    4.3.2 Fully-online construction of Weiner's suffix trees and DAWGs
  4.4 Fully-online version of Ukkonen's suffix tree algorithm
    4.4.1 Semi-online left-to-right suffix tree construction
    4.4.2 Difficulties in fully-online left-to-right suffix tree construction
    4.4.3 Fully-online left-to-right suffix tree algorithms
  4.5 Conclusions of Chapter 4
5 Linear-size Compact Directed Acyclic Word Graphs
  5.1 Background
  5.2 Preliminaries
    5.2.1 LSTrie
    5.2.2 Straight-line programs
  5.3 The proposed data structure: L-CDAWG
    5.3.1 Outline
    5.3.2 Constructing type-2 nodes and edge suffix links
    5.3.3 Construction of the SLP for L-CDAWG
    5.3.4 The main result
  5.4 Conclusions of Chapter 5
6 Conclusions and Future Work
  6.1 Summary of the results
  6.2 Future work


Chapter 1

Introduction

1.1 Background

For the last decades, there has been an increasing amount of unstructured data such as genetic data, logging data, and Web and SNS texts, which have been coined as big data. Most of these unstructured data are available in the form of text information. Therefore, there are demands for algorithms and data structures that can efficiently handle these big unstructured data. Multiple growing texts and repetitive texts are characteristic features of text data such as logging data and genetic data. Multiple growing texts are a text set in which a new symbol can be appended at the end of any text in the set. There are many text data with this feature in the real world. For example, due to the rapid development of network and sensor technologies, various and enormous stream data are generated from multiple sources such as GPS trajectory data [72], sensor streams, and Twitter streams. These are represented as multiple texts or multiple sequences that are constantly growing. Another feature of textual big data is called repetitiveness [56]. It means a kind of text set consisting of similar texts. For example, genome sequences [21] and versioned document collections such as software repositories are among the highly repetitive texts.

These data contain many long repetitions in the text.

1.2 Research goals

In order to use big data, it is necessary to perform various queries such as data mining and information retrieval. However, because of the massive amount of data, even simple queries such as text pattern matching take too much time. One of the solutions is to preprocess those data and create an index that supports the query in order to answer quickly. Among indexes for texts, those indexes that have all substring information of each text support the most diverse queries. In this thesis, we study efficient index construction for multiple texts and repetitive texts. There are the following demands for the construction of indexes for these text data. First, in order to process a large amount of data at high speed, we want an index that supports fast queries. Second, in order to construct an index for multiple growing texts, we need an index that enables online construction for multiple texts. Finally, to store a large amount of data, the index must be small in size.

1.3 Summary of the results

In this thesis, there are three main results on text indexing, as follows. In Chapter 2, we introduce notations and definitions of some data structures. In Chapter 3, we study acceleration of compact tries using the packed string technique. The dynamic compact trie [42, 65] is a fundamental data structure for storing a set of variable-length strings. It can store a set of k strings over an alphabet Σ with total size n in O(n log n) bits of space. We propose packed compact tries that support faster prefix search queries and update operations of compact tries on the standard word RAM model, while still keeping n log σ + O(k log n) bits of space.

In Chapter 4, we study fully-online construction of DAWGs and suffix trees for multiple texts. Let T = {T_1, ..., T_K} be a collection of texts. By fully-online, we mean that a new character can be appended to any text in T at any time. This is a natural generalization of semi-online construction of indexing data structures for multiple texts, in which, after a new character is appended to the k-th text T_k, the previous texts T_1, ..., T_{k-1} remain static. We propose fully-online algorithms which construct the directed acyclic word graph (DAWG) [14] and the generalized suffix tree (GST) [42] for T in O(n log σ) time and O(n) words of space, where n and σ denote the total length of the texts in T and the alphabet size, respectively. In Chapter 5, we study a compressed index combining CDAWGs and grammar compression. Recent studies have shown that the compact directed acyclic word graph (CDAWG) [15] topology achieves a compressed size for repetitive strings. However, there is no known method for supporting high-speed search within this compressed size without keeping the original input string. The linear-size CDAWG proposed in this thesis achieves the compressed size while supporting search times similar to the original CDAWG. In Chapter 6, we give the summary of this thesis, and then discuss possible future research.

1.4 Contributions of this thesis

We studied three fundamental problems which are necessary when we construct an index that can efficiently handle a massive amount of text data. A versatile text index has three features: high-speed queries, fully-online construction, and small space complexity. Each result of this thesis shows an index which achieves one of the three features. First, as a basis of efficient text indexes allowing high-speed query processing, we proposed an improved data structure supporting high-speed construction and queries by using bit-parallel methods. Secondly, for multiple growing texts like stream data

from multiple sensors, we proposed a construction algorithm of an index in a fully-online manner. Thirdly, for texts that contain many repetitive structures, we proposed an index that can capture the repeating structure and store it in a compressed size. Overall, we studied efficient algorithms for text index construction which are a basis to achieve an index with the three features.

Chapter 2

Preliminaries

In this chapter, we introduce basic definitions and notations on strings, suffix tries, suffix trees, directed acyclic word graphs, and compact directed acyclic word graphs according to [24-26, 42].

2.1 Notations on Strings

Let Σ be an ordered alphabet. Any element of Σ* is called a string. For any string T, let |T| denote its length. Let ε be the empty string, namely, |ε| = 0. If T = XYZ, then X, Y, and Z are called a prefix, substring, and suffix of T, respectively. For any 1 ≤ i ≤ j ≤ |T|, let T[i..j] denote the substring of T that begins at position i and ends at position j in T. For any 1 ≤ i ≤ |T|, let T[i] denote the i-th character of T. For any string T, let Suffix(T) denote the set of suffixes of T, and for any set T of strings, let Suffix(T) denote the set of suffixes of all strings in T; namely, Suffix(T) = ⋃_{T ∈ T} Suffix(T). For any string T, let T̄ denote the reversed string of T, i.e., T̄ = T[|T|] ⋯ T[1]. Let T = {T_1, ..., T_K} be a collection of K texts. For any 1 ≤ k ≤ K, let lrs_T(T_k) be the longest repeating suffix of T_k that occurs at least twice in T.
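To make the notation concrete, the following short Python sketch (ours, purely illustrative and not part of the thesis) spells out Suffix(T) for a text collection, the reversed string, and the longest common prefix of two strings; the function names simply mirror the notation above.

    def suffixes(T):
        """Suffix(T): the set of non-empty suffixes of a single string T."""
        return {T[i:] for i in range(len(T))}

    def suffixes_of_collection(texts):
        """Suffix(T) for a collection T = {T_1, ..., T_K}: the union of the suffix sets."""
        result = set()
        for T in texts:
            result |= suffixes(T)
        return result

    def reverse(T):
        """The reversed string of T, i.e. T[|T|] ... T[1]."""
        return T[::-1]

    def lcp(X, Y):
        """LCP(X, Y): the longest common prefix of X and Y."""
        i = 0
        while i < min(len(X), len(Y)) and X[i] == Y[i]:
            i += 1
        return X[:i]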

For any strings X, Y, LCP(X, Y) denotes the longest common prefix of X and Y. Throughout this thesis, the base of the logarithms will be 2, unless otherwise stated. For any integers i ≤ j, [i, j] denotes the interval {i, i + 1, ..., j}. Our model of computation is the standard word RAM of word size w = log n bits. For simplicity, we assume that w is a multiple of log σ, so that α = log_σ n letters are packed in a single word. Since we can read w bits in constant time, we can read and process α consecutive letters in constant time.

2.2 Notations on graphical indexes

All index structures dealt with in this thesis, such as suffix tries, suffix trees, CDAWGs, linear-size suffix tries (LSTries), and linear-size CDAWGs (L-CDAWGs), are graphical indexes in the sense that an index is a pointer-based structure built on an underlying DAG G_L = (V(L), E(L)) with a root r ∈ V(L) and a mapping lab : E(L) → Σ+ that assigns a label lab(e) to each edge e ∈ E(L). For an edge e = (u, v) ∈ E(L), we denote its end points by e.hi := u and e.lo := v, respectively. The label string of e is lab(e) ∈ Σ+. The string length of e is slen(e) := |lab(e)| ≥ 1. An edge is called atomic if slen(e) = 1, and thus lab(e) ∈ Σ. For a path p = (e_1, ..., e_k) of length k ≥ 1, we extend its end points, label string, and string length by p.hi := e_1.hi, p.lo := e_k.lo, lab(p) := lab(e_1) ⋯ lab(e_k) ∈ Σ+, and slen(p) := slen(e_1) + ⋯ + slen(e_k) ≥ 1, respectively.

2.3 Suffix tries

The suffix trie for a text collection T = {T_1, ..., T_K}, denoted STrie(T), is a trie which represents Suffix(T). The size of STrie(T) is O(n²), where n is the total length of the texts in T. We identify each node v of STrie(T) with the string that v represents.
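As a concrete, if naive, illustration of STrie(T), the sketch below (ours; the dictionary-of-children representation is our own choice, not the thesis's) builds an uncompacted suffix trie for a small text collection by inserting every suffix character by character; it is quadratic in the total text length, matching the O(n²) size bound above.

    class TrieNode:
        def __init__(self):
            self.children = {}  # maps a single character to a child TrieNode

    def build_suffix_trie(texts):
        """Builds STrie(T) for a collection of texts by inserting all suffixes."""
        root = TrieNode()
        for T in texts:
            for i in range(len(T)):
                node = root
                for c in T[i:]:                  # walk/insert the suffix T[i..]
                    node = node.children.setdefault(c, TrieNode())
        return root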

[Figure 2.1: Illustration of STrie(T), STree(T), DAWG(T), and CDAWG(T) for a small example text T. Path compaction turns the suffix trie into the suffix tree and the DAWG into the CDAWG, while minimization turns the suffix trie into the DAWG and the suffix tree into the CDAWG. The solid arrows and broken arrows represent the edges and the suffix links of each data structure, respectively.]

A substring x of a text in T is said to be branching in T if there exist two distinct characters a, b ∈ Σ such that both xa and xb are substrings of some texts in T. Clearly, a node x of STrie(T) is branching iff x is branching in T. For each node av of STrie(T) with a ∈ Σ and v ∈ Σ*, let slink(av) = v. This auxiliary edge slink(av) = v from av to v is called a suffix link. We define the reversed suffix link W_a(v) = av iff slink(av) = v. For any node v and a ∈ Σ, if av is not a substring of the texts in T, then W_a(v) is undefined. By definition, the reversed suffix links on STrie(T) form a rooted tree which coincides with STrie(T̄), the suffix trie for the collection T̄ = {T̄_1, ..., T̄_K} of the reversed texts.
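The branching condition is easy to test directly; the sketch below (ours, for illustration only) checks whether a substring x is branching in a collection by collecting the distinct characters that follow its occurrences.

    def is_branching(texts, x):
        """x is branching in T iff at least two distinct characters extend x in T."""
        extensions = set()
        for T in texts:
            start = 0
            while True:
                pos = T.find(x, start)
                if pos == -1:
                    break
                if pos + len(x) < len(T):        # an occurrence followed by a character
                    extensions.add(T[pos + len(x)])
                start = pos + 1                  # allow overlapping occurrences
        return len(extensions) >= 2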

2.4 Suffix trees

The suffix tree [68] for a text collection T, denoted STree(T), is a compacted trie which represents Suffix(T). STree(T) is obtained by compacting every path of STrie(T) which consists of non-branching internal nodes (see Fig. 2.1). Since every internal node of STree(T) is branching, and since there are at most n leaves in STree(T), the numbers of edges and nodes are O(n). The edge labels of STree(T) are non-empty substrings of some text in T. By representing each edge label x with a triple ⟨k, i, j⟩ of integers s.t. x = T_k[i..j], STree(T) can be stored in O(n) space. We say that any branching (resp. non-branching) substring of T is an explicit node (resp. implicit node) of STree(T). An implicit node x is represented by a triple (v, a, l), called a reference to x, such that v is an explicit ancestor of x, a is the first character of the path from v to x, and l is the length of the path from v to x. A reference (v, a, l) to a node x is called canonical if v is the lowest explicit ancestor of x. For each explicit node av of STree(T) with a ∈ Σ and v ∈ Σ*, let slink(av) = v. For each explicit node v and a ∈ Σ, we also define the reversed suffix link W_a(v) = avx, where x ∈ Σ* is the shortest string such that avx is an explicit node of STree(T). W_a(v) is undefined if av is not a substring of the texts in T. These reversed suffix links are also called Weiner links (or W-links in short) in the literature [16]. A W-link W_a(v) = avx is said to be hard if x = ε, and soft if x ∈ Σ+. Let w_a be a Boolean function such that for any explicit node v and a ∈ Σ, w_a(v) = 1 iff a (soft or hard) W-link W_a(v) exists. Notice that if w_a(v) = 1 for a node v and a ∈ Σ, then w_a(u) = 1 for every ancestor u of v.
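To illustrate the O(n)-space edge representation, the sketch below (an illustrative convention of ours, not the thesis's data layout) stores a suffix tree edge label as a triple (k, i, j) into the text collection and recovers the label string on demand.

    from dataclasses import dataclass

    @dataclass
    class EdgeLabel:
        """An edge label stored as a triple (k, i, j) meaning T_k[i..j], 1-based inclusive."""
        k: int
        i: int
        j: int

        def materialize(self, texts):
            """Recover the label string from the text collection (O(length) time)."""
            return texts[self.k - 1][self.i - 1:self.j]

    # Example: with texts = ["banana"], EdgeLabel(1, 2, 4).materialize(texts) == "ana".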

2.5 Directed acyclic word graphs (DAWGs)

The directed acyclic word graph (DAWG in short) [14, 15] of a text collection T, denoted DAWG(T), is the smallest DAG which represents Suffix(T). DAWG(T) is obtained by merging identical subtrees of STrie(T) connected by the suffix links (see Fig. 2.1). Hence, the label of every edge of DAWG(T) is a single character. The numbers of nodes and edges of DAWG(T) are O(n) [15], and hence DAWG(T) can be stored in O(n) space. DAWG(T) can be defined formally as follows. For any string x, let Epos_T(x) be the set of ending positions of x in the texts in T, i.e., Epos_T(x) = {(k, j) | x = T_k[j − |x| + 1..j], 1 ≤ j ≤ |T_k|, 1 ≤ k ≤ K}. Consider an equivalence relation ≡_T on substrings x, y of texts in T such that x ≡_T y iff Epos_T(x) = Epos_T(y). For any substring x of texts of T, let [x]_T denote its equivalence class w.r.t. ≡_T. There is a one-to-one correspondence between each node v of DAWG(T) and each equivalence class [x]_T, and hence we will identify each node v of DAWG(T) with its corresponding equivalence class [x]_T. Let long([x]_T) denote the longest member of [x]_T. By the definition of the equivalence classes, long([x]_T) is unique for each [x]_T and every member of [x]_T is a suffix of long([x]_T). If x and xa are substrings of some text in T with x ∈ Σ* and a ∈ Σ, then there exists an edge labeled with the character a ∈ Σ from node [x]_T to node [xa]_T. This edge is called primary if |long([x]_T)| + 1 = |long([xa]_T)|, and is called secondary otherwise. For each node [x]_T of DAWG(T) with |x| ≥ 1, let slink([x]_T) = [y]_T, where y is the longest suffix of long([x]_T) which does not belong to [x]_T. Fig. 2.1 gives an example: there, one node's class contains two substrings; the edge that extends the longest member of its source class by one character is primary, while the parallel edge that extends only the shorter member is secondary, and the suffix link of the class points to the class of its longest suffix outside the class.
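The Epos-based definition can be checked directly on a toy text; the following sketch (ours, purely illustrative and far from the efficient constructions the thesis builds on) groups the substrings of a single text by their ending-position sets, which is exactly the node set of DAWG(T) restricted to that text.

    from collections import defaultdict

    def epos(T, x):
        """Ending positions (1-based) of substring x in a single text T."""
        return frozenset(j + len(x) for j in range(len(T) - len(x) + 1)
                         if T[j:j + len(x)] == x)

    def dawg_classes(T):
        """Group all non-empty substrings of T by Epos; each group is a DAWG node."""
        groups = defaultdict(set)
        for i in range(len(T)):
            for j in range(i + 1, len(T) + 1):
                x = T[i:j]
                groups[epos(T, x)].add(x)
        return list(groups.values())

    # Example: for T = "abcab" (our own toy text), the class {"ab", "b"} arises
    # because both substrings end exactly at positions 2 and 5.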

2.6 Duality of suffix trees and DAWGs

There exists a nice duality between suffix trees and DAWGs. To observe this, it is convenient to consider the collection of the reversed texts, each of which begins with a distinct special marker $_i. For ease of notation, let S_k = T̄_k for 1 ≤ k ≤ K and S = {$_1 S_1, ..., $_K S_K} = {$_1 T̄_1, ..., $_K T̄_K}. Then, it is known (c.f. [14, 15, 25]) that the reversed suffix links of DAWG(S) coincide with the suffix tree STree(T) for the original text collection T. This fact can also be observed from the other direction. Namely, the hard (resp. soft) W-links of STree(T) coincide with the primary (resp. secondary) edges of DAWG(S). Intuitively, this duality holds because (1) the reversed suffix links of STrie(S) form STrie(T) (and vice versa), and (2) when we construct DAWG(S) from STrie(S), we merge isomorphic subtrees that are connected by suffix links. During this merging process, the reversed suffix links get compacted and the resulting compacted links form the edges of STree(T). Using this duality, we can immediately show that the total number of hard and soft W-links is linear in the total text length n, since the number of edges of the DAWG is linear in n. This also means that we can easily maintain the Boolean indicator w_a with O(n) space, so that w_a(v) for a given node v and a ∈ Σ can be answered in O(log σ) time (e.g., at each node v we can maintain a BST storing only the characters c s.t. w_c(v) = 1).

2.7 Compact directed acyclic word graphs (CDAWGs)

The compact directed acyclic word graph [15, 26] for a text T, denoted CDAWG(T), is the minimal compact automaton which represents Suffix(T). CDAWG(T) can be obtained from STree(T$) by merging isomorphic subtrees and deleting the associated end-marker $ ∉ Σ. Since CDAWG(T) is an edge-labeled DAG, we represent a directed edge from node u to v with label string x ∈ Σ+ by a triple f = (u, x, v). For any node u, the label strings of the out-going edges from u start with mutually distinct characters. Formally, CDAWG(T) is defined as follows. For any strings x, y, we denote x ≡_L y (resp. x ≡_R y) iff the sets of beginning positions (resp. ending positions) of x and y in T are equal. Let [x]_L (resp. [x]_R) denote the equivalence class of strings w.r.t. ≡_L (resp. ≡_R). All strings that are not substrings of T form a single equivalence class, and in the sequel we will consider only the substrings of T. Let x→ (resp. ←x) denote the longest member of the equivalence class [x]_L (resp. [x]_R). Notice that each member of [x]_L (resp. [x]_R) is a prefix of x→ (resp. a suffix of ←x). Let ←x→ = ←(x→) = (←x)→. We denote x ≡ y iff ←x→ = ←y→, and let [x] denote the equivalence class w.r.t. ≡. The longest member of [x] is ←x→ and we will also denote it by value([x]). We define CDAWG(T) as an edge-labeled DAG (V, E) such that V = {[←x→]_R | x is a substring of T}, and for each node [←x→]_R and each character a ∈ Σ such that ←x→a is a substring of T, E contains the edge ([←x→]_R, α, [←xa→]_R) whose label α ∈ Σ+ is the string satisfying ←x→α = (←x→a)→. The → operator corresponds to compacting non-branching edges (like the conversion from STrie(T) to STree(T)) and the [·]_R operator corresponds to merging isomorphic subtrees of STree(T). For simplicity, we use the notation so that when we refer to a node of CDAWG(T) as [x], this implies x = ←x→ and [x] = [←x→]_R. Let [x] be any node of CDAWG(T) and consider the suffixes of value([x]) which correspond to the suffix tree nodes that are merged when transformed into the CDAWG. We define the suffix link of a node [x] by slink([x]) = [y] iff y is the longest suffix of value([x]) that does not belong to [x]. It is shown that all nodes of CDAWG(T) except the sink correspond to the maximal repeats of T; actually, value([x]) is a maximal repeat in T [58]. Following this fact, one can easily see that the numbers of edges of CDAWG(T) and CDAWG(T̄) coincide with the numbers e^r_T and e^l_T of right- and left-extensions of maximal repeats of T, respectively [9, 58]. By representing each edge label α with a pair (i, j) of integers such that T[i..j] = α, CDAWG(T) can be stored in O(e^r_T log n + n log σ) bits of space.
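As a sanity check on these definitions, the short sketch below (ours, not from the thesis) computes x→ and ←x for a substring of a single text by brute force, i.e. the longest extensions that preserve the set of beginning and ending positions, respectively.

    def occurrences(T, x):
        """1-based beginning positions of x in T (x is assumed to occur in T)."""
        return [i + 1 for i in range(len(T) - len(x) + 1) if T[i:i + len(x)] == x]

    def right_extension(T, x):
        """x->: extend x to the right while its set of beginning positions is unchanged."""
        begins = occurrences(T, x)
        y = x
        while True:
            b = begins[0]
            if b - 1 + len(y) >= len(T):
                return y
            z = T[b - 1:b - 1 + len(y) + 1]     # y extended by one character
            if occurrences(T, z) == begins:
                y = z
            else:
                return y

    def left_extension(T, x):
        """<-x: the symmetric extension to the left, computed via the reversed text."""
        return right_extension(T[::-1], x[::-1])[::-1]

    # Example: in T = "abcab" (our toy text), right_extension(T, "a") == "ab"
    # and left_extension(T, "b") == "ab", so <-a-> = <-b-> = "ab".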


Chapter 3

Packed Compact Tries

In this chapter, we present a new data structure called the packed compact trie (packed c-trie) which stores a set S of k strings of total length n in n log σ + O(k log n) bits of space and supports fast pattern matching queries and updates, where σ is the alphabet size. Assume that α = log_σ n letters are packed in a single machine word on the standard word RAM model, and let f(k, n) denote the query and update times of a dynamic predecessor/successor data structure of our choice which stores k integers from the universe [1, n] in O(k log n) bits of space. Then, given a string of length m, our packed c-tries support pattern matching queries and insert/delete operations in O((m/α) f(k, n)) worst-case time and in O(m/α + f(k, n)) expected time. Our experiments show that our packed c-tries are faster than the standard compact tries (a.k.a. Patricia trees) on real data sets. As an application of our packed c-trie, we show that sparse suffix trees for a string of length n with k sampled positions, such as evenly-spaced and word-delimited sparse suffix trees (i.e., suffix trees over prefix codes for a set of k word suffixes), can be constructed online in O((n/α + k) f(k, n)) worst-case time and O(n/α + k f(k, n)) expected time with n log σ + O(k log n) bits of space. When k = O(n/α), by using the state-of-the-art dynamic predecessor/successor data structures, we obtain sub-linear time construction algorithms using only O(n/α) bits of space in both cases. We also discuss an application of our packed c-tries to online LZD factorization.
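The word-packing assumption above is easy to mimic in software; the sketch below (illustrative only, not the thesis's implementation) packs α = ⌊log_σ n⌋ characters of a small alphabet into one machine-word-sized integer, so that α characters can be compared with a single integer operation.

    import math

    def pack_factor(n, sigma):
        """alpha = floor(log_sigma n): how many characters fit in one (log n)-bit word."""
        return max(1, int(math.log(n, sigma)))

    def pack(chars, sigma, alphabet="abcdefghijklmnopqrstuvwxyz"):
        """Pack a short string into one integer, using ceil(log2 sigma) bits per character."""
        bits = max(1, (sigma - 1).bit_length())
        word = 0
        for c in chars:
            word = (word << bits) | alphabet.index(c)
        return word

    # Comparing two packed words compares up to alpha characters at once:
    # pack("abc", 4) == pack("abc", 4) is a single integer comparison.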

3.1 Background

The trie for a set S of strings of total length n is a classical data structure which occupies O(n log n + n log σ) bits of space and allows for prefix search and insertion/deletion for a given string of length m in O(m log σ) time, where σ is the alphabet size. The compact trie for S is a path-compressed trie where the edges in every non-branching path are merged into a single edge [53]. By representing each edge label by a pair of positions in a string in S, the compact trie can be stored in n log σ + O(k log n) bits of space, where k is the number of strings in S, retaining the same time efficiency for prefix search and insertion/deletion for a given string. Thus, compact tries have widely been used in numerous applications such as dynamic dictionary matching [44], suffix trees [68], sparse suffix trees [47], external string indexes [30], and grammar-based text compression [39]. In this chapter, we show how to accelerate prefix search queries and update operations of compact tries on the standard word RAM model with machine word size w = log n, still keeping n log σ + O(k log n)-bit space usage. A basic idea is to use the packed string matching approach [12], where α = log_σ n consecutive letters are packed in a single word and can be manipulated in O(1) time. In this setting, we can read a given pattern P of length m in O(m/α) time, but, during the traversal of P over a compact trie, there can be at most m branching nodes. Thus, a naive implementation of a compact trie takes O(m/log_σ n + m log σ) = O(m log σ) time even in the packed matching setting. To overcome the above difficulty, we propose how to quickly process long non-branching paths using bit manipulations, and how to quickly process dense branching subtrees using fast predecessor/successor queries and dictionary look-ups. As a result,

we obtain a new compact trie called the packed compact trie (packed c-trie) for a dynamic set S of strings with the following efficiency:

Theorem 1 (main result) Let f(k, n) be the query/update times of an arbitrary dynamic predecessor/successor data structure using O(k log n) bits of space for a dynamic set of k integers from the universe [1, n]. Our packed c-trie stores a set S of k strings of total length n in n log σ + O(k log n) bits of space and supports prefix search and insertion/deletion for a given string of length m in O((m/α) f(k, n)) worst-case time or in O(m/α + f(k, n)) expected time.

Using Beame and Fich's data structure [6] or Willard's y-fast trie [70] as the dynamic predecessor/successor data structure, we obtain the following corollary:

Corollary 2 There exists a packed c-trie for a dynamic set S of strings which uses n log σ + O(k log n) bits of space, and supports prefix search and insert/delete operations for a given string of length m in O((m/α) (log log k)(log log n)/(log log log n)) worst-case time or in O(m/α + log log n) expected time.

Unlike most other (compact) tries, our packed c-trie does not maintain a dictionary or search structure for the children of each node. Instead, we partition our c-trie into ⌈h/α⌉ levels, where h is the length of the longest string in S. Then each subtree of height α, called a micro c-trie, maintains a predecessor/successor dictionary that processes prefix search inside the micro c-trie. A reduction from prefix search to predecessor/successor queries was already considered in an earlier work by Cole et al. [19]; however, their data structure is static. On the other hand, our micro c-tries are dynamic. A similar technique to our packed c-trie was used in the linked dynamic uncompacted trie by Jansson et al. [46]. Our experiments show that our packed c-tries are faster than Patricia trees for both construction and prefix search in almost all data sets we tested.

We show that our packed c-tries can be applied to efficient online construction of evenly sparse suffix trees [47], word suffix trees [45] and their extension [64]. Also, packed c-tries can be used for online computation of the LZ-Double factorization [39] (LZDF), a state-of-the-art online grammar-based text compressor. We show two applications of our packed c-tries in more detail. The first application is online construction of evenly sparse suffix trees [47], word suffix trees [45] and their extension [64]. The existing algorithms for these sparse suffix trees take O(n log σ) worst-case time using n log σ + O(k log n) bits of space, where k is the number of suffixes stored in the output sparse suffix tree. Using our packed c-tries, we achieve O((n/α + k) (log log k)(log log n)/(log log log n)) worst-case construction time and O(n/α + k log log n) expected construction time. The former is sublinear in n when k = O(n/α) and σ = polylog(n); the latter is sublinear in n when k = o(n/log log n) and σ = polylog(n). To achieve these results, we show that in our packed c-trie, prefix searches and insertion operations can be started not only from the root but from any node. This capability is necessary for online sparse suffix tree construction, since during the suffix link traversal we have to insert new leaves from non-root internal nodes. The second application is online computation of the LZ-Double factorization [39] (LZDF), a state-of-the-art online grammar-based text compressor. Goto et al. [39] presented a Patricia-tree based algorithm which computes the LZDF of a given string T of length n in O(k(M + min{k, M} log σ)) worst-case time using O(n log σ) bits of space, where k ≤ n is the number of factors and M ≤ n is the length of the longest factor. Using our packed c-tries, we achieve a good expected performance with O(k(M/α + f(k, n))) time for LZDF.

3.1.1 Related work

Belazzougui et al. [7] proposed a randomized compact trie called the signed dynamic z-fast trie, which stores a dynamic set S of k strings in n log σ + O(k log n) bits of

space. Given a string of length m, the signed dynamic z-fast trie supports prefix search in O(m/α + log m) worst-case time only with high probability, and supports insert/delete operations in O(m/α + log m) expected time only with high probability.¹ On the other hand, our packed c-trie always returns the correct answer for prefix search, and always inserts/deletes a given string correctly, within the bounds stated in Theorem 1 and Corollary 2. Andersson and Thorup [3] proposed the exponential search tree which uses n log σ + O(k log n) bits of space, and supports prefix search and insert/delete operations in O(m + √(log k/log log k)) worst-case time. Each node v of the exponential search tree stores a constant-time look-up dictionary for some children of v and a dynamic predecessor/successor data structure for the other children of v. This implies that, given a string of length m, at most m nodes in the search path for the string must be processed one by one, and hence packing α = log_σ n letters in a single word does not seem to speed up the exponential search tree. Fischer and Gawrychowski's wexponential search tree [33] uses n log σ + O(k log n) bits of space, and supports prefix search and insert/delete operations in O(m + (log log σ)²/log log log σ) worst-case time. When σ = polylog(n), our packed c-trie achieves O(m (log σ/log n) (log log k)(log log n)/(log log log n)) = O(m (log log n)²/(log n log log log n)) = o(1)·O(m) worst-case time, while the wexponential search tree requires O(m + (log log log n)²/log log log log n) time.²

¹ The O(log m) expected bound for insertion/deletion stated in [7] assumes that the prefix search for the string has already been performed.
² For sufficiently long patterns of length m = Θ(n), our packed c-trie achieves worst-case sublinear o(n) time while the wexponential search tree requires O(n) time.

3.2 Preliminaries

3.2.1 Compact tries

Let S = {X_1, ..., X_k} be a set of k non-empty strings of total length n. We consider dynamic data structures for S allowing for fast prefix searches of given patterns over the strings in S, and fast insertion/deletion of strings to/from S. Suppose S is prefix-free. The trie of S is a tree s.t. each edge is labeled by a single letter, the labels of the out-going edges of each node are distinct, and for each X_i ∈ S there is a unique leaf l_i s.t. the path from the root to l_i spells out X_i. The compact trie T_S of S is a path-compressed trie obtained by contracting non-branching paths into single edges. Namely, in T_S, each edge is labeled by a non-empty substring of a string in S, each internal node has at least two children, the out-going edges from each node begin with distinct letters, and each edge label x is encoded by a triple ⟨i, a, b⟩ such that x = X_i[a..b] for some 1 ≤ i ≤ k and 1 ≤ a ≤ b ≤ |X_i|. The length of an edge e, denoted |e|, is the length of its label string. Let root(T_S) denote the root of the compact trie T_S. For any node v, let parent(v) denote its parent. For convenience, let ⊥ be an auxiliary node s.t. parent(root(T_S)) = ⊥. We assume the edge from ⊥ to root(T_S) is labeled by an arbitrary letter. For any node v, let str(v) denote the string obtained by concatenating the edge labels from the root to v. Each node v stores |str(v)|. Let s be a prefix of any string in S. Let v be the shallowest node of T_S such that s is a prefix of str(v) (notice s can be equal to str(v)), and let u = parent(v). The locus of the string s in T_S is a pair ϕ = (e, h), where e is the edge from u to v and h is the offset from u, namely, h = |s| − |str(u)|.³ We extend the str function to loci, so that str(ϕ) = s. The string depth of a locus ϕ is d(ϕ) = |str(ϕ)|.

³ In the literature the locus is represented by (u, c, h), where c is the first letter of the label of e. Since our packed c-trie does not maintain a search structure for branches, we represent the locus directly on e.
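For concreteness, the following sketch (our own illustrative encoding, not the thesis's) shows one way to represent a compact-trie edge label as a triple (i, a, b) into the string set and a locus as an (edge, offset) pair.

    from dataclasses import dataclass

    @dataclass
    class Edge:
        parent: "Node"
        child: "Node"
        i: int                 # the edge label is X_i[a..b], 1-based inclusive
        a: int
        b: int

        def label(self, S):
            return S[self.i - 1][self.a - 1:self.b]

    @dataclass
    class Node:
        depth: int             # |str(v)|
        in_edge: Edge = None   # edge from parent(v) to v (None for the root)

    @dataclass
    class Locus:
        edge: Edge
        offset: int            # h = |s| - |str(parent)|, with 1 <= h <= edge length

        def string_depth(self):
            return self.edge.parent.depth + self.offset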

A string P is recognized by T_S iff there is a locus ϕ with str(ϕ) = P. We consider the following query and operations on dynamic compact tries.

LPS(ϕ, P): Given a locus ϕ in T_S and a pattern string P, it returns the locus ϕ̂ of the string str(ϕ)Q in T_S, where Q is the longest prefix of P for which str(ϕ)Q is recognized by T_S. When ϕ = ((⊥, root(T_S)), 1), the query is known as the longest prefix search for the pattern P in the compact trie.

Insert(ϕ, X): Given a locus ϕ in T_S and a string X, it inserts a new leaf which corresponds to a new string str(ϕ)X ∈ S into the compact trie, from the given locus ϕ. When there is no node at the locus ϕ̂ = LPS(ϕ, X), a new node is created at ϕ̂ as the parent of the leaf. When ϕ = ((⊥, root(T_S)), 1), this is a standard insertion of the string X into T_S.

Delete(X_i): Given a string X_i ∈ S, it deletes the leaf l_i. If the out-degree of the parent v of l_i becomes 1 after the deletion of l_i, then the in-coming and out-going edges of v are merged into a single edge, and v is also deleted.

3.2.2 Dynamic predecessor data structures

For a dynamic set I ⊆ [1, n] of k integers of w = log n bits each, dynamic predecessor data structures (e.g., [6, 7, 71]) efficiently support the predecessor query Pred(X) = max({Y ∈ I | Y ≤ X} ∪ {0}), the successor query Succ(X) = min({Y ∈ I | Y ≥ X} ∪ {n + 1}), and insert/delete operations for I.
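Any dynamic predecessor/successor structure can play the role of f(k, n); as a simple stand-in for experimentation, the sketch below (ours) implements Pred and Succ over a sorted Python list with the bisect module, giving O(log k) queries rather than the bounds of Theorems 3 and 4 below.

    import bisect

    class SortedPredSucc:
        """Pred/Succ over a dynamic set I of integers from [1, n]; O(log k) per operation."""
        def __init__(self, n):
            self.n = n
            self.items = []                       # kept sorted

        def insert(self, x):
            i = bisect.bisect_left(self.items, x)
            if i == len(self.items) or self.items[i] != x:
                self.items.insert(i, x)

        def delete(self, x):
            i = bisect.bisect_left(self.items, x)
            if i < len(self.items) and self.items[i] == x:
                self.items.pop(i)

        def pred(self, x):
            """max({Y in I : Y <= X} U {0})"""
            i = bisect.bisect_right(self.items, x)
            return self.items[i - 1] if i > 0 else 0

        def succ(self, x):
            """min({Y in I : Y >= X} U {n + 1})"""
            i = bisect.bisect_left(self.items, x)
            return self.items[i] if i < len(self.items) else self.n + 1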

Theorem 3 Let f(k, n) be the time complexity for predecessor/successor queries and insert/delete operations of an arbitrary dynamic predecessor/successor data structure which occupies O(k log n) bits of space. Beame and Fich's data structure [6] achieves f(k, n) = O((log log k)(log log n)/(log log log n)) worst-case time.

Theorem 4 Let f(k, n) be the time complexity for predecessor/successor queries and insert/delete operations of an arbitrary dynamic predecessor/successor data structure which occupies O(k log n) bits of space. Willard's y-fast trie [70] achieves f(k, n) = O(log log n) expected time.

3.3 Packed dynamic compact tries

This section presents our new dynamic compact tries called the packed dynamic compact tries (packed c-tries) for a dynamic set S = {X_1, ..., X_k} of k strings of total length n, which achieve the main result in Theorem 1. In the sequel, a string X ∈ Σ* is called short if |X| ≤ α = log_σ n, and is called long if |X| > α.

3.3.1 Micro dynamic compact tries for short strings

In this subsection, we present our data structure for storing short strings. Our input is a dynamic set S = {X_1, ..., X_k} of k strings of total length n, such that |X_i| ≤ α = log_σ n for every 1 ≤ i ≤ k. Hence it holds that k ≤ σ^α = n. For simplicity, we assume for now that |X_i| = α for every 1 ≤ i ≤ k. The general case where S contains strings shorter than α will be explained later in Remark 1. The dynamic data structure for short strings, called the micro c-trie and denoted MT_S, consists of the following:

(i) A dynamic compact trie of height exactly α storing the set S. Let N be the set of internal nodes, and let L = {l_1, ..., l_k} be the set of k leaves such that l_i corresponds to X_i for 1 ≤ i ≤ k. Since every internal node is branching, |N| ≤ k − 1. Every node v of MT_S corresponds to the string str(v) of log n bits. Overall, this compact trie requires n log σ + O(k log n) bits of space (including S).

(ii) A dynamic predecessor/successor data structure D which stores the set S = {X_1, ..., X_k} of strings in O(k log n) bits of space, where each X_i is regarded as a log n-bit integer. D supports predecessor/successor queries and insert/delete operations in f(k, n) time each.

Clearly MT_S requires n log σ + O(k log n) bits of total space. The next lemma shows how to support in O(1) time LCP queries for strings

represented by two given nodes of the dynamic micro c-trie MT_S. This is related to the labeling scheme (e.g., see [1]) which assigns a short label to each node so that later, given the labels of two nodes, the label of the LCA of the nodes can be answered in O(1) time. Although a static tree is considered in the labeling scheme, our micro c-trie is dynamic. Also, our algorithm is much simpler than applying the dynamic LCA data structure [20] to our micro c-tries.

Lemma 1 For any nodes u and v of the dynamic micro c-trie MT_S, we can compute LCP(str(u), str(v)) in O(1) time.

Proof 1 We pad str(u) and/or str(v) with an arbitrary letter c so they become α long each, namely, let P = str(u) c^{α − |str(u)|} and Q = str(v) c^{α − |str(v)|}. We compute the most significant bit (msb) of the XOR of the bit representations of P and Q. Let b be the bit position of the msb, and let z = ⌊(b − 1)/log σ⌋. W.l.o.g. assume |str(u)| ≤ |str(v)|. (1) If z < |str(u)|, then str(u)[1..z] = LCP(str(u), str(v)). In this case, there exists a branching node y such that str(y) = str(u)[1..z], and hence LCP(str(u), str(v)) = str(y). (2) If z ≥ |str(u)|, then str(u) is a prefix of str(v), and hence str(u) = LCP(str(u), str(v)). Since each of P and Q is stored in a single machine word, we can compute the XOR of P and Q in O(1) time. The msb can be computed in O(1) time using the technique of Fredman and Willard [35]. This completes the proof.

On micro c-tries, prefix searches and insertion operations can be started not only from the root but from any node. This is necessary for online sparse suffix tree construction based on Ukkonen's algorithm [65], since during the suffix link traversal we have to insert new leaves from non-root internal nodes.

Theorem 5 The micro c-trie MT_S supports LPS(ϕ, X) queries in O(f(k, n)) time.
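The XOR-and-msb trick of Lemma 1 can be reproduced with Python integers; the sketch below (ours, using bit_length in place of Fredman and Willard's constant-time msb) computes the character-level LCP length of two equal-length packed strings.

    def packed_lcp_length(P, Q, length, bits_per_char):
        """LCP length (in characters) of two strings packed into integers P and Q,
        each holding `length` characters of `bits_per_char` bits."""
        x = P ^ Q
        if x == 0:
            return length                       # the packed strings are identical
        total_bits = length * bits_per_char
        b = total_bits - x.bit_length() + 1     # 1-based msb position from the left
        return (b - 1) // bits_per_char         # z = floor((b - 1) / log sigma)

    # Example with 2-bit characters (sigma = 4): packing "abc" as 0b000110 and
    # "abd" as 0b000111, packed_lcp_length(0b000110, 0b000111, 3, 2) == 2.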

Proof 2 Let P be the prefix of str(ϕ)X of length α, i.e., P = str(ϕ) X[1..α − d(ϕ)]. The case where P is represented by a leaf is easy, and thus, in what follows we focus on the case where P is not represented by a leaf. First, we compute the string depth d = d(ϕ̂) ∈ [0, α]. Observe that d = max{|LCP(P, Pred(P))|, |LCP(P, Succ(P))|}. Given P, we compute Pred(P) and Succ(P) in O(f(k, n)) time. Then, we can compute LCP(P, Pred(P)) in O(1) time by computing the msb of the XOR of the bit representations of P and Pred(P), as in Lemma 1. LCP(P, Succ(P)) can be computed analogously, and thus d = d(ϕ̂) can be computed in O(f(k, n)) time. Second, we locate e = (u, v). See also Fig. 3.1. Let Z = P[1..d]. Let LB = Z c_1^{α − |Z|} and UB = Z c_σ^{α − |Z|} be the lexicographically least and greatest strings of length α with prefix Z, respectively. To locate u in MT_S, we find the leftmost and rightmost leaves X_L and X_R below ϕ̂ by X_L = Succ(LB) and X_R = Pred(UB). Then, the longer one of LCP(X_{L−1}, X_L) and LCP(X_R, X_{R+1}) corresponds to the origin node u of e, and LCP(X_L, X_R) corresponds to the destination node v of e. These LCPs can be computed in O(1) time by Lemma 1. What remains is how to access the nodes u and v representing these strings. In so doing, let $ be a special character that does not appear in any strings in S. For each string Y represented by an internal node of MT_S, we pad $ at the end of Y so its length becomes exactly α, namely, we obtain Y$^{α − |Y|}. We insert this padded string into a dynamic dictionary dedicated only to internal nodes (here we use a predecessor/successor data structure). Now, given a string represented by an internal node, we can access the corresponding node in O(f(k, n)) time. Finally we obtain ϕ̂ = ((u, v), d − |str(u)|) in overall O(f(k, n)) time.

It follows from the proof of Theorem 5 that a dynamic predecessor/successor data structure is enough to support pattern matching queries on our dynamic micro c-trie. This implies that we do not have to store (the triples for) the edge labels in the micro c-trie. This observation is important when we consider delete operations on the set S, as we will see in the next lemma.
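The first step of this proof is easy to emulate: given S as packed integers in a sorted structure, the matched depth of a packed query P is the longer of its LCPs with its predecessor and successor. The sketch below (ours) combines the bisect-based Pred/Succ and the packed LCP from the earlier sketches; it only reports the matched depth, not the locus.

    import bisect

    def matched_depth(sorted_packed, P, length, bits_per_char):
        """d = max(|LCP(P, Pred(P))|, |LCP(P, Succ(P))|) over a sorted list of packed strings."""
        def lcp_len(A, B):
            x = A ^ B
            if x == 0:
                return length
            b = length * bits_per_char - x.bit_length() + 1
            return (b - 1) // bits_per_char

        i = bisect.bisect_left(sorted_packed, P)
        d = 0
        if i > 0:                                # a predecessor exists
            d = max(d, lcp_len(sorted_packed[i - 1], P))
        if i < len(sorted_packed):               # a successor exists
            d = max(d, lcp_len(sorted_packed[i], P))
        return d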

[Figure 3.1: Given the initial locus ϕ (at the root in this figure) and a query pattern P, the algorithm of Theorem 5 answers the LPS(ϕ, P) query on the micro c-trie: the leaves X_L and X_R below the matched prefix X[1..d] are located, and the nodes LCA(l_{L−1}, l_L), LCA(l_R, l_{R+1}) and LCA(l_L, l_R) determine the edge on which the answer locus ϕ̂ lies.]

Lemma 2 The micro c-trie MT_S supports Insert(ϕ, X) and Delete(X) operations in O(f(k, n)) time. We assume that d(ϕ) + |X| ≤ α so that the height of the micro compact trie will always be kept within α.

Proof 3 We show how to support Insert(ϕ, X) in O(f(k, n)) time. Initially S = ∅, the micro compact trie MT_S consists only of root(MT_S), and the predecessor/successor dictionary D contains no elements. When the first string X is inserted into S, we create a leaf below the root and insert X into D. Suppose that the data structure maintains a string set S with |S| ≥ 1. To insert a string X from the given locus ϕ, we first conduct the LPS(ϕ, X) query of Theorem 5, and let ϕ̂ = (e, h) be the answer to the query. If h = |e|, then we simply insert a new leaf l from the destination node of e. Otherwise, we split e at ϕ̂ and create a new node v there as the parent of the new leaf, such that str(v) = str(ϕ̂). The rest is the same as in the former case. After the new leaf is inserted, we insert str(ϕ)X into D in O(f(k, n)) time. We now consider Delete(X). Recall that each edge of the micro c-trie does not store

the triple representing its string label. Thanks to this property, we need not consider updates of the labels of the edges on the path from the root to the deleted leaf (which usually becomes problematic in compact tries). Thus, we can support Delete(X) in a similar way to Insert(ϕ, X), in O(f(k, n)) time.

Remark 1 When d(ϕ) + |X| < α, we can support Insert(ϕ, X) and LPS(ϕ, X) as follows. When inserting X, we pad X with a special letter $ which does not appear in S; namely, we perform the Insert(ϕ, X′) operation with X′ = X$^{α − d(ϕ) − |X|}. When computing LPS(ϕ, X), we pad X with another special letter # ≠ $ which does not appear in S; namely, we perform the LPS(ϕ, X′) query with X′ = X#^{α − d(ϕ) − |X|}. This gives us the correct locus for LPS(ϕ, X).
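Remark 1's two-marker padding is easy to express directly; the sketch below (ours) pads a short argument with '$' for insertions and with '#' for queries, assuming neither symbol occurs in Σ, so that the padding of a query can never accidentally match the padding of a stored string beyond the real characters.

    def pad_for_insert(X, alpha, depth):
        """Insert(phi, X) with d(phi) + |X| < alpha: pad X with '$' up to the boundary."""
        return X + "$" * (alpha - depth - len(X))

    def pad_for_query(X, alpha, depth):
        """LPS(phi, X) with d(phi) + |X| < alpha: pad with '#', a second unused symbol."""
        return X + "#" * (alpha - depth - len(X))

    # Example with alpha = 8 and d(phi) = 3:
    # pad_for_insert("ab", 8, 3) == "ab$$$"   and   pad_for_query("ab", 8, 3) == "ab###"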

[Figure 3.2: Micro-trie decomposition: the packed c-trie is decomposed into a number of micro c-tries (gray rectangles), each of height α = log_σ n, with boundaries at string depths 0, α, 2α, 3α, 4α, .... Each micro c-trie is equipped with a dynamic predecessor/successor data structure.]

3.3.2 Packed dynamic compact tries for long strings

In this subsection, we present the packed dynamic compact trie (packed c-trie) PT_S for a set S of variable-length strings of length at most O(2^w) = O(n).

3.3.3 Micro trie decomposition

We decompose PT_S into a number of micro c-tries. See also Fig. 3.2. Let h > α be the length of the longest string in S. We categorize the nodes of PT_S into ⌈h/α⌉ + 1 levels: we say that a node v of PT_S is at level i (0 ≤ i ≤ ⌈h/α⌉) iff |str(v)| ∈ [iα, (i + 1)α − 1]. The level of a node v is denoted by level(v). A locus ϕ of PT_S is called a boundary iff d(ϕ) is a multiple of α. Consider any path from root(PT_S) to a leaf, and assume that there is no node at some boundary iα on this path. We create an auxiliary node at that boundary on this path iff there is at least one non-auxiliary (i.e., original) node at level i − 1 or i + 1 on this path. Let BN denote the set of nodes at the boundaries, called the boundary nodes. For each boundary node v ∈ BN, we create a micro compact trie MT whose root root(MT) is v, whose internal nodes are all descendants u of v with level(u) = level(v), and whose leaves are all boundary descendants l of v with level(l) = level(v) + 1. Notice that each boundary node is the root of a micro c-trie at its level and is also a leaf of a micro c-trie at the previous level. An edge is said to be a long edge iff its label is at least α long. We store the label of each long edge by a triple of integers. Recall that, on the other hand, we do not store (encodings of) the edge labels in the micro c-tries.

Lemma 3 The packed c-trie PT_S for a prefix-free set S of k strings requires n log σ + O(k log n) bits of space.

Proof 4 Firstly, we bound the number of auxiliary boundary nodes in PT_S. At most 2 auxiliary boundary nodes are created on each original edge of PT_S. Since there are at most 2k − 2 original edges, the total number of auxiliary boundary nodes is at most 4k − 4. Since there are at most 2k − 1 original nodes in PT_S, the total number of nodes in PT_S is at most 6k − 5. Clearly, the total number of short strings of length at most α maintained by the micro c-tries is no more than the number of all nodes in PT_S. The

number of long edges in PT_S is no more than the number of its nodes. Overall, the total space of PT_S is n log σ + O(k log n) bits.

For any locus ϕ on PT_S, ld(ϕ) denotes the local string depth of ϕ in the micro c-trie MT that contains ϕ. Namely, if root(MT) = v, the parent of v in PT_S is u, and e = (u, v), then ld(ϕ) = d(ϕ) − d((e, |e|)). Prefix search queries and insert/delete operations can be supported by our packed c-trie as follows.

Lemma 4 The packed c-trie PT_S supports the LPS(ϕ, P) query in O((m/α) f(k, n)) worst-case time, where m = |P| > α.

Proof 5 If m + ld(ϕ) ≤ α, the bound immediately follows from Theorem 5. Assume m + ld(ϕ) > α, and let q = α − ld(ϕ) + 1. We factorize P into h + 1 blocks as p_0 = P[1..q − 1], p_i = P[q + (i − 1)α..q + iα − 1] for 1 ≤ i ≤ h − 1, and p_h = P[q + (h − 1)α..m], where 1 ≤ |p_0| ≤ α, |p_i| = α for 1 ≤ i ≤ h − 1, and 1 ≤ |p_h| ≤ α. Each block can be computed in O(1) time by standard bit operations. If there is a mismatch in p_0, we are done. Otherwise, for each i in increasing order from 1 to h, we perform the LPS(γ, p_i) query from the root γ of the corresponding micro c-trie at each level of the corresponding path starting from ϕ. This continues until we find either the first mismatch for some i or complete matches for all i's. Each LPS query on a micro c-trie takes O(f(k, n)) time by Theorem 5. Since h = O(m/α), it takes O((m/α) f(k, n)) total time.
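The block factorization in Proof 5 is straightforward to reproduce; the sketch below (ours) cuts a pattern into p_0, ..., p_h so that p_0 fills the rest of the current micro c-trie and every following block except possibly the last has length exactly α.

    def factorize(P, alpha, ld):
        """Split P w.r.t. a locus of local depth ld: first a block of length alpha - ld,
        then full blocks of length alpha, then a possibly shorter last block."""
        q = alpha - ld              # length of p_0 (assumes len(P) + ld > alpha)
        blocks = [P[:q]]
        blocks += [P[i:i + alpha] for i in range(q, len(P), alpha)]
        return blocks

    # Example: factorize("abcdefghij", 4, 1) == ["abc", "defg", "hij"]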

Lemma 5 The packed c-trie PT_S supports Insert(ϕ, X) and Delete(X_i) operations in O((m/α) f(k, n)) worst-case time, where m = |X| > α.

Proof 6 Insert(ϕ, X): we first perform LPS(ϕ, X) in O((m/α) f(k, n)) time (Lemma 4). Let x_0, ..., x_h be the factorization of X w.r.t. ϕ, and let x_j be the block of the factorization containing the first mismatch. Then, we conduct the Insert(γ, x_j) operation on the corresponding micro c-trie, where γ is its root. It takes O(f(k, n)) time (Lemma 2). If j = h (x_j is the last block in the factorization of X), then we are done. Otherwise, we create a new edge with label x′_j x_{j+1} ⋯ x_h, where x′_j is the suffix of x_j which begins at the mismatched position, leading to the new leaf l. We create a new boundary node if necessary. These operations take O(1) time each. Hence, Insert(ϕ, X) takes O((m/α) f(k, n)) total time.

Delete(X_i): Let Q be the path from the root r of PT_S to the leaf l_i. If l_i is a child of the root of PT_S, then we simply delete the single edge in Q. Otherwise, for each sub-path of Q that belongs to a micro c-trie, we perform the Delete operation of Lemma 2 in this micro c-trie. Since the path Q spans at most ⌈m/α⌉ micro c-tries, the delete operations on these micro c-tries take O((m/α) f(k, n)) total time. For each long edge in Q whose label refers to X_i, let ⟨i, a, b⟩ be the triple representing the label. We replace the triple with ⟨i′, a′, b′⟩, where X_{i′} is the predecessor of X_i in S and X_{i′}[a′..b′] = X_i[a..b] (if X_i does not have a predecessor, then we can use the successor of X_i in S instead). We can find X_{i′} as follows. First, we compute ϕ′ = LPS(r, X_i) = LCA(l_i, l_{i′}). Then, we can find l_{i′} by traversing the right-most path from ϕ′ that is to the left of the sub-path of Q from ϕ′ to l_i. This can be done in O((m/α) f(k, n)) time. The positions a′ and b′ in X_{i′} can be computed by simple arithmetic, since we know the total length of the labels on the path from ϕ′ to l_{i′}. Since the path Q contains less than m/α long edges, the triples for all long edges in Q can be updated in O(m/α) time.

3.3.4 Speeding-up with hashing

By augmenting each micro c-trie with a hash table storing the short strings, we achieve a good expected performance, as follows:

Lemma 6 The packed c-trie PT_S augmented with hashing supports the LPS(ϕ, X) query, Insert(ϕ, X) and Delete(X) operations in O(m/α + f(k, n)) expected time.
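The hashing idea of Lemma 6 amounts to keeping, per micro c-trie, a hash table of the short strings it stores, so that a fully matching block is confirmed with one expected-constant-time lookup and the predecessor structure is consulted only for the block containing the first mismatch; the formal argument is given in Proof 7 below. A minimal sketch of that lookup loop (ours, with Python sets standing in for the hash tables):

    def lps_blocks_with_hashing(tables, blocks):
        """tables[i] is the set of short strings stored in the i-th micro c-trie along
        the search path; returns the index of the first mismatching block, or
        len(blocks) if every block matches."""
        for i, block in enumerate(blocks):
            if block not in tables[i]:       # one expected O(1) hash lookup per block
                return i                     # fall back to an LPS query inside this micro c-trie
        return len(blocks)

    # Example: with tables = [{"abc"}, {"defg"}, {"hi"}] and blocks = ["abc", "defg", "hij"],
    # the function returns 2, so only the third micro c-trie needs a predecessor-based LPS.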

Proof 7 Let MT be any micro c-trie in the packed c-trie PT_S, and let M be the set of strings maintained by MT, each being of length at most α. We store all strings of M in a hash table associated to MT, which supports look-ups, insertions and deletions in O(1) expected time. Let x_0, ..., x_h be the factorization of X w.r.t. ϕ. To perform LPS(ϕ, X), we ask if str(ϕ)x_0 is in the hash table of the corresponding micro c-trie. If the answer is no, the first mismatch occurs in x_0, and the rest is the same as in Lemma 4. If the answer is yes, then for each i from 1 to h in increasing order, we ask if x_i is in the hash table of the corresponding micro c-trie, until we receive the first no for some i or we receive yes for all i's. In the latter case, we are done. In the former case, we perform an LPS query with x_i from the root of the corresponding micro c-trie. Since we perform at most one LPS query and O(m/α) look-ups in hash tables, it takes O(m/α + f(k, n)) expected time. The O(m/α + f(k, n)) expected time bounds for Insert(ϕ, X) and Delete(X) immediately follow from the above arguments.

3.4 Applications to online string processing

Sparse suffix trees. The suffix tree [68] of a string T of length n is a compact trie which stores all n suffixes of T. A sparse suffix tree for a set K ⊆ [1, n] of sampled positions of T is a compact trie which stores only the subset S = {T[i..n] | i ∈ K} of the suffixes of T beginning at the sampled positions in K. It is known that if the set K of sampled positions satisfies some properties (e.g., every r positions for some fixed r > 1, or the positions immediately after the word delimiters), the sparse suffix tree can be constructed in an online manner in O(n log σ) time and n log σ + O(n log n) bits of space [45, 47, 64]. Packed c-tries can speed up online construction and pattern matching for these sparse suffix trees: here each input string X to Insert is given as a pair (i, j) of positions in T s.t. X = T[i..j]. As Lemma 7 states, the Insert operation in such a case can be

Minimal DFA. minimal DFA for L starting from any other

Minimal DFA. minimal DFA for L starting from any other Miniml DFA Among the mny DFAs ccepting the sme regulr lnguge L, there is exctly one (up to renming of sttes) which hs the smllest possile numer of sttes. Moreover, it is possile to otin tht miniml DFA

More information

Designing finite automata II

Designing finite automata II Designing finite utomt II Prolem: Design DFA A such tht L(A) consists of ll strings of nd which re of length 3n, for n = 0, 1, 2, (1) Determine wht to rememer out the input string Assign stte to ech of

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

Dynamic Fully-Compressed Suffix Trees

Dynamic Fully-Compressed Suffix Trees Motivtion Dynmic FCST s Conclusions Dynmic Fully-Compressed Suffix Trees Luís M. S. Russo Gonzlo Nvrro Arlindo L. Oliveir INESC-ID/IST {lsr,ml}@lgos.inesc-id.pt Dept. of Computer Science, University of

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

p-adic Egyptian Fractions

p-adic Egyptian Fractions p-adic Egyptin Frctions Contents 1 Introduction 1 2 Trditionl Egyptin Frctions nd Greedy Algorithm 2 3 Set-up 3 4 p-greedy Algorithm 5 5 p-egyptin Trditionl 10 6 Conclusion 1 Introduction An Egyptin frction

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2016 CS125 Lecture 12 Fll 2016 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Chapter 2 Finite Automata

Chapter 2 Finite Automata Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.) CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 utomt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Prolem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) nton Setzer (Bsed on ook drft y J. V. Tucker nd K. Stephenson)

More information

Surface maps into free groups

Surface maps into free groups Surfce mps into free groups lden Wlker Novemer 10, 2014 Free groups wedge X of two circles: Set F = π 1 (X ) =,. We write cpitl letters for inverse, so = 1. e.g. () 1 = Commuttors Let x nd y e loops. The

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3 2 The Prllel Circuit Electric Circuits: Figure 2- elow show ttery nd multiple resistors rrnged in prllel. Ech resistor receives portion of the current from the ttery sed on its resistnce. The split is

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont.

NFA DFA Example 3 CMSC 330: Organization of Programming Languages. Equivalence of DFAs and NFAs. Equivalence of DFAs and NFAs (cont. NFA DFA Exmple 3 CMSC 330: Orgniztion of Progrmming Lnguges NFA {B,D,E {A,E {C,D {E Finite Automt, con't. R = { {A,E, {B,D,E, {C,D, {E 2 Equivlence of DFAs nd NFAs Any string from {A to either {D or {CD

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer

More information

Module 9: Tries and String Matching

Module 9: Tries and String Matching Module 9: Tries nd String Mtching CS 240 - Dt Structures nd Dt Mngement Sjed Hque Veronik Irvine Tylor Smith Bsed on lecture notes by mny previous cs240 instructors Dvid R. Cheriton School of Computer

More information

Lecture 08: Feb. 08, 2019

Lecture 08: Feb. 08, 2019 4CS4-6:Theory of Computtion(Closure on Reg. Lngs., regex to NDFA, DFA to regex) Prof. K.R. Chowdhry Lecture 08: Fe. 08, 2019 : Professor of CS Disclimer: These notes hve not een sujected to the usul scrutiny

More information

Where did dynamic programming come from?

Where did dynamic programming come from? Where did dynmic progrmming come from? String lgorithms Dvid Kuchk cs302 Spring 2012 Richrd ellmn On the irth of Dynmic Progrmming Sturt Dreyfus http://www.eng.tu.c.il/~mi/cd/ or50/1526-5463-2002-50-01-0048.pdf

More information

Lecture 3: Equivalence Relations

Lecture 3: Equivalence Relations Mthcmp Crsh Course Instructor: Pdric Brtlett Lecture 3: Equivlence Reltions Week 1 Mthcmp 2014 In our lst three tlks of this clss, we shift the focus of our tlks from proof techniques to proof concepts

More information

Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter

Alignment of Long Sequences. BMI/CS Spring 2016 Anthony Gitter Alignment of Long Sequences BMI/CS 776 www.biostt.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostt.wisc.edu Gols for Lecture Key concepts how lrge-scle lignment differs from the simple cse the

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs

Connected-components. Summary of lecture 9. Algorithms and Data Structures Disjoint sets. Example: connected components in graphs Prm University, Mth. Deprtment Summry of lecture 9 Algorithms nd Dt Structures Disjoint sets Summry of this lecture: (CLR.1-3) Dt Structures for Disjoint sets: Union opertion Find opertion Mrco Pellegrini

More information

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9.

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9. Regulr Expressions, Pumping Lemm, Right Liner Grmmrs Ling 106 Mrch 25, 2002 1 Regulr Expressions A regulr expression descries or genertes lnguge: it is kind of shorthnd for listing the memers of lnguge.

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

Regular expressions, Finite Automata, transition graphs are all the same!!

Regular expressions, Finite Automata, transition graphs are all the same!! CSI 3104 /Winter 2011: Introduction to Forml Lnguges Chpter 7: Kleene s Theorem Chpter 7: Kleene s Theorem Regulr expressions, Finite Automt, trnsition grphs re ll the sme!! Dr. Neji Zgui CSI3104-W11 1

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information

Formal Languages and Automata Theory. D. Goswami and K. V. Krishna

Formal Languages and Automata Theory. D. Goswami and K. V. Krishna Forml Lnguges nd Automt Theory D. Goswmi nd K. V. Krishn Novemer 5, 2010 Contents 1 Mthemticl Preliminries 3 2 Forml Lnguges 4 2.1 Strings............................... 5 2.2 Lnguges.............................

More information

Tutorial Automata and formal Languages

Tutorial Automata and formal Languages Tutoril Automt nd forml Lnguges Notes for to the tutoril in the summer term 2017 Sestin Küpper, Christine Mik 8. August 2017 1 Introduction: Nottions nd sic Definitions At the eginning of the tutoril we

More information

First Midterm Examination

First Midterm Examination 24-25 Fll Semester First Midterm Exmintion ) Give the stte digrm of DFA tht recognizes the lnguge A over lphet Σ = {, } where A = {w w contins or } 2) The following DFA recognizes the lnguge B over lphet

More information

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

CS 310 (sec 20) - Winter Final Exam (solutions) SOLUTIONS

CS 310 (sec 20) - Winter Final Exam (solutions) SOLUTIONS CS 310 (sec 20) - Winter 2003 - Finl Exm (solutions) SOLUTIONS 1. (Logic) Use truth tles to prove the following logicl equivlences: () p q (p p) (q q) () p q (p q) (p q) () p q p q p p q q (q q) (p p)

More information

Closure Properties of Regular Languages

Closure Properties of Regular Languages Closure Properties of Regulr Lnguges Regulr lnguges re closed under mny set opertions. Let L 1 nd L 2 e regulr lnguges. (1) L 1 L 2 (the union) is regulr. (2) L 1 L 2 (the conctention) is regulr. (3) L

More information

Context-Free Grammars and Languages

Context-Free Grammars and Languages Context-Free Grmmrs nd Lnguges (Bsed on Hopcroft, Motwni nd Ullmn (2007) & Cohen (1997)) Introduction Consider n exmple sentence: A smll ct ets the fish English grmmr hs rules for constructing sentences;

More information

Thoery of Automata CS402

Thoery of Automata CS402 Thoery of Automt C402 Theory of Automt Tle of contents: Lecture N0. 1... 4 ummry... 4 Wht does utomt men?... 4 Introduction to lnguges... 4 Alphets... 4 trings... 4 Defining Lnguges... 5 Lecture N0. 2...

More information

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms The Minimum Lel Spnning Tree Prolem: Illustrting the Utility of Genetic Algorithms Yupei Xiong, Univ. of Mrylnd Bruce Golden, Univ. of Mrylnd Edwrd Wsil, Americn Univ. Presented t BAE Systems Distinguished

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Bases for Vector Spaces

Bases for Vector Spaces Bses for Vector Spces 2-26-25 A set is independent if, roughly speking, there is no redundncy in the set: You cn t uild ny vector in the set s liner comintion of the others A set spns if you cn uild everything

More information

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms Preview Greed Algorithms Greed Algorithms Coin Chnge Huffmn Code Greed lgorithms end to e simple nd strightforwrd. Are often used to solve optimiztion prolems. Alws mke the choice tht looks est t the moment,

More information

Table of contents: Lecture N Summary... 3 What does automata mean?... 3 Introduction to languages... 3 Alphabets... 3 Strings...

Table of contents: Lecture N Summary... 3 What does automata mean?... 3 Introduction to languages... 3 Alphabets... 3 Strings... Tle of contents: Lecture N0.... 3 ummry... 3 Wht does utomt men?... 3 Introduction to lnguges... 3 Alphets... 3 trings... 3 Defining Lnguges... 4 Lecture N0. 2... 7 ummry... 7 Kleene tr Closure... 7 Recursive

More information

Balanced binary search trees

Balanced binary search trees 02110 Inge Li Gørtz Overview Blnced binry serch trees: Red-blck trees nd 2-3-4 trees Amortized nlysis Dynmic progrmming Network flows String mtching String indexing Computtionl geometry Introduction to

More information

Revision Sheet. (a) Give a regular expression for each of the following languages:

Revision Sheet. (a) Give a regular expression for each of the following languages: Theoreticl Computer Science (Bridging Course) Dr. G. D. Tipldi F. Bonirdi Winter Semester 2014/2015 Revision Sheet University of Freiurg Deprtment of Computer Science Question 1 (Finite Automt, 8 + 6 points)

More information

Name Ima Sample ASU ID

Name Ima Sample ASU ID Nme Im Smple ASU ID 2468024680 CSE 355 Test 1, Fll 2016 30 Septemer 2016, 8:35-9:25.m., LSA 191 Regrding of Midterms If you elieve tht your grde hs not een dded up correctly, return the entire pper to

More information

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages 5//6 Grmmr Automt nd Lnguges Regulr Grmmr Context-free Grmmr Context-sensitive Grmmr Prof. Mohmed Hmd Softwre Engineering L. The University of Aizu Jpn Regulr Lnguges Context Free Lnguges Context Sensitive

More information

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018 CS 301 Lecture 04 Regulr Expressions Stephen Checkowy Jnury 29, 2018 1 / 35 Review from lst time NFA N = (Q, Σ, δ, q 0, F ) where δ Q Σ P (Q) mps stte nd n lphet symol (or ) to set of sttes We run n NFA

More information

The size of subsequence automaton

The size of subsequence automaton Theoreticl Computer Science 4 (005) 79 84 www.elsevier.com/locte/tcs Note The size of susequence utomton Zdeněk Troníček,, Ayumi Shinohr,c Deprtment of Computer Science nd Engineering, FEE CTU in Prgue,

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

Finite Automata-cont d

Finite Automata-cont d Automt Theory nd Forml Lnguges Professor Leslie Lnder Lecture # 6 Finite Automt-cont d The Pumping Lemm WEB SITE: http://ingwe.inghmton.edu/ ~lnder/cs573.html Septemer 18, 2000 Exmple 1 Consider L = {ww

More information

Section 4: Integration ECO4112F 2011

Section 4: Integration ECO4112F 2011 Reding: Ching Chpter Section : Integrtion ECOF Note: These notes do not fully cover the mteril in Ching, ut re ment to supplement your reding in Ching. Thus fr the optimistion you hve covered hs een sttic

More information

1.3 Regular Expressions

1.3 Regular Expressions 56 1.3 Regulr xpressions These hve n importnt role in describing ptterns in serching for strings in mny pplictions (e.g. wk, grep, Perl,...) All regulr expressions of lphbet re 1.Ønd re regulr expressions,

More information

State Minimization for DFAs

State Minimization for DFAs Stte Minimiztion for DFAs Red K & S 2.7 Do Homework 10. Consider: Stte Minimiztion 4 5 Is this miniml mchine? Step (1): Get rid of unrechle sttes. Stte Minimiztion 6, Stte is unrechle. Step (2): Get rid

More information

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary Outline Genetic Progrmming Evolutionry strtegies Genetic progrmming Summry Bsed on the mteril provided y Professor Michel Negnevitsky Evolutionry Strtegies An pproch simulting nturl evolution ws proposed

More information

Faster Regular Expression Matching. Philip Bille Mikkel Thorup

Faster Regular Expression Matching. Philip Bille Mikkel Thorup Fster Regulr Expression Mtching Philip Bille Mikkel Thorup Outline Definition Applictions History tour of regulr expression mtching Thompson s lgorithm Myers lgorithm New lgorithm Results nd extensions

More information

3 Regular expressions

3 Regular expressions 3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll

More information

1 From NFA to regular expression

1 From NFA to regular expression Note 1: How to convert DFA/NFA to regulr expression Version: 1.0 S/EE 374, Fll 2017 Septemer 11, 2017 In this note, we show tht ny DFA cn e converted into regulr expression. Our construction would work

More information

Exercises Chapter 1. Exercise 1.1. Let Σ be an alphabet. Prove wv = w + v for all strings w and v.

Exercises Chapter 1. Exercise 1.1. Let Σ be an alphabet. Prove wv = w + v for all strings w and v. 1 Exercises Chpter 1 Exercise 1.1. Let Σ e n lphet. Prove wv = w + v for ll strings w nd v. Prove # (wv) = # (w)+# (v) for every symol Σ nd every string w,v Σ. Exercise 1.2. Let w 1,w 2,...,w k e k strings,

More information

GNFA GNFA GNFA GNFA GNFA

GNFA GNFA GNFA GNFA GNFA DFA RE NFA DFA -NFA REX GNFA Definition GNFA A generlize noneterministic finite utomton (GNFA) is grph whose eges re lele y regulr expressions, with unique strt stte with in-egree, n unique finl stte with

More information

On Suffix Tree Breadth

On Suffix Tree Breadth On Suffix Tree Bredth Golnz Bdkoeh 1,, Juh Kärkkäinen 2, Simon J. Puglisi 2,, nd Bell Zhukov 2, 1 Deprtment of Computer Science University of Wrwick Conventry, United Kingdom g.dkoeh@wrwick.c.uk 2 Helsinki

More information

Solving the String Statistics Problem in Time O(n log n)

Solving the String Statistics Problem in Time O(n log n) Alcom-FT Technicl Report Series ALCOMFT-TR-02-55 Solving the String Sttistics Prolem in Time O(n log n) Gerth Stølting Brodl 1,,, Rune B. Lyngsø 3, Ann Östlin1,, nd Christin N. S. Pedersen 1,2, 1 BRICS,

More information

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.

Lecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations. Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014 CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

The Regulated and Riemann Integrals

The Regulated and Riemann Integrals Chpter 1 The Regulted nd Riemnn Integrls 1.1 Introduction We will consider severl different pproches to defining the definite integrl f(x) dx of function f(x). These definitions will ll ssign the sme vlue

More information

Section 6.1 INTRO to LAPLACE TRANSFORMS

Section 6.1 INTRO to LAPLACE TRANSFORMS Section 6. INTRO to LAPLACE TRANSFORMS Key terms: Improper Integrl; diverge, converge A A f(t)dt lim f(t)dt Piecewise Continuous Function; jump discontinuity Function of Exponentil Order Lplce Trnsform

More information

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA)

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA) Finite Automt (FA or DFA) CHAPTER Regulr Lnguges Contents definitions, exmples, designing, regulr opertions Non-deterministic Finite Automt (NFA) definitions, equivlence of NFAs DFAs, closure under regulr

More information

CISC 4090 Theory of Computation

CISC 4090 Theory of Computation 9/6/28 Stereotypicl computer CISC 49 Theory of Computtion Finite stte mchines & Regulr lnguges Professor Dniel Leeds dleeds@fordhm.edu JMH 332 Centrl processing unit (CPU) performs ll the instructions

More information

2.4 Linear Inequalities and Interval Notation

2.4 Linear Inequalities and Interval Notation .4 Liner Inequlities nd Intervl Nottion We wnt to solve equtions tht hve n inequlity symol insted of n equl sign. There re four inequlity symols tht we will look t: Less thn , Less thn or

More information

Lecture 2: January 27

Lecture 2: January 27 CS 684: Algorithmic Gme Theory Spring 217 Lecturer: Év Trdos Lecture 2: Jnury 27 Scrie: Alert Julius Liu 2.1 Logistics Scrie notes must e sumitted within 24 hours of the corresponding lecture for full

More information

dx dt dy = G(t, x, y), dt where the functions are defined on I Ω, and are locally Lipschitz w.r.t. variable (x, y) Ω.

dx dt dy = G(t, x, y), dt where the functions are defined on I Ω, and are locally Lipschitz w.r.t. variable (x, y) Ω. Chpter 8 Stility theory We discuss properties of solutions of first order two dimensionl system, nd stility theory for specil clss of liner systems. We denote the independent vrile y t in plce of x, nd

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

Quadratic Forms. Quadratic Forms

Quadratic Forms. Quadratic Forms Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte

More information

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute Victor Admchik Dnny Sletor Gret Theoreticl Ides In Computer Science CS 5-25 Spring 2 Lecture 2 Mr 3, 2 Crnegie Mellon University Deterministic Finite Automt Finite Automt A mchine so simple tht you cn

More information

20 MATHEMATICS POLYNOMIALS

20 MATHEMATICS POLYNOMIALS 0 MATHEMATICS POLYNOMIALS.1 Introduction In Clss IX, you hve studied polynomils in one vrible nd their degrees. Recll tht if p(x) is polynomil in x, the highest power of x in p(x) is clled the degree of

More information

Fast Frequent Free Tree Mining in Graph Databases

Fast Frequent Free Tree Mining in Graph Databases The Chinese University of Hong Kong Fst Frequent Free Tree Mining in Grph Dtses Peixing Zho Jeffrey Xu Yu The Chinese University of Hong Kong Decemer 18 th, 2006 ICDM Workshop MCD06 Synopsis Introduction

More information

Myhill-Nerode Theorem

Myhill-Nerode Theorem Overview Myhill-Nerode Theorem Correspondence etween DA s nd MN reltions Cnonicl DA for L Computing cnonicl DFA Myhill-Nerode Theorem Deepk D Souz Deprtment of Computer Science nd Automtion Indin Institute

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions

CS 330 Formal Methods and Models Dana Richards, George Mason University, Spring 2016 Quiz Solutions CS 330 Forml Methods nd Models Dn Richrds, George Mson University, Spring 2016 Quiz Solutions Quiz 1, Propositionl Logic Dte: Ferury 9 1. (4pts) ((p q) (q r)) (p r), prove tutology using truth tles. p

More information

Homework Solution - Set 5 Due: Friday 10/03/08

Homework Solution - Set 5 Due: Friday 10/03/08 CE 96 Introduction to the Theory of Computtion ll 2008 Homework olution - et 5 Due: ridy 10/0/08 1. Textook, Pge 86, Exercise 1.21. () 1 2 Add new strt stte nd finl stte. Mke originl finl stte non-finl.

More information

DFA minimisation using the Myhill-Nerode theorem

DFA minimisation using the Myhill-Nerode theorem DFA minimistion using the Myhill-Nerode theorem Johnn Högerg Lrs Lrsson Astrct The Myhill-Nerode theorem is n importnt chrcteristion of regulr lnguges, nd it lso hs mny prcticl implictions. In this chpter,

More information

arxiv: v1 [cs.ds] 19 Jul 2012

arxiv: v1 [cs.ds] 19 Jul 2012 Efficient LZ78 fctoriztion of grmmr compressed text Hideo Bnni, Shunsuke Ineng, nd Msyuki Tked rxiv:1207.4607v1 [cs.ds] 19 Jul 2012 Deprtment of Informtics, Kyushu University {nni,ineng,tked}@inf.kyushu-u.c.jp

More information

CM10196 Topic 4: Functions and Relations

CM10196 Topic 4: Functions and Relations CM096 Topic 4: Functions nd Reltions Guy McCusker W. Functions nd reltions Perhps the most widely used notion in ll of mthemtics is tht of function. Informlly, function is n opertion which tkes n input

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying

W. We shall do so one by one, starting with I 1, and we shall do it greedily, trying Vitli covers 1 Definition. A Vitli cover of set E R is set V of closed intervls with positive length so tht, for every δ > 0 nd every x E, there is some I V with λ(i ) < δ nd x I. 2 Lemm (Vitli covering)

More information

More on automata. Michael George. March 24 April 7, 2014

More on automata. Michael George. March 24 April 7, 2014 More on utomt Michel George Mrch 24 April 7, 2014 1 Automt constructions Now tht we hve forml model of mchine, it is useful to mke some generl constructions. 1.1 DFA Union / Product construction Suppose

More information

CSE : Exam 3-ANSWERS, Spring 2011 Time: 50 minutes

CSE : Exam 3-ANSWERS, Spring 2011 Time: 50 minutes CSE 260-002: Exm 3-ANSWERS, Spring 20 ime: 50 minutes Nme: his exm hs 4 pges nd 0 prolems totling 00 points. his exm is closed ook nd closed notes.. Wrshll s lgorithm for trnsitive closure computtion is

More information

expression simply by forming an OR of the ANDs of all input variables for which the output is

expression simply by forming an OR of the ANDs of all input variables for which the output is 2.4 Logic Minimiztion nd Krnugh Mps As we found ove, given truth tle, it is lwys possile to write down correct logic expression simply y forming n OR of the ANDs of ll input vriles for which the output

More information