A Unified Construction of the Glushkov, Follow, and Antimirov Automata


Cyril Allauzen and Mehryar Mohri
Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012, USA

Abstract. A number of different techniques have been introduced in the last few decades to create ɛ-free automata representing regular expressions such as the Glushkov automata, follow automata, or Antimirov automata. This paper presents a simple and unified view of all these construction methods both for unweighted and weighted regular expressions. It describes simpler algorithms with time complexities at least as favorable as that of the best previously known techniques, and provides a concise proof of their correctness. Our algorithms are all based on two standard automata operations: epsilon-removal and minimization. This contrasts with the multitude of complicated and special-purpose techniques previously described in the literature, and makes it straightforward to generalize these algorithms to the weighted case. In particular, we extend the definition and construction of follow automata to the case of weighted regular expressions over a closed semiring and present the first algorithm to compute weighted Antimirov automata.

1 Introduction

The construction of finite automata representing regular expressions has been widely studied due to their multiple applications to pattern-matching and many other areas of text processing [1, 21]. The most classical construction, Thompson's construction [13, 24], creates a finite automaton with a number of states and transitions linear in the length $m$ of the regular expression. The time complexity of the algorithm is also linear, $O(m)$. But Thompson's automaton contains transitions labeled with the empty string ɛ which create a delay in pattern matching. Many alternative techniques have been introduced in the last few decades to create ɛ-free automata representing regular expressions, in particular, Glushkov automata [11], follow automata [12], and Antimirov automata [2].

The Glushkov automaton, or position automaton, was independently introduced by [11] and [16]. It has exactly $n + 1$ states but may have up to $n^2$ transitions, where $n$ is the number of occurrences of alphabet symbols appearing in the expression. Thus, its size is quadratic in that of the Thompson automaton for reasonable regular expressions for which $m = O(n)$. However, when using bit-parallelism for regular expression search, thanks to its smaller number of states, the Glushkov automaton can be represented with half the number of machine words required by the Thompson automaton [20, 21].

| Automaton | Algorithm | Complexity |
|---|---|---|
| Glushkov | $\mathrm{rmeps}(T)$ | $O(mn)$ |
| Follow | $\overline{\min(\overline{\mathrm{rmeps}(T)})}$ | $O(mn)$ |
| Antimirov | $\widehat{\mathrm{rmeps}}(\min(\mathrm{rmeps}(\hat{T})))$ | $O(m \log m + mn)$ |

Table 1. Simple algorithms for the construction of Glushkov, follow, and Antimirov automata and their time complexity. $T$ is the Thompson automaton. For an automaton $A$, $\overline{A}$ denotes the automaton derived from $A$ by marking alphabet symbols with their position in the expression. When the symbols are marked, the same notation denotes the operation that removes the marking. $\hat{T}$ is obtained by marking some ɛ-transitions in $T$, making it deterministic (the ɛ-transitions marked are removed by the $\widehat{\mathrm{rmeps}}$ operation).

Several techniques have been suggested for constructing the Glushkov automaton. In [3], the construction is based on the recursive definition of the follow function and its complexity is in $O(n^3)$. The algorithm described by [4] is based on an optimization of the recursive definition of the follow function and its complexity is in $O(m + n^2)$. It requires the expression to be first rewritten in star-normal form, which can be done non-trivially in $O(m)$. Several other quadratic algorithms have been given: that of [9], which is based on an optimization of the follow recursion, and that of [22], based on the ZPC structure, which consists of two mutually linked copies of the syntactic tree of the expression.

The Antimirov or partial derivatives automaton was introduced by [2]. It is in general smaller than the Glushkov automaton, with up to $n + 1$ states and up to $n^2$ transitions. It was in fact proven by [8] (see [12] for a simpler proof) to be the quotient of the Glushkov automaton for some equivalence relation. The complexity of the original construction algorithm of [2] is $O(m^5)$. [8] presented an algorithm whose complexity is $O(m^2)$.

Finally, the follow automaton was introduced by [12]. It is the quotient of the Glushkov automaton by the follow equivalence: two states are equivalent if they have the same follow and the same finality. The authors gave an $O(m + n^2)$ algorithm where some ɛ-transitions are removed from the automaton at each step of the Thompson construction as well as at the end. An $O(m + n^2)$ algorithm using the ZPC structure was given in [7], which requires the regular expression to be rewritten in star-normal form.

Some of these results have been extended to weighted regular expressions over arbitrary semirings. The generalization of the Thompson construction trivially follows from [23]. The Glushkov automaton can be naturally extended to the weighted case [5], and an $O(m^2)$ construction algorithm based on the generalization of the ZPC construct was given by [6]. The Antimirov automaton was generalized to the weighted case by [15], but no explicit construction algorithm or complexity analysis was given by the authors.

This paper presents a simple and unified view of all these ɛ-free automata (Glushkov, follow, and Antimirov) both in the case of unweighted and weighted regular expressions. It describes simpler algorithms with time complexities at least as favorable as that of the best previously known techniques, and provides concise proofs. Our algorithms are all based on two standard automata operations: epsilon-removal and minimization, as summarized in Table 1.

This contrasts with the multitude of complicated and special-purpose techniques and proofs described by others to construct these automata: no need for fine-tuning some recursions, no requirement that the regular expression be in star-normal form, and no need to maintain multiple copies of the syntactic tree. Our analysis provides a better understanding of ɛ-free automata representing regular expressions: they are all the results of the application of some combinations of epsilon-removal and minimization to the classical Thompson automata. This makes it straightforward to generalize these algorithms to the weighted case by using the generalization of ɛ-removal and minimization [17, 18]. It also results in much simpler algorithms than existing ones. In particular, this leads to a straightforward algorithm for the construction of the Glushkov automaton of a weighted regular expression, and, in the case of closed semirings, helps us generalize follow automata to the weighted case. We also give the first explicit construction algorithm of the Antimirov automaton of a weighted expression. When the semiring is $k$-closed, or, in the Glushkov case, only null-$k$-closed for the regular expression considered, the complexity of our algorithms is the same as in the unweighted case.

The paper is organized as follows. Section 2 introduces the definitions and a brief description of the elementary algorithms used in the following sections. Section 3 presents and analyzes our algorithm for the construction of the Glushkov automaton, Section 4 the same for the follow automaton, and Section 5 for the Antimirov automaton.

2 Preliminaries

Semirings. A system $(K, \oplus, \otimes, 0, 1)$ is a semiring when $(K, \oplus, 0)$ is a commutative monoid with identity element $0$; $(K, \otimes, 1)$ is a monoid with identity element $1$; $\otimes$ distributes over $\oplus$; and $0$ is an annihilator for $\otimes$: for all $a \in K$, $a \otimes 0 = 0 \otimes a = 0$. Thus, a semiring is a ring that may lack negation. Some familiar semirings include the boolean semiring $(\mathbb{B}, \vee, \wedge, 0, 1)$, the tropical semiring $(\mathbb{R}_+ \cup \{\infty\}, \min, +, \infty, 0)$, and the real semiring $(\mathbb{R}_+, +, \times, 0, 1)$.
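As an illustration of the semiring definition above, the following minimal Python sketch encodes the three familiar semirings and checks the axioms on a few sample elements. The names `Semiring`, `BOOLEAN`, `TROPICAL`, `REAL`, and `check_axioms` are hypothetical helpers introduced here, and a check on finitely many samples is of course evidence rather than a proof.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """A semiring (K, plus, times, zero, one); K is left implicit."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# The three semirings named in the text (hypothetical encodings).
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def check_axioms(K, samples):
    """Check identity, annihilator, commutativity and distributivity on samples."""
    for a in samples:
        assert K.plus(a, K.zero) == a                        # zero is the plus-identity
        assert K.times(a, K.one) == a == K.times(K.one, a)   # one is the times-identity
        assert K.times(a, K.zero) == K.zero == K.times(K.zero, a)  # annihilator
        for b in samples:
            assert K.plus(a, b) == K.plus(b, a)              # plus is commutative
            for c in samples:
                # times distributes over plus (left distributivity)
                assert K.times(a, K.plus(b, c)) == K.plus(K.times(a, b), K.times(a, c))
    return True


for K, samples in [(BOOLEAN, [False, True]),
                   (TROPICAL, [0.0, 1.5, 3.0, math.inf]),
                   (REAL, [0.0, 0.5, 1.0, 2.0])]:
    check_axioms(K, samples)
```

The sample values are chosen so that floating-point arithmetic is exact; with arbitrary reals the equality tests would need a tolerance.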
A semiring $K$ is said to be closed if for all $a \in K$, the infinite sum $\bigoplus_{n=0}^{\infty} a^n$ is well-defined and in $K$, and if associativity, commutativity, and distributivity apply to countable sums [19]. $K$ is said to be $k$-closed if for all $a \in K$, $\bigoplus_{n=0}^{k+1} a^n = \bigoplus_{n=0}^{k} a^n$. More generally, we will say that $K$ is closed ($k$-closed) for an automaton $A$ if the closedness (resp. $k$-closedness) axioms hold for all cycle weights of $A$. In some semirings, e.g., the probability semiring $(\mathbb{R}_+, +, \times, 0, 1)$, the equality $\bigoplus_{n=0}^{k+1} a^n = \bigoplus_{n=0}^{k} a^n$ may hold for the cycle weights of $A$ only approximately, modulo $\epsilon > 0$. $A$ is then said to be $\epsilon$-$k$-closed for that semiring.

Weighted automata. A weighted automaton $A$ over a semiring $K$ is a 7-tuple $(\Sigma, Q, E, I, \lambda, F, \rho)$ where: $\Sigma$ is a finite alphabet; $Q$ is a finite set of states; $I \subseteq Q$ the set of initial states; $F \subseteq Q$ the set of final states; $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times K \times Q$ a finite set of transitions; $\lambda : I \to K$ the initial weight function; and $\rho : F \to K$ the final weight function mapping $F$ to $K$. We denote by $p[\pi]$ the origin and $n[\pi]$ the destination state of a path $\pi \in E^*$ in an automaton $A$. $i[\pi]$ denotes the label of $\pi$, and $w[\pi]$ its weight obtained by $\otimes$-multiplying the weights of its constituent transitions. We also denote by $P(p, q)$ the set of paths from $p$ to $q$ and by $P(I, x, F)$ the set of paths from the initial states $I$ to the final states $F$ labeled with $x \in \Sigma^*$. The weight associated by $A$ to an input string $x \in \Sigma^*$ is obtained by summing the weights of these paths multiplied by their initial and final weights:

$$[A](x) = \bigoplus_{\pi \in P(I, x, F)} \lambda(p[\pi]) \otimes w[\pi] \otimes \rho(n[\pi]).$$

$[A](x)$ is defined to be $0$ when $P(I, x, F) = \emptyset$.

Shortest distance. Let $A$ be a weighted automaton over $K$. The shortest distance from $p$ to $q$ is defined as $d[p, q] = \bigoplus_{\pi \in P(p, q)} w[\pi]$. It can be computed using the generic single-source shortest-distance algorithm of [19] if $K$ is $k$-closed for $A$, or using a generalization of Floyd-Warshall [14, 19] if $K$ is closed for $A$.

Epsilon-removal. The general ɛ-removal algorithm of [18] consists of first computing the ɛ-closure of each state $p$ in $A$,

$$\mathrm{closure}(p) = \Big\{(q, w) : w = d_\epsilon[p, q] = \bigoplus_{\pi \in P(p, q),\ i[\pi] = \epsilon} w[\pi] \neq 0\Big\}, \tag{1}$$

and then, for each state $p$, of deleting all the outgoing ɛ-transitions of $p$, and adding out of $p$ all the non-ɛ transitions leaving each state $q \in \mathrm{closure}(p)$ with their weight pre-$\otimes$-multiplied by $d_\epsilon[p, q]$. If $K$ is $k$-closed for the ɛ-cycles of $A$,¹ then the generic single-source shortest-distance algorithm [19] can be used to compute the ɛ-closures.

Weight-pushing and weighted minimization. Weight-pushing [17] is a normalization algorithm that redistributes the weights along the paths of $A$ such that $\bigoplus_{e \in E[q]} w[e] \oplus \rho(q) = 1$ for every state $q \in Q$. We denote by $\mathrm{push}(A)$ the resulting automaton. The algorithm requires that $K$ be zero-sum free, weakly left-divisible and closed or $k$-closed for $A$ since it depends on the computation of $d[q, F]$ for all $q \in Q$. It was proved in [17] that, if $A$ is deterministic, i.e., if no two transitions leaving any state share the same label and if it has a unique initial state, then weight-pushing followed by unweighted minimization with each pair (label, weight) viewed as an alphabet symbol leads to a minimal deterministic weighted automaton equivalent to $A$, denoted by $\min(A)$.

Regular expressions.
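A weighted regular expression over the semiring $K$ is defined recursively by: $\emptyset$, $\epsilon$ and $a \in \Sigma$ are regular expressions, and if $\alpha$ and $\beta$ are regular expressions then $k\alpha$, $\alpha k$ for $k \in K$, $\alpha + \beta$, $\alpha \cdot \beta$ and $\alpha^*$ are also regular expressions. We denote by $\mathrm{null}(\alpha)$ the weight associated by $\alpha$ to the empty string $\epsilon$. A weighted regular expression $\alpha$ is well-defined iff for every subterm of the form $\beta^*$, $\mathrm{null}(\beta)^*$ is well-defined and in $K$. We will say that $K$ is null-$k$-closed for $\alpha$ if there exists $k \geq 0$ such that for every subterm $\beta^*$ of $\alpha$, $\mathrm{null}(\beta)^* = \bigoplus_{i=0}^{k} \mathrm{null}(\beta)^i$. We denote by $|\alpha|$ the length of $\alpha$, and by $|\alpha|_\Sigma$ the width of $\alpha$, i.e., the number of occurrences of alphabet symbols in $\alpha$. Let $\mathrm{pos}(\alpha) = \{1, 2, \ldots, |\alpha|_\Sigma\}$ be the set of (alphabet symbol) positions in $\alpha$. An unweighted regular expression can be seen as a weighted expression over the boolean semiring $(\mathbb{B}, \vee, \wedge, 0, 1)$.

Thompson automaton. We denote by $A_T(\alpha)$ the Thompson automaton of $\alpha$ and by $I_{A_T(\alpha)}$ and $F_{A_T(\alpha)}$ its unique initial and final states. For $i \in \mathrm{pos}(\alpha)$, we denote by $p_i$ and $q_i$ the states of $A_T(\alpha)$ such that the transition from $p_i$ to $q_i$ is labeled with the alphabet symbol at the $i$-th position in $\alpha$. The states $p_i$ (states $q_i$) are the only states having a non-ɛ outgoing (resp. incoming) transition. Figure 1(a) shows the Thompson automaton in the special case of the regular expression $\alpha = (a + b)(a^* + ba^* + b^*)^*$.

¹ For $A$ to be well-defined, $K$ needs to be closed for the ɛ-cycles of $A$.

Fig. 1. (a) The Thompson automaton, (b) Glushkov automaton, (c) follow automaton, and (d) Antimirov automaton of the regular expression $\alpha = (a + b)(a^* + ba^* + b^*)^*$, the running example used in [12]. In (d), state $\{0\}$ corresponds to the derived term $\alpha$, $\{1, 2\}$ to $\tau = (a^* + ba^* + b^*)^*$, $\{6\}$ to $b^*\tau$, and $\{3, 4, 5\}$ to $a^*\tau$.

3 Glushkov Automaton

Let $\alpha$ be a weighted regular expression over the alphabet $\Sigma$ and the semiring $K$. The Glushkov automaton of $\alpha$ is an ɛ-free non-deterministic weighted automaton representing $\alpha$ that has an initial state plus one state for each position in $\alpha$, i.e., each occurrence of an alphabet symbol in $\alpha$. Figure 1(b) shows an example.

The formal definition of the Glushkov automaton is based on the functions null, first, last, and follow. Table 2 gives the recursive definition of these functions. $\mathrm{null}(\alpha) \in K$ is the weight associated by $\alpha$ to the empty string $\epsilon$ and is thus the final weight of the initial state of the automaton. $\mathrm{last}(\alpha) \subseteq \mathrm{pos}(\alpha) \times K$ is the set of positions with the corresponding final weights where a non-empty string accepted by $\alpha$ can end. $\mathrm{first}(\alpha) \subseteq \mathrm{pos}(\alpha) \times K$ is the set of positions with the corresponding weights that can be reached by reading one alphabet symbol from the initial state. Similarly, $\mathrm{follow}(\alpha, i) \subseteq \mathrm{pos}(\alpha) \times K$ is the set of positions with the corresponding weights that can be reached by reading one alphabet symbol from the position $i$.

In these definitions, the union of two weighted subsets $X, Y \subseteq \mathrm{pos}(\alpha) \times K$ is defined by $X \cup Y = \{(i, \langle X, i \rangle \oplus \langle Y, i \rangle) : \langle X, i \rangle \oplus \langle Y, i \rangle \neq 0\}$. For any weighted subset $X \subseteq \mathrm{pos}(\alpha) \times K$, weight $k \in K$, and position $i$, $k \otimes X$ denotes the weighted subset and $\langle X, i \rangle$ the weight defined by:

$$k \otimes X = \begin{cases} \{(i, k \otimes w) : (i, w) \in X\} & \text{if } k \neq 0, \\ \emptyset & \text{otherwise,} \end{cases} \qquad \langle X, i \rangle = \begin{cases} w & \text{if } \exists w : (i, w) \in X, \\ 0 & \text{otherwise.} \end{cases}$$

$X \otimes k$ is defined similarly.

| $\alpha$ | $\mathrm{null}(\alpha)$ | $\mathrm{first}(\alpha)$ | $\mathrm{last}(\alpha)$ | $\mathrm{follow}(\alpha, i)$ |
|---|---|---|---|---|
| $\emptyset$ | $0$ | $\emptyset$ | $\emptyset$ | $\emptyset$ |
| $\epsilon$ | $1$ | $\emptyset$ | $\emptyset$ | $\emptyset$ |
| $a_j$ | $0$ | $\{(j, 1)\}$ | $\{(j, 1)\}$ | $\emptyset$ |
| $k\beta$ | $k \otimes \mathrm{null}(\beta)$ | $k \otimes \mathrm{first}(\beta)$ | $\mathrm{last}(\beta)$ | $\mathrm{follow}(\beta, i)$ |
| $\beta k$ | $\mathrm{null}(\beta) \otimes k$ | $\mathrm{first}(\beta)$ | $\mathrm{last}(\beta) \otimes k$ | $\mathrm{follow}(\beta, i)$ |
| $\beta + \gamma$ | $\mathrm{null}(\beta) \oplus \mathrm{null}(\gamma)$ | $\mathrm{first}(\beta) \cup \mathrm{first}(\gamma)$ | $\mathrm{last}(\beta) \cup \mathrm{last}(\gamma)$ | $\mathrm{follow}(\beta, i)$ if $i \in \mathrm{pos}(\beta)$; $\mathrm{follow}(\gamma, i)$ if $i \in \mathrm{pos}(\gamma)$ |
| $\beta \cdot \gamma$ | $\mathrm{null}(\beta) \otimes \mathrm{null}(\gamma)$ | $\mathrm{first}(\beta) \cup \mathrm{null}(\beta) \otimes \mathrm{first}(\gamma)$ | $\mathrm{last}(\beta) \otimes \mathrm{null}(\gamma) \cup \mathrm{last}(\gamma)$ | $\mathrm{follow}(\beta, i) \cup \langle \mathrm{last}(\beta), i \rangle \otimes \mathrm{first}(\gamma)$ if $i \in \mathrm{pos}(\beta)$; $\mathrm{follow}(\gamma, i)$ if $i \in \mathrm{pos}(\gamma)$ |
| $\beta^*$ | $\mathrm{null}(\beta)^*$ | $\mathrm{null}(\beta)^* \otimes \mathrm{first}(\beta)$ | $\mathrm{last}(\beta) \otimes \mathrm{null}(\beta)^*$ | $\mathrm{follow}(\beta, i) \cup \langle \mathrm{last}(\beta^*), i \rangle \otimes \mathrm{first}(\beta)$ |

Table 2. Definition of the functions null, first, last, and follow. For convenience, we also define $\mathrm{follow}(\alpha, 0) = \mathrm{first}(\alpha)$ and $\mathrm{last}_0(\alpha)$ as $\mathrm{last}(\alpha) \cup \{(0, \mathrm{null}(\alpha))\}$ if $\mathrm{null}(\alpha) \neq 0$, $\mathrm{last}(\alpha)$ otherwise.

Let $\overline{\alpha}$ denote the weighted regular expression obtained by marking each symbol of $\alpha$ with its position. The Glushkov or position automaton $A_G(\alpha)$ of $\alpha$ is defined by $A_G(\alpha) = (\Sigma, \mathrm{pos}_0(\alpha), E, 0, 1, F, \rho)$ where its state set is $\mathrm{pos}_0(\alpha) = \{0\} \cup \mathrm{pos}(\alpha)$ and its transition set is

$$E = \{(i, a, w, j) : (j, w) \in \mathrm{follow}(\alpha, i) \text{ and } \mathrm{pos}(\alpha, j) = a\}. \tag{2}$$

A state $i \in \mathrm{pos}_0(\alpha)$ is final iff there exists $w \in K$ such that $(i, w) \in \mathrm{last}_0(\alpha)$, and when it is final $\rho(i) = w$.

The following lemma shows that there exists a simple relationship between the first, last, and follow functions and the ɛ-closures of the states in the Thompson automaton that admit a non-ɛ incoming transition (states $q_i$).

Lemma 1. Let $\alpha$ be a weighted regular expression and let $A = A_T(\alpha)$. Then
(i) $(i, w) \in \mathrm{first}(\alpha)$ iff $(p_i, w) \in \mathrm{closure}(I_A)$;
(ii) $(i, w) \in \mathrm{follow}(\alpha, j)$ iff $(p_i, w) \in \mathrm{closure}(q_j)$; and
(iii) $(i, w) \in \mathrm{last}(\alpha)$ iff $(F_A, w) \in \mathrm{closure}(q_i)$.

Proof. Note that if $\alpha = \emptyset$, $\alpha = \epsilon$ or $\alpha = a$, then the properties trivially hold. The proof is by induction on the length of the regular expression and is given in the case $\alpha = \beta \cdot \gamma$. Other cases can be treated similarly. Assume that the properties hold for all expressions shorter than $\alpha$. Let $A = A_T(\alpha)$, $B = A_T(\beta)$ and $C = A_T(\gamma)$.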
If $\alpha = \beta \cdot \gamma$, then $\mathrm{closure}_A(I_A) = \mathrm{closure}_B(I_B) \cup [B](\epsilon) \otimes \mathrm{closure}_C(I_C)$, thus, since $[B](\epsilon) = \mathrm{null}(\beta)$, (i) holds by induction. If $j \in \mathrm{pos}(\gamma)$, then $\mathrm{closure}_A(q_j) = \mathrm{closure}_C(q_j)$. Otherwise, if $j \in \mathrm{pos}(\beta)$, then

$$\mathrm{closure}_A(q_j) = \mathrm{closure}_B(q_j) \cup \langle \mathrm{closure}_B(q_j), F_B \rangle \otimes \mathrm{closure}_C(I_C). \tag{3}$$

Thus, by induction, both (ii) and (iii) hold.

The following proposition follows directly from the lemma just presented.

Proposition 1. Let $\alpha$ be a weighted regular expression. Then

$$A_G(\alpha) = \mathrm{rmeps}(A_T(\alpha)). \tag{4}$$

Proof. The only states potentially accessible in $\mathrm{rmeps}(A_T(\alpha))$ are the states $q_i$, $i \geq 0$, since they are the only states with non-ɛ incoming transitions. A state $q_i$ is final with weight $w$ in $\mathrm{rmeps}(A_T(\alpha))$ iff $(F_{A_T(\alpha)}, w) \in \mathrm{closure}(q_i)$, that is, by Lemma 1, iff $(i, w) \in \mathrm{last}_0(\alpha)$. $(q_j, a_i, w, q_i)$ is a transition in $\mathrm{rmeps}(A_T(\alpha))$ iff $(p_i, w) \in \mathrm{closure}(q_j)$, that is, by Lemma 1, iff $(i, w) \in \mathrm{follow}(\alpha, j)$. Thus, $A_G(\alpha) = \mathrm{rmeps}(A_T(\alpha))$.

This proposition suggests a natural algorithm to compute the Glushkov automaton. The following lemma helps determine its complexity.

Lemma 2. Let $A$ be the Thompson automaton of a weighted regular expression over a $k$-closed semiring and let $s$ be a state of $A$. Then, the shortest-distance algorithm of [19] can be used to compute the shortest distances from the source state $s$ to all states of $A$ in linear time.

Proof. We give a sketch of the proof. The complexity of the single-source shortest-distance algorithm of [19] depends on the queue discipline used, that is, the order in which states are extracted from the queue. One can use a queue discipline that takes advantage of the specific structure of the Thompson automaton. Each subterm of the form $\beta + \gamma$ or $\beta^*$ defines a sub-automaton with an entry state and an exit state. The appropriate queue discipline enforces that each sub-automaton be fully visited before being exited. The algorithm of [19] can also be modified to store the shortest distance through the sub-automaton of a $\beta^*$ subterm once it has been computed and avoid a subsequent revisit. With that queue discipline, the complexity of the algorithm is linear.

Theorem 1. Let $\alpha$ be a weighted regular expression over a semiring $K$ null-$k$-closed for $\alpha$ and let $m = |\alpha|$ and $n = |\alpha|_\Sigma$. Then, the Glushkov automaton of $\alpha$ can be constructed in time $O(mn)$ by applying ɛ-removal to its Thompson automaton.

Proof. If $K$ is null-$k$-closed for $\alpha$, then $K$ is $k$-closed for all the paths considered during the computation of the ɛ-closures and, by Lemma 2, each ɛ-closure can be computed in linear time, that is, in $O(m)$. Since $n + 1$ closures need to be computed, the total complexity is in $O(mn + n^2) = O(mn)$.

In the unweighted case, the unpublished manuscript of [10] showed that the Glushkov automaton could be obtained by removing the ɛ-transitions from the Thompson automaton using a special-purpose ɛ-removal algorithm.
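The identity $A_G(\alpha) = \mathrm{rmeps}(A_T(\alpha))$ can be tried out in the unweighted case with the short Python sketch below. It is only an illustration under stated assumptions: the tuple encoding of expressions and the helpers `thompson`, `eps_closure`, and `rmeps` are hypothetical names introduced here, the closure is computed by a naive graph search rather than with the linear-time queue discipline of Lemma 2, and the expression $(a + b)(a^* + ba^* + b^*)^*$ is assumed to be the running example of Figure 1.

```python
from itertools import count


def thompson(regex):
    """Thompson construction; returns (initial, final, transitions).

    Expressions are tuples: ("sym", a), ("eps",), ("cat", r, s),
    ("alt", r, s, ...), ("star", r). The label None stands for epsilon.
    """
    fresh = count()
    trans = []  # list of (source, label, destination)

    def build(r):
        kind = r[0]
        if kind in ("sym", "eps"):
            p, q = next(fresh), next(fresh)
            trans.append((p, r[1] if kind == "sym" else None, q))
            return p, q
        if kind == "cat":
            i1, f1 = build(r[1])
            i2, f2 = build(r[2])
            trans.append((f1, None, i2))
            return i1, f2
        if kind == "alt":
            i, f = next(fresh), next(fresh)
            for sub in r[1:]:
                si, sf = build(sub)
                trans.extend([(i, None, si), (sf, None, f)])
            return i, f
        if kind == "star":
            i, f = next(fresh), next(fresh)
            si, sf = build(r[1])
            trans.extend([(i, None, si), (sf, None, si),
                          (i, None, f), (sf, None, f)])
            return i, f
        raise ValueError(f"unknown node kind: {kind}")

    init, final = build(regex)
    return init, final, trans


def eps_closure(state, trans):
    """States reachable from `state` by epsilon-transitions only (naive search)."""
    seen, stack = {state}, [state]
    while stack:
        p = stack.pop()
        for s, label, d in trans:
            if s == p and label is None and d not in seen:
                seen.add(d)
                stack.append(d)
    return seen


def rmeps(init, final, trans):
    """Unweighted epsilon-removal restricted to states accessible from init."""
    states, finals, edges = {init}, set(), set()
    todo = [init]
    while todo:
        p = todo.pop()
        closure = eps_closure(p, trans)
        if final in closure:
            finals.add(p)
        for s, label, d in trans:
            if s in closure and label is not None:
                edges.add((p, label, d))
                if d not in states:
                    states.add(d)
                    todo.append(d)
    return states, finals, edges


# Assumed running example: (a + b)(a* + b a* + b*)*
a, b = ("sym", "a"), ("sym", "b")
alpha = ("cat", ("alt", a, b),
         ("star", ("alt", ("star", a), ("cat", b, ("star", a)), ("star", b))))
init, final, trans = thompson(alpha)
states, finals, edges = rmeps(init, final, trans)
print(len(states), len(finals), len(edges))  # 7 states, 6 final, 22 transitions
```

The surviving states are the initial state and the six states $q_i$, in line with the proof of Proposition 1: every other Thompson state has only ɛ incoming transitions and becomes inaccessible.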

4 Follow Automaton

The follow automaton of an unweighted regular expression $\alpha$, denoted by $A_F(\alpha)$, was introduced by [12]. Figure 1(c) shows an example. It is the quotient of $A_G(\alpha)$ by the equivalence relation $\equiv_F$ defined over $\mathrm{pos}_0(\alpha)$ by:

$$i \equiv_F j \iff \begin{cases} \{i, j\} \subseteq \mathrm{last}_0(\alpha) \text{ or } \{i, j\} \cap \mathrm{last}_0(\alpha) = \emptyset, \text{ and} \\ \mathrm{follow}(\alpha, i) = \mathrm{follow}(\alpha, j). \end{cases} \tag{5}$$

Proposition 2. For any regular expression $\alpha$, the following identities hold: $\overline{A_F(\alpha)} = \min(\overline{A_G(\alpha)})$ and $A_F(\alpha) = \overline{\min(\overline{A_G(\alpha)})}$.

Note that it is mentioned in [12] that minimization could be used to construct the follow automata but the authors claim that the complexity of minimization would be in $O(n^2 \log n)$, making this approach less efficient. The following lemma shows that minimization has in fact a better complexity in this case. Observe that $\overline{A_G(\alpha)}$ is deterministic.

Lemma 3. The time complexity of Hopcroft's minimization algorithm applied to $\overline{A_G(\alpha)}$ is linear in the size of $A_G(\alpha)$: it is in $O(n^2)$ where $n = |\alpha|_\Sigma$.

Proof. We give a sketch of the proof. The $\log |Q|$ factor in Hopcroft's algorithm corresponds to the number of times the incoming transitions at a given state $q$ are used to split a subset (tentative equivalence class). In $\overline{A_G(\alpha)}$, transitions sharing the same label all have the same destination state (the automaton is 1-local), thus each incoming transition of a state $q$ can only be used to split a subset once. The number of transitions in $\overline{A_G(\alpha)}$ is at most $n^2$.

The lemma holds in fact for all 1-local automata. This leads to a simple algorithm for constructing the follow automaton of a regular expression $\alpha$ based on:

$$A_F(\alpha) = \overline{\min(\overline{\mathrm{rmeps}(A_T(\alpha))})}. \tag{6}$$

The complexity of this algorithm is in $O(mn)$, which is the same as that of the more complicated and special-purpose algorithms of [12, 7].

When the semiring $K$ is weakly divisible, zero-sum free, and closed, we can define the follow automaton of a weighted regular expression $\alpha$ as: $A_F(\alpha) = \overline{\min(\overline{A_G(\alpha)})}$.

Theorem 2. Let $\alpha$ be a weighted regular expression over $K$. If $K$ is $k$-closed for the Thompson automaton of $\alpha$, then the follow automaton of $\alpha$ can be computed in $O(mn)$ by applying epsilon-removal followed by weighted minimization to the Thompson automaton of $\alpha$.

Proof. The shortest-distance computation required by weight-pushing can be done in $O(m)$ in the case of $A_T(\alpha)$ and is preserved by ɛ-removal. The weighted automaton $\mathrm{push}(\overline{A_G(\alpha)})$ is 1-local when considered as a finite automaton over pairs (label, weight), thus Lemma 3 can be applied.
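For the unweighted case, the quotient by the follow equivalence (5) can be sketched in a few lines of Python. This is a hedged illustration, not Hopcroft's algorithm: it groups states directly by the pair (finality, follow set), which is what minimization of the marked, 1-local automaton amounts to according to Proposition 2. The function name `follow_automaton` and the dictionary encoding are hypothetical, and the follow sets below were worked out by hand from Table 2 for the expression $(a_1 + b_2)(a_3^* + b_4 a_5^* + b_6^*)^*$, assumed to be the running example.

```python
from collections import defaultdict


def follow_automaton(symbol, follow, finals):
    """Quotient of an unweighted Glushkov automaton by the follow equivalence.

    symbol[j]  : letter at position j (j >= 1)
    follow[i]  : set of positions reachable from state i (state 0 is initial)
    finals     : set of final states
    Returns (blocks, final_blocks, edges); each block is a tuple of positions.
    """
    classes = defaultdict(list)
    for i in sorted(follow):
        # Two states are equivalent iff same finality and same follow set.
        classes[(i in finals, frozenset(follow[i]))].append(i)
    block = {i: tuple(members) for members in classes.values() for i in members}
    blocks = set(block.values())
    # A transition i -> j of the Glushkov automaton carries the letter at j.
    edges = {(block[i], symbol[j], block[j]) for i in follow for j in follow[i]}
    final_blocks = {block[i] for i in finals}
    return blocks, final_blocks, edges


# Hand-computed Glushkov data for (a1 + b2)(a3* + b4 a5* + b6*)*
symbol = {1: "a", 2: "b", 3: "a", 4: "b", 5: "a", 6: "b"}
follow = {0: {1, 2},
          1: {3, 4, 6}, 2: {3, 4, 6}, 3: {3, 4, 6}, 6: {3, 4, 6},
          4: {3, 4, 5, 6}, 5: {3, 4, 5, 6}}
finals = {1, 2, 3, 4, 5, 6}

blocks, final_blocks, edges = follow_automaton(symbol, follow, finals)
print(sorted(blocks))  # [(0,), (1, 2, 3, 6), (4, 5)]
```

On this input the seven Glushkov states collapse to three blocks, with nine transitions between them.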

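5 Antimirov Automaton

The definition of the Antimirov automaton of a regular expression is based on that of the partial derivatives of regular expressions, which are multisets of pairs of the form $(w, \alpha)$ where $w \in K$ is a weight and $\alpha$ a weighted regular expression over $K$. For any weight $k \in K$ and any regular expression $\beta$, we define the following operations:

$$k \otimes (w, \alpha) = (k \otimes w, \alpha), \qquad (w, \alpha) \otimes k = (w, \alpha k), \qquad (w, \alpha) \cdot \beta = (w, \alpha \cdot \beta), \tag{7}$$

which can be naturally extended to multisets of pairs $(w, \alpha)$. By multisets, we mean that $\{(w, \alpha)\} \cup \{(w', \alpha')\} = \{(w, \alpha), (w', \alpha')\}$. The partial derivative of $\alpha$ with respect to $a \in \Sigma$ is the multiset of pairs $(w, \alpha)$ defined recursively by:

$$\begin{aligned}
\partial_a(\epsilon) &= \partial_a(1) = \emptyset, & \partial_a(\beta + \gamma) &= \partial_a(\beta) \cup \partial_a(\gamma), \\
\partial_a(b) &= \{(1, \epsilon)\} \text{ if } a = b,\ \emptyset \text{ otherwise}, & \partial_a(\beta \cdot \gamma) &= \partial_a(\beta) \cdot \gamma \cup \mathrm{null}(\beta) \otimes \partial_a(\gamma), \\
\partial_a(k\beta) &= k \otimes \partial_a(\beta), & \partial_a(\beta^*) &= \mathrm{null}(\beta)^* \otimes \partial_a(\beta) \cdot \beta^*, \\
\partial_a(\beta k) &= \partial_a(\beta) \otimes k.
\end{aligned}$$

The partial derivative of $\alpha$ with respect to the string $s \in \Sigma^*$ is denoted by $\partial_s(\alpha)$ and recursively defined by $\partial_{sa}(\alpha) = \partial_a(\partial_s(\alpha))$. Let $D(\alpha) = \{\beta : (w, \beta) \in \partial_s(\alpha) \text{ with } s \in \Sigma^* \text{ and } w \in K\}$. Note that for $D(\alpha)$ to be well-defined, we need to define when two expressions are the same. Here, we will only allow the following identities: $\emptyset \cdot \alpha = \alpha \cdot \emptyset = \emptyset$, $\emptyset + \alpha = \alpha + \emptyset = \alpha$, $0\alpha = \alpha 0 = \emptyset$, $\epsilon \cdot \alpha = \alpha \cdot \epsilon = \alpha$, $1\alpha = \alpha 1 = \alpha$, $k(k'\alpha) = (k \otimes k')\alpha$, $(\alpha k)k' = \alpha(k \otimes k')$ and $(\alpha + \beta) \cdot \gamma = \alpha \cdot \gamma + \beta \cdot \gamma$.²

The Antimirov or partial derivatives automaton $A_A(\alpha)$ of $\alpha$ is then the automaton defined by $A_A(\alpha) = (\Sigma, D(\alpha), E, \alpha, 1, F, \mathrm{null})$ where $E = \{(\beta, a, w, \gamma) : w = \bigoplus_{(w', \gamma) \in \partial_a(\beta)} w'\}$ and $F = \{\beta \in D(\alpha) : \mathrm{null}(\beta) \neq 0\}$. Figure 1(d) shows the Antimirov automaton for a specific regular expression.

Let $\hat{\Sigma} = \Sigma \cup \{\epsilon^1_+, \epsilon^2_+, \epsilon^1_*, \epsilon^2_*\}$. We denote by $\hat{A}_T(\alpha)$ the weighted automaton over $\hat{\Sigma}$ obtained by recursively marking some of the ɛ-transitions of the Thompson automaton $A_T(\alpha)$ as follows: if $\alpha = \beta + \gamma$, we label by $\epsilon^1_+$ ($\epsilon^2_+$) the ɛ-transition from $I_{A_T(\alpha)}$ to $I_{A_T(\beta)}$ (resp. $I_{A_T(\gamma)}$); if $\alpha = \beta^*$, we label by $\epsilon^1_*$ ($\epsilon^2_*$) the two ɛ-transitions to $I_{A_T(\beta)}$ (resp. $F_{A_T(\alpha)}$). Observe that $\hat{A}_T(\alpha)$ can be viewed as an automaton recognizing the expression $\hat{\alpha}$ over $\hat{\Sigma}$ recursively defined by $\hat{\emptyset} = \emptyset$, $\hat{\epsilon} = \epsilon$, $\hat{a} = a$, $\widehat{k\beta} = k\hat{\beta}$, $\widehat{\beta k} = \hat{\beta} k$, $\widehat{\beta + \gamma} = \epsilon^1_+ \hat{\beta} + \epsilon^2_+ \hat{\gamma}$, $\widehat{\beta \cdot \gamma} = \hat{\beta} \cdot \hat{\gamma}$ and $\widehat{\beta^*} = (\epsilon^1_* \hat{\beta})^* \epsilon^2_*$.

For $i \in \mathrm{pos}_0(\alpha)$, we use the same notation $q_i$ (with $q_0 = I$) for the corresponding states in $A_T(\alpha)$, $\hat{A}_T(\alpha)$ and $\mathrm{rmeps}(\hat{A}_T(\alpha))$. For a state $q$ in $\mathrm{rmeps}(\hat{A}_T(\alpha))$, we define by $L(q)$ the language recognized from $q$ considering $\mathrm{rmeps}(\hat{A}_T(\alpha))$ as an unweighted automaton over pairs (symbol, weight). Lemma 4 follows from our marking of the ɛ-transitions.

² These identities are the trivial identities considered in [15] except for the last two, which were added to simplify our presentation. Any larger set of identities can be handled with our method by rewriting $\alpha$ in the corresponding normal form.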
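The partial-derivative recursion of this section can be illustrated in the unweighted (boolean) case, where multisets of pairs collapse to plain sets of expressions. The Python sketch below is a direct derivative-based construction, not the ɛ-removal and minimization route of this paper; the helpers `cat`, `null`, `pd`, and `antimirov` are hypothetical names, only the identity $\epsilon \cdot \alpha = \alpha$ is applied when comparing expressions, and $(a + b)(a^* + ba^* + b^*)^*$ is assumed to be the running example of Figure 1.

```python
EPS = ("eps",)


def cat(r, s):
    """Concatenation with the identity eps . s = s applied."""
    return s if r == EPS else ("cat", r, s)


def null(r):
    """True iff the (unweighted) expression accepts the empty string."""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind == "sym":
        return False
    if kind == "alt":
        return any(null(x) for x in r[1:])
    return null(r[1]) and null(r[2])  # cat


def pd(r, a):
    """Set of partial derivatives of r with respect to the letter a."""
    kind = r[0]
    if kind == "eps":
        return set()
    if kind == "sym":
        return {EPS} if r[1] == a else set()
    if kind == "alt":
        return set().union(*(pd(x, a) for x in r[1:]))
    if kind == "star":
        return {cat(d, r) for d in pd(r[1], a)}
    # cat: derive the left factor, and the right one if the left is nullable
    left = {cat(d, r[2]) for d in pd(r[1], a)}
    return left | pd(r[2], a) if null(r[1]) else left


def antimirov(r, alphabet):
    """Explore the derived terms of r; returns (states, finals, transitions)."""
    states, todo, trans = {r}, [r], set()
    while todo:
        s = todo.pop()
        for a in alphabet:
            for d in pd(s, a):
                trans.add((s, a, d))
                if d not in states:
                    states.add(d)
                    todo.append(d)
    finals = {s for s in states if null(s)}
    return states, finals, trans


# Assumed running example: (a + b)(a* + b a* + b*)*
a, b = ("sym", "a"), ("sym", "b")
tau = ("star", ("alt", ("star", a), ("cat", b, ("star", a)), ("star", b)))
alpha = ("cat", ("alt", a, b), tau)
states, finals, trans = antimirov(alpha, "ab")
print(len(states), len(finals), len(trans))  # 4 derived terms, 3 final, 11 transitions
```

The four derived terms found are $\alpha$, $\tau$, $a^*\tau$ and $b^*\tau$, which is the state set described in the caption of Figure 1(d).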

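Lemma 4. For $i \in \mathrm{pos}_0(\alpha)$, $L(q_i)$ uniquely defines a regular expression over $\Sigma$, denoted by $\delta_i$ (or $\delta^\alpha_i$ in the presence of ambiguity).

Lemma 5. For all $i \in \mathrm{pos}_0(\alpha)$ and $j \in \mathrm{pos}(\alpha)$, we have for $p_j$, $q_i$ in $A_T(\alpha)$ that:

$$(p_j, w) \in \mathrm{closure}(q_i) \iff (w, \delta_j) \in \partial_{a_j}(\delta_i). \tag{8}$$

Proof. The proof is by induction on the length of the regular expression. If $\alpha = \emptyset$, $\alpha = \epsilon$ or $\alpha = a$, then the properties trivially hold. We give the proof in the case $\alpha = \beta \cdot \gamma$; other cases can be treated similarly. Let $A = A_T(\alpha)$, $B = A_T(\beta)$ and $C = A_T(\gamma)$. If $q_i$ is in $C$, then $\delta^\alpha_i = \delta^\gamma_i$ and $\mathrm{closure}_A(q_i) = \mathrm{closure}_C(q_i)$. Therefore, if $(p_j, w) \in \mathrm{closure}_A(q_i)$, $p_j$ is in $C$ and then $\delta^\alpha_j = \delta^\gamma_j$. Hence (8) recursively holds. If $q_i$ is in $B$, then $\delta^\alpha_i = \delta^\beta_i \cdot \gamma$ and we have:

$$\partial_{a_j}(\delta^\alpha_i) = \partial_{a_j}(\delta^\beta_i) \cdot \gamma \cup \mathrm{null}(\delta^\beta_i) \otimes \partial_{a_j}(\gamma), \tag{9}$$

$$\mathrm{closure}_A(q_i) = \mathrm{closure}_B(q_i) \cup \mathrm{null}(\delta^\beta_i) \otimes \mathrm{closure}_C(I_C). \tag{10}$$

By induction, we have $(p_j, w) \in \mathrm{closure}_B(q_i)$ iff $(w, \delta^\beta_j) \in \partial_{a_j}(\delta^\beta_i)$, and $(p_j, w) \in \mathrm{closure}_C(I_C)$ iff $(w, \delta^\gamma_j) \in \partial_{a_j}(\delta^\gamma_0) = \partial_{a_j}(\gamma)$. Hence (8) follows.

Note that $\delta_0 = \alpha$, thus Lemma 5 implies that the $\delta_i$ are the derived terms of $\alpha$; more precisely, $i \mapsto \delta_i$ is a surjection from $\mathrm{pos}_0(\alpha)$ onto $D(\alpha)$. This leads us to the following result, where $\min_{\mathbb{B}}$ is unweighted minimization when each pair (label, weight) is treated as a regular symbol and $\widehat{\mathrm{rmeps}}$ denotes the removal of the marked ɛ's.

Proposition 3. We have $A_A(\alpha) = \widehat{\mathrm{rmeps}}(\min_{\mathbb{B}}(\mathrm{rmeps}(\hat{A}_T(\alpha))))$.

Proof. Note that $\mathrm{rmeps}(\hat{A}_T(\alpha))$ is deterministic. During minimization, two states $q_i$ and $q_j$ are merged iff $L(q_i) = L(q_j)$, that is, by Lemma 4, iff $\delta_i = \delta_j$. Thus, there is a bijection between $D(\alpha)$ and the set of states of $\min_{\mathbb{B}}(\mathrm{rmeps}(\hat{A}_T(\alpha)))$ having an incoming transition with label in $\Sigma$, and thus also between $D(\alpha)$ and the set of states of $A = \widehat{\mathrm{rmeps}}(\min_{\mathbb{B}}(\mathrm{rmeps}(\hat{A}_T(\alpha))))$. Lemma 5 ensures that the transitions of $A$ are consistent with the definition of $A_A(\alpha)$.

Theorem 3. Let $\alpha$ be a weighted regular expression over $K$. If $K$ is null-$k$-closed for $\alpha$, then the Antimirov automaton of $\alpha$ can be computed in $O(m \log m + mn)$ using ɛ-removal and minimization.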
Theorem 3 follows from the fact that $\mathrm{rmeps}(\hat{A}_T(\alpha))$ has $O(m)$ states and transitions. In the unweighted case, this complexity matches that of the more complicated and best known algorithm of [8]. In the weighted case, the use of minimization over (label, weight) pairs is suboptimal since states that would be equivalent modulo a $\otimes$-multiplicative factor are not merged. When possible, using weighted minimization instead would lead to a smaller automaton in general. For example, if $K$ is closed, we can define the normalized Antimirov automaton of $\alpha$ as $\widehat{\mathrm{rmeps}}(\min_K(\mathrm{rmeps}(\hat{A}_T(\alpha))))$. This automaton is always smaller than the Antimirov automaton and the automaton of unitary derived terms of [15].³ When $K$ is $k$-closed, it can be constructed in $O(m \log m + mn)$.

³ This automaton can be viewed in our approach as the result of a simpler form of reweighting than weight-pushing, the reweighting used by weighted minimization.

Remark. When the condition about $k$-closedness (null-$k$-closedness for $\alpha$) of $K$ is relaxed to the closedness of $K$ (resp. that $\alpha$ is well-defined), all our construction algorithms can still be used by replacing the generic single-source shortest-distance algorithm with a generalization of the Floyd-Warshall algorithm [14, 19], leading to a complexity of $O(m^3)$. It is not hard however to maintain the quadratic complexity by modifying the generic single-source shortest-distance algorithm to take advantage of the special topology of the Thompson automaton.

In the unweighted case, every regular expression can be straightforwardly rewritten in ɛ-normal form such that $m = O(n)$. In that case, our $O(mn)$ and $O(m \log m + mn)$ complexities become $O(m + n^2)$, which coincides with what is often reported in the literature.

6 Conclusion

We presented a simple and unified view of ɛ-free automata representing unweighted and weighted regular expressions. We showed that standard unweighted and weighted epsilon-removal and minimization algorithms can be used to create the Glushkov, follow, and Antimirov automata and that the time complexity of our construction algorithms is at least as favorable as that of the best previously known algorithm. This provides a better understanding of the ɛ-free automata representing regular expressions. It also suggests using other combinations of epsilon-removal and minimization for creating ɛ-free automata. For example, in some contexts, it might be beneficial to use reverse-epsilon-removal rather than epsilon-removal [18]. Note also that the Glushkov automaton can be constructed on-the-fly since Thompson's construction and epsilon-removal both admit an on-demand implementation.

Acknowledgments. This project was sponsored in part by the Department of the Army Award Number W23RYX-3275-N605. The U.S. Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick MD is the awarding and administering acquisition office. The content of this material does not necessarily reflect the position or the policy of the Government and no official endorsement should be inferred. This work was also partially funded by the New York State Office of Science Technology and Academic Research (NYSTAR).

References

1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers, Principles, Techniques and Tools. Addison Wesley: Reading, MA.

2. V. M. Antimirov. Partial derivatives of regular expressions and finite automaton constructions. Theoretical Computer Science, 155(2).
3. G. Berry and R. Sethi. From regular expressions to deterministic automata. Theoretical Computer Science, 48(3).
4. A. Brüggemann-Klein. Regular expressions into finite automata. Theoretical Computer Science, 120(2).
5. P. Caron and M. Flouret. Glushkov construction for series: the non commutative case. International Journal of Computer Mathematics, 80(4).
6. J.-M. Champarnaud, É. Laugerotte, F. Ouardi, and D. Ziadi. From regular weighted expressions to finite automata. In Proceedings of CIAA 2003, volume 2759 of Lecture Notes in Computer Science. Springer-Verlag.
7. J.-M. Champarnaud, F. Nicart, and D. Ziadi. Computing the follow automaton of an expression. In Proceedings of CIAA 2004, volume 3317 of Lecture Notes in Computer Science. Springer-Verlag.
8. J.-M. Champarnaud and D. Ziadi. Computing the equation automaton of a regular expression in $O(s^2)$ space and time. In Proceedings of CPM 2001, volume 2089 of Lecture Notes in Computer Science. Springer-Verlag.
9. C.-H. Chang and R. Paige. From regular expressions to DFA's using compressed NFA's. Theoretical Computer Science, 178(1-2):1–36.
10. D. Giammarresi, J.-L. Ponty, and D. Wood. Glushkov and Thompson constructions: a synthesis.
11. V. M. Glushkov. The abstract theory of automata. Russian Mathematical Surveys, 16:1–53.
12. L. Ilie and S. Yu. Follow automata. Information and Computation, 186(1).
13. S. C. Kleene. Representation of events in nerve nets and finite automata. In C. E. Shannon, J. McCarthy, and W. R. Ashby, editors, Automata Studies. Princeton University Press.
14. D. J. Lehmann. Algebraic structures for transitive closures. Theoretical Computer Science, 4:59–76.
15. S. Lombardy and J. Sakarovitch. Derivatives of rational expressions with multiplicity. Theoretical Computer Science, 332(1-3).
16. R. McNaughton and H. Yamada. Regular expressions and state graphs for automata. IEEE Transactions on Electronic Computers, 9(1):39–47.
17. M. Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23:2.
18. M. Mohri. Generic ɛ-removal and input ɛ-normalization algorithms for weighted transducers. International Journal of Foundations of Computer Science, 13(1).
19. M. Mohri. Semiring Frameworks and Algorithms for Shortest-Distance Problems. Journal of Automata, Languages and Combinatorics, 7(3).
20. G. Navarro and M. Raffinot. Fast regular expression search. In Proceedings of WAE 99, volume 1668 of Lecture Notes in Computer Science. Springer-Verlag.
21. G. Navarro and M. Raffinot. Flexible pattern matching. Cambridge University Press.
22. J.-L. Ponty, D. Ziadi, and J.-M. Champarnaud. A new quadratic algorithm to convert a regular expression into automata. In Proceedings of WIA 96, volume 1260 of Lecture Notes in Computer Science. Springer-Verlag.
23. M.-P. Schützenberger. On the definition of a family of automata. Information and Control, 4.
24. K. Thompson. Regular expression search algorithm. Communications of the ACM, 11(6), 1968.


AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton 25. Finite Automt AUTOMATA AND LANGUAGES A system of computtion tht only hs finite numer of possile sttes cn e modeled using finite utomton A finite utomton is often illustrted s stte digrm d d d. d q

More information

Regular expressions, Finite Automata, transition graphs are all the same!!

Regular expressions, Finite Automata, transition graphs are all the same!! CSI 3104 /Winter 2011: Introduction to Forml Lnguges Chpter 7: Kleene s Theorem Chpter 7: Kleene s Theorem Regulr expressions, Finite Automt, trnsition grphs re ll the sme!! Dr. Neji Zgui CSI3104-W11 1

More information

General Algorithms for Testing the Ambiguity of Finite Automata

General Algorithms for Testing the Ambiguity of Finite Automata Generl Algorithms for Testing the Amiguity of Finite Automt Cyril Alluzen 1,, Mehryr Mohri 2,1, nd Ashish Rstogi 1, 1 Google Reserch, 76 Ninth Avenue, New York, NY 10011. 2 Cournt Institute of Mthemticl

More information

Formal Languages and Automata

Formal Languages and Automata Moile Computing nd Softwre Engineering p. 1/5 Forml Lnguges nd Automt Chpter 2 Finite Automt Chun-Ming Liu cmliu@csie.ntut.edu.tw Deprtment of Computer Science nd Informtion Engineering Ntionl Tipei University

More information

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University CS415 Compilers Lexicl Anlysis nd These slides re sed on slides copyrighted y Keith Cooper, Ken Kennedy & Lind Torczon t Rice University First Progrmming Project Instruction Scheduling Project hs een posted

More information

Model Reduction of Finite State Machines by Contraction

Model Reduction of Finite State Machines by Contraction Model Reduction of Finite Stte Mchines y Contrction Alessndro Giu Dip. di Ingegneri Elettric ed Elettronic, Università di Cgliri, Pizz d Armi, 09123 Cgliri, Itly Phone: +39-070-675-5892 Fx: +39-070-675-5900

More information

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014

12.1 Nondeterminism Nondeterministic Finite Automata. a a b ε. CS125 Lecture 12 Fall 2014 CS125 Lecture 12 Fll 2014 12.1 Nondeterminism The ide of nondeterministic computtions is to llow our lgorithms to mke guesses, nd only require tht they ccept when the guesses re correct. For exmple, simple

More information

Finite Automata-cont d

Finite Automata-cont d Automt Theory nd Forml Lnguges Professor Leslie Lnder Lecture # 6 Finite Automt-cont d The Pumping Lemm WEB SITE: http://ingwe.inghmton.edu/ ~lnder/cs573.html Septemer 18, 2000 Exmple 1 Consider L = {ww

More information

General Algorithms for Testing the Ambiguity of Finite Automata and the Double-Tape Ambiguity of Finite-State Transducers

General Algorithms for Testing the Ambiguity of Finite Automata and the Double-Tape Ambiguity of Finite-State Transducers Interntionl Journl of Foundtions of Computer Science c World Scientific Pulishing Compny Generl Algorithms for Testing the Amiguity of Finite Automt nd the Doule-Tpe Amiguity of Finite-Stte Trnsducers

More information

Coalgebra, Lecture 15: Equations for Deterministic Automata

Coalgebra, Lecture 15: Equations for Deterministic Automata Colger, Lecture 15: Equtions for Deterministic Automt Julin Slmnc (nd Jurrin Rot) Decemer 19, 2016 In this lecture, we will study the concept of equtions for deterministic utomt. The notes re self contined

More information

Deterministic Finite Automata

Deterministic Finite Automata Finite Automt Deterministic Finite Automt H. Geuvers nd J. Rot Institute for Computing nd Informtion Sciences Version: fll 2016 J. Rot Version: fll 2016 Tlen en Automten 1 / 21 Outline Finite Automt Finite

More information

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1 Chpter Five: Nondeterministic Finite Automt Forml Lnguge, chpter 5, slide 1 1 A DFA hs exctly one trnsition from every stte on every symol in the lphet. By relxing this requirement we get relted ut more

More information

Convert the NFA into DFA

Convert the NFA into DFA Convert the NF into F For ech NF we cn find F ccepting the sme lnguge. The numer of sttes of the F could e exponentil in the numer of sttes of the NF, ut in prctice this worst cse occurs rrely. lgorithm:

More information

1 Nondeterministic Finite Automata

1 Nondeterministic Finite Automata 1 Nondeterministic Finite Automt Suppose in life, whenever you hd choice, you could try oth possiilities nd live your life. At the end, you would go ck nd choose the one tht worked out the est. Then you

More information

Java II Finite Automata I

Java II Finite Automata I Jv II Finite Automt I Bernd Kiefer Bernd.Kiefer@dfki.de Deutsches Forschungszentrum für künstliche Intelligenz Finite Automt I p.1/13 Processing Regulr Expressions We lredy lerned out Jv s regulr expression

More information

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Fall Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz University of Southern Cliforni Computer Science Deprtment Compiler Design Fll Lexicl Anlysis Smple Exercises nd Solutions Prof. Pedro C. Diniz USC / Informtion Sciences Institute 4676 Admirlty Wy, Suite

More information

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science

CSCI 340: Computational Models. Kleene s Theorem. Department of Computer Science CSCI 340: Computtionl Models Kleene s Theorem Chpter 7 Deprtment of Computer Science Unifiction In 1954, Kleene presented (nd proved) theorem which (in our version) sttes tht if lnguge cn e defined y ny

More information

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4 Intermedite Mth Circles Wednesdy, Novemer 14, 2018 Finite Automt II Nickols Rollick nrollick@uwterloo.c Regulr Lnguges Lst time, we were introduced to the ide of DFA (deterministic finite utomton), one

More information

80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES. 2.6 Finite State Automata With Output: Transducers

80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES. 2.6 Finite State Automata With Output: Transducers 80 CHAPTER 2. DFA S, NFA S, REGULAR LANGUAGES 2.6 Finite Stte Automt With Output: Trnsducers So fr, we hve only considered utomt tht recognize lnguges, i.e., utomt tht do not produce ny output on ny input

More information

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.)

CS 373, Spring Solutions to Mock midterm 1 (Based on first midterm in CS 273, Fall 2008.) CS 373, Spring 29. Solutions to Mock midterm (sed on first midterm in CS 273, Fll 28.) Prolem : Short nswer (8 points) The nswers to these prolems should e short nd not complicted. () If n NF M ccepts

More information

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true.

1. For each of the following theorems, give a two or three sentence sketch of how the proof goes or why it is not true. York University CSE 2 Unit 3. DFA Clsses Converting etween DFA, NFA, Regulr Expressions, nd Extended Regulr Expressions Instructor: Jeff Edmonds Don t chet y looking t these nswers premturely.. For ech

More information

CHAPTER 1 Regular Languages. Contents

CHAPTER 1 Regular Languages. Contents Finite Automt (FA or DFA) CHAPTE 1 egulr Lnguges Contents definitions, exmples, designing, regulr opertions Non-deterministic Finite Automt (NFA) definitions, euivlence of NFAs nd DFAs, closure under regulr

More information

Automata Theory 101. Introduction. Outline. Introduction Finite Automata Regular Expressions ω-automata. Ralf Huuck.

Automata Theory 101. Introduction. Outline. Introduction Finite Automata Regular Expressions ω-automata. Ralf Huuck. Outline Automt Theory 101 Rlf Huuck Introduction Finite Automt Regulr Expressions ω-automt Session 1 2006 Rlf Huuck 1 Session 1 2006 Rlf Huuck 2 Acknowledgement Some slides re sed on Wolfgng Thoms excellent

More information

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018

CS 301. Lecture 04 Regular Expressions. Stephen Checkoway. January 29, 2018 CS 301 Lecture 04 Regulr Expressions Stephen Checkowy Jnury 29, 2018 1 / 35 Review from lst time NFA N = (Q, Σ, δ, q 0, F ) where δ Q Σ P (Q) mps stte nd n lphet symol (or ) to set of sttes We run n NFA

More information

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages Deprtment of Computer Science, Austrlin Ntionl University COMP2600 Forml Methods for Softwre Engineering Semester 2, 206 Assignment Automt, Lnguges, nd Computility Smple Solutions Finite Stte Automt nd

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. Comparing DFAs and NFAs (cont.) Finite Automata 2 CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute

Anatomy of a Deterministic Finite Automaton. Deterministic Finite Automata. A machine so simple that you can understand it in less than one minute Victor Admchik Dnny Sletor Gret Theoreticl Ides In Computer Science CS 5-25 Spring 2 Lecture 2 Mr 3, 2 Crnegie Mellon University Deterministic Finite Automt Finite Automt A mchine so simple tht you cn

More information

More on automata. Michael George. March 24 April 7, 2014

More on automata. Michael George. March 24 April 7, 2014 More on utomt Michel George Mrch 24 April 7, 2014 1 Automt constructions Now tht we hve forml model of mchine, it is useful to mke some generl constructions. 1.1 DFA Union / Product construction Suppose

More information

Fundamentals of Computer Science

Fundamentals of Computer Science Fundmentls of Computer Science Chpter 3: NFA nd DFA equivlence Regulr expressions Henrik Björklund Umeå University Jnury 23, 2014 NFA nd DFA equivlence As we shll see, it turns out tht NFA nd DFA re equivlent,

More information

Lecture 9: LTL and Büchi Automata

Lecture 9: LTL and Büchi Automata Lecture 9: LTL nd Büchi Automt 1 LTL Property Ptterns Quite often the requirements of system follow some simple ptterns. Sometimes we wnt to specify tht property should only hold in certin context, clled

More information

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018 Finite Automt Theory nd Forml Lnguges TMV027/DIT321 LP4 2018 Lecture 10 An Bove April 23rd 2018 Recp: Regulr Lnguges We cn convert between FA nd RE; Hence both FA nd RE ccept/generte regulr lnguges; More

More information

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 CMSC 330 1 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All exmples so fr Nondeterministic

More information

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-*

Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Regular Expressions (RE) Kleene-* Regulr Expressions (RE) Regulr Expressions (RE) Empty set F A RE denotes the empty set Opertion Nottion Lnguge UNIX Empty string A RE denotes the set {} Alterntion R +r L(r ) L(r ) r r Symol Alterntion

More information

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb.

Types of Finite Automata. CMSC 330: Organization of Programming Languages. Comparing DFAs and NFAs. NFA for (a b)*abb. CMSC 330: Orgniztion of Progrmming Lnguges Finite Automt 2 Types of Finite Automt Deterministic Finite Automt () Exctly one sequence of steps for ech string All exmples so fr Nondeterministic Finite Automt

More information

Harvard University Computer Science 121 Midterm October 23, 2012

Harvard University Computer Science 121 Midterm October 23, 2012 Hrvrd University Computer Science 121 Midterm Octoer 23, 2012 This is closed-ook exmintion. You my use ny result from lecture, Sipser, prolem sets, or section, s long s you quote it clerly. The lphet is

More information

Formal languages, automata, and theory of computation

Formal languages, automata, and theory of computation Mälrdlen University TEN1 DVA337 2015 School of Innovtion, Design nd Engineering Forml lnguges, utomt, nd theory of computtion Thursdy, Novemer 5, 14:10-18:30 Techer: Dniel Hedin, phone 021-107052 The exm

More information

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014

CMPSCI 250: Introduction to Computation. Lecture #31: What DFA s Can and Can t Do David Mix Barrington 9 April 2014 CMPSCI 250: Introduction to Computtion Lecture #31: Wht DFA s Cn nd Cn t Do Dvid Mix Brrington 9 April 2014 Wht DFA s Cn nd Cn t Do Deterministic Finite Automt Forml Definition of DFA s Exmples of DFA

More information

3 Regular expressions

3 Regular expressions 3 Regulr expressions Given n lphet Σ lnguge is set of words L Σ. So fr we were le to descrie lnguges either y using set theory (i.e. enumertion or comprehension) or y n utomton. In this section we shll

More information

Converting Regular Expressions to Discrete Finite Automata: A Tutorial

Converting Regular Expressions to Discrete Finite Automata: A Tutorial Converting Regulr Expressions to Discrete Finite Automt: A Tutoril Dvid Christinsen 2013-01-03 This is tutoril on how to convert regulr expressions to nondeterministic finite utomt (NFA) nd how to convert

More information

Non-Deterministic Finite Automata. Fall 2018 Costas Busch - RPI 1

Non-Deterministic Finite Automata. Fall 2018 Costas Busch - RPI 1 Non-Deterministic Finite Automt Fll 2018 Costs Busch - RPI 1 Nondeterministic Finite Automton (NFA) Alphbet ={} q q2 1 q 0 q 3 Fll 2018 Costs Busch - RPI 2 Nondeterministic Finite Automton (NFA) Alphbet

More information

Lecture 09: Myhill-Nerode Theorem

Lecture 09: Myhill-Nerode Theorem CS 373: Theory of Computtion Mdhusudn Prthsrthy Lecture 09: Myhill-Nerode Theorem 16 Ferury 2010 In this lecture, we will see tht every lnguge hs unique miniml DFA We will see this fct from two perspectives

More information

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun:

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun: CMPU 240 Lnguge Theory nd Computtion Spring 2019 NFAs nd Regulr Expressions Lst clss: Introduced nondeterministic finite utomt with -trnsitions Tody: Prove n NFA- is no more powerful thn n NFA Introduce

More information

Finite-State Automata: Recap

Finite-State Automata: Recap Finite-Stte Automt: Recp Deepk D Souz Deprtment of Computer Science nd Automtion Indin Institute of Science, Bnglore. 09 August 2016 Outline 1 Introduction 2 Forml Definitions nd Nottion 3 Closure under

More information

Chapter 2 Finite Automata

Chapter 2 Finite Automata Chpter 2 Finite Automt 28 2.1 Introduction Finite utomt: first model of the notion of effective procedure. (They lso hve mny other pplictions). The concept of finite utomton cn e derived y exmining wht

More information

NFAs continued, Closure Properties of Regular Languages

NFAs continued, Closure Properties of Regular Languages Algorithms & Models of Computtion CS/ECE 374, Fll 2017 NFAs continued, Closure Properties of Regulr Lnguges Lecture 5 Tuesdy, Septemer 12, 2017 Sriel Hr-Peled (UIUC) CS374 1 Fll 2017 1 / 31 Regulr Lnguges,

More information

1 From NFA to regular expression

1 From NFA to regular expression Note 1: How to convert DFA/NFA to regulr expression Version: 1.0 S/EE 374, Fll 2017 Septemer 11, 2017 In this note, we show tht ny DFA cn e converted into regulr expression. Our construction would work

More information

Parse trees, ambiguity, and Chomsky normal form

Parse trees, ambiguity, and Chomsky normal form Prse trees, miguity, nd Chomsky norml form In this lecture we will discuss few importnt notions connected with contextfree grmmrs, including prse trees, miguity, nd specil form for context-free grmmrs

More information

CISC 4090 Theory of Computation

CISC 4090 Theory of Computation 9/6/28 Stereotypicl computer CISC 49 Theory of Computtion Finite stte mchines & Regulr lnguges Professor Dniel Leeds dleeds@fordhm.edu JMH 332 Centrl processing unit (CPU) performs ll the instructions

More information

BACHELOR THESIS Star height

BACHELOR THESIS Star height BACHELOR THESIS Tomáš Svood Str height Deprtment of Alger Supervisor of the chelor thesis: Study progrmme: Study rnch: doc. Štěpán Holu, Ph.D. Mthemtics Mthemticl Methods of Informtion Security Prgue 217

More information

Nondeterminism and Nodeterministic Automata

Nondeterminism and Nodeterministic Automata Nondeterminism nd Nodeterministic Automt 61 Nondeterminism nd Nondeterministic Automt The computtionl mchine models tht we lerned in the clss re deterministic in the sense tht the next move is uniquely

More information

Lexical Analysis Finite Automate

Lexical Analysis Finite Automate Lexicl Anlysis Finite Automte CMPSC 470 Lecture 04 Topics: Deterministic Finite Automt (DFA) Nondeterministic Finite Automt (NFA) Regulr Expression NFA DFA A. Finite Automt (FA) FA re grph, like trnsition

More information

Worked out examples Finite Automata

Worked out examples Finite Automata Worked out exmples Finite Automt Exmple Design Finite Stte Automton which reds inry string nd ccepts only those tht end with. Since we re in the topic of Non Deterministic Finite Automt (NFA), we will

More information

Homework 3 Solutions

Homework 3 Solutions CS 341: Foundtions of Computer Science II Prof. Mrvin Nkym Homework 3 Solutions 1. Give NFAs with the specified numer of sttes recognizing ech of the following lnguges. In ll cses, the lphet is Σ = {,1}.

More information

Non-deterministic Finite Automata

Non-deterministic Finite Automata Non-deterministic Finite Automt From Regulr Expressions to NFA- Eliminting non-determinism Rdoud University Nijmegen Non-deterministic Finite Automt H. Geuvers nd J. Rot Institute for Computing nd Informtion

More information

Non-deterministic Finite Automata

Non-deterministic Finite Automata Non-deterministic Finite Automt Eliminting non-determinism Rdoud University Nijmegen Non-deterministic Finite Automt H. Geuvers nd T. vn Lrhoven Institute for Computing nd Informtion Sciences Intelligent

More information

Myhill-Nerode Theorem

Myhill-Nerode Theorem Overview Myhill-Nerode Theorem Correspondence etween DA s nd MN reltions Cnonicl DA for L Computing cnonicl DFA Myhill-Nerode Theorem Deepk D Souz Deprtment of Computer Science nd Automtion Indin Institute

More information

Formal Language and Automata Theory (CS21004)

Formal Language and Automata Theory (CS21004) Forml Lnguge nd Automt Forml Lnguge nd Automt Theory (CS21004) Khrgpur Khrgpur Khrgpur Forml Lnguge nd Automt Tle of Contents Forml Lnguge nd Automt Khrgpur 1 2 3 Khrgpur Forml Lnguge nd Automt Forml Lnguge

More information

FABER Formal Languages, Automata and Models of Computation

FABER Formal Languages, Automata and Models of Computation DVA337 FABER Forml Lnguges, Automt nd Models of Computtion Lecture 5 chool of Innovtion, Design nd Engineering Mälrdlen University 2015 1 Recp of lecture 4 y definition suset construction DFA NFA stte

More information

A negative answer to a question of Wilke on varieties of!-languages

A negative answer to a question of Wilke on varieties of!-languages A negtive nswer to question of Wilke on vrieties of!-lnguges Jen-Eric Pin () Astrct. In recent pper, Wilke sked whether the oolen comintions of!-lnguges of the form! L, for L in given +-vriety of lnguges,

More information

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9.

5. (±±) Λ = fw j w is string of even lengthg [ 00 = f11,00g 7. (11 [ 00)± Λ = fw j w egins with either 11 or 00g 8. (0 [ ffl)1 Λ = 01 Λ [ 1 Λ 9. Regulr Expressions, Pumping Lemm, Right Liner Grmmrs Ling 106 Mrch 25, 2002 1 Regulr Expressions A regulr expression descries or genertes lnguge: it is kind of shorthnd for listing the memers of lnguge.

More information

NFAs continued, Closure Properties of Regular Languages

NFAs continued, Closure Properties of Regular Languages lgorithms & Models of omputtion S/EE 374, Spring 209 NFs continued, losure Properties of Regulr Lnguges Lecture 5 Tuesdy, Jnury 29, 209 Regulr Lnguges, DFs, NFs Lnguges ccepted y DFs, NFs, nd regulr expressions

More information

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. John Longley. 22 September School of Informatics University of Edinburgh Lnguges nd Automt Finite Automt Informtics 2A: Lecture 3 John Longley School of Informtics University of Edinburgh jrl@inf.ed.c.uk 22 September 2017 1 / 30 Lnguges nd Automt 1 Lnguges nd Automt Wht is

More information

Scanner. Specifying patterns. Specifying patterns. Operations on languages. A scanner must recognize the units of syntax Some parts are easy:

Scanner. Specifying patterns. Specifying patterns. Operations on languages. A scanner must recognize the units of syntax Some parts are easy: Scnner Specifying ptterns source code tokens scnner prser IR A scnner must recognize the units of syntx Some prts re esy: errors mps chrcters into tokens the sic unit of syntx x = x + y; ecomes

More information

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages

Grammar. Languages. Content 5/10/16. Automata and Languages. Regular Languages. Regular Languages 5//6 Grmmr Automt nd Lnguges Regulr Grmmr Context-free Grmmr Context-sensitive Grmmr Prof. Mohmed Hmd Softwre Engineering L. The University of Aizu Jpn Regulr Lnguges Context Free Lnguges Context Sensitive

More information

First Midterm Examination

First Midterm Examination Çnky University Deprtment of Computer Engineering 203-204 Fll Semester First Midterm Exmintion ) Design DFA for ll strings over the lphet Σ = {,, c} in which there is no, no nd no cc. 2) Wht lnguge does

More information

Regular Languages and Applications

Regular Languages and Applications Regulr Lnguges nd Applictions Yo-Su Hn Deprtment of Computer Science Yonsei University 1-1 SNU 4/14 Regulr Lnguges An old nd well-known topic in CS Kleene Theorem in 1959 FA (finite-stte utomton) constructions:

More information

First Midterm Examination

First Midterm Examination 24-25 Fll Semester First Midterm Exmintion ) Give the stte digrm of DFA tht recognizes the lnguge A over lphet Σ = {, } where A = {w w contins or } 2) The following DFA recognizes the lnguge B over lphet

More information

State Minimization for DFAs

State Minimization for DFAs Stte Minimiztion for DFAs Red K & S 2.7 Do Homework 10. Consider: Stte Minimiztion 4 5 Is this miniml mchine? Step (1): Get rid of unrechle sttes. Stte Minimiztion 6, Stte is unrechle. Step (2): Get rid

More information

Thoery of Automata CS402

Thoery of Automata CS402 Thoery of Automt C402 Theory of Automt Tle of contents: Lecture N0. 1... 4 ummry... 4 Wht does utomt men?... 4 Introduction to lnguges... 4 Alphets... 4 trings... 4 Defining Lnguges... 5 Lecture N0. 2...

More information

GNFA GNFA GNFA GNFA GNFA

GNFA GNFA GNFA GNFA GNFA DFA RE NFA DFA -NFA REX GNFA Definition GNFA A generlize noneterministic finite utomton (GNFA) is grph whose eges re lele y regulr expressions, with unique strt stte with in-egree, n unique finl stte with

More information

Foundations of XML Types: Tree Automata

Foundations of XML Types: Tree Automata 1 / 43 Foundtions of XML Types: Tree Automt Pierre Genevès CNRS (slides mostly sed on slides y W. Mrtens nd T. Schwentick) University of Grenole Alpes, 2017 2018 2 / 43 Why Tree Automt? Foundtions of XML

More information

CS 275 Automata and Formal Language Theory

CS 275 Automata and Formal Language Theory CS 275 utomt nd Forml Lnguge Theory Course Notes Prt II: The Recognition Prolem (II) Chpter II.5.: Properties of Context Free Grmmrs (14) nton Setzer (Bsed on ook drft y J. V. Tucker nd K. Stephenson)

More information

On NFA reductions. N6A 5B7, London, Ontario, CANADA ilie 2 Department of Computer Science, University of Chile

On NFA reductions. N6A 5B7, London, Ontario, CANADA ilie 2 Department of Computer Science, University of Chile On NFA reductions Lucin Ilie 1,, Gonzlo Nvrro 2, nd Sheng Yu 1, 1 Deprtment of Computer Science, University of Western Ontrio N6A 5B7, London, Ontrio, CANADA ilie syu@csd.uwo.c 2 Deprtment of Computer

More information

Random subgroups of a free group

Random subgroups of a free group Rndom sugroups of free group Frédérique Bssino LIPN - Lortoire d Informtique de Pris Nord, Université Pris 13 - CNRS Joint work with Armndo Mrtino, Cyril Nicud, Enric Ventur et Pscl Weil LIX My, 2015 Introduction

More information

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!)

CMSC 330: Organization of Programming Languages. DFAs, and NFAs, and Regexps (Oh my!) CMSC 330: Orgniztion of Progrmming Lnguges DFAs, nd NFAs, nd Regexps (Oh my!) CMSC330 Spring 2018 Types of Finite Automt Deterministic Finite Automt (DFA) Exctly one sequence of steps for ech string All

More information

A tutorial on sequential functions

A tutorial on sequential functions A tutoril on sequentil functions Jen-Éric Pin LIAFA, CNRS nd University Pris 7 30 Jnury 2006, CWI, Amsterdm Outline (1) Sequentil functions (2) A chrcteriztion of sequentil trnsducers (3) Miniml sequentil

More information

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata CS103B ndout 18 Winter 2007 Ferury 28, 2007 Finite Automt Initil text y Mggie Johnson. Introduction Severl childrens gmes fit the following description: Pieces re set up on plying ord; dice re thrown or

More information

Table of contents: Lecture N Summary... 3 What does automata mean?... 3 Introduction to languages... 3 Alphabets... 3 Strings...

Table of contents: Lecture N Summary... 3 What does automata mean?... 3 Introduction to languages... 3 Alphabets... 3 Strings... Tle of contents: Lecture N0.... 3 ummry... 3 Wht does utomt men?... 3 Introduction to lnguges... 3 Alphets... 3 trings... 3 Defining Lnguges... 4 Lecture N0. 2... 7 ummry... 7 Kleene tr Closure... 7 Recursive

More information

DFA minimisation using the Myhill-Nerode theorem

DFA minimisation using the Myhill-Nerode theorem DFA minimistion using the Myhill-Nerode theorem Johnn Högerg Lrs Lrsson Astrct The Myhill-Nerode theorem is n importnt chrcteristion of regulr lnguges, nd it lso hs mny prcticl implictions. In this chpter,

More information

Let's start with an example:

Let's start with an example: Finite Automt Let's strt with n exmple: Here you see leled circles tht re sttes, nd leled rrows tht re trnsitions. One of the sttes is mrked "strt". One of the sttes hs doule circle; this is terminl stte

More information

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA)

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA) Finite Automt (FA or DFA) CHAPTER Regulr Lnguges Contents definitions, exmples, designing, regulr opertions Non-deterministic Finite Automt (NFA) definitions, equivlence of NFAs DFAs, closure under regulr

More information

Finite Automata. Informatics 2A: Lecture 3. Mary Cryan. 21 September School of Informatics University of Edinburgh

Finite Automata. Informatics 2A: Lecture 3. Mary Cryan. 21 September School of Informatics University of Edinburgh Finite Automt Informtics 2A: Lecture 3 Mry Cryn School of Informtics University of Edinburgh mcryn@inf.ed.c.uk 21 September 2018 1 / 30 Lnguges nd Automt Wht is lnguge? Finite utomt: recp Some forml definitions

More information

CS 267: Automated Verification. Lecture 8: Automata Theoretic Model Checking. Instructor: Tevfik Bultan

CS 267: Automated Verification. Lecture 8: Automata Theoretic Model Checking. Instructor: Tevfik Bultan CS 267: Automted Verifiction Lecture 8: Automt Theoretic Model Checking Instructor: Tevfik Bultn LTL Properties Büchi utomt [Vrdi nd Wolper LICS 86] Büchi utomt: Finite stte utomt tht ccept infinite strings

More information

CSCI 340: Computational Models. Transition Graphs. Department of Computer Science

CSCI 340: Computational Models. Transition Graphs. Department of Computer Science CSCI 340: Computtionl Models Trnsition Grphs Chpter 6 Deprtment of Computer Science Relxing Restrints on Inputs We cn uild n FA tht ccepts only the word! 5 sttes ecuse n FA cn only process one letter t

More information

Non Deterministic Automata. Linz: Nondeterministic Finite Accepters, page 51

Non Deterministic Automata. Linz: Nondeterministic Finite Accepters, page 51 Non Deterministic Automt Linz: Nondeterministic Finite Accepters, pge 51 1 Nondeterministic Finite Accepter (NFA) Alphbet ={} q 1 q2 q 0 q 3 2 Nondeterministic Finite Accepter (NFA) Alphbet ={} Two choices

More information

Tutorial Automata and formal Languages

Tutorial Automata and formal Languages Tutoril Automt nd forml Lnguges Notes for to the tutoril in the summer term 2017 Sestin Küpper, Christine Mik 8. August 2017 1 Introduction: Nottions nd sic Definitions At the eginning of the tutoril we

More information

The size of subsequence automaton

The size of subsequence automaton Theoreticl Computer Science 4 (005) 79 84 www.elsevier.com/locte/tcs Note The size of susequence utomton Zdeněk Troníček,, Ayumi Shinohr,c Deprtment of Computer Science nd Engineering, FEE CTU in Prgue,

More information

Closure Properties of Regular Languages

Closure Properties of Regular Languages Closure Properties of Regulr Lnguges Regulr lnguges re closed under mny set opertions. Let L 1 nd L 2 e regulr lnguges. (1) L 1 L 2 (the union) is regulr. (2) L 1 L 2 (the conctention) is regulr. (3) L

More information

ɛ-closure, Kleene s Theorem,

ɛ-closure, Kleene s Theorem, DEGefW5wiGH2XgYMEzUKjEmtCDUsRQ4d 1 A nice pper relevnt to this course is titled The Glory of the Pst 2 NICTA Resercher, Adjunct t the Austrlin Ntionl University nd Griffith University ɛ-closure, Kleene

More information

Review of Gaussian Quadrature method

Review of Gaussian Quadrature method Review of Gussin Qudrture method Nsser M. Asi Spring 006 compiled on Sundy Decemer 1, 017 t 09:1 PM 1 The prolem To find numericl vlue for the integrl of rel vlued function of rel vrile over specific rnge

More information

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University

Farey Fractions. Rickard Fernström. U.U.D.M. Project Report 2017:24. Department of Mathematics Uppsala University U.U.D.M. Project Report 07:4 Frey Frctions Rickrd Fernström Exmensrete i mtemtik, 5 hp Hledre: Andres Strömergsson Exmintor: Jörgen Östensson Juni 07 Deprtment of Mthemtics Uppsl University Frey Frctions

More information