Regular expressions

Definition (Finite automata with set of initial states) A finite automaton with a set of initial states, or FA with initial set for short, is a 5-tuple $(Q, \Sigma, \delta, S, F)$ where $Q$, $\Sigma$, $\delta$, and $F$ are defined as for an FA, and $S \subseteq Q$ is the set of initial states. An FA $A = (Q, \Sigma, \delta, S, F)$ with set of initial states accepts a word $w$ if there is some state $s \in S$ such that the FA $(Q, \Sigma, \delta, s, F)$ accepts $w$; the language recognized by $A$ is
$$L(A) = \bigcup_{s \in S} L(\underbrace{(Q, \Sigma, \delta, s, F)}_{\text{this is an FA}}).$$
Intuitively speaking, an FA with set of initial states $S$ accepts an input $w$ if and only if there is some initial state $s \in S$ from which some accepting state can be reached on processing $w$.

Remark (FAs with initial set recognize the regular languages) The class of languages recognized by FAs with a set of initial states coincides with the regular languages. For every FA $(Q, \Sigma, \delta, s, F)$, the FA with initial set $(Q, \Sigma, \delta, \{s\}, F)$ accepts the same language, hence every regular language is recognized by an FA with a set of initial states. Conversely, by definition the language recognized by an FA with initial set is a finite union of regular languages, hence is regular.

The transition function of an FA $(Q, \Sigma, \delta, s, F)$ can be extended to set arguments $H \subseteq Q$ by letting $\hat\delta(H, w) = \bigcup_{q \in H} \hat\delta(q, w)$. Then an FA with initial set $(Q, \Sigma, \delta, S, F)$ accepts a word $w$ if and only if for the corresponding function $\hat\delta$ the set $\hat\delta(S, w)$ intersects $F$. It can be shown that, equivalently, $\hat\delta$ can be extended inductively by
$$\hat\delta(H, \lambda) = H \qquad\text{and}\qquad \hat\delta(H, ua) = \bigcup_{q \in \hat\delta(H, u)} \delta(q, a).$$

Definition (Finite automata with ε-transitions) A finite automaton with ε-transitions, or ε-fa for short, is a 5-tuple $E = (Q, \Sigma, \delta, s, F)$ where $Q$, $\Sigma$, $s$, and $F$ are defined as for an FA, and $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$ is a transition relation with ε-transitions. Given an ε-fa $E$ as above, the ε-closure of a state $q \in Q$, or $\varepsilon\text{-closure}(q)$ for short, satisfies
(i) $q$ is a member of the ε-closure of $q$,
(ii) for each state $q'$ in the ε-closure of $q$, the set $\delta(q', \varepsilon)$ is contained in the ε-closure of $q$.
Formally, the ε-closure of $q$ is defined as the least subset $X$ of $Q$ that satisfies (i) and (ii) with "ε-closure of $q$" replaced by $X$. The ε-closure of $q$ contains exactly the states that are reachable from $q$ in the directed graph on $Q$ given by all ε-transitions.

Definition (Extended transition function of an ε-fa) The transition relation $\delta$ of an ε-fa $E = (Q, \Sigma, \delta, s, F)$ extends canonically to an (extended) transition relation $\hat\delta : Q \times \Sigma^* \to 2^Q$. The values $\hat\delta(q, w)$ are defined by induction over the length of $w$ simultaneously for all states $q$ as follows.
Basis $w = \lambda$: For all $q \in Q$ let $\hat\delta(q, \lambda) = \varepsilon\text{-closure}(q)$.
Induction step $w = ua$ for some $a \in \Sigma$: For all $q \in Q$ let
$$\hat\delta(q, ua) = \bigcup_{q' \in \hat\delta(q, u)} \; \bigcup_{q'' \in \delta(q', a)} \varepsilon\text{-closure}(q'').$$
Intuitively, $\hat\delta(q, w)$ is the set of states that can be reached from $q$ after having processed $w$, where finitely many ε-transitions can be made initially and after reading each symbol of $w$. Note that in general $\hat\delta(q, a) \neq \delta(q, a)$.
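The ε-closure (as graph reachability) and the extended transition relation $\hat\delta$ can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the text: states are arbitrary hashable values, the transition relation is a dict from `(state, symbol)` pairs to sets of states (a missing key meaning $\emptyset$), and the empty string `""` stands for the label ε.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], Set[State]]  # missing keys mean the empty set
EPS = ""  # assumption of this sketch: "" is the label of ε-transitions


def eps_closure(delta: Delta, q: State) -> FrozenSet[State]:
    """States reachable from q via ε-transitions only (least set with (i), (ii))."""
    closure = {q}              # property (i)
    stack = [q]
    while stack:               # close under property (ii) by depth-first search
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def extended_delta(delta: Delta, q: State, w: str) -> FrozenSet[State]:
    """hat-delta(q, w), computed symbol by symbol along the inductive definition."""
    current = eps_closure(delta, q)                 # basis: w = λ
    for a in w:                                     # step: from u to ua
        nxt: Set[State] = set()
        for p in current:                           # p ranges over hat-delta(q, u)
            for r in delta.get((p, a), set()):      # r ranges over delta(p, a)
                nxt |= eps_closure(delta, r)
        current = frozenset(nxt)
    return current


# Hypothetical example: 0 -ε-> 1 -a-> 2 -ε-> 0, accepting state 2.
DELTA: Delta = {(0, EPS): {1}, (1, "a"): {2}, (2, EPS): {0}}
print(sorted(extended_delta(DELTA, 0, "a")))  # [0, 1, 2]
```

In this example $\delta(0, a) = \emptyset$ while $\hat\delta(0, a) = \{0, 1, 2\}$, which illustrates why the two functions differ in general.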

Definition (Language recognized by an ε-fa) The language recognized by an ε-fa $E = (Q, \Sigma, \delta, s, F)$ is $L(E) = \{w \in \Sigma^* : \hat\delta(s, w) \cap F \neq \emptyset\}$.

Theorem (ε-fas recognize exactly the regular languages) The class of languages recognized by ε-fas coincides with the regular languages.

Proof. Any FA can be transformed into an ε-fa recognizing the same language by simply letting $\delta(q, \varepsilon) = \emptyset$ for all states $q$, hence any regular language is recognized by some ε-fa.

Proof, cont.: It remains to show that the language recognized by a given ε-fa $E = (Q, \Sigma, \delta_E, s_E, F)$ is regular. We show this by constructing an FA with set of initial states $N = (Q, \Sigma, \delta_N, S_N, F)$ such that $L(E) = L(N)$, where we let
$$S_N = \varepsilon\text{-closure}(s_E), \qquad \delta_N(q, a) = \bigcup_{q' \in \delta_E(q, a)} \varepsilon\text{-closure}(q') \quad \text{for all } q \in Q,\ a \in \Sigma.$$
Intuitively speaking, the FA $N$ can simulate $E$, where
- ε-transitions that $E$ may perform before reading the first symbol of its input are taken care of by changing $s_E$ into the set $S_N$,
- ε-transitions that $E$ may perform after reading a symbol $a$ are taken care of by adding to $\delta_N(q, a)$ all states that $E$ can reach from $q$ by first processing $a$, followed by finitely many ε-transitions.

Proof, cont.: We show that for all words $w$ over $\Sigma$ it holds that
$$\hat\delta_E(s_E, w) = \hat\delta_N(S_N, w). \qquad (*)$$
By definition, a word $w$ is accepted by $E$ if the set on the left-hand side of (*) contains an accepting state of $E$. Similarly, a word $w$ is accepted by $N$ if the set on the right-hand side of (*) contains an accepting state of $N$. Now $E$ and $N$ have the same set $F$ of accepting states, hence it is immediate from (*) that $E$ and $N$ recognize the same language. We prove (*) by induction over the length of $w$.
Basis $w = \lambda$: By definition, the sets $\hat\delta_E(s_E, \lambda)$ and $\hat\delta_N(S_N, \lambda)$ are both equal to the ε-closure of $s_E$.
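Induction step $w = ua$ for some $a \in \Sigma$:
$$\hat\delta_E(s_E, ua) = \bigcup_{q \in \hat\delta_E(s_E, u)} \underbrace{\bigcup_{q' \in \delta_E(q, a)} \varepsilon\text{-closure}(q')}_{=\,\delta_N(q, a)} \overset{\text{IH}}{=} \bigcup_{q \in S_N} \underbrace{\bigcup_{q' \in \hat\delta_N(q, u)} \delta_N(q', a)}_{=\,\hat\delta_N(q, ua)} = \hat\delta_N(S_N, ua),$$
where the equation marked IH holds because the two sets $\hat\delta_E(s_E, u)$ and $\hat\delta_N(S_N, u) = \bigcup_{q \in S_N} \hat\delta_N(q, u)$ coincide by the induction hypothesis.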
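The construction of $N$ from $E$ in this proof can be sketched in Python as follows, under a hypothetical encoding (dicts from `(state, symbol)` pairs to sets of states, `""` for ε). The function `remove_eps` computes $S_N$ and $\delta_N$ as defined above; the loop at the end checks the equation $\hat\delta_E(s_E, w) = \hat\delta_N(S_N, w)$ on all words of length at most 4 for one example automaton, which illustrates but of course does not replace the inductive proof.

```python
from itertools import product
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], Set[State]]  # missing keys mean the empty set
EPS = ""  # assumption of this sketch: "" is the label of ε-transitions


def eps_closure(delta: Delta, q: State) -> FrozenSet[State]:
    closure, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def remove_eps(states: Iterable[State], sigma: str, delta_e: Delta, s_e: State):
    """Return (S_N, delta_N) of the FA with initial set N built in the proof."""
    s_n = eps_closure(delta_e, s_e)
    delta_n: Delta = {}
    for q, a in product(states, sigma):
        targets: Set[State] = set()
        for r in delta_e.get((q, a), set()):
            targets |= eps_closure(delta_e, r)  # read a, then ε-transitions
        delta_n[(q, a)] = targets
    return s_n, delta_n


def run_eps_fa(delta_e: Delta, s_e: State, w: str) -> Set[State]:
    """hat-delta_E(s_E, w) for the ε-fa E."""
    current = set(eps_closure(delta_e, s_e))
    for a in w:
        current = {x for p in current for r in delta_e.get((p, a), set())
                   for x in eps_closure(delta_e, r)}
    return current


def run_fa_set(delta_n: Delta, start: Iterable[State], w: str) -> Set[State]:
    """hat-delta_N(S_N, w) for the FA N, which has no ε-transitions."""
    current = set(start)
    for a in w:
        current = {r for p in current for r in delta_n.get((p, a), set())}
    return current


# Hypothetical ε-fa E for a*b*: 0 -ε-> 1, 1 -a-> 1, 1 -ε-> 2, 2 -b-> 2; accepting {2}.
STATES = {0, 1, 2}
DELTA_E: Delta = {(0, EPS): {1}, (1, "a"): {1}, (1, EPS): {2}, (2, "b"): {2}}
S_N, DELTA_N = remove_eps(STATES, "ab", DELTA_E, 0)

# Check hat-delta_E(s_E, w) == hat-delta_N(S_N, w) for all words of length <= 4.
for n in range(5):
    for w in map("".join, product("ab", repeat=n)):
        assert run_eps_fa(DELTA_E, 0, w) == run_fa_set(DELTA_N, S_N, w)
```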

Definition (Concatenation and Kleene closure) The concatenation of two languages $L_1$ and $L_2$ is the language $L_1 L_2 = \{w_1 w_2 : w_1 \in L_1 \text{ and } w_2 \in L_2\}$. The Kleene closure of a language $L$ is the language $L^* = \{w_1 \cdots w_m : m \in \mathbb{N} \text{ and } w_i \in L \text{ for } i = 1, \dots, m\}$. Note that $\lambda \in L^*$ for all $L$ (take $m = 0$). The operation $^*$ is called Kleene star.

Theorem (Closure under concatenation and Kleene closure) The class of regular languages is closed under concatenation and Kleene closure, i.e.,
(i) If languages $L_1$ and $L_2$ are regular, then $L_1 L_2$ is also regular.
(ii) If a language $L$ is regular, then its Kleene closure $L^*$ is also regular.

Proof: (i) Let regular languages $L_1$ and $L_2$ be given, as well as FAs $A_1 = (Q_1, \Sigma_1, \delta_1, s_1, F_1)$ and $A_2 = (Q_2, \Sigma_2, \delta_2, s_2, F_2)$ such that $L_1 = L(A_1)$ and $L_2 = L(A_2)$. It suffices to construct an ε-fa $E$ that recognizes the language $L_1 L_2$. Let $E = (Q_1 \cup Q_2, \Sigma_1 \cup \Sigma_2, \delta, s_1, F_2)$, where we can assume that $Q_1$ and $Q_2$ are disjoint. Moreover, we let
$$\delta(q, a) = \begin{cases} \delta_1(q, a) & \text{in case } q \in Q_1 \text{ and } a \in \Sigma_1,\\ \delta_2(q, a) & \text{in case } q \in Q_2 \text{ and } a \in \Sigma_2,\\ \{s_2\} & \text{in case } q \in F_1 \text{ and } a = \varepsilon,\\ \emptyset & \text{otherwise.} \end{cases}$$
By construction, the ε-fa $E$ recognizes the language $L_1 L_2$; the straightforward verification of this fact is left to the reader.

Proof, cont.: (ii) Let $L = L(N)$ for some FA $N = (Q, \Sigma, \delta_N, s_N, F)$. For some new state $s_E \notin Q$, we construct an ε-fa $E = (Q \cup \{s_E\}, \Sigma, \delta, s_E, F \cup \{s_E\})$ that recognizes the language $L^*$. We let
$$\delta(q, a) = \begin{cases} \delta_N(s_N, a) & \text{in case } q = s_E \text{ and } a \in \Sigma, \quad [\text{first step of } N]\\ \delta_N(q, a) & \text{in case } q \in Q \text{ and } a \in \Sigma, \quad [\text{simulation of } N]\\ \{s_E\} & \text{in case } q \in F \text{ and } a = \varepsilon, \quad [\text{iteration}]\\ \emptyset & \text{otherwise.} \quad [\text{no other } \varepsilon\text{-transitions}] \end{cases}$$
The new initial state $s_E$ is accepting, which enforces $\lambda \in L(E)$. After the first transition, the ε-fa $E$ works like $N$, but has additional ε-transitions from the accepting states of $N$ back to $s_E$, from where $E$ again behaves like the initial state of $N$; hence $E$ allows to iterate accepting computations of $N$. By construction, the ε-fa $E$ recognizes the language $L^*$.

Proof, cont.: We show $L^* \subseteq L(E)$.
The empty word $\lambda$ is in $L(E)$. All other words $w \in L^*$ can be written in the form $w = w_1 \cdots w_m$ where $m > 0$, $w_i \in L$, and $w_i \neq \lambda$. We show by induction over $m$ that all such $w$ are accepted by $E$.
Basis $m = 1$: Since $w_1 \in L = L(N)$ is nonempty, by construction the ε-fa $E$ and the FA $N$ can reach exactly the same states on reading the first symbol of $w_1$, and from there on $E$ can simulate all computations of $N$ on input $w_1$, including the accepting ones.
Induction step $m > 1$: By the induction hypothesis, the nonempty word $w' = w_1 \cdots w_{m-1}$ is accepted by $E$. So there is an accepting computation of $E$ on input $w = w' w_m$ that
- first reaches an accepting state $q \in F$ after having read $w_1 \cdots w_{m-1}$,
- then performs an ε-transition from $q$ to $s_E$,
- finally simulates an accepting run of $N$ on input $w_m$.
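The two constructions of this proof can be sketched in Python as follows, assuming a hypothetical encoding: an automaton is a triple `(delta, initial state, accepting states)`, `delta` maps `(state, symbol)` pairs to sets of states, and `""` stands for ε. The state sets are made disjoint by renaming every state `q` to a pair `(i, q)`. In `star`, the ε-transitions lead from the accepting states of $N$ to the new initial state $s_E$, the variant used in the argument for $L^* \subseteq L(E)$.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Delta = Dict[Tuple[Hashable, str], Set[Hashable]]   # missing keys mean ∅
Automaton = Tuple[Delta, Hashable, Set[Hashable]]   # (delta, initial, accepting)
EPS = ""  # assumption of this sketch: "" is the label of ε-transitions


def eps_closure(delta: Delta, q: Hashable) -> FrozenSet[Hashable]:
    closure, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def accepts(aut: Automaton, w: str) -> bool:
    delta, start, finals = aut
    current = set(eps_closure(delta, start))
    for a in w:
        current = {x for p in current for r in delta.get((p, a), set())
                   for x in eps_closure(delta, r)}
    return bool(current & finals)


def tag(delta: Delta, i: int) -> Delta:
    """Rename each state q to (i, q); this makes the state sets disjoint."""
    return {((i, q), a): {(i, r) for r in rs} for (q, a), rs in delta.items()}


def concat(a1: Automaton, a2: Automaton) -> Automaton:
    """ε-fa for L(A1)L(A2): an ε-transition from each state in F1 to s2."""
    (d1, s1, f1), (d2, s2, f2) = a1, a2
    delta = {**tag(d1, 1), **tag(d2, 2)}
    for q in f1:
        delta.setdefault(((1, q), EPS), set()).add((2, s2))
    return delta, (1, s1), {(2, q) for q in f2}


def star(aut: Automaton) -> Automaton:
    """ε-fa for L(N)*: new accepting initial state s_E, ε-transitions back to it."""
    d, s, f = aut
    s_e = "s_E"  # fresh, since every other state is a pair (0, q)
    delta = tag(d, 0)
    for (q, a), rs in d.items():
        if q == s and a != EPS:
            delta[(s_e, a)] = {(0, r) for r in rs}  # delta(s_E, a) = delta_N(s_N, a)
    for q in f:
        delta.setdefault(((0, q), EPS), set()).add(s_e)
    return delta, s_e, {(0, q) for q in f} | {s_e}


# Hypothetical FAs: A accepts {"ab"}, B accepts {"c"}.
A: Automaton = ({(0, "a"): {1}, (1, "b"): {2}}, 0, {2})
B: Automaton = ({(0, "c"): {1}}, 0, {1})
```

For example, `accepts(concat(A, B), "abc")` and `accepts(star(A), "abab")` hold, while `accepts(star(A), "aba")` does not.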

Proof, cont.: We show $L(E) \subseteq L^*$. Fix some nonempty $w \in L(E)$. The word $w$ is accepted by $E$, i.e., there are states $q_0, \dots, q_t \in Q \cup \{s_E\}$ and symbols $b_1, \dots, b_t \in \Sigma \cup \{\varepsilon\}$ such that $w = b_1 \cdots b_t$ (reading ε as the empty word), $q_0 = s_E$, $q_t \in F \cup \{s_E\}$, and $q_j \in \delta(q_{j-1}, b_j)$ for $j = 1, \dots, t$. If we let $w_1, \dots, w_m$ be the maximal subwords of $b_1 \cdots b_t$ over $\Sigma$ in the natural order, we obtain $w = w_1 \cdots w_m$ where $m > 0$ and
$$b_1 \cdots b_t = \varepsilon^{k_0} w_1 \varepsilon^{k_1} w_2 \varepsilon^{k_2} \cdots \varepsilon^{k_{m-2}} w_{m-1} \varepsilon^{k_{m-1}} w_m \varepsilon^{k_m}.$$
By construction of $E$, for all $j$ such that $b_j = \varepsilon$ we have $q_{j-1} \in F$ and $q_j = s_E$; hence the processing of each word $w_i$ corresponds to a contiguous subsequence of $q_0, \dots, q_t$ that starts in state $s_E$ and ends in an accepting state of $N$, i.e., all $w_i$ are in $L$ and $w$ is in $L^*$.

Remark (Closure properties of the regular languages) The class of regular languages over some fixed alphabet $\Sigma$ contains the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ for $a \in \Sigma$, and is closed under union, concatenation, and Kleene star.

Definition (Regular expressions) With an alphabet $\Sigma$ understood, the set of regular expressions over $\Sigma$ is denoted by RA and is inductively defined as follows.
(i) $\boldsymbol{\emptyset}$ and $\boldsymbol{\lambda}$ are regular expressions,
(ii) $\mathbf{a}$ is a regular expression for all $a \in \Sigma$,
(iii) if $\alpha$ and $\beta$ are regular expressions, then $\alpha + \beta$, $\alpha\beta$, and $\alpha^*$ are regular expressions.

Definition (Semantics of regular expressions) With an alphabet $\Sigma$ understood, the language $L(\alpha)$ denoted by a regular expression $\alpha$ is inductively defined as follows.
(i) $L(\boldsymbol{\emptyset}) = \emptyset$ and $L(\boldsymbol{\lambda}) = \{\lambda\}$,
(ii) $L(\mathbf{a}) = \{a\}$ for all $a \in \Sigma$,
(iii) $L(\alpha + \beta) = L(\alpha) \cup L(\beta)$, $L(\alpha\beta) = L(\alpha)L(\beta)$, and $L(\alpha^*) = (L(\alpha))^*$.

Theorem (Regular expressions and regular languages) For any given alphabet $\Sigma$, a language $L$ over $\Sigma$ is regular if and only if there is a regular expression $\alpha$ over $\Sigma$ such that $L = L(\alpha)$.

Proof. Fix some alphabet $\Sigma$. First we show by induction over the inductive definition of regular expressions that the language $L(\alpha)$ is regular for all regular expressions $\alpha$ over $\Sigma$.
Basis: The languages $L(\boldsymbol{\emptyset}) = \emptyset$, $L(\boldsymbol{\lambda}) = \{\lambda\}$, and $L(\mathbf{a}) = \{a\}$ are regular.
Induction step:
- If $\gamma = \alpha + \beta$, then $L(\gamma) = L(\alpha) \cup L(\beta)$ is regular.
- If $\gamma = \alpha\beta$, then $L(\gamma) = L(\alpha)L(\beta)$ is regular.
- If $\gamma = \alpha^*$, then $L(\gamma) = (L(\alpha))^*$ is regular.
The three assertions hold by closure of the class of regular languages under union, concatenation, and Kleene star, and because by the induction hypothesis we can assume that the languages $L(\alpha)$ and $L(\beta)$ are regular.
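Since $L(\alpha^*)$ is infinite in general, the semantics can only be evaluated up to a length bound. The following Python sketch computes $L(\alpha) \cap \Sigma^{\le n}$ by a structural recursion that follows clauses (i) to (iii) of the semantics; the encoding of regular expressions as nested tuples is a hypothetical choice made only for this illustration.

```python
from typing import Set, Tuple

# Hypothetical encoding of regular expressions as nested tuples:
# ("empty",), ("lambda",), ("sym", a), ("union", x, y), ("concat", x, y), ("star", x)
Regex = Tuple


def lang(alpha: Regex, n: int) -> Set[str]:
    """All words of L(alpha) of length at most n, following the inductive semantics."""
    kind = alpha[0]
    if kind == "empty":                      # L(∅) = ∅
        return set()
    if kind == "lambda":                     # L(λ) = {λ}
        return {""}
    if kind == "sym":                        # L(a) = {a}
        return {alpha[1]} if n >= 1 else set()
    if kind == "union":                      # L(α + β) = L(α) ∪ L(β)
        return lang(alpha[1], n) | lang(alpha[2], n)
    if kind == "concat":                     # L(αβ) = L(α)L(β)
        left, right = lang(alpha[1], n), lang(alpha[2], n)
        return {u + v for u in left for v in right if len(u) + len(v) <= n}
    if kind == "star":                       # L(α*) = (L(α))*
        base = lang(alpha[1], n) - {""}      # empty factors add nothing new
        result, frontier = {""}, {""}        # m = 0 factors gives λ
        while frontier:                      # append one more nonempty factor
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression: {alpha!r}")


# (a + b)* a : the words over {a, b} that end in a
ENDS_IN_A = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(lang(ENDS_IN_A, 2)))  # ['a', 'aa', 'ba']
```

The truncation is harmless for concatenation and star: a word of length at most $n$ can only be built from factors of length at most $n$.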

Proof, cont.: Next we show that every regular language $L$ is equal to $L(\alpha)$ for some regular expression $\alpha$. Given a regular language $L$, choose some DFA $D = (Q, \Sigma, \delta, s, F)$ that recognizes $L$, where $Q = \{q_1, \dots, q_k\}$ and $q_i \neq q_j$ for $i \neq j$. For all $i, j \in \{1, \dots, k\}$ and $l \in \{0, \dots, k\}$ let
$$P_{i,j,l} = \{u = u_1 \cdots u_n \in \Sigma^* : \hat\delta(q_i, u) = q_j \text{ and } \hat\delta(q_i, u_1 \cdots u_t) \in \{q_1, \dots, q_l\} \text{ for } t = 1, \dots, n-1\},$$
that is, the set $P_{i,j,l}$ contains the words $u$ such that $\hat\delta(q_i, u) = q_j$ and, except possibly for the first state $q_i$ and the last state $q_j$, all states occurring while processing $u$ are in the set $\{q_1, \dots, q_l\}$.

Proof, cont.: We show simultaneously for all $i$ and $j$ by induction over $l$ that for every triple $i, j, l$ there is a regular expression $\alpha_{i,j,l}$ such that $P_{i,j,l} = L(\alpha_{i,j,l})$. For $i_0$ such that $s = q_{i_0}$, we then have
$$L = \bigcup_{j : q_j \in F} P_{i_0,j,k} = \bigcup_{j : q_j \in F} L(\alpha_{i_0,j,k}) = L\Big(\sum_{j : q_j \in F} \alpha_{i_0,j,k}\Big),$$
where $\sum$ denotes an iterated $+$ and the empty sum is $\boldsymbol{\emptyset}$.
Basis $l = 0$: There cannot be intermediate states, hence we have
$$P_{i,j,0} = \begin{cases} \{a \in \Sigma : \delta(q_i, a) = q_j\} & \text{in case } i \neq j,\\ \{\lambda\} \cup \{a \in \Sigma : \delta(q_i, a) = q_i\} & \text{in case } i = j, \end{cases}$$
i.e., the finite set $P_{i,j,0}$ can be described by a regular expression.
Induction step $l > 0$: A word in $P_{i,j,l}$ either does not pass through $q_l$ as an intermediate state, or it can be cut at its intermediate visits to $q_l$ into a prefix in $P_{i,l,l-1}$, a sequence of loops in $P_{l,l,l-1}$, and a suffix in $P_{l,j,l-1}$. Hence we have
$$P_{i,j,l} = P_{i,j,l-1} \cup P_{i,l,l-1} (P_{l,l,l-1})^* P_{l,j,l-1},$$
and accordingly we let $\alpha_{i,j,l} = \alpha_{i,j,l-1} + \alpha_{i,l,l-1} (\alpha_{l,l,l-1})^* \alpha_{l,j,l-1}$.

Definition (Mirror word and mirror language) Given a word $w = a_1 \cdots a_n$ where the $a_i$ are symbols from some alphabet, the mirror word of $w$ is $w^R = a_n \cdots a_1$. For a language $L$, the mirror language of $L$ is $L^R = \{w^R : w \in L\}$. A word $w$ is a palindrome if $w = w^R$.

Theorem (Closure under mirror languages) The class of regular languages is closed under mirror languages, that is, if $L$ is regular, then its mirror language $L^R$ is also regular.

Proof. Variant 1: Given a DFA $D = (Q, \Sigma, \delta, s, F)$ with $L = L(D)$, the FA with set of initial states $N = (Q, \Sigma, \delta_N, F, \{s\})$ recognizes $L^R$ if we let $\delta_N(q, a) = \{q' \in Q : \delta(q', a) = q\}$.
Intuitively speaking, the FA $N$ simulates reversed computations of $D$.

Proof, cont.: Variant 2: We show by induction over the definition of regular expressions that for every regular expression $\alpha$ there is a regular expression $\alpha^R$ such that $(L(\alpha))^R = L(\alpha^R)$.
Basis: The assertion holds for $\alpha$ equal to $\boldsymbol{\emptyset}$, to $\boldsymbol{\lambda}$, or to $\mathbf{a}$ for some symbol $a$, because in these cases we have $(L(\alpha))^R = L(\alpha)$, so we can let $\alpha^R = \alpha$.
Induction step:
- The mirror language of $L(\alpha + \beta)$ is $L(\alpha^R + \beta^R)$.
- The mirror language of $L(\alpha\beta)$ is $L(\beta^R \alpha^R)$.
- The mirror language of $L(\alpha^*)$ is $L((\alpha^R)^*)$.
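Variant 1 of this proof can be sketched in Python as follows, assuming a hypothetical encoding of a DFA as a total dict from `(state, symbol)` pairs to states. `reverse_dfa` builds $N = (Q, \Sigma, \delta_N, F, \{s\})$ with $\delta_N(q, a) = \{q' \in Q : \delta(q', a) = q\}$, and `accepts_with_initial_set` runs an FA with a set of initial states by tracking the set of reachable states.

```python
from itertools import product
from typing import Dict, Set, Tuple


def reverse_dfa(states: Set[int], sigma: str, delta: Dict[Tuple[int, str], int],
                start: int, finals: Set[int]):
    """FA with initial set N = (Q, Σ, delta_N, F, {s}) for the mirror language."""
    delta_n = {(q, a): {p for p in states if delta[(p, a)] == q}
               for q, a in product(states, sigma)}
    return delta_n, set(finals), {start}   # initial set F, accepting set {s}


def accepts_with_initial_set(delta_n, initial: Set[int], finals: Set[int], w: str) -> bool:
    current = set(initial)
    for a in w:
        current = {r for p in current for r in delta_n[(p, a)]}
    return bool(current & finals)


def dfa_accepts(delta, start: int, finals: Set[int], w: str) -> bool:
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals


# Hypothetical DFA over {a, b} accepting the words that end in "ab".
STATES = {0, 1, 2}
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 0}
N = reverse_dfa(STATES, "ab", DELTA, 0, {2})
```

Here $N$ accepts exactly the words that start with "ba", for example `accepts_with_initial_set(*N, "baa")` holds.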
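The inductive construction of the expressions $\alpha_{i,j,l}$ from the proof that every regular language is denoted by a regular expression can likewise be sketched in Python. Assumptions of this sketch: the DFA's states are numbered $1, \dots, k$; expressions are emitted as strings in Python's `re` syntax so that `re.fullmatch` can serve as a check; `None` represents $\boldsymbol{\emptyset}$ and `""` represents $\boldsymbol{\lambda}$; and the helpers `union`, `concat`, and `star` only perform the obvious simplifications involving $\boldsymbol{\emptyset}$ and $\boldsymbol{\lambda}$.

```python
import re
from itertools import product
from typing import Dict, Iterable, Optional, Tuple

Expr = Optional[str]  # None stands for ∅, "" for λ (assumption of this sketch)


def union(x: Expr, y: Expr) -> Expr:
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"


def concat(x: Expr, y: Expr) -> Expr:
    if x is None or y is None:
        return None                          # ∅ annihilates concatenation
    if x == "":
        return y
    if y == "":
        return x
    return f"(?:{x})(?:{y})"


def star(x: Expr) -> Expr:
    if x is None or x == "":
        return ""                            # ∅* = λ* = λ
    return f"(?:{x})*"


def dfa_to_regex(k: int, sigma: str, delta: Dict[Tuple[int, str], int],
                 start: int, finals: Iterable[int]) -> Expr:
    """Sum of alpha_{start,j,k} over accepting j, built by induction over l."""
    states = range(1, k + 1)
    alpha: Dict[Tuple[int, int], Expr] = {}
    for i, j in product(states, states):     # basis l = 0
        expr: Expr = "" if i == j else None
        for a in sigma:
            if delta[(i, a)] == j:
                expr = union(expr, re.escape(a))
        alpha[(i, j)] = expr
    for l in states:                         # induction step from l - 1 to l
        alpha = {(i, j): union(alpha[(i, j)],
                               concat(concat(alpha[(i, l)], star(alpha[(l, l)])),
                                      alpha[(l, j)]))
                 for i, j in product(states, states)}
    result: Expr = None
    for j in finals:
        result = union(result, alpha[(start, j)])
    return result


# Hypothetical DFA over {a, b}: state 1 = even number of a's (initial, accepting).
DELTA = {(1, "a"): 2, (1, "b"): 1, (2, "a"): 1, (2, "b"): 2}
EVEN_A = dfa_to_regex(2, "ab", DELTA, 1, [1])
```

No simplification beyond $\boldsymbol{\emptyset}$ and $\boldsymbol{\lambda}$ is attempted, so the expressions grow quickly with the number $k$ of states.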

Definition (Homomorphisms) Let $\Sigma_1$ and $\Sigma_2$ be alphabets. A mapping $h : \Sigma_1^* \to \Sigma_2^*$ is a homomorphism if for all words $u$ and $v$ over $\Sigma_1$ it holds that
$$h(uv) = h(u)h(v). \qquad (*)$$

Proposition (Homomorphisms as mappings on $\Sigma$) For every homomorphism $h : \Sigma_1^* \to \Sigma_2^*$ it holds for all $a_1, \dots, a_n \in \Sigma_1$ that
$$h(a_1 \cdots a_n) = h(a_1) \cdots h(a_n), \qquad (**)$$
which in case $n = 0$ reads as $h(\lambda) = \lambda$. Conversely, every mapping $h : \Sigma_1 \to \Sigma_2^*$ defines via (**) a homomorphism $h : \Sigma_1^* \to \Sigma_2^*$.
Proof: Equation (**) follows from (*) by an easy inductive argument; for $n = 0$ note that $h(\lambda) = h(\lambda\lambda) = h(\lambda)h(\lambda)$ forces $h(\lambda) = \lambda$. Obviously (**) implies that $h$ is a homomorphism.

Definition (Images of languages) Let $h : \Sigma_1^* \to \Sigma_2^*$ be a mapping. For languages $L_1$ over $\Sigma_1$ and $L_2$ over $\Sigma_2$ let
$$h[L_1] = \{h(w) : w \in L_1\} \qquad\text{and}\qquad h^{-1}[L_2] = \{w \in \Sigma_1^* : h(w) \in L_2\}$$
be the images of $L_1$ under $h$ and of $L_2$ under the inverse of $h$.

Remark As usual, for a mapping $h$ that is 1-to-1, we have $L = h^{-1}[h[L]]$ for all languages $L$, whereas this fails in general. For example, if we let $h(a) = b_0$ for all $a \in \Sigma_1$ and some fixed $b_0 \in \Sigma_2$, then the language $h^{-1}[h[L]]$ contains exactly the words $w$ over $\Sigma_1$ such that there is some word of the same length in $L$.

Theorem (Closure under homomorphisms) The class of regular languages is closed under homomorphisms, that is, for all regular languages $L$ and all homomorphisms $h$, the language $h[L]$ is again regular.
Proof. Let $h : \Sigma_1^* \to \Sigma_2^*$ be any homomorphism. We show by induction over the definition of regular expressions that for every regular expression $\alpha$ there is a regular expression $\alpha_h$ such that $h[L(\alpha)] = L(\alpha_h)$.
Basis: The image of a finite language under $h$ is again finite, hence the assertion holds for $\alpha$ equal to $\boldsymbol{\emptyset}$, $\boldsymbol{\lambda}$, or $\mathbf{a}$ for some $a \in \Sigma_1$.
Induction step:
- The image of $L(\alpha + \beta)$ under $h$ is equal to $L(\alpha_h + \beta_h)$.
- The image of $L(\alpha\beta)$ under $h$ is equal to $L(\alpha_h \beta_h)$.
- The image of $L(\alpha^*)$ under $h$ is equal to $L((\alpha_h)^*)$.
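The construction of $\alpha_h$ in this proof amounts to replacing every symbol $\mathbf{a}$ in $\alpha$ by a regular expression for the word $h(a)$. A minimal Python sketch, using a hypothetical tuple encoding of regular expressions and translating the result into Python's `re` syntax so that membership can be tested with `re.fullmatch`; the homomorphism is given by its values on symbols, as in the proposition above.

```python
import re
from itertools import product
from typing import Dict, Tuple

# Hypothetical encoding of regular expressions as nested tuples:
# ("empty",), ("lambda",), ("sym", a), ("union", x, y), ("concat", x, y), ("star", x)
Regex = Tuple
Hom = Dict[str, str]  # h on symbols; extended to words by h(a1...an) = h(a1)...h(an)


def hom_word(w: str, h: Hom) -> str:
    return "".join(h[a] for a in w)


def word_expr(w: str) -> Regex:
    """A regular expression denoting exactly the word w (λ if w is empty)."""
    if not w:
        return ("lambda",)
    expr: Regex = ("sym", w[0])
    for b in w[1:]:
        expr = ("concat", expr, ("sym", b))
    return expr


def apply_hom(alpha: Regex, h: Hom) -> Regex:
    """alpha_h with h[L(alpha)] = L(alpha_h): replace each symbol a by h(a)."""
    kind = alpha[0]
    if kind in ("empty", "lambda"):
        return alpha
    if kind == "sym":
        return word_expr(h[alpha[1]])
    if kind == "star":
        return ("star", apply_hom(alpha[1], h))
    return (kind, apply_hom(alpha[1], h), apply_hom(alpha[2], h))  # union, concat


def to_pattern(alpha: Regex) -> str:
    """Translate into Python `re` syntax; "(?!)" never matches and stands for ∅."""
    kind = alpha[0]
    if kind == "empty":
        return "(?!)"
    if kind == "lambda":
        return ""
    if kind == "sym":
        return re.escape(alpha[1])
    if kind == "union":
        return f"(?:{to_pattern(alpha[1])}|{to_pattern(alpha[2])})"
    if kind == "concat":
        return f"(?:{to_pattern(alpha[1])})(?:{to_pattern(alpha[2])})"
    inner = to_pattern(alpha[1])
    return "" if inner in ("", "(?!)") else f"(?:{inner})*"  # ∅* = λ* = λ


# Hypothetical example: alpha = (ab + c)*, h(a) = 0, h(b) = 11, h(c) = λ.
ALPHA = ("star", ("union", ("concat", ("sym", "a"), ("sym", "b")), ("sym", "c")))
H = {"a": "0", "b": "11", "c": ""}
IMAGE = to_pattern(apply_hom(ALPHA, H))  # denotes h[L(alpha)] = (011)*

# Spot check: h maps every word of L(alpha) up to length 5 into L(alpha_h).
assert all(re.fullmatch(IMAGE, hom_word(w, H))
           for n in range(6) for w in map("".join, product("abc", repeat=n))
           if re.fullmatch(to_pattern(ALPHA), w))
```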
Theorem (Closure under inverse homomorphisms) The class of regular languages is closed under inverse homomorphisms, that is, for all regular languages $L$ and all homomorphisms $h$, the language $h^{-1}[L]$ is again regular.
Proof. Let $h : \Sigma_1^* \to \Sigma_2^*$ be a homomorphism. Let $L$ be a language over $\Sigma_2$ that is recognized by some DFA $D = (Q, \Sigma_2, \delta_D, s, F)$. Consider the DFA $A = (Q, \Sigma_1, \delta, s, F)$ where we let, for all $q \in Q$ and $a \in \Sigma_1$,
$$\delta(q, a) = \hat\delta_D(q, h(a)).$$
We leave it to the reader to show by induction on the word length that for every word $w$ over $\Sigma_1$ we have $\hat\delta(s, w) = \hat\delta_D(s, h(w))$, and that hence $A$ accepts $w$ if and only if $h(w)$ is in $L$.
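The DFA $A$ from this proof can be sketched in Python as follows, assuming a hypothetical encoding of DFAs as total dicts from `(state, symbol)` pairs to states and of $h$ as a dict from symbols of $\Sigma_1$ to words over $\Sigma_2$. The example $D$ accepts the words over $\{0, 1\}$ with an even number of 1s; with $h(a) = 1$, $h(b) = 11$, and $h(c) = 0$, the language $h^{-1}[L(D)]$ consists of the words over $\{a, b, c\}$ with an even number of $a$s.

```python
from itertools import product
from typing import Dict, Set, Tuple


def run_dfa(delta: Dict[Tuple[int, str], int], q: int, w: str) -> int:
    """Extended transition function hat-delta(q, w) of a DFA."""
    for b in w:
        q = delta[(q, b)]
    return q


def inverse_hom_dfa(states: Set[int], sigma1: str,
                    delta_d: Dict[Tuple[int, str], int], h: Dict[str, str]):
    """Transition function of A: delta(q, a) = hat-delta_D(q, h(a))."""
    return {(q, a): run_dfa(delta_d, q, h[a]) for q in states for a in sigma1}


# Hypothetical D over {0, 1}: state = parity of the number of 1s; accepts parity 0.
DELTA_D = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
H = {"a": "1", "b": "11", "c": "0"}
DELTA_A = inverse_hom_dfa({0, 1}, "abc", DELTA_D, H)

# Spot check: A accepts w iff D accepts h(w), for all w of length at most 5.
assert all((run_dfa(DELTA_A, 0, w) == 0)
           == (run_dfa(DELTA_D, 0, "".join(H[x] for x in w)) == 0)
           for n in range(6) for w in map("".join, product("abc", repeat=n)))
```

If some $h(a)$ is the empty word, the sketch yields $\delta(q, a) = q$, as the definition $\delta(q, a) = \hat\delta_D(q, \lambda)$ requires.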