Regular expressions

Definition (Finite automata with set of initial states) A finite automaton with set of initial states, or FA with initial set for short, is a 5-tuple $(Q, \Sigma, \delta, S, F)$ where $Q$, $\Sigma$, $\delta$, and $F$ are defined as for an FA, and $S \subseteq Q$ is the set of initial states. An FA $A = (Q, \Sigma, \delta, S, F)$ with set of initial states accepts a word $w$ if there is some state $s \in S$ such that the FA $(Q, \Sigma, \delta, s, F)$ accepts $w$; the language recognized by $A$ is

\[ L(A) = \bigcup_{s \in S} L\bigl(\underbrace{(Q, \Sigma, \delta, s, F)}_{\text{this is an FA}}\bigr). \]

Intuitively speaking, an FA with set of initial states $S$ accepts an input $w$ if and only if there is some starting state $s \in S$ from which some accepting state can be reached on processing $w$.

Remark (FAs with initial set recognize the regular languages) The class of languages recognized by FAs with set of initial states coincides with the regular languages. For every FA $(Q, \Sigma, \delta, s, F)$, the FA with initial set $(Q, \Sigma, \delta, \{s\}, F)$ accepts the same language, hence every regular language is recognized by an FA with set of initial states. Conversely, by definition the language recognized by an FA with initial set is a finite union of regular languages, hence is regular.

The transition function of an FA $(Q, \Sigma, \delta, s, F)$ can be extended to set arguments $H \subseteq Q$ by letting

\[ \delta(H, w) = \bigcup_{q \in H} \delta(q, w). \]

Then an FA with initial set $(Q, \Sigma, \delta, S, F)$ accepts a word $w$ if for the corresponding function $\delta$ the set $\delta(S, w)$ intersects $F$. It can be shown that equivalently $\delta$ can be extended inductively by

\[ \delta(H, \lambda) = H \quad\text{and}\quad \delta(H, ua) = \bigcup_{q \in \delta(H, u)} \delta(q, a). \]

Definition (Finite automata with ε-transitions) A finite automaton with ε-transitions, or ε-FA for short, is a 5-tuple $E = (Q, \Sigma, \delta, s, F)$ where $Q$, $\Sigma$, $s$, and $F$ are defined as for an FA, and $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$ is a transition relation with ε-transitions. Given an ε-FA $E$ as above, the ε-closure of a state $q \in Q$, or $\varepsilon\text{-closure}(q)$ for short, satisfies
(i) $q$ is a member of the ε-closure of $q$,
(ii) for each state $q'$ in the ε-closure of $q$, the set $\delta(q', \varepsilon)$ is contained in the ε-closure of $q$.
Formally, the ε-closure of $q$ is defined as the least subset $X$ of $Q$ that satisfies (i) and (ii) with "ε-closure of $q$" replaced by "set $X$". The ε-closure of $q$ contains exactly the states that are reachable from $q$ in the directed graph on $Q$ given by all ε-transitions.

Definition (Extended transition function of an ε-FA) The transition relation $\delta$ of an ε-FA $E = (Q, \Sigma, \delta, s, F)$ extends canonically to an (extended) transition relation $\delta^* : Q \times \Sigma^* \to 2^Q$. The values $\delta^*(q, w)$ are defined by induction over the length of $w$ simultaneously for all states $q$ as follows.
Basis $w = \lambda$: For all $q \in Q$ let $\delta^*(q, \lambda) = \varepsilon\text{-closure}(q)$.
Induction step $w = ua$ for some $a \in \Sigma$: For all $q \in Q$ let

\[ \delta^*(q, ua) = \bigcup_{q' \in \delta^*(q, u)} \; \bigcup_{q'' \in \delta(q', a)} \varepsilon\text{-closure}(q''). \]

Intuitively, $\delta^*(q, w)$ is the set of states that can be reached from $q$ after having processed $w$, where a finite number of ε-transitions can be made initially and after reading each symbol of $w$. Note that in general it holds that $\delta(q, a) \neq \delta^*(q, a)$.
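The two definitions above translate almost literally into code. The following is a minimal sketch, assuming a dictionary representation of the transition relation and the empty string as a hypothetical marker for ε; the names `EPS`, `eps_closure`, `extended_delta`, and `accepts`, as well as the example automaton, are illustrative and not part of the lecture.

```python
EPS = ""  # hypothetical marker for an ε-transition (no input symbol is the empty string)


def eps_closure(delta, q):
    """Least set that contains q and is closed under ε-transitions."""
    closure, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):  # missing keys mean the empty set
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def extended_delta(delta, q, w):
    """Extended transition relation, following the induction over the length of w."""
    current = eps_closure(delta, q)           # basis: the ε-closure of q
    for a in w:                               # induction step: w = ua
        step = set()
        for p in current:
            for r in delta.get((p, a), set()):
                step |= eps_closure(delta, r)
        current = step
    return current


def accepts(delta, start, final, w):
    """A word is accepted if the reached set contains an accepting state."""
    return bool(extended_delta(delta, start, w) & final)


# Assumed example: an ε-FA with states p (initial) and q (accepting)
# that reads a's in p, moves silently to q, and reads b's in q.
delta = {("p", "a"): {"p"}, ("p", EPS): {"q"}, ("q", "b"): {"q"}}
print(extended_delta(delta, "p", "a"), delta[("p", "a")])
```

In this example the one-step set for `("p", "a")` contains only `p`, while the extended relation on the word `a` also contains `q`, which illustrates the closing note of the definition.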

Definition (Language recognized by an ε-FA) The language recognized by an ε-FA $E = (Q, \Sigma, \delta, s, F)$ is

\[ L(E) = \{ w \in \Sigma^* : \delta^*(s, w) \cap F \neq \emptyset \}, \]

where $\delta^*$ denotes the extended transition relation of $E$.

Theorem (ε-FAs recognize exactly the regular languages) The class of languages recognized by ε-FAs coincides with the regular languages.

Proof. Any FA can be transformed into an ε-FA recognizing the same language by simply letting $\delta(q, \varepsilon) = \emptyset$ for all states $q$, hence any regular language is recognized by some ε-FA.

Proof, cont.: It remains to show that the language recognized by a given ε-FA $E = (Q, \Sigma, \delta_E, s_E, F)$ is regular. We show this by constructing an FA with set of initial states $N = (Q, \Sigma, \delta_N, S_N, F)$ such that $L(E) = L(N)$, where we let

\[ S_N = \varepsilon\text{-closure}(s_E), \qquad \delta_N(q, a) = \bigcup_{q' \in \delta_E(q, a)} \varepsilon\text{-closure}(q') \quad\text{for all } q \in Q,\ a \in \Sigma. \]

Intuitively speaking, the FA $N$ can simulate $E$ where
- ε-transitions that $E$ may perform before reading the first symbol of its input are taken care of by changing $s_E$ into the set $S_N$,
- ε-transitions that $E$ may perform after reading a symbol $a$ are taken care of by adding all states to $\delta_N(q, a)$ that $E$ can reach from $q$ by first processing $a$, followed by finitely many ε-transitions.

Proof, cont.: We show that for all words $w$ over $\Sigma$ it holds that

\[ \delta_E^*(s_E, w) = \delta_N(S_N, w). \qquad (\ast) \]

By definition, a word $w$ is accepted by $E$ if the set on the left-hand side of $(\ast)$ contains an accepting state of $E$. Similarly, a word $w$ is accepted by $N$ if the set on the right-hand side of $(\ast)$ contains an accepting state of $N$. Now $E$ and $N$ have the same set $F$ of accepting states, hence it is immediate from $(\ast)$ that $E$ and $N$ recognize the same language.

Proof, cont.: We show $(\ast)$ by induction over the length of $w$.
Basis $w = \lambda$: By definition, the sets $\delta_E^*(s_E, \lambda)$ and $\delta_N(S_N, \lambda)$ are both equal to the ε-closure of $s_E$.
Induction step $w = ua$ for some $a \in \Sigma$:

\[
\begin{aligned}
\delta_E^*(s_E, ua)
  &= \bigcup_{q \in \delta_E^*(s_E, u)} \; \underbrace{\bigcup_{q' \in \delta_E(q, a)} \varepsilon\text{-closure}(q')}_{= \delta_N(q, a)} \\
  &= \bigcup_{q \in S_N} \; \underbrace{\bigcup_{q' \in \delta_N(q, u)} \delta_N(q', a)}_{= \delta_N(q, ua)} \\
  &= \delta_N(S_N, ua),
\end{aligned}
\]

where the second equation holds because the two sets $\delta_E^*(s_E, u)$ and $\delta_N(S_N, u)$ coincide by the induction hypothesis.
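The construction of $N$ from $E$ and the claim that both automata reach the same state sets can be checked mechanically on small examples. The sketch below assumes a dictionary encoding of the transition relation with the empty string standing for ε; all function names and the three-state example automaton are hypothetical choices made for illustration.

```python
from itertools import product

EPS = ""  # hypothetical marker for an ε-transition


def eps_closure(delta, q):
    """States reachable from q using ε-transitions only."""
    closure, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def extended_delta(delta_e, q, w):
    """Extended transition relation of the ε-FA E."""
    current = eps_closure(delta_e, q)
    for a in w:
        step = set()
        for p in current:
            for r in delta_e.get((p, a), set()):
                step |= eps_closure(delta_e, r)
        current = step
    return current


def remove_eps(states, alphabet, delta_e, s_e):
    """Initial set S_N and transition function delta_N as in the proof."""
    s_n = eps_closure(delta_e, s_e)
    delta_n = {}
    for q in states:
        for a in alphabet:
            target = set()
            for r in delta_e.get((q, a), set()):
                target |= eps_closure(delta_e, r)
            delta_n[(q, a)] = target
    return s_n, delta_n


def run_set(delta_n, start_set, w):
    """Set-valued extension: the union of one-step successors, symbol by symbol."""
    current = set(start_set)
    for a in w:
        current = set().union(*(delta_n[(q, a)] for q in current))
    return current


# Assumed example: p reads a's, q reads b's, r reads c's, with ε-moves p -> q -> r.
states, alphabet = {"p", "q", "r"}, "abc"
delta_e = {("p", "a"): {"p"}, ("p", EPS): {"q"},
           ("q", "b"): {"q"}, ("q", EPS): {"r"},
           ("r", "c"): {"r"}}
s_n, delta_n = remove_eps(states, alphabet, delta_e, "p")

words = ["".join(t) for n in range(5) for t in product(alphabet, repeat=n)]
claim_holds = all(extended_delta(delta_e, "p", w) == run_set(delta_n, s_n, w)
                  for w in words)
print(claim_holds)
```

The check covers only words up to length 4 over a three-letter alphabet, so it is a sanity test of the construction, not a substitute for the induction.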

Definition (Concatenation and Kleene closure) The concatenation of two languages $L_1$ and $L_2$ is the language

\[ L_1 L_2 = \{ w_1 w_2 : w_1 \in L_1 \text{ and } w_2 \in L_2 \}. \]

The Kleene closure of a language $L$ is the language

\[ L^* = \{ w_1 \cdots w_m : m \in \mathbb{N} \text{ and } w_i \in L \text{ for } i = 1, \ldots, m \}. \]

Note that $\lambda \in L^*$ for all $L$. The operation $^*$ is called Kleene star.

Theorem (Closure under concatenation and Kleene closure) The class of regular languages is closed under concatenation and Kleene closure, i.e.,
(i) if languages $L_1$ and $L_2$ are regular, then $L_1 L_2$ is also regular,
(ii) if a language $L$ is regular, then its Kleene closure $L^*$ is also regular.

Proof: (i) Let regular languages $L_1$ and $L_2$ be given, as well as FAs $A_1 = (Q_1, \Sigma_1, \delta_1, s_1, F_1)$ and $A_2 = (Q_2, \Sigma_2, \delta_2, s_2, F_2)$ such that $L_1 = L(A_1)$ and $L_2 = L(A_2)$. It suffices to construct an FA $E$ with ε-transitions that recognizes the language $L_1 L_2$. Let $E = (Q_1 \cup Q_2, \Sigma_1 \cup \Sigma_2, \delta, s_1, F_2)$, where we can assume that $Q_1$ and $Q_2$ are disjoint. Moreover, we let

\[
\delta(q, a) =
\begin{cases}
\delta_1(q, a) & \text{in case } q \in Q_1 \text{ and } a \in \Sigma_1, \\
\delta_2(q, a) & \text{in case } q \in Q_2 \text{ and } a \in \Sigma_2, \\
\{s_2\} & \text{in case } q \in F_1 \text{ and } a = \varepsilon, \\
\emptyset & \text{otherwise.}
\end{cases}
\]

By construction, the ε-FA $E$ recognizes the language $L_1 L_2$; the straightforward verification of this fact is left to the reader.

Proof, cont.: (ii) Let $L = L(N)$ for some FA $N = (Q, \Sigma, \delta_N, s_N, F)$. For some new state $s_E \notin Q$, we construct an FA with ε-transitions, $E = (Q \cup \{s_E\}, \Sigma, \delta, s_E, F \cup \{s_E\})$, that recognizes the language $L^*$. We let

\[
\delta(q, a) =
\begin{cases}
\delta_N(s_N, a) & \text{in case } q = s_E \text{ and } a \in \Sigma, & [\lambda \in L(E)] \\
\delta_N(q, a) & \text{in case } q \in Q \text{ and } a \in \Sigma, & [\text{simulation of } N] \\
\{s_N\} & \text{in case } q \in F \text{ and } a = \varepsilon, & [\text{iteration}] \\
\emptyset & \text{otherwise.} & [\text{no other ε-transitions}]
\end{cases}
\]

The new initial state $s_E$ is accepting, which enforces $\lambda \in L(E)$. After the first transition, the ε-FA $E$ works like $N$ but has additional ε-transitions from the accepting states of $N$ to the initial state of $N$, hence $E$ can iterate accepting computations of $N$. By construction, the ε-FA $E$ recognizes the language $L^*$.

Proof, cont.: We show $L^* \subseteq L(E)$. The empty word $\lambda$ is in $L(E)$. All other words $w \in L^*$ can be written in the form $w = w_1 \cdots w_m$ where $m > 0$, $w_i \in L$, $w_i \neq \lambda$. We show by induction over $m$ that all such $w$ are accepted by $E$.
Basis $m = 1$: Since $w_1 \in L = L(N)$ is nonempty, by construction the ε-FA $E$ and the FA $N$ can reach exactly the same states on reading the first symbol of $w_1$, and from there on $E$ can simulate all computations of $N$ on input $w_1$, including the accepting ones.
Induction step $m > 1$: By the induction hypothesis, the nonempty word $w' = w_1 \cdots w_{m-1}$ is accepted by $E$. So there is an accepting computation of $E$ on input $w$ that
- first reaches an accepting state $q$ after having read $w_1 \cdots w_{m-1}$,
- then performs an ε-transition from $q$ to $s_N$,
- finally simulates an accepting run of $N$ on input $w_m$.
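Both constructions from the proof, for concatenation and for Kleene closure, can be sketched in a few lines. The code below is an illustration under assumptions: transition relations are dictionaries, the empty string stands for ε, and the helper names `concat`, `star`, and `accepts` as well as the small example automata are hypothetical.

```python
EPS = ""  # hypothetical marker for an ε-transition


def eps_closure(delta, q):
    """States reachable from q using ε-transitions only."""
    closure, stack = {q}, [q]
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(delta, start, final, w):
    """Acceptance of w by an ε-FA via its extended transition relation."""
    current = eps_closure(delta, start)
    for a in w:
        step = set()
        for p in current:
            for r in delta.get((p, a), set()):
                step |= eps_closure(delta, r)
        current = step
    return bool(current & final)


def concat(delta1, s1, final1, delta2, s2, final2):
    """Construction (i); the state sets of both automata are assumed disjoint."""
    delta = {k: set(v) for k, v in delta1.items()}
    delta.update({k: set(v) for k, v in delta2.items()})
    for q in final1:                          # ε-transition from F_1 to s_2
        delta.setdefault((q, EPS), set()).add(s2)
    return delta, s1, set(final2)


def star(delta_n, s_n, final_n, alphabet, new_start="s_E"):
    """Construction (ii); new_start is assumed not to be a state of N."""
    delta = {k: set(v) for k, v in delta_n.items()}      # simulation of N
    for a in alphabet:                                   # s_E copies the moves of s_N
        delta[(new_start, a)] = set(delta_n.get((s_n, a), set()))
    for q in final_n:                                    # iteration
        delta.setdefault((q, EPS), set()).add(s_n)
    return delta, new_start, set(final_n) | {new_start}


# Assumed examples: N1 recognizes {ab}, N2 recognizes {c}.
d1, s1, f1 = {("n0", "a"): {"n1"}, ("n1", "b"): {"n2"}}, "n0", {"n2"}
d2, s2, f2 = {("m0", "c"): {"m1"}}, "m0", {"m1"}

star_fa = star(d1, s1, f1, "ab")
cat_fa = concat(d1, s1, f1, d2, s2, f2)
print(accepts(*star_fa, "abab"), accepts(*cat_fa, "abc"))
```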

Proof, cont.: We show $L(E) \subseteq L^*$. Fix some nonempty $w \in L(E)$. The word $w$ is accepted by $E$, i.e., there are states $q_0, \ldots, q_t \in Q \cup \{s_E\}$ and symbols $b_1, \ldots, b_t \in \Sigma \cup \{\varepsilon\}$ such that $q_0 = s_E$, $q_t \in F$, and $q_j \in \delta(q_{j-1}, b_j)$ for $j = 1, \ldots, t$. If we let $w_1, \ldots, w_m$ be the maximal subwords of $b_1 \cdots b_t$ over $\Sigma$ in the natural order, we obtain $w = w_1 \cdots w_m$ where $m > 0$ and

\[ b_1 \cdots b_t = \varepsilon^{k_0} w_1 \varepsilon^{k_1} w_2 \varepsilon^{k_2} \cdots \varepsilon^{k_{m-2}} w_{m-1} \varepsilon^{k_{m-1}} w_m \varepsilon^{k_m}. \]

By construction of $E$, for all $j$ such that $b_j = \varepsilon$ we have $q_{j-1} \in F$ and $q_j = s_N$. Hence the processing of each word $w_i$ corresponds to a contiguous subsequence of $q_0, \ldots, q_t$ that starts in state $s_E$ or $s_N$ and ends in an accepting state. Since $s_E$ has the same transitions on symbols from $\Sigma$ as $s_N$, all $w_i$ are in $L$ and $w$ is in $L^*$.

Remark (Closure properties of the regular languages) The class of regular languages over some fixed alphabet $\Sigma$ contains the languages of the form $\emptyset$, $\{\lambda\}$, and $\{a\}$ for $a \in \Sigma$, and is closed under union, concatenation, and Kleene star.

Definition (Regular expressions) With an alphabet $\Sigma$ understood, the set of regular expressions over $\Sigma$ is denoted by RA and is inductively defined as follows.
(i) $\emptyset$ and $\lambda$ are regular expressions,
(ii) $a$ is a regular expression for all $a \in \Sigma$,
(iii) if $\alpha$ and $\beta$ are regular expressions, then $\alpha + \beta$, $\alpha\beta$, and $\alpha^*$ are regular expressions.

Definition (Semantics of regular expressions) With an alphabet $\Sigma$ understood, the language $L(\alpha)$ denoted by a regular expression $\alpha$ is inductively defined as follows.
(i) $L(\emptyset) = \emptyset$ and $L(\lambda) = \{\lambda\}$,
(ii) $L(a) = \{a\}$ for all $a \in \Sigma$,
(iii) $L(\alpha + \beta) = L(\alpha) \cup L(\beta)$, $L(\alpha\beta) = L(\alpha)L(\beta)$, and $L(\alpha^*) = (L(\alpha))^*$.

Theorem (Regular expressions and regular languages) For any given alphabet $\Sigma$, a language $L$ over $\Sigma$ is regular if and only if there is a regular expression $\alpha$ over $\Sigma$ such that $L = L(\alpha)$.

Proof. Fix some alphabet $\Sigma$. First we show by induction over the inductive definition of regular expressions that the language $L(\alpha)$ is regular for all regular expressions $\alpha$ over $\Sigma$.
Basis: The languages $L(\emptyset) = \emptyset$, $L(\lambda) = \{\lambda\}$, and $L(a) = \{a\}$ are regular.
Induction step:
- If $\gamma = \alpha + \beta$, then $L(\gamma) = L(\alpha) \cup L(\beta)$ is regular.
- If $\gamma = \alpha\beta$, then $L(\gamma) = L(\alpha)L(\beta)$ is regular.
- If $\gamma = \alpha^*$, then $L(\gamma) = (L(\alpha))^*$ is regular.
The three assertions hold by closure of the class of regular languages under union, concatenation, and Kleene star, and because by the induction hypothesis we can assume that the languages $L(\alpha)$ and $L(\beta)$ are regular.
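The inductive semantics of regular expressions can be made concrete by computing, for a given bound $n$, all words of length at most $n$ in $L(\alpha)$. The sketch below assumes a hypothetical encoding of regular expressions as nested tuples (`("empty",)`, `("lambda",)`, `("sym", a)`, `("+", α, β)`, `("cat", α, β)`, `("star", α)`); neither the encoding nor the function name `lang` comes from the lecture.

```python
def lang(alpha, n):
    """All words of length at most n in the language denoted by alpha."""
    kind = alpha[0]
    if kind == "empty":                       # L(∅) = ∅
        return set()
    if kind == "lambda":                      # L(λ) = {λ}
        return {""}
    if kind == "sym":                         # L(a) = {a}
        return {alpha[1]} if n >= 1 else set()
    if kind == "+":                           # union
        return lang(alpha[1], n) | lang(alpha[2], n)
    if kind == "cat":                         # concatenation, cut off at length n
        left, right = lang(alpha[1], n), lang(alpha[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":                        # Kleene star, built up round by round
        base = lang(alpha[1], n) - {""}       # empty factors never add new words
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regular expression node: {kind!r}")


# Assumed example: (a + b)* a, the words over {a, b} that end in a.
alpha = ("cat", ("star", ("+", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
print(sorted(lang(alpha, 2)))  # ['a', 'aa', 'ba']
```

Restricting to bounded length keeps every set finite; the star case terminates because each round only adds strictly longer words.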

Proof, cont.: Next we show that every regular language $L$ is equal to $L(\alpha)$ for some regular expression $\alpha$. Given a regular language $L$, choose some DFA $D = (Q, \Sigma, \delta, s, F)$ that recognizes $L$, where $Q = \{q_1, \ldots, q_k\}$ and $q_i \neq q_j$ for $i \neq j$. For all $i, j \in \{1, \ldots, k\}$ and $l \in \{0, \ldots, k\}$ let

\[
\begin{aligned}
P_{i,j,l} = \{ u \in \Sigma^* : {} & u = u_1 \cdots u_n \text{ for some } n \in \mathbb{N} \text{ and } u_1, \ldots, u_n \in \Sigma, \\
 & \text{and there are indices } r_1, \ldots, r_{n-1} \leq l \text{ such that} \\
 & \delta(q_i, u_1 \cdots u_t) = q_{r_t} \text{ for } t = 1, \ldots, n-1 \text{ and } \delta(q_{r_{n-1}}, u_n) = q_j \},
\end{aligned}
\]

with the convention $r_0 = i$, and where for $n = 0$, i.e., $u = \lambda$, the condition reads $i = j$. That is, the set $P_{i,j,l}$ contains the words $u$ such that $\delta(q_i, u) = q_j$ and, except possibly for the first state $q_i$ and the last state $q_j$, all states occurring while processing $u$ are in the set $\{q_1, \ldots, q_l\}$.

Proof, cont.: We show simultaneously for all $i$ and $j$ by induction over $l$ that for every triple $i, j, l$ there is a regular expression $\alpha_{i,j,l}$ such that $P_{i,j,l} = L(\alpha_{i,j,l})$. For $i_0$ such that $s = q_{i_0}$, we then have

\[ L = \bigcup_{\{j \,:\, q_j \in F\}} P_{i_0,j,k} = \bigcup_{\{j \,:\, q_j \in F\}} L(\alpha_{i_0,j,k}) = L\Bigl( \sum_{\{j \,:\, q_j \in F\}} \alpha_{i_0,j,k} \Bigr), \]

where the sum denotes the regular expression obtained by joining the expressions $\alpha_{i_0,j,k}$ with $+$.
Basis $l = 0$: There cannot be intermediate states, hence we have

\[
P_{i,j,0} =
\begin{cases}
\{ a \in \Sigma : \delta(q_i, a) = q_j \} & \text{in case } i \neq j, \\
\{\lambda\} \cup \{ a \in \Sigma : \delta(q_i, a) = q_i \} & \text{in case } i = j,
\end{cases}
\]

i.e., the finite set $P_{i,j,0}$ can be described by a regular expression.
Induction step $l > 0$: We have

\[ P_{i,j,l} = P_{i,j,l-1} \cup P_{i,l,l-1} \, (P_{l,l,l-1})^* \, P_{l,j,l-1}, \]

and accordingly we let

\[ \alpha_{i,j,l} = \alpha_{i,j,l-1} + \alpha_{i,l,l-1} \, (\alpha_{l,l,l-1})^* \, \alpha_{l,j,l-1}. \]

Definition (Mirror word and mirror language) Given a word $w = a_1 \cdots a_n$ where the $a_i$ are symbols from some alphabet, the mirror word of $w$ is $w^R = a_n \cdots a_1$. For a language $L$, the mirror language of $L$ is $L^R = \{ w^R : w \in L \}$. A word $w$ is a palindrome if $w = w^R$.

Theorem (Closure under mirror languages) The class of regular languages is closed under mirror languages, that is, if $L$ is regular, then its mirror language $L^R$ is also regular.

Proof. Variant 1: Given a DFA $D = (Q, \Sigma, \delta, s, F)$ that recognizes $L$, the FA with set of initial states $N = (Q, \Sigma, \delta_N, F, \{s\})$ recognizes $L^R$ if we let

\[ \delta_N(q, a) = \{ q' \in Q : \delta(q', a) = q \}. \]

Intuitively speaking, the FA $N$ simulates reversed computations of $D$.

Proof, cont.: Variant 2: Show by induction over the definition of regular expressions that for every regular expression $\alpha$ there is a regular expression $\alpha^R$ such that $(L(\alpha))^R = L(\alpha^R)$.
Basis: The assertion holds for $\alpha$ equal to $\emptyset$, to $\lambda$, or to $a$ for some symbol $a$, because in these cases we have $(L(\alpha))^R = L(\alpha)$.
Induction step:
- The mirror language of $L(\alpha + \beta)$ is $L(\alpha^R + \beta^R)$.
- The mirror language of $L(\alpha\beta)$ is $L(\beta^R \alpha^R)$.
- The mirror language of $L(\alpha^*)$ is $L((\alpha^R)^*)$.
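Variant 1 of the proof can be tried out directly: reverse every transition of the DFA, start in the accepting states, and accept in the old initial state. The sketch below assumes a dictionary encoding of the DFA; the function names and the example DFA (words over {a, b} that start with a) are hypothetical.

```python
from itertools import product


def mirror(states, alphabet, delta, s, final):
    """FA with initial set for the mirror language: reversed transitions,
    initial set F, and single accepting state s."""
    delta_n = {(q, a): set() for q in states for a in alphabet}
    for (p, a), q in delta.items():
        delta_n[(q, a)].add(p)                # D moves p -> q on a, so N moves q -> p
    return delta_n, set(final), {s}


def dfa_accepts(delta, s, final, w):
    """Run the DFA on w and test the reached state."""
    q = s
    for a in w:
        q = delta[(q, a)]
    return q in final


def fa_accepts(delta_n, initial, final, w):
    """Run the FA with initial set on w using the set-valued transition function."""
    current = set(initial)
    for a in w:
        current = set().union(*(delta_n[(q, a)] for q in current))
    return bool(current & final)


# Assumed example DFA: state "y" is reached iff the input starts with a.
states, alphabet = {"s", "y", "n"}, "ab"
delta = {("s", "a"): "y", ("s", "b"): "n",
         ("y", "a"): "y", ("y", "b"): "y",
         ("n", "a"): "n", ("n", "b"): "n"}
delta_n, initial_n, final_n = mirror(states, alphabet, delta, "s", {"y"})

words = ["".join(t) for n in range(5) for t in product(alphabet, repeat=n)]
mirrored = all(dfa_accepts(delta, "s", {"y"}, w)
               == fa_accepts(delta_n, initial_n, final_n, w[::-1]) for w in words)
print(mirrored)
```

For this example the mirror language consists of the words that end with a, which the reversed automaton recognizes nondeterministically.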

Definition (Homomorphisms) Let $\Sigma_1$ and $\Sigma_2$ be alphabets. A mapping $h : \Sigma_1^* \to \Sigma_2^*$ is a homomorphism if for all words $u$ and $v$ over $\Sigma_1$ it holds that

\[ h(uv) = h(u)h(v). \qquad (\ast) \]

Proposition (Homomorphisms as mappings on Σ) For every homomorphism $h : \Sigma_1^* \to \Sigma_2^*$ and all $a_1, \ldots, a_n \in \Sigma_1$ it holds that

\[ h(a_1 \cdots a_n) = h(a_1) \cdots h(a_n), \qquad (\ast\ast) \]

which in case $n = 0$ reads as $h(\lambda) = \lambda$. Conversely, every mapping $h : \Sigma_1 \to \Sigma_2^*$ defines via $(\ast\ast)$ a homomorphism $h : \Sigma_1^* \to \Sigma_2^*$.

Proof: Equation $(\ast\ast)$ follows from $(\ast)$ by an easy inductive argument. Obviously $(\ast\ast)$ implies that $h$ is a homomorphism.

Definition (Images of languages) Let $h : \Sigma_1^* \to \Sigma_2^*$ be a mapping. For languages $L_1$ over $\Sigma_1$ and $L_2$ over $\Sigma_2$ let

\[ h[L_1] = \{ h(w) : w \in L_1 \} \quad\text{and}\quad h^{-1}[L_2] = \{ w \in \Sigma_1^* : h(w) \in L_2 \} \]

be the images of $L_1$ under $h$ and of $L_2$ under the inverse of $h$.

Remark As usual, for a mapping $h$ that is 1-to-1, we have $L = h^{-1}[h[L]]$ for all languages $L$, whereas this fails in general. For example, if we let $h(a) = b_0$ for all $a \in \Sigma_1$ and some fixed $b_0 \in \Sigma_2$, then the language $h^{-1}[h[L]]$ contains exactly the words $w$ over $\Sigma_1$ such that there is some word of the same length in $L$.

Theorem (Closure under homomorphisms) The class of regular languages is closed under homomorphisms, that is, for all regular languages $L$ and all homomorphisms $h$, the language $h[L]$ is again regular.

Proof. Let $h : \Sigma_1^* \to \Sigma_2^*$ be any homomorphism. We show by induction over the definition of regular expressions that for every regular expression $\alpha$ over $\Sigma_1$ there is a regular expression $\alpha_h$ over $\Sigma_2$ such that $h[L(\alpha)] = L(\alpha_h)$.
Basis: The image of a finite language under $h$ is again finite, hence the assertion holds for $\alpha$ equal to $\emptyset$, $\lambda$, or $a$ for some $a \in \Sigma_1$.
Induction step:
- The image of $L(\alpha + \beta)$ under $h$ is equal to $L(\alpha_h + \beta_h)$.
- The image of $L(\alpha\beta)$ under $h$ is equal to $L(\alpha_h \beta_h)$.
- The image of $L(\alpha^*)$ under $h$ is equal to $L((\alpha_h)^*)$.
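The induction in this proof amounts to replacing every symbol of a regular expression by an expression for its image. A minimal sketch, assuming a hypothetical encoding of regular expressions as nested tuples and of the homomorphism as a dictionary from symbols to words (the names `word_regex` and `apply_hom` are illustrative only):

```python
def word_regex(word):
    """Regular expression denoting the language that contains only `word`."""
    if word == "":
        return ("lambda",)
    alpha = ("sym", word[0])
    for symbol in word[1:]:
        alpha = ("cat", alpha, ("sym", symbol))
    return alpha


def apply_hom(alpha, h):
    """Rewrite alpha into an expression for the image of its language under h."""
    kind = alpha[0]
    if kind in ("empty", "lambda"):           # basis: images of ∅ and {λ}
        return alpha
    if kind == "sym":                         # basis: the image of {a} is {h(a)}
        return word_regex(h[alpha[1]])
    if kind in ("+", "cat"):                  # induction step for union and concatenation
        return (kind, apply_hom(alpha[1], h), apply_hom(alpha[2], h))
    if kind == "star":                        # induction step for Kleene star
        return ("star", apply_hom(alpha[1], h))
    raise ValueError(f"unknown regular expression node: {kind!r}")


# Assumed example: h(a) = 01 and h(b) = λ, applied to (a + b)*.
h = {"a": "01", "b": ""}
alpha = ("star", ("+", ("sym", "a"), ("sym", "b")))
alpha_h = apply_hom(alpha, h)
print(alpha_h)
```

The result is the tuple encoding of $(01 + \lambda)^*$, which denotes the same language as $(01)^*$.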
Theorem (Closure under inverse homomorphisms) The class of regular languages is closed under inverse homomorphisms, that is, for all regular languages $L$ and all homomorphisms $h$, the language $h^{-1}[L]$ is again regular.

Proof. Let $h : \Sigma_1^* \to \Sigma_2^*$ be a homomorphism. Let $L$ be a language over $\Sigma_2$ that is recognized by some DFA $D = (Q, \Sigma_2, \delta_D, s, F)$. Consider the DFA $A = (Q, \Sigma_1, \delta, s, F)$ where we let for all $q \in Q$ and $a \in \Sigma_1$

\[ \delta(q, a) = \delta_D(q, h(a)), \]

where $\delta_D$ is applied to the word $h(a)$ via its extension to words. We leave it to the reader to show by induction on the word length that for every word $w$ over $\Sigma_1$ we have $\delta(s, w) = \delta_D(s, h(w))$, and that hence $A$ accepts $w$ if and only if $h(w)$ is in $L$.
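The DFA $A$ from this proof is obtained by letting each symbol $a$ act like the whole word $h(a)$ on $D$. The following sketch assumes a dictionary encoding of the DFA and of $h$; the names `run` and `inverse_hom_dfa` and the example (even number of 1s, with a three-letter source alphabet) are hypothetical.

```python
from itertools import product


def run(delta, q, word):
    """Extended transition function of a DFA: process word starting in q."""
    for b in word:
        q = delta[(q, b)]
    return q


def inverse_hom_dfa(states, sigma1, delta_d, h):
    """Transition function of A: reading a in A equals reading h(a) in D."""
    return {(q, a): run(delta_d, q, h[a]) for q in states for a in sigma1}


# Assumed example: D accepts the words over {0, 1} with an even number of 1s.
states, start, final = {"e", "o"}, "e", {"e"}
delta_d = {("e", "0"): "e", ("e", "1"): "o",
           ("o", "0"): "o", ("o", "1"): "e"}
h = {"a": "01", "b": "11", "c": ""}           # hypothetical homomorphism on {a, b, c}
delta_a = inverse_hom_dfa(states, "abc", delta_d, h)

words = ["".join(t) for n in range(5) for t in product("abc", repeat=n)]
agrees = all(
    (run(delta_a, start, w) in final)
    == (run(delta_d, start, "".join(h[x] for x in w)) in final)
    for w in words
)
print(agrees)
```

In this example $A$ accepts exactly the words with an even number of occurrences of `a`, since only $h(a)$ contains an odd number of 1s.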