Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application

Similar documents
September 11, Second Part of Regular Expressions Equivalence with Finite Aut

Theory of Languages and Automata

acs-04: Regular Languages Regular Languages Andreas Karwath & Malte Helmert Informatik Theorie II (A) WS2009/10

Theory of Computation (II) Yijia Chen Fudan University

Closure under the Regular Operations

UNIT-III REGULAR LANGUAGES

CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1)

Theory of Computation (I) Yijia Chen Fudan University

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CS21 Decidability and Tractability

Computational Models Lecture 2 1

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

Recap DFA,NFA, DTM. Slides by Prof. Debasis Mitra, FIT.

September 7, Formal Definition of a Nondeterministic Finite Automaton

What we have done so far

Computational Models Lecture 2 1

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Nondeterministic Finite Automata

Computer Sciences Department

Nondeterminism. September 7, Nondeterminism

Sri vidya college of engineering and technology

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata (cont )

CS 154, Lecture 3: DFA NFA, Regular Expressions

Lecture 3: Nondeterministic Finite Automata

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Chapter 5. Finite Automata

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

Exam 1 CSU 390 Theory of Computation Fall 2007

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.

CS243, Logic and Computation Nondeterministic finite automata

Deterministic Finite Automaton (DFA)

Introduction to Languages and Computation

TDDD65 Introduction to the Theory of Computation

UNIT II REGULAR LANGUAGES

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

Finite Automata and Regular languages

Computational Theory

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Nondeterministic Finite Automata

Foundations of

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Automata: a short introduction

CS 322 D: Formal languages and automata theory

October 6, Equivalence of Pushdown Automata with Context-Free Gramm

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

Incorrect reasoning about RL. Equivalence of NFA, DFA. Epsilon Closure. Proving equivalence. One direction is easy:

Automata and Computability. Solutions to Exercises

Deterministic Finite Automata (DFAs)

Formal Definition of a Finite Automaton. August 26, 2013

Deterministic Finite Automata (DFAs)

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

Computational Models - Lecture 1 1

Uses of finite automata

Automata and Computability. Solutions to Exercises

COMP4141 Theory of Computation

Lecture 4: More on Regexps, Non-Regular Languages

front pad rear pad door

Computational Models - Lecture 3

Formal Definition of Computation. August 28, 2013

Chapter Five: Nondeterministic Finite Automata

CS21 Decidability and Tractability

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Non-deterministic Finite Automata (NFAs)

Lecture 4 Nondeterministic Finite Accepters

Finite Automata. Seungjin Choi

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Nondeterministic Finite Automata. Nondeterminism Subset Construction

Finite Automata. BİL405 - Automata Theory and Formal Languages 1

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Languages. Non deterministic finite automata with ε transitions. First there was the DFA. Finite Automata. Non-Deterministic Finite Automata (NFA)

Recitation 2 - Non Deterministic Finite Automata (NFA) and Regular OctoberExpressions

Closure under the Regular Operations

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Regular Expressions. Definitions Equivalence to Finite Automata

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Outline. CS21 Decidability and Tractability. Regular expressions and FA. Regular expressions and FA. Regular expressions and FA

CS 455/555: Finite automata

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

(Refer Slide Time: 0:21)

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Applied Computer Science II Chapter 1 : Regular Languages

Automata and Formal Languages - CM0081 Non-Deterministic Finite Automata

Lecture 1: Finite State Automaton

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Extended transition function of a DFA

CHAPTER 1 Regular Languages. Contents

Deterministic Finite Automata (DFAs)

Outline. Nondetermistic Finite Automata. Transition diagrams. A finite automaton is a 5-tuple (Q, Σ,δ,q 0,F)

Harvard CS 121 and CSCI E-207 Lecture 6: Regular Languages and Countability

Introduction to Theory of Computing

Subset construction. We have defined for a DFA L(A) = {x Σ ˆδ(q 0, x) F } and for A NFA. For any NFA A we can build a DFA A D such that L(A) = L(A D )

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

ECS 120 Lesson 15 Turing Machines, Pt. 1

Lecture 17: Language Recognition

Transcription:

Examples of Regular Expressions 1. 0 10, L(0 10 ) = {w w contains exactly a single 1} 2. Σ 1Σ, L(Σ 1Σ ) = {w w contains at least one 1} 3. Σ 001Σ, L(Σ 001Σ ) = {w w contains the string 001 as a substring} 4. (ΣΣ), L(ΣΣ) = {w w is a string of even length} 5. (ΣΣΣ), L(ΣΣΣ) = {w the length of w is a multiple of three} 6. 01 01, L(01 01) = {01, 01} 7. 0Σ 0 1Σ 1 0 1, L(0Σ 0 1Σ 1 0 1) = {w w starts and ends with the same symbol} 8. (0 ɛ)1, L((0 ɛ)1 ) = {01 1 } 9. (0 ɛ)(1 ɛ), L((0 ɛ)(1 ɛ)) = {ɛ, 0, 1, 01} Finite Automata vs. Regular Expressions Hantao Zhang hzhang@cs.uiowa.edu The University of Iowa, Department of Computer Science Based on the slides prepared by Professor Rus Computation Theory p.2/?? Computation Theory p.1/?? Example of Using flex DIGIT [0-9] LETTER [A-Za-z_] %% "(" ")" "," { return(*yytext); } {DIGIT}({DIGIT})* { get_integer(*yytext); return(integer); } {LETTER}({LETTER} {DIGIT})* { get_symbol(*yytext); return(symbol); } [ \t]+ { /* jump over blanks */ } \n { /* ignore newlines */ }. { report_error("unknown symbol", *yytext); } %% Save the above text in a file called try.l and type flex -i try.l > try.c Application Regular expressions are useful tools for the design of compilers Language lexicon is described by regular expressions. Example: numerical constants can be described by: {+,, ɛ}(dd DD.D D.DD ) where D = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} From the lexicon description by regular expressions on can generate automatically lexical analyzers, such as lex and flex on unix/linux machines. Computation Theory p.4/?? Computation Theory p.3/??

Theorem 1.28 A language can be recognized by a finite automaton iff it can be represented by a regular expression. Proof Idea: This proof has two parts: First part: (Lemma 1.29) We show that if a language is represented by a regular expression, then there is a finite automaton that recognizes it. Second part: (Lemma 1.32) We show that if a language is recognized by a finite automaton, then there is a regular expression that represents it. Equivalence with Finite Automata Regular expressions and finite automata are equivalent in their descriptive power. Any regular language L can be recognized by a finite automaton M L. Any finite automaton recognizing a language A can be represented by a regular expression r A specifying the language A. Computation Theory p.6/?? Computation Theory p.5/?? Step 1: If r = a Σ, then L(r) = {a} and the NFA N recognizing L(R) is: q 1 a q 2 Note: this is an NFA but not a DFA because it has states with no exiting arrow for each possible input symbol Formal construction: N = ({q 1, q 2 }, Σ, δ, q 1, {q 2 }) where δ(q 1, a) = {q 2 }, δ(r, b) =, for r q 1 and b a Lemma 1.29 If a language is represented by a regular expression, then there is a finite automaton that recognizes it. Proof idea: Assume that we have a regular expression r that represents the language A. 1. We will show how to construct from r an NFA that recognizes A, using a six-step procedure. 2. Then by Corollary 1.20, if an NFA recognizes A then A is recognizable by a DFA. Computation Theory p.8/?? Computation Theory p.7/??

Step 3: Step 2: If r = then L(r) =, and the NFA N that recognizes L(r) is: If r = ɛ then L(r) = {ɛ} and the NFA N that recognizes L(r) is: q 1 Formal construction: N = ({q}, Σ, δ, q, ) where δ(r, b) = for any r and b Formal construction: N = ({q 1 }, Σ, δ, q 1, {q 1 }) where δ(r, b) = for any r and b Σ Computation Theory p.10/?? Computation Theory p.9/?? NFA N recognizing L(r 1 ) L(r 2 ) Step 4: N 1 N If r = r 1 r 2 then L(r) = L(r 1 ) L(r 2 ). Note: in view with the inductive nature of r we may assume that: N 2 ɛ ɛ 1. N 1 = (Q 1, Σ, δ 1, q 1, F 1 ) is an NFA recognizing L(r 1 ) 2. N 2 = (Q 2, Σ, δ 2, q 2, F 2 ) is an NFA recognizing L(r 2 ) The NFA N recognizing L(r 1 r 2 ) is given in the next slide. Computation Theory p.12/?? Computation Theory p.11/??

Step 5: Construction procedure If r = r 1 r 2 then L(r) = L(r 1 ) L(r 2 ). Note: in view with the inductive nature of r we may assume that: 1. N 1 = (Q 1, Σ, δ 1, q 1, F 1 ) is an NFA recognizing L(r 1 ) 2. N 2 = (Q 2, Σ, δ 2, q 2, F 2 ) is an NFA recognizing L(r 2 ). The NFA N recognizing L(r 1 r 2 ) is given in the next slide. 1. Q = {q 0 } Q 1 Q 2 : That is, the states of N are all states on N 1 and N 2 with the addition of a new state q 0 2. The start state of N is q 0 3. The accept states of N are F = F 1 F 2 : That is, the accept states of N are all the accept states of N 1 and N 2 4. Define δ so that for any q Q and any a Σ ɛ : δ 1 (q, a), if q Q 1 δ 2 (q, a), if q Q 2 δ(q, a) = {q 1, q 2 }, if q = q 0 and a = ɛ, if q = q 0 and a ɛ. Computation Theory p.14/?? Computation Theory p.13/?? Construction procedure Construction of NFA N 1. Q = Q 1 Q 2. The states of N are all states of N 1 and N 2 2. The start state is the state q 1 of N 1 3. The accept states is the set F 2 of the accept states of N 2 4. Define δ so that for any q Q and any a Σ ɛ : δ 1 (q, a), if q Q 1 and q F 1 δ 1 (q, a), if q F 1 and a ɛ δ(q, a) = δ 1 (q, a) {q 2 }, if q F 1 and a = ɛ δ 2 (q, a), if q Q 2. N 1 N N 2 ɛ ɛ ɛ Computation Theory p.16/?? Computation Theory p.15/??

Procedure for the construction of N Step 6: N 1 N If r = r 1 then L(r) = i 0L(r 1 ) i. ɛ ɛ ɛ Note: in view with the inductive nature of r we may assume that: 1. N 1 = (Q 1, Σ, δ 1, q 1, F 1 ) is an NFA recognizing L(r 1 ) The NFA N recognizing L(r1 ) is given in the next slide. Computation Theory p.18/?? Computation Theory p.17/?? Examples conversion Construction procedure Convert the following regular expressions into NFA following the procedure presented above 1. (ab a) to an NFA 2. (a b) aba 1. Q = {q 0 } Q 1 ; that is, states of N are the states of N 1 plus a new state q 0 2. Start state if N is q 0 3. F = {q 0 } F 1 ; that is, the accept states of N are the accept states of N 1 plus the new start state 4. Define δ so that for any q Q and a Σ ɛ : δ 1 (q, a), if q Q 1 and q F 1 δ 1 (q, a), if q F 1 and a = ɛ δ(q, a) = δ 1 (q, a) {q 1 }, if q F 1 and a = ɛ {q 1 }, if q = q 0 and a = ɛ, if q = q 0 and a ɛ. Computation Theory p.20/?? Computation Theory p.19/??

Procedure Because A is regular, there is a DFA D A that recognizes A D A will be converted into a regular expression r A that specifies A Note: This procedure is broken in two parts: 1. Convert a DFA into a generalized nondeterministic finite automaton GNFA 2. Convert GNFA into a regular expression Lemma 1.32 If a language can be recognized by a finite automaton, then it is specified by a regular expression Proof idea: For a given regular language A we will construct a regular expression that specifies A. Computation Theory p.22/?? Computation Theory p.21/?? Example GNFA What is an GNFA? q start ab aa q 1 ab ba (aa) b a q 2 ab b q accept A GNFA is an NFA wherein the transition arrows may have any regular expressions as labels, instead only members of the alphabet or ɛ Hence, GNFA reads strings specified by regular expressions (block of symbols) from the input (not necessarily just one symbol) GNFA moves along a transition arrow connecting two states representing regular expression. Computation Theory p.24/?? Computation Theory p.23/??

GNFA of special form The start state has transition arrows to every other state but no arrow coming from any other state There is only one accept state and it has arrows coming in from every other state, but has no arrows going to any other state; in addition, the accept state is not the same with the start state Except for start and accept states,one arrow go from every state to every other state and from each state to itself Note A GNFA is nondeterministic and so, it may have many different ways to process the same input string A GNFA accepts its input if its processing can cause the GNFA to be in an accept state at the end of the input Computation Theory p.26/?? Computation Theory p.25/?? Note Adding transitions don t change the language recognized by DFA because a transition labeled by can never be used Assumption: from here one we assume that all GNFAs are in the special form Converting DFA to GNFA A DFA is converted to a GNFA of special form as follows: Add a new start state with an ɛ arrow to the old start state and a new accept state with an ɛ arrow from all old accept states If any arrows have multiple labels or if there are multiple arrows going between the same two states in the same direction replace each with a single arrow whose label is the union of the previous labels Add arrows labeled between states that had now arrows Computation Theory p.28/?? Computation Theory p.27/??

Example DFA conversion Assuming that the original DFA has 3 states the process of its conversion is: 3-state DFA 5-state GNFA 4-state GNFA Regular 2-state GNFA 3-state GNFA expression Converting GNFA to Regular Expressions Assume that GNFA has k states Because start and accept states are different from each other, it results that k 2 If k > 2 we construct an equivalent GNFA with k 1 states. This can be repeated for each new GNFA until we obtain a GNFA with k = 2 states. If k = 2, GNFA has a single arrow that goes from start to accept and is labeled by a regular expression that specifies the language recognized by the original DFA Computation Theory p.30/?? Computation Theory p.29/?? Repairing after ripping a state Assume that state of GNFA selected for ripping is q rip After removing q rip we repair the machine by altering the regular expressions that label each of the remaining transitions The new labels compensate for the absence of q rip by adding back the lost computation The new label of the arrow going from state q i to q j is a regular expression that specifies all strings that would take the machine from q i to q j either directly or via q rip Note The crucial step is in constructing an equivalent GNFA with one fewer states when k > 2 This is done by selecting a state, ripping it out of the machine, and repairing the remainder so that the same language is still recognized Any state can be selected for ripping, providing that it is not start or accept state. Such a state exist because k > 2 Computation Theory p.32/?? Computation Theory p.31/??

Note New labels are obtained by concatenating regular expressions of the arrows that go through q rip and union them with the labels of the arrows that travel directly between q i and q j This construct is carried out for each arrow that goes from state q i to any state q j including q i = q j Illustration We illustrate the approach of ripping and repairing below. q i r 4 q j r3 r 1 (r q rip q 1 )(r 2 ) (R3) (r i 4 ) q j r 2 before ripping after ripping Computation Theory p.34/?? Computation Theory p.33/?? GNFA G 1 obtained from D The following figure shows the four-state GNFA obtained from D by adding new start state and accept state and replacing a, b by a b a q ɛ s 1 b a b q a ɛ 2 Example 1.35 Convert the DFA D into the regular expression that specifies the language accepted by D a 1 b a, b 2 Computation Theory p.36/?? Computation Theory p.35/??

Eliminating node 1 from G 2 Eliminating node 2 from G 1 The following figure shows the GNFA G 3 obtained from G 2 by ripping off node 1 q s a b(a b) q a The following figure shows the GNFA G 2 obtained from G 1 by ripping off node 2 a q ɛ s 1 b(a b) q a Computation Theory p.38/?? Computation Theory p.37/?? Transition function of a GNFA Because an arrow connects every state to every other state, except that no arrows are coming from q a or going to q s, the domain of the transition function of a GNFA is δ : (Q {q a }) (Q {q s }) R If δ(q i, q j ) = R the arrow from q i to q j has the label r Formal Proof First we need to define formally the GNFA Since new labels are regular expressions we use the symbol R to denote the collection of regular expressions over an alphabet Σ To simplify, denote by q s and q a the start and accept states of the GNFA Computation Theory p.40/?? Computation Theory p.39/??

Computation performed by a GNFA A GNFA accepts a string w Σ if w = w 1 w 2... w k where w i Σ, 1 i k, and a sequence of states q 0, q 1,..., q k exits such that: 1. q o = q s is the start state 2. q k = q a is the accept state 3. For each i, δ(q i 1, q i ) = r i and w i L(r i ), i.e., r i is the regular expression labeling the arrow from q i 1 to q i and w i is an element of the language specified by this expression Definition 1.33 A generalized nondeterministic finite automaton is a 5-tuple (Q, Σ, δ, q s, q a ) where: 1. Q is the finite set of state 2. Σ is the input alphabet 3. δ : (Q {q a }) (Q {q s }) R is the transition function 4. q s is the start state 5. q a is the unique accept state Computation Theory p.42/?? Computation Theory p.41/?? Convert(G) More proof ideas 1. Let k be the number of state of G 2. If k = 2 then G must consists of a start state and an accept state and a single arrow connecting them, labeled by a regular expression r. Return r 3. While k > 2, select any state q rip Q, different from q s and q a and let G be the GNFA (Q, Σ, δ, q s, q a ) where: Q = Q {q rip } for any q i Q {q a } and any q j Q {q s } let δ (q i, q j ) = (r 1 )(r 2 ) (r 3 ) (r 4 ) where: r 1 = δ(q i, q rip ), r 2 = δ(q rip, q rip ), r 3 = δ(q rip, q j ), r 4 = δ(q i, q j ) Convert(G ); Returning to the proof of Lemma 1.32, we assume that M is a DFA recognizing the language A and proceed as follows: Convert M into a GNFA G by adding a new start state and a new accept state and the additional arrows Use the procedure Convert(G) that maps G into a regular expression, as explained before, while preserving the language A Note: Convert() is recursive; however that case when GNFA has only two sates is handled without recursion Computation Theory p.44/?? Computation Theory p.43/??

Induction Basis: Claim 1.34 k = 2 If G has only two states, by definition, it can have only a single arrow which goes from q s to q a The regular expression labeling this arrow specify the language accepted by G Since this expression is returned by Convert(G), it means that G and Convert(G) are equivalent For any GNFA G, Convert(G) is equivalent to G Proof: by induction on k, the number of states of G Computation Theory p.46/?? Computation Theory p.45/?? Induction Step (2) Induction Step 1. If none of the states q s, q 1, q 2,..., q a is q rip, clearly G also accepts w because each of the new regular expressions labeling arrows of G contain the old regular expressions as part of a union 2. If q rip does appear in the computation q s, q 1, q 2,..., q a by removing each run of consecutive q rip states we obtain an accepting computation for G. This is because states q i and q j bracketing a run of consecutive q rip states have a new regular expression on the arrow between them that specify all strings taking q i to q j via q rip on G. So, G accepts w in this case too. Assume that the claim is true for G having k 1 states and use this assumption to show that the claim is true for an GNFA with k states Observe from construction that G and G recognize the same language Suppose G accepts the input w. Then in an accepting branch of computation, G enters the sequence of states q s, q 1, q 2, q 3,..., q a Show that G has an accepting computation for w, too. Computation Theory p.48/?? Computation Theory p.47/??

Conclusion Induction Step (3) The induction hypothesis states that when the algorithm calls itself recursively on input G, the result is a regular expression that is equivalent to G because G has k 1 states Hence, that regular expression is also equivalent to G because G is equivalent to G Consequently Convert(G) and G are equivalent For the other direction, suppose that G accepts w. 1. Each arrow between any two states q i and q j in G is labeled by a regular expression that specifies strings specified by arrows in G from q i directly to q j or via q rip 2. Hence, by the definition of GNFA it follows that G must also accept w. That is, G and G accept the same language Computation Theory p.50/?? Computation Theory p.49/??