Finite Automata and Formal Languages

Finite Automata and Formal Languages TMV026/DIT321 LP4 2
Lecture 6 April 5th 2
Ana Bove

Regular Expressions

Regular expressions (RE) are an algebraic way to denote languages. Given a RE $R$, it defines the language $L(R)$. They can also be seen as a simpler representation of $\epsilon$-NFA. We will show that RE are as expressive as DFA and hence that they define all and only the regular languages.

RE can also be seen as a declarative way to express the strings we want to accept, and they serve as the input language for certain systems. Example: the grep command in UNIX (K. Thompson).

Overview of today's lecture: from finite automata to RE.

Inductive Definition of Regular Expressions

Definition: Given an alphabet $\Sigma$, we can inductively define the regular expressions over $\Sigma$ as:
Basic cases:
  The constants $\emptyset$ and $\epsilon$ are RE
  If $a \in \Sigma$ then $a$ is a RE
Inductive cases: Given the RE $R$ and $S$, we define the following RE:
  $R + S$ and $RS$ are RE
  $R^*$ is a RE

Another Way to Define the Regular Expressions

A nicer way to define the regular expressions is by giving the following BNF (Backus-Naur Form), for $a \in \Sigma$:
  $R ::= \emptyset \mid \epsilon \mid a \mid R + R \mid RR \mid R^*$
alternatively
  $R, S ::= \emptyset \mid \epsilon \mid a \mid R + S \mid RS \mid R^*$

The precedence of the operators is the following:
  The closure operator $^*$ has the highest precedence
  Next comes concatenation
  Finally comes the operator $+$
  We use parentheses (, ) to change the precedences

Note: BNF is a way to declare the syntax of a language. It is very useful when describing context-free grammars and in particular the syntax of most programming languages.
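As a small illustration of the precedence rules, consider the expression $a + bc^*$ (an expression chosen here only for illustration, it does not come from the slides). Closure binds first, then concatenation, then $+$, so the expression is read as on the first line below; the two other parenthesisations denote different languages.

```latex
\begin{align*}
a + bc^{*} &= a + \bigl(b\,(c^{*})\bigr) \\
(a + b)\,c^{*} &\neq a + bc^{*} \qquad \text{(the left side contains } ac\text{)} \\
a + (bc)^{*} &\neq a + bc^{*} \qquad \text{(the left side contains } \epsilon\text{)}
\end{align*}
```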

Example of Regular Expressions

Let $\Sigma = \{0, 1\}$. () + ( + ) () + ((( )) + ) () + (ǫ + )() (ǫ + ) () + () + () + ()

Can you guess their meaning? Are there expressions that are equivalent?

Functional Representation of Regular Expressions

    data RExp a = Empty | Epsilon | Atom a
                | Plus (RExp a) (RExp a)
                | Concat (RExp a) (RExp a)
                | Star (RExp a)

For example the expression $b + (bc)^*$ is given as

    Plus (Atom "b") (Star (Concat (Atom "b") (Atom "c")))

Recall: Some Operations on Languages (Lecture 2)

Definition: Given $L$, $L_1$ and $L_2$ languages then we define the following languages:
Union: $L_1 \cup L_2 = \{x \mid x \in L_1 \text{ or } x \in L_2\}$
Intersection: $L_1 \cap L_2 = \{x \mid x \in L_1 \text{ and } x \in L_2\}$
Concatenation: $L_1 L_2 = \{x_1 x_2 \mid x_1 \in L_1, x_2 \in L_2\}$
Closure: $L^* = \bigcup_{n \in \mathbb{N}} L^n$ where $L^0 = \{\epsilon\}$, $L^{n+1} = L^n L$.

Note: We have then that $\emptyset^* = \{\epsilon\}$ and $L^* = L^0 \cup L^1 \cup L^2 \cup \ldots = \{\epsilon\} \cup \{x_1 \ldots x_n \mid n > 0, x_i \in L\}$

Notation: $L^+ = L^1 \cup L^2 \cup L^3 \cup \ldots$ and $L? = L \cup \{\epsilon\}$.

Language Defined by the Regular Expressions

Definition: The language defined by a regular expression is defined by induction on the expression:
$L(\emptyset) = \emptyset$
$L(\epsilon) = \{\epsilon\}$
Given $a \in \Sigma$, $L(a) = \{a\}$
$L(R + S) = L(R) \cup L(S)$
$L(RS) = L(R)L(S)$
$L(R^*) = L(R)^*$

Note: $x \in L(R)$ iff $x$ is generated/accepted by $R$.
Notation: We write $x \in R$ or $x \in L(R)$ indistinctly.
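The inductive definition of $L(R)$ can be turned almost literally into a membership test on the functional representation. The following is only a minimal sketch, not part of the slides: the names `splits` and `matches` are hypothetical, symbols are taken to be `Char` rather than strings, and the test is deliberately naive (it tries every way of splitting the word).

```haskell
-- Regular expressions over symbols of type a (the data type from the slides).
data RExp a
  = Empty
  | Epsilon
  | Atom a
  | Plus (RExp a) (RExp a)
  | Concat (RExp a) (RExp a)
  | Star (RExp a)
  deriving (Eq, Show)

-- All ways of cutting a word into a prefix and a suffix.
-- (Hypothetical helper, not from the slides.)
splits :: [a] -> [([a], [a])]
splits w = [splitAt n w | n <- [0 .. length w]]

-- matches r w is True exactly when w is in L(r); one clause per case
-- of the inductive definition of L.
matches :: Eq a => RExp a -> [a] -> Bool
matches Empty        _ = False                      -- L(Empty) is the empty set
matches Epsilon      w = null w                     -- L(Epsilon) = {epsilon}
matches (Atom a)     w = w == [a]                   -- L(a) = {a}
matches (Plus r s)   w = matches r w || matches s w -- union
matches (Concat r s) w =                            -- w = uv, u in L(r), v in L(s)
  or [matches r u && matches s v | (u, v) <- splits w]
matches (Star r)     w =
  -- Either w is empty, or w = uv with u a NON-EMPTY word of L(r) and v in
  -- L(r*). Requiring u to be non-empty keeps the recursion terminating.
  null w || or [matches r u && matches (Star r) v | (u, v) <- drop 1 (splits w)]

-- The example from the slide, b + (bc)*, with Char symbols.
example :: RExp Char
example = Plus (Atom 'b') (Star (Concat (Atom 'b') (Atom 'c')))
```

For instance, `matches example "bcbc"` and `matches example ""` hold, while `matches example "bb"` does not.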

Algebraic Laws for Regular Expressions

The following equalities hold for any RE $R$, $S$ and $T$:
Associativity: $R + (S + T) = (R + S) + T$ and $R(ST) = (RS)T$
Commutativity: $R + S = S + R$. In general, $RS \neq SR$
Distributivity: $R(S + T) = RS + RT$ and $(S + T)R = SR + TR$
Identity: $R + \emptyset = \emptyset + R = R$ and $R\epsilon = \epsilon R = R$
Annihilator: $R\emptyset = \emptyset R = \emptyset$
Idempotent: $R + R = R$
$\emptyset^* = \epsilon^* = \epsilon$
$R? = \epsilon + R$
$R^+ = RR^* = R^*R$
$R^* = (R^*)^* = R^*R^* = \epsilon + R^+$

Other useful laws to simplify regular expressions are
Shifting rule: $R(SR)^* = (RS)^*R$
Denesting rule: $(R^*S)^*R^* = (R + S)^*$
Note: By the shifting rule we also get $R^*(SR^*)^* = (R + S)^*$
Variation of the denesting rule: $(R^*S)^* = \epsilon + (R + S)^*S$

Example: Proving Equalities Using the Algebraic Laws

Example: A proof that $a^*b(c + da^*b)^* = (a + bc^*d)^*bc^*$:
$a^*b(c + da^*b)^* = a^*b(c^*da^*b)^*c^*$ by denesting ($R = c$, $S = da^*b$)
$a^*b(c^*da^*b)^*c^* = (a^*bc^*d)^*a^*bc^*$ by shifting ($R = a^*b$, $S = c^*d$)
$(a^*bc^*d)^*a^*bc^* = (a + bc^*d)^*bc^*$ by denesting ($R = a$, $S = bc^*d$)

Example: The set of all words with no substring of more than two adjacent 0's is $(1 + 01 + 001)^*(\epsilon + 0 + 00)$. Now,
$(1 + 01 + 001)^*(\epsilon + 0 + 00) = ((\epsilon + 0)(\epsilon + 0)1)^*(\epsilon + 0)(\epsilon + 0)$
$= (\epsilon + 0)(\epsilon + 0)(1(\epsilon + 0)(\epsilon + 0))^*$ by shifting
$= (\epsilon + 0 + 00)(1 + 10 + 100)^*$
Then $(1 + 01 + 001)^*(\epsilon + 0 + 00) = (\epsilon + 0 + 00)(1 + 10 + 100)^*$

Equality of Regular Expressions

Remember that RE are a way to denote languages. Then, for RE $R$ and $S$, $R = S$ actually means $L(R) = L(S)$. Hence we can prove the equality of RE in the same way we can prove the equality of languages.

Example: Let us prove that $R^* = R^*R^*$. Let $L = L(R)$.
$L^* \subseteq L^*L^*$ since $\epsilon \in L^*$.
Conversely, if $x \in L^*L^*$ then $x = x_1x_2$ with $x_1 \in L^*$ and $x_2 \in L^*$. If $x_1 = \epsilon$ or $x_2 = \epsilon$ then it is clear that $x \in L^*$. Otherwise $x_1 = u_1u_2 \ldots u_n$ with $u_i \in L$ and $x_2 = v_1v_2 \ldots v_m$ with $v_j \in L$. Then $x = x_1x_2 = u_1u_2 \ldots u_nv_1v_2 \ldots v_m$ is in $L^*$.
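Since $R = S$ means $L(R) = L(S)$, a candidate equality can at least be sanity-checked by comparing membership on all short words. The sketch below does this for concrete instances of the shifting and denesting rules and for the first worked example. It is an illustration under stated assumptions, not a proof: agreement up to a length bound is only evidence, and the names `matches`, `wordsUpTo` and `agreeUpTo` are hypothetical.

```haskell
data RExp a
  = Empty | Epsilon | Atom a
  | Plus (RExp a) (RExp a) | Concat (RExp a) (RExp a) | Star (RExp a)
  deriving (Eq, Show)

-- Naive membership test following the inductive definition of L(R).
matches :: Eq a => RExp a -> [a] -> Bool
matches Empty        _ = False
matches Epsilon      w = null w
matches (Atom a)     w = w == [a]
matches (Plus r s)   w = matches r w || matches s w
matches (Concat r s) w = or [matches r u && matches s v | (u, v) <- splits w]
matches (Star r)     w =
  null w || or [matches r u && matches (Star r) v | (u, v) <- drop 1 (splits w)]

-- All ways of cutting a word into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits w = [splitAt n w | n <- [0 .. length w]]

-- All words over the alphabet sigma of length at most n.
wordsUpTo :: Int -> [a] -> [[a]]
wordsUpTo n sigma = concat (take (n + 1) (iterate extend [[]]))
  where extend ws = [c : w | c <- sigma, w <- ws]

-- Do r and s accept exactly the same words of length <= n over sigma?
-- True is only evidence for r = s; False is a genuine counterexample.
agreeUpTo :: Eq a => Int -> [a] -> RExp a -> RExp a -> Bool
agreeUpTo n sigma r s = all (\w -> matches r w == matches s w) (wordsUpTo n sigma)

ra, rb, rc, rd :: RExp Char
ra = Atom 'a'; rb = Atom 'b'; rc = Atom 'c'; rd = Atom 'd'

-- Shifting rule on concrete symbols: a(ba)* = (ab)*a
shiftingL, shiftingR :: RExp Char
shiftingL = Concat ra (Star (Concat rb ra))
shiftingR = Concat (Star (Concat ra rb)) ra

-- Denesting rule on concrete symbols: (a*b)*a* = (a + b)*
denestL, denestR :: RExp Char
denestL = Concat (Star (Concat (Star ra) rb)) (Star ra)
denestR = Star (Plus ra rb)

-- Worked example: a*b(c + da*b)* = (a + bc*d)*bc*
exampleL, exampleR :: RExp Char
exampleL = Concat (Star ra) (Concat rb (Star (Plus rc (Concat rd (Concat (Star ra) rb)))))
exampleR = Concat (Star (Plus ra (Concat rb (Concat (Star rc) rd)))) (Concat rb (Star rc))
```

A call such as `agreeUpTo 6 "ab" shiftingL shiftingR` compares the two sides on every word of length at most 6, whereas `agreeUpTo 2 "ab" (Concat ra rb) (Concat rb ra)` fails, in line with the remark that in general $RS \neq SR$.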

Proving Algebraic Laws for Regular Expressions

Given the RE $R$ and $S$ we can prove the law $R = S$ as follows:
1. Convert $R$ and $S$ into concrete regular expressions $C$ and $D$, respectively, by replacing each variable in the RE $R$ and $S$ by (different) concrete symbols. Example: $R(SR)^* = (RS)^*R$ can be converted into $a(ba)^* = (ab)^*a$.
2. Prove or disprove whether $L(C) = L(D)$. If $L(C) = L(D)$ then $R = S$ is a true law, otherwise it is not.

Theorem: The above procedure correctly identifies the true laws for RE.
Proof: See theorems 3.4 and 3.3 in pages 2 and 2 respectively.

Example: Proving the shifting law was (somehow) one of the exercises in an earlier assignment: prove that for all $n$, $a(ba)^n = (ab)^na$.

Example: Proving the Denesting Rule

We can establish $(R^*S)^*R^* = (R + S)^*$ by proving $L((a^*b)^*a^*) = L((a + b)^*)$:

$\subseteq$: Let $x \in (a^*b)^*a^*$, then $x = vw$ with $v \in (a^*b)^*$ and $w \in a^*$. By induction on $v$. If $v = \epsilon$ we are done. Otherwise $v = av'$ or $v = bv'$. Observe that in both cases $v' \in (a^*b)^*$, hence by IH $v'w \in (a + b)^*$ and so is $vw$.

$\supseteq$: Let $x \in (a + b)^*$. By induction on $x$. If $x = \epsilon$ then done. Otherwise $x = x'a$ or $x = x'b$ with $x' \in (a + b)^*$. By IH $x' \in (a^*b)^*a^*$ and then $x' = vw$ with $v \in (a^*b)^*$ and $w \in a^*$. Then $x'a = v(wa) \in (a^*b)^*a^*$ since $v \in (a^*b)^*$ and $wa \in a^*$, and $x'b = (v(wb))\epsilon \in (a^*b)^*a^*$ since $v(wb) \in (a^*b)^*$ and $\epsilon \in a^*$.

Regular Languages and Regular Expressions

Theorem: If $L$ is a regular language then there exists a regular expression $R$ such that $L = L(R)$.
Proof: Recall that each regular language has an automaton that recognises it. We shall construct a regular expression from such an automaton. The book shows 2 ways of constructing a regular expression from an automaton (sections 3.2.1, computing $R^{(k)}_{ij}$, and 3.2.2, eliminating states).

From FA to RE: Computing $R^{(k)}_{ij}$ from an Automaton A

Let $Q_A = \{1, 2, \ldots, n\}$ with 1 being the initial state.
We construct a collection of RE that progressively describe the paths in the transition diagram of $A$: let $R^{(k)}_{ij}$ be the RE whose language is the set of strings $w$ which label a path from state $i$ to state $j$ in $A$ without passing through an intermediate state bigger than $k$. Note that neither $i$ nor $j$ are intermediate states! We define $R^{(k)}_{ij}$ by induction on $k$. If $F_A = \{f_1, \ldots, f_r\}$ then our final regular expression is $R^{(n)}_{1f_1} + \ldots + R^{(n)}_{1f_r}$.
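To make the shape of the final expression concrete, take a hypothetical automaton (not one from the slides) with $n = 3$ states, initial state 1 and accepting states $F_A = \{2, 3\}$. Only the expressions that start in state 1 and allow every state as an intermediate state are needed at the end:

```latex
R \;=\; R^{(3)}_{12} + R^{(3)}_{13}
```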

Base Case: $R^{(0)}_{ij}$

We have no intermediate states here! We have the following scenarios for the arcs from state $i$ to state $j$:
If there is no arc then $R^{(0)}_{ij} = \emptyset$
If there is one arc labelled $a$ then $R^{(0)}_{ij} = a$
If there are $m$ arcs labelled $a_1, \ldots, a_m$ then $R^{(0)}_{ij} = a_1 + \ldots + a_m$

Note: If $i = j$ then we must consider the loops from $i$ to itself. We also have a path of length 0 from $i$ to itself. In an $\epsilon$-NFA we can also have paths of length 0 between $i$ and $j$. Such a path is represented as an $\epsilon$-transition in the automaton and as the RE $\epsilon$. Then we need to add $\epsilon$ to the corresponding case above, obtaining $R^{(0)}_{ij} = \epsilon$, $R^{(0)}_{ij} = \epsilon + a$ or $R^{(0)}_{ij} = \epsilon + a_1 + \ldots + a_m$ respectively.

Inductive Step: from $R^{(k)}_{ij}$ to $R^{(k+1)}_{ij}$

Given a path from state $i$ to state $j$ without passing through an intermediate state bigger than $k + 1$, we have 2 possible cases:
The path does not actually pass through state $k + 1$. Hence the label of the path is in the language of the RE $R^{(k)}_{ij}$.
The path goes through $k + 1$ at least once. We can break the path into pieces that do not pass through $k + 1$: first from $i$ to $k + 1$, then zero or more from $k + 1$ to $k + 1$, last from $k + 1$ to $j$. The label for this path is represented by the RE $R^{(k)}_{i(k+1)} (R^{(k)}_{(k+1)(k+1)})^* R^{(k)}_{(k+1)j}$.

The resulting RE is $R^{(k+1)}_{ij} = R^{(k)}_{ij} + R^{(k)}_{i(k+1)} (R^{(k)}_{(k+1)(k+1)})^* R^{(k)}_{(k+1)j}$

Remarks on the Method for Computing $R^{(k)}_{ij}$

Works for any kind of FA (DFA, NFA and $\epsilon$-NFA).
The method is similar to the Floyd-Warshall algorithm (a graph analysis algorithm for finding shortest paths in a weighted, directed graph). See Wikipedia.
It is expensive: for each $k$ we need to compute $n^2$ RE, so about $n^3$ in total!
It also produces very big and complicated expressions!
The (intermediate) RE can usually be simplified. Still not trivial!

From FA to RE: Eliminating States in an Automaton A

This method of constructing a RE from a FA involves eliminating states. When we eliminate the state $s$, all the paths that went through $s$ no longer exist!
To preserve the language of the automaton we must include, on an arc that goes directly from $q$ to $p$, the labels of the paths that went from $q$ to $p$ passing through $s$. Labels are now no longer symbols but (possibly an infinite number of) strings: hence we will use RE as labels.

Example: See example 3.5 in the book (pages 95-97).
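Before going further into state elimination, the $R^{(k)}_{ij}$ recurrence (base case plus inductive step) can be sketched in Haskell. This is a minimal sketch under several assumptions that are not in the slides: the automaton has no $\epsilon$-transitions, arcs are given as a list of triples, the names `Arc`, `base`, `rk` and `toRExp` are hypothetical, and nothing is memoised, so it is only suitable for tiny automata. The helper constructors `plus`, `conc` and `star` apply the identity, annihilator and idempotent laws to keep the intermediate expressions a little smaller.

```haskell
data RExp a
  = Empty | Epsilon | Atom a
  | Plus (RExp a) (RExp a) | Concat (RExp a) (RExp a) | Star (RExp a)
  deriving (Eq, Show)

-- Simplifying constructors based on the algebraic laws.
plus :: Eq a => RExp a -> RExp a -> RExp a
plus Empty s = s                 -- identity for +
plus r Empty = r
plus r s
  | r == s    = r                -- idempotence
  | otherwise = Plus r s

conc :: RExp a -> RExp a -> RExp a
conc Empty _   = Empty           -- annihilator
conc _ Empty   = Empty
conc Epsilon s = s               -- identity for concatenation
conc r Epsilon = r
conc r s       = Concat r s

star :: RExp a -> RExp a
star Empty    = Epsilon          -- Empty* = Epsilon
star Epsilon  = Epsilon          -- Epsilon* = Epsilon
star (Star r) = Star r           -- (R*)* = R*
star r        = Star r

-- An arc (i, a, j) goes from state i to state j reading symbol a.
-- States are assumed to be numbered 1..n with 1 the initial state.
type Arc a = (Int, a, Int)

-- Base case R^(0)_ij: the sum of the labels of the arcs from i to j,
-- plus Epsilon when i == j (the path of length 0).
base :: Eq a => [Arc a] -> Int -> Int -> RExp a
base arcs i j = foldl plus start [Atom a | (p, a, q) <- arcs, p == i, q == j]
  where start = if i == j then Epsilon else Empty

-- rk arcs k i j is R^(k)_ij, computed directly from the recurrence
-- R^(k)_ij = R^(k-1)_ij + R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj.
rk :: Eq a => [Arc a] -> Int -> Int -> Int -> RExp a
rk arcs k i j
  | k <= 0    = base arcs i j
  | otherwise = plus (prev i j) (conc (prev i k) (conc (star (prev k k)) (prev k j)))
  where prev = rk arcs (k - 1)

-- Final expression: the sum of R^(n)_1f over the accepting states f.
toRExp :: Eq a => Int -> [Arc a] -> [Int] -> RExp a
toRExp n arcs finals = foldl plus Empty [rk arcs n 1 f | f <- finals]
```

For a two-state automaton with arcs `[(1,'1',1), (1,'0',2), (2,'0',2), (2,'1',2)]` and accepting state 2, `toRExp 2 arcs [2]` denotes the words containing at least one 0, although the expression it returns is far from the smallest one, which matches the remark that the method produces big expressions.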

Eliminating State s in A

[Figure: the state $s$ with predecessors $q_1, \ldots, q_k$ and successors $p_1, \ldots, p_m$, before and after eliminating $s$]

Before the elimination, the arc from $q_i$ to $p_j$ is labelled $R_{ij}$, the arc from $q_i$ to $s$ is labelled $Q_i$, the loop on $s$ is labelled $S$, and the arc from $s$ to $p_j$ is labelled $P_j$. If some arc does not exist in $A$, then it is labelled $\emptyset$ here. For simplification, we assume the $q$'s are different from the $p$'s.

After eliminating $s$, the arc from $q_i$ to $p_j$ is labelled $R_{ij} + Q_i S^* P_j$.

Eliminating States in A

For each accepting state $q$ we proceed as before until we have only $q_0$ and $q$ left. For each $q$ we have 2 cases: $q_0 \neq q$ or $q_0 = q$.

If $q_0 \neq q$: [Figure: loop $R$ on $q_0$, arc $S$ from $q_0$ to $q$, loop $U$ on $q$, arc $T$ from $q$ to $q_0$] The expression is $(R + SU^*T)^*SU^*$

If $q_0 = q$: [Figure: loop $R$ on $q_0$] The expression is $R^*$

The final expression is the sum of the expressions derived for each final state.

Example: Regular Expression Representing Gilbreath's Principle

Recall: [Figure: transition diagram of the automaton for Gilbreath's principle]

Observe: Eliminating q is trivial. Eliminating q q 3 and q 2 q 4 is also easy.

Example: Regular Expression Representing Gilbreath's Principle

After eliminating q, q q 3 and q 2 q 4 we get:

[Figure: the automaton after eliminating these states]

RE when final state is q q 3 q 4 q 5 : ( + )( + ) = ( + ) +
RE when final state is q 2 q 4 q 5 : ( + )() (() )
RE when final state is q q 3 q 5 : ( + )() (() )

The final RE is the sum of the 3 previous expressions. Let us first do some simplifications.

( + )() (() ) = ( + )() (() ) by shifting = ( + )( + ) by the shifted-denesting rule = ( + ) +

Similarly ( + )() (() ) = ( + ) +.

Hence the final RE is ( + ) + + ( + ) + + ( + ) + which is equivalent to ( + ) + (ǫ + + )
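The state-elimination method used in this example can be sketched in Haskell as follows. This is a rough sketch under stated assumptions rather than the construction as given in the book: the automaton is an association list from pairs of states to RE labels (a missing arc counts as `Empty`), all names (`GNFA`, `label`, `eliminate`, `twoState`, `forFinal`) are hypothetical, and the rule $R_{ij} + Q_i S^* P_j$ is applied to every pair of remaining states, including pairs with $q = p$.

```haskell
import Data.Maybe (fromMaybe)

data RExp a
  = Empty | Epsilon | Atom a
  | Plus (RExp a) (RExp a) | Concat (RExp a) (RExp a) | Star (RExp a)
  deriving (Eq, Show)

-- Simplifying constructors (identity, annihilator, idempotence laws).
plus :: Eq a => RExp a -> RExp a -> RExp a
plus Empty s = s
plus r Empty = r
plus r s
  | r == s    = r
  | otherwise = Plus r s

conc :: RExp a -> RExp a -> RExp a
conc Empty _   = Empty
conc _ Empty   = Empty
conc Epsilon s = s
conc r Epsilon = r
conc r s       = Concat r s

star :: RExp a -> RExp a
star Empty    = Epsilon
star Epsilon  = Epsilon
star (Star r) = Star r
star r        = Star r

-- Arcs labelled by regular expressions; absent arcs are labelled Empty.
type GNFA a = [((Int, Int), RExp a)]

label :: GNFA a -> Int -> Int -> RExp a
label g q p = fromMaybe Empty (lookup (q, p) g)

-- Eliminate state s: the arc from q to p becomes R_qp + Q_q S* P_p.
eliminate :: Eq a => Int -> [Int] -> GNFA a -> GNFA a
eliminate s states g =
  [ ((q, p), plus (label g q p)
                  (conc (label g q s) (conc (star (label g s s)) (label g s p))))
  | q <- rest, p <- rest ]
  where rest = filter (/= s) states

-- Expression once only q0 and the accepting state q are left.
twoState :: Eq a => GNFA a -> Int -> Int -> RExp a
twoState g q0 q
  | q0 == q   = star r                                               -- R*
  | otherwise = conc (star (plus r (conc s (conc (star u) t))))      -- (R + SU*T)*
                     (conc s (star u))                               -- SU*
  where
    r = label g q0 q0
    s = label g q0 q
    u = label g q q
    t = label g q q0

-- Expression for one accepting state q: eliminate every other state first.
forFinal :: Eq a => [Int] -> GNFA a -> Int -> Int -> RExp a
forFinal states g q0 q = twoState g' q0 q
  where
    (_, g') = foldl elim (states, g) [s | s <- states, s /= q0, s /= q]
    elim (st, h) s = (filter (/= s) st, eliminate s st h)

-- Final expression: the sum over all accepting states.
toRExp :: Eq a => [Int] -> GNFA a -> Int -> [Int] -> RExp a
toRExp states g q0 finals = foldl plus Empty [forFinal states g q0 q | q <- finals]
```

Recomputing from the original automaton for each accepting state mirrors the slides, where the elimination is carried out once per final state and the resulting expressions are summed.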