Finite Automata and Formal Languages TMV026/DIT321 LP4 2012


Finite Automata and Formal Languages TMV026/DIT321 LP4 2012
Lecture 7, Ana Bove, March 27th 2012

Overview of today's lecture:
- Regular Expressions
- From FA to RE

Regular Expressions

Regular expressions (RE) are an algebraic way to denote languages. A RE $R$ defines the language $L(R)$.

We will show that RE are as expressive as DFA and hence define all and only the regular languages.

RE can also be seen as a declarative way to express the strings we want to accept, and they serve as the input language for certain systems. Example: the grep command in UNIX (K. Thompson).

(Note: UNIX regular expressions are not exactly the same as the RE we will study in the course.)

March 27th 2012, Lecture 7, TMV026/DIT321, 1/26

Inductive Definition of Regular Expressions

Definition: Given an alphabet $\Sigma$, we can inductively define the regular expressions over $\Sigma$ as:

Base cases:
- The constants $\emptyset$ and $\epsilon$ are RE;
- If $a \in \Sigma$ then $a$ is a RE.

Inductive steps: Given the RE $R$ and $S$, we define the following RE:
- $R + S$ and $RS$ are RE;
- $R^*$ is a RE.

The precedence of the operators is the following:
- The closure operator $^*$ has the highest precedence;
- Next comes concatenation;
- Finally comes the operator $+$;
- We use parentheses ( and ) to change the precedence.

Another Way to Define the Regular Expressions

A nicer way to define the regular expressions is by giving the following BNF (Backus-Naur Form), for $a \in \Sigma$:

$R ::= \emptyset \mid \epsilon \mid a \mid R + R \mid RR \mid R^*$

alternatively

$R, S ::= \emptyset \mid \epsilon \mid a \mid R + S \mid RS \mid R^*$

Question: Can you guess their meaning?

Note: BNF is a way to declare the syntax of a language. It is very useful when describing context-free grammars and in particular the syntax of most programming languages.

Functional Representation of Regular Expressions

    data RExp a = Empty
                | Epsilon
                | Atom a
                | Plus (RExp a) (RExp a)
                | Concat (RExp a) (RExp a)
                | Star (RExp a)

For example the expression $b + (bc)^*$ is given as

    Plus (Atom "b") (Star (Concat (Atom "b") (Atom "c")))

Recall: Some Operations on Languages (Lecture 3)

Definition: Given the languages $L$, $L_1$ and $L_2$, we define the following languages:
- Union: $L_1 \cup L_2 = \{x \mid x \in L_1 \text{ or } x \in L_2\}$
- Intersection: $L_1 \cap L_2 = \{x \mid x \in L_1 \text{ and } x \in L_2\}$
- Concatenation: $L_1 L_2 = \{x_1 x_2 \mid x_1 \in L_1, x_2 \in L_2\}$
- Closure: $L^* = \bigcup_{n \in \mathbb{N}} L^n$ where $L^0 = \{\epsilon\}$ and $L^{n+1} = L^n L$.

Note: We have then that $\emptyset^* = \{\epsilon\}$ and $L^* = L^0 \cup L^1 \cup L^2 \cup \ldots = \{\epsilon\} \cup \{x_1 \ldots x_n \mid n > 0, x_i \in L\}$.

Notation: $L^+ = L^1 \cup L^2 \cup L^3 \cup \ldots$ and $L? = L \cup \{\epsilon\}$.

Language Defined by the Regular Expressions

Definition: The language defined by a regular expression is defined by recursion on the expression:

Base cases:
- $L(\emptyset) = \emptyset$;
- $L(\epsilon) = \{\epsilon\}$;
- Given $a \in \Sigma$, $L(a) = \{a\}$.

Recursive cases:
- $L(R + S) = L(R) \cup L(S)$;
- $L(RS) = L(R)L(S)$;
- $L(R^*) = L(R)^*$.

Note: $x \in L(R)$ iff $x$ is generated/accepted by $R$.

Notation: We write $x \in R$ or $x \in L(R)$ interchangeably.

Example of Regular Expressions

Let $\Sigma = \{0, 1\}$: () + ( + ) () + ((( )) + ) () + (ǫ + )() (ǫ + ) () + () + () + ()

What do they mean? Are there expressions that are equivalent?
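The recursive definition of $L(R)$ can be read almost directly as a membership test. The following is a minimal sketch, not part of the lecture: it reuses the constructor names of the lecture's RExp type, while the helper names `splits` and `accepts` are hypothetical. It tries every way of cutting the word, so it is meant for small examples only.

```haskell
-- Constructor names follow the lecture's RExp type.
data RExp a = Empty | Epsilon | Atom a
            | Plus (RExp a) (RExp a)
            | Concat (RExp a) (RExp a)
            | Star (RExp a)

-- All ways of cutting a word into a prefix and a suffix.
splits :: [a] -> [([a], [a])]
splits xs = [splitAt n xs | n <- [0 .. length xs]]

-- accepts r w is True exactly when w is in L(r).
accepts :: Eq a => RExp a -> [a] -> Bool
accepts Empty        _ = False                          -- L(Empty) has no words
accepts Epsilon      w = null w                         -- only the empty word
accepts (Atom a)     w = w == [a]                       -- only the word "a"
accepts (Plus r s)   w = accepts r w || accepts s w     -- union
accepts (Concat r s) w =                                -- concatenation
  or [accepts r u && accepts s v | (u, v) <- splits w]
accepts (Star r)     w =                                -- closure
  -- Either w is empty, or w = u v with u a non-empty word of L(r)
  -- and v again in L(r)*. Requiring u to be non-empty makes v
  -- strictly shorter than w, so the recursion terminates.
  null w || or [accepts r u && accepts (Star r) v | (u, v) <- drop 1 (splits w)]
```

With the earlier example $b + (bc)^*$ written as `Plus (Atom 'b') (Star (Concat (Atom 'b') (Atom 'c')))`, the words "b", "bcbc" and the empty word are accepted, while "bb" is not.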

Algebraic Laws for Regular Expressions

The following equalities hold for any RE $R$, $S$ and $T$:
- Associativity: $R + (S + T) = (R + S) + T$ and $R(ST) = (RS)T$;
- Commutativity: $R + S = S + R$;
- In general, $RS \neq SR$;
- Distributivity: $R(S + T) = RS + RT$ and $(S + T)R = SR + TR$;
- Identity: $R + \emptyset = \emptyset + R = R$ and $R\epsilon = \epsilon R = R$;
- Annihilator: $R\emptyset = \emptyset R = \emptyset$;
- Idempotent: $R + R = R$;
- $\emptyset^* = \epsilon^* = \epsilon$;
- $R? = \epsilon + R$;
- $R^+ = RR^* = R^*R$;
- $R^* = (R^*)^* = R^*R^* = \epsilon + R^+$.

Note: Compare this slide with slide 9 of lecture 3.

Algebraic Laws for Regular Expressions

Other useful laws to simplify regular expressions are:
- Shifting rule: $R(SR)^* = (RS)^*R$
- Denesting rule: $(R^*S)^*R^* = (R + S)^*$

Note: By the shifting rule we also get $R^*(SR^*)^* = (R + S)^*$.

Variation of the denesting rule: $(R^*S)^* = \epsilon + (R + S)^*S$
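Some of these laws can be applied mechanically while an expression is being built. The sketch below is an illustration under assumptions, not the lecture's code: the names `plus`, `cat`, `star` and `simplify` are hypothetical, and only the identity, annihilator, idempotent and closure laws are used, read from left to right.

```haskell
data RExp a = Empty | Epsilon | Atom a
            | Plus (RExp a) (RExp a)
            | Concat (RExp a) (RExp a)
            | Star (RExp a)
  deriving (Eq, Show)

-- Identity for +, and idempotence.
plus :: Eq a => RExp a -> RExp a -> RExp a
plus Empty s = s                  -- Empty + S = S
plus r Empty = r                  -- R + Empty = R
plus r s
  | r == s    = r                 -- R + R = R
  | otherwise = Plus r s

-- Annihilator and identity for concatenation.
cat :: RExp a -> RExp a -> RExp a
cat Empty _   = Empty             -- Empty R = Empty
cat _ Empty   = Empty             -- R Empty = Empty
cat Epsilon s = s                 -- Epsilon R = R
cat r Epsilon = r                 -- R Epsilon = R
cat r s       = Concat r s

-- Laws for the closure.
star :: RExp a -> RExp a
star Empty    = Epsilon           -- Empty* = Epsilon
star Epsilon  = Epsilon           -- Epsilon* = Epsilon
star (Star r) = Star r            -- (R*)* = R*
star r        = Star r

-- Rebuild an expression bottom-up through the smart constructors.
simplify :: Eq a => RExp a -> RExp a
simplify (Plus r s)   = plus (simplify r) (simplify s)
simplify (Concat r s) = cat (simplify r) (simplify s)
simplify (Star r)     = star (simplify r)
simplify r            = r
```

For instance, `simplify (Plus (Concat (Atom 'a') Epsilon) (Concat Empty (Atom 'b')))` reduces to `Atom 'a'`. Laws such as commutativity or distributivity are deliberately left out, since applying them blindly does not make an expression smaller.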

Example: Proving Equalities Using the Algebraic Laws

Example: A proof that $a^*b(c + da^*b)^* = (a + bc^*d)^*bc^*$:

$a^*b(c + da^*b)^* = a^*b(c^*da^*b)^*c^*$ by denesting ($R = c$, $S = da^*b$)
$a^*b(c^*da^*b)^*c^* = (a^*bc^*d)^*a^*bc^*$ by shifting ($R = a^*b$, $S = c^*d$)
$(a^*bc^*d)^*a^*bc^* = (a + bc^*d)^*bc^*$ by denesting ($R = a$, $S = bc^*d$)

Example: The set of all words with no substring of more than two adjacent 0's is $(1 + 01 + 001)^*(\epsilon + 0 + 00)$. Now,

$(1 + 01 + 001)^*(\epsilon + 0 + 00) = ((\epsilon + 0)(\epsilon + 0)1)^*(\epsilon + 0)(\epsilon + 0)$
$= (\epsilon + 0)(\epsilon + 0)(1(\epsilon + 0)(\epsilon + 0))^*$ by shifting
$= (\epsilon + 0 + 00)(1 + 10 + 100)^*$

Then $(1 + 01 + 001)^*(\epsilon + 0 + 00) = (\epsilon + 0 + 00)(1 + 10 + 100)^*$.

Equality of Regular Expressions

Remember that RE are a way to denote languages. Then, for RE $R$ and $S$, $R = S$ actually means $L(R) = L(S)$. Hence we can prove the equality of RE in the same way we prove the equality of languages.

Example: Let us prove that $R^* = R^*R^*$. Let $L = L(R)$.

$L^* \subseteq L^*L^*$ since $\epsilon \in L^*$.

Conversely, if $x \in L^*L^*$ then $x = x_1 x_2$ with $x_1 \in L^*$ and $x_2 \in L^*$. If $x_1 = \epsilon$ or $x_2 = \epsilon$ then it is clear that $x \in L^*$. Otherwise $x_1 = u_1 u_2 \ldots u_n$ with $u_i \in L$ and $x_2 = v_1 v_2 \ldots v_m$ with $v_j \in L$. Then $x = x_1 x_2 = u_1 u_2 \ldots u_n v_1 v_2 \ldots v_m$ is in $L^*$.

Proving Algebraic Laws for Regular Expressions

Given the RE $R$ and $S$ we can prove the law $R = S$ as follows:

1. Convert $R$ and $S$ into concrete regular expressions $C$ and $D$, respectively, by replacing each variable in the RE $R$ and $S$ by (different) concrete symbols. Example: $R(SR)^* = (RS)^*R$ can be converted into $a(ba)^* = (ab)^*a$.
2. Prove or disprove whether $L(C) = L(D)$. If $L(C) = L(D)$ then $R = S$ is a true law, otherwise it is not.

Theorem: The above procedure correctly identifies the true laws for RE.

Proof: See theorems 3.4 and 3.3 in pages 2 and 2 respectively.

Example: Proving the shifting law was (somehow) one of the assignment exercises: prove that for all $n$, $a(ba)^n = (ab)^n a$.

Example: Proving the Denesting Rule

We can establish $(R^*S)^*R^* = (R + S)^*$ by proving $L((a^*b)^*a^*) = L((a + b)^*)$:

$\subseteq$: Let $x \in (a^*b)^*a^*$, then $x = vw$ with $v \in (a^*b)^*$ and $w \in a^*$. We proceed by induction on $v$. If $v = \epsilon$ we are done. Otherwise $v = av'$ or $v = bv'$. Observe that in both cases $v' \in (a^*b)^*$, hence by IH $v'w \in (a + b)^*$ and so is $vw$.

$\supseteq$: Let $x \in (a + b)^*$. We proceed by induction on $x$. If $x = \epsilon$ then we are done. Otherwise $x = x'a$ or $x = x'b$ with $x' \in (a + b)^*$. By IH $x' \in (a^*b)^*a^*$ and then $x' = vw$ with $v \in (a^*b)^*$ and $w \in a^*$.
- In the first case, $x'a = v(wa) \in (a^*b)^*a^*$ since $v \in (a^*b)^*$ and $wa \in a^*$.
- In the second case, $x'b = (v(wb))\epsilon \in (a^*b)^*a^*$ since $v(wb) \in (a^*b)^*$ and $\epsilon \in a^*$.
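Step 2 of the procedure asks whether $L(C) = L(D)$. A quick sanity check before attempting a proof is to compare the two languages on all words up to some length. The sketch below does this by enumerating the words of bounded length directly from the definition of $L(R)$. The names `langUpTo`, `equalUpTo`, `shiftLhs` and `shiftRhs` are hypothetical, and a bounded check of this kind can refute a proposed law but never prove one.

```haskell
import Data.List (nub, sort)

data RExp a = Empty | Epsilon | Atom a
            | Plus (RExp a) (RExp a)
            | Concat (RExp a) (RExp a)
            | Star (RExp a)

-- Words of length at most n in L(r), sorted and without duplicates.
langUpTo :: Ord a => Int -> RExp a -> [[a]]
langUpTo n = norm . go
  where
    norm ws = sort (nub (filter ((<= n) . length) ws))
    go Empty        = []
    go Epsilon      = [[]]
    go (Atom a)     = [[a]]
    go (Plus r s)   = norm (go r ++ go s)
    go (Concat r s) = norm [u ++ v | u <- go r, v <- go s]
    go (Star r)     = iter [[]]
      where
        -- Non-empty words of L(r); the empty word adds nothing to L(r)*.
        base = filter (not . null) (norm (go r))
        -- Keep appending words of L(r) until no new short word appears.
        iter ws =
          let ws' = norm (ws ++ [w ++ u | w <- ws, u <- base])
          in if ws' == ws then ws else iter ws'

-- Do r and s agree on every word of length at most n?
equalUpTo :: Ord a => Int -> RExp a -> RExp a -> Bool
equalUpTo n r s = langUpTo n r == langUpTo n s

-- The concrete instance a(ba)* = (ab)*a of the shifting rule.
shiftLhs, shiftRhs :: RExp Char
shiftLhs = Concat (Atom 'a') (Star (Concat (Atom 'b') (Atom 'a')))
shiftRhs = Concat (Star (Concat (Atom 'a') (Atom 'b'))) (Atom 'a')
```

Here `equalUpTo 7 shiftLhs shiftRhs` holds, which is consistent with the shifting rule, whereas comparing `Concat (Atom 'a') (Atom 'b')` with `Concat (Atom 'b') (Atom 'a')` already fails at length 2, in line with $RS \neq SR$ in general.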

Regular Languages and Regular Expressions

Theorem: If $L$ is a regular language then there exists a regular expression $R$ such that $L = L(R)$.

Proof: Recall that each regular language has an automaton that recognises it. We shall construct a regular expression from such an automaton.

The book shows 2 ways of constructing a regular expression from an automaton:
- Computing $R^{(k)}_{ij}$ (section 3.2.1): too expensive, produces big and complicated regular expressions;
- Eliminating states (section 3.2.2).

We will also see how to do this by solving a linear equation system using Arden's Lemma.

From FA to RE: Eliminating States in an Automaton A

This method of constructing a RE from a FA involves eliminating states. When we eliminate the state $s$, all the paths that went through $s$ no longer exist! To preserve the language of the automaton we must include, on an arc that goes directly from $q$ to $p$, the labels of the paths that went from $q$ to $p$ passing through $s$.

Labels are now not just symbols but (possibly an infinite number of) strings: hence we will use RE as labels.

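Eliminating State s in A

[Figure: the state $s$ before elimination. Each state $q_i$ ($1 \le i \le k$) has an arc labelled $Q_i$ into $s$; $s$ has a loop labelled $S$; $s$ has an arc labelled $P_j$ into each state $p_j$ ($1 \le j \le m$); the direct arc from $q_i$ to $p_j$ is labelled $R_{ij}$.]

If an arc does not exist in A, then it is labelled $\emptyset$ here. For simplification, we assume the $q$'s are different from the $p$'s.

Eliminating State s in A

[Figure: the automaton after eliminating $s$. The direct arc from $q_i$ to $p_j$ is now labelled $R_{ij} + Q_i S^* P_j$; the arcs shown are $R_{11} + Q_1 S^* P_1$, $R_{1m} + Q_1 S^* P_m$, $R_{k1} + Q_k S^* P_1$ and $R_{km} + Q_k S^* P_m$.]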
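The relabelling step can be sketched as a function on arcs labelled by regular expressions. This is an illustration under assumptions, not the lecture's code: the representation of arcs as an association list and the names `Arcs`, `label` and `eliminate` are hypothetical. For simplicity the sketch relabels every ordered pair of remaining states, which covers the slide's $q_i$ to $p_j$ arcs, and drops arcs that end up labelled $\emptyset$.

```haskell
data RExp a = Empty | Epsilon | Atom a
            | Plus (RExp a) (RExp a)
            | Concat (RExp a) (RExp a)
            | Star (RExp a)
  deriving (Eq, Show)

type State  = String
type Arcs a = [((State, State), RExp a)]

-- Label of the arc q -> p; a missing arc is labelled Empty.
label :: Arcs a -> State -> State -> RExp a
label arcs q p = maybe Empty id (lookup (q, p) arcs)

-- Small constructors that avoid building expressions around Empty.
plus :: RExp a -> RExp a -> RExp a
plus Empty s = s
plus r Empty = r
plus r s     = Plus r s

cat :: RExp a -> RExp a -> RExp a
cat Empty _   = Empty
cat _ Empty   = Empty
cat Epsilon s = s
cat r Epsilon = r
cat r s       = Concat r s

star :: RExp a -> RExp a
star Empty = Epsilon
star r     = Star r

-- Eliminate s: each remaining pair (q, p) gets R_qp + Q_q S* P_p,
-- where Q_q labels q -> s, S labels the loop on s, P_p labels s -> p.
eliminate :: State -> [State] -> Arcs a -> Arcs a
eliminate s states arcs =
  [ ((q, p), e)
  | q <- rest, p <- rest
  , let e = plus (label arcs q p)
                 (cat (label arcs q s)
                      (cat (star (label arcs s s)) (label arcs s p)))
  , not (isEmpty e) ]
  where
    rest = filter (/= s) states
    isEmpty Empty = True
    isEmpty _     = False
```

For a three-state automaton with arcs q to s on a, a loop on s labelled b, and s to p on c, `eliminate "s" ["q", "s", "p"]` leaves the single arc from q to p labelled $ab^*c$.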

Eliminating States in A

For each accepting state $q$ we proceed as before until we have only $q_0$ and $q$ left. For each accepting state $q$ we have 2 cases: $q_0 \neq q$ or $q_0 = q$.

If $q_0 \neq q$:

[Figure: $q_0$ has a loop labelled $R$ and an arc labelled $S$ to $q$; $q$ has a loop labelled $U$ and an arc labelled $T$ back to $q_0$.]

The expression is $(R + SU^*T)^*SU^*$.

If $q_0 = q$:

[Figure: $q_0$ has a loop labelled $R$.]

The expression is $R^*$.

The final expression is the sum of the expressions derived for each final state.

Example: Regular Expression Representing Gilbreath's Principle

Recall:

[Figure: the automaton for Gilbreath's principle, with states $q_0$, $q_1q_3$, $q_2q_4$, $q_0q_3q_4q_5$, $q_2q_4q_5$, $q_1q_3q_5$ and $q_\emptyset$.]

Observe: Eliminating $q_\emptyset$ is trivial. Eliminating $q_1q_3$ and $q_2q_4$ is also easy.

Example: Regular Expression Representing Gilbreath's Principle

After eliminating $q_\emptyset$, $q_1q_3$ and $q_2q_4$ we get:

[Figure: the remaining automaton with states $q_0$, $q_0q_3q_4q_5$, $q_2q_4q_5$ and $q_1q_3q_5$; the arc from $q_0$ to $q_0q_3q_4q_5$ is labelled $01 + 10$.]

RE when the final state is $q_0q_3q_4q_5$: $(01 + 10)(01 + 10)^* = (01 + 10)^+$
RE when the final state is $q_2q_4q_5$: $(01 + 10)(01)^*1(0(01)^*1)^*$
RE when the final state is $q_1q_3q_5$: $(01 + 10)(10)^*0(1(10)^*0)^*$

Example: Regular Expression Representing Gilbreath's Principle

The final RE is the sum of the 3 previous expressions. Let us first do some simplifications.

$(01 + 10)(01)^*1(0(01)^*1)^* = (01 + 10)(01)^*(10(01)^*)^*1$ by shifting
$= (01 + 10)(01 + 10)^*1$ by the shifted-denesting rule
$= (01 + 10)^+1$

Similarly $(01 + 10)(10)^*0(1(10)^*0)^* = (01 + 10)^+0$.

Hence the final RE is $(01 + 10)^+ + (01 + 10)^+1 + (01 + 10)^+0$, which is equivalent to $(01 + 10)^+(\epsilon + 0 + 1)$.

From FA to RE: Linear Equation System

To any automaton we associate a system of equations whose solutions are regular expressions. At the end we get a regular expression for the language recognised by the automaton. This works for DFA, NFA and $\epsilon$-NFA.

To every state $q_i$ we associate a variable $E_i$. Each $E_i$ represents the set $\{x \in \Sigma^* \mid \hat{\delta}(q_i, x) \in F\}$ (for DFA). Then $E_0$ represents the set of words accepted by the FA.

The solution to the linear system of equations associates a RE to each variable $E_i$. Then the solution for $E_0$ is the RE generating the same language that is accepted by the FA.

Constructing the Linear Equation System

Consider a state $q_i$ and all the transitions coming out of it:

[Figure: state $q_i$ with arcs labelled $a_{i1}, \ldots, a_{ij}, \ldots, a_{in}$ going to the states $q_1, \ldots, q_j, \ldots, q_n$.]

Then we have the equation

$E_i = a_{i1}E_1 + \ldots + a_{ij}E_j + \ldots + a_{in}E_n$

If $q_i$ is final then we add $\epsilon$:

$E_i = \epsilon + a_{i1}E_1 + \ldots + a_{ij}E_j + \ldots + a_{in}E_n$

If there is no arrow coming out of $q_i$ then $E_i = \emptyset$ if $q_i$ is not final, or $E_i = \epsilon$ if $q_i$ is final.

Solving the Linear Equation System

Lemma (Arden): A solution to $X = RX + S$ is $X = R^*S$. Furthermore, if $\epsilon \notin L(R)$ then this is the only solution to the equation $X = RX + S$.

Proof: We have that $R^* = RR^* + \epsilon$. Hence $R^*S = RR^*S + S$ and then $X = R^*S$ is a solution to $X = RX + S$.

One should also prove that:
- Any solution to $X = RX + S$ contains at least $R^*S$;
- If $\epsilon \notin L(R)$ then $R^*S$ is the only solution to the equation $X = RX + S$ (that is, no solution is bigger than $R^*S$).

Note: See for example Theorem 6., pages 85 86 of Theory of Finite Automata, with an introduction to formal languages by John Carroll and Darrell Long, Prentice-Hall International Editions.

Example: Regular Expression Representing Gilbreath's Principle

We obtain the following system of equations (see slide 19):

$E_0 = 0E_{13} + 1E_{24}$
$E_{0345} = \epsilon + 1E_{245} + 0E_{135}$
$E_{13} = 1E_{0345} + 0E_{\emptyset}$
$E_{245} = \epsilon + 0E_{0345}$
$E_{24} = 0E_{0345} + 1E_{\emptyset}$
$E_{135} = \epsilon + 1E_{0345}$
$E_{\emptyset} = \emptyset$

This can be simplified to:

$E_0 = 0E_{13} + 1E_{24}$
$E_{0345} = \epsilon + 1E_{245} + 0E_{135}$
$E_{13} = 1E_{0345}$
$E_{245} = \epsilon + 0E_{0345}$
$E_{24} = 0E_{0345}$
$E_{135} = \epsilon + 1E_{0345}$

Example: Regular Expression Representing Gilbreath's Principle

And further to:

$E_0 = (01 + 10)E_{0345}$
$E_{0345} = (01 + 10)E_{0345} + \epsilon + 0 + 1$

Then a solution to $E_{0345}$ is $(01 + 10)^*(\epsilon + 0 + 1)$, and the RE which is the solution to the problem is $(01 + 10)(01 + 10)^*(\epsilon + 0 + 1)$ or $(01 + 10)^+(\epsilon + 0 + 1)$.
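The step from the simplified system to these two equations is a substitution followed by Arden's lemma. The worked version below is a sketch under an assumption: the 0/1 coefficients of the individual transitions did not survive in the transcription, so the assignment used here ($E_0 = 0E_{13} + 1E_{24}$ and so on) is one reading consistent with the final result, and the mirrored reading with 0 and 1 swapped gives the same regular expression.

```latex
\begin{align*}
E_{0345} &= \epsilon + 1E_{245} + 0E_{135} \\
         &= \epsilon + 1(\epsilon + 0E_{0345}) + 0(\epsilon + 1E_{0345})
            && \text{substituting } E_{245} \text{ and } E_{135} \\
         &= (10 + 01)E_{0345} + (\epsilon + 1 + 0)
            && \text{distributivity} \\
         &= (01 + 10)E_{0345} + (\epsilon + 0 + 1)
            && \text{commutativity of } + \\[4pt]
E_{0345} &= (01 + 10)^*(\epsilon + 0 + 1)
            && \text{Arden, } R = 01 + 10,\ S = \epsilon + 0 + 1,\ \epsilon \notin L(R) \\[4pt]
E_0      &= 0E_{13} + 1E_{24}
          = 0(1E_{0345}) + 1(0E_{0345})
          = (01 + 10)E_{0345} \\
         &= (01 + 10)(01 + 10)^*(\epsilon + 0 + 1)
          = (01 + 10)^+(\epsilon + 0 + 1)
\end{align*}
```

Since $\epsilon \notin L(01 + 10)$, Arden's lemma also guarantees that this solution is the only one.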