Regular Expressions


Regular Expressions [1] Regular Expressions
Regular expressions can be seen as a system of notations for denoting ɛ-NFA.
They form an "algebraic" representation of ɛ-NFA.
"Algebraic" means expressions with equations such as
    E_1 + E_2 = E_2 + E_1
    E(E_1 + E_2) = EE_1 + EE_2
Each regular expression E also represents a language L(E).
They are very convenient for representing patterns in documents (K. Thompson).

Regular Expressions [2] Regular Expressions: Abstract Syntax
Given an alphabet Σ, the regular expressions are defined by the following BNF (Backus-Naur Form), where a ranges over Σ:
    E ::= ∅ | ɛ | a | E + E | E* | EE
This defines the abstract syntax of regular expressions, to be contrasted with the concrete syntax (how we write regular expressions; see 3.1.3).

Regular Expressions [3] Concrete syntax
01* + 1 means (0(1*)) + 1
(01)* + 1 is a different regular expression
0(1* + 1) is yet another one

Regular Expressions [4] Regular Expressions: Abstract Syntax
Notice that
    there is no intersection operation
    there is no complement operation
Sometimes these are added (as in Brzozowski's algorithm, which we shall explain later).

Regular Expressions [5] Regular expressions in functional programming
    data Reg a = Empty
               | Epsilon
               | Atom a
               | Plus (Reg a) (Reg a)
               | Concat (Reg a) (Reg a)
               | Star (Reg a)
For instance Plus (Atom "b") (Star (Concat (Atom "b") (Atom "c"))) is written b + (bc)*.
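The link between the abstract syntax and the concrete syntax can be sketched as a small pretty-printer. This is only an illustration: the name showReg and the choice of Char as the alphabet are assumptions, not part of the notes. Star binds tightest, then concatenation, then +, so parentheses are printed only where they are needed.

```haskell
data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)

-- Hypothetical helper: render an expression in concrete syntax.
-- Precedence levels: 0 for +, 1 for concatenation, 2 for the argument of *.
showReg :: Reg Char -> String
showReg = go 0
  where
    go :: Int -> Reg Char -> String
    go _ Empty = "∅"
    go _ Epsilon = "ɛ"
    go _ (Atom c) = [c]
    go p (Plus e1 e2) = paren (p > 0) (go 0 e1 ++ " + " ++ go 0 e2)
    go p (Concat e1 e2) = paren (p > 1) (go 1 e1 ++ go 1 e2)
    go _ (Star e) = go 2 e ++ "*"

    -- Wrap in parentheses only when the context binds more tightly.
    paren :: Bool -> String -> String
    paren True s = "(" ++ s ++ ")"
    paren False s = s
```

For example, showReg applied to Plus (Atom 'b') (Star (Concat (Atom 'b') (Atom 'c'))) gives the string b + (bc)*, while the three expressions of the concrete-syntax slide print as 01* + 1, (01)* + 1 and 0(1* + 1).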

Regular Expressions [6] Regular Expressions: Examples
If Σ = {a, b, c}:
The expression (ab)* represents the language {ɛ, ab, abab, ababab, ...}
The expression (a + b)* represents the words built only with a and b
The expression a* + b* represents the set of strings with only a's or with only b's (and ɛ is possible)
The expression (aaa)* represents the words built only with a, with a length divisible by 3

Regular Expressions [7] Regular Expressions: Examples
If Σ = {0, 1}:
(ɛ + 1)(01)*(ɛ + 0) is the set of strings that alternate 0's and 1's
Another expression for the same language is (01)* + 1(01)* + (01)*0 + 1(01)*0

Regular Expressions [8] Some Operations on Languages
Three operations:
1. union L_1 ∪ L_2 of two languages L_1 and L_2
2. concatenation L_1 L_2: this is the set of all words x_1 x_2 with x_i ∈ L_i. If L_1 or L_2 is ∅ this is empty
3. closure L* of a language: L* is the union of {ɛ} and the set of all words x_1 ... x_n with x_i ∈ L

Regular Expressions [9] Some Operations on Languages
Definition: L^0 = {ɛ}, L^(n+1) = L^n L
Notice that ∅* = {ɛ} and
    L* = L^0 ∪ L^1 ∪ L^2 ∪ ... = ∪_{n ∈ N} L^n

Regular Expressions [10] Semantics of regular expressions
This is defined by induction on the abstract syntax: x ∈ L(E) iff x is accepted by E
1. L(∅) = ∅, L(ɛ) = {ɛ}
2. L(a) = {a} if a ∈ Σ
3. L(E_1 + E_2) = L(E_1) ∪ L(E_2)
4. L(E_1 E_2) = L(E_1)L(E_2)
5. L(E*) = L(E)*
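The inductive definition of L(E) can be turned into a small program if we only ask for the words up to a given length, since L(E) itself may be infinite. The following is a minimal sketch under that assumption; the name langUpTo and the length bound are illustrative choices, not part of the notes.

```haskell
import Data.List (nub, sort)

data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)

-- Hypothetical helper: all words of L(e) of length at most n,
-- sorted and without duplicates.
langUpTo :: Ord a => Int -> Reg a -> [[a]]
langUpTo n = go
  where
    -- Keep only short words and put the list in a canonical form.
    norm = nub . sort . filter ((<= n) . length)

    go Empty = []                                        -- L(∅) = ∅
    go Epsilon = [[]]                                    -- L(ɛ) = {ɛ}
    go (Atom a) = [[a] | n >= 1]                         -- L(a) = {a}
    go (Plus e1 e2) = norm (go e1 ++ go e2)              -- union
    go (Concat e1 e2) = norm [x ++ y | x <- go e1, y <- go e2]
    go (Star e) = fixpoint [[]]                          -- union of all L^k
      where
        -- Non-empty words of L(e); ɛ adds nothing new under *.
        base = filter (not . null) (go e)
        step ws = norm (ws ++ [x ++ y | x <- ws, y <- base])
        -- The bounded set is finite, so iteration reaches a fixed point.
        fixpoint ws =
          let ws' = step ws
           in if ws' == ws then ws else fixpoint ws'
```

For instance langUpTo 4 (Star (Concat (Atom 'a') (Atom 'b'))) evaluates to ["", "ab", "abab"], the words of (ab)* of length at most 4.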

Regular Expressions [11] Regular Languages and Regular Expressions
Theorem: If L is a regular language there exists a regular expression E such that L = L(E).
We prove this in the following way.
To any automaton we associate a system of equations (the solution should be regular expressions).
We solve this system like we solve a linear equation system, using Arden's Lemma.
At the end we get a regular expression for the language recognised by the automaton.
This works for DFA, NFA, ɛ-NFA.

Regular Expressions [12] Regular Languages and Regular Expressions
For the automaton with accepting states C and D defined by
    A.0 = A, A.1 = {A, B}, B.0 = B.1 = C, C.0 = C.1 = D
we get the system
    E_A = (0 + 1)E_A + 1E_B
    E_B = (0 + 1)E_C
    E_C = ɛ + (0 + 1)E_D
    E_D = ɛ
where E_S = {w ∈ Σ* | S.w ∩ F ≠ ∅}

Regular Expressions [13] Arden's Lemma
Arden's Lemma: A solution of x = Rx + S is x = R*S. Furthermore, if ɛ ∉ L(R) then this is the only solution of the equation x = Rx + S.
We have R* = RR* + ɛ and so R*S = RR*S + S.
So x = R*S is a solution of x = Rx + S.

Regular Expressions [14] Arden's Lemma
For the system
    E_1 = bE_2
    E_2 = aE_1 + bE_3
    E_3 = ɛ + bE_1
we get E_1 = bE_2, E_3 = ɛ + bbE_2 and then E_2 = (ab + bbb)E_2 + b, and hence
    E_2 = (ab + bbb)*b and E_1 = b(ab + bbb)*b
This is the same as the method described in 3.2.2, but it is expressed in the language of equations and eliminating variables.

Regular Expressions [15] Regular Languages and Regular Expressions
We can find a solution of the original system
    E_A = (0 + 1)E_A + 1E_B
    E_B = (0 + 1)E_C
    E_C = ɛ + (0 + 1)E_D
    E_D = ɛ
by eliminating states in the following way:
    E_D = ɛ, E_C = ɛ + 0 + 1, E_B = 0 + 1 + (0 + 1)^2
and
    E_A = (0 + 1)*(10 + 11 + 1(0 + 1)^2)

Regular Expressions [16] Regular Languages and Regular Expressions
How to remember the solution of x = Rx + S?
Notice that, if x = Rx + S, we have
    x = Rx + S = R(Rx + S) + S = R^2 x + RS + S
and so
    x = R(R^2 x + RS + S) + S = R^3 x + R^2 S + RS + S
and in general
    x = R^(n+1) x + (R^n + ... + R + ɛ)S

Regular Expressions [17] Regular Languages and Regular Expressions
The result depends on the way we solve the system.
For X = aX + bY, Y = ɛ + cY + dX:
If we eliminate X first we get X = a*b(c + da*b)*
If we eliminate Y first we get X = (a + bc*d)*bc*
Hence a*b(c + da*b)* = (a + bc*d)*bc*!
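The two elimination orders can be written out step by step. The derivation below is a sketch that only uses Arden's Lemma and substitution; the intermediate lines are not in the original slides.

```latex
\begin{align*}
\text{Eliminating } X \text{ first:}\quad
  X &= aX + bY \;\Rightarrow\; X = a^{*}bY \\
  Y &= \epsilon + cY + d\,a^{*}bY = (c + da^{*}b)Y + \epsilon
     \;\Rightarrow\; Y = (c + da^{*}b)^{*} \\
  X &= a^{*}b\,(c + da^{*}b)^{*} \\[6pt]
\text{Eliminating } Y \text{ first:}\quad
  Y &= cY + (\epsilon + dX) \;\Rightarrow\; Y = c^{*}(\epsilon + dX) = c^{*} + c^{*}dX \\
  X &= aX + bc^{*} + bc^{*}dX = (a + bc^{*}d)X + bc^{*}
     \;\Rightarrow\; X = (a + bc^{*}d)^{*}\,bc^{*}
\end{align*}
```

In each application of the lemma the coefficient R (namely a, c, c + da*b or a + bc*d) does not contain ɛ, so the solution is unique at every step, which is why the two results must denote the same language.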

Regular Expressions [18] Elimination of states
The books present two other methods.
The first method is similar to Warshall's algorithm (see Wikipedia).
The second method is by elimination of states, and is in fact the same as the method of equations that I have presented (even if it does not look so similar at first).

Regular Expressions [19] Elimination of states
There is only one formula needed when we eliminate the state k:
    E_ij = E_ij + E_ik (E_kk)* E_kj
A nice trick (which is not in the book) is to add one extra initial state and one extra final state.
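The elimination formula translates almost literally into code. The sketch below assumes states are numbered by Int and that the automaton is given as a function from pairs of states to edge labels; the names Edges, eliminate and example are hypothetical. No simplification is performed, so the result is correct but not minimal.

```haskell
data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)
  deriving (Eq, Show)

-- edge i j is the expression E_ij labelling the transition from i to j
-- (Empty when there is no such transition).
type Edges a = Int -> Int -> Reg a

-- Eliminate state k: the new label between i and j is
-- E_ij + E_ik (E_kk)* E_kj, for states i and j other than k.
eliminate :: Int -> Edges a -> Edges a
eliminate k edge i j =
  Plus (edge i j) (Concat (edge i k) (Concat (Star (edge k k)) (edge k j)))

-- A three-state example: 0 --a--> 1, 1 --b--> 1, 1 --c--> 2.
example :: Edges Char
example 0 1 = Atom 'a'
example 1 1 = Atom 'b'
example 1 2 = Atom 'c'
example _ _ = Empty
```

Here eliminate 1 example 0 2 is Plus Empty (Concat (Atom 'a') (Concat (Star (Atom 'b')) (Atom 'c'))), that is ∅ + ab*c, which denotes the same language as ab*c.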

Regular Expressions [20] Algorithm on regular expressions
Test if a regular expression denotes the empty language
    data Reg a = Empty
               | Epsilon
               | Atom a
               | Plus (Reg a) (Reg a)
               | Concat (Reg a) (Reg a)
               | Star (Reg a)

    isempty Empty = True
    isempty (Plus e1 e2) = isempty e1 && isempty e2
    isempty (Concat e1 e2) = isempty e1 || isempty e2
    isempty _ = False

Regular Expressions [21] Algorithm on regular expressions
Test if a regular expression contains ɛ
    hasepsilon :: Reg a -> Bool
    hasepsilon Epsilon = True
    hasepsilon (Star _) = True
    hasepsilon (Plus e1 e2) = hasepsilon e1 || hasepsilon e2
    hasepsilon (Concat e1 e2) = hasepsilon e1 && hasepsilon e2
    hasepsilon _ = False

Regular Expressions [22] Algorithm on regular expressions
Test if L(e) ⊆ {ɛ}
    atmosteps :: Reg a -> Bool
    atmosteps Empty = True
    atmosteps Epsilon = True
    atmosteps (Star e) = atmosteps e
    atmosteps (Plus e1 e2) = atmosteps e1 && atmosteps e2
    atmosteps (Concat e1 e2) = isempty e1 || isempty e2 || (atmosteps e1 && atmosteps e2)
    atmosteps _ = False

Regular Expressions [23] Algorithm on regular expressions
Test if a regular expression denotes an infinite language
    infinite :: Reg a -> Bool
    infinite (Star e) = not (atmosteps e)
    infinite (Plus e1 e2) = infinite e1 || infinite e2
    infinite (Concat e1 e2) = (infinite e1 && notempty e2) || (notempty e1 && infinite e2)
    infinite _ = False

    notempty e = not (isempty e)
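To see how the emptiness, ɛ-only and infiniteness tests interact, here they are collected in one self-contained module together with a few sample expressions. The definitions follow the slides; the sample values at the end (names sampleEmpty, sampleEps, sampleInf) are hypothetical and only meant as a quick sanity check.

```haskell
data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)

-- L(e) = ∅ ?
isempty :: Reg a -> Bool
isempty Empty = True
isempty (Plus e1 e2) = isempty e1 && isempty e2
isempty (Concat e1 e2) = isempty e1 || isempty e2
isempty _ = False

-- L(e) ⊆ {ɛ} ?
atmosteps :: Reg a -> Bool
atmosteps Empty = True
atmosteps Epsilon = True
atmosteps (Star e) = atmosteps e
atmosteps (Plus e1 e2) = atmosteps e1 && atmosteps e2
atmosteps (Concat e1 e2) =
  isempty e1 || isempty e2 || (atmosteps e1 && atmosteps e2)
atmosteps _ = False

-- L(e) infinite ? Only a star over a non-trivial language creates infinity.
infinite :: Reg a -> Bool
infinite (Star e) = not (atmosteps e)
infinite (Plus e1 e2) = infinite e1 || infinite e2
infinite (Concat e1 e2) =
  (infinite e1 && not (isempty e2)) || (not (isempty e1) && infinite e2)
infinite _ = False

-- Sample expressions: a∅, (ɛ + ∅)*, and a*∅ versus a*b.
sampleEmpty, sampleEps, sampleInf, sampleFin :: Reg Char
sampleEmpty = Concat (Atom 'a') Empty
sampleEps = Star (Plus Epsilon Empty)
sampleInf = Concat (Star (Atom 'a')) (Atom 'b')
sampleFin = Concat (Star (Atom 'a')) Empty
```

Note that sampleFin contains a star over a non-empty language but is still finite (indeed empty), because concatenation with ∅ kills it; this is exactly the case the notempty conditions in infinite guard against.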

Regular Expressions [24] Derivative of a regular expression
If L is a language, L ⊆ Σ*, and a ∈ Σ, we define the language L/a (derivative of L by a) by
    L/a = {x ∈ Σ* | ax ∈ L}
We give an algorithm computing E/a such that L(E/a) = L(E)/a, using the equivalence
    ax ∈ L iff x ∈ L/a

Regular Expressions [25] Derivative of a regular expression
Examples:
    (abab + abba)/a = bab + bba
    (abab + abba)/b = ∅
    (a*b)/a = (aa*b + b)/a = a*b
    ((ab)*a)/a = (ab(ab)*a + a)/a = b(ab)*a + ɛ

Regular Expressions [26] Derivative of a regular expression
    der :: Eq a => a -> Reg a -> Reg a
    der b (Atom b1) = if b == b1 then Epsilon else Empty
    der b (Plus e1 e2) = Plus (der b e1) (der b e2)
    der b (Concat e1 e2)
      | hasepsilon e1 = Plus (Concat (der b e1) e2) (der b e2)
    der b (Concat e1 e2) = Concat (der b e1) e2
    der b (Star e) = Concat (der b e) (Star e)
    der b _ = Empty

Regular Expressions [27] Application
Is a given word in the language defined by a regular expression E?
    isin :: Eq a => [a] -> Reg a -> Bool
    isin [] e = hasepsilon e
    isin (a:as) e = isin as (der a e)
This is essentially Ken Thompson's algorithm.
This works if we add intersection and complement.
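The membership test can also be read as a left fold of der over the word. The sketch below repeats the definitions from the slides so that it is self-contained, and adds a hypothetical helper derivs that returns the successive derivatives E, E/a_1, E/a_1a_2, ...; the fold formulation and the name derivs are not part of the notes.

```haskell
data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)

hasepsilon :: Reg a -> Bool
hasepsilon Epsilon = True
hasepsilon (Star _) = True
hasepsilon (Plus e1 e2) = hasepsilon e1 || hasepsilon e2
hasepsilon (Concat e1 e2) = hasepsilon e1 && hasepsilon e2
hasepsilon _ = False

der :: Eq a => a -> Reg a -> Reg a
der b (Atom b1) = if b == b1 then Epsilon else Empty
der b (Plus e1 e2) = Plus (der b e1) (der b e2)
der b (Concat e1 e2)
  | hasepsilon e1 = Plus (Concat (der b e1) e2) (der b e2)
  | otherwise = Concat (der b e1) e2
der b (Star e) = Concat (der b e) (Star e)
der _ _ = Empty

-- Hypothetical helper: the list of successive derivatives along the word.
derivs :: Eq a => [a] -> Reg a -> [Reg a]
derivs w e = scanl (flip der) e w

-- Same function as isin on the slide, written as a fold:
-- derive by each letter in turn, then test for ɛ.
isin :: Eq a => [a] -> Reg a -> Bool
isin w e = hasepsilon (foldl (flip der) e w)

-- (ab)* over the alphabet of characters.
abStar :: Reg Char
abStar = Star (Concat (Atom 'a') (Atom 'b'))
```

With these definitions isin "abab" abStar holds and isin "aba" abStar does not. Since der does no simplification here, the intermediate expressions returned by derivs grow with the length of the word.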

Regular Expressions [28] Application: extended regular expressions
    data Reg a = Empty
               | Epsilon
               | Atom a
               | Plus (Reg a) (Reg a)
               | Concat (Reg a) (Reg a)
               | Star (Reg a)
               | Inter (Reg a) (Reg a)
               | Compl (Reg a)

    hasepsilon Epsilon = True
    hasepsilon (Star _) = True
    hasepsilon (Inter e1 e2) = hasepsilon e1 && hasepsilon e2
    hasepsilon (Compl e) = not (hasepsilon e)
    hasepsilon (Plus e1 e2) = hasepsilon e1 || hasepsilon e2
    hasepsilon (Concat e1 e2) = hasepsilon e1 && hasepsilon e2
    hasepsilon _ = False

Regular Expressions [29] Application: extended regular expressions
    der :: Eq a => a -> Reg a -> Reg a
    der b (Atom b1) = if b == b1 then Epsilon else Empty
    der b (Plus e1 e2) = Plus (der b e1) (der b e2)
    der b (Inter e1 e2) = Inter (der b e1) (der b e2)
    der b (Compl e) = Compl (der b e)
    der b (Concat e1 e2)
      | hasepsilon e1 = Plus (Concat (der b e1) e2) (der b e2)
    der b (Concat e1 e2) = Concat (der b e1) e2
    der b (Star e) = Concat (der b e) (Star e)
    der b _ = Empty

Regular Expressions [30] Application
Is a given word in the language defined by a regular expression E? The algorithm is the same:
    isin :: Eq a => [a] -> Reg a -> Bool
    isin [] e = hasepsilon e
    isin (a:as) e = isin as (der a e)

Regular Expressions [31] Derivatives
Example: x = abba and E = abba + abab
The algorithm works with generalised regular expressions: x = 1010 and E = (01 + 10)* ∩ (101)*

Regular Expressions [32] Regular Languages and Regular Expressions
Theorem: If E is a regular expression then L(E) is a regular language.
We prove this by induction on E. The main steps are to prove that
    if L_1, L_2 are regular then so are L_1 ∪ L_2 and L_1 L_2
    if L is regular then so is L*

Regular Expressions [33] Regular Languages and Regular Expressions
At the end we shall get an ɛ-NFA, which we know how to transform into a DFA by the subset construction.
There is a beautiful algorithm that builds a DFA directly from a regular expression, due to Brzozowski, and we also present this algorithm.

Regular Expressions [34] Regular Languages and Regular Expressions
Lemma: If L_1, L_2 are regular then so is L_1 ∪ L_2.
We have seen a proof of this with the product construction. This is also easy if L_1 = L(A_1), L_2 = L(A_2) and A_1, A_2 are ɛ-NFAs.
Lemma: If L_1, L_2 are regular then so is L_1 L_2.
Lemma: If L is regular then so is L*.

Regular Expressions [35] Regular Languages and Regular Expressions
This can be seen as an algorithm transforming a regular expression E to an ɛ-NFA.
Example: we transform a* + ab to an ɛ-NFA.
As you can see on this example, the automaton we obtain is quite complex. A priori it is even more complex if we want a DFA.
See also Figure 3.18.

Regular Expressions [36] Brzozowski's algorithm
The idea is to use derivatives as states.
For instance if E = a* + ab we have
    E/a = a* + b, E/b = ∅, E/aa = a*, E/ab = ɛ, E/ba = E/bb = E/b
    E/aaa = E/aa, E/aab = E/aba = E/abb = E/b
We get a DFA with 5 states. The accepting states are the ones that contain ɛ.
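The construction can be sketched as a search over derivatives. The code below is an illustration under stated assumptions, not the definitive algorithm: the smart constructors plus and cat and the function states are hypothetical names, and the simplification they perform (dropping ∅ and ɛ, merging syntactically equal summands) is enough for the example but does not identify all equal expressions. In general one needs to normalise + up to associativity, commutativity and idempotence for the search to be guaranteed to terminate.

```haskell
data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)
  deriving (Eq, Show)

hasepsilon :: Reg a -> Bool
hasepsilon Epsilon = True
hasepsilon (Star _) = True
hasepsilon (Plus e1 e2) = hasepsilon e1 || hasepsilon e2
hasepsilon (Concat e1 e2) = hasepsilon e1 && hasepsilon e2
hasepsilon _ = False

-- Smart constructors: build E + F and EF while applying a few algebraic laws.
plus :: Eq a => Reg a -> Reg a -> Reg a
plus Empty e = e
plus e Empty = e
plus e1 e2 = if e1 == e2 then e1 else Plus e1 e2

cat :: Reg a -> Reg a -> Reg a
cat Empty _ = Empty
cat _ Empty = Empty
cat Epsilon e = e
cat e Epsilon = e
cat e1 e2 = Concat e1 e2

-- Derivative, as on the slides, but built with the smart constructors.
der :: Eq a => a -> Reg a -> Reg a
der b (Atom b1) = if b == b1 then Epsilon else Empty
der b (Plus e1 e2) = plus (der b e1) (der b e2)
der b (Concat e1 e2)
  | hasepsilon e1 = plus (cat (der b e1) e2) (der b e2)
  | otherwise = cat (der b e1) e2
der b (Star e) = cat (der b e) (Star e)
der _ _ = Empty

-- States of the DFA: all derivatives reachable from e0 (breadth-first).
-- The transition from state e on letter a goes to der a e;
-- a state is accepting when hasepsilon holds for it.
states :: Eq a => [a] -> Reg a -> [Reg a]
states sigma e0 = go [e0] []
  where
    go [] seen = reverse seen
    go (e : todo) seen
      | e `elem` seen = go todo seen
      | otherwise = go (todo ++ [der a e | a <- sigma]) (e : seen)
```

For E = a* + ab over the alphabet "ab", states returns the five expressions E, a* + b, ∅, a* and ɛ, of which all but ∅ are accepting.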

Regular Expressions [37] Brzozowski's algorithm
Other examples:
    E = (a + ɛ)*
    E = F10F where F = (0 + 1)*
    E = F1(0 + 1)

Regular Expressions [38] Brzozowski's algorithm
Furthermore this algorithm works even with extended regular expressions that admit intersections and complements.
For instance, writing ¬F for the complement of F, take E = a(ba)* ∩ ¬((ab)*a). Then
    E/a = (ba)* ∩ ¬(b(ab)*a + ɛ), E/b = E/aa = ∅, E/ab = a(ba)* ∩ ¬((ab)*a) = E
and none of these expressions contains ɛ, so E = ∅!

Regular Expressions [39] Brzozowski's algorithm
Example: We can prove in this way that (01 + 10)* ∩ (101)* = ɛ.
More generally we get an algorithm for testing E = F: we build a tree with nodes E/x, F/x for finite x.
Examples:
    E = (01 + 10)*, F = (101)*
    E = (10)*1, F = 1(01)*
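The equality test can be sketched as a search over pairs of derivatives: E = F fails as soon as some pair (E/x, F/x) disagrees on containing ɛ, and succeeds when no new pair can be reached. The code below is a minimal sketch with the same caveat as any derivative-based construction: the simplifications in the hypothetical smart constructors plus, inter and cat are sufficient for the examples here but do not guarantee termination in general. Only intersection is included from the extended syntax, to keep the example short.

```haskell
data Reg a
  = Empty
  | Epsilon
  | Atom a
  | Plus (Reg a) (Reg a)
  | Concat (Reg a) (Reg a)
  | Star (Reg a)
  | Inter (Reg a) (Reg a)
  deriving (Eq, Show)

hasepsilon :: Reg a -> Bool
hasepsilon Epsilon = True
hasepsilon (Star _) = True
hasepsilon (Plus e1 e2) = hasepsilon e1 || hasepsilon e2
hasepsilon (Concat e1 e2) = hasepsilon e1 && hasepsilon e2
hasepsilon (Inter e1 e2) = hasepsilon e1 && hasepsilon e2
hasepsilon _ = False

-- Smart constructors applying a few algebraic laws while building terms.
plus :: Eq a => Reg a -> Reg a -> Reg a
plus Empty e = e
plus e Empty = e
plus e1 e2 = if e1 == e2 then e1 else Plus e1 e2

inter :: Eq a => Reg a -> Reg a -> Reg a
inter Empty _ = Empty
inter _ Empty = Empty
inter e1 e2 = if e1 == e2 then e1 else Inter e1 e2

cat :: Reg a -> Reg a -> Reg a
cat Empty _ = Empty
cat _ Empty = Empty
cat Epsilon e = e
cat e Epsilon = e
cat e1 e2 = Concat e1 e2

der :: Eq a => a -> Reg a -> Reg a
der b (Atom b1) = if b == b1 then Epsilon else Empty
der b (Plus e1 e2) = plus (der b e1) (der b e2)
der b (Inter e1 e2) = inter (der b e1) (der b e2)
der b (Concat e1 e2)
  | hasepsilon e1 = plus (cat (der b e1) e2) (der b e2)
  | otherwise = cat (der b e1) e2
der b (Star e) = cat (der b e) (Star e)
der _ _ = Empty

-- Explore pairs (E/x, F/x); stop with False at the first pair that
-- disagrees on ɛ, and with True when every reachable pair has been seen.
equiv :: Eq a => [a] -> Reg a -> Reg a -> Bool
equiv sigma e0 f0 = go [(e0, f0)] []
  where
    go [] _ = True
    go ((e, f) : todo) seen
      | (e, f) `elem` seen = go todo seen
      | hasepsilon e /= hasepsilon f = False
      | otherwise =
          go ([(der a e, der a f) | a <- sigma] ++ todo) ((e, f) : seen)
```

With the alphabet "01", equiv confirms (10)*1 = 1(01)* and (01 + 10)* ∩ (101)* = ɛ, and rejects (01 + 10)* = (101)*.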

Regular Expressions [40] Algebraic Laws for Languages
L_1 ∪ L_2 = L_2 ∪ L_1 (union is commutative)
Note: concatenation is not commutative: we can find L_1, L_2 such that L_1 L_2 ≠ L_2 L_1
L{ɛ} = {ɛ}L = L
L∅ = ∅L = ∅
L(M ∪ N) = LM ∪ LN
(M ∪ N)L = ML ∪ NL

Regular Expressions [41] Algebraic Laws for Languages
∅* = {ɛ}
{ɛ}* = {ɛ}
L+ = LL* = L*L
L? = L ∪ {ɛ}
(L*)* = L*

Regular Expressions [42] Algebraic Laws for Regular Expressions
We write E = F for L(E) = L(F).
For instance (E_1 + E_2)E = E_1 E + E_2 E follows from (L_1 ∪ L_2)L = L_1 L ∪ L_2 L by taking L_i = L(E_i), L = L(E).
Similarly (E*)* = E*.

Regular Expressions [43] Algebraic Laws for Regular Expressions
E + (F + G) = (E + F) + G, E + F = F + E, E + E = E, E + ∅ = E
E(FG) = (EF)G, E∅ = ∅E = ∅, Eɛ = ɛE = E
E(F + G) = EF + EG, (F + G)E = FE + GE
ɛ + EE* = E* = ɛ + E*E

Regular Expressions [44] Algebraic Laws for Regular Expressions
We have also
    E* = E*E* = (E*)*
    E* = (EE)* + E(EE)*

Regular Expressions [45] Algebraic Laws for Regular Expressions
How can one prove equalities between regular expressions?
In usual algebra, we can simplify an algebraic expression by rewriting
    (x + y)(x + z) → xx + yx + xz + yz
For regular expressions, there is no such way to prove equalities. There is not even a complete finite set of equations.

Regular Expressions [46] Algebraic Laws for Regular Expressions
Example: L* ⊆ L*L* since ɛ ∈ L*.
Conversely, if x ∈ L*L* then x = x_1 x_2 with x_1 ∈ L* and x_2 ∈ L*.
That x ∈ L* is clear if x_1 = ɛ or x_2 = ɛ.
Otherwise x_1 = u_1 ... u_n with u_i ∈ L and x_2 = v_1 ... v_m with v_j ∈ L.
Then x = x_1 x_2 = u_1 ... u_n v_1 ... v_m is in L*.

Regular Expressions [47] Algebraic Laws for Regular Expressions
Two laws that are useful to simplify regular expressions:
    Shifting rule: E(FE)* = (EF)*E
    Denesting rule: (E*F)*E* = (E + F)*

Regular Expressions [48] Variation of the denesting rule
One has also
    (E*F)* = ɛ + (E + F)*F
and this represents the words that are empty or finish with F.

Regular Expressions [49] Algebraic Laws for Regular Expressions
Example:
    a*b(c + da*b)* = a*b(c*da*b)*c* by denesting
    a*b(c*da*b)*c* = (a*bc*d)*a*bc* by shifting
    (a*bc*d)*a*bc* = (a + bc*d)*bc* by denesting
Hence a*b(c + da*b)* = (a + bc*d)*bc*

Regular Expressions [50] Algebraic Laws for Regular Expressions
Examples:
    10?0? = 1 + 10 + 100
    (1 + 01 + 001)*(ɛ + 0 + 00) = ((ɛ + 0)(ɛ + 0)1)*(ɛ + 0)(ɛ + 0)
is the same as
    (ɛ + 0)(ɛ + 0)(1(ɛ + 0)(ɛ + 0))* = (ɛ + 0 + 00)(1 + 10 + 100)*
This is the set of all words with no substring of more than two adjacent 0's.

Regular Expressions [51] Proving by induction
Let Σ be {a, b}.
Lemma: For all n we have a(ba)^n = (ab)^n a
Proof: by induction on n
Theorem: a(ba)* = (ab)*a
Similarly we can prove (a + b)* = (a*b)*a*
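The induction for the lemma can be spelled out as follows. This is a sketch of the omitted proof, using the definition L^{n+1} = L^n L from the slide on operations on languages.

```latex
\begin{align*}
\text{Base case } (n = 0):\quad
  a(ba)^{0} &= a\,\epsilon = a = \epsilon\,a = (ab)^{0}a \\[4pt]
\text{Inductive step:}\quad
  a(ba)^{n+1} &= a(ba)^{n}(ba)
     && \text{definition of } (ba)^{n+1} \\
  &= (ab)^{n}a(ba)
     && \text{induction hypothesis} \\
  &= (ab)^{n}(ab)a
     && \text{associativity of concatenation} \\
  &= (ab)^{n+1}a \\[4pt]
\text{Theorem:}\quad
  a(ba)^{*} &= \bigcup_{n \in \mathbb{N}} a(ba)^{n}
   = \bigcup_{n \in \mathbb{N}} (ab)^{n}a = (ab)^{*}a
\end{align*}
```

The last line uses the fact that concatenation distributes over arbitrary unions of languages.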

Regular Expressions [52] Complement of a(n ordinary) regular expression
For building the complement of a regular expression, or the intersection of two regular expressions, we can use NFA/DFA.
For instance, to build E such that L(E) = {0, 1}* \ {0}, we first build a DFA for the expression 0, then the complement DFA.
We can compute E from this complement DFA. We get for instance
    ɛ + 1(0 + 1)* + 0(0 + 1)+