Regular expressions and Kleene s theorem

Similar documents
Regular expressions and Kleene s theorem

Regular expressions and Kleene s theorem

Constructions on Finite Automata

Constructions on Finite Automata

Deterministic finite Automata

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

Closure under the Regular Operations

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Nondeterministic finite automata

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Computational Models Lecture 2 1

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Deterministic Finite Automaton (DFA)

Nondeterministic Finite Automata. Nondeterminism Subset Construction

Computational Models Lecture 2 1

Lecture 3: Nondeterministic Finite Automata

Nondeterministic Finite Automata

CS 121, Section 2. Week of September 16, 2013

Nondeterminism. September 7, Nondeterminism

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Lecture 2: Connecting the Three Models

Before we show how languages can be proven not regular, first, how would we show a language is regular?

CMPSCI 250: Introduction to Computation. Lecture #23: Finishing Kleene s Theorem David Mix Barrington 25 April 2013

Equivalence of DFAs and NFAs

CS 154, Lecture 3: DFA NFA, Regular Expressions

Automata: a short introduction

Inf2A: Converting from NFAs to DFAs and Closure Properties

Finite Automata. BİL405 - Automata Theory and Formal Languages 1

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs

3515ICT: Theory of Computation. Regular languages

Lecture 1: Finite State Automaton

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Outline. Nondetermistic Finite Automata. Transition diagrams. A finite automaton is a 5-tuple (Q, Σ,δ,q 0,F)

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Computational Models - Lecture 1 1

NFA and regex. the Boolean algebra of languages. regular expressions. Informatics 1 School of Informatics, University of Edinburgh

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

UNIT-III REGULAR LANGUAGES

Uses of finite automata

COMP4141 Theory of Computation

Computational Theory

1 Alphabets and Languages

Chapter Five: Nondeterministic Finite Automata

Kleene Algebras and Algebraic Path Problems

CSE 311 Lecture 25: Relating NFAs, DFAs, and Regular Expressions. Emina Torlak and Kevin Zatloukal

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Closure Properties of Context-Free Languages. Foundations of Computer Science Theory

Foundations of

CS21 Decidability and Tractability

Non-Deterministic Finite Automata

September 11, Second Part of Regular Expressions Equivalence with Finite Aut

Automata and Formal Languages - CM0081 Non-Deterministic Finite Automata

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

Nondeterministic Finite Automata

Regular Language Equivalence and DFA Minimization. Equivalence of Two Regular Languages DFA Minimization

Johns Hopkins Math Tournament Proof Round: Automata

Theory of computation: initial remarks (Chapter 11)

Lecture 7 Properties of regular languages

Theory of Computation (I) Yijia Chen Fudan University

CSC236 Week 10. Larry Zhang

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

Course 4 Finite Automata/Finite State Machines

Finite-State Machines (Automata) lecture 12

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

Regular Expressions Kleene s Theorem Equation-based alternate construction. Regular Expressions. Deepak D Souza

Decision, Computation and Language

CSC173 Workshop: 13 Sept. Notes

Lecture 4 Nondeterministic Finite Accepters

September 7, Formal Definition of a Nondeterministic Finite Automaton

Finite Automata Part Two

Non-deterministic Finite Automata (NFAs)

Harvard CS 121 and CSCI E-207 Lecture 4: NFAs vs. DFAs, Closure Properties

CS 455/555: Finite automata

Exam 1 CSU 390 Theory of Computation Fall 2007

Computer Sciences Department

The Pumping Lemma: limitations of regular languages

Languages, regular languages, finite automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Minimization of DFAs

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Classes of Boolean Functions

Theory of computation: initial remarks (Chapter 11)

CPS 220 Theory of Computation REGULAR LANGUAGES

Finite Automata and Regular Languages

Lecture Notes On THEORY OF COMPUTATION MODULE -1 UNIT - 2

Figure 1: NFA N. Figure 2: Equivalent DFA N obtained through function nfa2dfa

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

CS21 Decidability and Tractability

How do regular expressions work? CMSC 330: Organization of Programming Languages

TDDD65 Introduction to the Theory of Computation

Recitation 2 - Non Deterministic Finite Automata (NFA) and Regular OctoberExpressions

Chapter 5. Finite Automata

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Computational Models #1

CMPSCI 250: Introduction to Computation. Lecture #29: Proving Regular Language Identities David Mix Barrington 6 April 2012

Transcription:

Regular expressions and Kleene s theorem Informatics 2A: Lecture 5 Mary Cryan School of Informatics University of Edinburgh mcryan@inf.ed.ac.uk 26 September 2018 1 / 18

Finishing DFA minimization An algorithm for minimization More closure properties of regular languages Operations on languages ɛ-nfas Closure under concatenation and Kleene star Regular expressions Regular expressions 2 / 18

An algorithm for minimization First eliminate any unreachable states (easy). Then create a table of all possible pairs of states (p, q), initially unmarked. (E.g. a two-dimensional array of booleans, initially set to false.) We mark pairs (p, q) as and when we discover that p and q cannot be equivalent. 1. Start by marking all pairs (p, q) where p F and q F, or vice versa. 2. Look for unmarked pairs (p, q) such that for some u Σ, the pair (δ(p, u), δ(q, u)) is marked. Then mark (p, q). 3. Repeat step 2 until no such unmarked pairs remain. If (p, q) is still unmarked, can collapse p, q to a single state. 3 / 18

Why does this algorithm work? Let s say a string s separates states p, q if s takes us from p to an accepting state and from q to a rejecting state, or vice versa. Such an s is a reason for not merging p, q into a single state. We mark (p, q) when we find that there s a string separating p, q: If p F and q F, or vice versa, then ɛ separates p, q. Suppose we mark (p, q) because we ve found a previously marked pair (p, q ) where p a p and q a q for some a. If s is a separating string for p, q, then as separates p, q. We stop when there are no more pairs we can mark. If (p, q) remains unmarked, why are p, q equivalent? If s = a 1... a n were a string separating p, q, we d have a p = p 1 a 0 2 a p1 n pn 1 pn, a q = q 1 a 0 2 a q1 n qn 1 qn with just one of p n, q n accepting. So we d have marked (p n, q n ) in Round 0, (p n 1, q n 1 ) by Round 1,... and (p, q) by Round n. 4 / 18

Alternative: Brzozowski s minimization algorithm There s a surprising alternative algorithm for minimizing a DFA M = (Q, δ, s, F ) for a language L. Assume no unreachable states. Reverse the machine M: flip all the arrows, make F the set of start states, and make s the only accepting state. This gives an NFA N (not typically a DFA) which accepts L rev = {rev(s) s L}. Apply the subset construction to N, omitting unreachable states, to get a DFA P. It turns out that P is minimal for L rev (clever)! Now apply the same two steps again, starting from P. The result is a minimal DFA for (L rev ) rev = L. 5 / 18

Comparing Brzozowski and marking algorithms Both algorithms result in the same minimal DFA for a given DFA M (recall that there s a unique minimal DFA up to isomorphism.) In the worst case, Brzozowski s algorithm can take time O(2 n ) for a DFA with n states. The marking algorithm, as presented, runs within time O(kn 4 ), where k = Σ. (Can be improved further.) There are some practical cases where Brzozowski does better. Marking algorithm is probably easier to understand, and illustrates a common pattern (more examples later in course). 6 / 18

Improving determinization Now we have a minimization algorithm, the following improved determinization procedure is possible. To determinize an NFA M with n states: 1. Perform the subset construction on M to produce an equivalent DFA N with 2 n states. 2. Perform the minimization algorithm on N to produce a DFA Min(N) with 2 n states. Using this method we are guaranteed to produce the smallest possible DFA equivalent to M. In many cases this avoids the exponential state-space blow-up. In some cases, however, an exponential blow-up is unavoidable. 7 / 18

Question from lecture 4 Consider our example NFA over {0, 1}: 0,1 1 0,1 0,1 0,1 0,1 q0 q1 q2 q3 q4 q5 What is the number of states of the smallest DFA that recognises the same language? 8 / 18

Question from lecture 4 Consider our example NFA over {0, 1}: 0,1 1 0,1 0,1 0,1 0,1 q0 q1 q2 q3 q4 q5 What is the number of states of the smallest DFA that recognises the same language? Answer: The smallest DFA has 32 states. 8 / 18

Question from lecture 4 Consider our example NFA over {0, 1}: 0,1 1 0,1 0,1 0,1 0,1 q0 q1 q2 q3 q4 q5 What is the number of states of the smallest DFA that recognises the same language? Answer: The smallest DFA has 32 states. More generally, the smallest DFA for the language: {x Σ the n-th symbol from the end of x is 1} has 2 n states. Whereas, there is an NFA with n + 1 states. 8 / 18

Concatenation We write L 1.L 2 for the concatenation of languages L 1 and L 2, defined by: L 1.L 2 = {xy x L 1, y L 2 } For example, if L 1 = {aaa} and L 2 = {b, c} then L 1.L 2 is the language {aaab, aaac}. Later we will prove the following closure property. If L 1 and L 2 are regular languages then so is L 1.L 2. 9 / 18

Kleene star We write L for the Kleene star of the language L, defined by: L = {ɛ} L L.L L.L.L... For example, if L 3 = {aaa, b} then L 3 contains strings like aaaaaa, bbbbb, baaaaaabbaaa, etc. More precisely, L 3 contains all strings over {a, b} in which the letter a always appears in sequences of length some multiple of 3 Later we will prove the following closure property. If L is a regular language then so is L. 10 / 18

Exercise Consider the language over the alphabet {a, b, c} L = {x x starts with a and ends with c} Which of the following strings are valid for the language L.L? 1. abcabc 2. acacac 3. abcbcac 4. abcbacbc 11 / 18

Exercise Consider the language over the alphabet {a, b, c} L = {x x starts with a and ends with c} Which of the following strings are valid for the language L.L? 1. abcabc 2. acacac 3. abcbcac 4. abcbacbc Answer: 1,2,3 are valid, but 4 isn t. (To split the string into two L-strings, we d need c followed by a.) 11 / 18

Another exercise Consider the (same) language over the alphabet {a, b, c} L = {x x starts with a and ends with c} Which of the following strings are valid for the language L? 1. ɛ 2. acaca 3. abcbc 4. acacacacac 12 / 18

Another exercise Consider the (same) language over the alphabet {a, b, c} L = {x x starts with a and ends with c} Which of the following strings are valid for the language L? 1. ɛ 2. acaca 3. abcbc 4. acacacacac Answer: 1,3,4 are valid, but not 2. (In this particular case, it so happens that L = L + {ɛ}, but this won t be true in general.) 12 / 18

NFAs with ɛ-transitions We can vary the definition of NFA by also allowing transitions labelled with the special symbol ɛ (not a symbol in Σ). The automaton may (but doesn t have to) perform a spontaneous ɛ-transition at any time, without reading an input symbol. This is quite convenient: for instance, we can turn any NFA into an ɛ-nfa with just one start state and one accepting state: ε ε ε............... ε ε ε (Add ɛ-transitions from new start state to each state in S, and from each state in F to new accepting state.) 13 / 18

Equivalence to ordinary NFAs Allowing ɛ-transitions is just a convenience: it doesn t fundamentally change the power of NFAs. If N = (Q,, S, F ) is an ɛ-nfa, we can convert N to an ordinary NFA with the same associated language, by simply expanding and S to allow for silent ɛ-transitions. To achieve this, perform the following steps on N. For every pair of transitions q a q (where a Σ) and q ɛ q, add a new transition q a q. For every transition q ɛ q, where q is a start state, make q a start state too. Repeat the two steps above until no further new transitions or new start states can be added. Finally, remove all ɛ-transitions from the ɛ-nfa resulting from the above process. This produces the desired NFA. 14 / 18

Closure under concatenation We use ɛ-nfas to show, as promised, that regular languages are closed under the concatenation operation: L 1.L 2 = {xy x L 1, y L 2 } If L 1, L 2 are any regular languages, choose ɛ-nfas N 1, N 2 that define them. As noted earlier, we can pick N 1 and N 2 to have just one start state and one accepting state. Now hook up N 1 and N 2 like this: N1 ε N2 Clearly, this NFA corresponds to the language L 1.L 2. 15 / 18

Closure under Kleene star Similarly, we can now show that regular languages are closed under the Kleene star operation: L = {ɛ} L L.L L.L.L... For suppose L is represented by an ɛ-nfa N with one start state and one accepting state. Consider the following ɛ-nfa: ε N Clearly, this ɛ-nfa corresponds to the language L. ε 16 / 18

Regular expressions We ve been looking at ways of specifying regular languages via machines (often presented as pictures). But it s very useful for applications to have more textual ways of defining languages. A regular expression is a written mathematical expression that defines a language over a given alphabet Σ. The basic regular expressions are ɛ a (for a Σ) From these, more complicated regular expressions can be built up by (repeatedly) applying the two binary operations +,. and the unary operation. Example: (a.b + ɛ) + a We use brackets to indicate precedence. In the absence of brackets, binds more tightly than., which itself binds more tightly than +. So a + b.a means a + (b.(a )) Also the dot is often omitted: ab means a.b 17 / 18

Reading Relevant reading: DFA minimization: Kozen Chapters 13 & 14. Regular expressions: Kozen chapters 7,8; J & M chapter 2.1. (Both texts actually discuss more general patterns see next lecture.) From regular expressions to NFAs: Kozen chapter 8; J & M chapter 2.3. Next two lectures: Some applications of all this theory. String and pattern matching Lexical analysis Model checking 18 / 18