Regular Expressions and Language Properties

Similar documents
Regular Expressions. Definitions Equivalence to Finite Automata

Nondeterminism and Epsilon Transitions

The Pumping Lemma and Closure Properties

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Nondeterministic Finite Automata. Nondeterminism Subset Construction

Homomorphisms and Efficient State Minimization

Finite Automata. BİL405 - Automata Theory and Formal Languages 1

Chapter Five: Nondeterministic Finite Automata

Before we show how languages can be proven not regular, first, how would we show a language is regular?

Automata Theory, Computability and Complexity

CS 154, Lecture 3: DFA NFA, Regular Expressions

Finite Automata and Formal Languages

More on Regular Languages and Non-Regular Languages

More on Finite Automata and Regular Languages. (NTU EE) Regular Languages Fall / 41

Nondeterministic Finite Automata

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

Sri vidya college of engineering and technology

3515ICT: Theory of Computation. Regular languages

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

CS 455/555: Finite automata

Ogden s Lemma for CFLs

Theory of computation: initial remarks (Chapter 11)

Automata and Formal Languages - CM0081 Finite Automata and Regular Expressions

Theory of computation: initial remarks (Chapter 11)

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Regular Language Equivalence and DFA Minimization. Equivalence of Two Regular Languages DFA Minimization

Finite Automata and Formal Languages TMV026/DIT321 LP4 2012

Deterministic Finite Automaton (DFA)

CMPSCI 250: Introduction to Computation. Lecture #22: From λ-nfa s to NFA s to DFA s David Mix Barrington 22 April 2013

What we have done so far

Nondeterministic finite automata

Equivalence of CFG s and PDA s

Properties of Context-Free Languages. Closure Properties Decision Properties

Lecture 2: Regular Expression

CS21 Decidability and Tractability

Properties of Regular Languages. BBM Automata Theory and Formal Languages 1

Formal Languages, Automata and Models of Computation

Lecture 3: Nondeterministic Finite Automata

Properties of Regular Languages (2015/10/15)

COM364 Automata Theory Lecture Note 2 - Nondeterminism

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

Equivalence of Regular Expressions and FSMs

NPDA, CFG equivalence

Lecture 4 Nondeterministic Finite Accepters

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTATION

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Foundations of

Uses of finite automata

More Properties of Regular Languages

Kleene Algebras and Algebraic Path Problems

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

CMSC 330: Organization of Programming Languages

Computational Models: Class 3

Regular Expression Unit 1 chapter 3. Unit 1: Chapter 3

Nondeterministic Finite Automata and Regular Expressions

Chap. 1.2 NonDeterministic Finite Automata (NFA)

Let us first give some intuitive idea about a state of a system and state transitions before describing finite automata.

Nondeterministic Finite Automata

CPS 220 Theory of Computation REGULAR LANGUAGES

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

Finite Automata and Regular Languages

UNIT-III REGULAR LANGUAGES

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

CS243, Logic and Computation Nondeterministic finite automata

Computational Models - Lecture 3

Computational Models Lecture 2 1

Languages. Non deterministic finite automata with ε transitions. First there was the DFA. Finite Automata. Non-Deterministic Finite Automata (NFA)

Theory of Computation

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Regular expressions and Kleene s theorem

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

Decision, Computation and Language

MA/CSSE 474 Theory of Computation. Your Questions? Previous class days' material Reading Assignments

Introduction to the Theory of Computation. Automata 1VO + 1PS. Lecturer: Dr. Ana Sokolova.

Chapter 6. Properties of Regular Languages

CS 322 D: Formal languages and automata theory

Regular expressions and Kleene s theorem

Finite Automata and Regular languages

Computational Models Lecture 2 1

TDDD65 Introduction to the Theory of Computation

CMPSCI 250: Introduction to Computation. Lecture #29: Proving Regular Language Identities David Mix Barrington 6 April 2012

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

Undecidability COMS Ashley Montanaro 4 April Department of Computer Science, University of Bristol Bristol, UK

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Subset construction. We have defined for a DFA L(A) = {x Σ ˆδ(q 0, x) F } and for A NFA. For any NFA A we can build a DFA A D such that L(A) = L(A D )

Computational Theory

Languages, regular languages, finite automata

CSE 105 Theory of Computation Professor Jeanne Ferrante

CSci 311, Models of Computation Chapter 4 Properties of Regular Languages

CSE 105 THEORY OF COMPUTATION

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Formal Languages. We ll use the English language as a running example.

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

How do regular expressions work? CMSC 330: Organization of Programming Languages

Transcription:

Regular Expressions and Language Properties Mridul Aanjaneya Stanford University July 3, 2012 Mridul Aanjaneya Automata Theory 1/ 47

Tentative Schedule HW #1: Out (07/03), Due (07/11) HW #2: Out (07/10), Due (07/18) HW #3: Out (07/17), Due (07/25) Midterm: 07/31 HW #4: Out (07/31), Due (08/08) Tentative grades out by 08/12. Final:? Mridul Aanjaneya Automata Theory 2/ 47

Epsilon Transitions: Extended transition function Basis: δ E (q,ε) = CL(q). Induction: δ E (q,xa) is computed as follows: 1 Start with δ E (q,x) = S. 2 Take the union of CL(δ(p,a)) for all p in S. Intuition: δ E (q,w) is the set of states you can reach from q following a path labeled w with ε s in between. Mridul Aanjaneya Automata Theory 3/ 47

Equivalence of NFA, ε-nfa Compute δ N (q,a) as follows: Let S = CL(q). δ N (q,a) is the union over all p in S of δ E (p,a). F = set of states q such that CL(q) contains a state of F. Intuition: δ N incorporates ε-transitions before using a. Proof of equivalence is by induction on w that CL(δ N (q 0,w)) = δ E (q 0,w). Basis: CL(δ N (q 0,ε)) = CL(q 0 ) = δ E (q 0,ε). Inductive step: Assume IH is true for all x shorter than w. Let w = xa. Then CL(δ N (q 0,xa)) = CL(δ E (CL(δ N (q 0,x)),a)) (by definition). But from IH, CL(δ N (q 0,x)) = δ E (q 0,x). Hence, CL(δ N (q 0,w)) = CL(δ E (δ E (q 0,x),a)) = δ E (q 0,w). Mridul Aanjaneya Automata Theory 4/ 47

Example ε B 1 1 C D 1 A 0 ε ε 0 E 0 F B 1 1 C D 1 A 0 E 1 0 1 F 0 1 Mridul Aanjaneya Automata Theory 5/ 47

Recap DFA s, NFA s and ε-nfa s all accept exactly the same set of languages: the regular languages. NFA types are easier to design and may have exponentially fewer states than a DFA. But only a DFA can be implemented! Mridul Aanjaneya Automata Theory 6/ 47

Challenge Problem #1 Question Are such tilings always possible? Mridul Aanjaneya Automata Theory 7/ 47

Challenge Problem #2 Question How many regions can you cut? Mridul Aanjaneya Automata Theory 8/ 47

Regular Operators We define three regular operations on languages. Definition Let A and B be languages. We define the regular operations union, concatenation, and star as follows. Union: A B = {x x A or x B}. Concatenation: A B = {xy x A and y B}. Star: A = {x 1 x 2... x k k 0 and each x i A}. Kleene Closure Denoted as A and defined as the set of strings x 1 x 2... x n, for some n 0, where each x i is in A. Note: When n = 0, the string is ε. Mridul Aanjaneya Automata Theory 9/ 47

Example Let Σ = {a, b,..., z}. If A = {good,bad} and B = {boy,girl}, A B = {good,bad,boy,girl}, A B = {goodboy,goodgirl,badboy,badgirl}, A = {ε,good,bad,goodgood,goodbad,badgood,badbad,...}, Mridul Aanjaneya Automata Theory 10/ 47

Closure Properties: Union Theorem The class of regular languages is closed under the union operation, i.e., if A 1 and A 2 are regular languages, so is A 1 A 2. ε ε Mridul Aanjaneya Automata Theory 11/ 47

Closure Properties: Concatenation Theorem The class of regular languages is closed under concatenation. ε ε Mridul Aanjaneya Automata Theory 12/ 47

Closure Properties: Star Theorem The class of regular languages is closed under the star operation. ε ε ε ε Mridul Aanjaneya Automata Theory 13/ 47

Regular Expressions Regular expressions describe languages algebraically. They describe exactly the regular languages. If E is a regular expression, then L(E) is its language. We give a recursive definition of RE s and their languages. Mridul Aanjaneya Automata Theory 14/ 47

Regular Expressions: Definition 1 Basis: If a is any symbol, then a is a RE, and L(a)={a}. Note: {a} is the language containing one string, and that string is of length 1. 2 Basis: ε is a RE, and L(ε) = {ε}. 3 Basis: is a RE, and L( ) =. Mridul Aanjaneya Automata Theory 15/ 47

Regular Expressions: Definition 1 Induction: If E 1 and E 2 are regular expressions, then E 1 +E 2 is a regular expression, and L(E 1 +E 2 ) = L(E 1 ) L(E 2 ). 2 Induction: If E 1 and E 2 are regular expressions, then E 1 E 2 is a regular expression, and L(E 1 E 2 ) = L(E 1 )L(E 2 ). 3 Induction: If E is a regular expression, then E is a regular expression, and L(E ) = (L(E)). Mridul Aanjaneya Automata Theory 16/ 47

Precedence of operators Parentheses may be used wherever needed to influence the grouping of operators. Order of precedence is (highest), then concatenation, then + (lowest). Mridul Aanjaneya Automata Theory 17/ 47

Examples L(01) = {01}. L(01+0) = {01,0}. L(0(1+0)) = {01,00}. Note: order of precedence. L(0 ) = {ε,0,00,000,...} L((0+10) (ε+1)) = all strings over {0,1} without 11 s. Mridul Aanjaneya Automata Theory 18/ 47

Algebraic Laws for Regular Expressions Union and concatenation behave sort of like addition and multiplication. + is commutative and associative. concatenation is associative. concatenation distributes over +. Exception: concatenation is not commutative. Mridul Aanjaneya Automata Theory 19/ 47

Identities and Annihilators is the identity for +. R + = R. ε is the identity for concatenation. εr = Rε = R is the annihilator for concatenation. R = R =. Mridul Aanjaneya Automata Theory 20/ 47

Equivalence of Regular Expressions and Automata We need to show that for every regular expression, there is an automaton that accepts the same language. Pick the most powerful automaton type: ε-nfa. And we need to show that for every automaton, there is a regular expression defining its language. Pick the most restrictive type: the DFA. Mridul Aanjaneya Automata Theory 21/ 47

Converting a RE to an ε-nfa Proof is an induction on the number of operators (+, concatenation, ) in the regular expression. We always construct an automaton of a special form (next slide). Mridul Aanjaneya Automata Theory 22/ 47

RE to ε-nfa: Basis Symbol a: a ε: ε : Mridul Aanjaneya Automata Theory 23/ 47

RE to ε-nfa: Induction (Union) For E 1 E 2 ε E 1 ε ε E 2 ε Mridul Aanjaneya Automata Theory 24/ 47

RE to ε-nfa: Induction (Concatenation) For E 1 E 2 E ε 1 E 2 Mridul Aanjaneya Automata Theory 25/ 47

RE to ε-nfa: Induction (Star) For E ε ε E ε ε Mridul Aanjaneya Automata Theory 26/ 47

DFA to RE A strange sort of induction. States of the DFA are assumed to be 1, 2,..., n. We construct RE s for the labels of restricted sets of paths. Basis: single arcs or no arcs at all. Induction: paths that are allowed to traverse next state in order. Mridul Aanjaneya Automata Theory 27/ 47

k-paths A k-path is a path through the DFA that goes through no state numbered higher than k. End-points are not restricted, they can be any state. Mridul Aanjaneya Automata Theory 28/ 47

k-paths: Example 1 0 1 2 0 0 1 1 3 0-paths from 2 to 3: RE for labels = 0 1-paths from 2 to 3: RE for labels = 0+11 2-paths from 2 to 3: RE for labels = (10) 0+1(01) 1 3-paths from 2 to 3: RE for labels =?? Mridul Aanjaneya Automata Theory 29/ 47

k-paths: Induction Let R k ij be the RE for the set of labels of k-paths from state i to state j. Basis: k = 0. R 0 ij = sum of labels of arcs from i to j. is no such arc. But add ε if i=j. 1 0 1 2 0 0 1 1 3 Example: R12 0 = 0, R0 11 = + ε = ε. Mridul Aanjaneya Automata Theory 30/ 47

k-paths: Inductive Step A k-path from i to j either: 1 Never goes through state k, or 2 Goes through state k one or more times. Rij k = R k 1 ij + R k 1 ik (R k 1 kk ) R k 1 kj The equivalent RE is the sum (union) of Rij n, where: 1 n is the number of states, i.e., the paths are unconstrained. 2 i is the start state. 3 j is one of the final states. Mridul Aanjaneya Automata Theory 31/ 47

Summary Each of the three types of automata (DFA, NFA, ε-nfa) we discussed, and regular expressions as well, define exactly the same set of languages: the regular languages. Mridul Aanjaneya Automata Theory 32/ 47

Challenge Problem Question Can you find the shortest path from A to B? A B Mridul Aanjaneya Automata Theory 33/ 47

Properties of Language Classes A language class is a set of languages. We have seen one example: the regular languages. We ll see many more in the class. Language classes have two important kinds of properties: 1 Decision properties 2 Closure properties Mridul Aanjaneya Automata Theory 34/ 47

Representation of Languages Representations can be formal or informal. Example (formal): represent a language by a DFA or RE defining it. Example (informal): a logical or prose statement about its strings: {0 n 1 n n is a nonnegative integer.} The set of strings consisting of some number of 0 s followed by the same number of 1 s. Mridul Aanjaneya Automata Theory 35/ 47

Decision Properties A decision property for a class of languages is an algorithm that takes a formal description of a language (e.g., a DFA) and tells whether or not some property holds. Example: Is language L empty? Mridul Aanjaneya Automata Theory 36/ 47

Representation matters... You might imagine that the language is described informally, so if my description is the empty language then yes, otherwise no. But the representation is a DFA (or a RE that you will convert to a DFA). Can you tell if L(A) = for a DFA A? Mridul Aanjaneya Automata Theory 37/ 47

Why Decision Properties? Remember that DFA s can represent protocols, and good protocols are related to the language of the DFA. Example: Does the protocol terminate? = Is the language finite? Example: Can the protocol fail? = Is the language nonempty? We might want a smallest representation for a language, e.g., a minimum-state DFA or a shortest RE. If you can t decide Are these two languages the same?, i.e., do two DFA s define the same language - you can t find a smallest! Mridul Aanjaneya Automata Theory 38/ 47

Closure Properties A closure property of a language class says that given languages in the class, an operator (e.g., union) produces another language in the same class. Example: We saw that regular languages are closed under union, concatenation and Kleene closure (star) operations. Mridul Aanjaneya Automata Theory 39/ 47

Why Closure Properties? Helps construct representations. Helps show (informally described) languages not to be in the class. Mridul Aanjaneya Automata Theory 40/ 47

The Membership Question Our first decision property is the question: is the string w in regular language L? Assume L is represented by a DFA A. Simulate the action of A on the sequence of input symbols forming w. Question What if L is not represented by a DFA? Use the circle of conversions: RE ε-nfa NFA DFA RE Mridul Aanjaneya Automata Theory 41/ 47

The Emptiness Problem Question Does a regular language L contain any string at all? Assume representation is a DFA. Compute the set of states reachable from the start state. If any final state is reachable, then yes, else no. Mridul Aanjaneya Automata Theory 42/ 47

The Infiniteness Problem Question Is a given regular language L infinite? Start with a DFA for the language. Key idea: If the DFA has n states, and the language contains any string of length n or more, then the language is infinite. Otherwise, the language is surely finite. Limited to strings of length n or less. Mridul Aanjaneya Automata Theory 43/ 47

Proof of Key Idea If an n-state DFA accepts a string w of length n or more, then there must be a state that appears twice on the path labeled w from the start state to a final state. Note: Pigeonhole principle! Because there are at least n + 1 states along the path. y x z Since y is not ε, we see an infinite number of strings in L of the form xy i z for all i 0. Mridul Aanjaneya Automata Theory 44/ 47

The Infiniteness Problem We do not have an algorithm yet. There are an infinite number of strings of length > n, and we can t test them all! Second Key Idea: If there is a string of length n, then there is a string of length between n and 2n 1. Mridul Aanjaneya Automata Theory 45/ 47

Proof of Second Key Idea Remember: y x z We can choose y to be the first cycle on the path. So xy n; in particular, 1 y n. Thus, if w is of length 2n or more, there is a shorter string in L that is still of length at least n. Keep shortening to reach [n,2n 1]. Mridul Aanjaneya Automata Theory 46/ 47

Completion of Infiniteness Algorithm Test for membership all strings of length between [n,2n 1]. If any are accepted, then infinite, else finite. A terrible algorithm! Better: find cycles between the start state and a final state. For finding cycles: 1 Eliminate states not reachable from the start state. 2 Eliminate states that do not reach a final state. 3 Test if the remaining transition graph has any cycles. Mridul Aanjaneya Automata Theory 47/ 47