Chapter 4. Regular Expressions. 4.1 Some Definitions

Similar documents
CSCI 340: Computational Models. Regular Expressions. Department of Computer Science

In English, there are at least three different types of entities: letters, words, sentences.

Author: Vivek Kulkarni ( )

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

The Binomial Theorem.

The assignment is not difficult, but quite labour consuming. Do not wait until the very last day.

A canonical semi-deterministic transducer

CS Automata, Computability and Formal Languages

Recap from Last Time

HW 3 Solutions. Tommy November 27, 2012

CS 133 : Automata Theory and Computability

CS6902 Theory of Computation and Algorithms

Automata Theory CS F-08 Context-Free Grammars

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

Automata: a short introduction

download instant at Assume that (w R ) R = w for all strings w Σ of length n or less.

1 Alphabets and Languages

Stream Codes. 6.1 The guessing game

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

6.4 Binomial Coefficients

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

CMSC 330: Organization of Programming Languages

Deciding Representability of Sets of Words of Equal Length

Theory of Computation (Classroom Practice Booklet Solutions)

Theory of Computation

Theory of Computation

Theory of Computer Science

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

CS A Term 2009: Foundations of Computer Science. Homework 2. By Li Feng, Shweta Srivastava, and Carolina Ruiz.

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

How do regular expressions work? CMSC 330: Organization of Programming Languages

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Finite Automata and Regular Languages

Solutions to Problem Set 3

Summary of Last Lectures

SFWR ENG 2FA3. Solution to the Assignment #4

Homework 4. Chapter 7. CS A Term 2009: Foundations of Computer Science. By Li Feng, Shweta Srivastava, and Carolina Ruiz

{a, b, c} {a, b} {a, c} {b, c} {a}

The Pumping Lemma (cont.) 2IT70 Finite Automata and Process Theory

Computational Learning Theory Learning Patterns (Monomials)

CS 410/610, MTH 410/610 Theoretical Foundations of Computing

Section 1.3 Ordered Structures

Harvard CS121 and CSCI E-121 Lecture 2: Mathematical Preliminaries

Regular Expressions [1] Regular Expressions. Regular expressions can be seen as a system of notations for denoting ɛ-nfa

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

Linear Classifiers (Kernels)

FALL 07 CSE 213: HW4

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

CSEP 590 Data Compression Autumn Arithmetic Coding

The Probability of Winning a Series. Gregory Quenell

CMSC 330: Organization of Programming Languages

TAFL 1 (ECS-403) Unit- IV. 4.1 Push Down Automata. 4.2 The Formal Definition of Pushdown Automata. EXAMPLES for PDA. 4.3 The languages of PDA

Closure Properties of Regular Languages

Exam 1 CSU 390 Theory of Computation Fall 2007

6.8 The Post Correspondence Problem

MA/CSSE 474 Theory of Computation

Lecture 11 Context-Free Languages

HW6 Solutions. Micha l Dereziński. March 20, 2015

Computational Learning Theory Regular Expression vs. Monomilas

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

FLAC Context-Free Grammars

Semigroup presentations via boundaries in Cayley graphs 1

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

CS 154, Lecture 3: DFA NFA, Regular Expressions

Avoiding Large Squares in Partial Words

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

Automata Theory CS F-04 Non-Determinisitic Finite Automata

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

The Three Hour Tour Through Automata Theory

CSE 105 THEORY OF COMPUTATION

Computational Theory

Notes for Comp 497 (Comp 454) Week 5 2/22/05. Today we will look at some of the rest of the material in Part 1 of the book.

Lecture 1 09/08/2017

EXAMPLE CFG. L = {a 2n : n 1 } L = {a 2n : n 0 } S asa aa. L = {a n b : n 0 } L = {a n b : n 1 } S asb ab S 1S00 S 1S00 100

Theoretical Computer Science

Chapter-6 Regular Expressions

Automata, Computability and Complexity with Applications Exercises in the Book Solutions Elaine Rich

On Parsing Expression Grammars A recognition-based system for deterministic languages

Pushdown Automata. Reading: Chapter 6

MA/CSSE 474 Theory of Computation. Your Questions? Previous class days' material Reading Assignments

Chapter 1. Introduction

Unit-5: ω-regular properties

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

CS375 Midterm Exam Solution Set (Fall 2017)

arxiv: v1 [math.ra] 15 Jul 2013

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

Automata Theory and Formal Grammars: Lecture 1

Equivalent Variations of Turing Machines

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Nondeterministic Finite Automata and Regular Expressions

CS500 Homework #2 Solutions

Compiler Design. Spring Lexical Analysis. Sample Exercises and Solutions. Prof. Pedro Diniz

Lecture 4: More on Regexps, Non-Regular Languages

CSE 105 THEORY OF COMPUTATION

More Properties of Regular Languages

Transcription:

Chapter 4 Regular Expressions 4.1 Some Definitions Definition: If S and T are sets of strings of letters (whether they are finite or infinite sets), we define the product set of strings of letters to be ST = {w = w 1 w 2 : w 1 S, w 2 T } If S = {a, aa, aaa} and T = {b, bb}, then ST = {ab, abb, aab, aabb, aaab, aaabb} If S = {a, ab, aba} and T = {Λ, b, ba}, then ST = {a, ab, aba, abb, abba, abab, ababa} If S = {Λ, a, aa} and T = {Λ, bb, bbbb, bbbbbb,...}, then ST = {Λ, a, aa, bb, abb, aabb, bbbb, abbbb,...} Definition: Let s and t be strings. Then s is a substring of t if there exist strings u and v such that t = usv. 4-1

CHAPTER 4. REGULAR EXPRESSIONS 4-2 Suppose s = aba and t = aababb. Then s is a substring of t since we can define u = a and v = bb, and then t = usv. Suppose s = abb and t = aaabb. Then s is a substring of t since we can define u = aa and v = Λ, and then t = usv. Suppose s = bb and t = aababa. Then s is not a substring of t. Definition: Over the alphabet Σ = {a, b}, a string contains a double letter if it has either aa or bb as a substring. Over the alphabet Σ = {a, b}, 1. The string abaabab contains a double letter. 2. The string bb contains a double letter. 3. The string aba does not contain a double letter. 4. The string abbba contains two double letters. 4.2 Defining Languages Using Regular Expressions Previously, we defined the languages: L 1 = {x n for n = 1, 2, 3,...} L 2 = {x, xxx, xxxxx,...} But these are not very precise ways of defining languages. So we now want to be very precise about how we define languages, and we will do this using regular expressions.

CHAPTER 4. REGULAR EXPRESSIONS 4-3 Languages that are associated with these regular expressions are called regular languages and are also said to be defined by a finite representation. Regular expressions are written in bold face letters and are a way of specifying the language. Recall that we previously saw that for sets S, T, we defined the operations S + T = {w : w S or w T } ST = {w = w 1 w 2 : w 1 S, w 2 T } S = S 0 + S 1 + S 2 + S + = S 1 + S 2 + We will precisely define what a regular expression is later. But for now, let s work with the following sketchy description of a regular expression. Loosely speaking, a regular expression is a way of specifying a language in which the only operations allowed are union (+), concatenation (or product), Kleene- closure, superscript-+. The allowable symbols are parentheses, Λ, and, as well as each letter in Σ written in boldface. No other symbols are allowed in a regular expression. Also, a regular expression must only consist of a finite number of symbols. To introduce regular expressions, think of x = {x}; i.e., x represents the language (i.e., set) consisting of exactly one string, x. Also, think of a = {a}, b = {b}, so a is the language consisting of exactly one string a, and b is the language consisting of exactly one string b.

CHAPTER 4. REGULAR EXPRESSIONS 4-4 Using this interpretation, we can interpret ab to mean ab = {a}{b} = {ab} since the concatenation (or product) of the two languages {a} and {b} is the language {ab}. We can also interpret a + b to mean a + b = {a} + {b} = {a, b} We can also interpret a to mean a = {a} = {Λ, a, aa, aaa,...} We can also interpret a + to mean a + = {a} + = {a, aa, aaa,...} Also, we have (ab + a) b = ({a}{b} + {a}) {b} = {ab, a} {b} Previously, we saw language L 4 = {Λ, x, xx, xxx,...} = {x} = language(x ) Language L 1 = {x, xx, xxx, xxxx,...} = language(xx ) = language(x x) = language(x + ) = language(x xx x ) = language(x x + ) Note that there are several different regular expressions associated with L 1.

CHAPTER 4. REGULAR EXPRESSIONS 4-5 language L of all words of the form one a followed by some number (possibly zero) of b s. L = language(ab ) language L of all words of the form some positive number of a s followed by exactly one b. L = language(aa b) language L = language(ab a), which is the set of all strings of a s and b s that have at least two letters, that begin and end with one a, and that have nothing but b s inside (if anything at all). L = {aa, aba, abba, abbba,...} The language L consisting of all possible words over the alphabet Σ has the following regular expression: (a + b) Other regular expressions for L include (a b ) and (Λ + a + b). alphabet Σ = {x} language L with an even number (possibly zero) of x s L = {Λ, xx, xxxx, xxxxxx,...} = language((xx) ) alphabet Σ = {x} language L with a positive even number of x s L = { xx, xxxx, xxxxxx,...} = language(xx(xx) ) = language((xx) + )

CHAPTER 4. REGULAR EXPRESSIONS 4-6 alphabet Σ = {x} language L with an odd number of x s L = { x, xxx, xxxxx,...} = language(x(xx) ) = language((xx) x) Is L = language(x xx )? No, since it includes the word (xx)x(x). language L of all three-letter words starting with b L = {baa, bab, bba, bbb} = language(b(a + b)(a + b)) = language(baa + bab + bba + bbb) language L of all words starting with a and ending with b L = {ab, aab, abb, aaab, aabb, abab, abbb,...} = language(a(a + b) b) language L of all words starting and ending with b L = {b, bb, bab, bbb, baab, babb, bbab, bbbb,...} = language(b + b(a + b) b) language L of all words with exactly two b s L = language(a ba ba ) language L of all words with at least two b s L = language((a + b) b(a + b) b(a + b) )

CHAPTER 4. REGULAR EXPRESSIONS 4-7 Note that bbaaba L since bbaaba = (Λ)b(Λ)b(aaba) = (b)b(aa)b(a) language L of all words with at least two b s L = language(a ba b(a + b) ) Note that bbaaba L since bbaaba = Λ b Λ b aaba language L of all words with at least one a and at least one b L = language((a + b) a(a + b) b(a + b) + (a + b) b(a + b) a(a + b) ) where = language((a + b) a(a + b) b(a + b) + bb aa ) the first regular expression comes from separately considering the two cases: 1. requiring an a before a b, 2. requiring a b before an a. the second expression comes from the observation that the first term in the first expression only omits words that are of the form some b s followed by some a s. language L consists of Λ and all strings that are either all a s or b followed by a nonnegative number of a s L = language(a + ba ) = language((λ + b)a ) Theorem 5 If L is a finite language, then L can be defined by a regular expression.

CHAPTER 4. REGULAR EXPRESSIONS 4-8 Proof. To make a regular expression that defines the language L, turn all the words in L into boldface type and put pluses between them. language L = {aba, abba, bbaab} Then a regular expression to define L is aba + abba + bbaab 4.3 The Language EVEN-EVEN Consider the regular expression E = [aa + bb + (ab + ba)(aa + bb) (ab + ba)]. We now prove that the regular expression E generates the language EVEN- EVEN, which consists exactly of all strings that have an even number of a s and an even number of b s; i.e., EVEN-EVEN = {Λ, aa, bb, aabb, abab, abba, baab, baba, bbaa, aaaabb,...}. Proof. Let L 1 be the language generated by the regular expression E. Let L 2 be the language EVEN-EVEN. So we need to prove that L 1 = L 2, which we will do by showing that L 1 L 2 and L 2 L 1. First note that any word generated by E is made up of syllables of three types: type 1 = aa type 2 = bb type 3 = (ab + ba)(aa + bb) (ab + ba) E = [type 1 + type 2 + type 3 ]

CHAPTER 4. REGULAR EXPRESSIONS 4-9 We first show that L 1 L 2 : Consider any string w L 1 ; i.e., w can be generated by the regular expression E. We need to show that w L 2. Note that since w can be generated by the regular expression E, the string w must be made up of syllables of type 1, 2, or 3. Each of these types of syllables generate an even number of a s and an even number of b s. type 1 syllable generates 2 a s and 0 b s. type 2 syllable generates 0 a s and 2 b s. type 3 syllable (ab + ba)(aa + bb) (ab + ba) generates exactly 1 a and 1 b at the beginning, exactly 1 a and 1 b at the end, and generates either 2 a s or 2 b s at a time in the middle. Thus, the type 3 syllable generates an even number of a s and an even number of b s. Thus, the total string must have an even number of a s and an even number of b s. Therefore, w EVEN-EVEN, so we can conclude that L 1 L 2. Now we want to show that L 2 L 1 ; i.e., we want to show that any word with an even number of a s and an even number of b s can be generated by E. Consider any string w = w 1 w 2 w 3 w n with an even number of a s and an even number of b s. If w = Λ, then iterate the outer star of the regular expression E zero times to generate Λ. Now assume that w Λ. Let n = length(w). Note that n is even since w consists solely of a s and b s and since the number of a s is even and the number of b s is even. Thus, we can read in the string w two letters at a time from left to right.

CHAPTER 4. REGULAR EXPRESSIONS 4-10 Use the following algorithm to generate w = w 1 w 2 w 3 w n using the regular expression E: 1. Let i = 1. 2. Do the following while i n: (a) If w i = a and w i+1 = a, then iterate the outer star of E and use the type 1 syllable aa. (b) If w i = b and w i+1 = b, then iterate the outer star of E and use the type 2 syllable bb. (c) If (w i = a and w i+1 = b) or if (w i = b and w i+1 = a), then choose the type 3 syllable (ab + ba)(aa + bb) (ab + ba), and do the following: If (w i = a and w i+1 = b), then choose ab in the first part of the type 3 syllable. If (w i = b and w i+1 = a), then choose ba in the first part of the type 3 syllable. Do the following while either (w i+2 = a and w i+3 = a) or (w i+2 = b and w i+3 = b): Let i = i + 2. If w i = a and w i+1 = a, then iterate the inner star of the type 3 syllable, and use aa. If w i = b and w i+1 = b, then iterate the inner star of the type 3 syllable, and use bb. Let i = i + 2. If (w i = a and w i+1 = b), then choose ab in the last part of the type 3 syllable. If (w i = b and w i+1 = a), then choose ba in the last part of the type 3 syllable. Remarks: We must eventually read in either ab or ba, which balances out the previous unbalanced pair. This completes a syllable of type 3. If we never read in the second unbalanced pair, then either the number of a s is odd or the number of b s is odd, which is a contradiction. (d) Let i = i + 2.

CHAPTER 4. REGULAR EXPRESSIONS 4-11 This algorithm shows how to use the regular expression E to generate any string in EVEN-EVEN; i.e., if w EVEN-EVEN, then we can use the above algorithm to generate w using E. Thus, L 2 L 1. 4.4 More Examples and Definitions double a. b (abb ) (Λ + a) generates the language of all words without a What is a regular expression for all valid variable names in C? Definition: The set of regular expressions is defined by the following: Rule 1 Every letter of Σ can be made into a regular expression by writing it in boldface; Λ and are regular expressions. Rule 2 If r 1 and r 2 are regular expressions, then so are 1. (r 1 ) 2. r 1 r 2 3. r 1 + r 2 4. r 1 and r + 1 Rule 3 Nothing else is a regular expression. Definition: For a regular expression r, let L(r) denote the language generated by (or associated with) r; i.e., L(r) is the set of strings that can be generated by r. Definition: The following rules define the language associated with (or generated by) any regular expression: Rule 1 (i) If l Σ, then L(l) = {l}; i.e., the language associated with the regular expression that is just a single letter is that one-letter word alone.

CHAPTER 4. REGULAR EXPRESSIONS 4-12 (ii) L( Λ) = {Λ}; i.e., the language associated with Λ is {Λ}, a one-word language. (iii) L( ) = ; i.e., the language associated with is, the language with no words. Rule 2 If r 1 is a regular expression associated with the language L 1 and r 2 is a regular expression associated with the language L 2, then (i) The regular expression (r 1 )(r 2 ) is associated with the language L 1 concatenated with L 2 : We define L 1 = L 1 =. language(r 1 r 2 ) = L 1 L 2. (ii) The regular expression r 1 +r 2 is associated with the language formed by the union of the sets L 1 and L 2 : language(r 1 + r 2 ) = L 1 + L 2 (iii) The language associated with the regular expression (r 1 ) is L 1, the Kleene closure of the set L 1 as a set of words: language(r 1) = L 1 (iv) The language associated with the regular expression (r 1 ) + is L + 1 : language(r + 1 ) = L + 1