Text Search and Closure Properties

Similar documents
Text Search and Closure Properties

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

CS 154, Lecture 3: DFA NFA, Regular Expressions

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Sri vidya college of engineering and technology

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Closure under the Regular Operations

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Equivalence of DFAs and NFAs

1 Alphabets and Languages

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

CMSC 330: Organization of Programming Languages

Intro to Theory of Computation

Lecture 7 Properties of regular languages

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

CS 121, Section 2. Week of September 16, 2013

Recap from Last Time

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

Author: Vivek Kulkarni ( )

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Concatenation. The concatenation of two languages L 1 and L 2

Nondeterministic Finite Automata and Regular Expressions

Closure Properties of Context-Free Languages. Foundations of Computer Science Theory

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

Formal Language and Automata Theory (CS21004)

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

Formal Languages, Automata and Models of Computation

CSC236 Week 11. Larry Zhang

COM364 Automata Theory Lecture Note 2 - Nondeterminism

Computability Theory

CS 455/555: Finite automata

How do regular expressions work? CMSC 330: Organization of Programming Languages

CS 455/555: Mathematical preliminaries

CPS 220 Theory of Computation REGULAR LANGUAGES

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

CISC 4090 Theory of Computation

Theory of Computation (I) Yijia Chen Fudan University

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

Computational Theory

Languages, regular languages, finite automata

Closure under the Regular Operations

acs-04: Regular Languages Regular Languages Andreas Karwath & Malte Helmert Informatik Theorie II (A) WS2009/10

Finite Automata Part Two

CSE 355 Homework One Sample Solutions

CSci 311, Models of Computation Chapter 4 Properties of Regular Languages

Nondeterministic Finite Automata

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

UNIT-III REGULAR LANGUAGES

CSE 105 THEORY OF COMPUTATION

More Properties of Regular Languages

3515ICT: Theory of Computation. Regular languages

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

1. Induction on Strings

2017/08/31 Chapter 1.2 & 1.3 in Sipser Ø Announcement:

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

1 More finite deterministic automata

Regular Expressions. Definitions Equivalence to Finite Automata

Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs

Harvard CS 121 and CSCI E-207 Lecture 4: NFAs vs. DFAs, Closure Properties

Automata: a short introduction

Computational Models - Lecture 4

The Pumping Lemma and Closure Properties

UNIT II REGULAR LANGUAGES

CS 154 Formal Languages and Computability Assignment #2 Solutions

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

False. They are the same language.

CSC236 Week 10. Larry Zhang

Nondeterministic Finite Automata. Nondeterminism Subset Construction

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS

Theory of Computation

Finite Automata and Languages

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

Formal Languages. We ll use the English language as a running example.

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

CS 154 Introduction to Automata and Complexity Theory

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Computer Sciences Department

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

CS5371 Theory of Computation. Lecture 5: Automata Theory III (Non-regular Language, Pumping Lemma, Regular Expression)

Nondeterminism. September 7, Nondeterminism

Exam 1 CSU 390 Theory of Computation Fall 2007

Closure Properties of Regular Languages

Regular languages, regular expressions, & finite automata (intro) CS 350 Fall 2018 gilray.org/classes/fall2018/cs350/

In English, there are at least three different types of entities: letters, words, sentences.

Properties of Regular Languages. BBM Automata Theory and Formal Languages 1

2.1 Solution. E T F a. E E + T T + T F + T a + T a + F a + a

Finite Automata and Formal Languages

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Computational Models Lecture 2 1

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

Constructions on Finite Automata

Finite Automata and Regular Languages

Computational Models Lecture 2 1

CMSC 330: Organization of Programming Languages

Theory of Computation (Classroom Practice Booklet Solutions)

Transcription:

Text Search and Closure Properties CSCI 330 Formal Languages and Automata Theory Siu On CHAN Fall 208 Chinese University of Hong Kong /28

Text Search

grep program grep -E regex file.txt Searches for an occurrence of patterns matching a regular expression regex language meaning cat 2 {cat, 2} union [abc] {a, b, c} shorthand for a b c [ab][2] {a, a2, b, b2} concatenation (ab) * {ε, ab, abab,... } star [ab]? {ε, a, b} zero or one (cat)+ {cat, catcat,... } one or more [ab]{2} {aa, ab, ba, bb} n copies 2/28

Searching with grep Words containing savor or savour cd /usr/share/dict/ grep -E 'savou?r' words savor savor's savored savorier savories savoriest savoring savors savory savory's unsavory 3/28

Searching with grep Words containing savor or savour cd /usr/share/dict/ grep -E 'savou?r' words Words with 5 consecutive a or b grep -E '[abab]{5}' words Babbage savor savor's savored savorier savories savoriest savoring savors savory savory's unsavory 3/28

More grep commands. any symbol [a-d] anything in a range ^ beginning of line $ end of line grep -E '^a.pl.$' words 4/28

How do you look for Words that start in go and have another go grep -E '^go.*go' words Words with at least ten vowels? grep -ie '([aeiouy].*){0}' words Words without any vowels? grep -ie '^[^aeiouy]*$' words [^R] means does not contain Words with exactly ten vowels? grep -ie '^[^aeiouy]*([aeiouy][^aeiouy]*){0}$' words 5/28

How grep (could) work regular expression NFA input DFA text file differences in class in grep [ab]?, a+, (cat){3} not allowed allowed input handling matches whole looks for substring output accept/reject finds substring Regular expression also supported in modern languages (C, Java, Python, etc) 6/28

Implementation of grep How do you handle expressions like [ab]? () [ab] zero or more R? ε R (cat)+ (cat)(cat)* one or more R+ RR a{3} aaa n copies R{n} RR }{{... R } n times [^aeiouy]? not containing 7/28

Closure properties

Example The language L of strings that end in 0 is regular (0 + ) 0 How about the language L of strings that do not end in 0? 8/28

Example The language L of strings that end in 0 is regular (0 + ) 0 How about the language L of strings that do not end in 0? Hint: a string does not end in 0 if and only if it ends in 000, 00, 00, 0, 00, 0 or or has length 0,, or 2 So L can be described by the regular expression (0+) (000+00+00+0+00+0+)+ε+(0+)+(0+)(0+) 8/28

Complement The complement L of a language L contains those strings that are not in L L = {w Σ w / L} Examples (Σ = {0, }) L = lang. of all strings that end in 0 L = lang. of all strings that do not end in 0 = lang. of all strings that end in 000,, (but not 0) or have length 0,, or 2 L 2 = lang. of = {ε,,,,... } L 2 = lang. of all strings that contain at least one 0 = lang. of the regular expression (0 + ) 0(0 + ) 9/28

Example The language L of strings that contain 0 is regular (0 + ) 0(0 + ) How about the language L of strings that do not contain 0? You can write a regular expression, but it is a lot of work! 0/28

Closure under complement If L is a regular language, so is L To argue this, we can use any of the equivalent definitions of regular languages regular expression NFA DFA The DFA definition will be the most convenient here We assume L has a DFA, and show L also has a DFA /28

Arguing closure under complement Suppose L is regular, then it has a DFA M accepts L Now consider the DFA M with the accepting and rejecting states of M reversed accepts strings not in L 2/28

Can we do the same with an NFA? 0, q 0 q 0 q 2 (0 + ) 0 0, q 0 q 0 q 2 3/28

Can we do the same with an NFA? 0, q 0 q 0 q 2 (0 + ) 0 0, q 0 q 0 q 2 (0 + ) Not the complement! 3/28

Intersection The intersection L L is the set of strings that are in both L and L Examples: L L L L (0 + ) L L L L (0 + ) 0 If L and L are regular, is L L also regular? 4/28

Closure under intersection If L and L are regular languages, so is L L To argue this, we can use any of the equivalent definitions of regular languages regular expression NFA DFA Suppose L and L have DFAs, call them M and M Goal: construct a DFA (or NFA) for L L 5/28

Example L (odd number of s) M 0 0 s 0 s L (even number of 0s) M r 0 0 0 r L L = lang. of even number of 0s and odd number of s 6/28

Example L (odd number of s) M 0 0 s 0 s L (even number of 0s) M 0 r 0 r 0 r 0, s 0 r 0, s 0 0 0 0 r, s 0 r, s L L = lang. of even number of 0s and odd number of s 6/28

Closure under intersection M and M DFA for L L states Q = {r,..., r s } Q = {s,..., s m } Q Q = {(r, s ), (r, s 2 ),..., (r 2, s ),..., (r n, s m )} start states r i for M (r i, s j ) s j for M accepting states F for M F F = F for M {(r i, s j ) r i F, s j F } Whenever M is in state r i and M is in state s j, the DFA for L L will be in state (r i, s j ) 7/28

Closure under intersection M and M DFA for L L transitions a r i r j r i, s k a r j, s l s k a s l 8/28

Reversal The reversal w R of a string w is w written backwards w = dog w R = god The reversal L R of a language L is the language obtained by reversing all its strings L = {dog, war, level} L R = {god, raw, level} 9/28

Reversal of regular languages L = language of all strings that end in 0 L is regular and has regex (0 + ) 0 How about L R? This is the language of all strings beginning in 0 It is regular and represented by 0(0 + ) 20/28

Closure under reversal If L is a regular language, so is L R How do we argue? regular expression NFA DFA 2/28

Arguing closure under reversal Take a regular expression E for L We will find a regular expression E R representing L R A regular expression can be of the following types: special symbols and ε alphabet symbols like a and b union, concatenation, or star of simpler expressions 22/28

Inductive proof of closure under reversal Regular expression E ε a reversal E R ε a E + E 2 E R + ER 2 E E 2 E R 2 ER E (E R ) 23/28

Duplication? L DUP = {ww w L} Example: L = {cat, dog} L DUP = {catcat, dogdog} If L is regular, is L DUP also regular? 24/28

Attempts Let s try regular expression L DUP? = L 2 25/28

Attempts Let s try regular expression L DUP? = L 2 L = {a, b} L DUP = {aa, bb} LL = {aa, ab, ba, bb} Let s try NFA q ε ε ε 0 NFA for L NFA for L q 25/28

An example L = language of 0 (L is regular) L = {, 0, 00, 000,... } L DUP = {, 00, 0000, 000000,... } = {0 n 0 n n 0} Let s design an NFA for L DUP 26/28

An example L DUP = {, 00, 0000, 000000,... } = {0 n 0 n n 0} 0 0 0 0 0 00 000 27/28

An example L DUP = {, 00, 0000, 000000,... } = {0 n 0 n n 0} 0 0 0 0 0 00 000 Seems to require infinitely many states! Next lecture: will show that languages like L DUP are not regular 27/28

Backreferences in grep Advanced feature in grep and other regular expression libraries grep -E '^(.*)\$' words the special expression \ refers to the substring specified by (.*) (.*)\ looks for a repeated substring, e.g. mama ^(.*)\$ accepts the language L DUP Standard regular expression libraries can accept irregular languages (as defined in this course)! 28/28