Text Search and Closure Properties

Similar documents
Text Search and Closure Properties

CS 154, Lecture 3: DFA NFA, Regular Expressions

CS 154. Finite Automata, Nondeterminism, Regular Expressions

What Is a Language? Grammars, Languages, and Machines. Strings: the Building Blocks of Languages

CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

Sri vidya college of engineering and technology

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Closure under the Regular Operations

Equivalence of DFAs and NFAs

1 Alphabets and Languages

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

CMSC 330: Organization of Programming Languages

Intro to Theory of Computation

Concatenation. The concatenation of two languages L 1 and L 2

Recap from Last Time

Lecture 7 Properties of regular languages

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

CS 121, Section 2. Week of September 16, 2013

TAFL 1 (ECS-403) Unit- II. 2.1 Regular Expression: The Operators of Regular Expressions: Building Regular Expressions

Author: Vivek Kulkarni ( )

FABER Formal Languages, Automata. Lecture 2. Mälardalen University

Closure Properties of Context-Free Languages. Foundations of Computer Science Theory

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Nondeterministic Finite Automata and Regular Expressions

Formal Language and Automata Theory (CS21004)

Formal Languages, Automata and Models of Computation

Automata Theory. Lecture on Discussion Course of CS120. Runzhe SJTU ACM CLASS

COM364 Automata Theory Lecture Note 2 - Nondeterminism

CS 455/555: Finite automata

CSC236 Week 11. Larry Zhang

How do regular expressions work? CMSC 330: Organization of Programming Languages

CSE 105 Homework 1 Due: Monday October 9, Instructions. should be on each page of the submission.

CS 455/555: Mathematical preliminaries

Computability Theory

CPS 220 Theory of Computation REGULAR LANGUAGES

Theory of Computation (I) Yijia Chen Fudan University

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

CMSC 330: Organization of Programming Languages. Regular Expressions and Finite Automata

Regular Languages. Problem Characterize those Languages recognized by Finite Automata.

CMSC 330: Organization of Programming Languages

Languages, regular languages, finite automata

Computational Theory

CISC 4090 Theory of Computation

Closure under the Regular Operations

acs-04: Regular Languages Regular Languages Andreas Karwath & Malte Helmert Informatik Theorie II (A) WS2009/10

Finite Automata Part Two

Nondeterministic Finite Automata

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

CSci 311, Models of Computation Chapter 4 Properties of Regular Languages

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

UNIT-III REGULAR LANGUAGES

More Properties of Regular Languages

Lecture 4: More on Regexps, Non-Regular Languages

CSE 355 Homework One Sample Solutions

2017/08/31 Chapter 1.2 & 1.3 in Sipser Ø Announcement:

3515ICT: Theory of Computation. Regular languages

CSE 105 THEORY OF COMPUTATION

Regular Expressions. Definitions Equivalence to Finite Automata

1. Induction on Strings

1 More finite deterministic automata

PS2 - Comments. University of Virginia - cs3102: Theory of Computation Spring 2010

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs

Harvard CS 121 and CSCI E-207 Lecture 4: NFAs vs. DFAs, Closure Properties

Computational Models - Lecture 4

Automata: a short introduction

The Pumping Lemma and Closure Properties

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

Finite Automata and Regular Languages

UNIT II REGULAR LANGUAGES

Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions

False. They are the same language.

CS 154 Formal Languages and Computability Assignment #2 Solutions

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Nondeterministic Finite Automata. Nondeterminism Subset Construction

Theory of Computation

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS

TDDD65 Introduction to the Theory of Computation

CSC236 Week 10. Larry Zhang

Formal Languages. We ll use the English language as a running example.

CS 154 Introduction to Automata and Complexity Theory

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Finite Automata and Languages

CS311 Computational Structures Regular Languages and Regular Expressions. Lecture 4. Andrew P. Black Andrew Tolmach

Computer Sciences Department

CS5371 Theory of Computation. Lecture 5: Automata Theory III (Non-regular Language, Pumping Lemma, Regular Expression)

Closure Properties of Regular Languages

Properties of Regular Languages. BBM Automata Theory and Formal Languages 1

Nondeterminism. September 7, Nondeterminism

Exam 1 CSU 390 Theory of Computation Fall 2007

2.1 Solution. E T F a. E E + T T + T F + T a + T a + F a + a

In English, there are at least three different types of entities: letters, words, sentences.

Context Free Grammars

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

Computational Models Lecture 2 1

Transcription:

Text Search and Closure Properties CSCI 3130 Formal Languages and Automata Theory Siu On CHAN Chinese University of Hong Kong Fall 2017 1/30

Text Search 2/30

grep program grep -E regexp file.txt Searches for an occurrence of patterns matching a regular expression cat 12 {cat, 12} union [abc] {a, b, c} shorthand for a b c [ab][12] {a1, a2, b1, b2} concatenation (ab) * {ε, ab, abab,... } star [ab]? {ε, a, b} zero or one (cat)+ {cat, catcat,... } one or more [ab]{2} {aa, ab, ba, bb} n copies 3/30

Searching with grep Words containing savor or savour cd /usr/share/dict/ grep -E 'savou?r' words savor savor's savored savorier savories savoriest savoring savors savory savory's unsavory 4/30

Searching with grep Words containing savor or savour cd /usr/share/dict/ grep -E 'savou?r' words savor savor's savored savorier savories savoriest savoring savors savory savory's unsavory Words with 5 consecutive a or b grep -E '[abab]{5}' words Babbage 4/30

More grep commands. any symbol [a-d] anything in a range ^ beginning of line $ end of line grep -E '^a.pl.$' words 5/30

How do you look for Words that start in go and have another go grep -E '^go.*go' words Words with at least ten vowels? grep -ie '([aeiouy].*){10}' words Words without any vowels? grep -ie '^[^aeiouy]*$' words [^R] means does not contain Words with exactly ten vowels? grep -ie '^[^aeiouy]*([aeiouy][^aeiouy]*){10}$' words 6/30

How grep (could) work regular expression NFA input DFA text file differences in class in grep [ab]?, a+, (cat){3} not allowed allowed input handling matches whole looks for pattern output accept/reject finds pattern Regular expression also supported in modern languages (C, Java, Python, etc) 7/30

Implementation of grep How do you handle expressions like [ab]? () [ab] zero or more R? ε R (cat)+ (cat)(cat)* one or more R+ RR a{3} aaa n copies R{n} RR }{{... R } n times [^aeiouy]? not containing 8/30

Closure properties 9/30

Example The language L of strings that end in 101 is regular (0 + 1) 101 How about the language L of strings that do not end in 101? 10/30

Example The language L of strings that end in 101 is regular (0 + 1) 101 How about the language L of strings that do not end in 101? Hint: a string does not end in 101 if and only if it ends in 000, 001, 010, 011, 100, 110 or 111 or has length 0, 1, or 2 So L can be described by the regular expression (0+1) (000+001+010+011+100+110+111)+ε+(0+1)+(0+1)(0+1) 10/30

Complement The complement L of a language L contains those strings that are not in L L = {w Σ w / L} Examples (Σ = {0, 1}) L 1 = lang. of all strings that end in 101 L 1 = lang. of all strings that do not end in 101 = lang. of all strings that end in 000,, 111 (but not 101) or have length 0, 1, or 2 L 2 = lang. of 1 = {ε, 1, 11, 111,... } L 2 = lang. of all strings that contain at least one 0 = lang. of the regular expression (0 + 1) 0(0 + 1) 11/30

Example The language L of strings that contain 101 is regular (0 + 1) 101(0 + 1) How about the language L of strings that do not contain 101? You can write a regular expression, but it is a lot of work! 12/30

Closure under complement If L is a regular language, so is L To argue this, we can use any of the equivalent definitions of regular languages regular expression NFA DFA The DFA definition will be the most convenient here We assume L has a DFA, and show L also has a DFA 13/30

Arguing closure under complement Suppose L is regular, then it has a DFA M accepts L Now consider the DFA M with the accepting and rejecting states of M reversed accepts strings not in L 14/30

Can we do the same with an NFA? 0, 1 q 1 0 0 q 1 q 2 (0 + 1) 10 0, 1 q 1 0 0 q 1 q 2 15/30

Can we do the same with an NFA? 0, 1 q 1 0 0 q 1 q 2 (0 + 1) 10 0, 1 q 1 0 0 q 1 q 2 (0 + 1) Not the complement! 15/30

Intersection The intersection L L is the set of strings that are in both L and L Examples: L L L L (0 + 1) 11 1 1 11 L L L L (0 + 1) 10 1 If L and L are regular, is L L also regular? 16/30

Closure under intersection If L and L are regular languages, so is L L To argue this, we can use any of the equivalent definitions of regular languages regular expression NFA DFA Suppose L and L have DFAs, call them M and M Goal: construct a DFA (or NFA) for L L 17/30

Example L (odd number of 1s) M 0 0 s 0 1 s 1 1 L (even number of 0s) M r 0 1 0 0 r 1 1 L L = lang. of even number of 0s and odd number of 1s 18/30

Example L (odd number of 1s) M 0 0 s 0 1 s 1 1 L (even number of 0s) M 0 r 0 1 0 r 1 1 r 0, s 0 1 r 0, s 1 1 0 0 0 0 r 1, s 0 1 r 1, s 1 1 L L = lang. of even number of 0s and odd number of 1s 18/30

Closure under intersection M and M DFA for L L states Q = {r 1,..., r s } Q = {s 1,..., s m } Q Q = {(r 1, s 1 ), (r 1, s 2 ),..., (r 2, s 1 ),..., (r n, s m )} start states r i for M (r i, s j ) s j for M accepting states F for M F F = F for M {(r i, s j ) r i F, s j F } Whenever M is in state r i and M is in state s j, the DFA for L L will be in state (r i, s j ) 19/30

Closure under intersection M and M DFA for L L transitions a r i r j r i, s k a r j, s l s k a s l 20/30

Reversal The reversal w R of a string w is w written backwards w = dog w R = god The reversal L R of a language L is the language obtained by reversing all its strings L = {dog, war, level} L R = {god, raw, level} 21/30

Reversal of regular languages L = language of all strings that end in 01 L is regular and has regex (0 + 1) 01 How about L R? This is the language of all strings beginning in 10 It is regular and represented by 10(0 + 1) 22/30

Closure under reversal If L is a regular language, so is L R regular expression How do we argue? NFA DFA 23/30

Arguing closure under reversal Take a regular expression E for L We will show how to reverse E A regular expression can be of the following types: special symbols and ε alphabet symbols like a and b union, concatenation, or star of simpler expressions 24/30

Inductive proof of closure under reversal Regular expression E ε a reversal E R ε a E 1 + E 2 E R 1 + ER 2 E 1 E 2 E R 2 ER 1 E 1 (E R 1 ) 25/30

Duplication? L DUP = {ww w L} Example: L = {cat, dog} L DUP = {catcat, dogdog} If L is regular, is L DUP also regular? 26/30

Attempts Let s try regular expression L DUP? = L 2 27/30

Attempts Let s try regular expression L DUP? = L 2 L = {a, b} L DUP = {aa, bb} LL = {aa, ab, ba, bb} 27/30

Attempts Let s try regular expression L DUP? = L 2 L = {a, b} L DUP = {aa, bb} LL = {aa, ab, ba, bb} Let s try NFA q ε ε ε 0 NFA for L NFA for L q 1 27/30

Attempts Let s try regular expression L DUP? = L 2 L = {a, b} L DUP = {aa, bb} LL = {aa, ab, ba, bb} Let s try NFA q ε ε ε 0 NFA for L NFA for L q 1 27/30

An example L = language of 0 1 (L is regular) L = {1, 01, 001, 0001,... } L DUP = {11, 0101, 001001, 00010001,... } = {0 n 10 n 1 n 0} Let s design an NFA for L DUP 28/30

An example L DUP = {11, 0101, 001001, 00010001,... } = {0 n 10 n 1 n 0} 0 0 0 0 1 1 1 1 1 01 001 0001 29/30

An example L DUP = {11, 0101, 001001, 00010001,... } = {0 n 10 n 1 n 0} 0 0 0 0 1 1 1 1 1 01 001 0001 Seems to require infinitely many states! Next lecture: will show that languages like L DUP are not regular 29/30

Backreferences in grep Advanced feature in grep and other regular expression libraries grep -E '^(.*)\1$' words the special expression \1 refers to the substring specified by (.*) (.*)\1 looks for a repeated substring, e.g. mama ^(.*)\1$ accepts the language L DUP Standard regular expression libraries can accept irregular languages (as defined in this course)! 30/30