CS 154. Finite Automata vs Regular Expressions, Non-Regular Languages


Deterministic Finite Automata: computation with finite memory

Non-Deterministic Finite Automata: computation with finite memory and guessing

Regular languages are closed under all of the following operations:
Union: A ∪ B = { w | w ∈ A or w ∈ B }
Intersection: A ∩ B = { w | w ∈ A and w ∈ B }
Complement: ¬A = { w ∈ Σ* | w ∉ A }
Reverse: A^R = { w_1 ⋯ w_k | w_k ⋯ w_1 ∈ A }
Concatenation: A · B = { vw | v ∈ A and w ∈ B }
Star: A* = { w_1 ⋯ w_k | k ≥ 0 and each w_i ∈ A }
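Two of these closure properties can be made concrete on DFAs. The sketch below assumes a DFA is encoded as a tuple (states, alphabet, delta, start, accepting); that encoding, the helper names, and the two example machines are illustrative assumptions, and the product construction used for intersection is the standard textbook argument rather than something spelled out on this slide.

```python
from itertools import product

# Assumed encoding: (states, alphabet, delta, start, accepting),
# where delta maps (state, symbol) -> state.

def accepts(dfa, w):
    """Run the DFA on w and report whether it ends in an accepting state."""
    states, alphabet, delta, start, accepting = dfa
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepting

def complement(dfa):
    """Complement: same machine, accepting and non-accepting states swapped."""
    states, alphabet, delta, start, accepting = dfa
    return (states, alphabet, delta, start, states - accepting)

def intersection(d1, d2):
    """Product construction: run both machines in parallel on pairs of states."""
    s1, alphabet, t1, q1, f1 = d1
    s2, _, t2, q2, f2 = d2
    states = set(product(s1, s2))
    delta = {((p, q), c): (t1[(p, c)], t2[(q, c)])
             for (p, q) in states for c in alphabet}
    accepting = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return (states, alphabet, delta, (q1, q2), accepting)

# Hypothetical example machines over {0, 1}.
EVEN_ONES = ({"e", "o"}, {"0", "1"},
             {("e", "0"): "e", ("e", "1"): "o",
              ("o", "0"): "o", ("o", "1"): "e"},
             "e", {"e"})
ENDS_IN_0 = ({"n", "y"}, {"0", "1"},
             {("n", "0"): "y", ("n", "1"): "n",
              ("y", "0"): "y", ("y", "1"): "n"},
             "n", {"y"})
BOTH = intersection(EVEN_ONES, ENDS_IN_0)
```

Here `accepts(BOTH, w)` holds exactly when w has an even number of 1s and ends in 0.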

Regular Expressions: computation as simple, logical description
A totally different way of thinking about computation: What is the complexity of describing the strings in the language?

Inductive Definition of Regexp
Let Σ be an alphabet. We define the regular expressions over Σ inductively:
For all σ ∈ Σ, σ is a regexp
ε is a regexp
∅ is a regexp
If R_1 and R_2 are both regexps, then (R_1 R_2), (R_1 + R_2), and (R_1)* are regexps

Precedence Order: * then · (concatenation) then +
Example: R_1*R_2 + R_3 = ((R_1*) · R_2) + R_3

Definition: Regexps Represent Languages
The regexp σ ∈ Σ represents the language {σ}
The regexp ε represents {ε}
The regexp ∅ represents ∅
If R_1 and R_2 are regular expressions representing L_1 and L_2 then:
(R_1 R_2) represents L_1 · L_2
(R_1 + R_2) represents L_1 ∪ L_2
(R_1)* represents L_1*
Example: (10 + 0*1) represents {0^k 1 | k ≥ 0} ∪ {10}
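The inductive definition can be read directly as a recursive procedure. The following is a minimal sketch that computes the strings of length at most n in L(R); the nested-tuple encoding of regexps and the function name `lang` are assumptions made for illustration, not notation from the slides.

```python
def lang(r, n):
    """Return the set of strings of length <= n in L(r).

    r is a nested tuple (hypothetical encoding):
      ("sym", c), ("eps",), ("empty",),
      ("cat", r1, r2), ("alt", r1, r2), ("star", r1)
    """
    kind = r[0]
    if kind == "sym":                      # sigma represents {sigma}
        return {r[1]} if n >= 1 else set()
    if kind == "eps":                      # epsilon represents {epsilon}
        return {""}
    if kind == "empty":                    # the empty regexp represents no strings
        return set()
    if kind == "alt":                      # (R1 + R2) represents L1 union L2
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                      # (R1 R2) represents L1 . L2
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n)
                if len(u + v) <= n}
    if kind == "star":                     # (R1)* represents L1*
        base = lang(r[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:                    # keep appending one more piece
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(kind)

# (10 + 0*1) in the assumed encoding
ZERO, ONE = ("sym", "0"), ("sym", "1")
EXAMPLE = ("alt", ("cat", ONE, ZERO), ("cat", ("star", ZERO), ONE))
```

Bounding the length keeps every set finite, so the star case terminates even though L_1* is usually infinite.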

Regexps Represent Languages
For every regexp R, define L(R) to be the language that R represents
A string w ∈ Σ* is accepted by R (or, w matches R) if w ∈ L(R)
Example: 01010 matches the regexp (01)*0

Assume Σ = {0,1}
{ w | w has exactly a single 1 }: 0*10*
{ w | w contains 001 }: (0+1)*001(0+1)*

Assume Σ = {0,1}
What language does the regexp ∅* represent? {ε}

Assume Σ = {0,1}
{ w | w has length ≥ 3 and its 3rd symbol is 0 }: (0+1)(0+1)0(0+1)*

Assume Σ = {0,1}
{ w | every odd position in w is a 1 }: (1(0 + 1))*(1 + ε)
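The example regexps above can be checked by brute force against their English descriptions. This sketch uses Python's `re` module, where union is written `|` instead of `+` and ε is written as an empty alternative; the helper names and the length bound are assumptions chosen for illustration.

```python
import re
from itertools import product

def all_strings(max_len, alphabet="01"):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

# (Python pattern, predicate describing the intended language)
CASES = [
    ("0*10*", lambda w: w.count("1") == 1),
    ("(0|1)*001(0|1)*", lambda w: "001" in w),
    ("(0|1)(0|1)0(0|1)*", lambda w: len(w) >= 3 and w[2] == "0"),
    # odd positions are 1-indexed, i.e. Python indices 0, 2, 4, ...
    ("(1(0|1))*(1|)", lambda w: all(c == "1" for c in w[0::2])),
]

def mismatches(max_len=8):
    """Return every (pattern, string) pair where regexp and description disagree."""
    bad = []
    for pattern, predicate in CASES:
        for w in all_strings(max_len):
            if bool(re.fullmatch(pattern, w)) != predicate(w):
                bad.append((pattern, w))
    return bad
```

An empty result from `mismatches()` is evidence up to the chosen length only, not a proof.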

DFAs ≡ NFAs ≡ Regular Expressions!
L can be represented by some regexp ⟺ L is regular

L can be represented by some regexp ⟹ L is regular

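Given any regexp R, we will construct an NFA N s.t. N accepts exactly the strings accepted by R
Proof by induction on the length of the regexp R:
Base Cases (R has length 1):
R = σ [NFA: a start state with a σ-arrow to an accept state]
R = ε [NFA: a single state that is both start and accept, with no arrows]
R = ∅ [NFA: a single start state and no accept state]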

Induction Step: Suppose every regexp of length < k represents some regular language.
Consider a regexp R of length k > 1. Three possibilities for R:
R = R_1 + R_2
R = R_1 R_2
R = (R_1)*
By induction, R_1 and R_2 represent some regular languages, L_1 and L_2.
If R = R_1 + R_2: L(R) = L(R_1 + R_2) = L_1 ∪ L_2, so L(R) is regular, by the union theorem!
If R = R_1 R_2: L(R) = L(R_1 R_2) = L_1 · L_2, so L(R) is regular, by the concatenation theorem.
If R = (R_1)*: L(R) = L(R_1*) = L_1*, so L(R) is regular, by the star theorem.
Therefore: If L is represented by a regexp, then L is regular
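The induction can be turned into a recursive NFA builder. The sketch below is one way to do it, using ε-arrows to glue sub-machines together (a Thompson-style construction, which may differ in detail from the union, concatenation, and star constructions used in lecture). The tuple encoding of regexps, the edge-list encoding of NFAs, and all function names are assumptions.

```python
from itertools import count

_fresh = count()  # source of fresh state names

def build(r):
    """Return (start, accept, edges) for an NFA recognizing L(r).

    r is a nested tuple: ("sym", c), ("eps",), ("empty",),
    ("cat", r1, r2), ("alt", r1, r2), ("star", r1).
    edges is a list of (state, label, state); label None is an epsilon move.
    """
    s, t = next(_fresh), next(_fresh)
    kind = r[0]
    if kind == "sym":
        return s, t, [(s, r[1], t)]
    if kind == "eps":
        return s, t, [(s, None, t)]
    if kind == "empty":
        return s, t, []                       # accept state is unreachable
    if kind in ("alt", "cat"):
        s1, t1, e1 = build(r[1])
        s2, t2, e2 = build(r[2])
        if kind == "alt":                     # branch into either sub-machine
            glue = [(s, None, s1), (s, None, s2), (t1, None, t), (t2, None, t)]
        else:                                 # run the first, then the second
            glue = [(s, None, s1), (t1, None, s2), (t2, None, t)]
        return s, t, e1 + e2 + glue
    if kind == "star":                        # skip, or loop any number of times
        s1, t1, e1 = build(r[1])
        return s, t, e1 + [(s, None, s1), (t1, None, s1),
                           (s, None, t), (t1, None, t)]
    raise ValueError(kind)

def eps_closure(states, edges):
    """All states reachable from `states` using only epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for a, label, b in edges:
            if a == p and label is None and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def nfa_accepts(r, w):
    """Build the NFA for r and simulate it on w by tracking the set of live states."""
    start, accept, edges = build(r)
    current = eps_closure({start}, edges)
    for c in w:
        moved = {b for a, label, b in edges if a in current and label == c}
        current = eps_closure(moved, edges)
    return accept in current

# (1(0 + 1))* in the assumed encoding
R_EXAMPLE = ("star", ("cat", ("sym", "1"), ("alt", ("sym", "0"), ("sym", "1"))))
```

For instance, `nfa_accepts(R_EXAMPLE, "1011")` is true, while `nfa_accepts(R_EXAMPLE, "1")` is false.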

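Give an NFA that accepts the language represented by (1(0 + 1))*
[Figure: two-state NFA. The start state is also the accept state; an arrow labeled 1 leads to a second state, and an arrow labeled 0,1 leads back.]
Regular expression: (1(0+1))*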

Generalized NFAs (GNFA)
L can be represented by a regexp ⟸ L is a regular language
Idea: Transform an NFA for L into a regular expression by removing states and re-labeling the arcs with regular expressions
Rather than reading in just 0 or 1 letters from the string on a step, we can read in entire substrings

A GNFA is a 5-tuple G = (Q, Σ, R, q_start, q_accept)
Q, Σ are states and alphabet
R : (Q − {q_accept}) × (Q − {q_start}) → ℛ is the transition function
q_start ∈ Q is the start state
q_accept ∈ Q is the (unique) accept state
ℛ = set of all regular expressions over Σ

Let w ∈ Σ* and let G = (Q, Σ, R, q_start, q_accept) be a GNFA.
G accepts w if w can be written as w = w_1 ⋯ w_k where w_i ∈ Σ* and there is a sequence r_0, r_1, ..., r_k ∈ Q such that
r_0 = q_start,
w_i matches R(r_{i-1}, r_i) for all i = 1, ..., k, and
r_k = q_accept
L(G) = set of all strings that G accepts = the language recognized by G

Generalized NFA (GNFA)
[Figure: q_0 → q_1 labeled a*b, a self-loop on q_1 labeled cb, q_1 → q_2 labeled a]
This GNFA recognizes L(a*b(cb)*a)
Is aaabcbcba accepted or rejected?
Is bba accepted or rejected?
Is bcba accepted or rejected?
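Since this GNFA recognizes L(a*b(cb)*a), the three questions can be answered by matching against that regexp. A small sketch using Python's `re` module follows; the function name `accepted` is an assumption, and the regexp needs no translation because it contains no `+`.

```python
import re

# a*b(cb)*a, the language this GNFA recognizes
PATTERN = re.compile(r"a*b(cb)*a")

def accepted(w):
    """True if the whole string w matches a*b(cb)*a."""
    return PATTERN.fullmatch(w) is not None

for w in ["aaabcbcba", "bba", "bcba"]:
    print(w, "accepted" if accepted(w) else "rejected")
```

The string aaabcbcba splits as aaab · cb · cb · a and bcba splits as b · cb · a, so both are accepted; bba is rejected because after the first b the remainder ba is neither a nor a sequence of cb blocks followed by a.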

NFA → regexp: Add unique start and accept states
While the machine has more than 2 states: Pick an internal state, rip it out and re-label the arrows with regexps, to account for paths through the missing state
Example: an internal state with an incoming arrow labeled 0, a self-loop labeled 1, and an outgoing arrow labeled 0 is replaced by a single arrow labeled 01*0

GNFA: While the machine has more than 2 states:
In general, to rip out q_2:
Before: q_1 → q_2 labeled R(q_1,q_2), a self-loop on q_2 labeled R(q_2,q_2), q_2 → q_3 labeled R(q_2,q_3), and q_1 → q_3 labeled R(q_1,q_3)
After: q_1 → q_3 labeled R(q_1,q_2)R(q_2,q_2)*R(q_2,q_3) + R(q_1,q_3)

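Example: NFA N with states q_0, q_1, q_2, q_3
Start: q_0 → q_1 labeled ε, a self-loop on q_1 labeled a, q_1 → q_2 labeled b, a self-loop on q_2 labeled a,b, q_2 → q_3 labeled ε
Rip out q_1: q_0 → q_2 labeled a*b, a self-loop on q_2 labeled a,b, q_2 → q_3 labeled ε
Rip out q_2: q_0 → q_3 labeled (a*b)(a+b)*
R(q_0,q_3) = (a*b)(a+b)* represents L(N)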

Formally: Given a DFA, add q_start and q_acc to create G
For all q, q', define R(q,q') to be σ if δ(q,σ) = q', else ∅
CONVERT(G): (Takes a GNFA, outputs a regexp)
If #states = 2: return R(q_start, q_acc)
If #states > 2:
  select q_rip ∈ Q different from q_start and q_acc
  define Q' = Q − {q_rip}
  define R' on (Q' − {q_acc}) × (Q' − {q_start}) as:
    R'(q_i,q_j) = R(q_i,q_rip)R(q_rip,q_rip)*R(q_rip,q_j) + R(q_i,q_j)
  (this defines a new GNFA G')
  return CONVERT(G')
Claim: L(G') = L(G)
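A minimal sketch of CONVERT in Python follows. It is written as a loop rather than a recursion, emits Python `re` syntax (`|` for union), and represents ∅ as `None` and ε as the empty string; these choices, the function names, and the state names `q_start` and `q_acc` are assumptions made for illustration. When several symbols lead from q to q', the sketch labels the arrow with their union.

```python
import re

def union(r, s):
    """r + s, with None standing for the empty-set regexp."""
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"({r}|{s})"

def concat(*parts):
    """Concatenation; anything concatenated with the empty set is the empty set."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def star(r):
    """r*; both the empty set and epsilon star to epsilon."""
    return "" if r in (None, "") else f"({r})*"

def dfa_to_regex(states, delta, start, accepting):
    """Rip out states one at a time; return a Python regex, or None if L is empty.

    delta maps (state, symbol) -> state. "q_start" and "q_acc" are assumed
    not to be names already used in `states`.
    """
    S, A = "q_start", "q_acc"
    R = {}
    for (q, c), q2 in delta.items():          # label arrows with symbols
        R[(q, q2)] = union(R.get((q, q2)), re.escape(c))
    R[(S, start)] = ""                        # epsilon arrow into the old start
    for q in accepting:
        R[(q, A)] = ""                        # epsilon arrows out of accept states
    remaining = list(states)
    while remaining:                          # more than 2 states left
        rip = remaining.pop()
        loop = star(R.get((rip, rip)))
        for qi in [S] + remaining:
            for qj in remaining + [A]:
                through = concat(R.get((qi, rip)), loop, R.get((rip, qj)))
                R[(qi, qj)] = union(through, R.get((qi, qj)))
    return R.get((S, A))

# Hypothetical example: DFA over {0, 1} accepting strings with an even number of 1s
EVEN_DELTA = {("e", "0"): "e", ("e", "1"): "o",
              ("o", "0"): "o", ("o", "1"): "e"}
EVEN_REGEX = dfa_to_regex(["e", "o"], EVEN_DELTA, "e", {"e"})
```

The resulting expression depends on the order in which states are ripped out; different orders give different but equivalent regexps.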

Theorem: Let R = CONVERT(G). Then L(R) = L(G).
Proof by induction on k, the number of states in G
Base Case: k = 2. CONVERT outputs R(q_start, q_acc)
Inductive Step: Assume theorem is true for k-1 state GNFAs
Let G have k states. Let G' be the k-1 state GNFA obtained by ripping out a state.
We already claimed L(G) = L(G') [Sipser, p.73--74]
G' has k-1 states, so by induction, L(G') = L(CONVERT(G')) = L(R)
Therefore L(R) = L(G). QED

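Example: a three-state NFA over {a, b} with states q_1, q_2, q_3
[Figure: the NFA, with arrows labeled a and b among q_1, q_2, q_3]
After ripping out q_3: a self-loop on q_1 labeled bb, q_1 → q_2 labeled a + ba, a self-loop on q_2 labeled b, q_2 → q_1 labeled a, and q_1 → accept labeled b
After ripping out q_2: a self-loop on q_1 labeled bb + (a + ba)b*a, and q_1 → accept labeled b + (a + ba)b*
Resulting regexp: (bb + (a + ba)b*a)*(b + (a + ba)b*)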

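Convert the NFA to a regular expression
[Figure: NFA with states q_1, q_2, q_3: q_1 → q_2 labeled a, b; a self-loop on q_2 labeled b; q_2 → q_3 labeled b; q_3 → q_2 labeled b; q_3 → q_1 labeled a]
After ripping out q_2: q_1 → q_3 labeled (a + b)b*b, a self-loop on q_3 labeled bb*b, q_3 → q_1 labeled a
After ripping out q_3: a self-loop on q_1 labeled (a + b)b*b(bb*b)*a, and q_1 → accept labeled ε + (a + b)b*b(bb*b)*
Resulting regexp: ((a + b)b*b(bb*b)*a)*(ε + (a + b)b*b(bb*b)*)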

Summary: Regular Languages are defined via DFAs (DEFINITION), and are equivalently characterized by NFAs and by Regular Expressions

Some Languages Are Not Regular: Limitations on DFAs

Regular or Not?
C = { w | w has equal number of 1s and 0s }: NOT REGULAR!
D = { w | w has equal number of occurrences of 01 and 10 }: REGULAR!

{ w | w has equal number of occurrences of 01 and 10 }
= { w | w = 1, w = 0, or w = ε, or w starts with a 0 and ends with a 0, or w starts with a 1 and ends with a 1 }
Regexp: 1 + 0 + ε + 0(0+1)*0 + 1(0+1)*1
Claim: A string w has equal occurrences of 01 and 10 ⟺ w starts and ends with the same bit.
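The claim can be sanity-checked by brute force over short strings. The sketch below compares three things for each string: the count-based definition of D, the "same first and last bit" condition, and the regexp from this slide translated to Python syntax (`|` for `+`, an empty alternative for ε). The function names and the length bound are assumptions.

```python
import re
from itertools import product

def count_occurrences(w, pattern):
    """Count (possibly overlapping) occurrences of a 2-symbol pattern in w."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pattern)

# 1 + 0 + epsilon + 0(0+1)*0 + 1(0+1)*1 in Python syntax
D_REGEX = re.compile(r"(1|0|)|0(0|1)*0|1(0|1)*1")

def find_counterexample(max_len=10):
    """Return a string on which the three descriptions disagree, or None."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            w = "".join(t)
            balanced = count_occurrences(w, "01") == count_occurrences(w, "10")
            same_ends = w == "" or w[0] == w[-1]
            matches = D_REGEX.fullmatch(w) is not None
            if not (balanced == same_ends == matches):
                return w
    return None
```

The intuition behind the claim: each 01 or 10 marks a switch between bits, switches alternate in direction, so the two counts differ by at most one and are equal exactly when the string ends on the bit it started with.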

The Pumping Lemma: Structure in Regular Languages
Let L be a regular language. Then there is a positive integer P s.t. for all strings w ∈ L with |w| ≥ P there is a way to write w = xyz, where:
1. |y| > 0 (that is, y ≠ ε)
2. |xy| ≤ P
3. For all i ≥ 0, x y^i z ∈ L
Why is it called the pumping lemma? The word w gets pumped into longer and longer strings

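Proof: Let M be a DFA that recognizes L
Let P be the number of states in M
Let w be a string where w ∈ L and |w| ≥ P
We show: w = xyz with
1. |y| > 0
2. |xy| ≤ P
3. x y^i z ∈ L for all i ≥ 0
[Figure: the run of M on w, q_0 → ... → q_j → ... → q_k → ... → q_|w|, reading x up to q_j, y from q_j to q_k, and z from q_k to q_|w|]
There must exist j and k such that 0 ≤ j < k ≤ P, and q_j = q_k (by the pigeonhole principle: the P+1 states q_0, ..., q_P cannot all be distinct)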
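The pigeonhole step is constructive: running M on w and stopping at the first repeated state yields x, y, and z directly. A minimal sketch follows, assuming a DFA given as a dict `delta` mapping (state, symbol) to state; the function names and the example machine (even number of 1s) are illustrative assumptions.

```python
def pump_split(delta, start, w):
    """Return (x, y, z) with w == x + y + z, where y labels a loop q_j -> q_k.

    Assumes len(w) is at least the number of states of the DFA.
    """
    seen = {start: 0}                 # state -> number of symbols read on first visit
    q = start
    for k, c in enumerate(w, start=1):
        q = delta[(q, c)]
        if q in seen:                 # q_j == q_k with j < k: first repeated state
            j = seen[q]
            return w[:j], w[j:k], w[k:]
        seen[q] = k
    raise ValueError("w is shorter than the number of states")

def accepts(delta, start, accepting, w):
    """Run the DFA on w."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepting

# Hypothetical example: 2-state DFA for "even number of 1s", so P = 2
DELTA = {("e", "0"): "e", ("e", "1"): "o",
         ("o", "0"): "o", ("o", "1"): "e"}
x, y, z = pump_split(DELTA, "e", "1001")
pumped = [x + y * i + z for i in range(5)]   # x z, x y z, x y y z, ...
```

Because the split stops at the first repeat, that repeat happens within the first P symbols, which gives |xy| ≤ P; every string in `pumped` follows the same path with the loop taken i times and so ends in the same state as w.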