Part 3 out of 5. Automata & languages. A primer on the Theory of Computation. Last week, we learned about closure and equivalence of regular languages


Automata & languages
A primer on the Theory of Computation

Laurent Vanbever
www.vanbever.eu
Part 3 out of 5
ETH Zürich (D-ITET)
October 5, 2017

Last week, we learned about closure and equivalence of regular languages

The class of regular languages is closed under the regular operations: union, concatenation, and star.

Closure: if L1 and L2 are regular, then so are
  union: L1 ∪ L2
  concatenation: L1 . L2
  star: L1*

Equivalence: DFA is equivalent to NFA.

This week we'll look at REX, the third way of representing regular languages (DFA, NFA, REX).

Are REX, NFA and DFA all equivalent?

We'll then start asking ourselves whether all languages are regular:
  L1 = {0^n 1^n | n ≥ 0}
  L2 = {w | w has an equal number of 0s and 1s}
  L3 = {w | w has an equal number of occurrences of 01 and 10}
(only one of them actually is)

Advanced Automata, Thu Oct 5
  1. Equivalence (the end): DFA, NFA, Regular Expression
  2. Non-regular languages
  3. Context-free languages

Three tough languages
  1) L1 = {0^n 1^n | n ≥ 0}
  2) L2 = {w | w has an equal number of 0s and 1s}
  3) L3 = {w | w has an equal number of occurrences of 01 and 10 as substrings}

In order to fully understand regular languages, we also must understand their limitations!

Pigeonhole principle

Consider language L, which contains word w ∈ L.
Consider an FA which accepts L, with n < |w| states.
Then, when accepting w, the FA must visit at least one state twice.

This is according to the pigeonhole (a.k.a. Dirichlet) principle: if m > n pigeons are put into n pigeonholes, there's a hole with more than one pigeon.
That's a pretty fancy name for a boring observation...

Languages with unbounded strings

Consequently, regular languages with unbounded strings can only be recognized by FA (finite! bounded!) automata if these long strings loop.
The FA can enter the loop once, twice, ..., and not at all.
That is, language L contains all {xz, xyz, xy^2z, xy^3z, ...}.
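The pigeonhole argument can be made concrete with a small sketch. The helper names (`find_pumpable_split`, `accepts`) and the example DFA for "even number of 1s" are assumptions chosen for illustration, not part of the lecture. The code runs a DFA on a word, stops at the first state visited twice, and returns the split w = xyz in which y is the loop.

```python
def find_pumpable_split(delta, start, w):
    """Run a DFA on w and split w = x y z at the first repeated state.

    delta maps (state, symbol) -> state. Returns (x, y, z), or None if no
    state repeats (only possible when len(w) < number of states).
    """
    seen = {start: 0}              # state -> position of its first visit
    state = start
    for pos, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:          # pigeonhole: this state was visited before
            first = seen[state]
            return w[:first], w[first:pos], w[pos:]
        seen[state] = pos
    return None


def accepts(delta, start, accepting, w):
    """Standard DFA acceptance check."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical example DFA: strings over {0,1} with an even number of 1s.
DELTA = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}

x, y, z = find_pumpable_split(DELTA, "even", "1001")
# Every pumped variant x y^i z is accepted, because y only walks around a loop.
pumped_ok = all(accepts(DELTA, "even", {"even"}, x + y * i + z) for i in range(5))
```

The word 1001 is longer than the number of states (2), so a repeated state is guaranteed, and the loop part y can be repeated or dropped without changing the final state.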

Pumping Lemma

Theorem: Given a regular language L, there is a number p (the pumping number) such that any string u in L of length ≥ p is pumpable within its first p letters.

A string u ∈ L with |u| ≥ p is pumpable if it can be split in 3 parts u = xyz (x is a prefix, z is a suffix) s.t.:
  |y| ≥ 1 (mid-portion y is non-empty)
  |xy| ≤ p (pumping occurs in first p letters)
  xy^i z ∈ L for all i ≥ 0 (can pump y-portion)

If there is no such p, then the language is not regular.

Pumping Lemma Example

Let L be the language {0^n 1^n | n ≥ 0}.
Assume (for the sake of contradiction) that L is regular.
Let p be the pumping length. Let u be the string 0^p 1^p.
Let's check string u against the pumping lemma: since |xy| ≤ p, the part y lies within the leading 0s. Pumping y changes the number of 0s but not the number of 1s, so xy^2z ∉ L. This is a contradiction, hence L is not regular.
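The argument for 0^p 1^p can be checked by brute force for small p. The sketch below is only an illustration (the function names `in_L` and `pumpable_splits` are assumptions, and testing exponents up to `max_i` is enough to refute a split but not to prove one works in general). It enumerates every split u = xyz allowed by the lemma and keeps those that survive pumping.

```python
def in_L(w):
    """Membership test for L = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def pumpable_splits(u, p, member, max_i=3):
    """Return all splits u = x y z with |y| >= 1 and |xy| <= p such that
    x y^i z stays in the language for every i in 0..max_i."""
    good = []
    for end_x in range(0, p):                    # |x| = end_x
        for end_y in range(end_x + 1, p + 1):    # |xy| = end_y <= p, |y| >= 1
            x, y, z = u[:end_x], u[end_x:end_y], u[end_y:]
            if all(member(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good


# For every small candidate p, no split of 0^p 1^p survives pumping.
no_split_works = all(
    pumpable_splits("0" * p + "1" * p, p, in_L) == [] for p in range(1, 7)
)

# Sanity check on a regular language (0*): here a valid split does exist.
splits_for_zero_star = pumpable_splits("000", 1, lambda w: set(w) <= {"0"})
```

The empty result for each p mirrors the proof: any admissible y consists of 0s only, so already i = 0 or i = 2 leaves the language.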

Let's make the example a bit harder

Let L be the language {w | w has an equal number of 0s and 1s}.
Assume (for the sake of contradiction) that L is regular.
Let p be the pumping length. Let u be the string 0^p 1^p.
Let's check string u against the pumping lemma. For all u ∈ L with |u| ≥ p we can write:
  u = xyz (x is a prefix, z is a suffix)
  |y| ≥ 1 (mid-portion y is non-empty)
  |xy| ≤ p (pumping occurs in first p letters)
  xy^i z ∈ L for all i ≥ 0 (can pump y-portion)
Because |xy| ≤ p, y consists of 0s only, so xy^0z = xz has fewer 0s than 1s and is not in L.

Now you try

Is L = {ww | w ∈ (0 ∪ 1)*} regular?
Is L = {1^n | n being a prime number} regular?

Part 1: regular language
Part 2: context-free language
Part 3: Turing machine

Motivation

Why is a language such as {0^n 1^n | n ≥ 0} not regular?!?
It's really simple! All you need to keep track of is the number of 0's.

In this chapter we first study context-free grammars:
  More powerful than regular languages
  Recursive structure
  Developed for human languages
  Important for engineers (parsers, protocols, etc.)

Example

Palindromes, for example, are not regular. But there is a pattern.

Q: If you have one palindrome, how can you generate another?
A: Generate palindromes recursively as follows:
  Base case: ε, 0 and 1 are palindromes.
  Recursion: If x is a palindrome, then so are 0x0 and 1x1.

Notation: x → ε | 0 | 1 | 0x0 | 1x1.
Each pipe ("|") is an "or", just as in UNIX regexp's.
In fact, all palindromes can be generated from ε using these rules.
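The recursive rules for palindromes translate almost directly into code. This is a minimal sketch with assumed function names (`palindromes`, `derive`). The first function applies the rules bottom-up to enumerate all binary palindromes up to a given length. The second peels matching outer letters off a palindrome to list the sentential forms of its derivation from the variable x.

```python
def palindromes(max_len):
    """All binary palindromes of length <= max_len, built bottom-up with
    the rules x -> ε | 0 | 1 | 0x0 | 1x1."""
    layer = {x for x in ("", "0", "1") if len(x) <= max_len}   # base cases
    result = set(layer)
    while layer:
        # Recursion: wrap every palindrome found so far in 0...0 and 1...1.
        layer = {c + x + c for x in layer for c in "01"
                 if len(x) + 2 <= max_len}
        result |= layer
    return result


def derive(w):
    """Sentential forms of a derivation of the palindrome w from x."""
    steps = ["x"]
    left, right, rest = "", "", w
    while len(rest) >= 2:
        if rest[0] != rest[-1]:
            raise ValueError(f"{w!r} is not a palindrome")
        left, right, rest = left + rest[0], rest[-1] + right, rest[1:-1]
        steps.append(left + "x" + right)      # applied x -> 0x0 or x -> 1x1
    steps.append(left + rest + right)         # applied x -> ε, 0 or 1
    return steps
```

For instance, `derive("101")` returns the forms `x`, `1x1`, `101`, each obtained from the previous one by a single rule.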

Q: How would you generate 11011011?

Context Free Grammars (CFG): Definition

Definition: A context free grammar consists of (V, Σ, R, S) with:
  V: a finite set of variables (or symbols, or non-terminals)
  Σ: a finite set of terminals (or the alphabet)
  R: a finite set of rules (or productions) of the form v → w with v ∈ V, and w ∈ (Σ ∪ V)* (read: "v yields w" or "v produces w")
  S ∈ V: the start symbol.

Q: What are (V, Σ, R, S) for our palindrome example?

Derivations and Language

Definition: The derivation symbol "⇒" (read "1-step derives" or "1-step produces") is a relation between strings in (Σ ∪ V)*. We write x ⇒ y if x and y can be broken up as x = svt and y = swt with v → w being a production in R.

Definition: The derivation symbol "⇒*" (read "derives" or "produces" or "yields") is a relation between strings in (Σ ∪ V)*. We write x ⇒* y if there is a sequence of 1-step productions from x to y. I.e., there are strings x_i with i ranging from 0 to n such that x = x_0, y = x_n and x_0 ⇒ x_1, x_1 ⇒ x_2, x_2 ⇒ x_3, ..., x_{n-1} ⇒ x_n.

Definition: Let G be a context-free grammar. The context-free language (CFL) generated by G is the set of all terminal strings which are derivable from the start symbol. Symbolically: L(G) = {w ∈ Σ* | S ⇒* w}

Example: Infix Expressions

Infix expressions involving {+, ×, a, b, c, (, )}
  E stands for an expression (most general)
  F stands for factor (a multiplicative part)
  T stands for term (a product of factors)
  V stands for a variable: a, b, or c

Grammar is given by:
  E → T | E + T
  T → F | T × F
  F → V | (E)
  V → a | b | c

Convention: Start variable is the first one in grammar (E)

Consider the string u given by a × b + (c + (a + c)).
This is a valid infix expression. It can be generated from E.

1. It is a sum of two expressions, so the first production must be E ⇒ E + T
2. Sub-expression a × b is a product, hence a term, so it is generated by the sequence E + T ⇒ T + T ⇒ T × F + T ⇒* a × b + T
3. The second sub-expression is a factor only because it is a parenthesized sum: a × b + T ⇒ a × b + F ⇒ a × b + (E) ⇒ a × b + (E + T)
4. E ⇒ E + T ⇒ T + T ⇒ T × F + T ⇒ F × F + T ⇒ V × F + T ⇒ a × F + T ⇒ a × V + T ⇒ a × b + T ⇒ a × b + F ⇒ a × b + (E) ⇒ a × b + (E + T) ⇒ a × b + (T + T) ⇒ a × b + (F + T) ⇒ a × b + (V + T) ⇒ a × b + (c + T) ⇒ a × b + (c + F) ⇒ a × b + (c + (E)) ⇒ a × b + (c + (E + T)) ⇒ a × b + (c + (T + T)) ⇒ a × b + (c + (F + T)) ⇒ a × b + (c + (V + T)) ⇒ a × b + (c + (a + T)) ⇒ a × b + (c + (a + F)) ⇒ a × b + (c + (a + V)) ⇒ a × b + (c + (a + c))
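A derivation like the one above can be replayed mechanically. The following sketch assumes the infix grammar with × written as the multiplication sign, single-character symbols, and hypothetical helper names (`leftmost_step`, `derive`). Each step replaces the left-most variable with a chosen right-hand side and rejects choices that are not productions of the grammar.

```python
# Productions of the infix-expression grammar (× assumed as the product sign).
GRAMMAR = {
    "E": ["T", "E+T"],
    "T": ["F", "T×F"],
    "F": ["V", "(E)"],
    "V": ["a", "b", "c"],
}


def leftmost_step(form, rhs):
    """Replace the left-most variable in `form` by `rhs`, checking that
    variable -> rhs really is a production of GRAMMAR."""
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:
            if rhs not in GRAMMAR[symbol]:
                raise ValueError(f"{symbol} -> {rhs} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no variable left to replace")


def derive(start, choices):
    """Apply a sequence of right-hand sides as left-most derivation steps
    and return every sentential form along the way."""
    forms = [start]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms


# Left-most derivation of the shorter expression a×b+c.
forms = derive("E", ["E+T", "T", "T×F", "F", "V", "a", "V", "b", "F", "V", "c"])
```

Joining `forms` with " ⇒ " reproduces the chain E ⇒ E+T ⇒ T+T ⇒ ... ⇒ a×b+c in the same style as step 4.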

Left- and Right-most derivation

The derivation on the previous slide was a so-called left-most derivation. In a right-most derivation, the variable most to the right is replaced:
E ⇒ E + T ⇒ E + F ⇒ E + (E) ⇒ E + (E + T) ⇒ etc.

Ambiguity

There can be a lot of ambiguity involved in how a string is derived.
Another way to describe a derivation in a unique way is using derivation trees.

Derivation Trees

In a derivation tree (or parse tree) each node is a symbol. Each parent is a variable whose children spell out the production from left to right. For example, v → abcdefg:

        v
  a b c d e f g

The root is the start variable.
The leaves spell out the derived string from left to right.

[Figure: derivation tree for our string a × b + (c + (a + c))]

Derivation trees help understanding semantics! You can tell how an expression should be evaluated from the tree.

Ambiguity

<sentence> → <action> | <action> with <subject>
<action> → <subject><activity>
<subject> → <noun> | <noun> and <subject>
<activity> → <verb> | <verb><object>
<noun> → Hannibal | Clarice | rice | onions
<verb> → ate | played
<prep> → with | and | or
<object> → <noun> | <noun><prep><object>

Clarice played with Hannibal
Clarice ate rice with onions
Hannibal ate rice with Clarice

Q: Are there any suspect sentences?

A: Consider "Hannibal ate rice with Clarice". This could either mean
  Hannibal and Clarice ate rice together.
  Hannibal ate rice and ate Clarice.
This ambiguity arises from the fact that the sentence has two different parse-trees, and therefore two different interpretations:

Hannibal and Clarice Ate

sentence
  action
    subject
      noun: Hannibal
    activity
      verb: ate
      object
        noun: rice
  with
  subject
    noun: Clarice

Hannibal the Cannibal

sentence
  action
    subject
      noun: Hannibal
    activity
      verb: ate
      object
        noun: rice
        prep: with
        object
          noun: Clarice

Ambiguity: Definition

Definition: A string x is said to be ambiguous relative to the grammar G if there are two essentially different ways to derive x in G:
  x admits two (or more) different parse-trees
  equivalently, x admits different left-most [resp. right-most] derivations.
A grammar G is said to be ambiguous if there is some string x in L(G) which is ambiguous.

Question: Is the grammar S → ab | ba | aSb | bSa | SS ambiguous? What language is generated?

Answer: L(G) = the language with equal no. of a's and b's.
Yes, the grammar is ambiguous:

[Figure: two different parse trees for the same string]

CFG's: Proving Correctness

The recursive nature of CFG's means that they are especially amenable to correctness proofs.
For example let's consider the grammar G = (S → ε | ab | ba | aSb | bSa | SS).
We claim that L(G) = L = {x ∈ {a,b}* | n_a(x) = n_b(x)}, where n_a(x) is the number of a's in x, and n_b(x) is the number of b's.

Proof: To prove that L = L(G) is to show both inclusions:
  i. L ⊆ L(G): Every string in L can be generated by G.
  ii. L ⊇ L(G): G only generates strings of L.
     - This part is easy, so we concentrate on part i.
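Ambiguity of this grammar can be observed by counting parse trees. The sketch below is an illustration under stated assumptions: it uses the grammar from the question, S → ab | ba | aSb | bSa | SS, without an ε-rule, so that every S derives a non-empty string and the number of trees is finite. The function name `count_trees` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(w):
    """Number of distinct parse trees deriving w from S in the grammar
    S -> ab | ba | aSb | bSa | SS (no ε-rule, so the count is finite)."""
    total = 0
    if w in ("ab", "ba"):                        # S -> ab, S -> ba
        total += 1
    if len(w) >= 3 and {w[0], w[-1]} == {"a", "b"}:
        total += count_trees(w[1:-1])            # S -> aSb or S -> bSa
    for k in range(1, len(w)):                   # S -> SS, split after k letters
        total += count_trees(w[:k]) * count_trees(w[k:])
    return total


# abab has two parse trees: S -> SS -> (ab)(ab) and S -> aSb -> a(ba)b.
trees_abab = count_trees("abab")
```

Any string with a count of 2 or more is ambiguous relative to this grammar, which is enough to call the grammar itself ambiguous. A count of 0 means the string is not in L(G).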

Proving L ⊆ L(G)

L ⊆ L(G): Show that every string x with the same number of a's as b's is generated by G. Prove by induction on the length n = |x|.

Base case: The empty string is derived by S → ε.

Inductive step: Assume n > 0. Let u be the smallest non-empty prefix of x which is also in L.
  Either there is such a prefix with |u| < |x|. Then x = uv where v ∈ L as well, and we can use S → SS and repeat the argument.
  Or x = u. In this case notice that u can't start and end in the same letter. If it started and ended with a, then write x = ava. This means that v must have 2 more b's than a's. So somewhere in v the b's of x catch up to the a's, which means that there's a smaller prefix in L, contradicting the definition of u as the smallest prefix in L. Thus for some string v in L we have x = avb OR x = bva. We can use either S → aSb OR S → bSa.

Designing Context-Free Grammars

As for regular languages this is a creative process.
However, if the grammar is the union of simpler grammars, you can design the simpler grammars (with starting symbols S1, S2, respectively) first, and then add a new starting symbol/production S → S1 | S2.
If the CFG happens to be regular as well, you can first design the FA, introduce a variable/production for each state of the FA, and then add a rule x → ay to the CFG if δ(x,a) = y is in the FA. If a state x is accepting in the FA, then add x → ε to the CFG. The start symbol of the CFG is of course equivalent to the start state in the FA.
There are quite a few other tricks. Try yourself.
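The FA-to-CFG recipe is mechanical enough to sketch in a few lines. The names `dfa_to_cfg` and `generates`, the rule encoding, and the example DFA for "even number of 1s" are assumptions made for illustration. One variable is created per state, a rule x → ay is added for every transition δ(x, a) = y, and x → ε is added for every accepting state x.

```python
def dfa_to_cfg(states, delta, start, accepting):
    """Build grammar rules from a DFA: one variable per state,
    x -> a y whenever delta(x, a) = y, and x -> ε for accepting x.
    A rule is stored as (terminal, next_variable); ε is ("", None)."""
    rules = {x: [] for x in states}
    for (x, a), y in delta.items():
        rules[x].append((a, y))          # production x -> a y
    for x in accepting:
        rules[x].append(("", None))      # production x -> ε
    return rules, start                  # start symbol = start state


def generates(rules, start, w):
    """Check whether the grammar derives w. Each step consumes one letter
    with a rule x -> a y; the derivation must end with a rule x -> ε."""
    var = start
    for a in w:
        successors = [y for (b, y) in rules[var] if b == a and y is not None]
        if not successors:
            return False
        var = successors[0]              # deterministic: at most one choice
    return ("", None) in rules[var]


# Hypothetical example DFA: strings over {0,1} with an even number of 1s.
DELTA = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
RULES, START = dfa_to_cfg({"even", "odd"}, DELTA, "even", {"even"})
```

For this DFA the resulting grammar is even → 0 even | 1 odd | ε and odd → 0 odd | 1 even, and `generates(RULES, START, w)` agrees with running the DFA on w.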