Automata & languages: A primer on the Theory of Computation. Part 4 out of 5. Last week, we showed the equivalence of DFA, NFA and REX.


Automata & languages: A primer on the Theory of Computation. Laurent Vanbever, www.vanbever.eu. ETH Zürich (D-ITET), October 12, 2017. Part 4 out of 5.

Last week, we showed the equivalence of DFA, NFA and REX (DFA ≡ NFA ≡ REX), and we also looked at nonregular languages.

Pumping lemma: If A is a regular language, then there exists a number p such that any string in A whose length is at least p can be divided into three pieces xyz with xy^i z ∈ A for each i ≥ 0, |y| > 0, and |xy| ≤ p.

To prove that a language A is not regular:
1. Assume that A is regular.
2. Since A is regular, it must have a pumping length p.
3. Find one string s in A whose length is at least p.
4. For any split s = xyz, show that you cannot satisfy all three conditions.
5. Conclude that s cannot be pumped, and hence that A is not regular.
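This recipe can be sanity-checked mechanically for one concrete string and one concrete value of p. The Python sketch below is only an illustration (a real pumping-lemma proof must argue for every p); the helper names in_A and pumpable are ours, and A = {0^n 1^n : n ≥ 0} is the usual example of a nonregular language.

```python
# A small sanity check of the non-regularity recipe for A = {0^n 1^n : n >= 0}.
# Not a proof (a proof must work for every p); it just verifies, for one
# concrete p, that no split s = xyz with |xy| <= p and |y| > 0 survives pumping.

def in_A(w: str) -> bool:
    """Membership test for A = {0^n 1^n}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def pumpable(s: str, p: int) -> bool:
    """Return True if SOME split s = xyz (|xy| <= p, |y| > 0) can be pumped."""
    for i in range(p + 1):                  # |x| = i
        for j in range(i + 1, p + 1):       # |xy| = j <= p, so |y| = j - i > 0
            x, y, z = s[:i], s[i:j], s[j:]
            # pumping must work for all exponents; checking a few suffices here
            if all(in_A(x + y * k + z) for k in (0, 2, 3)):
                return True
    return False

p = 7                      # pretend this were the pumping length
s = "0" * p + "1" * p      # the classic witness string, |s| >= p
print(pumpable(s, p))      # False: no split survives, as the recipe predicts
```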

This week is all about Context-Free Languages, a superset of the regular languages.

CFGs: Proving Correctness. The recursive nature of CFGs means that they are especially amenable to correctness proofs. For example, let's consider again our grammar G = (S → ε | ab | ba | aSb | bSa | SS). We claim that L(G) = L = { x ∈ {a,b}* : n_a(x) = n_b(x) }, where n_a(x) is the number of a's in x and n_b(x) is the number of b's.

Proof: To prove that L = L(G) is to show both inclusions:
i. L ⊆ L(G): every string in L can be generated by G.
ii. L ⊇ L(G): G generates only strings of L.

Proving L ⊆ L(G). Part ii is easy (see why?), so we'll concentrate on part i: show that every string x with the same number of a's as b's is generated by G. We prove this by induction on the length n = |x|.

Base case: the empty string is derived by S → ε.

Inductive hypothesis: assume that G generates all strings with an equal number of a's and b's of (even) length up to n. Consider any string of length n+2. There are essentially 4 possibilities:
1. awb
2. bwa
3. awa
4. bwb

Given S ⇒* w, the strings awb and bwa are generated from w using the rules S → aSb and S → bSa (induction hypothesis).

Now consider a string like awa. For awa to be in L, w itself cannot be in L: w needs two more b's than a's. Think of scanning awa with a counter that starts at 0: each a adds 1, each b subtracts 1, and at the end of the string we are back at 0. Since the string starts and ends with a, somewhere along the string (inside w) the counter crosses 0, i.e., the running difference n_a(v) - n_b(v) over prefixes v returns to 0. Split awa = uv at such a crossing, so that u = ax and v = ya with x, y ∈ {a,b}*. Then u and v each contain an equal number of a's and b's and are shorter than n+2. Given S ⇒* u and S ⇒* v, the rule S → SS generates awa = uv (induction hypothesis). The same argument applies to strings like bwb.
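To see the counter argument in action, here is a small Python sketch of our own (not from the slides): for a balanced string of the form awa or bwb it locates a split point where the prefix counter returns to 0, and the two pieces it returns are exactly the u and v that the rule S → SS glues back together.

```python
# A sketch of the counter argument used in the S -> SS case of the proof.
# For a string of the form awa or bwb with equally many a's and b's, the
# running count (+1 for a, -1 for b) is +1 after the first symbol, -1 just
# before the last one, and 0 at the end, so it must hit 0 strictly inside.
# The function name is ours, for illustration.

def split_at_zero(w: str) -> tuple[str, str]:
    """Split w = uv so that u and v each contain equally many a's and b's."""
    assert w.count("a") == w.count("b") and len(w) >= 2
    counter = 0
    for i, c in enumerate(w[:-1], start=1):   # stop before the last symbol
        counter += 1 if c == "a" else -1
        if counter == 0:
            return w[:i], w[i:]
    raise AssertionError("unreachable for strings of the form awa / bwb")

print(split_at_zero("abbaba"))   # ('ab', 'baba'): both halves balance a's and b's
```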

Designing Context-Free Grammars. As for regular languages, this is a creative process. However, if the language is the union of simpler languages, you can design the simpler grammars (with start symbols S_1, S_2) and then add a new start symbol and production S → S_1 | S_2. If the CFG happens to be regular, you can first design the FA, introduce a variable/production for each state of the FA, and then add a rule x → ay to the CFG if δ(x,a) = y is in the FA. If a state x is accepting in the FA, then add x → ε to the CFG. The start symbol of the CFG is the start state of the FA. There are quite a few other tricks; try them yourself.

Push-Down Automata (PDA). Finite automata were the machine interpretation of regular languages; push-down automata are the machine interpretation of grammars. The problem with finite automata was that they couldn't handle languages that need some sort of unbounded memory, even memory that could be implemented easily by a single (unbounded) integer register! Example: to recognize the language L = {0^n 1^n : n ≥ 0}, all you need is to count how many 0's you have seen so far. Push-down automata allow even more than a register: a full stack!
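The "single integer register" remark can be made concrete with a few lines of Python (our own sketch, with an illustrative function name): one counter, incremented on 0 and decremented on 1, suffices for L = {0^n 1^n : n ≥ 0}.

```python
# "All you need is to count": recognizing L = {0^n 1^n : n >= 0} with a single
# integer register, as the slide suggests (our own illustration).

def count_recognize(w: str) -> bool:
    counter = 0
    seen_one = False
    for c in w:
        if c == "0":
            if seen_one:          # a 0 after a 1 can never be repaired
                return False
            counter += 1
        elif c == "1":
            seen_one = True
            counter -= 1
            if counter < 0:       # more 1s than 0s so far
                return False
        else:
            return False
    return counter == 0

print(count_recognize("000111"), count_recognize("0101"))  # True False
```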

Recursive Algorithms and Stacks. A stack allows the following basic operations:
Push: pushing a new element on top of the stack.
Pop: removing the top element from the stack (if there is one).
Peek: checking the top element without removing it.

General principle in programming: any recursive algorithm can be turned into a non-recursive one using a stack and a while-loop that exits only when the stack is empty.

It seems that with a stack at our fingertips we can even recognize palindromes! Yoo-hoo! Palindromes are generated by the grammar S → ε | a | b | aSa | bSb. Let's simplify for the moment and look at S → # | aSa | bSb.

From CFGs to Stack Machines. The CFG S → # | aSa | bSb describes palindromes containing exactly one #. Question: using a stack, how can we recognize such strings?

PDAs à la Sipser. To aid analysis, theoretical stack machines restrict the allowable operations. Each textbook author has his or her own version. Sipser's machines are especially simple:
Push and pop are rolled into a single operation: replace the top stack symbol. In particular, replacing the top symbol by ε is a pop.
There is no intrinsic way to test for an empty stack; instead, one often pushes a special symbol ($) as the very first operation.
Epsilons, used to increase functionality, make the machines nondeterministic by default.
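One possible answer to the question above, sketched in plain Python rather than as a formal PDA (our own code, with an illustrative function name): push the symbols before the # onto a stack, then pop one symbol for every symbol after the #.

```python
# Recognize strings generated by S -> # | aSa | bSb by pushing the first half
# onto a stack and matching the second half against it (our own sketch).

def is_marked_palindrome(w: str) -> bool:
    stack = []
    it = iter(w)
    # Phase 1: push everything up to (and excluding) the marker '#'.
    for c in it:
        if c == "#":
            break
        if c not in "ab":
            return False
        stack.append(c)
    else:
        return False          # no '#' found at all
    # Phase 2: every remaining symbol must match the top of the stack.
    for c in it:
        if not stack or stack.pop() != c:
            return False
    return not stack          # accept iff the stack is empty again

print(is_marked_palindrome("ab#ba"))   # True
print(is_marked_palindrome("ab#ab"))   # False
```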

Sipser's PDA Version. A transition labeled "x, y → z" from state p to state q means: if the machine is in state p, the next input symbol is x, and the top of the stack is y, then go to state q and replace y by z on the stack. Special cases:
x = ε: ignore the input, don't read.
y = ε: ignore the top of the stack and just push z.
z = ε: pop y.
In addition, push $ initially to be able to detect the empty stack.

PDA: Formal Definition. A pushdown automaton (PDA) is a 6-tuple M = (Q, Σ, Γ, δ, q_0, F):
Q, Σ, q_0, and F are defined as for an FA.
Γ is the stack alphabet.
δ is as follows: given a state p, an input symbol x, and a stack symbol y, δ(p, x, y) returns all pairs (q, z) where q is a target state and z is a stack replacement for y. Formally, δ : Q × Σ_ε × Γ_ε → P(Q × Γ_ε).

PDA Exercises.
Draw the PDA for {a^i b^j c^k : i,j,k ≥ 0 and i=j or i=k}.
Draw the PDA for L = {x ∈ {a,e}* : n_a(x) = 2 n_e(x)}.

Model Robustness. The class of regular languages was quite robust: it allows multiple ways of defining languages (automaton vs. regexp), and slight perturbations of the model do not change the result (non-determinism). The class of context-free languages is also robust: you can use either PDAs or CFGs to describe the languages in the class. However, it is less robust than the regular languages when it comes to slight perturbations of the model. Smaller classes: right-linear grammars, deterministic PDAs. Larger classes: context-sensitive grammars.
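The formal definition translates almost literally into a simulator. The sketch below is our own illustration: delta is a Python dict standing in for δ, the empty string plays the role of ε, and the example PDA for {0^n 1^n : n ≥ 0} (states q0 through q3, bottom marker $) is ours, not taken from the slides.

```python
# A minimal nondeterministic PDA simulator following the 6-tuple definition.
# delta maps (state, input or '', stack top or '') to a set of (state, push);
# popping y (if given) and then pushing z (if given) is Sipser's "replace y by z".

delta = {
    ("q0", "",  ""):  {("q1", "$")},      # push the bottom marker
    ("q1", "0", ""):  {("q1", "0")},      # push each 0
    ("q1", "",  ""):  {("q2", "")},       # guess that the 0s are over
    ("q2", "1", "0"): {("q2", "")},       # match a 1 against a pushed 0
    ("q2", "",  "$"): {("q3", "")},       # bottom marker reached: done
}
start, accepting = "q0", {"q3"}

def accepts(w: str) -> bool:
    def step(state, i, stack):
        if i == len(w) and state in accepting:
            return True
        for (p, x, y), targets in delta.items():
            if p != state:
                continue
            if x and (i >= len(w) or w[i] != x):
                continue                      # input symbol does not match
            if y and (not stack or stack[-1] != y):
                continue                      # stack top does not match
            for (q, z) in targets:
                new_stack = stack[:-1] if y else stack
                if z:
                    new_stack = new_stack + (z,)
                if step(q, i + (1 if x else 0), new_stack):
                    return True
        return False
    return step(start, 0, tuple())

print(accepts("000111"), accepts("0011"), accepts("001"))  # True True False
```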

Right-Linear Grammars vs. Regular Languages. [Figure: a three-state DFA with states x, y, z over the alphabet {0,1}; its transitions correspond to the productions below.] The DFA above can be simulated by the grammar x → 0x | 1y, y → 0x | 1z, z → 0x | 1z (together with terminating productions for transitions into the accept state, as in the construction below).

Definition: A right-linear grammar is a CFG such that every production is of the form A → uB or A → u, where u is a terminal string and A, B are variables.

Theorem: If M = (Q, Σ, δ, q_0, F) is an NFA, then there is a right-linear grammar G(M) which generates the same language as M.
Proof:
The variables are the states: V = Q.
The start symbol is the start state: S = q_0.
The alphabet of terminals is the same.
A transition q_1 → q_2 on symbol a becomes the production q_1 → a q_2.
For each transition q_1 → q_2 on symbol a where q_2 is an accept state, also add q_1 → a to the grammar.
Homework: show that the reverse holds, i.e., a right-linear grammar can be converted to an FA. This implies that RL = right-linear languages ⊆ CFL.
Question: can every CFG be converted into a right-linear grammar?

Chomsky Normal Form. Chomsky came up with an especially simple type of context-free grammar which is able to capture all context-free languages: the Chomsky normal form (CNF). Chomsky's grammatical form is particularly useful when one wants to prove certain facts about context-free languages. This is because assuming a much more restrictive kind of grammar can often make it easier to prove that the generated language has whatever property you are interested in. (Noam Chomsky, linguist at MIT, is the creator of the Chomsky hierarchy, a classification of formal languages. Chomsky is also widely known for his left-wing political views and his criticism of the foreign policy of the U.S. government.)
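Because the proof is constructive, it can be phrased as a short program. The sketch below (our own code) builds the right-linear grammar from an NFA given as a transition table; we feed it the three-state DFA from the figure and assume that z is its accept state, since the accept state is not recoverable from the transcription.

```python
# Turn an NFA, given as a dict of transitions, into an equivalent
# right-linear grammar, following the construction in the theorem.

def nfa_to_right_linear(states, transitions, start, accepting):
    """transitions: dict mapping (state, symbol) -> set of successor states."""
    productions = {q: [] for q in states}           # one variable per state
    for (q1, a), targets in transitions.items():
        for q2 in targets:
            productions[q1].append(a + q2)          # q1 -> a q2
            if q2 in accepting:
                productions[q1].append(a)           # q1 -> a
    return start, productions

# The three-state DFA from the slide, assuming z is the accept state
# (the accept state is not visible in the transcribed figure).
states = {"x", "y", "z"}
transitions = {("x", "0"): {"x"}, ("x", "1"): {"y"},
               ("y", "0"): {"x"}, ("y", "1"): {"z"},
               ("z", "0"): {"x"}, ("z", "1"): {"z"}}
start, grammar = nfa_to_right_linear(states, transitions, "x", {"z"})
for var, rhss in sorted(grammar.items()):
    print(var, "->", " | ".join(rhss))
```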

Chomsky Normal Form. Definition: A CFG is said to be in Chomsky Normal Form (CNF) if every rule in the grammar has one of the following forms:
S → ε (for epsilon's sake only)
A → BC (dyadic variable productions)
A → a (unit terminal productions)
where S is the start variable, A, B, C are variables, and a is a terminal. Thus epsilon may only appear on the right-hand side of the start symbol, and all other right-hand sides are either two variables or a single terminal.

CFG → CNF. Converting a general grammar into Chomsky Normal Form works in four steps:
1. Ensure that the start variable doesn't appear on the right-hand side of any rule.
2. Remove all epsilon productions, except from the start variable.
3. Remove unit variable productions of the form A → B, where A and B are variables.
4. Add variables and dyadic variable rules to replace any longer non-dyadic or non-variable productions.

CFG → CNF: Example. Start from the palindrome grammar S → ε | a | b | aSa | bSb.
1. No start variable on the right-hand side: add a new start variable S₀.
   S₀ → S; S → ε | a | b | aSa | bSb
2. Only the start variable is allowed to derive ε:
   S₀ → S | ε; S → a | b | aSa | bSb | aa | bb
3. Remove unit variable productions of the form A → B (here S₀ → S):
   S₀ → ε | a | b | aSa | bSb | aa | bb; S → a | b | aSa | bSb | aa | bb
4. Add variables and dyadic variable rules to replace any longer productions:
   S₀ → ε | a | b | AB | CD | AA | CC; S → a | b | AB | CD | AA | CC; A → a; B → SA; C → b; D → SC
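One practical payoff of CNF, as an aside of ours rather than a claim from the slides: because every rule is dyadic or a single terminal, membership can be decided with the standard CYK dynamic-programming algorithm. The sketch below encodes the grammar obtained in step 4 and checks by brute force that it still generates exactly the palindromes over {a,b} up to length 6.

```python
from itertools import product

# The CNF grammar produced in step 4 (S0 is the new start variable).
terminal_rules = {          # A -> a
    "S0": {"a", "b"}, "S": {"a", "b"}, "A": {"a"}, "C": {"b"},
}
binary_rules = {            # A -> BC
    "S0": {("A", "B"), ("C", "D"), ("A", "A"), ("C", "C")},
    "S":  {("A", "B"), ("C", "D"), ("A", "A"), ("C", "C")},
    "B":  {("S", "A")},
    "D":  {("S", "C")},
}
START, NULLABLE_START = "S0", True       # S0 -> epsilon is in the grammar

def cyk(w: str) -> bool:
    """Standard CYK membership test; works because the grammar is in CNF."""
    n = len(w)
    if n == 0:
        return NULLABLE_START
    # table[i][l] = set of variables deriving the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][1] = {A for A, ts in terminal_rules.items() if c in ts}
    for l in range(2, n + 1):            # span length
        for i in range(n - l + 1):       # span start
            for k in range(1, l):        # split point
                for A, pairs in binary_rules.items():
                    if any(B in table[i][k] and C in table[i + k][l - k]
                           for (B, C) in pairs):
                        table[i][l].add(A)
    return START in table[0][n]

# Sanity check: the CNF grammar should accept exactly the palindromes over {a,b}.
for n in range(7):
    for w in map("".join, product("ab", repeat=n)):
        assert cyk(w) == (w == w[::-1]), w
print("CNF grammar and palindrome test agree up to length 6")
```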

CFG → PDA. CFGs can be converted into PDAs. In the NFA → REX conversion it was useful to consider GNFAs as a middle stage; similarly, it's useful to consider generalized PDAs here. A Generalized PDA (GPDA) is like a PDA, except that it allows the top stack symbol to be replaced by a whole string, not just a single character or the empty string. It is easy to convert a GPDA back to a PDA by changing each compound push into a sequence of simple pushes.

CFG → GPDA Recipe:
1. Push the marker symbol $ and the start symbol S onto the stack.
2. Repeat the following steps forever:
a. If the top of the stack is a variable symbol A, nondeterministically select a rule of A and substitute A by the string on the right-hand side of the rule.
b. If the top of the stack is a terminal symbol a, read the next symbol from the input and compare it to a. If they match, continue; if they do not match, reject this branch of the execution.
c. If the top of the stack is the symbol $, enter the accept state. (Note that if the input is not yet empty, the PDA will still reject this branch of the execution.)

CFG → GPDA → PDA: Example. Convert the grammar S → aTb | b, T → Ta | ε. [Figure omitted: the resulting PDA diagram, with states q0 through q7 and transitions that push $ and S, expand the rules, match terminals a and b against the input, and accept on $.]

CFG → PDA: Now you try! Convert the grammar S → a | b | aSa | bSb.
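The recipe can be simulated directly; the sketch below is our own illustration. The stack holds grammar symbols above the $ marker, and the nondeterministic choice in step 2a is explored by backtracking. We run it on the "now you try" grammar; note that this naive depth-first simulation would loop on a left-recursive rule such as T → Ta from the worked example, even though the corresponding nondeterministic PDA is fine.

```python
# A sketch of the CFG -> GPDA recipe: the stack holds grammar symbols on top of
# the bottom marker '$'; step 2a's nondeterministic rule choice is explored by
# backtracking. (Our own illustration; a left-recursive rule like T -> Ta would
# make this depth-first simulation loop, unlike the nondeterministic PDA itself.)

grammar = {"S": ["a", "b", "aSa", "bSb"]}   # the "now you try" grammar

def gpda_accepts(w: str, start: str = "S") -> bool:
    def run(stack: tuple, i: int) -> bool:
        top = stack[-1]
        if top == "$":                       # step 2c: only the marker is left
            return i == len(w)               # accept iff the input is consumed
        if top in grammar:                   # step 2a: expand the variable
            return any(run(stack[:-1] + tuple(reversed(rhs)), i)
                       for rhs in grammar[top])
        if i < len(w) and w[i] == top:       # step 2b: match a terminal
            return run(stack[:-1], i + 1)
        return False                         # mismatch: reject this branch
    return run(("$", start), 0)              # step 1: push $ and then S

for w in ["a", "aba", "abbba", "ab", "abab"]:
    print(w, gpda_accepts(w))                # True True True False False
```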

PDA → CFG. To convert PDAs to CFGs we would need to simulate the stack inside the productions. Unfortunately, in contrast to our previous constructions, this is not quite as constructive, so we only state the theorem. Theorem: For each push-down automaton there is a context-free grammar which accepts the same language. Corollary: PDA ≡ CFG.

Context-Sensitive Grammars. An even more general form of grammar exists. In general, a non-context-free grammar is one in which whole mixed variable/terminal substrings are replaced at a time. For example, with Σ = {a,b,c}, consider:
S → ASBC | ε
A → a
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc
What language is generated by this non-context-free grammar? When the length of the left-hand side is always at most the length of the right-hand side (plus some other minor restrictions), these general grammars are called context sensitive.

Are all languages context-free? Design a CFG (or PDA) for the following languages:
L = { w ∈ {0,1,2}* : w contains k 0's, k 1's, and k 2's, for some k ≥ 0 }
L = { w ∈ {0,1,2}* : n_0(w) = n_1(w) or n_0(w) = n_2(w) or n_1(w) = n_2(w) }
L = { 0^k 1^k 2^k : k ≥ 0 }

Tandem Pumping. Analogous to regular languages, there is a pumping lemma for context-free languages. The idea is that you can pump a context-free language at two places (but not more). Theorem: Given a context-free language L, there is a number p (the tandem-pumping number) such that any string in L of length at least p is tandem-pumpable within a substring of length p. In particular, for all w ∈ L with |w| ≥ p we can write w = uvxyz with
|vy| ≥ 1 (the pumpable areas are non-empty),
|vxy| ≤ p (the pumping happens inside a length-p portion),
u v^i x y^i z ∈ L for all i ≥ 0 (tandem-pump v and y).
If there is no such p, the language is not context-free.
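As with the regular pumping lemma earlier, the tandem-pumping conditions can be sanity-checked by brute force for one concrete p, which is an illustration rather than a proof. The helper names below are ours; the witness string comes from the third language above, 0^p 1^p 2^p.

```python
# A finite sanity test (not a proof) that w = 0^p 1^p 2^p resists tandem
# pumping for one concrete p: no split w = uvxyz with |vy| >= 1 and |vxy| <= p
# keeps u v^i x y^i z in the language for both i = 0 and i = 2.

def in_L(w: str) -> bool:
    """Membership in {0^k 1^k 2^k : k >= 0}."""
    k = len(w) // 3
    return w == "0" * k + "1" * k + "2" * k

def tandem_pumpable(w: str, p: int) -> bool:
    n = len(w)
    for start in range(n):                        # where vxy begins
        for vxy_len in range(1, p + 1):           # |vxy| <= p
            if start + vxy_len > n:
                break
            for v_len in range(vxy_len + 1):
                for y_len in range(vxy_len - v_len + 1):
                    if v_len + y_len == 0:        # |vy| >= 1 is required
                        continue
                    u = w[:start]
                    v = w[start:start + v_len]
                    x = w[start + v_len:start + vxy_len - y_len]
                    y = w[start + vxy_len - y_len:start + vxy_len]
                    z = w[start + vxy_len:]
                    if all(in_L(u + v * i + x + y * i + z) for i in (0, 2)):
                        return True
    return False

p = 4
w = "0" * p + "1" * p + "2" * p
print(tandem_pumpable(w, p))    # False: w cannot be tandem-pumped for this p
```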

Proving Non-Context-Freeness: Example. L = {1^n 0^n 1^n 0^n : n is non-negative}. Let's try w = 1^p 0^p 1^p 0^p. Clearly w ∈ L and |w| ≥ p. With |vxy| ≤ p, there are only three places where the sliding window vxy can sit: (I) within the first 1-block and the first 0-block, (II) within the first 0-block and the second 1-block, (III) within the second 1-block and the second 0-block. In all three cases, pumping up would only change the number of 0s and 1s in that part and not in the other parts; this violates the language definition.

Proving Non-Context-Freeness: You try! L = { x=y+z : x, y, and z are binary bit-strings satisfying the equation }. The hard part is to come up with a word which cannot be pumped, such as 1^{p+1}=1^p+10^p. Applying the tandem pumping lemma is then easy.

Transducers. Definition: A finite state transducer (FST) is a type of finite automaton whose output is a string, not just accept or reject. Each transition of an FST is labeled with two symbols: one designating the input symbol for that transition (as for automata), and the other designating the output symbol. We allow ε as output symbol if no symbol should be added to the output string. (The accompanying figure, omitted here, shows an example of an FST operating on the input alphabet {0,1,2} and the output alphabet {0,1}.) Exercise: Can you design a transducer that produces the inverted bitstring of the input string (e.g. 01001 → 10110)?
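A transducer is easy to simulate: the sketch below (our own code) represents an FST as a dictionary mapping (state, input symbol) to (next state, output string), and the one-state machine invert is one possible answer to the exercise.

```python
# A sketch of a finite state transducer: transitions map
# (state, input symbol) -> (next state, output string); '' stands for epsilon.
# The one-state FST 'invert' flips each bit, one possible answer to the exercise.

def run_fst(transitions, start, w):
    state, out = start, []
    for c in w:
        state, emit = transitions[(state, c)]
        out.append(emit)
    return "".join(out)

invert = {("q", "0"): ("q", "1"),
          ("q", "1"): ("q", "0")}

print(run_fst(invert, "q", "01001"))   # -> 10110
```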