Context-Free Grammars and Languages

Context-Free Grammars and Languages Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 44

Outline Context-free grammars Parse trees 2 / 44

Palindrome Example Consider the language of palindromes, L = {w ∈ {0, 1}* | w = w^R}, where a palindrome is a string that reads the same forward and backward (e.g., otto). Question: Is there a recursive definition of this L? Answer: Yes, there is! Exploit the idea that if a string is a palindrome, it must begin and end with the same symbol, leading to: Basis: ε, 0, and 1 are palindromes. Induction: If w is a palindrome, so are 0w0 and 1w1. No string of 0's and 1's is a palindrome unless it follows from this basis and induction rule. 3 / 44
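The basis-and-induction definition above translates directly into a recursive membership test. The following Python sketch (not part of the slides) mirrors it one-to-one:

```python
def is_palindrome(w: str) -> bool:
    """Decide membership in L = {w in {0,1}* : w = w^R} by
    following the recursive definition directly."""
    # Basis: epsilon, 0, and 1 are palindromes.
    if len(w) <= 1:
        return True
    # Induction: w is a palindrome iff w = 0x0 or w = 1x1
    # with x itself a palindrome.
    return w[0] == w[-1] and is_palindrome(w[1:-1])

assert is_palindrome("")       # basis case epsilon
assert is_palindrome("0110")
assert not is_palindrome("011")
```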

Grammar: Palindrome Example G_pal = ({S}, {0, 1}, S, P), with productions S → ε, S → 0, S → 1, S → 0S0, S → 1S1. 4 / 44

Context-Free Grammars Definition A grammar G = (V, T, S, P) is said to be context-free if all productions in P are of the form A → x, where A ∈ V is the head and x ∈ (V ∪ T)* is the body. There are no restrictions on the right-hand side (body) of production rules. The restriction is on the left-hand side (head), which must be a single variable. 5 / 44
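As an aside, the four-tuple G = (V, T, S, P) maps naturally onto plain data structures, and the context-free restriction can then be checked mechanically. A Python sketch (names are illustrative, not from the slides), using the palindrome grammar as the example:

```python
# A CFG G = (V, T, S, P) as plain Python data.  The palindrome
# grammar G_pal serves as the example.
V = {"S"}                     # variables
T = {"0", "1"}                # terminals
S = "S"                       # start symbol
P = {                         # productions: head -> list of bodies
    "S": ["", "0", "1", "0S0", "1S1"],   # "" plays the role of epsilon
}

# The context-free restriction is checkable mechanically: every head
# is a single variable, and every body is a string over V union T.
assert all(head in V for head in P)
assert all(symbol in V | T
           for bodies in P.values()
           for body in bodies
           for symbol in body)
```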

Example: Consider the grammar G = (V, T, S, P) with productions S → aSa | bSb | ε. A typical derivation in this grammar is S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa. This makes it clear that L(G) = {ww^R | w ∈ {a, b}*}. We know this language is not regular, but it is context-free. 6 / 44

Derivations Using Grammars Apply the productions of a CFG to infer that certain strings are in the language. There are two approaches to this inference: Recursive inference: use productions from body to head. Derivation: use productions from head to body (leftmost derivation, rightmost derivation). See Fig. 5.2 and 5.3 for recursive inference and Ex. 5.6 for derivation (pp. 178-179). 7 / 44

Consider the following CFG G = ({E, I}, {+, ∗, (, ), a, b, 0, 1}, E, P) with productions 1. E → I, 2. E → E + E, 3. E → E ∗ E, 4. E → (E), 5. I → a, 6. I → b, 7. I → Ia, 8. I → Ib, 9. I → I0, 10. I → I1. 8 / 44

Context-Free Languages Definition A language L is said to be context-free iff there is a context-free grammar G such that L = L(G), where L(G) = {w ∈ T* | S ⇒*_G w}. 9 / 44

The Language of G_pal Theorem L(G_pal) = {w ∈ {0, 1}* | w = w^R}. That is, w ∈ L(G_pal) iff w = w^R, for w ∈ {0, 1}*. Proof ("if" part). Suppose w = w^R. We prove by induction on |w| that w ∈ L(G_pal). Basis: |w| = 0 or |w| = 1. Then w is ε, 0, or 1. Since S → ε | 0 | 1 are productions, we conclude that S ⇒*_G w in all base cases. Induction: Suppose |w| ≥ 2. Since w = w^R, we have w = 0x0 or w = 1x1, with x = x^R. If w = 0x0, we know from the IH that S ⇒* x. Then S ⇒ 0S0 ⇒* 0x0 = w. The case w = 1x1 is similar. 10 / 44

Proof ("only if" part). We assume that w ∈ L(G_pal) and must show that w = w^R. Since w ∈ L(G_pal), we have S ⇒* w. We prove the claim by induction on the length of the derivation. Basis: The derivation S ⇒* w is done in one step. Then w must be ε, 0, or 1, all palindromes. Induction: The IH is that every x derivable from S in n steps satisfies x = x^R. Suppose the derivation of w takes n + 1 steps. Then we must have S ⇒ 0S0 ⇒* 0x0 = w or S ⇒ 1S1 ⇒* 1x1 = w, where S ⇒* x in n steps. By the IH, x = x^R, and hence w = w^R. 11 / 44

Example: Show that L = {a^n b^m | n ≠ m} is a CFL. Solution. Note that the CFG G = ({S}, {a, b}, S, P) with productions S → aSb | ε leads to L(G) = {a^n b^n | n ≥ 0}. To take care of the case n > m, we first generate a string with an equal number of a's and b's, then add extra a's on the left: S → AS_1, S_1 → aS_1b | ε, A → aA | a. We use similar reasoning for the case n < m. Thus, a CFG for L is given by S → AS_1 | S_1B, S_1 → aS_1b | ε, A → aA | a, B → bB | b. 12 / 44
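One way to sanity-check a grammar like this is to expand it exhaustively up to a length bound and inspect the generated strings. The Python sketch below is illustrative only (S_1 is renamed X so that every variable is a single uppercase letter):

```python
def generate(grammar, start, max_len):
    """Expand a CFG by rewriting the leftmost variable, collecting
    all terminal strings of length <= max_len.  Variables are single
    uppercase letters; terminals are lowercase."""
    results, frontier = set(), [start]
    while frontier:
        form = frontier.pop()
        # Locate the leftmost variable in the sentential form.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            if len(form) <= max_len:
                results.add(form)
            continue
        # Prune forms whose terminals alone already exceed the bound.
        if sum(not c.isupper() for c in form) > max_len:
            continue
        for body in grammar[form[i]]:
            frontier.append(form[:i] + body + form[i + 1:])
    return results

# S -> A S1 | S1 B,  S1 -> a S1 b | eps,  A -> aA | a,  B -> bB | b
g = {"S": ["AX", "XB"], "X": ["aXb", ""], "A": ["aA", "a"], "B": ["bB", "b"]}
words = generate(g, "S", 6)
assert all(w.count("a") != w.count("b") for w in words)  # n != m holds
assert "aab" in words and "abb" in words
```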

Leftmost and Rightmost Derivations In CFGs that are not linear, a derivation may involve sentential forms with more than one variable. In such cases, we have a choice in the order in which variables are replaced. A derivation is said to be leftmost/rightmost if in each step the leftmost/rightmost variable in the sentential form is replaced. 13 / 44

Consider G = ({A, B, S}, {a, b}, S, P) with productions 1. S → AB, 2. A → aaA, 3. A → ε, 4. B → Bb, 5. B → ε. The following two derivations use the same productions but apply them in a different order, and produce the same sentence: S ⇒(1) AB ⇒(2) aaAB ⇒(3) aaB ⇒(4) aaBb ⇒(5) aab, and S ⇒(1) AB ⇒(4) ABb ⇒(2) aaABb ⇒(5) aaAb ⇒(3) aab. Note that L(G) = {a^(2n) b^m | n ≥ 0, m ≥ 0}. 14 / 44
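The two derivation orders can be replayed mechanically. A small Python sketch (not from the slides) that applies a fixed sequence of the numbered productions, rewriting the leftmost occurrence of each production's head:

```python
def derive(grammar, steps):
    """Replay a sequence of numbered productions, rewriting the
    leftmost occurrence of each production's head, and return all
    sentential forms along the way."""
    form, history = "S", ["S"]
    for n in steps:
        head, body = grammar[n]
        i = form.index(head)             # leftmost occurrence of the head
        form = form[:i] + body + form[i + 1:]
        history.append(form)
    return history

# 1. S -> AB, 2. A -> aaA, 3. A -> eps, 4. B -> Bb, 5. B -> eps
g = {1: ("S", "AB"), 2: ("A", "aaA"), 3: ("A", ""),
     4: ("B", "Bb"), 5: ("B", "")}

d1 = derive(g, [1, 2, 3, 4, 5])    # leftmost-style order
d2 = derive(g, [1, 4, 2, 5, 3])    # rightmost-style order
assert d1[-1] == d2[-1] == "aab"   # same sentence, different orders
```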

Parse Trees Definition An ordered tree for a CFG G is a parse tree for G if and only if: 1. The root is labeled S. 2. Every leaf has a label from T ∪ {ε}. 3. Every interior vertex (a vertex which is not a leaf) is labeled by a variable in V. 4. If a vertex is labeled A and its children are labeled a_1, a_2, ..., a_n from left to right, then P must contain the production A → a_1 a_2 ··· a_n. 5. If a leaf is labeled ε, then it must be the only child of its parent. 15 / 44

More to Say about Parse Trees... A parse tree tells us the syntactic structure of w. It is an alternative representation to derivations and recursive inference. There can be several parse trees for the same string (ambiguity). Ideally there should be only one parse tree (the "true" structure) for each string, i.e., the language should be unambiguous. Unfortunately, we cannot always remove the ambiguity. 16 / 44

Example: In the grammar E → I, E → E + E, E → E ∗ E, E → (E), the following is the parse tree showing the derivation E ⇒* I + E: the root E has children E, +, E, and the left child E has the single child I. 17 / 44

Example: In the grammar P → ε | 0 | 1 | 0P0 | 1P1, the following is the parse tree showing the derivation P ⇒* 0110: the root P has children 0, P, 0; the inner P has children 1, P, 1; and the innermost P has the single child ε. 18 / 44

The Yield of a Parse Tree The yield of a parse tree is the string of leaf labels read from left to right. The important parse trees are those where: 1. The yield is a terminal string. 2. The root is labeled by the start symbol. We shall see that the set of yields of these important parse trees is exactly the language of the grammar. 19 / 44
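Representing a parse tree as nested tuples makes the yield a one-line recursion. A Python sketch (illustrative; the tree is the one for P ⇒* 0110 from the palindrome grammar):

```python
# A parse tree as nested tuples: (label, child_1, ..., child_n);
# a leaf is a bare string ("" standing in for epsilon).
# Tree for P =>* 0110 in the palindrome grammar:
tree = ("P", "0", ("P", "1", ("P", ""), "1"), "0")

def yield_of(t):
    """Concatenate the leaf labels from left to right."""
    if isinstance(t, str):
        return t
    return "".join(yield_of(child) for child in t[1:])

assert yield_of(tree) == "0110"
```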

(Parse tree figure not reproduced.) The yield of the tree is a ∗ (a + b00). 20 / 44

Let G = (V, T, S, P) be a CFG and A ∈ V. We will show that the following are equivalent: 1. We can determine by recursive inference that w is in the language of variable A. 2. A ⇒* w. 3. A ⇒*_lm w (leftmost derivation). 4. A ⇒*_rm w (rightmost derivation). 5. There is a parse tree of G with root A and yield w. 21 / 44

Plan of the equivalence proof: recursive inferences give parse trees, parse trees give derivations, and derivations give recursive inferences. 22 / 44

From Inferences to Trees Theorem Let G = (V, T, S, P) be a CFG. If the recursive inference procedure tells us that terminal string w is in the language of variable A, then there is a parse tree with root A and yield w. Proof. We do an induction on the length of the inference. Basis: One step. Then we must have used a production A → w. The desired parse tree has root A with the symbols of w as its children. 23 / 44

Induction: w is inferred in n + 1 steps. Suppose that the last step was based on a production A → X_1 X_2 ··· X_k, where X_i ∈ V ∪ T. We break w up as w_1 w_2 ··· w_k, where w_i = X_i when X_i ∈ T; and when X_i ∈ V, then w_i was previously inferred to be in the language of X_i, in at most n steps. By the IH, there are parse trees T_i with root X_i and yield w_i. Attaching these trees as children of a root labeled A gives a parse tree for G with root A and yield w. 24 / 44

From Trees to Derivations We will show how to construct a leftmost derivation from a parse tree. Example: In the expression grammar of slide 8, there clearly is a derivation E ⇒ I ⇒ Ib ⇒ ab. Then, for any α and β, there is a derivation αEβ ⇒ αIβ ⇒ αIbβ ⇒ αabβ. For example, suppose we have a derivation E ⇒ E + E ⇒ E + (E). We can choose α = E + ( and β = ) and continue the derivation as E + (E) ⇒ E + (I) ⇒ E + (Ib) ⇒ E + (ab). This is why CFGs are called context-free. 25 / 44

Theorem Let G = (V, T, S, P) be a CFG and suppose there is a parse tree with root labeled A and yield w. Then A ⇒*_lm w in G. Proof. We do an induction on the height of the parse tree. Basis: Height is 1. The tree consists of root A with the symbols of w as its children. Consequently A → w ∈ P and A ⇒_lm w. 26 / 44

Induction: Height is n + 1. The root of the tree is A, with children X_1, X_2, ..., X_k, which are the roots of subtrees with yields w_1, w_2, ..., w_k. Then w = w_1 w_2 ··· w_k, where: 1. If X_i ∈ T, then w_i = X_i. 2. If X_i ∈ V, then X_i ⇒*_lm w_i in G by the IH. 27 / 44

Now we construct A ⇒*_lm w by an inner induction, showing that for all i: A ⇒*_lm w_1 w_2 ··· w_i X_{i+1} X_{i+2} ··· X_k. Basis (inner): Let i = 0. We already know that A ⇒_lm X_1 X_2 ··· X_k. Induction (inner): Make the IH that A ⇒*_lm w_1 w_2 ··· w_{i−1} X_i X_{i+1} ··· X_k. 28 / 44

Case 1: X_i ∈ T. Do nothing; since X_i = w_i, we already have A ⇒*_lm w_1 w_2 ··· w_i X_{i+1} X_{i+2} ··· X_k. 29 / 44

Case 2: X_i ∈ V. By the IH there is a derivation X_i ⇒_lm α_1 ⇒_lm α_2 ⇒_lm ··· ⇒_lm w_i. By the context-free property of derivations we can proceed with A ⇒*_lm w_1 w_2 ··· w_{i−1} X_i X_{i+1} ··· X_k ⇒_lm w_1 w_2 ··· w_{i−1} α_1 X_{i+1} ··· X_k ⇒_lm w_1 w_2 ··· w_{i−1} α_2 X_{i+1} ··· X_k ⇒_lm ··· ⇒_lm w_1 w_2 ··· w_{i−1} w_i X_{i+1} ··· X_k. 30 / 44

Example: Let's construct the leftmost derivation for the tree with yield a ∗ (a + b00). (Tree figure not reproduced.) 31 / 44

Suppose we have inductively constructed the leftmost derivation E ⇒ I ⇒ a corresponding to the leftmost subtree, and the leftmost derivation E ⇒ (E) ⇒ (E + E) ⇒ (I + E) ⇒ (a + E) ⇒ (a + I) ⇒ (a + I0) ⇒ (a + I00) ⇒ (a + b00) corresponding to the rightmost subtree. 32 / 44

For the derivation corresponding to the whole tree, we start with E ⇒ E ∗ E and expand the first E using the first derivation and the second E using the second derivation: E ⇒ E ∗ E ⇒ I ∗ E ⇒ a ∗ E ⇒ a ∗ (E) ⇒ a ∗ (E + E) ⇒ a ∗ (I + E) ⇒ a ∗ (a + E) ⇒ a ∗ (a + I) ⇒ a ∗ (a + I0) ⇒ a ∗ (a + I00) ⇒ a ∗ (a + b00). 33 / 44

From Derivations to Recursive Inferences Observation: Suppose that A ⇒ X_1 X_2 ··· X_k ⇒* w. Then we can break w as w = w_1 w_2 ··· w_k, where X_i ⇒* w_i. The factor w_i can be extracted from A ⇒* w by looking at the expansion of X_i only. Example: E ⇒* a ∗ b + a, and E ⇒* E ∗ E + E, where X_1 = E, X_2 = ∗, X_3 = E, X_4 = +, X_5 = E. We have E ⇒ E ∗ E ⇒ E ∗ E + E ⇒ I ∗ E + E ⇒ I ∗ I + E ⇒ I ∗ I + I ⇒ a ∗ I + I ⇒ a ∗ b + I ⇒ a ∗ b + a. By looking at the expansion of X_3 = E only, we can extract E ⇒ I ⇒ b. 34 / 44

Theorem Let G = (V, T, S, P) be a CFG, and suppose A ⇒*_G w, where w is a string of terminals. Then we can infer that w is in the language of variable A. Proof. We do an induction on the length of the derivation A ⇒*_G w. Basis: One step. If A ⇒_G w, there must be a production A → w in P. Then we can infer that w is in the language of A. Induction: Suppose A ⇒*_G w in n + 1 steps. Write the derivation as A ⇒_G X_1X_2···X_k ⇒*_G w. As noted on the previous slide, we can break w as w_1w_2···w_k, where X_i ⇒*_G w_i. Furthermore, each X_i ⇒*_G w_i uses at most n steps. Now we have a production A → X_1X_2···X_k, and we know by the IH that we can infer each w_i to be in the language of X_i. Therefore, we can infer w_1w_2···w_k to be in the language of A. 35 / 44

Ambiguity in Grammars and Languages: Example In the grammar E → I, E → E + E, E → E ∗ E, E → (E), the sentential form E + E ∗ E has two derivations: E ⇒ E + E ⇒ E + E ∗ E, and E ⇒ E ∗ E ⇒ E + E ∗ E. 36 / 44

This gives us two parse trees. Left-hand tree: the second and third expressions are multiplied and the result is added to the first expression (e.g., 1 + (2 ∗ 3) = 7). Right-hand tree: the first two expressions are added and the result is multiplied by the third (e.g., (1 + 2) ∗ 3 = 9). 37 / 44

Ambiguity in Grammars and Languages Definition A CFG G is said to be ambiguous if there exists some w ∈ L(G) that has at least two distinct parse trees. Definition A CFL L is said to be inherently ambiguous if all its grammars are ambiguous. Definition If L is a CFL for which there exists an unambiguous grammar, then L is said to be unambiguous. If even one grammar for L is unambiguous, then L is an unambiguous language. 38 / 44

Removing Ambiguity from Grammars Good news: sometimes we can remove ambiguity by hand. Bad news: there is no algorithm to do it. More bad news: some CFLs have only ambiguous CFGs. 39 / 44

Let us consider the grammar E → I | E + E | E ∗ E | (E), I → a | b | Ia | Ib | I0 | I1. There are two problems: 1. There is no precedence between ∗ and +. 2. There is no grouping of sequences of operators; e.g., is E + E + E meant to be (E + E) + E or E + (E + E)? 40 / 44

Solution: We introduce more variables, each representing expressions that share a level of binding strength. 1. A factor is an expression that cannot be broken apart by an adjacent ∗ or +. Our factors are: 1.1 identifiers, 1.2 parenthesized expressions. 2. A term is an expression that cannot be broken apart by +. A term is a product of one or more factors. For instance, a ∗ b can be broken by an adjacent ∗, as in a1 ∗ a ∗ b or a ∗ b ∗ a1. It cannot be broken by +, since, e.g., a1 + a ∗ b is (by precedence rules) the same as a1 + (a ∗ b), and a ∗ b + a1 is the same as (a ∗ b) + a1. 3. The rest are expressions, i.e., they can be broken apart by an adjacent ∗ or +. 41 / 44

We will let F stand for factors, T for terms, and E for expressions. Consider the following grammar: I → a | b | Ia | Ib | I0 | I1, F → I | (E), T → F | T ∗ F, E → T | E + T. Now the only parse tree for a + a ∗ a has root E with children E, +, T: the left E derives a through T, F, and I, while the right T has children T, ∗, F, each deriving a through F and I. 42 / 44
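The unambiguity of this grammar is what makes it directly implementable by a deterministic parser. A recursive-descent sketch in Python (illustrative, not from the slides; the left recursion in T and E is replaced by iteration, which preserves the left-associative grouping the grammar encodes):

```python
# Recursive-descent recognizer for the unambiguous grammar
#   E -> T | E + T,  T -> F | T * F,  F -> I | (E),
#   I -> a | b | Ia | Ib | I0 | I1.
def parse_expr(s: str):
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1
            e = expr()
            assert peek() == ")", "expected ')'"
            pos += 1
            return e
        assert peek() in ("a", "b"), "expected identifier"
        tok = peek()
        pos += 1
        while peek() in ("a", "b", "0", "1"):   # identifier continuation
            tok += peek()
            pos += 1
        return tok

    def term():
        nonlocal pos
        t = factor()
        while peek() == "*":                    # T -> T * F, iteratively
            pos += 1
            t = (t, "*", factor())
        return t

    def expr():
        nonlocal pos
        e = term()
        while peek() == "+":                    # E -> E + T, iteratively
            pos += 1
            e = (e, "+", term())
        return e

    result = expr()
    assert pos == len(s), "trailing input"
    return result

# a + a * a parses uniquely, with * binding tighter than +:
assert parse_expr("a+a*a") == ("a", "+", ("a", "*", "a"))
```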

Why is the grammar shown on the previous slide unambiguous? A factor is either an identifier or (E), for some expression E. The only parse tree for a sequence f_1 ∗ f_2 ∗ ··· ∗ f_{n−1} ∗ f_n of factors is the one that gives f_1 ∗ f_2 ∗ ··· ∗ f_{n−1} as a term and f_n as a factor, as in the parse tree on the next slide. An expression is a sequence t_1 + t_2 + ··· + t_{n−1} + t_n of terms t_i. It can only be parsed with t_1 + t_2 + ··· + t_{n−1} as an expression and t_n as a term. 43 / 44

Parse tree for a sequence of factors: the root T has children T, ∗, F; the left T again has children T, ∗, F; and so on down to a single F. 44 / 44