Context-free Grammars and Languages


Context-free Grammars and Languages. COMP 455 002, Spring 2019. Jim Anderson (modified by Nathan Otterness).

Context-free Grammars

Context-free grammars provide another way to specify languages.

Example: a context-free grammar for mathematical expressions:

E → E + E | E * E | (E) | i

Show that a string is in the language using a derivation:

E ⇒ E + E ⇒ E + E * E ⇒ i + E * E ⇒ i + i * E ⇒ i + i * i

Formal Definition of CFGs

A context-free grammar (CFG) is denoted using a 4-tuple G = (V, T, P, S), where:
- V is a finite set of variables,
- T is a finite set of terminals,
- P is a finite set of productions of the form variable → string (the variable is the head, the string is the body),
- S is the start symbol (S is a variable in V).

Formal CFG Definition: Example

To define our example grammar using this tuple notation:
V = {E}
T = {+, *, (, ), i}
P is the set of rules defined previously: E → E + E, E → E * E, E → (E), E → i
S = E
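The 4-tuple can be made concrete with a small Python sketch. The encoding below (sets for V and T, a dict mapping each head to its bodies, and a `step` helper) is illustrative and not part of the lecture; it treats every symbol as a single character, with "i" as the identifier terminal.

```python
# Illustrative Python encoding of G = (V, T, P, S) for the expression grammar.
V = {"E"}                                 # variables
T = {"+", "*", "(", ")", "i"}             # terminals
P = {"E": ["E+E", "E*E", "(E)", "i"]}     # productions: head -> list of bodies
S = "E"                                   # start symbol

def step(sentential_form):
    """All strings directly derivable from `sentential_form` in one step."""
    results = []
    for pos, sym in enumerate(sentential_form):
        if sym in V:                      # replace one occurrence of a variable
            for body in P[sym]:
                results.append(sentential_form[:pos] + body + sentential_form[pos + 1:])
    return results

# The first derivation step from the start symbol:
print(step(S))                  # the four bodies of E
print("i+E*E" in step("E+E*E")) # replacing the leftmost E with i
```

Calling `step` repeatedly reproduces derivations like the one on the earlier slide, one production application at a time.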

More CFG Examples

In our discussion of the Pumping Lemma for regular languages, we discussed the following language:
L = { x | x = x^R, x ∈ (0 + 1)* }  (the palindromes over {0, 1})
Can we show this language is context-free? Yes:
V = {R}
T = {0, 1}
S = R
P = { R → 0R0, R → 1R1, R → 0, R → 1, R → ε }

More CFG Examples

What about the language L consisting of all strings containing an equal number of 0s and 1s?
V = {R}
T = {0, 1}
S = R
P = { R → 0R1R, R → 1R0R, R → ε }
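One can spot-check a grammar like this by brute force. The sketch below (illustrative, not from the lecture) derives every terminal string of length at most 6 from R → 0R1R | 1R0R | ε by breadth-first search over leftmost expansions, and confirms each derived string really has equally many 0s and 1s.

```python
from collections import deque

PRODUCTIONS = ["0R1R", "1R0R", ""]   # bodies for the single variable R
MAX_LEN = 6

def derivable(max_len):
    """All terminal strings with at most max_len symbols derivable from R."""
    seen, found = set(), set()
    queue = deque(["R"])
    while queue:
        form = queue.popleft()
        if "R" not in form:
            found.add(form)
            continue
        pos = form.index("R")        # expand the leftmost R (leftmost derivations suffice)
        for body in PRODUCTIONS:
            new = form[:pos] + body + form[pos + 1:]
            # prune: terminals already present never disappear
            if sum(c != "R" for c in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

strings = derivable(MAX_LEN)
print(len(strings))
assert all(s.count("0") == s.count("1") for s in strings)
```

This only checks soundness (everything derived has equal counts) on short strings; completeness, that every equal-count string is derivable, needs a separate argument.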

A Historical Note

We are talking about context-free languages, but what about a language that is not context-free? Such languages exist; one important larger class is the context-sensitive languages. Context-sensitive grammars allow production rules whose heads are strings rather than single variables, e.g., 1S0 → 110. Context-sensitive languages were used in the study of natural languages, but ended up with few practical applications.

Derivations

We will be following the notational conventions from page 178 of the textbook (Section 5.1.4):
- lowercase Greek letters (α, β, γ, …) denote strings of variables and terminals;
- uppercase letters near the start of the alphabet (A, B, …) denote variables.

We say that string α1 directly derives α2 if and only if α1 = αAγ, α2 = αβγ, and A → β is a production rule in P. This is denoted αAγ ⇒G αβγ: a derivation using a single application of a production rule of the grammar G. (We can omit the G if the grammar we're talking about is obvious.)

Derivations (continued)

α1 ⇒*G αm means α1 derives αm in zero or more steps, i.e., α1 ⇒ α2, α2 ⇒ α3, …, αm−1 ⇒ αm.
α ⇒i β means α derives β in exactly i steps.
α is a sentential form if and only if S ⇒* α.

Leftmost and Rightmost Derivations

It can be useful to restrict a derivation to replace only the leftmost variable in the string at each step. This is called a leftmost derivation. Steps in a leftmost derivation are indicated using ⇒lm for a single step or ⇒*lm for many steps. A string encountered during a leftmost derivation is called a left-sentential form, i.e., α is a left-sentential form if and only if S ⇒*lm α.

Leftmost and Rightmost Derivations

Similarly, a rightmost derivation replaces only the rightmost variable at each step. Steps in a rightmost derivation are indicated using ⇒rm or ⇒*rm. A right-sentential form is a string encountered during a rightmost derivation from the start symbol.

Leftmost and Rightmost Derivations

Example using the grammar from before: E → E + E | E * E | (E) | i. Three derivations of i + i * i:

First example (mixed): E ⇒ E + E ⇒ E + E * E ⇒ i + E * E ⇒ i + i * E ⇒ i + i * i
Leftmost:              E ⇒ E + E ⇒ i + E ⇒ i + E * E ⇒ i + i * E ⇒ i + i * i
Rightmost:             E ⇒ E + E ⇒ E + E * E ⇒ E + E * i ⇒ E + i * i ⇒ i + i * i
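The leftmost and rightmost derivations above can be replayed mechanically: at each step, pick the leftmost (or rightmost) variable and substitute a chosen body. The sketch below (illustrative encoding, symbols as single characters) reproduces both derivations of i+i*i.

```python
V = {"E"}   # the only variable in the expression grammar

def expand(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) variable occurrence in form with body."""
    positions = [i for i, s in enumerate(form) if s in V]
    pos = positions[0] if leftmost else positions[-1]
    return form[:pos] + body + form[pos + 1:]

# Leftmost derivation: choose a body for the leftmost E at each step.
form, forms = "E", ["E"]
for body in ["E+E", "i", "E*E", "i", "i"]:
    form = expand(form, body, leftmost=True)
    forms.append(form)
print(" => ".join(forms))    # E => E+E => i+E => i+E*E => i+i*E => i+i*i

# Rightmost derivation of the same string.
form, rforms = "E", ["E"]
for body in ["E+E", "E*E", "i", "i", "i"]:
    form = expand(form, body, leftmost=False)
    rforms.append(form)
print(" => ".join(rforms))   # E => E+E => E+E*E => E+E*i => E+i*i => i+i*i
```

Both derivations end in the same terminal string, differing only in the order variables are expanded.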

The Language of a CFG

For a CFG G, L(G) = { w | w ∈ T* and S ⇒*G w } (that is, w consists only of terminal symbols). L is a context-free language if and only if L = L(G) for some CFG G. Grammars G1 and G2 are equivalent if and only if L(G1) = L(G2).

Showing Membership in a CFG

Demonstrating that a string is in the language of a CFG can be accomplished in two ways:
- Top-down: give a derivation of the string, i.e., begin with the start symbol and use production rules to create the string.
- Bottom-up: start with the string and try to apply production rules backwards, ending with the start symbol alone.
We will now consider a technique called recursive inference, which is essentially a bottom-up approach.

Recursive Inference

Define a language L(X) for each variable X: L(X) contains all strings that can be derived from X. If A → X1 X2 ⋯ Xn is a production rule, then every string x1 x2 ⋯ xn is in L(A), where:
- if Xi is a terminal symbol, then xi = Xi;
- if Xi is a variable, then xi is any string in L(Xi).
Productions with only terminal symbols in the body give us the base case. (So we essentially end up applying productions backwards.) A string x is in L(G) if and only if it is in L(S), the set of strings that can be derived from the start symbol S.

Recursive Inference

The goal of recursive inference is to look at successively larger substrings of some string x to determine whether x is in L(S).
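This substring idea can be sketched in code. The function below (illustrative; a memoized top-down variant of the bottom-up idea, applied to the equal-0s-and-1s grammar R → 0R1R | 1R0R | ε) decides whether each substring of the input can be produced by each sequence of grammar symbols, so membership of x in L(R) is inferred from membership of its smaller substrings.

```python
from functools import lru_cache

PRODUCTIONS = {"R": ["0R1R", "1R0R", ""]}
VARIABLES = set(PRODUCTIONS)

def in_language(x):
    """Recursive inference for the grammar R -> 0R1R | 1R0R | eps."""
    @lru_cache(maxsize=None)
    def derives(symbols, s):
        """Can the symbol string `symbols` produce the terminal string `s`?"""
        if not symbols:
            return s == ""
        head, rest = symbols[0], symbols[1:]
        if head not in VARIABLES:              # terminal: must match the next character
            return s[:1] == head and derives(rest, s[1:])
        # variable: try every body and every split of s between head and rest
        return any(
            derives(body, s[:k]) and derives(rest, s[k:])
            for k in range(len(s) + 1)
            for body in PRODUCTIONS[head]
        )
    return derives("R", x)

print(in_language("0101"), in_language("0100"))
```

Chapter 7's CYK algorithm systematizes the same substring table for grammars in Chomsky Normal Form.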

Recursive Inference: Example

(This example is from Figure 5.3 in the book.) Grammar G for simple expressions: V = {E, I}, T = {a, b, 0, 1, +, *, (, )}, E is the start symbol. Production rules:
1. E → I        6. I → b
2. E → E + E    7. I → Ia
3. E → E * E    8. I → Ib
4. E → (E)      9. I → I0
5. I → a       10. I → I1

We want to use recursive inference to show that a * (a + b00) is in L(G):
i.    a ∈ L(I), by production rule 5
ii.   b ∈ L(I), by production rule 6
iii.  b0 ∈ L(I), by production rule 9 and (ii)
iv.   b00 ∈ L(I), by production rule 9 and (iii)
v.    a ∈ L(E), by production rule 1 and (i)
vi.   b00 ∈ L(E), by production rule 1 and (iv)
vii.  a + b00 ∈ L(E), by production rule 2, (v), and (vi)
viii. (a + b00) ∈ L(E), by production rule 4 and (vii)
ix.   a * (a + b00) ∈ L(E), by production rule 3, (v), and (viii)

Parse Trees

Parse trees show how the symbols of a string are grouped into substrings, as well as which variables and productions are used. In general, the root is S, internal nodes are variables, and leaves are variables or terminals. If the production A → X1 ⋯ Xn is applied at an internal node labeled A, that node's children are X1, …, Xn.

Parse Tree Example

Example grammar: S → aAS | a, A → SbA | SS | ba. (Note the new notation: | separates alternative bodies for the same head.)

An example parse tree:

        S
      / | \
     a  A  S
      / | \
     S  b  A
     |
     a

A Parse Tree's Yield

Example grammar, again: S → aAS | a, A → SbA | SS | ba.

The yield of a parse tree is the string obtained by reading its leaves left to right. The yield of the tree below is aabAS. Note that the yield of a parse tree is a sentential form.

        S
      / | \
     a  A  S
      / | \
     S  b  A
     |
     a
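Computing a yield is a one-line tree walk. In the sketch below a parse tree is encoded as nested `(label, children)` tuples with leaves as plain strings; this encoding is illustrative, not from the lecture.

```python
def tree_yield(node):
    """Concatenate a parse tree's leaves from left to right."""
    if isinstance(node, str):          # leaf: a terminal or an unexpanded variable
        return node
    _label, children = node
    return "".join(tree_yield(c) for c in children)

# The tree from this slide: S -> a A S, A -> S b A, inner S -> a;
# the rightmost S and the inner A are unexpanded leaves.
tree = ("S", ["a", ("A", [("S", ["a"]), "b", "A"]), "S"])
print(tree_yield(tree))   # aabAS, a sentential form
```

Because unexpanded variables stay in the result, the yield is a sentential form in general, and a terminal string only when every leaf is a terminal.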

Inference, Derivation, and Parse Trees

We will show that all of these are equivalent ways of demonstrating that a string is in a CFL. Specifically, we show:

  Recursive Inference → Parse Tree             (Theorem 5.12)
  Parse Tree → Leftmost/Rightmost Derivation   (Theorem 5.14)
  Derivation → Recursive Inference             (Theorem 5.18)

From Recursive Inference to Parse Tree

Theorem 5.12: Let G = (V, T, P, S) be a CFG. If recursive inference tells us that string w ∈ T* is in the language of variable A ∈ V, then there is a parse tree with root A and yield w.

We will prove this by induction on the number of steps in the recursive inference.

Base case: one step. This means there is a production rule A → w. The tree for this is the root A with leaf children x1, x2, …, xn, where w = x1 x2 ⋯ xn.

From Recursive Inference to Parse Tree

Inductive step: assume the last inference step used the production A → X1 X2 ⋯ Xn, and that previous inference steps verified xi ∈ L(Xi) for each xi in w = x1 x2 ⋯ xn. The inductive hypothesis lets us assume we already have, for each variable Xi, a parse tree with root Xi yielding the terminal string xi. Attach these subtrees (and the terminal leaves) as children X1, X2, …, Xn of a new root A; the resulting tree has root A and yield w.

From Parse Tree to Derivation

Theorem 5.14: Let G = (V, T, P, S) be a CFG, and suppose there is a parse tree whose root is labeled by variable A and whose yield is w ∈ T*. Then there is a leftmost derivation A ⇒*lm w in G.

We will prove this by induction on tree height.

Base case: the tree's height is one: the root A with leaf children X1, X2, …, Xn. So there must be a production A → X1 X2 ⋯ Xn in G, where w = X1 X2 ⋯ Xn.

From Parse Tree to Derivation

Inductive step: the tree's height exceeds 1, so the root A has children X1, X2, …, Xn−1, Xn, each the root of a subtree with yield xi, where w = x1 x2 ⋯ xn. Note that some children may be terminal leaves (like X2 = x2 and Xn = xn) while others are variables rooting deeper subtrees (like X1 and Xn−1).

From Parse Tree to Derivation

Inductive step (continued): by the inductive hypothesis, X1 ⇒*lm x1, Xn−1 ⇒*lm xn−1, etc. Trivially, X2 ⇒*lm x2, Xn ⇒*lm xn, etc., because these Xi are terminals (Xi = xi). Since A ⇒ X1 X2 ⋯ Xn−1 Xn and w = x1 x2 ⋯ xn−1 xn, stringing these leftmost derivations together left to right gives A ⇒*lm w.

Notes on Derivations from Parse Trees

The leftmost derivation corresponding to a parse tree is unique. We can prove the same conversion is possible for rightmost derivations; such a rightmost derivation is also unique.

Example (grammar S → aAS | a, A → SbA | SS | ba):

        S
      / | \
     a  A   S
      / | \  \
     S  b  A  a
     |     |
     a     ba

Leftmost derivation:  S ⇒ aAS ⇒ aSbAS ⇒ aabAS ⇒ aabbaS ⇒ aabbaa.
Rightmost derivation: S ⇒ aAS ⇒ aAa ⇒ aSbAa ⇒ aSbbaa ⇒ aabbaa.
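The Theorem 5.14 construction can be sketched directly: keep a frontier of tree nodes, and repeatedly replace the leftmost (or rightmost) still-unexpanded node by its children, recording the sentential form after each step. The `(label, children)` tuple encoding below is illustrative.

```python
def derivation(tree, leftmost=True):
    """Sentential forms of the derivation encoded by a (label, children) parse tree."""
    label = lambda n: n if isinstance(n, str) else n[0]
    frontier = [tree]
    forms = ["".join(label(n) for n in frontier)]
    while True:
        internal = [i for i, n in enumerate(frontier) if not isinstance(n, str)]
        if not internal:                       # all nodes expanded: derivation done
            return forms
        idx = internal[0] if leftmost else internal[-1]
        _lab, children = frontier[idx]         # apply the production at this node
        frontier = frontier[:idx] + list(children) + frontier[idx + 1:]
        forms.append("".join(label(n) for n in frontier))

# The tree from this slide (yield aabbaa):
tree = ("S", ["a",
              ("A", [("S", ["a"]), "b", ("A", ["ba"])]),
              ("S", ["a"])])
lm = derivation(tree, leftmost=True)
rm = derivation(tree, leftmost=False)
print(" => ".join(lm))   # the leftmost derivation on this slide
print(" => ".join(rm))   # the rightmost derivation on this slide
```

Running it reproduces exactly the two derivations shown above, which illustrates why each is unique: the tree fixes every choice.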

From Derivation to Recursive Inference

Theorem 5.18: Let G = (V, T, P, S) be a CFG, w ∈ T*, and A ∈ V. If a derivation A ⇒* w exists in grammar G, then w ∈ L(A) can be inferred via recursive inference.

We will prove this by induction on the length of the derivation.

Base case: the derivation has one step. This means A → w is a production, so clearly w ∈ L(A) can be inferred.

From Derivation to Recursive Inference

Inductive step: there is more than one step in the derivation. We can write the derivation as A ⇒ X1 X2 ⋯ Xn ⇒* x1 x2 ⋯ xn = w. By the inductive hypothesis (applied to the shorter derivations Xi ⇒* xi), we can infer xi ∈ L(Xi) for every i. Then, since A → X1 X2 ⋯ Xn is clearly a production, we can infer w ∈ L(A).

Ambiguity

A grammar is ambiguous if some word in its language has multiple parse trees. Recall: this is equivalent to saying that some word has more than one leftmost (or more than one rightmost) derivation. Ambiguity is important to know about because parsers (e.g., in a compiler for a programming language) need to determine a program's structure from source code; this is complicated if multiple parse trees are possible.

Ambiguous Grammar: Example

E → E + E | E * E | id

The string id + id * id has two leftmost derivations:

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

and correspondingly two parse trees:

      E                 E
    / | \             / | \
   E  +  E           E  *  E
   |    /|\         /|\    |
  id   E * E       E + E   id
       |   |       |   |
      id  id      id  id
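Ambiguity can be detected for a specific string by exhaustive search: enumerate all leftmost derivations of the string and count them (leftmost derivations correspond one-to-one with parse trees). The sketch below is illustrative; it encodes sentential forms as lists of tokens so that "id" is a single symbol.

```python
PRODS = [["E", "+", "E"], ["E", "*", "E"], ["id"]]
TARGET = ["id", "+", "id", "*", "id"]

def leftmost_derivations(form, target):
    """All leftmost derivations (as lists of sentential forms) turning form into target."""
    if "E" not in form:
        return [[tuple(form)]] if form == target else []
    if len([s for s in form if s != "E"]) > len(target):
        return []                        # already more non-variables than target: dead end
    pos = form.index("E")                # always expand the leftmost variable
    results = []
    for body in PRODS:
        new = form[:pos] + body + form[pos + 1:]
        for tail in leftmost_derivations(new, target):
            results.append([tuple(form)] + tail)
    return results

derivs = leftmost_derivations(["E"], TARGET)
print(len(derivs))   # number of leftmost derivations of id+id*id
```

Two distinct leftmost derivations come back for id + id * id, confirming the grammar is ambiguous on that string; the same search on an unambiguous grammar would return at most one.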

Resolving Ambiguity

E → E + E | E * E | id

Ambiguity in this grammar is caused by the lack of operator precedence and associativity, and can be resolved by introducing more variables. For example, E → E + E | id, the part of the grammar causing ambiguity on its own, can be made unambiguous by adding a variable F, e.g., E → E + F | F, F → id, which forces + to group from the left. Section 5.4 of the book discusses this in more depth.

Inherent Ambiguity

A context-free language for which every possible CFG is ambiguous is called inherently ambiguous. One example (from the book) is:

L = { a^n b^n c^m d^m | m, n ≥ 1 } ∪ { a^n b^m c^m d^n | m, n ≥ 1 }.

Proving that a language is inherently ambiguous can be quite difficult. Such languages are encountered quite rarely, so this has little practical impact.