Context Free Grammars: Introduction

Context free grammar (CFG) G = (V, Σ, R, S), where
- V is a set of grammar symbols
- Σ ⊆ V is a set of terminal symbols
- R is a set of rules, where R ⊆ (V − Σ) × V*
- S ∈ V − Σ is a distinguished non-terminal symbol: the start symbol

Unlike RGs, CFGs can have any number of symbols on the right-hand side of a rule
RGs ⊂ CFGs

CFGs are more powerful than RGs because of the following 2 properties:

1. Recursion
   A rule is recursive if it is of the form X → w1 Y w2, where Y ⇒* w3 X w4 and w1, w2, w3, w4 ∈ V*
   A grammar is recursive if it has at least one recursive rule

2. Self-embedment
   A rule is self-embedding if it is of the form X → w1 Y w2, where Y ⇒* w3 X w4 and both w1 w3 and w4 w2 are in Σ+
   A grammar is self-embedding if it has at least one self-embedding rule
   If grammar G is not self-embedding, L(G) is regular
   If every grammar that defines language L is self-embedding, L is not regular

Context Free Grammars: Designing CFGs

The following heuristics can be helpful in designing CFGs to generate language L

1. If L is such that every string can be divided into 2 distinct regions that are related to each other, the regions must be generated in tandem
2. To generate strings with multiple regions that must occur in a fixed order, but where the regions are not related to each other, use rules of the form A → BC...
3. To generate strings with 2 regions that are related to each other and must occur in a fixed order, start at the ends of the string and work inward. If there is an independent region in the middle, generate it after the surrounding regions have been generated

Context Free Grammars: Simplifying CFGs

CFGs may have
1. Non-terminals that never terminate in terminal symbols
2. Symbols that are unreachable
Want to clean up such grammars

1. Eliminating symbols that do not terminate
   Strategy: Work from terminal symbols towards nonterminal symbols
   Any non-terminals that are not encountered are non-terminating
   Algorithm:

    CFG removeunproductive (CFG G) {
        marked = G.Sigma;                  // terminals are productive
        do {
            oldmarked = marked;
            for (each X -> alpha in G.R) {
                productive = TRUE;         // X is productive only if all of alpha is
                for (each s in alpha)
                    if (s not in marked) productive = FALSE;
                if (productive) marked = marked + X;
            }
        } while (marked != oldmarked);
        G'.R = NULL;
        for (each X -> alpha in G.R) {
            G'.R = G'.R + (X -> alpha);
            for (each symbol s in alpha)
                if (s not in marked) G'.R = G'.R - (X -> alpha);
        }
        G'.V = marked;
        G'.Sigma = G.Sigma;
        G'.S = G.S;
        return G';
    }
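The marking loop above can be sketched in Python. This is a minimal illustration, not the lecture's code: the `Rule` encoding (a pair of LHS and a tuple of RHS symbols, with `()` standing for an ɛ RHS), the function names, and the example grammar are all assumptions made here.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols); () encodes an epsilon RHS


def productive_symbols(terminals: Set[str], rules: List[Rule]) -> Set[str]:
    """Fixed point: X is productive once some rule X -> alpha has alpha all marked."""
    marked = set(terminals)  # terminals are productive by definition
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in marked and all(s in marked for s in rhs):
                marked.add(lhs)
                changed = True
    return marked


def remove_unproductive(terminals: Set[str], rules: List[Rule]) -> List[Rule]:
    """Keep only the rules whose RHS symbols are all productive."""
    marked = productive_symbols(terminals, rules)
    return [(lhs, rhs) for lhs, rhs in rules if all(s in marked for s in rhs)]


# Hypothetical example grammar: B can never derive a terminal string.
RULES: List[Rule] = [
    ("S", ("a", "A")),
    ("S", ("B",)),
    ("A", ("a",)),
    ("B", ("b", "B")),
]
CLEANED = remove_unproductive({"a", "b"}, RULES)
```

In this example only S → aA and A → a survive, because every rule that mentions B is dropped.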

Context Free Grammars: Simplifying CFGs (2)

2. Eliminating unreachable symbols
   Strategy: Work from start symbol towards terminal symbols
   Algorithm:

    CFG removeunreachable (CFG G) {
        marked = {G.S};                    // reachable symbols
        do {
            oldmarked = marked;
            for (each X -> alpha in G.R)
                if (X in marked)
                    for (each s in alpha) marked = marked + s;
        } while (marked != oldmarked);
        G'.V = G.V;
        for (each X in (G.V - G.Sigma))
            if (X not in marked) G'.V = G'.V - X;
        G'.Sigma = G.Sigma;
        for (each s in G.Sigma)
            if (s not in marked) {         // Sigma is a subset of V, so update both
                G'.Sigma = G'.Sigma - s;
                G'.V = G'.V - s;
            }
        G'.R = G.R;
        for (each (X -> alpha) in G.R)
            if (X not in marked) G'.R = G'.R - (X -> alpha);
        G'.S = G.S;
        return G';
    }
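A small Python sketch of the same reachability marking, under the assumption that rules are encoded as (LHS, RHS-tuple) pairs; the function names and the example grammar are illustrative only.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols)


def reachable_symbols(start: str, rules: List[Rule]) -> Set[str]:
    """Mark every symbol that appears in some sentential form derived from start."""
    marked = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in marked:
                for s in rhs:
                    if s not in marked:
                        marked.add(s)
                        changed = True
    return marked


def remove_unreachable(start: str, rules: List[Rule]) -> List[Rule]:
    """Drop every rule whose LHS cannot be reached from the start symbol."""
    marked = reachable_symbols(start, rules)
    return [(lhs, rhs) for lhs, rhs in rules if lhs in marked]


# Hypothetical example grammar: C (and the terminal c) cannot be reached from S.
RULES: List[Rule] = [
    ("S", ("a", "A")),
    ("A", ("a",)),
    ("C", ("c",)),
]
CLEANED = remove_unreachable("S", RULES)
```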

Context Free Grammars: Proving Grammar Is Correct

To prove grammar G is correct for a language L, must show
1. G generates only strings in L
2. G generates all strings in L

Most straightforward way to do the first step is to reason about the following algorithm:

    string generate (grammar G) {
        w = G.S;
        do {
            pick a nonterminal X in w, so that w = alpha X beta;
            apply some rule r = (X -> gamma) from G.R, giving w = alpha gamma beta;
        } while (w contains some X in (G.V - G.Sigma));
        return w;
    }

Proof constructed as follows:
1. Construct loop invariant I for above algorithm
2. Show that
   (a) I is true when loop starts
   (b) I holds on each iteration of loop
   (c) At termination, I implies w ∈ L

Context Free Grammars: Ambiguity

Issues arise with CFGs that are not inherent in RGs

1. CFG derivation order not obvious from parse tree
   Not an issue for RGs: there can only be a single NT on the RHS
   Example: Consider grammar

    <S>  → <NP><VP>
    <NP> → <A><N><PP> | <A><N>
    <VP> → <V><NP><PP> | <V><NP>
    <PP> → <P><NP>
    <N>  → boy
    <N>  → man
    <N>  → telescope
    <A>  → the
    <A>  → a
    <V>  → saw
    <P>  → with

Context Free Grammars: Ambiguity (2)

Derivations of "the man saw a boy"

    <S> ⇒ <NP><VP>
        ⇒ <A><N><VP>
        ⇒ the <N><VP>
        ⇒ the <N><V><NP>
        ⇒ the <N> saw <NP>
        ⇒ the <N> saw <A><N>
        ⇒ the <N> saw <A> boy
        ⇒ the <N> saw a boy
        ⇒ the man saw a boy

Vs

    <S> ⇒ <NP><VP>
        ⇒ <A><N><VP>
        ⇒ the <N><VP>
        ⇒ the man <VP>
        ⇒ the man <V><NP>
        ⇒ the man saw <NP>
        ⇒ the man saw <A><N>
        ⇒ the man saw a <N>
        ⇒ the man saw a boy

Left-most derivation: One in which the left-most NT is always replaced
Right-most derivation: One in which the right-most NT is always replaced
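The left-most discipline can be checked mechanically. The sketch below replays a sequence of rule applications and rejects any step that does not rewrite the left-most nonterminal. The angle brackets around nonterminals are dropped, and the function name and rule encoding are assumptions made for this illustration.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols)
NONTERMINALS = {"S", "NP", "VP", "A", "N", "V"}


def leftmost_derivation(start: str, steps: List[Rule]) -> List[str]:
    """Apply each rule to the left-most nonterminal; return all sentential forms."""
    form = [start]
    history = [" ".join(form)]
    for lhs, rhs in steps:
        # Index of the left-most nonterminal in the current sentential form.
        i = next((k for k, s in enumerate(form) if s in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"rule for {lhs} does not rewrite the left-most nonterminal")
        form[i:i + 1] = list(rhs)
        history.append(" ".join(form))
    return history


# Rule sequence matching the second derivation of "the man saw a boy".
STEPS: List[Rule] = [
    ("S", ("NP", "VP")),
    ("NP", ("A", "N")),
    ("A", ("the",)),
    ("N", ("man",)),
    ("VP", ("V", "NP")),
    ("V", ("saw",)),
    ("NP", ("A", "N")),
    ("A", ("a",)),
    ("N", ("boy",)),
]
HISTORY = leftmost_derivation("S", STEPS)
```

Replaying the first derivation's rule order instead raises `ValueError` at the step that expands <VP> while <N> is still the left-most nonterminal.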

Context Free Grammars: Ambiguity (3)

2. CFG may be ambiguous
   Grammar is ambiguous if there is more than one parse tree for the same string
   Example: [figure omitted]
   Grammar G is unambiguous iff, for all strings w ∈ L(G), there is only one rule that can be applied at any point in a left-most or right-most derivation
   RGs can be ambiguous, but the problem is more serious for CFGs
   - Structure modifies meaning of string
   - Cannot guarantee an equivalent unambiguous grammar for one that is ambiguous
   Inherently ambiguous: A language is inherently ambiguous if every grammar that generates it is ambiguous, so that no equivalent unambiguous grammar exists

To reduce ambiguity

1. Eliminate ɛ rules
   An ɛ rule is of the form X → ɛ
   Allows NTs that provide no structure
   NT X is nullable iff
   (a) There is a rule of the form X → ɛ, OR
   (b) There is a rule of the form X → PQR..., and P, Q, R, ... are all nullable
   A rule is modifiable iff it is of the form X → αQβ and Q is nullable, for any α, β ∈ V*

Context Free Grammars: Ambiguity (4)

Algorithm:

    CFG removeeps (CFG G) {
        nullable = NULL;
        for (each rule (X -> alpha) in G.R)
            if (alpha == epsilon) nullable = nullable + X;
        do {
            oldnullable = nullable;
            for (each rule (X -> alpha) in G.R) {
                if (alpha contains only NTs) {
                    flag = TRUE;
                    for (each Y in alpha)
                        if (Y not in nullable) flag = FALSE;
                    if (flag) nullable = nullable + X;
                }
            }
        } while (oldnullable != nullable);
        R' = G.R;
        do {
            oldR' = R';
            for (each (X -> alpha Q beta) in R')
                if (Q in nullable)
                    if (((X -> alpha beta) not in R') AND (alpha beta != epsilon) AND (X != alpha beta))
                        R' = R' + (X -> alpha beta);
        } while (oldR' != R');
        for (each (X -> alpha) in R')
            if (alpha == epsilon) R' = R' - (X -> alpha);
        G' = (G.V, G.Sigma, R', G.S);
        return G';
    }
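A compact Python sketch of removeeps, assuming rules are (LHS, RHS-tuple) pairs with `()` standing for an ɛ RHS. It uses a worklist in place of the second do-while loop, which is a swapped-in but equivalent way of reaching the same fixed point; names and the example grammar are hypothetical.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols); () encodes an epsilon RHS


def nullable_symbols(rules: List[Rule]) -> Set[str]:
    """X is nullable if some rule X -> alpha has alpha empty or all nullable."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_eps(rules: List[Rule]) -> Set[Rule]:
    """Add every variant with a nullable symbol dropped, then delete epsilon rules."""
    nullable = nullable_symbols(rules)
    result: Set[Rule] = set(rules)
    worklist = list(rules)
    while worklist:
        lhs, rhs = worklist.pop()
        for i, sym in enumerate(rhs):
            if sym in nullable:
                shorter = rhs[:i] + rhs[i + 1:]
                candidate = (lhs, shorter)
                # Skip X -> epsilon and X -> X, as in the pseudocode's guards.
                if shorter and shorter != (lhs,) and candidate not in result:
                    result.add(candidate)
                    worklist.append(candidate)
    return {(lhs, rhs) for lhs, rhs in result if rhs}


# Hypothetical example: S -> A B, A -> a | epsilon, B -> b | epsilon.
RULES: List[Rule] = [
    ("S", ("A", "B")),
    ("A", ("a",)),
    ("A", ()),
    ("B", ("b",)),
    ("B", ()),
]
CLEANED = remove_eps(RULES)
```

Like the pseudocode, this sketch drops ɛ from the language when the start symbol is nullable; the special case has to be handled separately.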

Context Free Grammars: Ambiguity (5)

Need to check a special case in removeeps: the language may include ɛ
Add a special rule for this case at the end of the algorithm:

    if (nullable(G.S)) {
        G'.V = G'.V + S';                  // S' is a new start symbol
        G'.R = G'.R - (S -> epsilon);
        G'.R = G'.R + (S' -> epsilon);
        G'.R = G'.R + (S' -> S);
        G'.S = S';
    }

2. Eliminating symmetric recursive rules
   Symmetric recursive rule:
   (a) Contains 2 or more copies of LHS on RHS
   (b) RHS is symmetric
   (c) E.g., X → X op X
   Grammars that have such rules will have additional rules for expanding the LHS
   To eliminate ambiguity, force branching to the left or right by adding new NT symbols
   - Replication of original LHS symbol achieved via that symbol
   - New symbol provides lower-level structure

3. Ambiguous attachment
   Problem associated with optional structures
   Given a recursive structure with such an optional structure, the problem is identifying to which level of recursion the structure should be attached
   Solution is to add additional rules to remove ambiguity

Proving non-ambiguity
   Determining whether an arbitrary CFG is ambiguous is undecidable, in general
   It can sometimes be determined for a specific grammar
   To prove a grammar is unambiguous, must show that every string in the language has a single left-most (or right-most) derivation
   General technique simply demonstrates, for each NT, that a given set of terminals can only be generated in one way
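For a single string, ambiguity can at least be observed by counting parse trees. The sketch below does this for the boy/man/telescope grammar from the ambiguity example, with angle brackets dropped. It assumes a grammar with no ɛ rules and no NT-to-NT unit rules (true of that grammar), and the test sentence "the man saw a boy with a telescope" is chosen here for illustration; it does not appear in the slides.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("A", "N", "PP"), ("A", "N")],
    "VP": [("V", "NP", "PP"), ("V", "NP")],
    "PP": [("P", "NP")],
    "N": [("boy",), ("man",), ("telescope",)],
    "A": [("the",), ("a",)],
    "V": [("saw",)],
    "P": [("with",)],
}


def count_parse_trees(sentence: str, start: str = "S") -> int:
    """Count distinct parse trees of the sentence (assumes no epsilon or NT unit rules)."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count_symbol(sym: str, i: int, j: int) -> int:
        if sym not in GRAMMAR:  # terminal: must match exactly one word
            return int(j == i + 1 and words[i] == sym)
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs: Tuple[str, ...], i: int, j: int) -> int:
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The first symbol covers words[i:k]; every later symbol needs >= 1 word.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count_symbol(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(words))
```

The prepositional phrase can attach either to the verb phrase or to the noun phrase "a boy", which is the attachment ambiguity described above.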

Context Free Grammars: Normal Forms

1. Chomsky Normal Form
   All rules are in one of the forms
   (a) X → a, where a ∈ Σ
   (b) X → BC, where B, C ∈ V − Σ
   Every parse tree of a CFG in CNF has a branching factor of 2
   Therefore, every derivation of string w has
   (a) |w| − 1 applications of rules of form X → BC, and
   (b) |w| applications of rules of form X → a

2. Greibach Normal Form
   All rules are in the form
   (a) X → aβ, where a ∈ Σ, β ∈ (V − Σ)*
   One terminal generated per rule application
   Therefore, every derivation of string w has |w| rule applications
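As a small worked check of the CNF counts, assume a hypothetical CNF grammar S → AB, A → a, B → b and the string w = ab. A binary tree with |w| leaves has |w| − 1 internal binary nodes, which gives:

```latex
S \Rightarrow AB \Rightarrow aB \Rightarrow ab,
\qquad |w| = 2, \qquad
\underbrace{1}_{|w|-1\ \text{rules}\ X \to BC}
+ \underbrace{2}_{|w|\ \text{rules}\ X \to a}
= 3 = 2|w| - 1 .
```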

Context Free Grammars: Converting to Normal Forms

Following strategy used:
1. Apply a transformation to the grammar to eliminate an undesirable property
   Language must not change as a result of the transform
2. Repeat, making sure undesirable properties are not re-introduced by subsequent transformations
3. Continue until desired form is achieved

Theorem 11.3: Rule Substitution
Statement:
   Let G = (V, Σ, R, S) be a CFG with a rule r of the form X → αYβ, where α, β ∈ V*, Y ∈ V − Σ
   Let Y → γ1 | γ2 | ... | γn be the set of rules in R with Y as LHS
   Let R' = (R − {r}) ∪ {X → αγ1β, X → αγ2β, ..., X → αγnβ}
   Let G' = (V, Σ, R', S)
   Then, L(G') = L(G)
Proof: (See p 235)

Context Free Grammars: Converting to Normal Forms - CNF

Theorem 11.1 CNF
Statement: Given CFG G, there is a CNF grammar G' such that L(G') = L(G) − {ɛ}
Proof: By construction. Algorithm uses 4 basic steps

    CFG converttocnf (CFG G) {
        Eliminate epsilon rules;
        Eliminate unit productions;
        Eliminate rules where |RHS| > 1 and RHS contains a terminal symbol;
        Eliminate rules where |RHS| > 2;
        return (G.V, G.Sigma, modified rules, G.S);
    }

1. ɛ productions: See above

2. Unit productions
   Rules of form X → Y, where Y ∈ V − Σ
   Replace with rules of form X → α, where α ∈ Σ or |α| > 1
   Algorithm:

    CFG removeunits (CFG G) {
        R' = G.R;
        visited = NULL;
        while (R' contains a unit production) {
            r = some unit production (X -> Y) in R';
            R' = R' - r;
            visited = visited + r;
            for (each r' = (Y -> beta) in R')
                if ((X -> beta) not in visited) {
                    R' = R' + (X -> beta);
                    visited = visited + (X -> beta);
                }
        }
        return (G.V, G.Sigma, R', G.S);
    }
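A Python sketch of removeunits that follows the pseudocode step by step. The rule encoding, the function name, and the example grammar are assumptions; unit rules are picked in sorted order only to make the run deterministic.

```python
from typing import Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols)


def remove_units(nonterminals: Set[str], rules: Set[Rule]) -> Set[Rule]:
    """Repeatedly replace a unit rule X -> Y by X -> beta for each rule Y -> beta."""

    def is_unit(rule: Rule) -> bool:
        return len(rule[1]) == 1 and rule[1][0] in nonterminals

    current = set(rules)
    visited: Set[Rule] = set()
    while True:
        unit = next((r for r in sorted(current) if is_unit(r)), None)
        if unit is None:
            return current
        x, (y,) = unit
        current.remove(unit)
        visited.add(unit)
        for lhs, rhs in sorted(current):  # sorted() copies, so adding is safe
            if lhs == y and (x, rhs) not in visited:
                current.add((x, rhs))
                visited.add((x, rhs))


# Hypothetical example: S -> A | a S, A -> B, B -> b.
RULES: Set[Rule] = {
    ("S", ("A",)),
    ("S", ("a", "S")),
    ("A", ("B",)),
    ("B", ("b",)),
}
CLEANED = remove_units({"S", "A", "B"}, RULES)
```

The `visited` set is what guarantees termination: a unit rule that has been removed once is never added back, even when the unit rules form a cycle.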

Context Free Grammars: Converting to Normal Forms - CNF (2)

3. Replacing rules where |RHS| > 1 and RHS contains a terminal
   Algorithm:

    CFG removemixed (CFG G) {
        R' = G.R;
        for (each a in G.Sigma) {
            create nonterminal symbol Ta;
            G.V = G.V + Ta;
            R' = R' + (Ta -> a);
        }
        for (each r in R') {
            if (length(r.rhs) > 1)
                for (each s in r.rhs)
                    if (s in G.Sigma) replace s in r with Ts;
        }
        return (G.V, G.Sigma, R', G.S);
    }
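A minimal Python sketch of removemixed. The fresh nonterminal names `T_a` are an assumption and are taken not to clash with existing symbols; the rule encoding and example grammar are likewise illustrative.

```python
from typing import List, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols)


def remove_mixed(terminals: Set[str], rules: List[Rule]) -> List[Rule]:
    """In every RHS longer than 1, replace each terminal a by a nonterminal T_a."""
    new_rules: List[Rule] = []
    for lhs, rhs in rules:
        if len(rhs) > 1:
            rhs = tuple(f"T_{s}" if s in terminals else s for s in rhs)
        new_rules.append((lhs, rhs))
    # One rule T_a -> a per terminal, as in the pseudocode.
    new_rules.extend((f"T_{a}", (a,)) for a in sorted(terminals))
    return new_rules


# Hypothetical example: S -> a S b | a b | a.
RULES: List[Rule] = [
    ("S", ("a", "S", "b")),
    ("S", ("a", "b")),
    ("S", ("a",)),
]
CLEANED = remove_mixed({"a", "b"}, RULES)
```

Rules of length 1 such as S → a are left alone, since they are already in CNF form (a).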

Context Free Grammars: Converting to Normal Forms - CNF (3)

4. Replace rules where |RHS| > 2 with rules where |RHS| = 2
   Algorithm:

    CFG removelong (CFG G) {
        R' = NULL;
        for (each r in G.R)
            if (length(r.rhs) > 2) {
                n = length(r.rhs);
                r = (X -> Y_1 Y_2 ... Y_n);
                // R_1, ..., R_(n-2) are new nonterminals, fresh for each rule r
                R' = R' + (X -> Y_1 R_1);
                G.V = G.V + R_1;
                for (i = 1; i < n - 2; i++) {
                    G.V = G.V + R_(i+1);
                    R' = R' + (R_i -> Y_(i+1) R_(i+1));
                }
                G.V = G.V + R_(n-2);
                R' = R' + (R_(n-2) -> Y_(n-1) Y_n);
            }
            else R' = R' + r;
        return (G.V, G.Sigma, R', G.S);
    }
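The chain construction in removelong can be sketched in Python as follows. Fresh nonterminals are named `<LHS>_<rule index>_<i>` here; that naming scheme is an assumption made for this example and is taken not to clash with existing symbols.

```python
from typing import List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (LHS, RHS symbols)


def remove_long(rules: List[Rule]) -> List[Rule]:
    """Split X -> Y1 ... Yn (n > 2) into a chain of n - 1 rules with |RHS| = 2."""
    result: List[Rule] = []
    for idx, (lhs, rhs) in enumerate(rules):
        n = len(rhs)
        if n <= 2:
            result.append((lhs, rhs))
            continue
        fresh = [f"{lhs}_{idx}_{i}" for i in range(1, n - 1)]  # n - 2 new symbols
        result.append((lhs, (rhs[0], fresh[0])))
        for i in range(n - 3):
            result.append((fresh[i], (rhs[i + 1], fresh[i + 1])))
        result.append((fresh[-1], (rhs[-2], rhs[-1])))
    return result


# Hypothetical example: S -> A B C D, A -> a.
RULES: List[Rule] = [
    ("S", ("A", "B", "C", "D")),
    ("A", ("a",)),
]
CLEANED = remove_long(RULES)
```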

Context Free Grammars: Converting to Normal Form - CNF Analysis

Analysis

1. Removing ɛ rules
   Worst case: Consider a rule of form X → A1 A2 ... Ak, where each Ai is nullable
   Result has 2^k − 1 rules
   Since k ≤ n, grammar size is O(2^n)
   To make more tractable, use short rules
   - Apply removelong first, guaranteeing rules of length no greater than 2
   - Ensures ɛ removal is linear

2. Removing unit productions
   Halts in at most |V − Σ|^2 steps
   Each step takes constant time, produces at most one new rule
   O(n^2)

3. Eliminating terminal symbols
   Grows linearly

4. Eliminating rules where |RHS| > 2
   Grows linearly

5. Overall complexity:
   Time: O(n^2)
   Grammar: O(n^2)
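The 2^k − 1 worst case is easy to reproduce. The sketch below enumerates every non-empty variant of one RHS obtained by dropping nullable symbols; the function name and symbol names are hypothetical.

```python
from itertools import combinations
from typing import Set, Tuple


def eps_variants(rhs: Tuple[str, ...], nullable: Set[str]) -> Set[Tuple[str, ...]]:
    """All non-empty RHS variants obtained by dropping any subset of nullable symbols."""
    positions = [i for i, s in enumerate(rhs) if s in nullable]
    variants: Set[Tuple[str, ...]] = set()
    for r in range(len(positions) + 1):
        for drop in combinations(positions, r):
            kept = tuple(s for i, s in enumerate(rhs) if i not in drop)
            if kept:  # the empty RHS (an epsilon rule) is not kept
                variants.add(kept)
    return variants


# X -> A1 A2 A3 A4 with every Ai nullable.
RHS = ("A1", "A2", "A3", "A4")
VARIANTS = eps_variants(RHS, set(RHS))
```

With k = 4 distinct nullable symbols there are 2^4 subsets to drop, and only the one that empties the RHS is discarded, leaving 15 rules.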

Context Free Grammars: Converting to Normal Form - GNF

Theorem 11.2 GNF
Statement: Given CFG G, there is a GNF grammar G' such that L(G') = L(G) − {ɛ}
Proof: By construction (See Appendix D.1)
Algorithm uses 3 basic steps

    CFG converttognf (CFG G) {
        G = converttocnf(G);
        Modify rules so that they are of the form
            S -> epsilon
            A -> c alpha, where c in Sigma, alpha in (V - Sigma)*
            A -> B alpha, where B in (V - Sigma), alpha in (V - Sigma)*
        Modify rules so that they are in GNF;
        return (G.V, G.Sigma, modified rules, G.S);
    }

1. Convert G to CNF (See CNF algorithm)

2. Modify the rules so that they are of the form
      S → ɛ
      A → cα, where c ∈ Σ, α ∈ (V − Σ)*
      A → Bα, where B ∈ V − Σ, α ∈ (V − Σ)*
   (a) Number the NTs
       S is numbered 1
       Order of the remaining NTs is immaterial
       The algorithm considers rules in order of the values of the nonterminals on their LHSs
   (b) Left recursion is eliminated
       Direct left recursion results from rules of the form X → Xα
       Strings are generated by adding symbols to the right of the recursive NT, so strings grow from right to left
       Terminal symbols do not appear as the leftmost characters until the left-recursive NT is replaced by one that is not

Context Free Grammars: Converting to Normal Form - GNF (2)

       To eliminate direct left recursion
       Consider rules A → Au1 | Au2 | ... | Auj and A → v1 | v2 | ... | vk, where the first symbol of each vi ≠ A
       Replace these rules with the following
          Z → u1 | u2 | ... | uj | u1Z | u2Z | ... | ujZ, where Z is a new symbol
          A → v1 | v2 | ... | vk | v1Z | v2Z | ... | vkZ
   (c) The goal of this step is to ensure that value(X) < value(Y) for rules of the form X → Yγ, where Y ∈ V − Σ
       NTs Y that violate the above restriction are replaced with the RHSs of rules that have Y as their LHS
       Substitutions may introduce additional left recursion, which must be removed
       Variables introduced by the elimination of left recursion are not included in this processing
       At the end of this step,
       - Rules with the highest-numbered NT on their LHS will have a terminal symbol as the first symbol on their RHS
       - Rules with the NT numbered n on their LHS will have only a terminal symbol, or an NT numbered > n, as the first symbol on their RHS

3. Modify rules to GNF
   Process rules, based on the number associated with their LHSs, from highest to lowest
   For rules of the form X → Yγ, where Y ∈ V − Σ, replace Y with the RHSs of rules with Y on the LHS
   Note that this does not address rules introduced by eliminating left recursion

4. Apply the above step to the rules that have new symbols on their LHSs

Note that while L(G') = L(G) − {ɛ}, parse trees for a given string may differ based on the creation of G'
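The replacement for direct left recursion can be sketched in Python. The function name, the dictionary result, and the expression-style example are assumptions; the sketch also assumes that no alternative is A → A or A → ɛ, which holds once the grammar is in CNF.

```python
from typing import Dict, List, Tuple

Alternative = Tuple[str, ...]  # one RHS of a rule


def eliminate_direct_left_recursion(
    a: str, alternatives: List[Alternative], z: str
) -> Dict[str, List[Alternative]]:
    """Rewrite A -> A u_i | v_i as A -> v_i | v_i Z and Z -> u_i | u_i Z."""
    us = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    vs = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not us:  # nothing to do: A is not directly left recursive
        return {a: list(alternatives)}
    return {
        a: vs + [v + (z,) for v in vs],
        z: us + [u + (z,) for u in us],
    }


# Hypothetical example: E -> E + T | T, with new symbol Z.
RESULT = eliminate_direct_left_recursion("E", [("E", "+", "T"), ("T",)], "Z")
```

After the rewrite, every alternative for E starts with the first symbol of some vi, so strings now grow from left to right.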