Simplification and Normalization of Context-Free Grammars

Similar documents
CS 373: Theory of Computation. Fall 2010

Lecture 12 Simplification of Context-Free Grammars and Normal Forms

MTH401A Theory of Computation. Lecture 17

Plan for 2 nd half. Just when you thought it was safe. Just when you thought it was safe. Theory Hall of Fame. Chomsky Normal Form

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

The Idea of a Pushdown Automaton

Context-Free Grammars: Normal Forms

Context Free Grammars: Introduction

CFG Simplification. (simplify) 1. Eliminate useless symbols 2. Eliminate -productions 3. Eliminate unit productions

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs

Einführung in die Computerlinguistik

MA/CSSE 474 Theory of Computation

Context-Free Grammar

Properties of context-free Languages

Context Free Grammars

CFGs and PDAs are Equivalent. We provide algorithms to convert a CFG to a PDA and vice versa.

CISC4090: Theory of Computation

Introduction to Theory of Computing

Chomsky and Greibach Normal Forms

Theory of Computation - Module 3

Computational Models - Lecture 5 1

Chapter 4: Context-Free Grammars

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

NPDA, CFG equivalence

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Properties of Context-free Languages. Reading: Chapter 7

CYK Algorithm for Parsing General Context-Free Grammars

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Lecture 11 Context-Free Languages

CSCI Compiler Construction

Computational Models: Class 3

Computational Models - Lecture 4

CS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018

Chap. 7 Properties of Context-free Languages

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4

Lecture 17: Language Recognition

FORMAL LANGUAGES, AUTOMATA AND COMPUTATION

Regular Language Equivalence and DFA Minimization. Equivalence of Two Regular Languages DFA Minimization

h>p://lara.epfl.ch Compiler Construc/on 2011 CYK Algorithm and Chomsky Normal Form

Introduction to Computational Linguistics

Context-Free Grammars (and Languages) Lecture 7

This lecture covers Chapter 7 of HMU: Properties of CFLs

Theory of Computation

Solution. S ABc Ab c Bc Ac b A ABa Ba Aa a B Bbc bc.

CPS 220 Theory of Computation

where A, B, C N, a Σ, S ϵ is in P iff ϵ L(G), and S does not occur on the right-hand side of any production. 3.6 The Greibach Normal Form

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Syntax Analysis Part I

Syntax Analysis Part I

Introduction to Formal Languages, Automata and Computability p.1/42

6.8 The Post Correspondence Problem

What languages are Turing-decidable? What languages are not Turing-decidable? Is there a language that isn t even Turingrecognizable?

Fundamentele Informatica II

Context-Free Grammars and Languages. Reading: Chapter 5

Compiling Techniques

Computational Models - Lecture 3

October 6, Equivalence of Pushdown Automata with Context-Free Gramm

Context-free Grammars and Languages

Foundations of Informatics: a Bridging Course

CS375: Logic and Theory of Computing

Non-context-Free Languages. CS215, Lecture 5 c

Pushdown Automata. Reading: Chapter 6

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Notes for Comp 497 (454) Week 10

Syntax Analysis Part III

CS481F01 Solutions 8

Computing if a token can follow

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

Theory of Computation (IX) Yijia Chen Fudan University

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Pushdown Automata: Introduction (2)

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Pushdown Automata (2015/11/23)

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

CISC 4090 Theory of Computation

Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Computational Models: Class 5

CONTEXT FREE GRAMMAR AND

Follow sets. LL(1) Parsing Table

Computability Theory

The Pumping Lemma for Context Free Grammars

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

CS5371 Theory of Computation. Lecture 12: Computability III (Decidable Languages relating to DFA, NFA, and CFG)

Computational Models - Lecture 4 1

Outline. CS21 Decidability and Tractability. Machine view of FA. Machine view of FA. Machine view of FA. Machine view of FA.

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

Context-Free Grammars and Languages. We have seen that many languages cannot be regular. Thus we need to consider larger classes of langs.

Formal Language and Automata Theory (CS21004)

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Decidability (What, stuff is unsolvable?)

Transcription:

implification and Normalization, of Context-Free Grammars 20100927 Slide 1 of 23 Simplification and Normalization of Context-Free Grammars 5DV037 Fundamentals of Computer Science Umeå University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 2 of 23 Motivation The material in this presentation is motivated by two needs in the processing of CFGs. Some of the productions of a CFG may be useless in terms of generating terminal strings; such parts may be safely eliminated. By converting a CFG to an equivalent one which is of a certain form, or has certain properties, it may become easier to establish certain results or carry out certain tasks (such as parsing). This material is necessarily of a technical nature, sometimes without immediate motivation.

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 3 of 23 Useless Symbols Example: G = (V,Σ,E,P), V = {E,F,T,R}, Σ = {a,+,,,(,)} E E +E T F F F E (T) a P = T E T E +R R T +E T E A (E) a Neither T nor R can derive a terminal string. A can never be used in a derivation starting from E. Such symbols are called useless because they can never be used in a derivation, from the start symbol, of a string of terminal symbols. It is useful to have a means of eliminating useless symbols from a grammar in a systematic fashion.

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 4 of 23 Formal Definition of Useful and Useless Symbols Let A V. A is observable (in G) if A α (equivalently A + α) for some α Σ. G is observable if each A V has that property. A is reachable (in G) if S α 1 Aα 2 for some α 1,α 2 (V Σ). G is reachable if each A V has that property. A V is useful if it is both reachable and observable. Otherwise, it is useless. Define O G = {A V A is observable in G}. Define R G = {A V A is reachable in G}.

implification and Normalization, of Context-Free Grammars 20100927 Slide 5 of 23 Construction of the Observable Set of a CFG Algorithm: Construct O G : O 1 G = {A V A α for some α Σ }. O k+1 G = {A V A α for some α (O k G Σ) }. O G = O k G for the first k N with O k G = O k+1 G. Example: (Start symbol is E): E E +E T F F F E (T) a T E T E +R R T +E T E A (E) a O 1 G = {F,A}, O 2 G = O 3 G = {F,A,E},

implification and Normalization, of Context-Free Grammars 20100927 Slide 6 of 23 Construction of an Equivalent Observable CFG Algorithm: Construct an CFG G = (V,Σ,S,P ) with L(G ) = L(G) which is observable provided that L(G). V = O G {S} P = {A α α (O G Σ) }. P Observation: L(G) = iff S O G. Example: (Start symbol is E): E E +E T F F F E (T) a T E T E +R R T +E T E A (E) a E E +E F F F E a A (E) a O 1 G = {F,A}, O 2 G = O 3 G = {F,A,E},

An Equivalent Observable CFG when L(G) = Recall: L(G) = iff S O G. Algorithm: Construct an observable G with L(G ) = L(G). V = O G {S} If S O G then P = {A α α (O G Σ) }. G If S O G then P =. Thus, if L(G) =, the start symbol S is useless (but must be retained as part of the grammar nevertheless). Example: Remove E F from the previous example. (Start symbol still E): E E +E T F F F E (T) a T E T E +R R T +E T E A (E) a O 1 G = O 2 G = {A,F} L(G) = G = ({S},Σ,S, ) implification and Normalization, of Context-Free Grammars 20100927 Slide 7 of 23

implification and Normalization, of Context-Free Grammars 20100927 Slide 8 of 23 Construction of the Reachable Set of a CFG Algorithm: Construct R G : R 0 G = {S}. R k+1 G = R k G {A V B α 1 Aα 2 G for some B R k G and α 1,α 2 (V Σ) }. R G = R k G for the first k N with R k G = R k+1 G. Example: (Start symbol is E): E E +E F F F E a A (E) a R 0 G = {E}, R 1 G = R 2 G = {E,F},

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 9 of 23 Construction of an Equivalent Reachable CFG Algorithm: Construct a reachable CFG G = (V,Σ,S,P ) with L(G ) = L(G). V = R G P = {A α A V }. G Example: (Start symbol is E): E E +E F F F E a A (E) a E E +E F F F E a R 0 G = {E}, R 1 G = R 2 G = {E,F},

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 10 of 23 Reduced Grammars Need to exercise a little care in defining a grammar with no useless symbols. If L(G) =, then the start symbol must be useless, yet every grammar must have a start symbol. Call G reduced if it has one of the following two properties: P = and V = {S}; or G is both observable and reachable. Algorithm: Construct a grammar G = (V,Σ,S,P ) which is reduced and which satisfies L(G ) = L(G). Apply the previous two algorithms, which already take these cases into account. Must remove unobservable variables first, then unreachable.

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 11 of 23 Order Matters in Reduction Example: (Start symbol is E): E E +E T F F F E (T) a T E T E +R R T +E T E RA A (E) a All variables are reachable: R G = {E,F,T,R,A}. Only {E,F,A} are observable. If unreachable variables are removed first, and then the unobservable ones, the resulting grammar will not be reachable: E E +E F Thus, the unobservable symbols must be removed first. F F E a A (E) a

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 12 of 23 Null Rules A null rule is a production of the form Why null rules are anomalous: A λ They are the only productions A α in which Length(A) > Length(α). Thus, if G has no null rules, Length(A) Length(α) for every production A α. It would be nice to be able to eliminate null rules entirely. However, this is clearly not possible if λ L(G). There is, however, a solution which is almost as good: If λ L(G), then S λ No other null rules are allowed. The means to transform G to achieve this will now be addressed.

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 13 of 23 Nonerasing Grammars A variable A V is recursive if A + α 1 Aα 2 for some α 1,α 2 (V Σ). Here + means derives in one or more steps. The trivial derivation A A in zero steps, (always present), is excluded. The variable A V is nullable if A λ. Define N G to be the set of all nullable variables of G. Call G nonerasing if S is not recursive, and N G {S}. This means: S λ is the only possible null rule; and it is the only way to derive λ.

implification and Normalization, of Context-Free Grammars 20100927 Slide 14 of 23 Construction of N G Algorithm: Construct N G inductively: N 0 G = N k+1 G = N k G {A V A α for some α N k G }. Stop when N k G = N k+1 G with N G = N k G. Example: V = {S,O,Q,E}, Σ = {a,b,c}; S aob O QEQ aob OOO OEcEO P = Q c EE E a λ N 0 G = ; N 1 G = {E}; N 2 G = {E,Q}; N 3 G = {E,Q,O} = N 4 G = N G.

implification and Normalization, of Context-Free Grammars 20100927 Slide 15 of 23 Construction of an Equivalent Nonerasing CFG Algorithm: Construct an equivalent nonerasing CFG G = (V,Σ,S,P ). V = V S. The productions in P are of the following three forms: S S S λ if S N G A α 1...α k iff α 1...α k λ, and There are (not necessarily distinct) A 1,...A n N G with A α 1 A 1 α 2 A 2...A n α n P. The last form must be done for all combinations of variables which produce λ. Remark: This algorithm has exponential complexity. It is possible to do much better (linear).

Example of Nonerasing Construction Example: V = {S,O,Q,E}, Σ = {a,b,c}; S aob O QEQ aob OOO OEcEO P = Q c EE E a λ N G = {E,Q,O}. New productions: S S S aob ab O QEQ QE QQ EQ Q E aob ab OOO OO O OEcEO OEcE OEcO OcEO EcEO OEc OcE OcO ceo EcE EcO Oc Ec ce co c Q c E EE E a implification and Normalization, of Context-Free Grammars 20100927 Slide 16 of 23

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 17 of 23 Chain Rules A unit production or chain rule is a production of the form for some A,B V. A B Unit productions rules are not necessarily bad. Examples from programming language specification: stmt if stmt number digit It is recursive chain rules which are can lead to problems. In any case, from a theoretical point of view, it is often useful to eliminate such rules from a grammar.

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 18 of 23 For A V, define C 1 G,A = {B V A B}. The Chain Set of a Grammar C k+1 G,A = C k G,A {B V C B for some C C k G,A }. Observation: The addition of new elements to C G,A stops as soon as C k G,A = C k+1 G,A, so this set may be computed in a finite number of steps. For A V, define C G,A = C k G,A, where k is the first index for which C k G,A = C k+1 G,A. The variable A V is called chain recursive if A C G,A. Thus, A is chain recursive if it can be derived from itself using unit productions. A chain loop

implification and Normalization, of Context-Free Grammars 20100927 Slide 19 of 23 Example of a Chain Set Nonterminals: { Expr, Ident } Terminals: {A,B,...,Z,(,),+,*} Start symbol: Expr Productions: Ident A B... Y Z Expr Expr + Term Term Term Term Factor Factor Factor ( Expr ) Ident C 1 G, Ident = C 2 G, Ident =, C 1 G, Expr = { Term }, C 2 G, Expr = { Term, Factor }, C 3 G, Expr = C 4 G, Expr = { Term, Factor, Ident }, C 1 G, Term = { Factor }, C 2 G, Term = C 3 G, Term = { Factor, Ident }, C 1 G, Factor = C 2 G, Factor = { Ident },

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 20 of 23 Eliminating Chain Rules Algorithm: Construct an equivalent CFG G = (V,Σ,S,P ) without unit productions. P = {A α α V and there is a B α with B {A} C G,A }. G Example: Ident A B... Y Z Expr Expr + Term Term Term Term Factor Factor Factor ( Expr ) Ident Repaired: Ident A B... Y Z Expr Expr + Term Term Factor ( Expr ) A... Z Term Term Factor ( Expr ) A... Z Factor ( Expr ) A... Z

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 21 of 23 Nonerasing and No Chain Rules The algorithm which makes a grammar nonerasing can easily introduce new chain rules. On the other hand, the algorithm which removes chain rules does not introduce any new null rules. Therefore, to construct a grammar which is both nonerasing and without chain rules, remove the null rules first, and then remove the chain rules.

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 22 of 23 Left Recursion and Greibach Normal Form G is left recursive if there is a derivation of the form A + Aα for some A V and α (V Σ). Left recursion makes the design of parsers more difficult, because of the possibility of an infinite loop for so-called recursive descent parsers which always try to replace the leftmost symbol first. G is in Greibach normal form if every production is of one of the following two forms: A aα for some A V, a Σ, and α (V \{S}) ; or S λ. Theorem: There is an algorithm to convert any CFG G into an equivalent one which is in Greibach normal form. Proof: Consult an advanced textbook. (The proof is tedious but not particularly deep.)

Simplification and Normalization, of Context-Free Grammars 20100927 Slide 23 of 23 Chomsky Normal Form Chomsky normal form guarantees that the productions are very short. G is in Chomsky normal form if every productions is of one of the following three forms: A BC for some A V, and B,C V \{S}. A a for some A V and a Σ. S λ. Theorem: There is an algorithm which converts any CFG G into an equivalent one in Chomsky normal form. Proof: There is a sketch in the textbook. Consult a more advanced book for a complete proof. Note: The proof uses ideas similar to that used in converting a right-linear grammar to a simple right-linear grammar.