CS 373: Theory of Computation. Fall 2010

Similar documents
Lecture 12 Simplification of Context-Free Grammars and Normal Forms

Context-Free Grammars: Normal Forms

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Simplification of CFG and Normal Forms. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Plan for 2 nd half. Just when you thought it was safe. Just when you thought it was safe. Theory Hall of Fame. Chomsky Normal Form

CFG Simplification. (simplify) 1. Eliminate useless symbols 2. Eliminate -productions 3. Eliminate unit productions

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Simplification and Normalization of Context-Free Grammars

Properties of context-free Languages

CPSC 313 Introduction to Computability

This lecture covers Chapter 7 of HMU: Properties of CFLs

Homework 4 Solutions. 2. Find context-free grammars for the language L = {a n b m c k : k n + m}. (with n 0,

Context-Free Grammar

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

Properties of Context-free Languages. Reading: Chapter 7

MTH401A Theory of Computation. Lecture 17

Theory of Computation - Module 3

Note: In any grammar here, the meaning and usage of P (productions) is equivalent to R (rules).

Einführung in die Computerlinguistik

CS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018

Context Free Grammars: Introduction. Context Free Grammars: Simplifying CFGs

CS375: Logic and Theory of Computing

NPDA, CFG equivalence

Context Free Grammars

Recitation 4: Converting Grammars to Chomsky Normal Form, Simulation of Context Free Languages with Push-Down Automata, Semirings

Computational Models - Lecture 5 1

Context Free Grammars: Introduction

Chomsky Normal Form for Context-Free Gramars

Context-Free Grammars and Languages

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

MA/CSSE 474 Theory of Computation

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Non-context-Free Languages. CS215, Lecture 5 c

6.8 The Post Correspondence Problem

Context-Free Grammars. 2IT70 Finite Automata and Process Theory

Concordia University Department of Computer Science & Software Engineering

Chomsky and Greibach Normal Forms

Lecture 11 Context-Free Languages

Problem Session 5 (CFGs) Talk about the building blocks of CFGs: S 0S 1S ε - everything. S 0S0 1S1 A - waw R. S 0S0 0S1 1S0 1S1 A - xay, where x = y.

Chapter 4: Context-Free Grammars

Properties of Context-Free Languages

CS A Term 2009: Foundations of Computer Science. Homework 2. By Li Feng, Shweta Srivastava, and Carolina Ruiz.

CS5371 Theory of Computation. Lecture 12: Computability III (Decidable Languages relating to DFA, NFA, and CFG)

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Context-Free Grammars and Languages. Reading: Chapter 5

Finite Automata Theory and Formal Languages TMV026/TMV027/DIT321 Responsible: Ana Bove

Theory Of Computation UNIT-II

Introduction to Theory of Computing

October 6, Equivalence of Pushdown Automata with Context-Free Gramm

Chomsky Normal Form and TURING MACHINES. TUESDAY Feb 4

Chap. 7 Properties of Context-free Languages

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

INSTITUTE OF AERONAUTICAL ENGINEERING

Pushdown Automata. Reading: Chapter 6

Ogden s Lemma for CFLs

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

SFWR ENG 2FA3. Solution to the Assignment #4

Einführung in die Computerlinguistik Kontextfreie Grammatiken - Formale Eigenschaften

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis

Introduction to Formal Languages, Automata and Computability p.1/42

Theory of Computer Science

Harvard CS 121 and CSCI E-207 Lecture 12: General Context-Free Recognition

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

COSE212: Programming Languages. Lecture 1 Inductive Definitions (1)

Context-Free Languages (Pre Lecture)

Computability and Complexity

CDM Parsing and Decidability

(pp ) PDAs and CFGs (Sec. 2.2)

Computational Models - Lecture 4

CSE 468, Fall 2006 Homework solutions 1

CYK Algorithm for Parsing General Context-Free Grammars

Computability Theory

AC68 FINITE AUTOMATA & FORMULA LANGUAGES JUNE 2014

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Computational Models - Lecture 4 1

(pp ) PDAs and CFGs (Sec. 2.2)

CS481F01 Solutions 8

AC68 FINITE AUTOMATA & FORMULA LANGUAGES DEC 2013

CSCI Compiler Construction

Automata Theory Final Exam Solution 08:10-10:00 am Friday, June 13, 2008

Fundamentele Informatica II

Einführung in die Computerlinguistik

Grammars and Context Free Languages

Chapter 5: Context-Free Languages

Theory of Computation Turing Machine and Pushdown Automata

Grammars and Context Free Languages

CS311 Computational Structures More about PDAs & Context-Free Languages. Lecture 9. Andrew P. Black Andrew Tolmach

Automata Theory CS F-08 Context-Free Grammars

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

where A, B, C N, a Σ, S ϵ is in P iff ϵ L(G), and S does not occur on the right-hand side of any production. 3.6 The Greibach Normal Form

Properties of Context-Free Languages. Closure Properties Decision Properties

Theory of Computation

Notes for Comp 497 (Comp 454) Week 10 4/5/05

CSE 355 Test 2, Fall 2016

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

CS481F01 Prelim 2 Solutions

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Transcription:

CS 373: Theory of Computation Gul Agha Mahesh Viswanathan Fall 2010 1

1 Normal Forms for CFG Normal Forms for Grammars It is typically easier to work with a context free language if given a CFG in a normal form. Normal Forms A grammar is in a normal form if its production rules have a special structure: Chomsky Normal Form: Productions are of the form A BC or A a Greibach Normal Form Productions are of the form A aα, where α V If ɛ is in the language, we allow the rule S ɛ. We will require that S does not appear on the right hand side of any rules. In this lecture... How to convert any context-free grammar to an equivalent grammar in the Chomsky Normal Form We will start with a series of simplifications... 2 Three Simplifications 2.1 Eliminating ɛ-productions Eliminating ɛ-productions Often would like to ensure that the length of the intermediate strings in a derivation are not longer than the final string derived But a long intermediate string can lead to a short final string if there are ɛ-productions (rules of the form A ɛ). Can we rewrite the grammar not to have ɛ-productions? Eliminating ɛ-production The Problem Given a grammar G produce an equivalent grammar G (i.e., L(G) = L(G )) such that G has no rules of the form A ɛ, except possibly S ɛ, and S does not appear on the right hand side of any rule. 2

Note: If S can appear on the RHS of a rule, say S SS, then when there is the rule S ɛ, we can again have long intermediate strings yielding short final strings. Nullable Variables Definition 1. A variable A (of grammar G) is nullable if A ɛ. How do you determine if a variable is nullable? If A ɛ is a production in G then A is nullable If A B 1 B 2 B k is a production and each B i is nullable, then A is nullable. Fixed point algorithm: Propagate the label of nullable until there is no change. Using nullable variables Initial Ideas Intuition: For every variable A in G have a variable A in G such that A G w iff A G w and w ɛ. For every rule B CAD in G, where A is nullable, add two rules in G : B CD and B CAD. The Algorithm G has same variables, except for a new start symbol S. For each rule A X 1 X 2 X k in G, create rules A α 1 α 2 α k where { X i if X i is a non-nullable variable/terminal in G α i = X i or ɛ if X i is nullable in G and not all α i are ɛ Add rule S S. If S nullable in G, add S ɛ also. Correctness of the Algorithm By construction, there are no rules of the form A ɛ in G (except possibly S ɛ), and S does not appear in the RHS of any rule. L(G) = L(G ) L(G ) L(G): For every rule A w in G, we have A G w (by expanding zero or more nullable variables in w to ɛ) L(G) L(G ): If ɛ L(G), then ɛ L(G ). If A G w Σ +, then by induction on the number of steps in the derivation, A G w. Base case: if A w Σ +, then A w. 3

(Proof details skipped.) Eliminating ɛ-productions An Example Example 2. Rules of grammar G be S AB; A AaA ɛ; and B BbB ɛ. Nullables in G are A, B and S Rules for grammar G : S AB A B A AaA aa Aa a B BbB bb Bb b S S ɛ 2.2 Eliminating Unit Productions Eliminating Unit Productions Often would like to ensure that the number of steps in a derivation are not much more than the length of the string derived But can have a long chain of derivation steps that make little or no progress, if the grammar has unit productions (rules of the form A B, where B is a non-terminal). Note: A a is not a unit production Can we rewrite the grammar not to have unit-productions? Eliminating unit-productions Given a grammar G produce an equivalent grammar G (i.e., L(G) = L(G )) such that G has no rules of the form A B where B V. Role of Unit Productions Unit productions can play an important role in designing grammars: While eliminating ɛ-productions we added a rule S S. This is a unit production. We have used unit productions in building an unambiguous grammar: I a b Ia Ib N 0 1 N0 N1 F I N N (E) T F T F E T E + T 4

But as we shall see now, they can be (safely) eliminated Basic Idea Introduce new look-ahead productions to replace unit productions: look ahead to see where the unit production (or a chain of unit productions) leads to and add a rule to directly go there. Example 3. E T F I a b Ia Ib. So introduce new rules E a b Ia Ib But what if the grammar has cycles of unit productions? For example, A B a, B C b and C A c. You cannot use the look-ahead approach, because then you will get into an infinite loop. The Algorithm 1. Determine pairs A, B such that A u B, i.e., A derives B using only unit rules. Such pairs are called unit pairs. Easy to determine unit pairs: Make a directed graph with vertices = V, and edges = unit productions. A, B is a unit pair, if there is a directed path from A to B in the graph. 2. If A, B is a unit pair, then add production rules A β 1 β 2 β k, where B β 1 β 2 β k are all the non-unit production rules of B 3. Remove all unit production rules. Let G be the grammar obtained from G using this algorithm. Then L(G ) = L(G) Correctness Proof L(G ) L(G) Proof. For every rule A w in G, we have A G w (by a sequence of zero or more unit productions followed by a nonunit production of G) Correctness Proof L(G) L(G ) Proof. For w L(G) consider a leftmost derivation S lm w in G. All these derivation steps are possible in G also, except the ones using the unit productions of G. Suppose S xaα 1 xbα 2, where 1 corresponds to a unit rule. Then (in a leftmost derivation) 2 must correspond to using a rule for B. So a leftmost derivation of w in G can be broken up into big-steps each consisting of zero or more unit productions on the leftmost variable, followed by a non-unit production. 5

For each such big-step there is a single production rule in G that yields the same result. 2.3 Eliminating Useless Symbols Eliminating Useless Symbols Ideally one would like to use a compact grammar, with the fewest possible variables But a grammar may have useless variables which do not appear in any valid derivation Can we identify all the useless variables and remove them from the grammar? (Note: there may still be other redundancies in the grammar.) Useless Symbols Definition 4. A symbol X V Σ is useless in a grammar G = (V, Σ, S, P ) if there is no derivation of the form S αxβ w where w Σ and α, β (V Σ). Removing useless symbols (and rules involving them) from a grammar does not change the language of the grammar Revisiting Useless Symbols Recall, X is useless if there is no derivation of the form S αxβ w where w Σ and α, β (V Σ). i.e., X is useless iff either Type 1: X is not reachable from S (i.e., no α, β such that S αxβ), or Type 2: for all α, β such that S αxβ, either α, X or β cannot yield a string in Σ. i.e., either Type 2a: X is not generating (i.e., no w Σ such that X w), or Type 2b: α or β contains a non-generating symbol Algorithm to Remove Useless Symbols Algorithm So, in order to remove useless symbols, 1. First remove all symbols that are not generating (Type 2a) If X was useless, but reachable and generating (i.e., Type 2b) then X becomes unreachable after this step 6

Type 2b: for all α, β such that S αxβ, α or β contains a non-generating symbol. Then in the new grammar all such derivations disappear (because some variable in α or β is removed). 2. Next remove all unreachable symbols in the new grammar. Removes Type 1 (originally unreachable) and Type 2b useless symbols now Doesn t remove any useful symbol in either step (Why?) Only remains to show how to do the two steps in this algorithm Generating and Reachable Symbols Generating symbols If A x, where x Σ, is a production then A is generating If A γ is a production and all variables in γ are generating, then A is generating. Reachable symbols S is reachable If A is reachable and A αbβ is a production, then B is reachable Fixed point algorithm: Propagate the label (generating or reachable) until no change. 2.4 Putting Together the Three Simplifications The Three Simplifications, Together Given a grammar G, such that L(G), we can find a grammar G such that L(G ) = L(G) and G has no ɛ-productions (except possibly S ɛ), unit productions, or useless symbols, and S does not appear in the RHS of any rule. Proof. Apply the following 3 steps in order: 1. Eliminate ɛ-productions 2. Eliminate unit productions 3. Eliminate useless symbols. Note: Applying the steps in a different order may result in a grammar not having all the desired properties. 7

3 Chomsky Normal Form Chomsky Normal Form Proposition 5. For any non-empty context-free language L, there is a grammar G, such that L(G) = L and each rule in G is of the form 1. A a where a Σ, or 2. A BC where neither B nor C is the start symbol, or 3. S ɛ where S is the start symbol (iff ɛ L) Furthermore, G has no useless symbols. Outline of Normalization Given G = (V, Σ, S, P ), convert to CNF Let G = (V, Σ, S, P ) be the grammar obtained after eliminating ɛ-productions, unit productions, and useless symbols from G. If A x is a rule of G, where x = 0, then A must be S (because G has no other ɛ- productions). If A x is a rule of G, where x = 1, then x Σ (because G has no unit productions). In either case A x is in a valid form. All remaining productions are of form A X 1 X 2 X n where X i V Σ, n 2 (and S does not occur in the RHS). We will put these rules in the right form by applying the following two transformations: 1. Make the RHS consist only of variables 2. Make the RHS be of length 2. Make the RHS consist only of variables Let A X 1 X 2 X n, with X i being either a variable or a terminal. We want rules where all the X i are variables. Example 6. Consider A BbCdefG. How do you remove the terminals? For each a, b, c... Σ add variables X a, X b, X c,... with productions X a a, X b b,.... Then replace the production A BbCdefG by A BX b CX d X e X f G For every a Σ 1. Add a new variable X a 2. In every rule, if a occurs in the RHS, replace it by X a 3. Add a new rule X a a 8

Make the RHS be of length 2 Now all productions are of the form A a or A B 1 B 2 B n, where n 2 and each B i is a variable. How do you eliminate rules of the form A B 1 B 2... B n where n > 2? Replace the rule by the following set of rules A B 1 B (2,n) B (2,n) B 2 B (3,n) B (3,n) B 3 B (4,n). B (n 1,n) B n 1 B n where B (i,n) are new variables. An Example Example 7. Convert: S aa bb b, A Baa ba, B baab ab, into Chomsky Normal Form. 1. Eliminate ɛ-productions, unit productions, and useless symbols. This grammar is already in the right form. 2. Remove terminals from the RHS of long rules. New grammar is: X a a, X b b, S X a A X b B b, A BX a X a X b X a, and B X b AAX b X a X b 3. Reduce the RHS of rules to be of length at most two. New grammar replaces A BX a X a by rules A BX aa, X aa X a X a, and B X b AAX b by rules B X b X AAb, X AAb AX Ab, X Ab AX b 9