Lecture 12 Simplification of Context-Free Grammars and Normal Forms COT 4420 Theory of Computation Chapter 6
Normal Forms for CFGs 1. Chomsky Normal Form CNF Productions of form A BC A, B, C V A a a T 2. Greibach Normal Form GNF Productions of form A ax A V, a T, X V*
λ-productions Any production of a CFG of the form A λ is called a λ-production. Any variable A for which the derivation A =>* λ is possible is called nullable.
Removing λ-productions Theorem: Given a grammar G with λ not in L(G), the set of nullable variables V N can be found using an algorithm. Proof: 1. For all productions A λ, put A into V N. 2. Repeat the following until no new variables are added to V N : For all productions B A 1 A 2 A n where A 1, A 2,, A n are in V N, put B into V N.
Removing λ-productions Theorem: Let G be any CFG with λ not in L(G), then there exists an equivalent G having no λ- productions. Algorithm: 1. Find the set V N of all nullable variables. 2. For all productions of the form A x 1 x 2 x m, m 1 where x i V T: We put this production in the new production set, as well as all those generated by replacing nullable variables with λ in all possible combinations. Exception: If all x i s are nullable, do not include A λ
Removing λ-productions Example Example: S ABaC A BC B b λ C D λ D d Nullable variables V N : A, B, C S ABaC BaC AaC ABa ac Ba Aa a A BC B C B b C D D d
Removing λ-productions Proof Proof: We need to show that: 1. If w λ and A =>* old w, then A =>* new w. 2. If A =>* new w then w λ and A =>* old w. Proof of (1): By induction on the number of steps by which A derives w in the old grammar. Basis: If in the old grammar, A derives w in one step, then A w must be a production. Since w λ, this production must appear in the new grammar as well. Therefore, A =>* new w.
Removing λ-productions Proof cont d Induction step: We assume the theorem is true for derivation steps of fewer than k. We show it for A =>* old w which has k steps. Let the first step be A => old X 1 X n, then w can be broken into w = w 1 w n, where X i =>* old w i, for all i, in fewer than k steps. Because of the induction hypothesis we have: X i =>* new w i The new grammar has a production A new X 1 X n, therefore A derives w in the new grammar.
Removing unit productions A unit production is one whose right-hand side has only one variable. A B Use a dependency graph: Whenever the grammar has a unit-production C D, create an edge (C,D) If E =>* F using only unit productions, whenever F α is a non-unit production, add E α. Remove unit productions
Original Grammar Removing unit productions S Aa B B A bb A a bc B Example Dependency Graph B S A New Grammar S Aa bb bc a B bb bc a A a bc bb S=>*B, S=>*A, A=>*B, B=>*A
Remove useless productions Variable A is useful if there exist some w L(G) such that: S =>* xay =>* w Otherwise it is useless. Example 1: S asb ab A A baa A does not derive terminal strings A is not reachable Example 2: S asb ab A baa ba
Useless productions The order is important! To remove useless productions:(follow these steps) 1. Eliminate variables that derive no terminal S aab A baa A is useless 2. Eliminate unreachable variables S AC C aab B Ab B is not reachable
Remove useless productions Example S as A C A a B aa C acb C is useless, so we remove variable C and its productions. After step (1), every remaining symbol derives some terminal.
Remove useless productions Example S as A A a B aa Dependency Graph S V = {S, A, B} B A Construct a dependency graph and determine the unreachable variables. (For every rule of the form C xdy, there is an edge from C to D. ) B is not reachable!
Cleaning up the grammar 1. Eliminate λ-productions 2. Eliminate unit productions 3. Eliminate useless variables The order is important because removing λ- productions, will introduce new unit productions or useless variables. Theorem: Let L be a CFL that does not contain λ. Then there exist a context-free grammar that generates L and does not have any useless productions, λ-productions, or unit-productions.
Converting to CNF A BC A a A, B, C V a T Theorem: Every context-free language L is generated by a Chomsky Normal Form (CNF) grammar. Proof: Let G be a CFG for generating L. Step1: First clean the grammar G. (remove λ-productions and unit-productions) Step2: For every production A x 1 x 2 x n, if n = 1, x 1 is a terminal (since there is no unit productions). If n 1, for every terminal a T, introduce a variable B a. Replace a with B a and add B a a to the set of productions.
Step 2 - Example A GcDe A GB c De B c c
Step 2 - Example A GcDe A GB c DB e B c c B e e
Converting to CNF Cont d Every production is of the form: A K 1 K 2 K 3 K n or A a a T A, K i V Step 3: Break right sides longer than 2 into chain of productions: A K 1 Z 1 A -> BCDE is replaced by Z 1 K 2 Z 2 A -> BF, F -> CG, and G -> DE.
Greibach Normal Form
A ax GNF A V a T X V* Theorem: Every context-free language L is generated by a Greibach Normal Form (GNF) grammar.
Lemma1 (theorem 6.1 in textbook): Let G=(V, T, S, P) be a CFG. Suppose P contains a production of the form A x 1 B x 2. Assume that A and B are different variables and that B y 1 y 2 y n is the set of all productions in P which have B as the left side. We can then remove A x 1 B x 2 from P and add A x 1 y 1 x 2 x 1 y 2 x 2 x 1 y n x 2 And have the same language.
A x 1 B x 2 A x 1 y 1 x 2 y 1 These derive the same sentential forms A ABa B AA b ZC A AAAa Aba AZCa
Lemma2, removing left recursion: Let G=(V, T, S, P) be a CFG. A Aα 1 Aα 2 Aα n be the set of A-productions that have A as the first symbol on the R.H.S. And let A β 1 β 2 β m be all the other A-productions. We can remove the left recursive A-productions and add: A β i 1 i m Z α i 1 i n A β i Z Z α i Z
A A A α i1 Β j Z A α i2 α ip Z A α ip α i2 Z Β j α i1
A ax Converting to GNF A V a T X V* Theorem: Every context-free language L is generated by a Greibach Normal Form (GNF) grammar. Step1: Rewrite the grammar into Chomsky Normal Form. Step2: Relabel all variables as A 1, A 2, A n Step3: Transform all productions into form: a. A i A j x j i < j and x j V* or b. A i a x j or c. Z i A j x j
Converting to GNF - Example S SS BC B CB a C SB b S = A 1, B = A 2, C =A 3 A 1 A 1 A 1 A 2 A 3 A 2 A 3 A 2 A 3 A 1 A 2 A 2 a A 3 b
Converting to GNF - Example A 1 A 1 A 1 apply lemma 2 to remove left recursion A 1 A 2 A 3 Z 1 A 1 A 1 A 2 A 3 Z 1 Z 1 A 1 Z 1 (a) (a) (b) (b) (a) (c) (c) A 1 A 1 A 1 A 1 A 2 A 3 A 2 A 3 A 2 A 3 A 1 A 2 A 2 a A 3 b A 1 A 2 A 3 Z 1 Z 1 A 1 Z 1 A 1 Z 1
Converting to GNF - Example A 3 A 1 A 2 apply lemma 1 to replace A 1 A 3 A 2 A 3 A 2 A 3 A 2 A 3 Z 1 A 2 Apply lemma 1 again A 3 A 3 A 2 A 3 A 2 A 3 aa 3 A 2 A 3 A 3 A 2 A 3 Z 1 A 2 A 3 aa 3 Z 1 A 2 (a) A 1 A 2 A 3 (a) A 2 A 3 A 2 A 3 A 1 A 2 (b) A 2 a (b) A 3 b (a) A 1 A 2 A 3 Z 1 (c) Z 1 A 1 (c) Z 1 A 1 Z 1 A 3 A 3 A 2 A 3 A 2 (b) A 3 aa 3 A 2 A 3 A 3 A 2 A 3 Z 1 A 2 (b) A 3 aa 3 Z 1 A 2
Converting to GNF - Example A 3 A 3 A 2 A 3 A 2 apply lemma2 to remove recursion A 3 A 3 A 2 A 3 Z 1 A 2 A 3 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 Z 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2 (a)a 1 A 2 A 3 (a) A 2 A 3 A 2 (b) A 2 a (b) A 3 b (a) A 1 A 2 A 3 Z 1 (c) Z 1 A 1 (c) Z 1 A 1 Z 1 A 3 A 3 A 2 A 3 A 2 (b) A 3 aa 3 A 2 A 3 A 3 A 2 A 3 Z 1 A 2 (b) A 3 aa 3 Z 1 A 2
Converting to GNF - Example (a) (a) (b) (b) (c) (c) A 1 A 2 A 3 A 2 A 3 Z 1 A 2 A 3 A 2 A 2 a A 3 b aa 3 A 2 aa 3 Z 1 A 2 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 1 A 1 A 1 Z 1 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2 Now everything is in the form of step 3. Note that A n is in the form of GNF.
Converting to GNF Step4: For every production of the form A n- 1 A n x n use lemma 1 to convert to correct GNF form. Continue to A 1. For all Z-productions, use lemma 1 to convert to correct GNF form.
Example Cont d A 1 A 2 A 3 A 2 A 3 Z 1 A 2 A 3 A 2 A 2 a A 3 b aa 3 A 2 aa 3 Z 1 A 2 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 1 A 1 A 1 Z 1 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2 For A 2 A 3 A 2 we write: A 2 ba 2 aa 3 A 2 A 2 aa 3 Z 1 A 2 A 2 bz 2 A 2 aa 3 A 2 Z 2 A 2 aa 3 Z 1 A 2 Z 2 A 2
For A 1 A 2 A 3 A 2 A 3 Z 1 we write: A 1 ba 2 A 3 aa 3 A 2 A 2 A 3 aa 3 Z 1 A 2 A 2 A 3 bz 2 A 2 A 3 aa 3 A 2 Z 2 A 2 A 3 aa 3 Z 1 A 2 Z 2 A 2 A 3 aa 3 A 1 ba 2 A 3 Z 1 aa 3 A 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 A 2 A 3 Z 1 bz 2 A 2 A 3 Z 1 aa 3 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 Example Cont d A 1 A 2 A 3 A 2 A 3 Z 1 A 2 ba 2 aa 3 A 2 A 2 aa 3 Z 1 A 2 A 2 bz 2 A 2 aa 3 A 2 Z 2 A 2 aa 3 Z 1 A 2 Z 2 A 2 a A 3 b aa 3 A 2 aa 3 Z 1 A 2 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 1 A 1 A 1 Z 1 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2
Example Cont d A 1 ba 2 A 3 aa 3 A 2 A 2 A 3 aa 3 Z 1 A 2 A 2 A 3 bz 2 A 2 A 3 aa 3 A 2 Z 2 A 2 A 3 aa 3 Z 1 A 2 Z 2 A 2 A 3 aa 3 ba 2 A 3 Z 1 aa 3 A 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 A 2 A 3 Z 1 bz 2 A 2 A 3 Z 1 aa 3 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 ba 2 aa 3 A 2 A 2 aa 3 Z 1 A 2 A 2 bz 2 A 2 aa 3 A 2 Z 2 A 2 aa 3 Z 1 A 2 Z 2 A 2 a A 3 b aa 3 A 2 aa 3 Z 1 A 2 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 1 A 1 A 1 Z 1 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2
Example Cont d A 1 ba 2 A 3 aa 3 A 2 A 2 A 3 aa 3 Z 1 A 2 A 2 A 3 bz 2 A 2 A 3 aa 3 A 2 Z 2 A 2 A 3 aa 3 Z 1 A 2 Z 2 A 2 A 3 aa 3 ba 2 A 3 Z 1 aa 3 A 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 A 2 A 3 Z 1 bz 2 A 2 A 3 Z 1 aa 3 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 ba 2 aa 3 A 2 A 2 aa 3 Z 1 A 2 A 2 bz 2 A 2 aa 3 A 2 Z 2 A 2 aa 3 Z 1 A 2 Z 2 A 2 a A 3 b aa 3 A 2 aa 3 Z 1 A 2 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 1 A 1 A 1 Z 1 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2
Example Cont d A 1 ba 2 A 3 aa 3 A 2 A 2 A 3 aa 3 Z 1 A 2 A 2 A 3 bz 2 A 2 A 3 aa 3 A 2 Z 2 A 2 A 3 aa 3 Z 1 A 2 Z 2 A 2 A 3 aa 3 ba 2 A 3 Z 1 aa 3 A 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 A 2 A 3 Z 1 bz 2 A 2 A 3 Z 1 aa 3 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 Z 2 A 2 A 3 Z 1 aa 3 Z 1 A 2 ba 2 aa 3 A 2 A 2 aa 3 Z 1 A 2 A 2 bz 2 A 2 aa 3 A 2 Z 2 A 2 Use lemma1 for all Z-productions aa 3 Z 1 A 2 Z 2 A 2 a A 3 b aa 3 A 2 aa 3 Z 1 A 2 bz 2 aa 3 A 2 Z 2 aa 3 Z 1 A 2 Z 2 Z 1 A 1 A 1 Z 1 Z 2 A 2 A 3 A 2 A 2 A 3 Z 1 A 2 A 2 A 3 A 2 Z 2 A 2 A 3 Z 1 A 2 Z 2
The CYK Parser
The CYK membership algorithm Input: Grammar G in Chomsky Normal Form String w = a 1 a 2.a n Output: find if w L(G)
The Algorithm Define: w ij : is a substring a i.a j V ij : { A V : A =>* w ij } A V ii if and only if G contains A a i A V ij if and only if G contains A BC, and B V ik, and C V k+1j, (k {i, i+1,, j-1})
The Algorithm 1. Compute V 11, V 22,,V nn 2. Compute V 12, V 23,,V n-1,n 3. Compute V 13, V 24,., V n-2,n 4. And so on. If S V 1n then w L(G), otherwise w L(G).
Example Grammar G and string w is given: S AB w = aabbb A BB A a B AB B b
Example 1. Compute V 11, V 22,,V 55 Note that: A V ii if and only if G contains A a i V 11 =? Is there a rule that directly derives a 1? V 11 = {A} V 22 =? Is there a rule that directly derives a 2? V 22 = {A} V 33 =? Is there a rule that directly derives a 3? V 33 = {B} V 44 = {B}, V 55 = {B}
Example 2. Compute V 12, V 23,,V 45 Note that: A V ij if and only if G contains A BC, and B V ik, and C V k+1j for all k s V 12 =? { A : A BC, B V 11, C V 22 } V 12 = {} V 23 =? { A : A BC, B V 22, C V 33 } V 23 = {S, B} V 34 =? { A : A BC, B V 33, C V 44 } V 34 = {A} V 45 =? { A : A BC, B V 44, C V 55 } V 45 = {A}? Variable AA? Variable AB? Variable BB? Variable BB
Example 3. Compute V 13, V 24,V 35? V 13 =? { A : A BC, B V 11, C V 23 } Variable AS, AB { A : A BC, B V 12, C V 33 } V 13 = {S, B} V 24 =? { A : A BC, B V 22, C V 34 } Variable AA? { A : A BC, B V 23, C V 44 } Variable SB, BB V 24 = {A}? V 35 =? { A : A BC, B V 33, C V 45 } Variable BA? { A : A BC, B V 34, C V 55 } Variable AB V 35 = {S, B}?
Example 4. Compute V 14, V 25 V 14 =? { A : A BC, B V 11, C V 24 } Variable AA { A : A BC, B V 12, C V 34 }? { A : A BC, B V 13, C V 44 } Variable SB, BB V 14 = {A} V 25 =? { A : A BC, B V 22, C V 35 } Variable AS, AB? { A : A BC, B V 23, C V 45 } Variable SA, BA? { A : A BC, B V 24, C V 55 } Variable AB V 25 = {S, B}??
Example 5. Compute V 15 V 15 =? { A : A BC, B V 11, C V 25 } Variable AS, AB { A : A BC, B V 12, C V 35 }? { A : A BC, B V 13, C V 45 } Variable SA, BA? { A : A BC, B V 14, C V 55 } Variable AB V 15 = {S, B}? S V 15, therefore w = aabbb L(G)
1 a 2 a 3 b 4 b 5 b 1 {A} 2 {A} 3 {B} 4 {B} 5 {B}
1 a 2 a 3 b 4 b 5 b 1 {A} {} 2 {A} {S, B} 3 {B} {A} 4 {B} {A} 5 {B}
1 a 2 a 3 b 1 {A} {} {S, B} 4 b 5 b 2 {A} {S, B} {A} 3 {B} {A} {S, B} 4 {B} {A} 5 {B}
1 a 2 a 1 {A} {} {S, B} {A} {S, B} 2 {A} {S, B} {A} {S, B} 3 {B} {A} {S, B} 4 {B} {A} 5 {B} 3 b 4 b 5 b
Approximate time complexity: 2 O ( w w ) = O( w 3 ) Number of V ij s to be computed If w = n n(n-1)/2 Number of evaluations in each V ij at most n