CPSC 313 Introduction to Computability
Grammars in Chomsky Normal Form (Cont'd)
(Sipser, pages 109-111 (3rd ed.) and 107-109 (2nd ed.))
Renate Scheidler
Fall 2018

Chomsky Normal Form
A context-free grammar G = (V, Σ, R, S) is in Chomsky Normal Form if each rule has the form
    A → BC (A, B, C ∈ V; B ≠ S; C ≠ S), or
    A → a (A ∈ V and a ∈ Σ)
We also allow the rule S → ε

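The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides. It assumes a grammar is stored as a dict that maps each variable to a set of right-hand sides, where a right-hand side is a tuple of symbols and the empty tuple stands for ε. The name is_cnf and this representation are illustrative assumptions.

```python
def is_cnf(rules, variables, terminals, start):
    """Return True if every rule has one of the forms allowed by the definition.

    rules: dict mapping a variable to a set of tuples (assumed representation).
    """
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                # Only the start symbol may have an epsilon rule.
                if head != start:
                    return False
            elif len(body) == 1:
                # A -> a: the single symbol must be a terminal.
                if body[0] not in terminals:
                    return False
            elif len(body) == 2:
                # A -> BC: both symbols are variables other than the start symbol.
                if any(s not in variables or s == start for s in body):
                    return False
            else:
                # Right-hand sides of length > 2 are never allowed.
                return False
    return True


# Usage on a small grammar for the language {ab, epsilon}.
example = {"S": {("A", "B"), ()}, "A": {("a",)}, "B": {("b",)}}
print(is_cnf(example, {"S", "A", "B"}, {"a", "b"}, "S"))
```

Tuples are used so that multi-character variable names such as S1 remain single symbols.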
Transformation Steps
1. Create new start symbol S₁ and add new rule S₁ → S
2. Eliminate all ε-productions (A → ε)
3. Eliminate all unit productions (A → B)
4. Convert all rules to proper format by introducing new variables

Step 3: Removing Unit Rules
A unit rule has the form A → B where A and B are variables.
Replace
    A → B
    B → x₁ | x₂ | ... | xₙ
by
    A → x₁ | x₂ | ... | xₙ
    B → x₁ | x₂ | ... | xₙ
(xᵢ ∈ (V ∪ Σ)*)

Algorithm for Step 3
Input: G₂ = (V₁, Σ, R₂, S₁)
Output: G₃ = (V₁, Σ, R₃, S₁) (no unit rules)

R₃ ← R₂
while R₃ contains unit rules do
    choose any unit rule A → B in R₃
    remove A → B from R₃
    for each rule B → x do
        add A → x to R₃, except if
            A → x is a unit rule we previously removed, or
            A → x is already contained in R₃, or
            x = A
    end for
end while

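The loop above can be sketched in Python. This is an illustrative version under stated assumptions, not the course's reference implementation: rules are stored as a dict from a variable to a set of tuples, the empty tuple stands for ε, and the name remove_unit_rules is hypothetical.

```python
def remove_unit_rules(rules, variables):
    """Sketch of the Step 3 loop on a dict of variable -> set of tuples."""
    rules = {head: set(bodies) for head, bodies in rules.items()}
    removed = set()  # pairs (A, B) for unit rules A -> B already eliminated

    def find_unit_rule():
        # Return some unit rule (A, B) still present, or None.
        for head, bodies in rules.items():
            for body in bodies:
                if len(body) == 1 and body[0] in variables:
                    return head, body[0]
        return None

    unit = find_unit_rule()
    while unit is not None:
        a, b = unit
        rules[a].discard((b,))
        removed.add((a, b))
        # Copy B's rules first, since A and B may be the same variable.
        for body in list(rules.get(b, ())):
            if body == (a,):
                continue  # x = A would only give the rule A -> A
            is_unit = len(body) == 1 and body[0] in variables
            if is_unit and (a, body[0]) in removed:
                continue  # never re-add a unit rule we previously removed
            rules[a].add(body)  # a set silently skips rules already present
        unit = find_unit_rule()
    return rules
```

Recording removed unit rules is what guarantees termination: each pair (A, B) can be eliminated at most once, and there are only finitely many pairs of variables.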
Correctness of Step 3
Claim 1: L(G₃) = L(G₂)
Claim 2: S₁ does not appear on the right-hand side of any rule in G₃
Claim 3: G₃ contains no ε-rules except possibly S₁ → ε (in the case where ε ∈ L(G))
Claim 4: G₃ does not contain any unit rules

Example
G = (V, Σ, R, S) where
V = {S, A, B, C}
Σ = {a, b, c}
R contains the rules
    S → aAb | ABCA | AC
    A → aAb | B | ε
    B → BB | b
    C → cC | ε

Example: After Step 2
    S₁ → S | ε
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | B | AC | C | A
    A → aAb | ab | B
    B → BB | b
    C → cC | c

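Example, Step 3: Unit Rules
    S₁ → S | ε
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | B | AC | C | A
    A → aAb | ab | B
    B → BB | b
    C → cC | c
Unit rules to eliminate: S₁ → S, S → B, S → C, S → A, A → B
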
Eliminating Unit Rule A → B
Before:
    A → aAb | ab | B
    B → BB | b
After:
    A → aAb | ab | BB | b
    B → BB | b

Eliminating Unit Rule S → C
Before:
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | B | AC | C | A
    C → cC | c
After:
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | B | AC | cC | c | A
    C → cC | c

Eliminating Unit Rule S → B
Before:
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | B | AC | cC | c | A
    B → BB | b
After:
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c | A
    B → BB | b

Eliminating Unit Rule S → A
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c | A
    A → aAb | ab | B
Do not add S → aAb or S → ab (from A → aAb and A → ab), since both are already present.
Do not add S → B (from A → B), since we previously removed that unit rule.

Eliminating Unit Rule S → A (Cont'd)
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c
    A → aAb | ab | B
(we simply dropped A from the right-hand side)

Eliminating Unit Rule S₁ → S
Before:
    S₁ → S | ε
    S → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c
After:
    S₁ → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c | ε
    S → (as before)

Removing Unreachable Rules
Note that S does not appear on the right-hand side of any rule.
So any rule of the form S → x is never applied.
Drop all rules with S on the left-hand side from G₃.
In general, remove all unreachable rules.

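Removing unreachable rules amounts to a graph search from the start symbol. The sketch below is an illustration under assumptions not stated on the slides: rules are a dict from a variable to a set of tuples, a symbol counts as a variable exactly when it has an entry in that dict, and the name remove_unreachable is hypothetical.

```python
def remove_unreachable(rules, start):
    """Keep only the rules whose left-hand side is reachable from start."""
    reachable = {start}
    frontier = [start]
    while frontier:
        head = frontier.pop()
        for body in rules.get(head, ()):
            for sym in body:
                # Assumption: a symbol is a variable iff it has rules of its own.
                if sym in rules and sym not in reachable:
                    reachable.add(sym)
                    frontier.append(sym)
    return {h: set(bodies) for h, bodies in rules.items() if h in reachable}


# Usage: S is not reachable from S1 here, so its rules are dropped.
grammar = {
    "S1": {("A", "B"), ("b",)},
    "S": {("A", "B")},
    "A": {("a",)},
    "B": {("B", "B"), ("b",)},
}
print(sorted(remove_unreachable(grammar, "S1")))
```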
Example: After Step 3
    S₁ → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c | ε
    A → aAb | ab | BB | b
    B → BB | b
    C → cC | c

Step 4: Format Rules
First replace all rules
    A → A₁ A₂ ... Aₙ (n > 2, Aᵢ ∈ V ∪ Σ)
by
    A → A₁ X₁, X₁ → A₂ X₂, X₂ → A₃ X₃, ..., Xᵢ₋₁ → Aᵢ Xᵢ, ..., Xₙ₋₃ → Aₙ₋₂ Xₙ₋₂, Xₙ₋₂ → Aₙ₋₁ Aₙ
with new variables X₁, X₂, ..., Xₙ₋₂.
Then replace all rules B → xC (x ∈ Σ, C ∈ V ∪ Σ) by B → XC, X → x with a new variable X.
Finally, replace all rules B → Cx (x ∈ Σ, C ∈ V) by B → CX, X → x.

Algorithm for Step 4
Input: G₃ = (V₁, Σ, R₃, S₁)
Output: G₄ = (V₄, Σ, R₄, S₁) (all rules in proper format)

R₄ ← R₃, V₄ ← V₁
while R₄ contains rules whose right-hand side has length > 2 do
    choose any such rule A → A₁ A₂ ... Aₙ (with Aᵢ ∈ V₄ ∪ Σ)
    remove the rule A → A₁ A₂ ... Aₙ from R₄
    add a new variable X to V₄
    add A → A₁ X and X → A₂ A₃ ... Aₙ to R₄
end while

Algorithm for Step 4 (Cont'd)
for each a ∈ Σ
    add a new variable Xₐ to V₄
    add Xₐ → a to R₄
end for
for each rule A → aB (a ∈ Σ and B ∈ Σ ∪ V)
    replace A → aB with A → XₐB in R₄
end for
for each rule A → Ba (a ∈ Σ and B ∈ V)
    replace A → Ba with A → BXₐ in R₄
end for

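Both parts of the Step 4 algorithm can be sketched in Python. This is an illustration under assumptions, not the course's implementation: rules are a dict from a variable to a set of tuples, new variables for long rules are named X1, X2, ..., and the variable for a terminal a is named X_a. These names and the function to_cnf_format are hypothetical, and the X_a names are assumed not to clash with existing variables.

```python
def to_cnf_format(rules, terminals):
    """Sketch of Step 4: split long rules, then replace terminals in pairs."""
    out = {head: set(bodies) for head, bodies in rules.items()}
    counter = 0

    def fresh_variable():
        # Hypothetical naming scheme; skips names already used as rule heads.
        nonlocal counter
        while True:
            counter += 1
            name = f"X{counter}"
            if name not in out and name not in terminals:
                return name

    # Part 1: while some right-hand side has length > 2, split off its first symbol.
    work = [(h, b) for h, bodies in out.items() for b in bodies if len(b) > 2]
    while work:
        head, body = work.pop()
        out[head].discard(body)
        x = fresh_variable()
        out[head].add((body[0], x))   # A -> A1 X
        out[x] = {body[1:]}           # X -> A2 A3 ... An
        if len(body) - 1 > 2:
            work.append((x, body[1:]))

    # Part 2: one variable X_a per terminal a, used inside length-2 right-hand sides.
    proxy = {a: f"X_{a}" for a in terminals}
    for head in list(out):
        out[head] = {
            tuple(proxy.get(s, s) for s in body) if len(body) == 2 else body
            for body in out[head]
        }
    for a, xa in proxy.items():
        out[xa] = {(a,)}              # X_a -> a
    return out
```

Unlike a hand conversion, this sketch does not re-use a new variable when the same long right-hand side occurs under two different left-hand sides, so it may introduce more variables than necessary.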
Correctness of Step 4
Claim 1: L(G₄) = L(G₃)
Claim 2: All the rules in G₄ have the form
    A → BC with A ∈ V and B, C ∈ V \ {S₁}
    A → a with A ∈ V and a ∈ Σ
    S₁ → ε (in the case where ε ∈ L(G))
Claims 1 & 2 imply that G₄ is a grammar in CNF that generates the same language as our original grammar G.

Example
G = (V, Σ, R, S) where
V = {S, A, B, C}
Σ = {a, b, c}
R contains the rules
    S → aAb | ABCA | AC
    A → aAb | B | ε
    B → BB | b
    C → cC | ε

Example, Step 4: Formatting
    S₁ → aAb | ab | ABCA | BCA | ABA | ABC | BA | BC | AB | BB | b | AC | cC | c | ε
    A → aAb | ab | BB | b
    B → BB | b
    C → cC | c

Example, Step 4: Long Rules
Long rules for S₁:
    S₁ → aAb | ABCA | BCA | ABA | ABC
New rules:
    S₁ → aX₁, X₁ → Ab
    S₁ → AY₁, Y₁ → BY₂, Y₂ → CA
    S₁ → BZ₁, Z₁ → CA
    S₁ → AW₁, W₁ → BA
    S₁ → AV₁, V₁ → BC

Example, Step 4: Long Rules
Long rules for A:
    A → aAb
Re-use the corresponding rule introduced for S₁:
we replaced S₁ → aAb by S₁ → aX₁, X₁ → Ab,
so replace A → aAb by A → aX₁, X₁ → Ab

Example, Step 4: Remaining Rules
Remaining rules to be fixed:
    S₁ → aX₁ | ab | cC
    X₁ → Ab
    A → aX₁ | ab
    C → cC
New rules: U → a, V → b, W → c
    S₁ → UX₁ | UV | WC
    X₁ → AV
    A → UX₁ | UV
    C → WC

Example: After Step 4
G₄ = (V₄, Σ, R₄, S₁) where
V₄ = {S₁, A, B, C, X₁, Y₁, Y₂, Z₁, W₁, V₁, U, V, W}
Σ = {a, b, c}
R₄ contains the following rules:

Example: After Step 4
    S₁ → UX₁ | UV | AY₁ | BZ₁ | AW₁ | AV₁ | BA | BC | AB | BB | AC | WC | b | c | ε
    A → UX₁ | UV | BB | b
    B → BB | b
    C → WC | c
    X₁ → AV
    Y₁ → BY₂, Y₂ → CA
    Z₁ → CA
    W₁ → BA
    V₁ → BC
    U → a, V → b, W → c