Simplification of CFG and Normal Forms
Wen-Guey Tzeng
Computer Science Department
National Chiao Tung University
Normal Forms
We want a CFG in either Chomsky normal form or Greibach normal form.
Chomsky normal form: $A \to a$, $A \to BC$
Greibach normal form: $A \to ax$, $x \in V^*$
CFGs in normal form are easier to parse.
The membership problem: given a grammar $G$ and a string $w$, find a parse tree for $w$ if one exists. Example: $w = x+y*z$.
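$\lambda$-free languages
A $\lambda$-free language is a language that does not contain the empty string $\lambda$.
We consider CFGs $G$ such that $L(G)$ is $\lambda$-free. This is no real restriction: for any CFG $G$, there is a CFG $G'$ such that $L(G') = L(G) - \{\lambda\}$.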
Transformation to normal forms: steps
Start with a CFG $G = (V, T, P, S)$ that generates a $\lambda$-free context-free language.
Remove (1) $\lambda$-productions, (2) unit-productions, and (3) useless productions from $P$, in this order, to get $G'$.
Convert $G'$ to a normal form.
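A substitution rule
For variables $A \neq B$, the productions $A \to x_1 B x_2$ and $B \to y_1 \mid y_2 \mid \cdots \mid y_n$ are equivalent to $A \to x_1 y_1 x_2 \mid x_1 y_2 x_2 \mid \cdots \mid x_1 y_n x_2$ and $B \to y_1 \mid y_2 \mid \cdots \mid y_n$; that is, $A \to x_1 B x_2$ can be replaced by one production for each alternative of $B$.
Example: $A \to a \mid aaA \mid abBc$, $B \to abbA \mid b$ is equivalent to $A \to a \mid aaA \mid ababbAc \mid abbc$, $B \to abbA \mid b$.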
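The substitution rule can be sketched in Python as below. This is a minimal sketch under assumptions that are not in the slides: a grammar is a dict mapping each variable to a list of right-hand sides, each a tuple of symbols; a symbol counts as a variable exactly when it is a key of the dict; and the name `substitute` is hypothetical.

```python
# Assumed representation: variable -> list of bodies, each body a tuple of symbols.
Grammar = dict[str, list[tuple[str, ...]]]


def substitute(grammar: Grammar, a: str, body: tuple[str, ...], b: str) -> Grammar:
    """Replace A -> x1 B x2 (given as `body`) by A -> x1 y x2 for every B -> y."""
    if a == b:
        raise ValueError("the substitution rule requires A != B")
    i = body.index(b)                    # split the body at the first occurrence of B
    x1, x2 = body[:i], body[i + 1:]
    result = dict(grammar)               # shallow copy; only A's list is rebuilt
    result[a] = [p for p in grammar[a] if p != body]
    result[a] += [x1 + y + x2 for y in grammar[b]]
    return result


g = {"A": [("a",), tuple("aaA"), tuple("abBc")], "B": [tuple("abbA"), ("b",)]}
print(substitute(g, "A", tuple("abBc"), "B")["A"])
```

On the example grammar this yields the bodies `a`, `aaA`, `ababbAc`, `abbc` for $A$ and leaves $B$ unchanged; the input grammar is not modified.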
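$\lambda$-production: $A \to \lambda$
Remove $\lambda$-productions
Nullable variable: $A$ is nullable if $A \Rightarrow^* \lambda$.
Steps
1. Find the set $V_N$ of nullable variables: start with every $A$ such that $A \to \lambda$ is in $P$, then repeatedly add $A$ whenever $A \to A_1 A_2 \cdots A_n$ with all $A_i \in V_N$.
2. For each production $A \to x_1 x_2 \cdots x_m$ with $x_i \in V \cup T$, and for each combination $x_i, x_j, \ldots, x_k$ of occurrences of variables in $V_N$, add $A \to x_1 \cdots x_{i-1} x_{i+1} \cdots x_{j-1} x_{j+1} \cdots x_{k-1} x_{k+1} \cdots x_m$, i.e., the production with those occurrences deleted. Note: do not add $A \to \lambda$ if all $x_i$ are in $V_N$.
3. Remove all $\lambda$-productions.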
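Example
$S \to ABaC$, $A \to BC$, $B \to b \mid \lambda$, $C \to D \mid \lambda$, $D \to d$
Nullable set $V_N = \{A, B, C\}$
Add productions and remove the $\lambda$-productions:
$S \to ABaC \mid BaC \mid AaC \mid ABa \mid aC \mid Aa \mid Ba \mid a$
$A \to BC \mid B \mid C$, $B \to b$, $C \to D$, $D \to d$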
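A minimal Python sketch of the steps for removing $\lambda$-productions, assuming a grammar is a dict mapping each variable to a list of tuple bodies, with $\lambda$ written as the empty tuple `()`; `nullable_variables` and `remove_lambda` are hypothetical names. Applied to the example grammar, it finds $V_N = \{A, B, C\}$.

```python
from itertools import product

# Assumed representation: variable -> list of bodies; lambda is the empty tuple ().
Grammar = dict[str, list[tuple[str, ...]]]


def nullable_variables(grammar: Grammar) -> set[str]:
    """V_N: start with A -> lambda, then add A when A -> A1...An with all Ai nullable."""
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            # all() is True for the empty body, i.e. for A -> lambda; terminals are
            # never in `nullable`, so any body containing a terminal fails.
            if a not in nullable and any(all(s in nullable for s in p) for p in bodies):
                nullable.add(a)
                changed = True
    return nullable


def remove_lambda(grammar: Grammar) -> Grammar:
    nullable = nullable_variables(grammar)
    result: Grammar = {}
    for a, bodies in grammar.items():
        new: list[tuple[str, ...]] = []
        for p in bodies:
            # each nullable occurrence is either kept, (s,), or deleted, ()
            options = [[(s,), ()] if s in nullable else [(s,)] for s in p]
            for choice in product(*options):
                q = tuple(s for part in choice for s in part)
                if q and q not in new:   # never add A -> lambda; drops old lambda-rules
                    new.append(q)
        result[a] = new
    return result


g = {"S": [tuple("ABaC")], "A": [tuple("BC")], "B": [("b",), ()],
     "C": [("D",), ()], "D": [("d",)]}
print(nullable_variables(g), remove_lambda(g)["S"])
```

Every subset of nullable occurrences is tried, so a body with $k$ nullable occurrences can produce up to $2^k$ productions.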
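Remove unit-productions
Unit-production: $A \to B$, where $A, B \in V$
Steps
1. Remove every $A \to A$ immediately.
2. Draw a dependency graph on the variables with an edge from $C$ to $D$ for each unit-production $C \to D$; then $A \Rightarrow^* B$ through unit-productions exactly when there is a path from $A$ to $B$.
3. For each $A \Rightarrow^* B$ and all non-unit productions $B \to y_1 \mid y_2 \mid \cdots \mid y_n$, add $A \to y_1 \mid y_2 \mid \cdots \mid y_n$.
4. Remove all unit-productions $A \to B$, i.e., the edges of the dependency graph.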
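Example
$S \to Aa \mid B$, $B \to A \mid bb$, $A \to a \mid bc \mid B$
1. Draw the dependency graph: edges $(S, B)$, $(B, A)$, $(A, B)$, so $S \Rightarrow^* A$, $S \Rightarrow^* B$, $A \Rightarrow^* B$, $B \Rightarrow^* A$.
2. Add $S \to bb \mid a \mid bc$, $A \to bb$, $B \to a \mid bc$
3. Remove $S \to B$, $B \to A$, $A \to B$
4. Finally: $S \to a \mid bc \mid bb \mid Aa$, $A \to a \mid bc \mid bb$, $B \to a \mid bc \mid bb$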
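Removing unit-productions can be sketched the same way: a depth-first search over unit-productions finds every $B$ with $A \Rightarrow^* B$, and the non-unit rules of each such $B$ are copied to $A$. The dict-of-tuple-bodies representation (a symbol is a variable exactly when it is a key) and the name `remove_unit` are assumptions for illustration.

```python
# Assumed representation: variable -> list of bodies, each body a tuple of symbols.
Grammar = dict[str, list[tuple[str, ...]]]


def remove_unit(grammar: Grammar) -> Grammar:
    def is_unit(p: tuple[str, ...]) -> bool:
        return len(p) == 1 and p[0] in grammar

    # reach[A] = {B : A =>* B using unit productions}, i.e. the variables reachable
    # from A in the dependency graph, found by depth-first search
    reach: dict[str, set[str]] = {}
    for a in grammar:
        seen: set[str] = set()
        stack = [a]
        while stack:
            x = stack.pop()
            for p in grammar[x]:
                if is_unit(p) and p[0] not in seen:
                    seen.add(p[0])
                    stack.append(p[0])
        reach[a] = seen

    result: Grammar = {}
    for a in grammar:
        # keep A's own non-unit rules, then copy the non-unit rules of each B with
        # A =>* B; every unit production (including A -> A) is dropped
        new = [p for p in grammar[a] if not is_unit(p)]
        for b in sorted(reach[a]):
            new += [p for p in grammar[b] if not is_unit(p) and p not in new]
        result[a] = new
    return result


g = {"S": [tuple("Aa"), ("B",)], "B": [("A",), tuple("bb")], "A": [("a",), tuple("bc"), ("B",)]}
print(remove_unit(g))
```

On the example grammar this gives the same final productions as step 4, up to the order of alternatives.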
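Remove useless productions
A variable $A \in V$ is useful if $S$ can generate some terminal string through it, that is, $S \Rightarrow^* xAy \Rightarrow^* w$ for some $w \in T^*$. A variable that is not useful is useless.
Example: $S \to aSb \mid AB \mid Ba$, $A \to aA$, $B \to b \mid Bb$, $C \to cB \mid c$
$S \Rightarrow Ba \Rightarrow ba$. Thus, $B$ is useful, and so is $S$. But $A$ and $C$ are not useful (useless): $A$ never derives a terminal string, and $C$ cannot be reached from $S$.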
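Two cases for useless variables
Case 1: variables that cannot generate strings in $T^*$
$S \to aSb \mid AB \mid Ba$, $A \to aA$, $B \to b \mid Bb$, $C \to cB \mid c$
Algorithm (finding the variables that generate terminal strings)
1. $V_1 = \emptyset$
2. For each rule $A \to x$ with $x \in (T \cup V_1)^*$, add $A$ to $V_1$.
3. Repeat step 2 until no more variables can be added to $V_1$.
$V_1 = \{S, B, C\}$. Since $A \notin V_1$, remove $A$ and every production that uses it:
$S \to aSb \mid Ba$, $B \to b \mid Bb$, $C \to cB \mid c$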
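The Case 1 algorithm is a fixed-point computation. A minimal sketch, assuming the dict-of-tuple-bodies representation, where any symbol that is not a key is treated as a terminal (so every variable must appear as a key, possibly with an empty list); `generating_variables` and `remove_nongenerating` are hypothetical names.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def generating_variables(grammar: Grammar) -> set[str]:
    """V_1: the variables that derive some terminal string."""
    v1: set[str] = set()
    changed = True
    while changed:                       # step 3: repeat until V_1 stops growing
        changed = False
        for a, bodies in grammar.items():
            # step 2: A -> x where every symbol of x is a terminal or already in V_1
            if a not in v1 and any(all(s in v1 or s not in grammar for s in p) for p in bodies):
                v1.add(a)
                changed = True
    return v1


def remove_nongenerating(grammar: Grammar) -> Grammar:
    v1 = generating_variables(grammar)
    # keep only variables in V_1, and only productions whose variables are all in V_1
    return {a: [p for p in bodies if all(s in v1 or s not in grammar for s in p)]
            for a, bodies in grammar.items() if a in v1}


g = {"S": [tuple("aSb"), tuple("AB"), tuple("Ba")], "A": [tuple("aA")],
     "B": [("b",), tuple("Bb")], "C": [tuple("cB"), ("c",)]}
print(generating_variables(g), remove_nongenerating(g))
```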
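Case 2: variables that cannot be reached from $S$
$S \to aSb \mid Ba$, $B \to b \mid Bb$, $C \to cB \mid c$
Algorithm: draw a dependency graph with an edge from $A$ to $B$ whenever $B$ occurs on the right side of a production of $A$. Here the edges are $(S, B)$ and $(C, B)$, plus self-loops on $S$ and $B$.
$C$ is unreachable from $S$, so removing it gives $S \to aSb \mid Ba$, $B \to b \mid Bb$.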
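Case 2 is a graph search from $S$. A minimal sketch with the same assumed dict-of-tuple-bodies representation; `reachable_variables` and `remove_unreachable` are hypothetical names, and the start symbol defaults to `"S"`.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def reachable_variables(grammar: Grammar, start: str = "S") -> set[str]:
    """Variables reachable from `start` in the dependency graph (edge A -> B
    whenever B occurs on the right side of a production of A)."""
    seen = {start}
    stack = [start]
    while stack:
        a = stack.pop()
        for p in grammar.get(a, []):
            for s in p:
                if s in grammar and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


def remove_unreachable(grammar: Grammar, start: str = "S") -> Grammar:
    keep = reachable_variables(grammar, start)
    # a reachable variable only mentions reachable variables, so its rules stay intact
    return {a: list(bodies) for a, bodies in grammar.items() if a in keep}


g = {"S": [tuple("aSb"), tuple("Ba")], "B": [("b",), tuple("Bb")], "C": [tuple("cB"), ("c",)]}
print(remove_unreachable(g))
```

This step should run after Case 1, because deleting non-generating variables can leave further variables unreachable.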
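Algorithm (removing useless productions)
Input: $G = (V, T, P, S)$
1. Find the useless variables of Case 1 and remove the related useless productions.
2. In the resulting grammar, find the useless (unreachable) variables of Case 2 and remove the related useless productions.
The order matters: for $S \to AB \mid a$, $A \to a$, $B \to bB$, doing Case 2 first removes nothing, and Case 1 then removes $B$ and $S \to AB$ but leaves the unreachable $A \to a$; doing Case 1 first lets Case 2 remove $A$ as well.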
Chomsky normal form
A CFG is in Chomsky normal form (CNF) if all productions are of the form $A \to BC$ or $A \to a$.
Example: $S \to AS \mid a$, $A \to SA \mid b$
Every CFG $G$ with $\lambda \notin L(G)$ has an equivalent CNF grammar.
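A small checker for the CNF definition, assuming the same dict-of-tuple-bodies representation (a symbol is a variable exactly when it is a key); `is_cnf` is a hypothetical name.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def is_cnf(grammar: Grammar) -> bool:
    """True if every production is A -> BC (two variables) or A -> a (one terminal)."""
    for bodies in grammar.values():
        for p in bodies:
            two_variables = len(p) == 2 and all(s in grammar for s in p)
            one_terminal = len(p) == 1 and p[0] not in grammar
            if not (two_variables or one_terminal):
                return False
    return True


example = {"S": [("A", "S"), ("a",)], "A": [("S", "A"), ("b",)]}
print(is_cnf(example))                   # True
```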
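Converting into CNF
1. Apply the rules for removing $\lambda$-, unit-, and useless productions.
2. Convert the productions into the form $A \to C_1 C_2 \cdots C_n$ ($n \geq 2$, all $C_i \in V$) or $A \to a$: replace each terminal $a$ that occurs in a right side of length at least 2 by a new variable $B_a$, and add $B_a \to a$.
3. Convert each $A \to C_1 C_2 \cdots C_n$ with $n > 2$ into $A \to C_1 D_1$, $D_1 \to C_2 D_2$, ..., $D_{n-2} \to C_{n-1} C_n$, where $D_1, \ldots, D_{n-2}$ are new variables.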
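Example: $S \to ABa$, $A \to aab$, $B \to Ac$
Step 2: $S \to ABB_a$, $A \to B_a B_a B_b$, $B \to AB_c$, $B_a \to a$, $B_b \to b$, $B_c \to c$
Step 3: $S \to AD_1$, $D_1 \to BB_a$, $A \to B_a D_2$, $D_2 \to B_a B_b$, $B \to AB_c$, $B_a \to a$, $B_b \to b$, $B_c \to c$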
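Steps 2 and 3 of the conversion can be sketched as follows. Assumptions: step 1 has already been applied, the grammar uses the dict-of-tuple-bodies representation, and the generated names `B_x` and `D1`, `D2`, ... (hypothetical, chosen to mirror the slide) do not clash with existing variables.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def to_cnf(grammar: Grammar) -> Grammar:
    """Steps 2 and 3 of the CNF conversion (step 1 assumed already done)."""
    result: Grammar = {}
    fresh = 0

    def as_variable(s: str) -> str:
        # Step 2: a terminal x inside a long body becomes B_x, with B_x -> x
        if s in grammar:
            return s
        name = f"B_{s}"
        rules = result.setdefault(name, [])
        if (s,) not in rules:
            rules.append((s,))
        return name

    for a, bodies in grammar.items():
        result.setdefault(a, [])
        for p in bodies:
            if len(p) == 1:                      # A -> a is already allowed
                result[a].append(p)
                continue
            cs = [as_variable(s) for s in p]     # now A -> C1 C2 ... Cn
            # Step 3: A -> C1 D1, D1 -> C2 D2, ..., D_{n-2} -> C_{n-1} C_n
            head = a
            while len(cs) > 2:
                fresh += 1
                d = f"D{fresh}"
                result.setdefault(head, []).append((cs[0], d))
                head, cs = d, cs[1:]
            result.setdefault(head, []).append(tuple(cs))
    return result


g = {"S": [tuple("ABa")], "A": [tuple("aab")], "B": [tuple("Ac")]}
print(to_cnf(g))
```

On $S \to ABa$, $A \to aab$, $B \to Ac$ the output is $S \to AD_1$, $D_1 \to BB_a$, $A \to B_aD_2$, $D_2 \to B_aB_b$, $B \to AB_c$ together with $B_a \to a$, $B_b \to b$, $B_c \to c$. The numbering of the $D_i$ follows the order in which productions are visited.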
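Greibach normal form
A CFG is in Greibach normal form (GNF) if all productions are of the form $A \to aB_1 B_2 \cdots B_n$, $n \geq 0$, i.e., a terminal followed by zero or more variables.
Example: $S \to aBC$, $B \to aBA$, $A \to a \mid bBSC$
Every CFG $G$ with $\lambda \notin L(G)$ has an equivalent GNF grammar.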
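A matching checker for GNF, under the same assumed dict-of-tuple-bodies representation; `is_gnf` is a hypothetical name.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def is_gnf(grammar: Grammar) -> bool:
    """True if every production is A -> a B1 ... Bn (n >= 0): one terminal first,
    then only variables."""
    return all(
        len(p) >= 1 and p[0] not in grammar and all(s in grammar for s in p[1:])
        for bodies in grammar.values()
        for p in bodies
    )


print(is_gnf({"S": [("A", "B")], "A": [("a",)], "B": [("b",)]}))  # False: S -> AB starts with a variable
```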
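Example
$S \to AB$, $A \to aA \mid bB \mid b$, $B \to b$
Substituting the productions of $A$ into $S \to AB$ gives the result
$S \to aAB \mid bBB \mid bB$, $A \to aA \mid bB \mid b$, $B \to b$
Example
$S \to abSb \mid aa$
Replacing each terminal in a non-initial position by a new variable gives the result
$S \to aBSB \mid aA$, $B \to b$, $A \to a$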
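The second example uses a trick that works when every right side already starts with a terminal: each terminal in a later position is replaced by a new variable that derives it. A minimal sketch, assuming the dict-of-tuple-bodies representation and caller-chosen names for the new variables; the `names` parameter and the function name `replace_inner_terminals` are hypothetical. It does not cover the general conversion to GNF, which also needs substitutions (as in the first example) and the removal of left recursion.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def replace_inner_terminals(grammar: Grammar, names: dict[str, str]) -> Grammar:
    """Keep the first symbol of each body; replace every later terminal t by the
    variable names[t] and add names[t] -> t (names must not clash with variables)."""
    result: Grammar = {a: [] for a in grammar}
    for a, bodies in grammar.items():
        for p in bodies:
            new_body = list(p[:1])
            for s in p[1:]:
                if s in grammar:
                    new_body.append(s)
                else:
                    v = names[s]
                    rules = result.setdefault(v, [])
                    if (s,) not in rules:
                        rules.append((s,))
                    new_body.append(v)
            result[a].append(tuple(new_body))
    return result


g = {"S": [tuple("abSb"), tuple("aa")]}
print(replace_inner_terminals(g, {"a": "A", "b": "B"}))
```

With `names = {"a": "A", "b": "B"}` this reproduces $S \to aBSB \mid aA$, $B \to b$, $A \to a$.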
Parsing (membership)
Question: given a CFG $G$ in Chomsky normal form and a string $w$, determine whether $w \in L(G)$.
Idea: the dynamic programming technique. A large problem is decomposed into smaller problems, and the solutions to the smaller problems are combined into a solution for the large problem.
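Assume $w = a_1 a_2 \cdots a_n$.
Main problem: $V_{1n} = \{A : A \Rightarrow^* w\}$; $w \in L(G)$ if and only if $S \in V_{1n}$.
Use the dynamic programming technique:
$V_{ij} = \{A : A \Rightarrow^* a_i a_{i+1} \cdots a_j\}$, the variables that can generate the substring $a_i a_{i+1} \cdots a_j$.
Solve the smaller problems $V_{ik}$ and $V_{k+1,j}$ for $k = i, i+1, \ldots, j-1$, and combine them to compute $V_{ij}$:
$V_{ij} = \{A : A \to BC \in P,\ B \in V_{ik},\ C \in V_{k+1,j} \text{ for some } k \text{ with } i \leq k < j\}$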
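$w = a_1 a_2 a_3 \cdots a_i a_{i+1} \cdots a_{j-1} a_j \cdots a_n$
$V_{ij}$ contains the variables that generate $a_i a_{i+1} \cdots a_{j-1} a_j$. Splitting this substring after position $k$ gives $a_i \cdots a_k$ and $a_{k+1} \cdots a_j$, so $V_{ij}$ is built from the pairs
$(V_{ii}, V_{i+1,j}), \ldots, (V_{ik}, V_{k+1,j}), (V_{i,k+1}, V_{k+2,j}), \ldots, (V_{i,j-1}, V_{jj})$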
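CYK Algorithm
Input: $G = (V, T, P, S)$ in CNF and $w = a_1 a_2 \cdots a_n$
Compute $V_{ij} = \{A \in V : A \Rightarrow^* a_i a_{i+1} \cdots a_j\}$ in the order $V_{11}, V_{22}, \ldots, V_{nn}$, then $V_{12}, V_{23}, \ldots, V_{n-1,n}$, and so on up to $V_{1n}$.
1. Smallest problems: add $A$ to $V_{ii}$ if $A \to a_i$ is a production in $P$.
2. Bigger problems: add $A$ to $V_{ij}$ if, for some $k$ with $i \leq k \leq j-1$, $A \to BC$ is in $P$, $B \in V_{ik}$, and $C \in V_{k+1,j}$.
3. $w \in L(G)$ if and only if $S \in V_{1n}$.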
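Example: $S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$, $w = aabbb$
Steps
$V_{11} = \{A\}$, $V_{22} = \{A\}$, $V_{33} = \{B\}$, $V_{44} = \{B\}$, $V_{55} = \{B\}$
$V_{12} = \emptyset$, $V_{23} = \{S, B\}$, $V_{34} = \{A\}$, $V_{45} = \{A\}$
$V_{13} = \{S, B\}$, $V_{24} = \{A\}$, $V_{35} = \{S, B\}$
$V_{14} = \{A\}$, $V_{25} = \{S, B\}$
$V_{15} = \{S, B\}$
Since $S \in V_{15}$, $aabbb \in L(G)$.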
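Triangular table ($n = 5$)
$V_{1,5}$
$V_{1,4}$ $V_{2,5}$
$V_{1,3}$ $V_{2,4}$ $V_{3,5}$
$V_{1,2}$ $V_{2,3}$ $V_{3,4}$ $V_{4,5}$
$V_{1,1}$ $V_{2,2}$ $V_{3,3}$ $V_{4,4}$ $V_{5,5}$
The table is filled from the bottom row (substrings of length 1) up to the top entry $V_{1,5}$ (the whole string).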
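The CYK steps translate directly into nested loops over the substring length, the start position $i$, and the split point $k$. A minimal sketch, assuming the dict-of-tuple-bodies representation, a CNF input grammar, and terminals that are single characters of `w`; `cyk` is a hypothetical name. The returned table is indexed from 1 as on the slides, so `table[i][j]` is $V_{ij}$.

```python
# Assumed representation: variable -> list of bodies; non-key symbols are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(grammar: Grammar, w: str, start: str = "S") -> tuple[bool, list[list[set[str]]]]:
    """Return (w in L(G), table) where table[i][j] = V_ij for 1 <= i <= j <= n."""
    n = len(w)
    table: list[list[set[str]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    if n == 0:
        return False, table                      # a CNF grammar cannot derive lambda
    # 1. smallest problems: A in V_ii iff A -> a_i is in P
    for i in range(1, n + 1):
        table[i][i] = {a for a, bodies in grammar.items() if (w[i - 1],) in bodies}
    # 2. bigger problems, by increasing substring length
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for k in range(i, j):                # split a_i..a_k | a_{k+1}..a_j
                for a, bodies in grammar.items():
                    for p in bodies:
                        if len(p) == 2 and p[0] in table[i][k] and p[1] in table[k + 1][j]:
                            table[i][j].add(a)
    # 3. w in L(G) iff S in V_1n
    return start in table[1][n], table


g = {"S": [("A", "B")], "A": [("B", "B"), ("a",)], "B": [("A", "B"), ("b",)]}
ok, V = cyk(g, "aabbb")
print(ok, V[1][5])
```

For the example grammar and $w = aabbb$ the table entries match the sets $V_{ij}$ listed in the example, and `cyk` reports membership. The three nested loops over $(i, j, k)$ plus the scan over $P$ give $O(n^3 |P|)$ time.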
Sum up
Context-free grammars are used in designing programming languages such as C and Pascal.
The membership problem for CFGs is equivalent to the parsing problem for programming languages.
Normal forms are needed for automatically generating a parser for a programming language.