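Chap. 7 Properties of Context-free Languages
7.1 Normal Forms for Context-free Grammars
Context-free grammars: $A \to \alpha$ where $A \in N$, $\alpha \in (N \cup T)^*$.
0. Chomsky Normal Form: $A \to BC$ or $A \to a$, except $S \to \varepsilon$, where $A, B, C \in N$, $a \in T$.
1. Eliminating useless symbols (non-generating and non-reachable).
2. Eliminating $\varepsilon$-productions (no $A \to \varepsilon$ except $S \to \varepsilon$), so $|\alpha| \ge 1$.
3. Eliminating unit productions (no $A \to B$): $A \to \alpha$ or $A \to a$ where $A \in N$, $\alpha \in (N \cup T)^*$, $|\alpha| \ge 2$, $a \in T$.
4. Introducing a variable for each terminal ($B_a \to a$, $a \in T$): $A \to \alpha$ or $A \to a$ where $A \in N$, $\alpha \in N^*$, $|\alpha| \ge 2$, $a \in T$.
5. Reducing the length of each RHS to two: $A \to BC$ or $A \to a$, except $S \to \varepsilon$, where $A, B, C \in N$, $a \in T$.
11/8/16 Kwang-Moo Choe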
7.1.1 Eliminating Useless Symbols
We say $X \in N \cup T$ is useful if $S \Rightarrow^* \alpha X \beta \Rightarrow^* w$ for some $\alpha, \beta \in (N \cup T)^*$, $w \in T^*$; useless otherwise.
1. We say $X$ is generating if $X \Rightarrow^* w$ for some $w \in T^*$.
2. We say $X$ is reachable if $S \Rightarrow^* \alpha X \beta$.
Elimination proceeds in this order:
1. Eliminate non-generating symbols and their productions.
2. Eliminate non-reachable symbols and their productions.
Theorem 7.2 Let $G = (N, T, P, S)$ be a CFG where $L(G) \ne \emptyset$.
1. Eliminate non-generating symbols and productions in $G$, giving $G_2 = (N_2, T_2, P_2, S)$.
2. Eliminate non-reachable symbols and productions from $G_2$, giving $G_1 = (N_1, T_1, P_1, S)$.
Then $G_1$ has no useless symbols, and $L(G_1) = L(G)$.
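Theorem 7.4 Finding generating symbols
Basis: $\forall a \in T$, $a$ is generating.
Rec.: For $A \to X_1 \cdots X_n \in P$, if ($\forall\, 1 \le i \le n$: $X_i$ is generating) or ($n = 0$), then $A$ is generating.
Proof: If $X$ is found generating, then $X \Rightarrow^* w$ for some $w \in T^*$ (induction on the number of steps in the algorithm).
Basis: zero steps. $a \Rightarrow^* a$.
Rec.: Consider $X$ added at a later step using $X \to X_1 \cdots X_n \in P$. $\forall\, 1 \le i \le n$, $X_i \Rightarrow^* w_i$, $w_i \in T^*$ by IH. Then $X \Rightarrow X_1 \cdots X_n \Rightarrow^* w_1 \cdots w_n = w$, $w \in T^*$.
Theorem 7.6 Finding reachable symbols
Basis: $S$ is reachable.
Rec.: If $A \in N$ is reachable and $A \to X_1 \cdots X_n \in P$, then $\forall\, 1 \le i \le n$, $X_i$ is reachable.
As a graph: $G_{reach} = (N, E)$, with $(A, B) \in E$ if $A \to \alpha B \beta \in P$.
Proof: $X \in (N \cup T)$ is reachable iff $S \Rightarrow^* \alpha X \beta$, i.e. iff $X$ can be reached from $S \in N$ along edges of $G_{reach}$ (the closure $E^*$).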
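The two fixed-point computations above, and the order of elimination in Theorem 7.2, can be sketched in Python. This is only an illustrative sketch: the representation of productions as `(head, body)` pairs and the function names `generating`, `reachable`, and `remove_useless` are assumptions made here, not part of the slides.

```python
def generating(prods, terminals):
    """prods: list of (head, body) pairs; body is a tuple of symbols."""
    gen = set(terminals)              # basis: every terminal is generating
    changed = True
    while changed:                    # repeat until no new symbol is added
        changed = False
        for head, body in prods:
            # all() over an empty body is True, which covers the case n = 0
            if head not in gen and all(x in gen for x in body):
                gen.add(head)
                changed = True
    return gen

def reachable(prods, start):
    reach, todo = {start}, [start]    # basis: S is reachable
    while todo:
        a = todo.pop()
        for head, body in prods:
            if head == a:
                for x in body:        # every symbol of the body becomes reachable
                    if x not in reach:
                        reach.add(x)
                        todo.append(x)
    return reach

def remove_useless(prods, terminals, start):
    # Step 1: drop productions that mention a non-generating symbol.
    gen = generating(prods, terminals)
    p2 = [(h, b) for h, b in prods if h in gen and all(x in gen for x in b)]
    # Step 2: drop productions whose head is not reachable in what remains.
    reach = reachable(p2, start)
    return [(h, b) for h, b in p2 if h in reach]

# S -> AB | a, A -> b.  B is non-generating; once S -> AB is gone, A is unreachable.
P = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
print(remove_useless(P, {"a", "b"}, "S"))   # [('S', ('a',))]
```

The order matters in this example: running the reachability step first would keep `A`, because `S -> AB` is still present at that point.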
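7.1.3 Eliminating $\varepsilon$-Productions
$A \to \varepsilon$ is called an $\varepsilon$-production. $G$ is $\varepsilon$-free if $P$ has no $\varepsilon$-production. $A$ is nullable iff $A \Rightarrow^* \varepsilon$.
Theorem 7.7 Finding nullable symbols
Basis: If $A \to \varepsilon \in P$, $A$ is nullable.
Rec.: For $B \to C_1 \cdots C_k \in P$: if $\forall\, (1 \le i \le k)$ $C_i \in N$ is nullable, then $B$ is nullable.
Proof: $A \Rightarrow^* \varepsilon$ if and only if $A$ is nullable in the above algorithm.
(If) If $A$ is nullable in the above algorithm, then $A \Rightarrow^* \varepsilon$. Trivial.
(Only if) Induction on the length of the shortest derivation to $\varepsilon$.
Basis: One step. $A \to \varepsilon \in P$, $A \Rightarrow^1 \varepsilon$. $A$ is nullable by the basis rule.
Induction: Suppose $A \Rightarrow^n \varepsilon$ where $n > 1$. Then $A \Rightarrow B_1 \cdots B_k \Rightarrow^+ \varepsilon$, where $\forall\, 1 \le i \le k$: $B_i \Rightarrow^{<n} \varepsilon$. $\forall\, 1 \le i \le k$: $B_i$ is nullable by IH. So $A$ is nullable in the above algorithm.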
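The nullable-symbol algorithm of Theorem 7.7 is a small fixed-point iteration. A minimal sketch in Python follows, assuming productions are given as `(head, body)` pairs with the empty tuple standing for $\varepsilon$; the function name `nullable` and this encoding are illustrative choices, not taken from the slides.

```python
def nullable(prods):
    """prods: list of (head, body) pairs; body () encodes an epsilon-production."""
    null = set()
    changed = True
    while changed:
        changed = False
        for head, body in prods:
            # Basis: all() over the empty body is True (A -> epsilon).
            # Rec.: every symbol of the body is already known to be nullable.
            if head not in null and all(x in null for x in body):
                null.add(head)
                changed = True
    return null

# S -> AB, A -> aAA | epsilon, B -> bBB | epsilon
P = [("S", ("A", "B")),
     ("A", ("a", "A", "A")), ("A", ()),
     ("B", ("b", "B", "B")), ("B", ())]
print(sorted(nullable(P)))   # ['A', 'B', 'S']
```

Terminals are never added to `null`, so any body containing a terminal can never satisfy the test, which matches the requirement $C_i \in N$ in the recursive rule.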
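Theorem 7.9 Let $G = (N, T, P, S)$ be a CFG. Then $G_1 = (N, T, P_1, S)$ is $\varepsilon$-free and $L(G_1) = L(G) - \{\varepsilon\}$.
$P_1$: for each $A \to X_1 \cdots X_n \in P$, add $A \to Z_1 \cdots Z_n$ to $P_1$, where
i) if $X_i$ is not nullable, $Z_i = X_i$ (no change);
ii) if $X_i$ is nullable, $Z_i = (X_i \mid \varepsilon)$;
iii) remove $A \to \varepsilon$, if any.
(With $m$ nullable symbols in the body there are $2^m$ rules; if $m = n$, the upper bound is $2^n - 1$ rules.)
Proof: $A \Rightarrow^*_{G_1} w$ if and only if $A \Rightarrow^*_G w$ and $w \ne \varepsilon$ ($w \in T^+$).
(If) If $A \Rightarrow^k_G w$ and $w \ne \varepsilon$, then $A \Rightarrow^*_{G_1} w$.
Basis: If $A \Rightarrow_G w$ and $w \ne \varepsilon$ ($w \in T^+$), then $A \to w \in P_1$. So $A \Rightarrow_{G_1} w$.
Induction: $A \Rightarrow_G X_1 \cdots X_n \Rightarrow^{k-1}_G w = w_1 \cdots w_n$, with $X_i \Rightarrow^*_G w_i$ and $w \ne \varepsilon$. If $w_i \ne \varepsilon$, then $X_i \Rightarrow^*_{G_1} w_i$ by IH ($X_i \Rightarrow^{<k}_G w_i$).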
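If $w_i = \varepsilon$, $X_i$ is nullable. $A \Rightarrow_G X_1 \cdots X_{i-1} X_i X_{i+1} \cdots X_n \Rightarrow^*_G w_1 \cdots w_{i-1} w_{i+1} \cdots w_n = w$.
$A \to X_1 \cdots X_{i-1} X_{i+1} \cdots X_n \in P_1$ by construction of $P_1$.
$A \Rightarrow_{G_1} X_1 \cdots X_{i-1} X_{i+1} \cdots X_n \Rightarrow^*_{G_1} w_1 \cdots w_{i-1} w_{i+1} \cdots w_n = w$.
(Only if) If $A \Rightarrow^k_{G_1} w$, then $A \Rightarrow^*_G w$ and $w \ne \varepsilon$.
Basis: If $A \Rightarrow_{G_1} w$, then $w \ne \varepsilon$ ($G_1$ is $\varepsilon$-free). $A \Rightarrow_G \alpha$ and $\alpha \Rightarrow^*_G w$ (using $\varepsilon$-rules only, $w \ne \varepsilon$).
Induction: Assume $A \Rightarrow_{G_1} Z_1 \cdots Z_n \Rightarrow^{k-1}_{G_1} w = x_1 \cdots x_n$, with $Z_i \Rightarrow^*_{G_1} x_i$.
$A \to Z_1 \cdots Z_n \in P_1$ comes from some $A \to X_1 \cdots X_m \in P$ ($m \ge n$).
$A \Rightarrow_G X_1 \cdots X_m \Rightarrow^*_G Z_1 \cdots Z_n$ (using $\varepsilon$-rules only) $\Rightarrow^*_G x_1 \cdots x_n = w$, where $Z_i \Rightarrow^*_G x_i$ and $x_i \ne \varepsilon$ by IH ($Z_i \Rightarrow^{<k}_{G_1} x_i$). Hence $A \Rightarrow^*_G w$ and $w \ne \varepsilon$.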
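The construction of $P_1$ in Theorem 7.9 can be sketched as follows. This is an illustrative sketch only: the `(head, body)` encoding, the use of `()` for $\varepsilon$, and the function names are assumptions made here. Each nullable symbol in a body is either kept or dropped, and the empty body is never emitted.

```python
from itertools import product

def nullable(prods):
    null, changed = set(), True
    while changed:
        changed = False
        for head, body in prods:
            if head not in null and all(x in null for x in body):
                null.add(head)
                changed = True
    return null

def eliminate_epsilon(prods):
    null = nullable(prods)
    new = set()
    for head, body in prods:
        # Z_i = X_i if X_i is not nullable; Z_i = (X_i | epsilon) if it is.
        choices = [((x,), ()) if x in null else ((x,),) for x in body]
        for pick in product(*choices):
            z = tuple(s for part in pick for s in part)
            if z:                     # rule iii): never add A -> epsilon
                new.add((head, z))
    return new

# S -> AB, A -> aAA | epsilon, B -> bBB | epsilon
P = [("S", ("A", "B")),
     ("A", ("a", "A", "A")), ("A", ()),
     ("B", ("b", "B", "B")), ("B", ())]
for head, body in sorted(eliminate_epsilon(P)):
    print(head, "->", " ".join(body))
```

For this grammar the result is S -> AB | A | B, A -> aAA | aA | a, B -> bBB | bB | b. The set removes the duplicate `A -> aA` that arises from dropping either of the two occurrences of `A`.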
7.1.4 Eliminating Unit Productions
$A \to B$ is called a unit production if $A, B \in N$. $(A, B)$ is called a unit pair if $A \Rightarrow^* B$ using unit productions only.
Theorem 7.11 The following algorithm finds exactly the unit pairs.
Basis: $(A, A)$ is a unit pair.
Induction: If $(A, B)$ is a unit pair and $B \to C \in P$ with $C \in N$, then $(A, C)$ is a unit pair.
Proof: Induction on the number of derivation steps by which the unit pair is found.
Basis: Zero steps. $A = B$, and $(A, A)$ is added in the basis.
Induction: Assume $A \Rightarrow^n C$. Then $\exists B$, $A \Rightarrow^{n-1} B \Rightarrow C$. $(A, B)$ is a unit pair by IH, and the induction rule with $B \to C \in P$ adds $(A, C)$ as a unit pair.
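A minimal Python sketch of the unit-pair algorithm follows, together with the way the pairs are used to build the production set $P_1$ of Theorem 7.13. The `(head, body)` encoding and the names `unit_pairs` and `eliminate_unit` are assumptions made for this illustration.

```python
def unit_pairs(nonterminals, prods):
    pairs = {(a, a) for a in nonterminals}           # basis: (A, A)
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in prods:
                # induction: (A, B) is a unit pair and B -> C is a unit production
                if head == b and len(body) == 1 and body[0] in nonterminals:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs

def eliminate_unit(nonterminals, prods):
    def is_unit(body):
        return len(body) == 1 and body[0] in nonterminals
    # P_1 = { A -> alpha | (A, B) is a unit pair, B -> alpha in P, alpha not in N }
    return {(a, body)
            for a, b in unit_pairs(nonterminals, prods)
            for head, body in prods
            if head == b and not is_unit(body)}

# E -> T | E+T,  T -> F | T*F,  F -> a | (E)
N = {"E", "T", "F"}
P = [("E", ("T",)), ("E", ("E", "+", "T")),
     ("T", ("F",)), ("T", ("T", "*", "F")),
     ("F", ("a",)), ("F", ("(", "E", ")"))]
print(sorted(unit_pairs(N, P)))
print(len(eliminate_unit(N, P)))   # 9 productions, none of them unit
```

In the example, `E` inherits every non-unit body of `T` and `F` because $(E, T)$ and $(E, F)$ are unit pairs.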
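Theorem 7.13 Let $G = (N, T, P, S)$ be a CFG. Then $G_1 = (N, T, P_1, S)$ has no unit productions and $L(G_1) = L(G)$, where
$P_1 = \{A \to \alpha \mid (A, B) \text{ is a unit pair},\ B \to \alpha \in P,\ \alpha \notin N\}$.
Proof: $A \Rightarrow^*_G w$ if and only if $A \Rightarrow^*_{G_1} w$.
If $A \to \alpha \in P_1$, then $\alpha \notin N$: $P_1$ contains non-unit productions only.
(If) If $A \to \alpha \in P_1$, then $A \to \alpha \in P$ or $A \Rightarrow^*_G B \Rightarrow_G \alpha$. So if $A \to \alpha \in P_1$, then $A \Rightarrow^*_G \alpha$. Hence if $A \Rightarrow^*_{G_1} w$, then $A \Rightarrow^*_G w$.
(Only if) If $A \Rightarrow^*_G w$, then $A \Rightarrow^*_{lm,G} w$ (a leftmost derivation in $G$).
Assume $A = \alpha_0 \Rightarrow_{lm} \alpha_1 \Rightarrow_{lm} \alpha_2 \cdots \Rightarrow_{lm} \alpha_n = w$ in $G$. For $0 \le i < n$:
1) If $\alpha_i \Rightarrow_{lm} \alpha_{i+1}$ by a non-unit production in $G$, then $\alpha_i \Rightarrow_{lm,G_1} \alpha_{i+1}$.
2) If $\alpha_i \Rightarrow_{lm} \alpha_{i+1}$ in $G$ by a unit production,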
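then $\exists k > i$ such that $\forall\, i \le j < k$, $\alpha_j \Rightarrow_{lm,G} \alpha_{j+1}$ by unit productions, and finally $\alpha_k \Rightarrow_{lm} \alpha_{k+1}$ by a non-unit production; so $\alpha_i \Rightarrow_{lm,G_1} \alpha_{k+1}$.
Hence if $A \Rightarrow^*_{lm,G} w$, then $A \Rightarrow^*_{lm,G_1} w$.
7.1.5 Chomsky Normal Form (CNF)
Every production has one of the forms
1. $S \to \varepsilon \in P$, or
2. $A \to BC \in P$ where $B, C \in N$, or
3. $A \to a \in P$ where $a \in T$.
Theorem 7.16 Let $G = (N, T, P, S)$ be a CFG. There is a CFG $G_1$ such that $G_1$ is in CNF and $L(G) = L(G_1)$.
Proof:
1. Eliminate useless symbols and productions.
2. Eliminate $\varepsilon$-rules.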
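3. Eliminate unit productions.
Now there are no $\varepsilon$-productions and no unit productions. If $\varepsilon \in L(G)$, add $S \to \varepsilon$ to $P_1$.
$A \to a \in P$: already in CNF.
$A \to X_1 \cdots X_n \in P$ where $n \ge 2$, $X_i \in N \cup T$: for each $X_i \in T$, add $B_a \to a$ ($a = X_i$) to $P_1$ and replace $X_i$ by $B_a$.
$A \to C_1 \cdots C_n \in P$ where $n \ge 2$, $C_i \in N$: if $n = 2$, already in CNF.
$A \to C_1 \cdots C_n \in P$ where $n \ge 3$, $C_i \in N$: replace it by
$A \to C_1 D_1 \in P_1$, $D_1 \to C_2 D_2 \in P_1$, $\ldots$, $D_{n-3} \to C_{n-2} D_{n-2} \in P_1$, $D_{n-2} \to C_{n-1} C_n \in P_1$.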
Proof: That $G_1$ is in CNF is trivial.
1) If $A \to X_1 \cdots X_k \in P$, then $A \Rightarrow^+_{G_1} X_1 \cdots X_k$. Hence if $A \Rightarrow^*_G w$, then $A \Rightarrow^*_{G_1} w$.
2) If $A \Rightarrow^*_{G_1} w$, consider the parse tree of $w$ in $G_1$ and convert it into a parse tree of $w$ in $G$:
i) $A \to C_1 D_1, \ldots, D_{n-3} \to C_{n-2} D_{n-2}, D_{n-2} \to C_{n-1} C_n$ into $A \to C_1 \cdots C_{n-1} C_n$ (Fig. 7.4);
ii) $B_a \to a$ into $a$.
So $L(G) = L(G_1)$. See Ex. 7.15 on p. 273.
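The last two steps of the CNF construction (a variable $B_a$ for each terminal in a long body, then a chain of $D_i$ variables for bodies longer than two) can be sketched in Python. The sketch assumes the input already has no $\varepsilon$-productions and no unit productions, and that the generated names `B_a` and `D1, D2, ...` do not clash with existing variables; the function name `to_cnf` and the `(head, body)` encoding are illustrative assumptions.

```python
def to_cnf(prods, terminals):
    """Assumes prods has no epsilon-productions and no unit productions."""
    out, fresh = [], 0
    for head, body in prods:
        if len(body) == 1:                      # A -> a is already in CNF
            out.append((head, body))
            continue
        # Replace each terminal a in a body of length >= 2 by the variable B_a.
        body = tuple("B_" + x if x in terminals else x for x in body)
        # A -> C1 D1, D1 -> C2 D2, ..., D_{n-2} -> C_{n-1} C_n
        while len(body) > 2:
            fresh += 1
            d = "D%d" % fresh                   # assumed-fresh variable name
            out.append((head, (body[0], d)))
            head, body = d, body[1:]
        out.append((head, body))
    used = {x for _, b in out for x in b}
    out += [("B_" + a, (a,)) for a in sorted(terminals) if "B_" + a in used]
    return out

# S -> aSb | ab
P = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
for head, body in to_cnf(P, {"a", "b"}):
    print(head, "->", " ".join(body))
```

For the example this prints S -> B_a D1, D1 -> S B_b, S -> B_a B_b, B_a -> a, B_b -> b, so every body is either two variables or a single terminal.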
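Regular (type 3) grammar (normal form)
$A \to aB$ or $A \to b$, where $A, B \in N$, $a, b \in T$: right linear.
$A \to aB \mid aC$: non-deterministic! No pair $A \to aB \mid aC$ with $B \ne C$: deterministic!
$A \to Ba$ or $A \to b$, where $A, B \in N$, $a, b \in T$: left linear.
(Extended) regular (type 3) grammar
$A \to xB$ or $A \to y$, where $A, B \in N$, $x, y \in T^*$: extended right linear.
$A \to Bx$ or $A \to y$, where $A, B \in N$, $x, y \in T^*$: extended left linear.
Context-free (type 2) grammar (Chomsky's normal form)
$A \to BC$ or $A \to a$, where $A, B, C \in N$, $a \in T$.
Context-free (type 2) grammar (extended)
$A \to \alpha$, where $A \in N$, $\alpha \in (N \cup T)^*$.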
7.2 The Pumping Lemma for Context-free Languages
7.2.1 The size of parse trees
Theorem 7.17 Let $G = (N, T, P, S)$ be a Chomsky Normal Form context-free grammar and consider a parse tree for $w \in L(G)$. If $n$ is the length (number of edges) of the longest path in the parse tree, then $|w| \le 2^{n-1}$.
Proof: Induction on $n$.
i) $n = 1$: $S \to a$, $w = a$, $|w| = 1 \le 2^{1-1} = 1$.
ii) $n > 1$: $S \to AB$ is at the root of the tree. There are two subtrees with roots $A$ and $B$, respectively; assume $A \Rightarrow^* w_a$, $B \Rightarrow^* w_b$, and $w = w_a w_b$. By the induction hypothesis, $|w_a| \le 2^{n-2}$ and $|w_b| \le 2^{n-2}$. So $|w| \le 2^{n-2} + 2^{n-2} = 2^{n-1}$.
The parse tree is a binary tree ($A \to BC$) with unary leaves ($A \to a$), so $|w| \le 2^{n-1}$.
7.2.2 Statement of the Pumping Lemma
Theorem 7.18 (The pumping lemma for context-free languages) Let $L$ be a CFL. $\exists n \in \mathbb{N}$: if $z \in L$ and $|z| \ge n$, then we can write $z = uvwxy$ such that
1) $|vwx| \le n$ (the first pump),
2) $vx \ne \varepsilon$ (nontrivial pump),
3) $\forall i \ge 0$, $uv^iwx^iy \in L$ (pump).
Proof: Since $L$ is a CFL, $\exists G = (N, T, P, S)$ where $L = L(G)$ and $G$ is in CNF.
Choose $n = 2^{|N|}$ and suppose the longest path of the parse tree for $z \in L$, $|z| \ge n$, has length $k+1$.
$n = 2^{|N|} \le |z| \le 2^{(k+1)-1} = 2^k$ (def. of $n$, and Thm. 7.17). So $|N| \le k$.
Consider the longest path (of length $k+1$), $(A_0, A_1, \ldots, A_k, a)$,
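where $A_0 = S$, $\forall\, 0 \le i \le k$: $A_i \in N$, $a \in T$. (Fig. 7.5)
Since $|N| \le k$, the path carries $k+1 > |N|$ variables, so $\exists\, 0 \le i < j \le k$ with $A_i = A_j = A$.
Assume $S \Rightarrow^* uA_iy \Rightarrow^* uvA_jxy \Rightarrow^* uvwxy$. (Fig. 7.6)
Note that $A = A_i \Rightarrow^* vA_jx = vAx$, $A = A_j \Rightarrow^* w$, and $S \Rightarrow^* uAy$.
(1) $S \Rightarrow^* uAy \Rightarrow^* uwy = uv^0wx^0y$; or
(2) assume $S \Rightarrow^* uAy \Rightarrow^* uv^{i-1}wx^{i-1}y$ for $i \ge 1$ (that is, $A \Rightarrow^* v^{i-1}wx^{i-1}$); then $S \Rightarrow^* uAy \Rightarrow^* uvAxy \Rightarrow^* uvv^{i-1}wx^{i-1}xy = uv^iwx^iy$. (Fig. 7.7)
So $S \Rightarrow^* uv^iwx^iy$ for $i \ge 0$. This is 3), pumping (cf. $xy^iz$ for regular languages).
Since $G$ has no useless symbols and is $\varepsilon$-free (CNF), $v \ne \varepsilon$ or $x \ne \varepsilon$, so $vx \ne \varepsilon$. This is 2), nonempty pumping (cf. $y \ne \varepsilon$ for regular languages).
We can select $A_i$ to be the repeated variable closest to the bottom of the tree, so that $k - i \le |N|$. Since the length of the longest path in the $A_i$-subtree is then $\le |N| + 1$, $|vwx| \le 2^{|N|} = n$ (Thm. 7.17). This is 1), the first pump (cf. $|xy| \le n$ for regular languages).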
7.2.3 Applications of the Pumping Lemma for CFLs
1. Pick a language $L$ that we want to prove is not context-free.
2. The adversary picks $n$ (any possible $n$).
3. Pick $z$; we may use $n$ as a parameter.
4. The adversary breaks $z$ into $uvwxy$ with $|vwx| \le n$, $vx \ne \varepsilon$.
5. To win the game, find $i$ such that $uv^iwx^iy \notin L$.
Context-free languages cannot match more than two groups of symbols for equality or inequality.
Example 7.19 $L = \{0^n1^n2^n \mid n \ge 1\}$
Let $K$ be the adversary's number, and $z = 0^n1^n2^n$ with $n \ge K$. Consider all breaks of $z$ into $uvwxy$ with $|vwx| \le K$, $vx \ne \varepsilon$. Since $|vwx| \le K \le n$, $vwx$ cannot contain both 0's and 2's:
(1) $u = 0^{n-i}$, $vwx = 0^i1^{n-i}$, and $y = 1^i2^n$ ($vwx$ has no 2's). Since $vx \ne \varepsilon$, $uwy \notin L$.
(2) $u = 0^n1^{n-i}$, $vwx = 1^i2^{n-i}$, and $y = 2^i$ ($vwx$ has no 0's). Since $vx \ne \varepsilon$, $uwy \notin L$.
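The game above can be checked by brute force for a small instance. The following Python sketch enumerates every break $z = uvwxy$ with $|vwx| \le n$ and $vx \ne \varepsilon$ and asks whether any of them survives pumping with $i = 0$ and $i = 2$. It is only a finite sanity check for one fixed $n$, not a proof, and the names `in_L` and `adversary_can_win` are made up for this illustration.

```python
def in_L(s):
    """Membership in {0^n 1^n 2^n | n >= 1}."""
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n

def adversary_can_win(z, n, member, tries=(0, 2)):
    """True if some break z = uvwxy with |vwx| <= n and vx != '' stays
    inside the language for every pumping exponent i in `tries`."""
    for a in range(len(z) + 1):                         # u = z[:a]
        for d in range(a, min(a + n, len(z)) + 1):      # vwx = z[a:d]
            for b in range(a, d + 1):                   # v = z[a:b]
                for c in range(b, d + 1):               # w = z[b:c], x = z[c:d]
                    u, v, w, x, y = z[:a], z[a:b], z[b:c], z[c:d], z[d:]
                    if v + x and all(member(u + v * i + w + x * i + y)
                                     for i in tries):
                        return True
    return False

n = 3
z = "0" * n + "1" * n + "2" * n
print(adversary_can_win(z, n, in_L))   # False: every break fails at i = 0 or i = 2
```

For comparison, the same search succeeds on the context-free language $\{0^m1^m\}$: with $z = 000111$ and $n = 2$ the break $u = 00$, $v = 0$, $w = \varepsilon$, $x = 1$, $y = 11$ pumps correctly.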
Two matched groups cannot be interleaved.
Example 7.20 $L = \{0^i1^j2^i3^j \mid i, j \ge 1\}$
Let $n$ be the adversary's number and $z = 0^n1^n2^n3^n$. Consider all breaks of $z$ into $uvwxy$ with $|vwx| \le n$, $vx \ne \varepsilon$:
$vwx$ is a substring containing at most two consecutive kinds of symbols, because at most $n$ symbols lie in $vwx$.
A nontrivial ($vx \ne \varepsilon$) pumping of $v$ and $x$ therefore changes the count of at most two adjacent groups, while the group each of them must match stays unchanged.
CFLs cannot match two strings of arbitrary length.
Exercise 7.21 $L_{ww} = \{ww \mid w \in (0+1)^*\}$ vs. $L_{ww^R} = \{ww^R \mid w \in (0+1)^*\}$
Consider $z = 0^n1^n0^n1^n$.
7.3 Closure Properties of Context-free Languages
Context-free languages are closed under 1. union, 2. concatenation, 3. closure, 4. substitution, 5. reversal.
Let $G_A = (N_A, T_A, P_A, S_A)$ and $G_B = (N_B, T_B, P_B, S_B)$ be CFGs (assume $N_A \cap N_B = \emptyset$). Then
$G_1 = (N_A \cup N_B \cup \{S_1\}, T_A \cup T_B, P_A \cup P_B \cup \{S_1 \to S_A \mid S_B\}, S_1)$, $L(G_1) = L(G_A) \cup L(G_B)$;
$G_2 = (N_A \cup N_B \cup \{S_2\}, T_A \cup T_B, P_A \cup P_B \cup \{S_2 \to S_A S_B\}, S_2)$, $L(G_2) = L(G_A)L(G_B)$;
$G_3 = (N_A \cup \{S_3\}, T_A, P_A \cup \{S_3 \to S_A S_3 \mid \varepsilon\}, S_3)$, $L(G_3) = L(G_A)^*$;
$G_5 = (N_A, T_A, \{A \to \alpha^R \mid A \to \alpha \in P_A\}, S_A)$, $L(G_5) = L(G_A)^R$.
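The four grammar constructions above are purely syntactic, which a short Python sketch makes visible. A grammar is represented here as a pair `(productions, start)` with productions as `(head, body)` tuples; this encoding, the function names, and the default start-symbol names `S1`, `S2`, `S3` are assumptions of the sketch, and the two grammars are assumed to use disjoint variable names.

```python
def union(ga, gb, s="S1"):
    (pa, sa), (pb, sb) = ga, gb
    return pa + pb + [(s, (sa,)), (s, (sb,))], s      # S1 -> S_A | S_B

def concat(ga, gb, s="S2"):
    (pa, sa), (pb, sb) = ga, gb
    return pa + pb + [(s, (sa, sb))], s               # S2 -> S_A S_B

def star(ga, s="S3"):
    pa, sa = ga
    return pa + [(s, (sa, s)), (s, ())], s            # S3 -> S_A S3 | epsilon

def reverse(ga):
    pa, sa = ga
    return [(h, b[::-1]) for h, b in pa], sa          # A -> alpha^R

GA = ([("A", ("0", "A", "1")), ("A", ("0", "1"))], "A")   # {0^n 1^n | n >= 1}
GB = ([("B", ("2", "B")), ("B", ("2",))], "B")            # {2^i | i >= 1}

prods, start = concat(GA, GB)
for head, body in prods:
    print(head, "->", " ".join(body))
```

Here `concat(GA, GB)` yields a grammar for $\{0^n1^n2^i \mid n \ge 1, i \ge 1\}$, the language $L_1$ used in Example 7.26.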
Context-free languages are not closed under intersection.
Example 7.26 We know from Ex. 7.19 that $L = \{0^n1^n2^n \mid n \ge 1\}$ is not a CFL.
Consider $L_1 = \{0^n1^n2^i \mid n \ge 1, i \ge 1\}$ with $G_1$: $S \to AB$, $A \to 0A1 \mid 01$, $B \to 2B \mid 2$.
$L_2 = \{0^i1^n2^n \mid n \ge 1, i \ge 1\}$ with $G_2$: $S \to AB$, $A \to 0A \mid 0$, $B \to 1B2 \mid 12$.
$L_1$ and $L_2$ are context-free, but $L = L_1 \cap L_2 = \{0^n1^n2^n \mid n \ge 1\}$ is not context-free: a counterexample.
Theorem 7.27 If $L$ is a CFL and $R$ is a regular language, then $L \cap R$ is context-free.
Proof: Let $P = (Q_P, T, \Gamma, \delta_P, q_P, Z_P, F_P)$ be a PDA with $L(P) = L$, and $A = (Q_A, T, \delta_A, q_A, F_A)$ be a FA with $L(A) = R$.
Then $P' = (Q_P \times Q_A, T, \Gamma, \delta, (q_P, q_A), Z_P, F_P \times F_A)$ where, for $a \in T \cup \{\varepsilon\}$,
$\delta((q, p), a, X) = \{((r, s), \gamma) \mid s = \hat{\delta}_A(p, a),\ (r, \gamma) \in \delta_P(q, a, X)\}$.
Induction: $(q_P, w, Z_P) \vdash^*_P (q, \varepsilon, \gamma)$ if and only if $((q_P, q_A), w, Z_P) \vdash^*_{P'} ((q, p), \varepsilon, \gamma)$ and $p = \hat{\delta}_A(q_A, w)$.
Theorem 7.29 If $L$, $L_1$, and $L_2$ are CFLs and $R$ is a regular language:
1. $L - R$ is context-free.
2. $\overline{L}$ is not necessarily context-free.
3. $L_1 - L_2$ is not necessarily context-free.
7.4 Decision Properties of CFLs
Complexity of conversions:
PDA by empty stack to/from PDA by final state (Thm 6.9, 6.11): $O(n)$
CFG to PDA (Thm 6.13): $O(n)$
PDA to CFG (Thm 6.14): $O(n^3)$
CFG to CNF: $O(n^2)$
1. Detecting reachable and generating symbols: $O(n)$; eliminating useless symbols and productions: $O(n)$
2. Eliminating $\varepsilon$-productions: $O(2^k)$ where $k$ is the maximum length of a RHS, so $O(2^n)$
3. Eliminating unit productions: $O(n^2)$
4. Replacing terminal symbols by nonterminal symbols: $O(n)$
5. Breaking the length of each RHS: $O(n)$
If step 5 is done before step 2, every RHS has length at most 2, and eliminating $\varepsilon$-productions costs $2^2 \cdot O(n) = O(n)$.
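Membership problem: CYK algorithm (Cocke, Younger, Kasami)
Given $w = a_1 \cdots a_n \in T^*$ and a CFG $G$ in CNF, test whether $w \in L(G)$ or not.
We compute $X_{ij} = \{A \in N \mid A \Rightarrow^* a_i \cdots a_j\}$, $1 \le i \le j \le n$.
If $S \in X_{1n}$, then $w \in L(G)$; otherwise $w \notin L(G)$.
How to compute $X_{ij}$ (w.l.o.g. assume CNF):
Basis: $X_{ii} = \{A \mid A \to a_i \in P\}$.
Induction: Assume $A \Rightarrow^* a_i \cdots a_j$. Since $i < j$ and $G$ is in CNF ($\varepsilon$-free), $A \to BC \in P$ where $B \Rightarrow^* a_i \cdots a_k$ and $C \Rightarrow^* a_{k+1} \cdots a_j$ for some $i \le k < j$.
If $B \in X_{ik}$, $C \in X_{k+1,j}$, and $A \to BC \in P$, then $A \in X_{ij}$.
Test the $j - i$ pairs $(X_{ii}, X_{i+1,j}), (X_{i,i+1}, X_{i+2,j}), \ldots, (X_{i,j-1}, X_{jj})$.
Since there are $O(n^2)$ sets $X_{ij}$ and for each we test at most $n$ pairs, the total is $O(n^3)$.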
CYK algorithm in PASCAL style
    for i := 1 to n do
      for j := i to n do
        X[i,j] := {};                        (* initialize, O(n^2), i <= j; see Fig. 7.12 *)
    for i := 1 to n do                       (* basis, O(n) *)
      for each A in N do
        if (A -> a_i) in P then
          X[i,i] := X[i,i] + {A};
    for k := 1 to n-1 do
      for i := 1 to n-k do                   (* consider X[i,i+k] *)
        for j := i to i+k-1 do               (* recursion, O(n^3); j is the split point *)
          for each (A -> BC) in P do
            if (B in X[i,j]) and (C in X[j+1,i+k]) then
              X[i,i+k] := X[i,i+k] + {A};    (* see Fig. 7.13 *)
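For readers who want to run the table-filling procedure, here is a minimal Python sketch of the same algorithm. It uses 0-based indices, so `X[i][j]` corresponds to $X_{i+1,j+1}$ on the slide; the `(head, body)` encoding of CNF productions and the function name `cyk` are assumptions of this sketch.

```python
def cyk(word, prods, start):
    """prods: CNF productions as (head, body); body is (a,) or (B, C)."""
    n = len(word)
    if n == 0:
        return (start, ()) in prods                 # only S -> epsilon derives ""
    # X[i][j] = set of variables deriving word[i..j] (0-based, inclusive)
    X = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(word):                    # basis: A -> a_i
        X[i][i] = {h for h, b in prods if b == (a,)}
    for k in range(1, n):                           # k = j - i
        for i in range(n - k):
            j = i + k
            for m in range(i, j):                   # split point, i <= m < j
                for h, b in prods:
                    if len(b) == 2 and b[0] in X[i][m] and b[1] in X[m + 1][j]:
                        X[i][j].add(h)
    return start in X[0][n - 1]

# S -> AB | BC,  A -> BA | a,  B -> CC | b,  C -> AB | a
P = [("S", ("A", "B")), ("S", ("B", "C")),
     ("A", ("B", "A")), ("A", ("a",)),
     ("B", ("C", "C")), ("B", ("b",)),
     ("C", ("A", "B")), ("C", ("a",))]
print(cyk("baaba", P, "S"))   # True
print(cyk("bb", P, "S"))      # False
```

The three nested loops over `k`, `i`, and `m` give the $O(n^3)$ bound, with the loop over productions contributing a factor that depends only on the grammar.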
Some undecidable problems on CFLs
1. Is a given CFG $G$ ambiguous?
2. Is a given CFL $L$ inherently ambiguous?
3. Is the intersection of two CFLs empty?
4. Are two CFLs the same?
5. Given a CFL $L$, is $L = \Sigma^*$, where $\Sigma$ is the alphabet of $L$?