Section 12.4 Context-Free Language Topics

Algorithm. Remove Λ-productions from grammars for languages without Λ.
1. Find the nonterminals that derive Λ.
2. For each production A → w, construct all productions A → w′, where w′ is obtained from w by removing one or more occurrences of the nonterminals from Step 1.
3. Combine the original productions with those of Step 2 and eliminate any Λ-productions.

Example. Remove Λ-productions from the grammar
  S → ABc
  A → aA | Λ
  B → bB | Λ.
Solution.
Step 1: The nonterminals A and B derive Λ.
Step 2: From the production S → ABc we construct S → Bc | Ac | c. From the production A → aA we construct A → a. From the production B → bB we construct B → b.
Step 3:
  S → ABc | Bc | Ac | c
  A → aA | a
  B → bB | b.

Quiz. Remove Λ-productions from
  S → ABc | Ab | c
  A → ABa | Λ
  B → Bbc | Λ.
Solution.
  S → ABc | Ab | c | Bc | Ac | b
  A → ABa | Ba | Aa | a
  B → Bbc | bc.
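The three steps of the Λ-removal algorithm can be sketched in Python. This is a minimal sketch under assumptions that are not part of the slides: a grammar is a dict from a nonterminal to a list of right sides, every symbol is a single character, and the empty string stands for Λ. The function names are made up for illustration.

```python
from itertools import product

LAMBDA = ""  # assumption: the empty string stands for Λ


def nullable_nonterminals(grammar):
    """Step 1: find every nonterminal that derives Λ."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A right side derives Λ when all of its symbols are nullable
            # (the empty right side satisfies this trivially).
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_lambda_productions(grammar):
    """Steps 2 and 3: add the shortened right sides, then drop every A -> Λ."""
    nullable = nullable_nonterminals(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Each nullable symbol is either kept or removed; others stay.
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate != LAMBDA and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


example = {"S": ["ABc"], "A": ["aA", ""], "B": ["bB", ""]}
quiz = {"S": ["ABc", "Ab", "c"], "A": ["ABa", ""], "B": ["Bbc", ""]}
example_result = remove_lambda_productions(example)
quiz_result = remove_lambda_productions(quiz)
```

Applied to the example and quiz grammars, the sketch produces the same sets of productions as the solutions above, although the order of the alternatives may differ.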
Chomsky Normal Form. Productions have one of the following forms:
  A → a (a is a terminal)
  A → BC
  S → Λ (if Λ is in the language).
Advantages:
  Parse trees are binary, which are easy to represent.
  Any string of length n > 0 can be derived in 2n − 1 steps.

Algorithm. Transform to Chomsky normal form (with the additional property that the start symbol does not occur on the right side of any production).
1. If the start symbol S occurs on some right side, create a new start symbol S′ and a new production S′ → S.
2. Remove each production A → Λ with A ≠ S by the previous algorithm, where S here denotes the current start symbol (the new one if Step 1 created it). (If S → Λ is removed, add it back.)
3. Remove unit productions (i.e., A → B): If A → B or A ⇒+ B, then construct productions A → w where B → w is not a unit production. Now remove all unit productions.
4. For each production whose right side has two or more symbols, replace all occurrences of each terminal a with a new nonterminal A and also add the new production A → a.
5. Replace each production B → C_1 C_2 … C_n with n > 2 by B → C_1 D, where D → C_2 … C_n. Repeat this step until all right sides have length two.
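The three permitted production forms, together with the extra condition on the start symbol, can be checked mechanically. The following is a small sketch, not part of the slides. It assumes a grammar is a dict from a nonterminal to a list of right sides, every symbol is one character, the empty string stands for Λ, and the letter Z stands in for a primed start symbol S′.

```python
def is_chomsky_normal_form(grammar, start):
    """Return True when every production is A -> a, A -> BC, or start -> Λ,
    and the start symbol never occurs on a right side."""
    for head, bodies in grammar.items():
        for body in bodies:
            if body == "":
                # Λ is allowed only for the start symbol.
                ok = head == start
            elif len(body) == 1:
                # A single symbol must be a terminal (not a key of the dict).
                ok = body not in grammar
            elif len(body) == 2:
                # Two symbols must both be nonterminals other than the start.
                ok = all(sym in grammar and sym != start for sym in body)
            else:
                ok = False
            if not ok:
                return False
    return True


# Hypothetical encodings: "Z" plays the role of S′.
in_form = {
    "Z": ["AE", "AB", "DC", "c", ""],
    "S": ["AE", "AB", "DC", "c"],
    "D": ["DC", "c"],
    "A": ["a"], "B": ["b"], "C": ["c"],
    "E": ["SB"],
}
not_in_form = {"S": ["aSb", "D"], "D": ["Dc", ""]}
```

The second grammar fails for several reasons at once: it has a right side of length three, a unit production, terminals inside longer right sides, and a Λ-production for a nonterminal other than the start symbol.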
Example. Construct a Chomsky normal form for the grammar
  S → aSb | D
  D → Dc | Λ.
Solution.
Step 1: Add the production S′ → S.
Step 2:
  S′ → S | Λ
  S → aSb | ab | D
  D → Dc | c.
Step 3:
  S′ → aSb | ab | Dc | c | Λ
  S → aSb | ab | Dc | c
  D → Dc | c.
Step 4:
  S′ → ASB | AB | DC | c | Λ
  S → ASB | AB | DC | c
  D → DC | c
  A → a
  B → b
  C → c.
Step 5: Replace S′ → ASB and S → ASB by S′ → AE, S → AE, and E → SB.
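Step 3 of the example (removing unit productions) can be sketched as follows. The representation is an assumption made for illustration: a dict from a nonterminal to a list of right sides, one character per symbol, the empty string for Λ, and the letter Z standing in for S′. The sketch also assumes Λ-productions have already been removed everywhere except at the start symbol, so that a right side consisting of a single nonterminal is exactly a unit production.

```python
def remove_unit_productions(grammar):
    """For each A, collect the non-unit right sides of every B with
    A = B or A =>+ B through unit productions, then drop the unit ones."""

    def is_unit(body):
        return len(body) == 1 and body in grammar

    result = {}
    for head in grammar:
        # Nonterminals reachable from head via unit productions (head included).
        reachable, stack = [head], [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.append(body)
                    stack.append(body)
        new_bodies = []
        for target in reachable:
            for body in grammar[target]:
                if not is_unit(body) and body not in new_bodies:
                    new_bodies.append(body)
        result[head] = new_bodies
    return result


# Grammar after Step 2 of the example, with "Z" written for S′.
after_step_2 = {"Z": ["S", ""], "S": ["aSb", "ab", "D"], "D": ["Dc", "c"]}
after_step_3 = remove_unit_productions(after_step_2)
```

On this input the result has the right sides aSb, ab, Dc, c, Λ for Z, the right sides aSb, ab, Dc, c for S, and Dc, c for D, which agrees with Step 3 of the solution.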
Quiz. Construct a Chomsky normal form for the grammar
  S → aTbb | U | Λ
  T → cT | c
  U → Ud | d.
Solution.
Step 1: No change.
Step 2: No change.
Step 3: Remove the unit production S → U to obtain
  S → aTbb | Ud | d | Λ
  T → cT | c
  U → Ud | d.
Step 4: Transform right sides of length at least two into strings of nonterminals.
  S → ATBB | UD | d | Λ
  T → CT | c
  U → UD | d
  A → a
  B → b
  C → c
  D → d.
Step 5: Replace S → ATBB with the productions S → AE, E → TF, F → BB.
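Steps 4 and 5 of the quiz can be sketched in Python. The dict representation (one character per symbol, empty string for Λ) and the function names are assumptions for illustration. The caller supplies the names of the new nonterminals, so the sketch does not check that they are unused.

```python
def lift_terminals(grammar, names):
    """Step 4: in right sides of length >= 2, replace terminal a by names[a]
    and add the production names[a] -> a for each terminal that was replaced."""
    result, used = {}, set()
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if len(body) >= 2:
                used.update(sym for sym in body if sym in names)
                body = "".join(names.get(sym, sym) for sym in body)
            new_bodies.append(body)
        result[head] = new_bodies
    for terminal in sorted(used):
        result[names[terminal]] = [terminal]
    return result


def binarize(grammar, fresh_names):
    """Step 5: replace B -> C1 C2 ... Cn (n > 2) by B -> C1 D, D -> C2 ... Cn,
    repeating until every right side has length at most two."""
    result = {head: list(bodies) for head, bodies in grammar.items()}
    fresh = iter(fresh_names)
    queue = list(result)
    while queue:
        head = queue.pop(0)
        for i, body in enumerate(result[head]):
            if len(body) > 2:
                new_head = next(fresh)
                result[head][i] = body[0] + new_head
                result[new_head] = [body[1:]]
                queue.append(new_head)  # the new right side may still be long
    return result


after_step_3 = {"S": ["aTbb", "Ud", "d", ""], "T": ["cT", "c"], "U": ["Ud", "d"]}
after_step_4 = lift_terminals(after_step_3, {"a": "A", "b": "B", "c": "C", "d": "D"})
after_step_5 = binarize(after_step_4, "EF")
```

Unlike the earlier example, where E → SB is shared by two productions, this sketch draws a new name for every long right side it splits. That makes no difference for the quiz grammar, which has only one such right side.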
Greibach Normal Form. Productions have one of the following forms:
  A → b (b is a terminal)
  A → bD_1 … D_k
  S → Λ (if Λ is in the language).
Advantage: Any string of length n > 0 can be derived in n steps.

Algorithm (idea). Transform a context-free grammar to Greibach normal form.
1. Perform steps 1, 2, and 3 of the Chomsky algorithm.
2. Remove all left recursion, including indirect left recursion, without adding Λ.
3. Make substitutions to transform the grammar into the proper form.

Example. Put the following grammar into Greibach normal form.
  S → AB | Ac | d
  A → aA | a
  B → Ab | c.
Solution. Steps 1 (Chomsky steps 1, 2, and 3) and 2 are not needed.
Step 3: Replace A in S → AB | Ac | d with aA | a to obtain
  S → aAB | aB | aAc | ac | d.
Replace A in B → Ab | c with the right side of A → aA | a to obtain
  B → aAb | ab | c.
Now add the new productions C → c and D → b and make the appropriate replacements to obtain the proper form:
  S → aAB | aB | aAC | aC | d
  A → aA | a
  B → aAD | aD | c
  C → c
  D → b.
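The substitutions used in Step 3 of the Greibach example can be sketched in Python. This covers only the situation in that example: no left recursion, no Λ-productions, one character per symbol. The dict representation and the function names are assumptions for illustration, not a general Greibach conversion.

```python
def substitute_leading(grammar, head, target):
    """Replace a leading `target` in each right side of `head`
    by every right side of `target`."""
    new_bodies = []
    for body in grammar[head]:
        if body.startswith(target):
            expansions = [alt + body[1:] for alt in grammar[target]]
        else:
            expansions = [body]
        for expansion in expansions:
            if expansion not in new_bodies:
                new_bodies.append(expansion)
    updated = dict(grammar)
    updated[head] = new_bodies
    return updated


def lift_trailing_terminals(grammar, names):
    """Replace each terminal after the first symbol of a right side by its
    new nonterminal names[a], and add the productions names[a] -> a."""
    result = {}
    for head, bodies in grammar.items():
        result[head] = [
            body[:1] + "".join(names.get(sym, sym) for sym in body[1:])
            for body in bodies
        ]
    for terminal, name in names.items():
        result[name] = [terminal]
    return result


grammar = {"S": ["AB", "Ac", "d"], "A": ["aA", "a"], "B": ["Ab", "c"]}
step = substitute_leading(grammar, "S", "A")
step = substitute_leading(step, "B", "A")
greibach = lift_trailing_terminals(step, {"c": "C", "b": "D"})
```

After the two substitutions every right side begins with a terminal, and lifting the remaining terminals yields the productions listed at the end of the example.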
Properties of Context-Free Languages

Knowing some properties of context-free languages can help us argue, by way of contradiction (BWOC), that certain languages are not context-free.

The Pumping Lemma
If L is an infinite context-free language, then any grammar for L must be recursive, so there must be derivations of the following form, where u, v, w, x, and y are terminal strings and N is a nonterminal.
  S ⇒+ uNy
  N ⇒+ vNx (where v and x are not both Λ)
  N ⇒+ w.
These derivations lead to derivations like
  S ⇒+ uNy ⇒+ uvNxy ⇒+ uv^2Nx^2y ⇒+ uv^kNx^ky ⇒+ uv^kwx^ky ∈ L
for all k ∈ ℕ. This is the basis for the Pumping Lemma:

There is an integer m > 0 such that if z ∈ L and |z| ≥ m, then z has the form z = uvwxy, where 1 ≤ |vx| ≤ |vwx| ≤ m and uv^kwx^ky ∈ L for all k ∈ ℕ.

Note: The number m depends on the grammar, as we'll see in the following example.
Example. Suppose we have the following grammar for {Λ, bbc} ∪ {abc^n d | n ∈ ℕ}.
  S → aNd | bbc | Λ
  N → Nc | b.
Here are a few derivations:
  S ⇒ aNd ⇒ abd
  S ⇒ aNd ⇒ aNcd ⇒ abcd
  S ⇒ aNd ⇒ aNcd ⇒ aNccd ⇒ abccd
  S ⇒+ abc^k d for any k in ℕ.
For this grammar m = 4 can be used in the pumping lemma because any derivation of a string z with |z| ≥ 4 must use the nonterminal N. For example, if |z| = 8 and z = abcccccd, then the pumping lemma factors z = abcccccd = uvwxy, where 1 ≤ |vx| ≤ |vwx| ≤ 4 and uv^kwx^ky ∈ L for all k ∈ ℕ. In this case let u = a, v = Λ, w = b, x = c, and y = ccccd.

Example. The language L = {a^n b^n c^(n+k) | k, n ∈ ℕ} is not context-free.
Proof: Assume, BWOC, that L is context-free. L is infinite, so the pumping lemma applies. Choose z = a^m b^m c^m, where m is the positive integer from the lemma. Then z = a^m b^m c^m = uvwxy, where 1 ≤ |vx| ≤ |vwx| ≤ m and uv^kwx^ky ∈ L for all k ∈ ℕ.
Observe that neither v nor x can contain distinct letters. For example, if v contains both letters a and b, say v = a…b, then v^2 = a…ba…b, which can't appear as a substring of any string in L. So v and x must be strings of repeated occurrences of a single letter. Now since |vwx| ≤ m, there are two possible places in a^m b^m c^m where v and x must occur:
(1) v and x occur in a^m b^m.
(2) v and x occur in b^m c^m.
But we obtain the following contradictions because v and x are not both Λ.
(1) Let k = 2 to obtain uv^2wx^2y = a^(m+i) b^(m+j) c^m, where i > 0 or j > 0. Either the numbers of a's and b's differ, or they agree and exceed the number of c's. So uv^2wx^2y ∉ L.
(2) Let k = 0 to obtain uwy = a^m b^(m−i) c^(m−j), where i > 0 or j > 0. Either there are fewer b's than a's, or there are fewer c's than a's. So uwy ∉ L.
These contradictions imply that L is not context-free. QED.
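Both examples can be explored by brute force for small values. The sketch below enumerates every factorization z = uvwxy allowed by the lemma and pumps it. The function names are made up for illustration. The value m = 3 used for the second language is an arbitrary small choice (the lemma does not supply m), and only finitely many values of k are tried, so this illustrates the argument rather than proving it.

```python
import re


def decompositions(z, m):
    """Yield all (u, v, w, x, y) with z = uvwxy and 1 <= |vx| <= |vwx| <= m."""
    n = len(z)
    for start in range(n):                                    # vwx = z[start:end]
        for end in range(start + 1, min(start + m, n) + 1):
            for i in range(start, end + 1):                   # v = z[start:i]
                for j in range(i, end + 1):                   # x = z[j:end]
                    v, w, x = z[start:i], z[i:j], z[j:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def pump(parts, k):
    u, v, w, x, y = parts
    return u + v * k + w + x * k + y


def survives(parts, member, ks):
    """True when the pumped string stays in the language for every k in ks."""
    return all(member(pump(parts, k)) for k in ks)


def in_first_language(s):
    """Membership in {Λ, bbc} ∪ {a b c^n d | n in N}."""
    return s in ("", "bbc") or re.fullmatch(r"abc*d", s) is not None


def in_second_language(s):
    """Membership in {a^n b^n c^(n+k) | k, n in N}."""
    match = re.fullmatch(r"(a*)(b*)(c*)", s)
    if match is None:
        return False
    a, b, c = (len(group) for group in match.groups())
    return a == b and c >= a


# The factorization given for z = abcccccd.
slide_ok = survives(("a", "", "b", "c", "ccccd"), in_first_language, range(10))

# With m = 4, each string a b c^n d (1 <= n <= 6) has a factorization that pumps.
m4_ok = all(
    any(survives(p, in_first_language, range(6))
        for p in decompositions("ab" + "c" * n + "d", 4))
    for n in range(1, 7)
)

# For z = a^3 b^3 c^3, every factorization fails at k = 0 or at k = 2.
no_factorization_pumps = not any(
    survives(p, in_second_language, (0, 2))
    for p in decompositions("aaabbbccc", 3)
)
```

All three flags evaluate to True, which matches the two cases of the proof: factorizations that avoid the c's break at k = 2, and those that touch the c's break at k = 0.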
Example/Quiz. Prove that the language L = {ss | s ∈ {a, b}*} is not context-free.
Proof: Assume, BWOC, that L is context-free. L is infinite, so the pumping lemma applies. Choose z = a^m b^m a^m b^m, where m is the positive integer from the lemma. Then z = a^m b^m a^m b^m = uvwxy, where 1 ≤ |vx| ≤ |vwx| ≤ m and uv^kwx^ky ∈ L for all k ∈ ℕ. Now since |vwx| ≤ m, there are three possible places in a^m b^m a^m b^m where v and x must occur:
(1) v and x occur in a^m b^m (on the left of z).
(2) v and x occur in b^m a^m (in the center of z).
(3) v and x occur in a^m b^m (on the right of z).
Notice that v and x can consist only of repetitions of a single letter. For example, in case (1) suppose v = a^i b^j for some i > 0 and j > 0 and x = b^n for some n ≥ 0. Then, letting k = 0, we would obtain uwy = a^(m−i) b^(m−j−n) a^m b^m, which cannot be in L. The argument is similar for the other cases. So v and x must consist only of repetitions of a single letter.
We need to find a contradiction in each of the three cases. We'll do it by using k = 0. This tells us that uwy ∈ L. But we obtain the following contradictions because v and x are not both Λ.
(1) uwy = a^(m−i) b^(m−j) a^m b^m, where either i > 0 or j > 0. So uwy ∉ L.
(2) uwy = a^m b^(m−i) a^(m−j) b^m, where either i > 0 or j > 0. So uwy ∉ L.
(3) uwy = a^m b^m a^(m−i) b^(m−j), where either i > 0 or j > 0. So uwy ∉ L.
Therefore L is not context-free. QED.

Remark: Be careful that the choice of z is not in a context-free sublanguage of L. For example, if we chose z = (ab)^m (ab)^m in the preceding example, we would not get any contradictions.
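The proof and the closing remark can both be checked by brute force for small m. In this sketch the helper names are made up, the values m = 3 and m = 4 are arbitrary small choices, and the second check tries only k = 0, …, 5, so it suggests rather than proves that the string (ab)^m (ab)^m can be pumped.

```python
def decompositions(z, m):
    """Yield all (u, v, w, x, y) with z = uvwxy and 1 <= |vx| <= |vwx| <= m."""
    n = len(z)
    for start in range(n):
        for end in range(start + 1, min(start + m, n) + 1):
            for i in range(start, end + 1):
                for j in range(i, end + 1):
                    v, w, x = z[start:i], z[i:j], z[j:end]
                    if v or x:
                        yield z[:start], v, w, x, z[end:]


def pump(parts, k):
    u, v, w, x, y = parts
    return u + v * k + w + x * k + y


def is_square(s):
    """Membership in {ss | s in {a, b}*}."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]


# z = a^3 b^3 a^3 b^3: every allowed factorization already fails at k = 0.
good_choice = "aaabbb" * 2
all_fail_at_zero = all(
    not is_square(pump(parts, 0)) for parts in decompositions(good_choice, 3)
)

# z = (ab)^4 (ab)^4: some factorization stays in L for k = 0, ..., 5,
# for instance v = ab, w = Λ, x = ab, so no contradiction can be reached.
bad_choice = "ab" * 8
some_factorization_pumps = any(
    all(is_square(pump(parts, k)) for k in range(6))
    for parts in decompositions(bad_choice, 4)
)
```

Both flags evaluate to True. The second one shows why the choice of z matters: pumping v = ab and x = ab adds an even number of copies of ab, and the result is again of the form ss.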