LCFRS: Exercises and solutions
Laura Kallmeyer
SS 2010

Question 1

1. Give a CFG for the following language: $\{a^n b^m c^m d^n \mid n > 0, m \geq 0\}$.
2. Show that the following language is not context-free: $\{a^{2^n} \mid n \geq 0\}$.
   Hint: Show that this language does not satisfy the CFL pumping lemma.

Solution:

1. Nonterminals $N = \{S, T\}$, terminals $T = \{a, b, c, d\}$, start symbol $S$ and productions $\{S \to aSd,\ S \to aTd,\ T \to bTc,\ T \to \varepsilon\}$.
2. To show: $L = \{a^{2^n} \mid n \geq 0\}$ is not context-free. We assume that $L$ is context-free. Then it satisfies the pumping lemma with a certain constant $c \geq 1$. The word $a^{2^c}$ is in the language and has length $2^c \geq c$. The next longer word in $L$ is $a^{2^{c+1}}$, with $|a^{2^{c+1}}| - |a^{2^c}| = 2^{c+1} - 2^c = 2^c > c$. This contradicts the pumping lemma, according to which $L$ must contain a word that is longer than $a^{2^c}$ and has length at most $|a^{2^c}| + c$.

Question 2

Consider the language $L_2 = \{a^n b^n \mid n \geq 0\}$.

1. Give a CFG for $L_2$ with nested dependencies, i.e., such that for each word $a_1 \ldots a_n b_1 \ldots b_n$ (the subscripts mark the occurrences of the $a$s and $b$s respectively) $a_i$ and $b_{n+1-i}$ are added by the same production for all $1 \leq i \leq n$.
2. Show that for $L_2$ there is no CFG displaying cross-serial dependencies, i.e., no CFG such that for each word $a_1 \ldots a_n b_1 \ldots b_n$, $a_i$ and $b_i$ are added by the same production for all $1 \leq i \leq n$ and, furthermore, different $a$s are added by different productions.
   Hint: You can argue that if such a CFG exists, then there also exists a CFG for the copy language, which contradicts the fact that the copy language is not context-free.

Solution:

1. $G = \langle N, T, P, S \rangle$ with $N = \{S\}$, $T = \{a, b\}$, start symbol $S$ and productions $S \to aSb$, $S \to \varepsilon$.
2. Assume that such a CFG exists. Its productions that introduce terminals are then all of the form $X \to \alpha a \beta b \gamma$ with $X \in N$ and $\alpha, \beta, \gamma \in N^*$, such that if such a production is applied when generating a string $a_1 \ldots a_n b_1 \ldots b_n$, then the $a$ and the $b$ of the production necessarily end up at positions $i$ and $n + i$ for some $i$, $1 \leq i \leq n$. Replacing each of these productions $X \to \alpha a \beta b \gamma$ with the two productions $X \to \alpha a \beta a \gamma$ and $X \to \alpha b \beta b \gamma$ then leads to a CFG generating the copy language. Contradiction.
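The two grammars given in the solutions to Questions 1 and 2, and the length gap used in the pumping argument, can be sanity-checked by brute force. The following Python sketch is not part of the original exercise sheet: the helper names `generate` and `expected_q1` and the dictionary encoding of productions are assumptions made purely for illustration. The enumeration only terminates when pruning by terminal count leaves finitely many sentential forms, which is the case for these two grammars.

```python
from collections import deque


def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from `start`.

    Hypothetical helper. `productions` maps each nonterminal to a list of
    right-hand sides (tuples of symbols); every symbol that is not a key of
    `productions` is treated as a terminal.
    """
    words = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            words.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            n_terminals = sum(1 for s in new if s not in productions)
            # Prune forms that already contain too many terminals.
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


# Question 1: S -> a S d | a T d, T -> b T c | epsilon
G1 = {"S": [("a", "S", "d"), ("a", "T", "d")],
      "T": [("b", "T", "c"), ()]}

# Question 2, part 1: S -> a S b | epsilon
G2 = {"S": [("a", "S", "b"), ()]}


def expected_q1(max_len):
    """Words a^n b^m c^m d^n with n > 0, m >= 0 and total length <= max_len."""
    return {"a" * n + "b" * m + "c" * m + "d" * n
            for n in range(1, max_len + 1)
            for m in range(0, max_len + 1)
            if 2 * n + 2 * m <= max_len}


# Gap between consecutive word lengths 2^c and 2^(c+1), checked for small c only.
gaps_exceed_c = all(2 ** (c + 1) - 2 ** c > c for c in range(1, 31))

if __name__ == "__main__":
    print(sorted(generate(G1, "S", 6), key=lambda w: (len(w), w)))
    print(sorted(generate(G2, "S", 6), key=len))
    print(gaps_exceed_c)
```

A bounded comparison such as `generate(G1, "S", 8) == expected_q1(8)` is only evidence that the grammar is right up to that length; it is not a proof.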
Question 3

Similar to Shieber's (1985) argument for Swiss German, one can first apply a homomorphism $f$, then intersect with some regular language, and then apply another homomorphism $g$ in order to reduce the language of Swiss German to the copy language $\{ww \mid w \in \{a, b\}^*\}$. Find the corresponding homomorphisms and the regular language.

Solution:

A first homomorphism can be the $f$ from Shieber (1985). Then intersect with the regular language $w\{a, b\}^* x \{c, d\}^* y$, which leads to

   $\{w v_1 x v_2 y \mid v_1 \in \{a, b\}^*, v_2 \in \{c, d\}^*$ such that $|v_1| = |v_2|$ and for all $i$, $1 \leq i \leq |v_1|$: if the $i$th symbol in $v_1$ is an $a$ (a $b$), the $i$th symbol in $v_2$ is a $c$ (a $d$)$\}$.

Finally we apply a second homomorphism $g$ with $g(w) := g(x) := g(y) := \varepsilon$, $g(a) := g(c) := a$, $g(b) := g(d) := b$. This leads to the copy language.

Question 4

Consider the MCFG given by the following clauses (in simple RCG notation):

   $S(XYZ) \to A(Y)\,B(X, Z)$
   $A(aX) \to A(X)$
   $B(bX, bYb) \to B(X, Y)$
   $A(a) \to \varepsilon$
   $B(\varepsilon, \varepsilon) \to \varepsilon$

1. Give the sets $\mathit{yield}(A)$ and $\mathit{yield}(B)$.
2. What is the string language generated by the grammar?

Solution:

1. $\mathit{yield}(A) = \{\langle a^n \rangle \mid n \geq 1\}$, $\mathit{yield}(B) = \{\langle b^n, (bb)^n \rangle \mid n \geq 0\}$.
2. $\{b^m a^n (bb)^m \mid n \geq 1, m \geq 0\}$.

Question 5

Give the language generated by the following simple RCG and give the derivation tree for a string of length 9.

   $\text{S-REL}(XYZ) \to \text{VP-REL}(X, Z)\ \text{N-SUBJ}(Y)$
   $\text{VP-REL}(X, YZ) \to (X, Z)\ \text{V}(Y)$
   $(X, \text{a copy of } Y) \to (X, Y)$
   $(X, \text{a picture of } Y) \to (X, Y)$
   $\text{N-SUBJ}(\text{Peter}) \to \varepsilon$
   $\text{V}(\text{painted}) \to \varepsilon$
   $(\text{whom}, \varepsilon) \to \varepsilon$

Solution:

The string language is the regular language

   whom Peter painted ((a copy of) + (a picture of))$^*$

For the string "whom Peter painted a copy of a picture of" (of length 9), we obtain the following derivation tree:
[Figure: derivation tree with the nodes S-REL, N-SUBJ, VP-REL and V over the yield "whom Peter painted a copy of a picture of ε".]

Question 6

Consider the simple RCG with the following clauses:

   $S(XYZU) \to A(X, Z)\,B(U, Y)$
   $S(XYZ) \to A(X, Z)\,C(Y)$
   $A(aX, aZ) \to A(X, Z)$
   $A(\varepsilon, c) \to \varepsilon$
   $B(Xb, Yb) \to B(X, Y)$
   $B(\varepsilon, c) \to \varepsilon$
   $C(aXY) \to D(X)\,C(Y)$
   $D(d) \to \varepsilon$

1. Perform the following transformations on this simple RCG, always obtaining weakly equivalent simple RCGs:
   (a) Transform the grammar into an ordered simple RCG.
   (b) Remove useless rules.
   (c) Remove ε-rules.
2. What is the string language generated by this grammar?

Solution:

1. Simplifying the grammar:

   (a) Transform the grammar into an ordered simple RCG. (If the superscript is the identity, we omit it.)
   The only problematic rule is $S(XYZU) \to A(X, Z)\,B(U, Y)$. It transforms into $S(XYZU) \to A(X, Z)\,B^{\langle 2,1 \rangle}(Y, U)$.
   Add $B^{\langle 2,1 \rangle}(Yb, Xb) \to B(X, Y)$ and $B^{\langle 2,1 \rangle}(c, \varepsilon) \to \varepsilon$.
   Then, $B^{\langle 2,1 \rangle}(Yb, Xb) \to B(X, Y)$ transforms into $B^{\langle 2,1 \rangle}(Yb, Xb) \to B^{\langle 2,1 \rangle}(Y, X)$.
   In the following, for reasons of readability, we replace $B^{\langle 2,1 \rangle}$ with a new symbol $E$. Result:

      $S(XYZU) \to A(X, Z)\,E(Y, U)$
      $S(XYZ) \to A(X, Z)\,C(Y)$
      $A(aX, aZ) \to A(X, Z)$
      $A(\varepsilon, c) \to \varepsilon$
      $B(Xb, Yb) \to B(X, Y)$
      $B(\varepsilon, c) \to \varepsilon$
      $E(Yb, Xb) \to E(Y, X)$
      $E(c, \varepsilon) \to \varepsilon$
      $C(aXY) \to D(X)\,C(Y)$
      $D(d) \to \varepsilon$

   (b) Remove useless rules.
   $N_T = \{A, B, E, D, S\}$. Consequently, remove $S(XYZ) \to A(X, Z)\,C(Y)$ and $C(aXY) \to D(X)\,C(Y)$.
   In the result, $N_S = \{S, A, E\}$. Consequently, also remove $D(d) \to \varepsilon$, $B(Xb, Yb) \to B(X, Y)$ and $B(\varepsilon, c) \to \varepsilon$. Result:

      $S(XYZU) \to A(X, Z)\,E(Y, U)$
      $A(aX, aZ) \to A(X, Z)$
      $E(Yb, Xb) \to E(Y, X)$
      $A(\varepsilon, c) \to \varepsilon$
      $E(c, \varepsilon) \to \varepsilon$
   (c) Remove ε-rules.
   $N_\varepsilon = \{A^{01}, A^{11}, E^{10}, E^{11}, S^{1}\}$. Resulting productions:

      $S^{1}(XYZU) \to A^{11}(X, Z)\,E^{11}(Y, U)$
      $S^{1}(YZU) \to A^{01}(Z)\,E^{11}(Y, U)$
      $S^{1}(XYZ) \to A^{11}(X, Z)\,E^{10}(Y)$
      $S^{1}(YZ) \to A^{01}(Z)\,E^{10}(Y)$
      $A^{11}(aX, aZ) \to A^{11}(X, Z)$
      $A^{11}(a, aZ) \to A^{01}(Z)$
      $A^{01}(c) \to \varepsilon$
      $E^{11}(Yb, Xb) \to E^{11}(Y, X)$
      $E^{11}(Yb, b) \to E^{10}(Y)$
      $E^{10}(c) \to \varepsilon$

2. The string language generated by this grammar is $\{a^n c b^m a^n c b^m \mid n, m \geq 0\}$.

Question 7

Show that the language $\{w^5 \mid w \in \{a, b\}^*\}$ is not a 2-MCFL.
Hint: Intersect first with the regular language $a^+ b^+ a^+ b^+ a^+ b^+ a^+ b^+ a^+ b^+$ and then show that the result does not satisfy the pumping lemma.

Solution:

We assume that $L = \{w^5 \mid w \in \{a, b\}^*\}$ is a 2-MCFL. Then the language $L' = \{a^n b^m a^n b^m a^n b^m a^n b^m a^n b^m \mid n, m > 0\}$, which we obtain by intersecting $L$ with the regular language denoted by $a^+ b^+ a^+ b^+ a^+ b^+ a^+ b^+ a^+ b^+$, must also be a 2-MCFL. Consequently, by the pumping lemma, there must be at least one word in $L'$ of the form $w_1 v_1 w_2 v_2 w_3 v_3 w_4 v_4 w_5$ with $v_1 v_2 v_3 v_4 \neq \varepsilon$ such that the $v_i$ ($1 \leq i \leq 4$) can be iterated. Each of $v_1, \ldots, v_4$ must necessarily contain either only $a$s or only $b$s, since otherwise the next iteration step would lead to a word outside the language. However, this means that these iterations increase only some, and not all, of the exponents $n$ and $m$ (since at most four substrings are iterated but we have five exponents $n$ and five exponents $m$). I.e., after the next iteration we necessarily obtain a word with either two $a$-sequences of different length or two $b$-sequences of different length. This means that the word we obtain by iteration is not in $L'$. Therefore, $L'$ does not satisfy the pumping lemma for 2-MCFL, which contradicts our assumption that $L$ (and hence $L'$) is a 2-MCFL.

Question 8

1. Show that the copy language $\{ww \mid w \in T^*\}$ for some alphabet $T$ is semilinear using the Parikh Theorem.
2. Show that $\{a^{2^n} \mid n \geq 0\}$ is not semilinear.
   Hint: If the language were semilinear, it would satisfy the constant growth property. Show that this is not the case.

Solution:

1. The copy language $L := \{ww \mid w \in T^*\}$ is letter equivalent to $L' := \{ww^R \mid w \in T^*$ and $w^R$ is $w$ in reverse order$\}$, which is a CFL: It is generated by the CFG with productions $S \to \varepsilon$ and $S \to xSx$ for all $x \in T$. Consequently (with Parikh's theorem) $L'$ and also $L$ are semilinear.
2. Assume that $\{a^{2^n} \mid n \geq 0\}$ satisfies the constant growth property with $c_0$ and $C$. Then take a $w = a^{2^m}$ with $|w| = 2^m > \max(\{c_0\} \cup C)$. Then, according to the definition of constant growth, for $w' = a^{2^{m+1}}$ there must be a $w'' = a^{2^k}$ with $|w'| = |w''| + c$ for some $c \in C$. I.e., $2^{m+1} = 2^k + c$. Consequently (since $k \leq m$) $c \geq 2^m$, which contradicts $2^m > \max(\{c_0\} \cup C)$.

Question 9

Consider the following TA: $M = \langle N, T, S, \mathit{ret}, \kappa, K, \delta, U, \Theta \rangle$ with $N = \{S, S', S_A, S_B, A_2, B_2, \mathit{ret}\}$, $T = \{a, b\}$, $K = N$ and $\kappa$ the identity, $\delta(S) = \delta(A_2) = \delta(S_A) = \delta(B_2) = \delta(S_B) = 1$, $\delta(\mathit{ret}) = \bot$ and the following transitions:
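   $S \to [S]S'$, $S' \xrightarrow{a} A_2$, $S' \xrightarrow{a} S_A$, $S_A \to [S_A]S'$, $S' \xrightarrow{b} B_2$, $S' \xrightarrow{b} S_B$, $S_B \to [S_B]S'$, $A_2 \xrightarrow{a} \mathit{ret}$, $[S_A]\mathit{ret} \to A_2$, $B_2 \xrightarrow{b} \mathit{ret}$, $[S_B]\mathit{ret} \to B_2$

1. What is the string language accepted by this TA?
2. Choose a word of length 4 in this language and give the thread sets (only successful items) that are generated for this word.

Solution:

1. The language is $\{ww^R \mid w \in \{a, b\}^+\}$.
2. Successful configurations for $w = abba$:

   thread set                                   remaining input   operation
   $\varepsilon : S$                            $abba$
   $\varepsilon : S,\ 1 : S'$                   $abba$            $S \to [S]S'$
   $\varepsilon : S,\ 1 : S_A$                  $bba$             $S' \xrightarrow{a} S_A$
   $\varepsilon : S,\ 1 : S_A,\ 11 : S'$        $bba$             $S_A \to [S_A]S'$
   $\varepsilon : S,\ 1 : S_A,\ 11 : B_2$       $ba$              $S' \xrightarrow{b} B_2$
   $\varepsilon : S,\ 1 : S_A,\ 11 : \mathit{ret}$   $a$          $B_2 \xrightarrow{b} \mathit{ret}$
   $\varepsilon : S,\ 1 : A_2$                  $a$               $[S_A]\mathit{ret} \to A_2$
   $\varepsilon : S,\ 1 : \mathit{ret}$         $\varepsilon$     $A_2 \xrightarrow{a} \mathit{ret}$

Question 10

Consider the following set-local MCTAG, with the elementary trees given in bracket notation ($X_{NA}$ marks a node with a null adjunction constraint, $*$ marks the foot node):

   $\alpha$: $S[\,A[\varepsilon]\ B[\varepsilon]\,]$
   tree set $\{\beta_A, \beta_B\}$ with
   $\beta_A$: $A_{NA}[\,a\ A[\,b\ A_{NA}^*\ c\,]\ d\,]$
   $\beta_B$: $B_{NA}[\,e\ B[\,f\ B_{NA}^*\ g\,]\ h\,]$

1. What is the string language generated by this set-local MCTAG?
2. Give an equivalent 4-MCFG.

Solution:

1. $\{a^n b^n c^n d^n e^n f^n g^n h^n \mid n \geq 0\}$.
2. Start symbol $S$, $N = \{\alpha, \beta, S\}$. Rules:

   $S(X) \to \alpha(X)$
   $\alpha(XYZU) \to \beta(X, Y, Z, U)$
   $\beta(aXb, cYd, eZf, gUh) \to \beta(X, Y, Z, U)$
   $\alpha(\varepsilon) \to \varepsilon$
   $\beta(ab, cd, ef, gh) \to \varepsilon$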
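The 4-MCFG given for Question 10 can be checked mechanically for small derivation depths. The Python sketch below is an illustration only and not part of the original solution: the function names `beta_tuple` and `mcfg_words` are hypothetical, and the code hard-codes the five clauses listed above instead of implementing a general MCFG evaluator.

```python
def beta_tuple(n):
    """4-tuple in the yield of beta after n - 1 recursive steps (n >= 1).

    Hypothetical helper. It starts from the base clause
    beta(ab, cd, ef, gh) -> epsilon and applies the recursive clause
    beta(aXb, cYd, eZf, gUh) -> beta(X, Y, Z, U) bottom-up.
    """
    x, y, z, u = "ab", "cd", "ef", "gh"
    for _ in range(n - 1):
        x, y, z, u = "a" + x + "b", "c" + y + "d", "e" + z + "f", "g" + u + "h"
    return x, y, z, u


def mcfg_words(max_n):
    """Words derived from S using at most max_n beta clauses."""
    words = [""]  # alpha(epsilon) -> epsilon, then S(X) -> alpha(X)
    for n in range(1, max_n + 1):
        # alpha(XYZU) -> beta(X, Y, Z, U): concatenate the four components.
        words.append("".join(beta_tuple(n)))
    return words


def target_word(n):
    """The word a^n b^n c^n d^n e^n f^n g^n h^n from part 1 of the solution."""
    return "".join(ch * n for ch in "abcdefgh")


if __name__ == "__main__":
    for n, word in enumerate(mcfg_words(3)):
        print(n, word, word == target_word(n))
```

Because the recursive clause wraps all four components at once, the eight exponents stay equal, which is what `target_word` compares against.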