Properties of Context-Free Languages
- We simplify grammars for CFLs: Greibach Normal Form, Chomsky Normal Form.
- We prove the pumping lemma for CFLs.
- We study closure properties and decision properties. Some properties carry over from regular languages, some do not.
Automata and Grammars Normal Forms, Pumping Lemma 8 April 6, 2017
Greibach Normal Form
When deriving a word, it would be nice to know which rule to select. Left recursion $A \to A\alpha$ is especially difficult.
Definition (Greibach Normal Form of a CFG). A grammar G is in Greibach Normal Form iff all rules have the form $A \to a\beta$, where $a \in T$ and $\beta \in V^*$ (a string of variables).
The terminal at the start of the body helps to select the appropriate rule, especially if there is a unique rule with this terminal.
Theorem (Greibach Normal Form). For each CFL L there exists a CFG G in Greibach Normal Form such that $L(G) = L \setminus \{\lambda\}$.
Join the rules
Lemma (Joining the rules). Let $A \to \alpha B \beta$ be a rule of the grammar G and let $B \to \omega_1, \ldots, B \to \omega_k$ be all rules for B. If we replace the rule $A \to \alpha B \beta$ by the rules $A \to \alpha\omega_1\beta, \ldots, A \to \alpha\omega_k\beta$, we get an equivalent grammar.
Proof:
- In the original grammar: $A \Rightarrow \alpha B \beta \Rightarrow \alpha\omega_i\beta$.
- In the new grammar: $A \Rightarrow \alpha\omega_i\beta$.
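Left Recursion
Lemma (Left Recursion). Let $A \to A\omega_1, \ldots, A \to A\omega_k$ be all left-recursive productions for A in the grammar G, let $A \to \alpha_1, \ldots, A \to \alpha_m$ be all other rules for A, and let Z be a new variable. Then, by replacing these rules by the rules
1. $A \to \alpha_i$, $A \to \alpha_i Z$, $Z \to \omega_j$, $Z \to \omega_j Z$, or
2. $A \to \alpha_i Z$, $Z \to \omega_j Z$, $Z \to \lambda$,
we get an equivalent grammar.
Proof:
(G) $A \Rightarrow A\omega_{i_n} \Rightarrow \ldots \Rightarrow A\omega_{i_1}\ldots\omega_{i_n} \Rightarrow \alpha_j\omega_{i_1}\ldots\omega_{i_n}$
(1) $A \Rightarrow \alpha_j Z \Rightarrow \alpha_j\omega_{i_1} Z \Rightarrow \ldots \Rightarrow \alpha_j\omega_{i_1}\ldots\omega_{i_{n-1}} Z \Rightarrow \alpha_j\omega_{i_1}\ldots\omega_{i_n}$
(2) $A \Rightarrow \alpha_j Z \Rightarrow \alpha_j\omega_{i_1} Z \Rightarrow \ldots \Rightarrow \alpha_j\omega_{i_1}\ldots\omega_{i_n} Z \Rightarrow \alpha_j\omega_{i_1}\ldots\omega_{i_n}$
Theorem (Greibach Normal Form). For each CFL L there exists a CFG G in Greibach Normal Form such that $L(G) = L \setminus \{\lambda\}$.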
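Variant 1 of the Left Recursion lemma can be sketched in a few lines of Python. This is only an illustration: the representation (a dict from a variable to a list of bodies, each body a tuple of symbols) and the function name `remove_left_recursion` are assumptions made for this sketch, not part of the lecture.

```python
def remove_left_recursion(rules, a, z):
    """Remove immediate left recursion on variable `a` using the fresh variable `z`.

    `rules` maps a variable to a list of bodies; each body is a tuple of symbols.
    Follows variant 1 of the lemma, so no lambda-rule is introduced.
    """
    # omega_j: tails of the left-recursive rules a -> a omega_j
    recursive = [body[1:] for body in rules[a] if body and body[0] == a]
    # alpha_i: all other bodies for a
    others = [body for body in rules[a] if not body or body[0] != a]
    if not recursive:
        return dict(rules)
    new_rules = dict(rules)
    # a -> alpha_i | alpha_i z
    new_rules[a] = others + [alpha + (z,) for alpha in others]
    # z -> omega_j | omega_j z
    new_rules[z] = recursive + [omega + (z,) for omega in recursive]
    return new_rules


# E -> E + T | T   becomes   E -> T | T E',  E' -> + T | + T E'
grammar = {"E": [("E", "+", "T"), ("T",)]}
result = remove_left_recursion(grammar, "E", "E'")
```

The caller is assumed to supply a name `z` that does not already occur in the grammar.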
Proof: Greibach Normal Form
We join rules and remove left recursion.
- Enumerate all variables $\{A_1, \ldots, A_n\}$.
- First we allow recursion only in the form $A_i \to A_j\omega$ where $i < j$. Iterate i from 1 to n:
  - rules $A_i \to A_j\omega$ with $j < i$ are removed by joining the rules,
  - for $j = i$ remove left recursion.
- We get rules of the form $A_i \to A_j\omega$ ($i < j$), $A_i \to a\omega$ ($a \in T$), $Z_i \to \omega$.
- Rules for $A_i$ (original variables) only in the form $A_i \to a\omega$: iteratively join the rules for i from n down to 1 (for n it already holds).
- Rules for $Z_i$ (new variables) only in the form $Z_i \to a\omega$: no rule for $Z_i$ has a body beginning with $Z_j$, so either it is in the required form or we join it with the rules $A_j \to a\omega$.
- Finally, remove terminals inside rules.
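Reduction to Greibach NF
Example
Original grammar:
$E \to E + T \mid T$
$T \to T * F \mid F$
$F \to (E) \mid a$
Left recursion removed:
$E \to T \mid TE'$
$E' \to +T \mid +TE'$
$T \to F \mid FT'$
$T' \to *F \mid *FT'$
$F \to (E) \mid a$
(Almost) Greibach Normal Form:
$E \to (E) \mid a \mid (E)T' \mid aT' \mid (E)E' \mid aE' \mid (E)T'E' \mid aT'E'$
$E' \to +T \mid +TE'$
$T \to (E) \mid a \mid (E)T' \mid aT'$
$T' \to *F \mid *FT'$
$F \to (E) \mid a$
Greibach Normal Form:
$E \to (EP \mid a \mid (EPT' \mid aT' \mid (EPE' \mid aE' \mid (EPT'E' \mid aT'E'$
$E' \to +T \mid +TE'$
$T \to (EP \mid a \mid (EPT' \mid aT'$
$T' \to *F \mid *FT'$
$F \to (EP \mid a$
$P \to )$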
Normal Forms for Context-Free Grammars
Chomsky Normal Form: all productions are of the form $A \to BC$ or $A \to a$, where A, B, C are variables and a is a terminal.
Every CFL (without $\lambda$) is generated by a CFG in Chomsky Normal Form.
To get there, we perform simplifications:
- eliminate useless symbols,
- eliminate $\lambda$-productions $A \to \lambda$ for some variable A,
- eliminate unit productions $A \to B$ for variables A, B.
Eliminating Useless Symbols
Definition (useful symbol). A symbol X is useful for a grammar G = (V, T, P, S) if there is some derivation of the form $S \Rightarrow^* \alpha X \beta \Rightarrow^* w$ where $w \in T^*$, $X \in (V \cup T)$. If X is not useful, we say it is useless.
- X is generating if $X \Rightarrow^* w$ for some terminal string w. Note that always $w \Rightarrow^* w$ in zero steps.
- X is reachable if there is a derivation $S \Rightarrow^* \alpha X \beta$ for some $\alpha, \beta$.
We aim to eliminate nongenerating and unreachable symbols.
Example. Consider the grammar:
$S \to AB \mid a$
$A \to b$
Eliminate B (nongenerating): $S \to a$, $A \to b$.
Eliminate A (not reachable): $S \to a$.
Theorem (Eliminating useless symbols). Let G = (V, T, P, S) be a CFG and assume that $L(G) \neq \emptyset$. Let $G_1 = (V_1, T_1, P_1, S)$ be obtained as follows:
- Eliminate nongenerating symbols and all productions involving them.
- Eliminate all symbols that are not reachable after the previous step.
Then $G_1$ has no useless symbols, and $L(G_1) = L(G)$.
Generating symbols:
- BASIS: Every $a \in T$ is generating.
- INDUCTION: If there is a production $A \to \alpha$ and every symbol of $\alpha$ is generating, then A is generating. (This includes $A \to \lambda$.)
Reachable symbols:
- BASIS: S is surely reachable.
- INDUCTION: If A is reachable, then for all productions with A in the head, all symbols of their bodies are also reachable.
Theorem. The algorithms above find all and only the generating / reachable symbols.
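The two inductive definitions translate directly into fixed-point computations. The following Python sketch is one possible rendering; the grammar representation (dict from variable to a list of bodies, each a tuple of symbols) and all function names are assumptions introduced here for illustration.

```python
def generating_symbols(rules, terminals):
    """BASIS: terminals generate themselves. INDUCTION: a head whose body is all-generating."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head in generating:
                continue
            # An empty body (A -> lambda) satisfies all(...) trivially.
            if any(all(s in generating for s in body) for body in bodies):
                generating.add(head)
                changed = True
    return generating


def reachable_symbols(rules, start):
    """BASIS: the start symbol. INDUCTION: every symbol in a body of a reachable head."""
    reachable = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in rules.get(head, []):
            for s in body:
                if s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return reachable


def remove_useless(rules, terminals, start):
    """First drop nongenerating symbols, then drop unreachable ones (order matters)."""
    gen = generating_symbols(rules, terminals)
    step1 = {h: [b for b in bodies if all(s in gen for s in b)]
             for h, bodies in rules.items() if h in gen}
    reach = reachable_symbols(step1, start)
    return {h: bodies for h, bodies in step1.items() if h in reach}


# The example from the slides: S -> AB | a, A -> b (B has no productions).
grammar = {"S": [("A", "B"), ("a",)], "A": [("b",)]}
cleaned = remove_useless(grammar, {"a", "b"}, "S")
```

Running the steps in the opposite order would leave A in the grammar for this example, since A only becomes unreachable after the production S -> AB is removed.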
Eliminating $\lambda$-productions
Without $\lambda$-productions, $\lambda \notin L$.
We aim to prove: if L has a CFG, then $L \setminus \{\lambda\}$ has a CFG without $\lambda$-productions.
Definition (nullable variable). A variable A is nullable if $A \Rightarrow^* \lambda$.
For a nullable variable in a body, e.g. A in $B \to CAD$, we create two versions of the production: with and without this variable.
An algorithm to find all nullable symbols of G:
- If $A \to \lambda$ is a production of G, then A is nullable.
- If $B \to C_1 \ldots C_k$ where each $C_i$ is nullable, then B is nullable (note that a terminal $C_i \in T$ is never nullable).
Construction of a grammar without $\lambda$-productions from G = (V, T, P, S):
- Determine the nullable symbols.
- For each production $A \to X_1 \ldots X_k \in P$, $k \ge 1$, suppose m of the $X_i$ are nullable. The new grammar $G_1$ will have $2^m$ versions of this production, with and without each nullable symbol, except the body $\lambda$ in the case $m = k$.
Example. Consider the grammar:
$S \to AB$
$A \to aAA \mid \lambda$
$B \to bBB \mid \lambda$
After the construction:
$S \to AB \mid A \mid B$
$A \to aAA \mid aA \mid aA \mid a$
$B \to bBB \mid bB \mid bB \mid b$
Final grammar (duplicates removed):
$S \to AB \mid A \mid B$
$A \to aAA \mid aA \mid a$
$B \to bBB \mid bB \mid b$
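The construction can be sketched in Python as follows. The grammar representation (dict of variable to list of tuple bodies, with the empty tuple standing for $\lambda$) and the function names are assumptions made for this sketch.

```python
from itertools import product


def nullable_variables(rules):
    """Fixed point: A is nullable if some body of A consists only of nullable symbols."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_lambda(rules):
    """Emit every with/without combination of nullable symbols, skipping the empty body."""
    nullable = nullable_variables(rules)
    result = {}
    for head, bodies in rules.items():
        new_bodies = []
        for body in bodies:
            # A nullable symbol may be kept or dropped; any other symbol is always kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# The example from the slides: S -> AB, A -> aAA | lambda, B -> bBB | lambda.
grammar = {
    "S": [("A", "B")],
    "A": [("a", "A", "A"), ()],
    "B": [("b", "B", "B"), ()],
}
result = eliminate_lambda(grammar)
```

Duplicates such as the two copies of aA are dropped while the bodies are generated, so the output corresponds to the final grammar of the example.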
Eliminating Unit Productions
Definition (unit production). A unit production is a production $A \to B \in P$ where both A and B are variables.
Example
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to I \mid (E)$
$T \to F \mid T * F$
$E \to T \mid E + T$
Expanding T in $E \to T$: $E \to F \mid T * F$
Expanding $E \to F$: $E \to I \mid (E)$
Expanding $E \to I$: $E \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
Together: $E \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1 \mid (E) \mid T * F \mid E + T$.
We have to avoid possible cycles.
Definition (unit pair). A pair $A, B \in V$ such that $A \Rightarrow^* B$ using only unit productions is called a unit pair.
Unit pairs identification
- BASIS: (A, A) is a unit pair for every $A \in V$.
- INDUCTION: If (A, B) is a unit pair and $B \to C \in P$ for a variable C, then (A, C) is a unit pair.
Example (unit pairs from the previous grammar): (E, E), (T, T), (F, F), (I, I), (E, T), (E, F), (E, I), (T, F), (T, I), (F, I).
To eliminate unit productions from G = (V, T, P, S):
- Find all unit pairs of G.
- For each unit pair (A, B), add to the new grammar all productions $A \to \alpha$ where $B \to \alpha \in P$ is not a unit production.
Example
Original grammar:
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to I \mid (E)$
$T \to F \mid T * F$
$E \to T \mid E + T$
Without unit productions:
$I \to a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$F \to (E) \mid a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$T \to T * F \mid (E) \mid a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
$E \to E + T \mid T * F \mid (E) \mid a \mid b \mid Ia \mid Ib \mid I0 \mid I1$
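As a sketch of how the unit-pair computation and the elimination step fit together, the following Python code follows the BASIS/INDUCTION scheme above. The grammar representation (dict of variable to list of tuple bodies) and the function names are assumptions of this sketch.

```python
def unit_pairs(rules):
    """BASIS: (A, A). INDUCTION: (A, B) and a unit production B -> C give (A, C)."""
    variables = set(rules)
    pairs = {(a, a) for a in variables}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(pairs):
            for body in rules[b]:
                if len(body) == 1 and body[0] in variables and (a, body[0]) not in pairs:
                    pairs.add((a, body[0]))
                    changed = True
    return pairs


def eliminate_unit(rules):
    """For each unit pair (A, B), copy every non-unit body of B to A."""
    variables = set(rules)
    pairs = unit_pairs(rules)
    result = {a: [] for a in rules}
    for a in rules:
        for b in rules:
            if (a, b) not in pairs:
                continue
            for body in rules[b]:
                is_unit = len(body) == 1 and body[0] in variables
                if not is_unit and body not in result[a]:
                    result[a].append(body)
    return result


# The expression grammar from the slides.
grammar = {
    "I": [("a",), ("b",), ("I", "a"), ("I", "b"), ("I", "0"), ("I", "1")],
    "F": [("I",), ("(", "E", ")")],
    "T": [("F",), ("T", "*", "F")],
    "E": [("T",), ("E", "+", "T")],
}
pairs = unit_pairs(grammar)
result = eliminate_unit(grammar)
```

Because the pairs are computed as a closure before any rule is copied, cycles of unit productions cause no trouble.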
Normal Form of CFGs
Theorem (Normal Form of CFGs). If G is a CFG with $L(G) \setminus \{\lambda\} \neq \emptyset$, then there is a CFG $G_1$ such that $L(G_1) = L(G) \setminus \{\lambda\}$ and $G_1$ has no $\lambda$-productions, unit productions, or useless symbols.
Proof outline:
- Start by eliminating $\lambda$-productions.
- Eliminate unit productions. This does not introduce $\lambda$-productions.
- Eliminate useless symbols. This does not introduce any new production.
Chomsky Normal Form
Definition (Chomsky Normal Form). A CFG G = (V, T, P, S) that has no useless symbols and in which all productions are in one of two forms,
- $A \to BC$, where $A, B, C \in V$,
- $A \to a$, where $A \in V$, $a \in T$,
is said to be in Chomsky Normal Form (CNF).
To put a grammar into CNF, we need two additional steps:
- Make bodies of length 2 or more consist only of variables: for every terminal a create a new variable, say A, add one production rule $A \to a$, and use A in place of a everywhere a appears in a body of length 2 or more.
- Break bodies of length 3 or more into bodies of two variables: for a production $A \to B_1 \ldots B_k$ introduce $k - 2$ variables $C_i$ and the productions $A \to B_1C_1$, $C_1 \to B_2C_2$, ..., $C_{k-2} \to B_{k-1}B_k$.
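The two additional steps can be sketched in Python as below. The sketch assumes the input grammar already has no $\lambda$-productions, unit productions, or useless symbols; the fresh names `T_x` (for a terminal x) and `C1, C2, ...` are hypothetical and are assumed not to clash with existing variables.

```python
def to_cnf(rules, terminals):
    """Replace terminals in long bodies by variables, then split bodies longer than 2."""
    result = {}
    term_var = {}   # terminal -> hypothetical fresh variable name
    counter = 0     # numbering of the hypothetical helper variables C1, C2, ...

    def var_for(t):
        if t not in term_var:
            term_var[t] = f"T_{t}"
        return term_var[t]

    for head, bodies in rules.items():
        for body in bodies:
            if len(body) >= 2:
                # Step 1: bodies of length >= 2 consist only of variables.
                body = tuple(var_for(s) if s in terminals else s for s in body)
            current = head
            while len(body) > 2:
                # Step 2: A -> B1 C1, C1 -> B2 C2, ..., C_{k-2} -> B_{k-1} B_k.
                counter += 1
                fresh = f"C{counter}"
                result.setdefault(current, []).append((body[0], fresh))
                current, body = fresh, body[1:]
            result.setdefault(current, []).append(body)
    for t, v in term_var.items():
        result.setdefault(v, []).append((t,))
    return result


grammar = {"E": [("E", "+", "T"), ("a",)], "T": [("a",)]}
cnf = to_cnf(grammar, {"+", "a"})
```

Unlike a hand conversion, this sketch creates a separate chain of helper variables for every long body and does not share helpers between rules with a common tail.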
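CNF Example
After replacing terminals in long bodies by variables:
$I \to a \mid b \mid IA \mid IB \mid IZ \mid IU$
$F \to LER \mid a \mid b \mid IA \mid IB \mid IZ \mid IU$
$T \to TMF \mid LER \mid a \mid b \mid IA \mid IB \mid IZ \mid IU$
$E \to EPT \mid TMF \mid LER \mid a \mid b \mid IA \mid IB \mid IZ \mid IU$
$A \to a$, $B \to b$, $Z \to 0$, $U \to 1$, $P \to +$, $M \to *$, $L \to ($, $R \to )$
After breaking long bodies:
$F \to LC_3 \mid a \mid b \mid IA \mid IB \mid IZ \mid IU$
$T \to TC_2 \mid LC_3 \mid a \mid b \mid IA \mid IB \mid IZ \mid IU$
$E \to EC_1 \mid TC_2 \mid LC_3 \mid a \mid b \mid IA \mid IB \mid IZ \mid IU$
$C_1 \to PT$, $C_2 \to MF$, $C_3 \to ER$
I, A, B, Z, U, P, M, L, R as before.
Theorem (CNF). If G is a CFG with $L(G) \setminus \{\lambda\} \neq \emptyset$, then there is a grammar $G_1$ in Chomsky Normal Form such that $L(G_1) = L(G) \setminus \{\lambda\}$.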
The Size of the Parse Tree
Theorem (The Size of Parse Trees). Suppose we have a parse tree according to a CNF grammar G = (V, T, P, S) that yields a terminal string w. If the length of the longest path is n, then $|w| \le 2^{n-1}$.
Proof. By induction on n.
- BASIS: $|a| = 1 = 2^0$.
- INDUCTION: the root uses a production $A \to BC$ and both subtrees have longest path at most $n - 1$, so $|w| \le 2^{n-2} + 2^{n-2} = 2^{n-1}$.
Pumping Lemma for Context-Free Languages
Theorem (Pumping Lemma for Context-Free Languages). Let L be a CFL. Then there exist constants $p, q \in \mathbb{N}$ such that any $z \in L$ with $|z| > p$ can be written as $z = uvwxy$ subject to:
- $|vwx| \le q$,
- $vx \neq \lambda$,
- $\forall i \ge 0$: $uv^iwx^iy \in L$.
[Figure: parse tree in which a variable $A_1$ roots the subtree $T_1$ with yield $vwx$ and, below it, the same variable $A_2$ roots the subtree $T_2$ with yield $w$; the whole yield is split into u, v, w, x, y.]
Proof idea:
- take the parse tree for z,
- find the longest path,
- there must be two equal variables on it,
- these variables define two subtrees,
- the subtrees define the partition $z = uvwxy$,
- we can copy the tree $T_1$ in place of $T_2$ ($i > 1$) or replace $T_1$ by $T_2$ ($i = 0$).
Proof: for $|z| > p$ we need $z = uvwxy$, $|vwx| \le q$, $vx \neq \lambda$, $\forall i \ge 0$: $uv^iwx^iy \in L$.
- We take a grammar in Chomsky NF (the cases $L = \{\lambda\}$ and $L = \emptyset$ are proved separately).
- Let $|V| = k$. We set $p = 2^{k-1}$, $q = 2^k$.
- For $z \in L$ with $|z| > p$, the parse tree has a path of length $> k$.
- We denote the terminal at the end of the longest path by t.
- At least two of the last $k + 1$ variables on the path to t are equal.
- We take the couple $A_1, A_2$ closest to t (it defines the subtrees $T_1, T_2$).
- The path from $A_1$ to t is the longest in $T_1$ and its length is at most $k + 1$.
- The yield of $T_1$ is no longer than $2^k$ (so $|vwx| \le q$).
- There are two paths from $A_1$ (ChNF), one to $T_2$, the other to the rest of $vx$.
- ChNF has no nullable variables, so $vx \neq \lambda$.
- Derivation of the word ($A_1 \Rightarrow^* vA_2x$, $A_2 \Rightarrow^* w$): $S \Rightarrow^* uA_1y \Rightarrow^* uvA_2xy \Rightarrow^* uvwxy$.
- If we move $A_2$ to $A_1$ ($i = 0$): $S \Rightarrow^* uA_2y \Rightarrow^* uwy$.
- If we move $A_1$ to $A_2$ ($i = 2, 3, \ldots$): $S \Rightarrow^* uA_1y \Rightarrow^* uvA_1xy \Rightarrow^* uvvA_2xxy \Rightarrow^* uvvwxxy$.
Applications of the Pumping Lemma for CFLs
"Adversary game" as for regular languages:
- Pick a language L that is not a CFL.
- Our adversary gets to pick n, which we do not know.
- We get to pick z, and we may use n as a parameter.
- Our adversary gets to break z into $uvwxy$, subject to $|vwx| \le n$ and $vx \neq \lambda$.
- We win the game if we can pick i and show that $uv^iwx^iy$ is not in L.
Lemma (Not CFL). The following languages are not CFLs:
- $\{0^n1^n2^n \mid n \ge 1\}$
- $\{0^i1^j2^i3^j \mid i \ge 1, j \ge 1\}$
- $\{ww \mid w \in \{0, 1\}^*\}$
Pumping Lemma Usage
Example (not a CFL). The language $\{0^n1^n2^n \mid n \ge 1\}$ is not a CFL.
- Assume it were a CFL; we get p, q from the Pumping Lemma.
- Select $k = \max(p, q)$; then $|0^k1^k2^k| > p$.
- The part $vwx$ is no longer than q, so we pump at most two different symbols.
- The equality of the symbol counts is violated. CONTRADICTION.
Example (not a CFL). The language $\{0^i1^j2^k \mid 0 \le i \le j \le k\}$ is not a CFL.
- Assume it were a CFL; we get p, q from the Pumping Lemma.
- Select $n = \max(p, q)$; then $|0^n1^n2^n| > p$.
- The part $vwx$ is no longer than q, so we pump at most two different symbols.
- If $vx$ contains 0 (or 1), pump up: CONTRADICTION with $i \le j$ (or $j \le k$).
- If $vx$ contains 2 (or 1), pump down: CONTRADICTION with $j \le k$ (or $i \le j$).
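For a concrete word, the adversary's options can be enumerated by brute force. The Python sketch below checks, for one fixed z and q, whether any admissible split survives pumping for i = 0, ..., 3. It illustrates the argument on a single word and is not a proof; the function names and the bound `max_i` are assumptions of this sketch.

```python
def is_0n1n2n(s):
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n


def is_0n1n(s):
    n = len(s) // 2
    return n >= 1 and s == "0" * n + "1" * n


def pumpable_split_exists(z, q, member, max_i=3):
    """True if some z = uvwxy with |vwx| <= q and vx != '' stays in the language
    (as decided by `member`) for every i in 0..max_i."""
    n = len(z)
    for a in range(n + 1):                      # u = z[:a]
        limit = min(a + q, n)                   # vwx must end by this index
        for b in range(a, limit + 1):           # v = z[a:b]
            for c in range(b, limit + 1):       # w = z[b:c]
                for d in range(c, limit + 1):   # x = z[c:d]
                    u, v, w, x, y = z[:a], z[a:b], z[b:c], z[c:d], z[d:]
                    if not (v or x):
                        continue
                    if all(member(u + v * i + w + x * i + y) for i in range(max_i + 1)):
                        return True
    return False


k = 4
z = "0" * k + "1" * k + "2" * k
no_split = not pumpable_split_exists(z, k, is_0n1n2n)
cfl_split = pumpable_split_exists("000111", 2, is_0n1n)
```

For the context-free language of words $0^n1^n$ a pumpable split is found immediately (v = 0, x = 1 around the middle), whereas for $0^k1^k2^k$ every split already fails when pumping down.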
Generalisations of the Pumping Lemma
The Pumping Lemma is only an implication.
Example (a non-CFL that can be pumped). $L = \{a^ib^jc^kd^l \mid i = 0 \lor j = k = l\}$ is not a CFL, but it can be pumped:
- $i = 0$: $b^jc^kd^l$ can be pumped in any letter,
- $i > 0$: $a^ib^nc^nd^n$ can be pumped in a.
A solution?
- generalisations (Ogden's lemma and others): pumping marked symbols,
- closure properties.
Closure Properties of CFLs
Theorem (CFLs are closed under substitution). If L is a CFL over $\Sigma$ and s is a substitution on $\Sigma$ such that $s(a)$ is a CFL for each $a \in \Sigma$, then $s(L)$ is a CFL.
Proof. Idea: replace each a by the start symbol of a CFG for the language $s(a)$.
First, rename variables so that they are unique across G = (V, T, P, S) and $G_a = (V_a, T_a, P_a, S_a)$, $a \in \Sigma$.
We construct a new grammar $G' = (V', T', P', S)$ for $s(L)$:
- $V' = V \cup \bigcup_{a \in \Sigma} V_a$
- $T' = \bigcup_{a \in \Sigma} T_a$
- $P' = \bigcup_{a \in \Sigma} P_a \cup \{p \in P \text{ with all } a \text{ replaced by } S_a\}$
$G'$ generates the language $s(L)$.
CFL closure under further operations
From the Substitution theorem we can prove:
Theorem (CFLs are closed under union, concatenation, closure, homomorphism). The CFLs are closed under union, concatenation, closure ($*$), positive closure ($+$), and homomorphism.
Theorem (CFLs are closed under reversal). If L is a CFL, then so is $L^R$.
Proof. Reverse the bodies of all productions.
CFLs and intersection
Example (CFLs are not closed under intersection). The language $L = \{0^n1^n2^n \mid n \ge 1\} = \{0^n1^n2^i \mid n \ge 1, i \ge 1\} \cap \{0^i1^n2^n \mid n \ge 1, i \ge 1\}$ is not a CFL.
Theorem (CFLs are closed under intersection with a regular language). If L is a CFL and R is a regular language, then $L \cap R$ is a CFL.
Theorem (CFLs are not closed under complement). Let $L, L_1, L_2$ be CFLs and R a regular language.
- $L - R$ is a CFL, since $L - R = L \cap \overline{R}$ and $\overline{R}$ is regular.
- $\overline{L}$ is not necessarily a CFL, since $L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$.
- $L_1 - L_2$ is not necessarily a CFL, since $\Sigma^* - L$ is not always a CFL.
CFLs are closed under inverse homomorphism
Theorem (CFLs are closed under inverse homomorphism). Let L be a CFL and h a homomorphism. Then $h^{-1}(L)$ is a CFL.
Proof. We simulate a PDA for L. After an input symbol a is read, $h(a)$ is placed in a buffer. The symbols of $h(a)$ are fed one at a time to the PDA being simulated. Only when the buffer is empty does the constructed PDA read another of its input symbols.
Closure Properties

Language              Regular   Context-free   Dyck
union                 YES       YES            NO
intersection          YES       NO             NO
intersection with RL  YES       YES            YES
complement            YES       NO             YES
homomorphism          YES       YES            NO
inverse hom.          YES       YES            YES
Complexity of Converting among CFGs and PDAs
Theorem. The following conversions are linear in the size of the input:
- CFG to a PDA.
- PDA by final state to a PDA by empty stack.
- PDA by empty stack to a PDA by final state.
There is an $O(n^3)$ algorithm that takes a PDA P whose representation has length n and produces a CFG of length at most $O(n^3)$.
Theorem. Given a grammar G of length n, we can find an equivalent CNF grammar for G in time $O(n^2)$; the resulting grammar has length $O(n^2)$.
Testing Emptiness of CFLs
Lemma. Testing whether the start symbol S of G is generating can be done in O(n) time.
- Create an indexed list of all variables.
- Add links: for each variable, a chain of all the positions in which that variable appears; for each production, a count of the number of positions holding variables whose ability to generate a terminal string has not yet been taken into account.
- When a variable B is established as generating, follow its chain and decrease the count by 1 for each occurrence.
- If a count reaches 0, we know the head variable of that production is generating.
- Keep all newly found generating variables on a stack and process them one by one.
This algorithm is O(n).
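The counting scheme can be sketched in Python as follows. The data structures (`count`, `heads`, `occurrences`) are one possible rendering of the links described above, and all names are assumptions of this sketch; the grammar is again a dict from variable to a list of tuple bodies.

```python
from collections import defaultdict


def start_is_generating(rules, terminals, start):
    """Linear-time test: each variable position in each body is visited once."""
    count = []                        # per production: variable positions still pending
    heads = []                        # per production: its head variable
    occurrences = defaultdict(list)   # variable -> production index, once per position
    generating = set()
    stack = []

    for head, bodies in rules.items():
        for body in bodies:
            idx = len(count)
            heads.append(head)
            pending = 0
            for s in body:
                if s not in terminals:
                    occurrences[s].append(idx)
                    pending += 1
            count.append(pending)
            # A body without variables makes its head generating at once.
            if pending == 0 and head not in generating:
                generating.add(head)
                stack.append(head)

    while stack:
        b = stack.pop()
        for idx in occurrences[b]:
            count[idx] -= 1
            if count[idx] == 0 and heads[idx] not in generating:
                generating.add(heads[idx])
                stack.append(heads[idx])
    return start in generating


nonempty = start_is_generating({"S": [("A", "B"), ("a",)], "A": [("b",)]}, {"a", "b"}, "S")
empty = start_is_generating({"S": [("A", "B")], "A": [("b",)]}, {"a", "b"}, "S")
```

Every variable is pushed on the stack at most once and every occurrence is decremented at most once, which is where the linear bound comes from. L(G) is empty exactly when the function returns False.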
Cocke-Younger-Kasami algorithm for testing membership in a CFL
Inefficient approach: exponential in $|w|$ (check all derivation trees of appropriate depth for a CNF grammar for L).
Definition (CYK Algorithm, time $O(n^3)$). The input is a CNF grammar G = (V, T, P, S) for the language L and a string $w = a_1a_2 \ldots a_n \in T^*$.
- Produce a triangular table whose horizontal axis is w.
- $X_{ij}$ is the set of variables A such that $A \Rightarrow^* a_ia_{i+1} \ldots a_j$.
- Fill the table upwards.
- BASIS: $X_{ii} = \{A \mid A \to a_i \in P\}$.
- INDUCTION: $X_{ij} = \{A \mid A \to BC \in P,\ B \in X_{ik},\ C \in X_{k+1,j},\ i \le k < j\}$.

X_15
X_14  X_25
X_13  X_24  X_35
X_12  X_23  X_34  X_45
X_11  X_22  X_33  X_44  X_55
a_1   a_2   a_3   a_4   a_5
{S, A, C}
-          {S, A, C}
-          {B}        {B}
{S, A}     {B}        {S, C}     {S, A}
{B}        {A, C}     {A, C}     {B}        {A, C}
b          a          a          b          a
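The table-filling procedure can be sketched in Python as follows. The grammar used here (S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a) does not appear in the extracted slide text; it is an assumption, chosen because it reproduces the table above for the word baaba. The function name `cyk_table` is likewise hypothetical.

```python
def cyk_table(rules, word):
    """Return table with table[i][j] = X_{i+1, j+1} (0-based indices, i <= j)."""
    n = len(word)
    table = [[set() for _ in range(n)] for _ in range(n)]
    # BASIS: X_ii contains every A with a production A -> a_i.
    for i, a in enumerate(word):
        table[i][i] = {h for h, bodies in rules.items() if (a,) in bodies}
    # INDUCTION: longer substrings from all split points k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for head, bodies in rules.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[i][k]
                                and body[1] in table[k + 1][j]):
                            table[i][j].add(head)
    return table


# Assumed CNF grammar (not stated in the slide text).
grammar = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
table = cyk_table(grammar, "baaba")
accepted = "S" in table[0][4]
```

The word is in the language exactly when the start symbol appears in the top cell $X_{1n}$, here `table[0][4]`.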
Undecidable CFL Problems (preview)
- Is a given CFG ambiguous?
- Is a given CFL inherently ambiguous?
- Is the intersection of two CFLs empty?
- Is a given CFL equal to $\Sigma^*$, where $\Sigma$ is the alphabet of this language?
Summary of Chapter 7
- Eliminating useless symbols.
- Eliminating $\lambda$-productions and unit productions.
- Chomsky Normal Form: no useless symbols, and every production body consists of either two variables or one terminal. For every CFG G with $L(G) \setminus \{\lambda\} \neq \emptyset$ there is a CNF grammar $G_1$ with $L(G_1) = L(G) \setminus \{\lambda\}$.
- The Pumping Lemma: $\exists n$; $\forall z \in L$ with $|z| \ge n$ $\exists u, v, w, x, y$; $z = uvwxy$ such that $|vwx| \le n$; $vx \neq \lambda$; $\forall i \ge 0$: $uv^iwx^iy \in L$.
- Testing emptiness of a CFL is possible in $O(|G|)$ time.
- Testing membership in a CFL: the Cocke-Younger-Kasami algorithm, $O(|w|^3)$ for $w \in \Sigma^*$ and a fixed CFL.