CS375: Logic and Theory of Computing
Fuhua (Frank) Cheng
Department of Computer Science
University of Kentucky
Table of Contents:
Week 1: Preliminaries (set algebra, relations, functions) (read Chapters 1-4)
Weeks 3-6: Regular Languages, Finite Automata (Chapter 11)
Weeks 7-9: Context-Free Languages, Pushdown Automata (Chapter 12)
Weeks 10-12: Turing Machines (Chapter 13)
Weeks 13-14: Propositional Logic (Chapter 6), Predicate Logic (Chapter 7), Computational Logic (Chapter 9), Algebraic Structures (Chapter 10)
Context-Free Language Topics
Algorithm. Remove Λ-productions from grammars for languages without Λ.
1. Find the nonterminals that derive Λ.
2. For each production A → w, construct all productions A → w′, where w′ is obtained from w by removing one or more occurrences of the nonterminals from Step 1.
3. Combine the original productions with those of Step 2 and eliminate any Λ-productions.
Context-Free Language Topics
Algorithm. Remove Λ-productions from grammars for languages without Λ.
Two questions:
1. Why do we want to remove Λ-productions from a given grammar?
   To avoid violating the length-increasing property.
2. Why do we need Step 2 in the above algorithm?
   To avoid the creation of a path that leads to a Λ-production.
Remove Λ-productions
Example. Remove Λ-productions from the grammar:
S → ABc
A → aA | Λ
B → bB | Λ
(The language generated by this grammar does not contain Λ.)
Solution:
1. The nonterminals A and B derive Λ.
2. From S → ABc we construct S → Bc | Ac | c.
   From A → aA we construct A → a.
   From B → bB we construct B → b.
3. S → ABc | Bc | Ac | c
   A → aA | a
   B → bB | b
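The three steps of the algorithm can be sketched in Python. This is a minimal illustration, not the course's reference implementation: the dictionary encoding (each right side is a string of one-character symbols, with "" standing for Λ) and the function names are assumptions made for this example.

```python
from itertools import product

def nullable_nonterminals(grammar):
    """Step 1: find every nonterminal that derives the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # An empty body is a Λ-production; a body made only of
            # nullable nonterminals also derives Λ.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_lambda_productions(grammar):
    """Steps 2 and 3: drop nullable symbols in every combination, discard Λ."""
    nullable = nullable_nonterminals(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable symbol may be kept or removed independently.
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:  # an empty candidate would be a Λ-production
                    new_bodies.add(candidate)
        result[head] = new_bodies
    return result

# Example grammar: S → ABc, A → aA | Λ, B → bB | Λ
grammar = {"S": ["ABc"], "A": ["aA", ""], "B": ["bB", ""]}
cleaned = remove_lambda_productions(grammar)
for head, bodies in cleaned.items():
    print(head, "→", " | ".join(sorted(bodies)))
```

Keeping the original body is the choice in which no nullable symbol is removed, so Steps 2 and 3 collapse into a single loop over all keep/remove combinations.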
Remove Λ-productions
Quiz. Remove Λ-productions from the grammar:
S → ABc | Ab | c
A → ABa | Λ
B → Bbc | Λ
Solution:
1. The nonterminals A and B derive Λ.
2. From S → ABc we construct S → Bc | Ac | c.
   From S → Ab we construct S → b.
   From A → ABa we construct A → Ba | Aa | a.
   From B → Bbc we construct B → bc.
3. S → ABc | Ab | c | Bc | Ac | b
   A → ABa | Ba | Aa | a
   B → Bbc | bc
Chomsky Normal Form (CNF)
Definition. Productions have one of the following forms:
A → a (terminal)
A → BC (nonterminals)
S → Λ (forbid S on RHS; not needed if Λ is not in the language)
Example:
S → AS | a
A → SA | b
Language of this grammar = { a, ba, bba, aba, baba, bbaba, ababa, bababa, bbababa, abababa, ... }
= { a(ba)^n, (ba)^m, b(ba)^m | n ∈ N, m ∈ N+ } True?
Chomsky Normal Form (CNF)
Definition. Productions have one of the following forms:
A → a (terminal)
A → BC (nonterminals)
S → Λ (forbid S on RHS; not needed if Λ is not in the language)
Advantages:
1. Parse trees are binary, so they are easy to represent.
2. Any string of length n > 0 can be derived in exactly 2n - 1 steps. (Thus one can determine whether a string is in the language by an exhaustive search of all derivations.)
Chomsky Normal Form (CNF)
Definition. Productions have one of the following forms:
A → a (terminal)
A → BC (nonterminals)
S → Λ (forbid S on RHS; not needed if Λ is not in the language)
Example:
S → AS | a
A → SA | b
Consider: a, ba, aba
S ⇒ a
S ⇒ AS ⇒ bS ⇒ ba
S ⇒ AS ⇒ SAS ⇒ aAS ⇒ abS ⇒ aba
Any string of length n > 0 can be derived in exactly 2n - 1 steps.
Input string: x x x x x x x x (n terminals, n ≥ 2)
Derivation: S ⇒ AB ⇒ CDB ⇒ EFDB ⇒ ... ⇒ STWXYZDB (n - 1 steps, producing n nonterminals)
Then replace each of the nonterminals with a terminal (n steps).
So, in total, (n - 1) + n = 2n - 1 steps.
4/18/2017
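The 2n - 1 count can be checked on small strings by an exhaustive search over leftmost derivations. The sketch below assumes the example CNF grammar S → AS | a, A → SA | b from the earlier slide; the encoding and the function name are hypothetical choices for this illustration.

```python
from collections import deque

# Assumed encoding of the example grammar S → AS | a, A → SA | b.
CNF = {"S": ["AS", "a"], "A": ["SA", "b"]}

def leftmost_derivation_lengths(grammar, start, target):
    """Return the step counts of all leftmost derivations of target."""
    lengths = set()
    queue = deque([(start, 0)])
    while queue:
        form, steps = queue.popleft()
        if form == target:
            lengths.add(steps)
            continue
        # CNF has no Λ-productions, so a sentential form never shrinks.
        if len(form) > len(target):
            continue
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            continue  # only terminals left, but not the target string
        if form[:idx] != target[:idx]:
            continue  # the terminal prefix already disagrees with the target
        for body in grammar[form[idx]]:
            queue.append((form[:idx] + body + form[idx + 1:], steps + 1))
    return lengths

for word in ["a", "ba", "aba"]:
    print(word, leftmost_derivation_lengths(CNF, "S", word), 2 * len(word) - 1)
```

The search terminates because every CNF step either lengthens the sentential form or turns one nonterminal into a terminal, so forms longer than the target can be discarded.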
Transform to Chomsky Normal Form (CFL contains Λ)
Algorithm: (no start symbol occurs on the right side of a production)
1. If the start symbol S occurs on some right side, create a new start symbol S′ and a new production S′ → S.
2. Remove A → Λ (if A ≠ S) by the previous algorithm. (If S → Λ is removed, add it back.)
3. Remove unit productions (i.e., A → B): if A → B or A ⇒+ B, then construct productions A → w, where B → w is not a unit production. Now remove all unit productions.
   E.g., if A → B and B → ac, then construct A → ac.
Transform to Chomsky Normal Form (CFL contains Λ)
Algorithm (cont.):
4. For each production whose right side has 2 or more symbols, replace all occurrences of each terminal a with a new nonterminal A and also add the new production A → a.
   E.g., if A → aB, then change it to A → CB and create C → a.
5. Replace each production B → C1 C2 ... Cn, where n > 2, with B → C1 D, where D → C2 ... Cn. Repeat this step until all right sides have length 2.
Transform to Chomsky Normal Form (CFL contains Λ)
Example. Construct a Chomsky normal form for:
S → aSb | D
D → Dc | Λ
Solution:
1. Add the production S′ → S:
   S′ → S
   S → aSb | D
   D → Dc | Λ
2. S′ → S | Λ
   S → aSb | ab | D
   D → Dc | c
   Why?
Transform to Chomsky Normal Form (CFL contains Λ)
Why?
Grammar after step 1:
   S′ → S
   S → aSb | D
   D → Dc | Λ
2-1. The nonterminals D and S derive Λ (and therefore so does S′, through S′ → S).
2-2. For D → Dc we construct D → c.
     For S → aSb we construct S → ab.
     Construct S′ → Λ.
2. S′ → S | Λ
   S → aSb | ab | D
   D → Dc | c
Transform to Chomsky Normal Form (CFL contains Λ)
Example. Construct a Chomsky normal form for:
S → aSb | D
D → Dc | Λ
Solution:
2. S′ → S | Λ
   S → aSb | ab | D
   D → Dc | c
Remove unit productions. How?
3. S′ → aSb | ab | Dc | c | Λ
   S → aSb | ab | Dc | c
   D → Dc | c
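Transform to Chomsky Normal Form (CFL contains Λ)
How?
2. S′ → S | Λ
   S → aSb | ab | D
   D → Dc | c
The unit productions are S′ → S and S → D.
Because S′ → S, the non-unit right sides of S give S′ → aSb and S′ → ab.
Because S′ ⇒+ D, the right sides of D give S′ → Dc and S′ → c.
Because S → D, the right sides of D give S → Dc and S → c.
3. S′ → aSb | ab | Dc | c | Λ
   S → aSb | ab | Dc | c
   D → Dc | c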
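The unit-production step can be sketched as a reachability computation. This is a minimal sketch under assumed conventions: right sides are strings of one-character symbols, "" stands for Λ, the key "S'" stands for the new start symbol S′, and the function name is hypothetical.

```python
def remove_unit_productions(grammar):
    """For each A, copy the non-unit right sides of every B with A ⇒* B."""
    nonterminals = set(grammar)
    result = {}
    for head in grammar:
        # Collect the nonterminals reachable from head by unit productions only.
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if body in nonterminals and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # Keep every right side that is not a single nonterminal.
        result[head] = {
            body
            for nt in reachable
            for body in grammar[nt]
            if body not in nonterminals
        }
    return result

# Grammar after step 2: S′ → S | Λ, S → aSb | ab | D, D → Dc | c
step2 = {"S'": ["S", ""], "S": ["aSb", "ab", "D"], "D": ["Dc", "c"]}
step3 = remove_unit_productions(step2)
for head, bodies in step3.items():
    print(head, "→", " | ".join(sorted(b or "Λ" for b in bodies)))
```

Computing the reachable set first handles chains such as S′ → S → D in one pass, which is why S′ also receives Dc and c.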
Transform to Chomsky Normal Form (CFL contains Λ)
How?
3. S′ → aSb | ab | Dc | c | Λ
   S → aSb | ab | Dc | c
   D → Dc | c
S → ab becomes S → AB, with A → a and B → b.
S → Dc becomes S → DC, with C → c.
4. S′ → ASB | AB | DC | c | Λ
   S → ASB | AB | DC | c
   D → DC | c
   A → a
   B → b
   C → c
Transform to Chomsky Normal Form (CFL contains Λ)
Example. Construct a Chomsky normal form for:
S → aSb | D
D → Dc | Λ
Solution (cont.):
4. S′ → ASB | AB | DC | c | Λ
   S → ASB | AB | DC | c
   D → DC | c
   A → a
   B → b
   C → c
5. Replace S′ → ASB and S → ASB with S′ → AE, S → AE, and E → SB.
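One way to sanity-check the outcome of step 5 is a small form checker. The sketch below assumes the grammar obtained by applying step 5 to the step 4 grammar; the tuple encoding (with () for Λ and "S'" for S′) and the name is_cnf are assumptions for this illustration. The start symbol is forbidden on right sides only when the start symbol has a Λ-production, which is how the definition slide is read here.

```python
def is_cnf(grammar, start):
    """Check that every production is A → a, A → BC, or start → Λ."""
    nonterminals = set(grammar)
    start_has_lambda = () in grammar[start]
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 0:
                if head != start:
                    return False  # only the start symbol may derive Λ directly
            elif len(body) == 1:
                if body[0] in nonterminals:
                    return False  # unit production
            elif len(body) == 2:
                if not all(sym in nonterminals for sym in body):
                    return False  # a terminal inside a two-symbol right side
                if start_has_lambda and start in body:
                    return False  # start symbol on a right side
            else:
                return False  # right side longer than 2
    return True

# Grammar after step 5 (assembled from the step 4 grammar and the replacement rule).
final = {
    "S'": [("A", "E"), ("A", "B"), ("D", "C"), ("c",), ()],
    "S": [("A", "E"), ("A", "B"), ("D", "C"), ("c",)],
    "D": [("D", "C"), ("c",)],
    "A": [("a",)],
    "B": [("b",)],
    "C": [("c",)],
    "E": [("S", "B")],
}
print(is_cnf(final, "S'"))
```

The same checker rejects the step 4 grammar, because S → ASB has a right side of length 3.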
Greibach Normal Form (GNF)
Definition. Productions have one of the following forms:
A → b (terminal)
A → b D1 ... Dk
S → Λ (if Λ is in the language)
Example:
S → aAS | a
A → bSA | b
Greibach Normal Form (GNF)
Definition. Productions have one of the following forms:
A → b (terminal)
A → b D1 ... Dk
S → Λ (if Λ is in the language)
Advantage: Any string of length n > 0 can be derived in n steps.
Greibach Normal Form (GNF)
Definition. Productions have one of the following forms:
A → b (terminal)
A → b D1 ... Dk
S → Λ (if Λ is in the language)
Example:
S → aAS | a
A → bSA | b
Consider: a, aba, ababa
S ⇒ a
S ⇒ aAS ⇒ abS ⇒ aba
S ⇒ aAS ⇒ abSAS ⇒ abaAS ⇒ ababS ⇒ ababa
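Because every GNF production begins with a terminal, a leftmost derivation consumes exactly one input symbol per step. The sketch below uses that fact to test membership for the example grammar S → aAS | a, A → bSA | b; the string encoding and the name gnf_accepts are assumptions for this illustration, and the Λ case is left out.

```python
# Assumed encoding of the example grammar S → aAS | a, A → bSA | b.
GNF = {"S": ["aAS", "a"], "A": ["bSA", "b"]}

def gnf_accepts(grammar, start, text):
    """Search leftmost derivations; each step consumes one terminal of text."""
    def search(pos, pending):
        # pending holds the nonterminals still to be expanded, leftmost first.
        if pos == len(text):
            return not pending
        # Every pending nonterminal must produce at least one terminal.
        if not pending or len(pending) > len(text) - pos:
            return False
        head, rest = pending[0], pending[1:]
        return any(
            body[0] == text[pos] and search(pos + 1, body[1:] + rest)
            for body in grammar[head]
        )
    return bool(text) and search(0, start)

for word in ["a", "aba", "ababa", "ab"]:
    print(word, gnf_accepts(GNF, "S", word))
```

The recursion depth equals the number of consumed symbols, so an accepted string of length n is derived in n steps.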
Transform CFG to Greibach Normal Form
Algorithm:
1. Perform steps 1, 2, 3 of the Chomsky algorithm.
2. Remove all left-recursion, including indirect, without adding Λ.
3. Make substitutions to transform the grammar into the proper form.
Transform CFG to Greibach Normal Form
Example. Construct a Greibach normal form for:
S → AB | Ac | d
A → aA | a
B → Ab | c
Solution: Steps 1 (Chomsky steps 1, 2, 3) and 2 are not needed. Why?
Chomsky:
1. Create S′ → S: no need
2. Remove A → Λ: none
3. Remove unit productions: none
Greibach:
2. Remove all left-recursion: none
Transform CFG to Greibach Normal Form
Example. Construct a Greibach normal form for:
S → AB | Ac | d
A → aA | a
B → Ab | c
Solution (cont.):
3. Replace A in S → AB | Ac | d with aA | a to obtain S → aAB | aB | aAc | ac | d.
   Replace A in B → Ab | c with the right side of A → aA | a to obtain B → aAb | ab | c.
   S → aAB | aB | aAc | ac | d
   B → aAb | ab | c
   A → aA | a
Example. Construct a Greibach normal form for:
S → AB | Ac | d
A → aA | a
B → Ab | c
Solution (cont.):
   S → aAB | aB | aAc | ac | d
   B → aAb | ab | c
   A → aA | a
Now add the new productions C → c and D → b and make the appropriate replacements to obtain the proper form:
   S → aAB | aB | aAC | aC | d
   A → aA | a
   B → aAD | aD | c
   C → c
   D → b
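The substitution step of this example can be replayed mechanically. The sketch below assumes one-character symbols and a target nonterminal whose own right sides already start with a terminal (true for A here); both function names are hypothetical.

```python
def substitute_leading(grammar, target):
    """Replace a leading occurrence of target by each of target's right sides."""
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            if body[0] == target:
                new_bodies.extend(t + body[1:] for t in grammar[target])
            else:
                new_bodies.append(body)
        result[head] = new_bodies
    return result

def lift_trailing_terminals(grammar, names):
    """Replace terminals after the first symbol by the nonterminals in names."""
    result = {
        head: [body[0] + "".join(names.get(s, s) for s in body[1:]) for body in bodies]
        for head, bodies in grammar.items()
    }
    for terminal, nonterminal in names.items():
        result[nonterminal] = [terminal]
    return result

# S → AB | Ac | d, A → aA | a, B → Ab | c
grammar = {"S": ["AB", "Ac", "d"], "A": ["aA", "a"], "B": ["Ab", "c"]}
step3 = substitute_leading(grammar, "A")
gnf = lift_trailing_terminals(step3, {"c": "C", "b": "D"})
for head, bodies in gnf.items():
    print(head, "→", " | ".join(bodies))
```

Only terminals after the first position are renamed, since a GNF right side must still begin with a terminal.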
Properties of Context-Free Languages
Knowing some properties of context-free languages can help us argue, by way of contradiction (BWOC), that certain languages are not context-free.
The Pumping Lemma intuition (1)
Recall the pumping lemma for regular languages. It told us that if there was a string long enough to cause a cycle in the DFA for the language, then we could pump the cycle and discover an infinite sequence of strings that had to be in the language:
xy^k z ∈ L for all k
The Pumping Lemma intuition (2)
If L is an infinite context-free language, then any grammar for L must be recursive, so there must be derivations of the following form, where u, v, w, x, and y are terminal strings:
S ⇒+ uNy
N ⇒+ vNx (recursive; v and x are not both Λ)
N ⇒+ w (terminating mechanism)
These derivations lead to derivations like
S ⇒+ uNy ⇒+ uvNxy ⇒+ uv^2 N x^2 y ⇒+ ... ⇒+ uv^k N x^k y ⇒+ uv^k w x^k y ∈ L
for all k ∈ N. This is the basis for the Pumping Lemma.
The Pumping Lemma
For every infinite context-free language L, there is an integer m > 0 such that if z ∈ L and |z| ≥ m, then z has the form z = uvwxy such that
1. |vx| ≥ 1
2. |vwx| ≤ m
3. uv^k w x^k y ∈ L for all k ∈ N.
(m depends on the grammar.)
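Condition 3 can be made concrete with a tiny helper that builds uv^k w x^k y. The language { a^n b^n | n ∈ N } and the particular decomposition below are assumptions chosen for illustration; they do not come from the slides.

```python
def pump(u, v, w, x, y, k):
    """Build the pumped string u v^k w x^k y."""
    return u + v * k + w + x * k + y

def in_anbn(s):
    """Membership test for the illustrative language { a^n b^n | n in N }."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * half + "b" * half

# z = aabb split as u = a, v = a, w = Λ, x = b, y = b
u, v, w, x, y = "a", "a", "", "b", "b"
pumped = [pump(u, v, w, x, y, k) for k in range(4)]
print(pumped)
print(all(in_anbn(s) for s in pumped))
```

Here v and x are pumped in lockstep, which is what lets a context-free language keep two counts balanced.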
The Pumping Lemma
1. |vx| ≥ 1: v and x cannot both be Λ.
2. |vx| ≤ |vwx|: w could be an empty string.
3. m = ?
4. Why is |vwx| ≤ m?
Start with a CNF grammar for L - {Λ}.
Let the grammar have n nonterminals.
Pick m = 2^n.
Proof of The Pumping Lemma
Start with a CNF grammar for L - {Λ}.
Let the grammar have n nonterminals.
Pick m = 2^n.
Let |z| > m.
We claim (Lemma 1) that a parse tree with yield z must have a path of length n+2 or more.
Proof of Lemma 1
If all paths in the parse tree of a CNF grammar are of length ≤ n+1, then the longest yield has length 2^(n-1), as in:
[Figure: full binary parse tree in which every path has n nonterminals followed by one terminal (length n+1); the yield consists of 2^(n-1) terminals.]
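The bound in Lemma 1 follows from a simple recurrence: a node expanded by A → a yields one terminal, and a node expanded by A → BC yields at most twice what a subtree with one fewer nonterminal on its longest path can yield. A minimal sketch (the function name is hypothetical):

```python
def max_yield(nonterminals_on_path):
    """Longest yield of a CNF parse tree whose paths have at most this many nonterminals."""
    if nonterminals_on_path == 1:
        return 1  # the only possible expansion is A → a
    # A → BC: each child subtree has one fewer nonterminal on its longest path.
    return 2 * max_yield(nonterminals_on_path - 1)

for n in range(1, 6):
    print(n, max_yield(n), 2 ** (n - 1))
```

Since 2^n exceeds 2^(n-1), a string longer than m = 2^n cannot fit in a tree whose paths all have length at most n+1.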
Back to the Proof of the Pumping Lemma
Now we know that the parse tree for z has a path with at least n+1 nonterminals (and one terminal).
Consider some longest path.
There are only n different nonterminals, so among the lowest n+1 we can find two nodes with the same label, say A.
The parse tree thus looks like:
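Parse Tree in The Pumping Lemma Proof
[Figure: parse tree rooted at S with yield z = uvwxy; the upper A has yield vwx and the lower A has yield w.]
v and x can't both be Λ.
|vwx| ≤ 2^n = m, because the height of the first A can be assumed to be at most n+2. Why?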
Back to the Proof of the Pumping Lemma
[Figure: parse tree rooted at S with its longest path marked; the legend distinguishes nonterminal nodes from terminal nodes, and the lowest n+1 nonterminals on the path are bracketed.]
Consider the lowest n+1 nonterminals in the longest path.
By the Pigeonhole Principle, two of these nonterminals must have the same label, say A.
As a node among the lowest n+1 nonterminals, the first A has height at most n+2.
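Pump Zero Times
[Figure: parse tree for uwy, the case k = 0.]
Pump Twice
[Figure: parse tree for uv^2 w x^2 y, the case k = 2.]
Pump Thrice, Etc., Etc.
[Figure: parse tree for uv^3 w x^3 y, the case k = 3.]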
Using The Pumping Lemma
Example. The language L = { a^n b^n c^(n+k) | k, n ∈ N } is not context-free.
Proof: Assume, BWOC, that L is context-free. L is infinite, so the pumping lemma applies. Choose z = a^m b^m c^m, where m is the positive integer from the lemma. Then z = a^m b^m c^m = uvwxy, where 1 ≤ |vx| ≤ |vwx| ≤ m and uv^k w x^k y ∈ L for all k ∈ N.
Example. The language L = { a^n b^n c^(n+k) | k, n ∈ N } is not context-free.
Proof (cont.): Observe that neither v nor x can contain distinct letters. For example, if v = a...b, then v^2 = a...ba...b, which cannot appear as a substring of any string in L. So v and x must be strings of repeated occurrences of a single letter.
Example. The language L = { a^n b^n c^(n+k) | k, n ∈ N } is not context-free.
Proof (cont.): Note that since |vwx| ≤ m, vwx cannot extend from a^m into c^m. Otherwise, as the slide's figure shows, vwx would have to contain at least one a, all of b^m, and at least one c, so its length would be at least m+2, a contradiction.
Example. The language L = { a^n b^n c^(n+k) | k, n ∈ N } is not context-free.
Proof (cont.): So we have only 5 situations to consider:
(1) v and x both occur in a^m.
(2) v and x occur in a^m and b^m, respectively.
(3) v and x both occur in b^m.
(4) v and x occur in b^m and c^m, respectively.
(5) v and x both occur in c^m.
(1) v and x both occur in a^m.
Let
u = a^p, p ≥ 0
v = a^i, i ≥ 0
w = a^q, q ≥ 0
x = a^j, j ≥ 0; i+j > 0
y = a^(m-p-i-q-j) b^m c^m
But then for k = 0, we have
uwy = uv^0 w x^0 y = a^(m-i-j) b^m c^m ∉ L,
a contradiction!
(2) v and x occur in a^m and b^m, respectively.
Let
u = a^(m-i-p)
v = a^i, i ≥ 0
w = a^p b^(m-j-q), p ≥ 0
x = b^j, j ≥ 0; i+j > 0
y = b^q c^m, q ≥ 0
Then
uv^2 w x^2 y = a^(m-i-p) a^(2i) a^p b^(m-j-q) b^(2j) b^q c^m = a^(m+i) b^(m+j) c^m ∉ L,
a contradiction!
Example. The language L = { a^n b^n c^(n+k) | k, n ∈ N } is not context-free.
Proof (cont.): The other three cases can be proved similarly.
(3) v and x both occur in b^m.
(4) v and x occur in b^m and c^m, respectively.
(5) v and x both occur in c^m.
In each case, we get a contradiction again. Hence, L is not a context-free language.
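The case analysis can be cross-checked by brute force for small values of m. The sketch below enumerates every split z = uvwxy of z = a^m b^m c^m with |vx| ≥ 1 and |vwx| ≤ m and records the splits for which both k = 0 and k = 2 stay inside L. It only illustrates the argument for a few small m (the true pumping constant is not known here) and does not replace the proof; the function names are hypothetical.

```python
import re

def in_language(s):
    """Membership in L = { a^n b^n c^(n+k) | k, n in N }."""
    match = re.fullmatch(r"(a*)(b*)(c*)", s)
    if match is None:
        return False
    a, b, c = (len(group) for group in match.groups())
    return a == b and c >= a

def surviving_decompositions(m):
    """Splits of a^m b^m c^m that survive pumping with k = 0 and k = 2."""
    z = "a" * m + "b" * m + "c" * m
    n = len(z)
    survivors = []
    for i in range(n + 1):                              # u = z[:i]
        for j in range(i, n + 1):                       # v = z[i:j]
            for p in range(j, n + 1):                   # w = z[j:p]
                for q in range(p, min(n, i + m) + 1):   # x = z[p:q], |vwx| <= m
                    u, v, w, x, y = z[:i], z[i:j], z[j:p], z[p:q], z[q:]
                    if not v and not x:
                        continue                        # need |vx| >= 1
                    if all(in_language(u + v * k + w + x * k + y) for k in (0, 2)):
                        survivors.append((u, v, w, x, y))
    return survivors

for m in range(1, 5):
    print(m, len(surviving_decompositions(m)))
```

An empty result for a given m means that no admissible split of a^m b^m c^m can be pumped both down and up, which matches the contradiction reached in each of the five cases.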
End of Context-Free Language and Pushdown Automata V
* Information from several Stanford InfoLab slides is used here.