Algorithms & Models of Computation CS/ECE 374, Fall 2017 Context Free Languages and Grammars Lecture 7 Tuesday, September 19, 2017 Sariel Har-Peled (UIUC) CS374 1 Fall 2017 1 / 36
What stack got to do with it? What s a stack but a second hand memory? 1 DFA/NFA/Regular expressions. constant memory computation. 2 Turing machines DFA/NFA + unbounded memory. a standard computer/program. Sariel Har-Peled (UIUC) CS374 2 Fall 2017 2 / 36
What stack got to do with it? What s a stack but a second hand memory? 1 DFA/NFA/Regular expressions. constant memory computation. 2 NFA + stack context free grammars (CFG). 3 Turing machines DFA/NFA + unbounded memory. a standard computer/program. Sariel Har-Peled (UIUC) CS374 2 Fall 2017 2 / 36
What stack got to do with it? What s a stack but a second hand memory? 1 DFA/NFA/Regular expressions. constant memory computation. 2 NFA + stack context free grammars (CFG). 3 Turing machines DFA/NFA + unbounded memory. a standard computer/program. NFA with two stacks. Sariel Har-Peled (UIUC) CS374 2 Fall 2017 2 / 36
Context Free Languages and Grammars Programming Language Specification Parsing Natural language understanding Generative model giving structure... Sariel Har-Peled (UIUC) CS374 3 Fall 2017 3 / 36
Programming Languages Sariel Har-Peled (UIUC) CS374 4 Fall 2017 4 / 36
Natural Language Processing Sariel Har-Peled (UIUC) CS374 5 Fall 2017 5 / 36
Models of Growth L-systems http://www.kevs3d.co.uk/dev/lsystems/ Sariel Har-Peled (UIUC) CS374 6 Fall 2017 6 / 36
Kolam drawing generated by grammar Sariel Har-Peled (UIUC) CS374 7 Fall 2017 7 / 36
Context Free Grammar (CFG) Definition Definition A CFG is a quadruple G = (V, T, P, S) V is a finite set of non-terminal symbols T is a finite set of terminal symbols (alphabet) P is a finite set of productions, each of the form A α where A V and α is a string in (V T ). Formally, P V (V T ). S V is a start symbol ( G = Variables, Terminals, Productions, Start var ) Sariel Har-Peled (UIUC) CS374 8 Fall 2017 8 / 36
Context Free Grammar (CFG) Definition Definition A CFG is a quadruple G = (V, T, P, S) V is a finite set of non-terminal symbols T is a finite set of terminal symbols (alphabet) P is a finite set of productions, each of the form A α where A V and α is a string in (V T ). Formally, P V (V T ). S V is a start symbol ( G = Variables, Terminals, Productions, Start var ) Sariel Har-Peled (UIUC) CS374 8 Fall 2017 8 / 36
Context Free Grammar (CFG) Definition Definition A CFG is a quadruple G = (V, T, P, S) V is a finite set of non-terminal symbols T is a finite set of terminal symbols (alphabet) P is a finite set of productions, each of the form A α where A V and α is a string in (V T ). Formally, P V (V T ). S V is a start symbol ( G = Variables, Terminals, Productions, Start var ) Sariel Har-Peled (UIUC) CS374 8 Fall 2017 8 / 36
Context Free Grammar (CFG) Definition Definition A CFG is a quadruple G = (V, T, P, S) V is a finite set of non-terminal symbols T is a finite set of terminal symbols (alphabet) P is a finite set of productions, each of the form A α where A V and α is a string in (V T ). Formally, P V (V T ). S V is a start symbol ( G = Variables, Terminals, Productions, Start var ) Sariel Har-Peled (UIUC) CS374 8 Fall 2017 8 / 36
Example V = {S} T = {a, b} P = {S ɛ a b asa bsb} (abbrev. for S ɛ, S a, S b, S asa, S bsb) S asa absba abbsbba abb b bba What strings can S generate like this? Sariel Har-Peled (UIUC) CS374 9 Fall 2017 9 / 36
Example V = {S} T = {a, b} P = {S ɛ a b asa bsb} (abbrev. for S ɛ, S a, S b, S asa, S bsb) S asa absba abbsbba abb b bba What strings can S generate like this? Sariel Har-Peled (UIUC) CS374 9 Fall 2017 9 / 36
Example V = {S} T = {a, b} P = {S ɛ a b asa bsb} (abbrev. for S ɛ, S a, S b, S asa, S bsb) S asa absba abbsbba abb b bba What strings can S generate like this? Sariel Har-Peled (UIUC) CS374 9 Fall 2017 9 / 36
Example formally... V = {S} T = {a, b} P = {S ɛ a b asa bsb} (abbrev. for S ɛ, S a, S b, S asa, S bsb) G = {S}, {a, b}, S ɛ, S a, S b S asa S bsb S Sariel Har-Peled (UIUC) CS374 10 Fall 2017 10 / 36
Palindromes Madam in Eden I m Adam Dog doo? Good God! Dogma: I am God. A man, a plan, a canal, Panama Are we not drawn onward, we few, drawn onward to new era? Doc, note: I dissent. A fast never prevents a fatness. I diet on cod. http://www.palindromelist.net Sariel Har-Peled (UIUC) CS374 11 Fall 2017 11 / 36
Examples L = {0 n 1 n n 0} S ɛ 0S1 Sariel Har-Peled (UIUC) CS374 12 Fall 2017 12 / 36
Examples L = {0 n 1 n n 0} S ɛ 0S1 Sariel Har-Peled (UIUC) CS374 12 Fall 2017 12 / 36
Notation and Convention Let G = (V, T, P, S) then a, b, c, d,..., in T (terminals) A, B, C, D,..., in V (non-terminals) u, v, w, x, y,... in T for strings of terminals α, β, γ,... in (V T ) X, Y, X in V T Sariel Har-Peled (UIUC) CS374 13 Fall 2017 13 / 36
Derives relation Formalism for how strings are derived/generated Definition Let G = (V, T, P, S) be a CFG. For strings α 1, α 2 (V T ) we say α 1 derives α 2 denoted by α 1 G α 2 if there exist strings β, γ, δ in (V T ) such that α 1 = βaδ α 2 = βγδ A γ is in P. Examples: S ɛ, S 0S1, 0S1 00S11, 0S1 01. Sariel Har-Peled (UIUC) CS374 14 Fall 2017 14 / 36
Derives relation continued Definition For integer k 0, α 1 k α 2 inductive defined: α 1 0 α 2 if α 1 = α 2 α 1 k α 2 if α 1 β 1 and β 1 k 1 α 2. Alternative definition: α 1 k α 2 if α 1 k 1 β 1 and β 1 α 2 is the reflexive and transitive closure of. α 1 α 2 if α 1 k α 2 for some k. Examples: S ɛ, 0S1 0000011111. Sariel Har-Peled (UIUC) CS374 15 Fall 2017 15 / 36
Derives relation continued Definition For integer k 0, α 1 k α 2 inductive defined: α 1 0 α 2 if α 1 = α 2 α 1 k α 2 if α 1 β 1 and β 1 k 1 α 2. Alternative definition: α 1 k α 2 if α 1 k 1 β 1 and β 1 α 2 is the reflexive and transitive closure of. α 1 α 2 if α 1 k α 2 for some k. Examples: S ɛ, 0S1 0000011111. Sariel Har-Peled (UIUC) CS374 15 Fall 2017 15 / 36
Derives relation continued Definition For integer k 0, α 1 k α 2 inductive defined: α 1 0 α 2 if α 1 = α 2 α 1 k α 2 if α 1 β 1 and β 1 k 1 α 2. Alternative definition: α 1 k α 2 if α 1 k 1 β 1 and β 1 α 2 is the reflexive and transitive closure of. α 1 α 2 if α 1 k α 2 for some k. Examples: S ɛ, 0S1 0000011111. Sariel Har-Peled (UIUC) CS374 15 Fall 2017 15 / 36
Context Free Languages Definition The language generated by CFG G = (V, T, P, S) is denoted by L(G) where L(G) = {w T S w}. Definition A language L is context free (CFL) if it is generated by a context free grammar. That is, there is a CFG G such that L = L(G). Sariel Har-Peled (UIUC) CS374 16 Fall 2017 16 / 36
Context Free Languages Definition The language generated by CFG G = (V, T, P, S) is denoted by L(G) where L(G) = {w T S w}. Definition A language L is context free (CFL) if it is generated by a context free grammar. That is, there is a CFG G such that L = L(G). Sariel Har-Peled (UIUC) CS374 16 Fall 2017 16 / 36
Example L = {0 n 1 n n 0} S ɛ 0S1 L = {0 n 1 m m > n} L = {w { (, ) } } w is properly nested string of parenthesis Sariel Har-Peled (UIUC) CS374 17 Fall 2017 17 / 36
Closure Properties of CFLs G 1 = (V 1, T, P 1, S 1 ) and G 2 = (V 2, T, P 2, S 2 ) Assumption: V 1 V 2 =, that is, non-terminals are not shared Theorem CFLs are closed under union. L 1, L 2 CFLs implies L 1 L 2 is a CFL. Theorem CFLs are closed under concatenation. L 1, L 2 CFLs implies L 1 L 2 is a CFL. Theorem CFLs are closed under Kleene star. If L is a CFL = L is a CFL. Sariel Har-Peled (UIUC) CS374 18 Fall 2017 18 / 36
Closure Properties of CFLs G 1 = (V 1, T, P 1, S 1 ) and G 2 = (V 2, T, P 2, S 2 ) Assumption: V 1 V 2 =, that is, non-terminals are not shared Theorem CFLs are closed under union. L 1, L 2 CFLs implies L 1 L 2 is a CFL. Theorem CFLs are closed under concatenation. L 1, L 2 CFLs implies L 1 L 2 is a CFL. Theorem CFLs are closed under Kleene star. If L is a CFL = L is a CFL. Sariel Har-Peled (UIUC) CS374 18 Fall 2017 18 / 36
Closure Properties of CFLs Union G 1 = (V 1, T, P 1, S 1 ) and G 2 = (V 2, T, P 2, S 2 ) Assumption: V 1 V 2 =, that is, non-terminals are not shared. Theorem CFLs are closed under union. L 1, L 2 CFLs implies L 1 L 2 is a CFL. Sariel Har-Peled (UIUC) CS374 19 Fall 2017 19 / 36
Closure Properties of CFLs Concatenation Theorem CFLs are closed under concatenation. L 1, L 2 CFLs implies L 1 L 2 is a CFL. Sariel Har-Peled (UIUC) CS374 20 Fall 2017 20 / 36
Closure Properties of CFLs Stardom (i.e, Kleene star) Theorem CFLs are closed under Kleene star. If L is a CFL = L is a CFL. Sariel Har-Peled (UIUC) CS374 21 Fall 2017 21 / 36
Exercise Prove that every regular language is context-free using previous closure properties. Prove the set of regular expressions over an alphabet Σ forms a non-regular language which is context-free. Sariel Har-Peled (UIUC) CS374 22 Fall 2017 22 / 36
Closure Properties of CFLs continued Theorem CFLs are not closed under complement or intersection. Theorem If L 1 is a CFL and L 2 is regular then L 1 L 2 is a CFL. Sariel Har-Peled (UIUC) CS374 23 Fall 2017 23 / 36
Canonical non-cfl Theorem L = {a n b n c n n 0} is not context-free. Proof based on pumping lemma for CFLs. Technical and outside the scope of this class. Sariel Har-Peled (UIUC) CS374 24 Fall 2017 24 / 36
Parse Trees or Derivation Trees A tree to represent the derivation S w. Rooted tree with root labeled S Non-terminals at each internal node of tree Terminals at leaves Children of internal node indicate how non-terminal was expanded using a production rule A picture is worth a thousand words Sariel Har-Peled (UIUC) CS374 25 Fall 2017 25 / 36
Parse Trees or Derivation Trees A tree to represent the derivation S w. Rooted tree with root labeled S Non-terminals at each internal node of tree Terminals at leaves Children of internal node indicate how non-terminal was expanded using a production rule A picture is worth a thousand words Sariel Har-Peled (UIUC) CS374 25 Fall 2017 25 / 36
Example S A derivation tree for abbaab (also called parse tree ) a S b b S a S à asb bsa SS ab ba ε S S b a ε A corresponding derivation of abbaab S è asb è absab è abssab è abbasab è abbaab Sariel Har-Peled (UIUC) CS374 26 Fall 2017 26 / 36
Ambiguity in CFLs Definition A CFG G is ambiguous if there is a string w L(G) with two different parse trees. If there is no such string then G is unambiguous. Example: S S S 1 2 3 S S S S S S 3 S S S S 1 2 1 3 2 3 (2 1) (3 2) 1 Sariel Har-Peled (UIUC) CS374 27 Fall 2017 27 / 36
Ambiguity in CFLs Original grammar: S S S 1 2 3 Unambiguous grammar: S S C 1 2 3 C 1 2 3 S S C S C 1 3 2 (3 2) 1 The grammar forces a parse corresponding to left-to-right evaluation. Sariel Har-Peled (UIUC) CS374 28 Fall 2017 28 / 36
Inherently ambiguous languages Definition A CFL L is inherently ambiguous if there is no unambiguous CFG G such that L = L(G). There exist inherently ambiguous CFLs. Example: L = {a n b m c k n = m or m = k} Given a grammar G it is undecidable to check whether L(G) is inherently ambiguous. No algorithm! Sariel Har-Peled (UIUC) CS374 29 Fall 2017 29 / 36
Inherently ambiguous languages Definition A CFL L is inherently ambiguous if there is no unambiguous CFG G such that L = L(G). There exist inherently ambiguous CFLs. Example: L = {a n b m c k n = m or m = k} Given a grammar G it is undecidable to check whether L(G) is inherently ambiguous. No algorithm! Sariel Har-Peled (UIUC) CS374 29 Fall 2017 29 / 36
Inherently ambiguous languages Definition A CFL L is inherently ambiguous if there is no unambiguous CFG G such that L = L(G). There exist inherently ambiguous CFLs. Example: L = {a n b m c k n = m or m = k} Given a grammar G it is undecidable to check whether L(G) is inherently ambiguous. No algorithm! Sariel Har-Peled (UIUC) CS374 29 Fall 2017 29 / 36
Inductive proofs for CFGs Question: How do we formally prove that a CFG L(G) = L? Example: S ɛ a b asa bsb Theorem L(G) = {palindromes} = {w w = w R } Two directions: L(G) L, that is, S w then w = w R L L(G), that is, w = w R then S w Sariel Har-Peled (UIUC) CS374 30 Fall 2017 30 / 36
Inductive proofs for CFGs Question: How do we formally prove that a CFG L(G) = L? Example: S ɛ a b asa bsb Theorem L(G) = {palindromes} = {w w = w R } Two directions: L(G) L, that is, S w then w = w R L L(G), that is, w = w R then S w Sariel Har-Peled (UIUC) CS374 30 Fall 2017 30 / 36
L(G) L Show that if S w then w = w R By induction on length of derivation, meaning For all k 1, S k w implies w = w R. If S 1 w then w = ɛ or w = a or w = b. Each case w = w R. Assume that for all k < n, that if S k w then w = w R Let S n w (with n > 1). Wlog w begin with a. Then S asa k 1 aua where w = aua. And S n 1 u and hence IH, u = u R. Therefore w r = (aua) R = (ua) R a = au R a = aua = w. Sariel Har-Peled (UIUC) CS374 31 Fall 2017 31 / 36
L(G) L Show that if S w then w = w R By induction on length of derivation, meaning For all k 1, S k w implies w = w R. If S 1 w then w = ɛ or w = a or w = b. Each case w = w R. Assume that for all k < n, that if S k w then w = w R Let S n w (with n > 1). Wlog w begin with a. Then S asa k 1 aua where w = aua. And S n 1 u and hence IH, u = u R. Therefore w r = (aua) R = (ua) R a = au R a = aua = w. Sariel Har-Peled (UIUC) CS374 31 Fall 2017 31 / 36
L L(G) Show that if w = w R then S w. By induction on w That is, for all k 0, w = k and w = w R implies S w. Exercise: Fill in proof. Sariel Har-Peled (UIUC) CS374 32 Fall 2017 32 / 36
Mutual Induction Situation is more complicated with grammars that have multiple non-terminals. See Section 5.3.2 of the notes for an example proof. Sariel Har-Peled (UIUC) CS374 33 Fall 2017 33 / 36
Normal Forms Normal forms are a way to restrict form of production rules Advantage: Simpler/more convenient algorithms and proofs Two standard normal forms for CFGs Chomsky normal form Greibach normal form Sariel Har-Peled (UIUC) CS374 34 Fall 2017 34 / 36
Normal Forms Normal forms are a way to restrict form of production rules Advantage: Simpler/more convenient algorithms and proofs Two standard normal forms for CFGs Chomsky normal form Greibach normal form Sariel Har-Peled (UIUC) CS374 34 Fall 2017 34 / 36
Normal Forms Chomsky Normal Form: Productions are all of the form A BC or A a. If ɛ L then S ɛ is also allowed. Every CFG G can be converted into CNF form via an efficient algorithm Advantage: parse tree of constant degree. Greibach Normal Form: Only productions of the form A aβ are allowed. All CFLs without ɛ have a grammar in GNF. Efficient algorithm. Advantage: Every derivation adds exactly one terminal. Sariel Har-Peled (UIUC) CS374 35 Fall 2017 35 / 36
Normal Forms Chomsky Normal Form: Productions are all of the form A BC or A a. If ɛ L then S ɛ is also allowed. Every CFG G can be converted into CNF form via an efficient algorithm Advantage: parse tree of constant degree. Greibach Normal Form: Only productions of the form A aβ are allowed. All CFLs without ɛ have a grammar in GNF. Efficient algorithm. Advantage: Every derivation adds exactly one terminal. Sariel Har-Peled (UIUC) CS374 35 Fall 2017 35 / 36
Things to know: Pushdown Automata PDA: a NFA coupled with a stack PDAs and CFGs are equivalent: both generate exactly CFLs. PDA is a machine-centric view of CFLs. Sariel Har-Peled (UIUC) CS374 36 Fall 2017 36 / 36