Context-Free Grammars 2IT70 Finite Automata and Process Theory Technische Universiteit Eindhoven May 18, 2016
Generating strings
  language L1 = {a^n b^n | n > 0}
    ab ∈ L1
    if w ∈ L1 then awb ∈ L1
    production rules S → ab and S → aSb
  language L2 = (01)*
    ε ∈ L2
    if w ∈ L2 then 01w ∈ L2
    production rules S → ε and S → 01S
Variables, terminals, production rules, start symbol
  palindromes over {a, b}
    S → ε    S → a    S → b    S → aSa    S → bSb
    alternative notation: S → ε | a | b | aSa | bSb
  binary integer expressions
    E → I    E → N    E → E + E    E → E * E    E → (E)
    I → a    I → I0   I → I1
    N → 1    N → N0   N → N1
    alternative notation:
      E → I | N | E + E | E * E | (E)
      I → a | I0 | I1
      N → 1 | N0 | N1
Clicker question L81
  Consider again the grammar given by
    E → I | N | E + E | E * E | (E)
    I → a | I0 | I1
    N → 1 | N0 | N1
  How many of the strings aa1, a01, 011, 11a, a01+a01, (a11*a10), +101, (110) do you expect cannot be generated by the grammar?
  A. Two strings
  B. Three strings
  C. Four strings
  D. Six strings
  E. Can't tell
Language of a CFG
  context-free grammar G = (V, T, R, S)
    V variables and T terminals
    R ⊆ V × (V ∪ T)* production rules A → α
    S ∈ V start symbol
  productions ⇒_G ⊆ (V ∪ T)* × (V ∪ T)*
    γ ⇒_G γ′ if γ = β1 A β2, A → α rule of G, γ′ = β1 α β2
  production sequences γ0 ⇒_G γ1 ⇒_G ··· ⇒_G γn
  language of a variable L_G(A) = {w ∈ T* | A ⇒*_G w}
  language of the grammar L(G) = L_G(S) = {w ∈ T* | S ⇒*_G w}
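
The definitions above can be tried out on the grammar S → ab | aSb. The following is a minimal sketch, not part of the slides: the dictionary encoding of the rules and the names `step` and `language` are assumptions made for illustration. The search bounds the length of sentential forms, which is only complete for grammars without ε-rules.

```python
from collections import deque

# Assumed encoding: each variable maps to its right-hand sides (tuples of symbols).
# Symbols that are not keys of the dictionary are terminals.
RULES = {"S": [("a", "b"), ("a", "S", "b")]}


def step(form, rules):
    """All sentential forms reachable from `form` by one production step."""
    result = []
    for i, sym in enumerate(form):
        if sym in rules:
            for rhs in rules[sym]:
                # gamma = beta1 A beta2 becomes beta1 alpha beta2
                result.append(form[:i] + rhs + form[i + 1:])
    return result


def language(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    Assumes the grammar has no epsilon-rules, so sentential forms never
    shrink and forms longer than max_len can be discarded.
    """
    seen = {(start,)}
    queue = deque(seen)
    words = set()
    while queue:
        form = queue.popleft()
        if all(sym not in rules for sym in form):
            words.add("".join(form))
            continue
        for nxt in step(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


print(sorted(language(RULES, "S", 6), key=len))
```

For this grammar every sentential form contains at most one variable, so the breadth-first search visits each production sequence exactly once.
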
More examples
  expression ::= term | expression + term
  term ::= factor | term * factor
  factor ::= identifier | ( expression )
  identifier ::= a | b | c | ...

  char ::= a | ... | z | A | ... | Z | ...
  text ::= ε | char text
  doc ::= ε | element doc
  element ::= text | <EM> doc </EM> | <P> doc | <OL> list </OL>
  listitem ::= <LI> doc
  list ::= ε | listitem list
Combining and splitting productions
  lemma CFG G = (V, T, R, S)
    if X1 ⇒^n1_G γ1, ..., Xk ⇒^nk_G γk
      then X1···Xk ⇒^n_G γ1···γk where n = n1 + ··· + nk
    if X1···Xk ⇒^n_G γ
      then X1 ⇒^n1_G γ1, ..., Xk ⇒^nk_G γk for suitable n1, ..., nk and γ1, ..., γk
      where n = n1 + ··· + nk and γ = γ1···γk
    X1, ..., Xk ∈ (V ∪ T), γ1, ..., γk ∈ (V ∪ T)*
The parentheses language L_()
  CFG S → ε | SS | (S)
  several production sequences for string ()(())
    rightmost: S ⇒_G SS ⇒_G S(S) ⇒_G S((S)) ⇒_G S(()) ⇒_G (S)(()) ⇒_G ()(())
    leftmost:  S ⇒_G SS ⇒_G (S)S ⇒_G ()S ⇒_G ()(S) ⇒_G ()((S)) ⇒_G ()(())
    mixed:     S ⇒_G SS ⇒_G (S)S ⇒_G (S)(S) ⇒_G ()(S) ⇒_G ()((S)) ⇒_G ()(())
  leftmost, rightmost, mixed production sequence
Clicker question L82
  Given the CFG S → SS | (S) | ().
  How many production sequences are there for the string (())((()))?
  A. (())((())) has 5 possible production sequences
  B. (())((())) has 6 possible production sequences
  C. (())((())) has 10 possible production sequences
  D. (())((())) has 12 possible production sequences
  E. Can't tell
Proving a grammar correct
  CFG G with production rules S → ab and S → aSb
  for L = {a^n b^n | n ≥ 1} it holds that L(G) = L
  proof
    induction on n: if S ⇒^n_G w then w ∈ L, thus L(G) ⊆ L
    induction on n: if w = a^n b^n then w ∈ L(G), thus L ⊆ L(G)
Avoiding the inductive proofs
  lemma CFGs G1 = (V1, T1, R1, S1) and G2 = (V2, T2, R2, S2)
    moreover V1 and V2 disjoint, S a fresh variable
    define CFG G = ({S} ∪ V1 ∪ V2, T1 ∪ T2, R, S)
    if R = {S → S1 | S2} ∪ R1 ∪ R2 then L(G) = L(G1) ∪ L(G2)
    if R = {S → S1 S2} ∪ R1 ∪ R2 then L(G) = L(G1) · L(G2)
    if R = {S → ε | S1 S} ∪ R1 then L(G) = L(G1)*
Avoiding the inductive proofs (cont.)
  CFG G with production rules
    S → S1 | S2
    S1 → aB    B → ε | bB
    S2 → bA    A → ε | aA
  then L(G) = {ab^n, ba^m | n, m ≥ 0}
  proof use the lemma
    L_G(A) = {a^m | m ≥ 0} and L_G(B) = {b^n | n ≥ 0}
    L_G(S1) = {a} · {b^n | n ≥ 0} and L_G(S2) = {b} · {a^m | m ≥ 0}
    L(G) = {ab^n | n ≥ 0} ∪ {ba^m | m ≥ 0}
Context-free languages
  language L is context-free if L = L(G) for some CFG G
  {a^n b^n | n ≥ 0} and {ww^R | w ∈ {0,1}*} are context-free
  theorem if L is regular then L is context-free
  proof
    for DFA D = (Q, Σ, δ, q0, F) with L(D) = L put G = (Q, Σ, R, q0) where
      R = {q → aq′ | δ(q,a) = q′} ∪ {q → ε | q ∈ F}
    then L = L(G)
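
The construction in the proof translates directly into code. The sketch below is illustrative only: the function names `dfa_to_cfg` and `derives`, the dictionary encoding, and the example DFA for (01)* with a sink state `dead` are assumptions, not taken from the slides.

```python
def dfa_to_cfg(states, delta, q0, final):
    """Build R = {q -> a q' | delta(q, a) = q'} plus {q -> epsilon | q in F}.

    `delta` maps (state, symbol) pairs to states. The result uses the
    states as variables and q0 as the start symbol.
    """
    rules = {q: [] for q in states}
    for (q, a), q_next in delta.items():
        rules[q].append((a, q_next))
    for q in final:
        rules[q].append(())          # q -> epsilon for accepting states
    return rules, q0


def derives(rules, var, word):
    """Check var =>* word for a grammar produced by dfa_to_cfg."""
    if not word:
        return () in rules[var]
    return any(
        len(rhs) == 2 and rhs[0] == word[0] and derives(rules, rhs[1], word[1:])
        for rhs in rules[var]
    )


# Hypothetical example DFA accepting (01)*, with sink state "dead".
STATES = {"q0", "q1", "dead"}
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "dead",
    ("q1", "1"): "q0", ("q1", "0"): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
RULES, START = dfa_to_cfg(STATES, DELTA, "q0", {"q0"})

print(derives(RULES, START, "0101"))
```

Each production step consumes one input symbol, mirroring one transition of the DFA, and the rule q → ε can only close the sequence in an accepting state.
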
Chomsky Normal Form
Useless symbols
  CFG G = (V, T, R, S)
  symbol X ∈ V ∪ T is generating if X ⇒*_G w for some w ∈ T*
  symbol X ∈ V ∪ T is reachable if ∃α, β: S ⇒*_G αXβ
  symbol X ∈ V ∪ T is useful if both generating and reachable
Clicker question L83
  Consider the grammar
    S → AB | c
    A → a
    C → c
  Which of the following statements about variables holds true?
  A. 2 variables generating, 2 reachable, 1 useful
  B. 2 variables generating, 2 reachable, 2 useful
  C. 3 variables generating, 3 reachable, 2 useful
  D. 3 variables generating, 3 reachable, 3 useful
  E. Can't tell
Finding generating variables
  CFG G = (V, T, R, S) with L(G) ≠ ∅
  symbol X ∈ V ∪ T is generating if X ⇒*_G w for some w ∈ T*, and Gen(G) = {X | X generating}
  lemma put
    Gen_0 = T
    Gen_{i+1} = Gen_i ∪ {A | A → α rule of G, α ∈ Gen_i*}
    Gen = ∪_{i≥0} Gen_i
    then Gen(G) = Gen
  theorem CFG G′ = (V′, T, R′, S) with
    V′ = V ∩ Gen(G) and R′ = {A → α ∈ R | A ∈ Gen, α ∈ Gen*}
    then L(G) = L(G′) and all symbols of G′ generating
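
The iteration Gen_0, Gen_1, ... can be sketched as a fixpoint loop. The encoding of rules as a dictionary, the function names, and the example grammar (S → aA | B, A → a, B → bB) are assumptions chosen for illustration. The loop updates one set in place instead of building each Gen_i separately; it reaches the same limit Gen.

```python
def generating(rules, terminals):
    """Least set containing T and every A with a rule A -> alpha, alpha in Gen*."""
    gen = set(terminals)                     # Gen_0 = T
    changed = True
    while changed:
        changed = False
        for var, alternatives in rules.items():
            if var in gen:
                continue
            # A is generating if some right-hand side uses generating symbols only;
            # all(...) is True for the empty right-hand side (epsilon).
            if any(all(sym in gen for sym in rhs) for rhs in alternatives):
                gen.add(var)
                changed = True
    return gen


def restrict_to_generating(rules, terminals):
    """R' = {A -> alpha in R | A in Gen, alpha in Gen*}."""
    gen = generating(rules, terminals)
    return {
        var: [rhs for rhs in alternatives if all(sym in gen for sym in rhs)]
        for var, alternatives in rules.items()
        if var in gen
    }


# Hypothetical example: B can never be rewritten to a terminal string.
RULES = {"S": [("a", "A"), ("B",)], "A": [("a",)], "B": [("b", "B")]}

print(sorted(generating(RULES, {"a", "b"})))
print(restrict_to_generating(RULES, {"a", "b"}))
```
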
Finding reachable variables
  CFG G = (V, T, R, S) with L(G) ≠ ∅
  symbol X ∈ V ∪ T is reachable if S ⇒*_G αXβ, and Reach(G) = {X | X reachable}
  lemma put
    Reach_0 = {S}
    Reach_{i+1} = Reach_i ∪ {X | A → γ rule of G, A ∈ Reach_i, γ = αXβ}
    Reach = ∪_{i≥0} Reach_i
    then Reach(G) = Reach
  theorem CFG G′ = (V′, T, R′, S) with
    V′ = V ∩ Reach(G) and R′ = {A → α ∈ R | A ∈ Reach}
    then L(G) = L(G′) and all variables of G′ reachable
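
Reach can be computed in the same spirit. The sketch below uses a worklist instead of the round-by-round sets Reach_i, which is a variant of the lemma's iteration with the same result. The encoding, the function names and the example grammar (S → aA, A → a, C → b) are assumptions for illustration.

```python
def reachable(rules, start):
    """All symbols X with start =>* alpha X beta."""
    reach = {start}                          # Reach_0 = {S}
    todo = [start]
    while todo:
        var = todo.pop()
        for rhs in rules.get(var, []):       # terminals have no rules
            for sym in rhs:
                if sym not in reach:
                    reach.add(sym)
                    todo.append(sym)
    return reach


def restrict_to_reachable(rules, start):
    """R' = {A -> alpha in R | A in Reach}."""
    reach = reachable(rules, start)
    return {var: alts for var, alts in rules.items() if var in reach}


# Hypothetical example: C does not occur in any production sequence from S.
RULES = {"S": [("a", "A")], "A": [("a",)], "C": [("b",)]}

print(sorted(reachable(RULES, "S")))
print(restrict_to_reachable(RULES, "S"))
```
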
Parse Trees
Identifying production sequences
  parentheses grammar S → ε | SS | (S)
  several production sequences for string ()(())
    S ⇒_G SS ⇒_G (S)S ⇒_G ()S ⇒_G ()(S) ⇒_G ()((S)) ⇒_G ()(())
    S ⇒_G SS ⇒_G (S)S ⇒_G (S)(S) ⇒_G ()(S) ⇒_G ()((S)) ⇒_G ()(())
  swapping independent productions
Identifying production sequences (cont.)
  [Figure: the parse tree for ()(()) is built node by node, following the first production sequence.]
Identifying production sequences (once more)
  [Figure: the same parse tree for ()(()) is built node by node, following the second production sequence.]
Yield of a parse tree
  CFG G = (V, T, R, S)
  set PT_G of all parse trees of G
    [X] single node tree, X ∈ V ∪ T
    [A → ε] two node tree, root A, leaf ε, for rule A → ε ∈ R
    [A → PT1, PT2, ..., PTk] for rule A → X1···Xk ∈ R and parse trees PTi with root Xi
  yield function yield : PT_G → (V ∪ T)*
    yield([X]) = X
    yield([A → ε]) = ε
    yield([A → PT1, ..., PTk]) = yield(PT1) ··· yield(PTk)
  parse tree PT is complete if yield(PT) ∈ T*
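
The three kinds of parse trees and the yield function map naturally onto a recursive function. In this sketch the encoding of a tree as a pair `(label, children)` is an assumption: `children` is `None` for a single node tree [X], the empty list for [A → ε], and a list of subtrees otherwise. The names `yield_of` and `is_complete` are hypothetical.

```python
def yield_of(tree):
    """Concatenate the leaves of a parse tree from left to right."""
    label, children = tree
    if children is None:          # [X]: yield([X]) = X
        return [label]
    if not children:              # [A -> epsilon]: yield is the empty string
        return []
    result = []
    for child in children:        # yield(PT1) ... yield(PTk)
        result += yield_of(child)
    return result


def is_complete(tree, terminals):
    """A parse tree is complete if its yield consists of terminals only."""
    return all(sym in terminals for sym in yield_of(tree))


def leaf(symbol):
    return (symbol, None)


EPS = ("S", [])                                   # [S -> epsilon]
INNER = ("S", [leaf("("), EPS, leaf(")")])        # tree with yield ()

# Parse tree for ()(()) in the grammar S -> epsilon | SS | (S).
TREE = ("S", [INNER, ("S", [leaf("("), INNER, leaf(")")])])

# An incomplete tree: the middle S is a single node, not expanded further.
PARTIAL = ("S", [leaf("("), leaf("S"), leaf(")")])

print("".join(yield_of(TREE)))
```
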
A parse tree with yield ()(())
  S
  +- S
  |  +- (
  |  +- S
  |  |  +- ε
  |  +- )
  +- S
     +- (
     +- S
     |  +- (
     |  +- S
     |  |  +- ε
     |  +- )
     +- )
Another parse tree
  CFG
    S → AB
    A → ε | aaA
    B → ε | Bb
  parse tree with yield aab
    S
    +- A
    |  +- a
    |  +- a
    |  +- A
    |     +- ε
    +- B
       +- B
       |  +- ε
       +- b
Parsing
  CFG G with rules S → ε | aSb | bSa | SS
  aabb ∈ L(G)?
  start: root S with unknown yield w ∈ {a,b}*
  round 1: apply each rule to the root S
    tree 1    S → ε      yield ε
    tree 2    S → aSb    yield awb
    tree 3    S → bSa    yield bwa
    tree 4    S → SS     yield w1 w2
Parsing
  CFG G with rules S → ε | aSb | bSa | SS
  aabb ∈ L(G)?
  round 2: apply each rule to the inner S of tree 2
    tree 2.1    S → ε      yield ab
    tree 2.2    S → aSb    yield aawbb
    tree 2.3    S → bSa    yield abwab
    tree 2.4    S → SS     yield a w1 w2 b
Parsing
  CFG G with rules S → ε | aSb | bSa | SS
  aabb ∈ L(G)?
  round 3: apply each rule to the inner S of tree 2.2
    tree 2.2.1    S → ε      yield aabb
    tree 2.2.2    S → aSb    yield aaawbbb
    tree 2.2.3    S → bSa    yield aabwabb
    tree 2.2.4    S → SS     yield aa w1 w2 bb
  Thus aabb ∈ L(G)
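
The round-by-round search can be mimicked by a breadth-first search over leftmost production sequences. This is a sketch under stated assumptions, not the procedure of the slides verbatim: sentential forms stand in for the partially built trees, a form is discarded when its terminal prefix disagrees with the input, and a cap `slack` on the length of forms (a hypothetical parameter) keeps the search finite. With the cap, a result of `True` is always justified by a real production sequence, while `False` only means no sequence was found within the cap.

```python
from collections import deque

# Assumed encoding of S -> epsilon | aSb | bSa | SS.
RULES = {"S": [(), ("a", "S", "b"), ("b", "S", "a"), ("S", "S")]}


def leftmost_variable(form, rules):
    """Index of the leftmost variable in `form`, or len(form) if there is none."""
    return next((k for k, sym in enumerate(form) if sym in rules), len(form))


def member(word, rules, start="S", slack=2):
    limit = len(word) + slack                # assumed cap on sentential form length
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        i = leftmost_variable(form, rules)
        if i == len(form):                   # only terminals left
            if "".join(form) == word:
                return True
            continue
        for rhs in rules[form[i]]:
            nxt = form[:i] + rhs + form[i + 1:]
            j = leftmost_variable(nxt, rules)
            terminals = sum(sym not in rules for sym in nxt)
            if (len(nxt) <= limit
                    and terminals <= len(word)
                    and "".join(nxt[:j]) == word[:j]   # prefix must match the input
                    and nxt not in seen):
                seen.add(nxt)
                queue.append(nxt)
    return False


print(member("aabb", RULES))
```

For aabb the search finds S ⇒ aSb ⇒ aaSbb ⇒ aabb, matching trees 2, 2.2 and 2.2.1 of the three rounds.
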
Clicker question L91
  parsing takes at most 2|w| - 1 rounds if
    no summand is ε
    summands have at least one terminal or at least two variables
  With the parsing procedure and restrictions above, we have that
  A. Parsing is linear in the length of the string
  B. Parsing is quadratic in the length of the string
  C. Parsing is exponential in the length of the string
  D. Can't tell
Generated strings of terminals vs. yields of parse trees
  theorem CFG G = (V, T, R, S)
    A ⇒*_G w implies w = yield(PT) for a parse tree PT with root A
  proof
    by induction on n: A ⇒^n_G w implies ∃PT ∈ PT_G(A): w = yield(PT)
    for all A ∈ V and w ∈ T*
  thus
    L(G) = {w ∈ T* | S ⇒*_G w} ⊆ {yield(PT) | PT complete parse tree of G, root S}
Clicker question L92
  Suppose X ⇒*_l β for a CFG G, where ⇒_l denotes a leftmost production of G.
  Then it holds that
  A. αXγ ⇒*_l αβγ for all α ∈ T* and γ ∈ T*
  B. αXγ ⇒*_l αβγ for all α ∈ T* and γ ∈ (V ∪ T)*
  C. αXγ ⇒*_l αβγ for all α ∈ (V ∪ T)* and γ ∈ T*
  D. αXγ ⇒*_l αβγ for all α ∈ (V ∪ T)* and γ ∈ (V ∪ T)*
  E. Can't tell
From parse tree to leftmost production sequence
  theorem CFG G
    for a parse tree PT with root A and yield w: A ⇒*_l w
  proof
    induction on the height of the parse tree PT
  thus
    {yield(PT) | PT complete parse tree of G, root S} ⊆ {w ∈ T* | S ⇒*_G w} = L(G)
Different parse trees (harmless)
  ambiguous grammar S → ε | aSb | SS
  [Figure: two different parse trees, both with yield aabb.]
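Different parse trees (harmful)
  ambiguous grammar
    E → I | E + E | E * E | ( E )
    I → a | b | c
  two parse trees with yield a*b+c
    tree with * at the root       tree with + at the root
    E                             E
    +- E                          +- E
    |  +- I                       |  +- E
    |     +- a                    |  |  +- I
    +- *                          |  |     +- a
    +- E                          |  +- *
       +- E                       |  +- E
       |  +- I                    |     +- I
       |     +- b                 |        +- b
       +- +                       +- +
       +- E                       +- E
          +- I                       +- I
             +- c                       +- c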
Different parse trees (harmful, cont.)
  ambiguous grammar
    E → I | E + E | E * E | ( E )
    I → a | b | c
  evaluating a*b+c with a = 2, b = 3, c = 4, i.e. 2*3+4
    tree with * at the root: 2 * (3 + 4) = 2 * 7 = 14    wrong
    tree with + at the root: (2 * 3) + 4 = 6 + 4 = 10    right
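
The two results can be reproduced by evaluating both trees bottom-up. In this sketch the trees are reduced to operator trees, encoded as nested tuples `(operator, left, right)`; this encoding and the name `evaluate` are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate an operator tree bottom-up; leaves are integers."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


# a = 2, b = 3, c = 4
STAR_AT_ROOT = ("*", 2, ("+", 3, 4))     # groups 2*3+4 as 2 * (3 + 4)
PLUS_AT_ROOT = ("+", ("*", 2, 3), 4)     # groups 2*3+4 as (2 * 3) + 4

print(evaluate(STAR_AT_ROOT), evaluate(PLUS_AT_ROOT))   # prints "14 10"
```

The same string therefore gets two different values, which is why this ambiguity is harmful.
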
Disambiguation
  E → T | E + T
  T → F | T * F
  F → I | ( E )
  I → a | b | c
  syntactic categories: expression, term, factor, identifier
  parse tree with yield a*b+c
    E
    +- E
    |  +- T
    |     +- T
    |     |  +- F
    |     |     +- I
    |     |        +- a
    |     +- *
    |     +- F
    |        +- I
    |           +- b
    +- +
    +- T
       +- F
          +- I
             +- c
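
A small parser shows how the layered grammar fixes the grouping. This is a sketch, not a technique from the slides: it is a hand-written recursive-descent parser, and because the rules E → E + T and T → T * F are left-recursive, it replaces them by loops (E as T followed by any number of "+ T", likewise for T). It returns operator trees as nested tuples, an assumed encoding, and the function names are hypothetical.

```python
def parse(text):
    """Parse a string over a, b, c, +, *, ( and ) into an operator tree."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def eat(ch):
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def expression():                 # E -> T | E + T, written as T (+ T)*
        tree = term()
        while peek() == "+":
            eat("+")
            tree = ("+", tree, term())
        return tree

    def term():                       # T -> F | T * F, written as F (* F)*
        tree = factor()
        while peek() == "*":
            eat("*")
            tree = ("*", tree, factor())
        return tree

    def factor():                     # F -> I | ( E ),  I -> a | b | c
        ch = peek()
        if ch == "(":
            eat("(")
            tree = expression()
            eat(")")
            return tree
        if ch is not None and ch in "abc":
            eat(ch)
            return ch
        raise SyntaxError(f"unexpected input at position {pos}")

    tree = expression()
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return tree


print(parse("a*b+c"))
```

For a*b+c the only result is the tree with + at the root, so multiplication binds tighter than addition.
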