Context-Free Grammars (and Languages): Lecture 7
Today
Beyond regular expressions: Context-Free Grammars (CFGs)
What is a CFG?
What is the language associated with a CFG?
Creating CFGs.
Reasoning about CFGs.
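Compiler Frontend
Source text: for (i=0; i<n; i++) { a++; }
Lexical Analyzer (rules encoded as regular expressions) produces the token stream: for ( id = number ; id < id ; id ++ ) { id ++ ; }
Parser (rules cannot be encoded as regular expressions) produces the parse tree.
[Figure: parse tree. stmt → for-stmt; for-stmt → for ( expr ; expr ; expr ) stmt; the three expr nodes expand to "lval = rval" (id = number), "id < id", and "id ++"; the body is stmt → { stmt }, with inner stmt → expr ; and expr → id ++.]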
Biological Models
[Figure; source: en.wikipedia.org/wiki/l-system]
Biological Models
Rule: [figure: two alternative rewriting rules, joined by "or"]
Grammar: Rewriting rules for generating a set of strings (i.e., a language) from a seed
Context-Free Grammar
Example: a (simplistic) syntax for arithmetic expressions
expr → expr + expr
expr → expr × expr
expr → a | b | c
e.g., expr ⇒* a + b × c (read: expr derives a + b × c)
[Figure: parse tree. expr → expr + expr, where the left expr yields a and the right expr → expr × expr yields b × c.]
(This grammar is ambiguous, since there is another parse tree for the same string.)
Context-Free Grammar
Example: a (simplistic) syntax for arithmetic expressions
G = (Σ, V, P, S)
Σ = {a, b, c, +, ×} (terminals)
V = {expr} (non-terminals)
P = {(A, α) | A → α} (production rules)
S = expr (start symbol)
Short-hand for the production rules: expr → expr + expr | expr × expr | a | b | c
e.g., expr ⇒* a + b × c (read: expr derives a + b × c)
Context-Free Grammar: Arrows
Production Rule: A → π, where A ∈ V and π ∈ (Σ ∪ V)*.
Immediately Derives: α_1 ⇒ α_2 if α_1, α_2 ∈ (Σ ∪ V)* and there are β, γ ∈ (Σ ∪ V)* and a rule A → π s.t. α_1 = βAγ and α_2 = βπγ.
Derives: α ⇒* α′ if there exist α_1, …, α_{t+1} ∈ (Σ ∪ V)* s.t. α_1 = α, α_{t+1} = α′, and for all i ∈ [1, t], α_i ⇒ α_{i+1}.
t-step derivation: α ⇒^t α′.
More clearly, if the grammar is G, we write α ⇒_G* α′.
Example, with expr → expr + expr | expr × expr | a | b | c:
expr ⇒ expr + expr ⇒ expr + expr × expr ⇒* a + b × c, hence expr ⇒* a + b × c.
Context-Free Languages
The language generated by a grammar G with start symbol S and alphabet Σ is
L(G) = { w ∈ Σ* | S ⇒_G* w }
Languages generated by context-free grammars are called Context-Free Languages (CFLs).
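To see the definition of L(G) at work, one can enumerate the short strings of a small grammar. The sketch below is an illustration under stated assumptions: the dictionary encoding and the name `language_up_to` are hypothetical, `*` stands for the multiplication terminal of the arithmetic example, and the length-based pruning is only valid because no rule in this grammar has an empty right-hand side.

```python
from collections import deque

# Hypothetical encoding of the arithmetic-expression grammar.
GRAMMAR = {
    "expr": [("expr", "+", "expr"), ("expr", "*", "expr"), ("a",), ("b",), ("c",)],
}


def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Explores leftmost derivations breadth-first. Assumes no rule has an
    empty right-hand side, so a form longer than max_len can be dropped.
    """
    seen = {(start,)}
    queue = deque([(start,)])
    words = set()
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            words.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            nxt = form[:idx] + rhs + form[idx + 1:]
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return words


words = language_up_to(GRAMMAR, "expr", 3)
print(sorted(words, key=lambda w: (len(w), w)))
```

Restricting the search to leftmost derivations loses no strings, since every derivable string also has a leftmost derivation.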
Examples
Over Σ = { 0, 1 }, give a grammar for each of the following languages:
L = { 0^n 1^n | n ≥ 0 }
  S → ε | 0S1
L = { w | w = w^R }
  S → ε | 0 | 1 | 0S0 | 1S1
L = { 0^m 1^n | m ≠ n }
  S → A | B
  Z → ε | 0Z1      // 0^n 1^n
  A → 0Z | 0A      // 0^m 1^n with m > n
  B → Z1 | B1      // 0^m 1^n with m < n
L = { 0^m 1^n | m < n }
  Z → ε | 0Z1      // 0^n 1^n
  S → Z1 | S1      // 0^m 1^n with m < n
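Example grammars like these can be sanity-checked by brute force: compare the grammar against the set definition on every string up to some length. The sketch below is one way to do that, not the lecture's method. The recognizer `generates` is a simple fixed-point computation over substrings (an assumption chosen because it tolerates ε-rules and left recursion), and all names are hypothetical.

```python
from itertools import product


def generates(grammar, start, w):
    """True if `start` derives the string `w` (terminals are single characters)."""
    n = len(w)
    spans = {var: set() for var in grammar}  # (i, j) such that var =>* w[i:j]
    changed = True
    while changed:  # repeat until no new span is discovered
        changed = False
        for var, rules in grammar.items():
            for rhs in rules:
                for i in range(n + 1):
                    ends = {i}  # positions reachable after matching a prefix of rhs
                    for sym in rhs:
                        if sym in grammar:
                            ends = {j for e in ends for (s, j) in spans[sym] if s == e}
                        else:
                            ends = {e + 1 for e in ends if e < n and w[e] == sym}
                    for j in ends:
                        if (i, j) not in spans[var]:
                            spans[var].add((i, j))
                            changed = True
    return (0, n) in spans[start]


Z_RULES = [(), ("0", "Z", "1")]
EXAMPLES = [
    ("0^n 1^n", {"S": [(), ("0", "S", "1")]},
     lambda w: w == "0" * (len(w) // 2) + "1" * (len(w) // 2)),
    ("palindromes", {"S": [(), ("0",), ("1",), ("0", "S", "0"), ("1", "S", "1")]},
     lambda w: w == w[::-1]),
    ("m != n", {"S": [("A",), ("B",)], "Z": Z_RULES,
                "A": [("0", "Z"), ("0", "A")], "B": [("Z", "1"), ("B", "1")]},
     lambda w: "10" not in w and w.count("0") != w.count("1")),
    ("m < n", {"Z": Z_RULES, "S": [("Z", "1"), ("S", "1")]},
     lambda w: "10" not in w and w.count("0") < w.count("1")),
]


def mismatches(max_len):
    """Strings of length <= max_len on which a grammar and its predicate disagree."""
    bad = []
    for name, grammar, in_language in EXAMPLES:
        for n in range(max_len + 1):
            for w in map("".join, product("01", repeat=n)):
                if generates(grammar, "S", w) != in_language(w):
                    bad.append((name, w))
    return bad


print(mismatches(6))
```

An empty result is evidence, not proof: it only rules out counterexamples up to the chosen length.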
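Parse Tree
A parse tree captures the structure of the derivations of a given string (but not the exact order of the steps).
e.g., expr ⇒* a + b × c can be derived as
expr ⇒* expr + expr × expr ⇒* a + b × c
or as
expr ⇒* a + expr ⇒* a + expr × c ⇒* a + b × c
[Figure: the parse tree shared by both derivations. expr → expr + expr, where the left expr yields a and the right expr → expr × expr yields b × c.]
The exact order of derivation steps is not important, but the structure is important!
Ambiguous grammar: a grammar in which some string has two different parse trees.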
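Ambiguity
expr → expr + expr | expr × expr | a | b | c
expr ⇒* a + b × c has two parse trees:
[Figure, tree 1: expr → expr + expr, where the left expr yields a and the right expr → expr × expr yields b × c.]
[Figure, tree 2: expr → expr × expr, where the left expr → expr + expr yields a + b and the right expr yields c.]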
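Ambiguity can be checked mechanically for a given string by counting its parse trees. The following is a minimal sketch under stated assumptions: the encoding and the name `count_parse_trees` are hypothetical, `*` stands for the multiplication terminal, and the recursion only terminates because this grammar has no ε-rules and no chains of unit rules that lead back to the same variable.

```python
from functools import lru_cache

# Hypothetical encoding of the ambiguous arithmetic-expression grammar.
GRAMMAR = {
    "expr": [("expr", "+", "expr"), ("expr", "*", "expr"), ("a",), ("b",), ("c",)],
}


def count_parse_trees(grammar, start, w):
    """Number of distinct parse trees for `w` rooted at `start`.

    Assumes every symbol derives at least one character (no epsilon rules)
    and that no variable can derive itself through unit rules.
    """

    @lru_cache(maxsize=None)
    def count_seq(symbols, i, j):
        # number of ways the symbol sequence derives w[i:j]
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        if head not in grammar:  # terminal: must match the next character
            return count_seq(rest, i + 1, j) if i < j and w[i] == head else 0
        total = 0
        # head covers w[i:k]; each remaining symbol needs at least one character
        for k in range(i + 1, j - len(rest) + 1):
            left = count_var(head, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    @lru_cache(maxsize=None)
    def count_var(var, i, j):
        return sum(count_seq(rhs, i, j) for rhs in grammar[var])

    return count_var(start, 0, len(w))


print(count_parse_trees(GRAMMAR, "expr", "a+b*c"))
```

For a + b * c the count is 2, matching the two trees above: one with + at the root and one with * at the root.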
An Unambiguous Grammar
expr → term + expr | term
term → term × a | term × b | term × c | a | b | c
expr ⇒* a + b × c
[Figure: the unique parse tree. expr → term + expr, where the left term yields a, and the right expr → term, with that term yielding b × c.]
Operator precedence is enforced by requiring all × to be carried out (to get a "term") before any +.
In practice, unambiguous grammars are important (e.g., in compilers).
There are CFLs which do not have any unambiguous grammar: inherently ambiguous languages.
Examples
L = L(0*)
  S → ε | 0 | SS : Ambiguous!
  S → ε | 0S : Unambiguous
L = set of all strings with balanced parentheses
  S → ε | (S) | SS : Ambiguous!
  T → (S)
  S → ε | TS : Unambiguous
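Parse trees correspond one-to-one with leftmost derivations, so the two grammars for L(0*) can be compared by counting leftmost derivations of the same string. The sketch below is illustrative only: the encoding and the name `count_leftmost` are assumptions, and the step bound is needed because the ambiguous grammar (with S → SS and S → ε) has unboundedly long derivations of every string.

```python
AMBIGUOUS = {"S": [(), ("0",), ("S", "S")]}    # S -> epsilon | 0 | SS
UNAMBIGUOUS = {"S": [(), ("0", "S")]}          # S -> epsilon | 0S


def count_leftmost(form, target, steps, grammar):
    """Count leftmost derivations of `target` from `form` using at most `steps` steps."""
    # position of the leftmost variable, if any
    idx = next((i for i, s in enumerate(form) if s in grammar), None)
    if idx is None:
        return 1 if "".join(form) == target else 0
    terminal_count = sum(1 for s in form if s not in grammar)
    prefix = "".join(form[:idx])
    # give up when out of steps, or when the form can no longer yield the target
    if steps == 0 or terminal_count > len(target) or not target.startswith(prefix):
        return 0
    return sum(
        count_leftmost(form[:idx] + rhs + form[idx + 1:], target, steps - 1, grammar)
        for rhs in grammar[form[idx]]
    )


print(count_leftmost(("S",), "00", 10, UNAMBIGUOUS))
print(count_leftmost(("S",), "00", 5, AMBIGUOUS))
```

The second grammar yields exactly one derivation of 00 however large the bound, while the first already yields more than one within 5 steps.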
Examples
L = set of all valid regular expressions over {0, 1}
An ambiguous grammar (start symbol S, Σ = {Ø, e, 0, 1, +, *, (, )}):
  S → Ø | e | 0 | 1 | (S) | S* | SS | S+S
An unambiguous grammar for a subset of regular expressions:
  S → Ø | e | 0 | 1 | (S) | (S*) | (SS) | (S+S)
Exercise: Give an unambiguous grammar for all valid regular expressions.
Proving Correctness of Grammars
Claim: Let L = { w | #_0(w) = #_1(w) }. Then L(G) = L, where the productions of G are:
  S → 0S1 | 1S0 | SS | ε
Challenge: Give an unambiguous grammar.
Proof: We need to prove both L(G) ⊆ L and L(G) ⊇ L.
Prove L(G) ⊆ L by induction on the length of derivations (or the height of parse trees).
Prove L(G) ⊇ L by induction on the length of strings.
Proving Correctness of Grammars
Claim: Let L = { w | #_0(w) = #_1(w) }. Then L(G) = L, where the productions of G are:
  S → 0S1 | 1S0 | SS | ε
Proof: Proving L(G) ⊆ L by induction on the length of derivations.
Let w ∈ L(G). Then S ⇒^t w for some t ≥ 1. We use induction on t to show that w ∈ L.
Base case: t = 1. The only string derived is ε, and ε ∈ L.
Induction step: Consider t > 1. Suppose every u s.t. S ⇒^k u with k < t is in L. Let w be such that S ⇒^t w, i.e., S ⇒ α_1 ⇒^{t-1} w.
Case α_1 = 0S1: w = 0u1 and S ⇒^{t-1} u. By IH, #_0(u) = #_1(u). Hence #_0(w) = #_0(u) + 1 = #_1(u) + 1 = #_1(w). (Case α_1 = 1S0 is symmetric.)
Case α_1 = SS: w = uv with S ⇒^m u and S ⇒^n v, where 1 ≤ m, n < t (m + n = t − 1). By IH, #_0(u) = #_1(u) and #_0(v) = #_1(v). Hence #_0(w) = #_0(u) + #_0(v) = #_1(u) + #_1(v) = #_1(w).
Proving Correctness of Grammars
Claim: Let L = { w | #_0(w) = #_1(w) }. Then L(G) = L, where the productions of G are:
  S → 0S1 | 1S0 | SS | ε
Proof: Proving L ⊆ L(G) by induction on the length of strings.
Suppose w ∈ L. We show by induction on |w| that w ∈ L(G).
Base cases: |w| = 0: ε ∈ L(G). There is no string with |w| = 1 in L.
Induction step: Let n ≥ 2. Suppose u ∈ L(G) for all u ∈ L with |u| < n. Let w ∈ L be such that |w| = n; in particular, #_0(w) = #_1(w).
Case w = 0u1: Then u ∈ L and |u| < n. By IH, u ∈ L(G), i.e., S ⇒* u. Hence S ⇒ 0S1 ⇒* 0u1 = w. (Case w = 1u0 is symmetric.)
Case w = 0u0: Let d_i = #_0(i-long prefix of w) − #_1(i-long prefix of w). Then d_1 = 1, d_n = 0, and d_{n-1} = −1. Since d_i changes by exactly 1 at each step, there is an m with 1 < m < n−1 s.t. d_m = 0. That is, w = xy where |x|, |y| < |w| and x, y ∈ L. By IH, x, y ∈ L(G). Hence S ⇒ SS ⇒* xy = w. (Case w = 1u1 is symmetric.)
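The induction on string length is constructive: it says how to build a parse tree for any string with equally many 0s and 1s. The sketch below follows the two cases of the argument directly. It is an illustration, not part of the lecture; the nested-tuple tree representation and the names `build_tree`, `tree_yield` and `uses_only_rules` are assumptions.

```python
from itertools import product


def build_tree(w):
    """Parse tree for w under S -> 0S1 | 1S0 | SS | epsilon.

    A tree is a tuple ("S", child, ...), where each child is a terminal
    character or another tree; ("S",) stands for S -> epsilon.
    """
    if w.count("0") != w.count("1"):
        raise ValueError("w does not have equally many 0s and 1s")
    if w == "":
        return ("S",)
    if w[0] != w[-1]:
        # w = 0u1 or w = 1u0: apply S -> 0S1 or S -> 1S0 and recurse on u
        return ("S", w[0], build_tree(w[1:-1]), w[-1])
    # w = 0u0 or w = 1u1: find a proper prefix with balanced counts (d_m = 0)
    d = 0
    for m in range(1, len(w)):
        d += 1 if w[m - 1] == "0" else -1
        if d == 0:
            return ("S", build_tree(w[:m]), build_tree(w[m:]))
    raise AssertionError("no split point found; this would contradict the argument")


def tree_yield(tree):
    """Concatenate the terminal leaves of a tree from left to right."""
    return "".join(tree_yield(c) if isinstance(c, tuple) else c for c in tree[1:])


def uses_only_rules(tree):
    """True if every node of the tree applies one of the four productions."""
    shape = tuple("S" if isinstance(c, tuple) else c for c in tree[1:])
    allowed = {(), ("0", "S", "1"), ("1", "S", "0"), ("S", "S")}
    return shape in allowed and all(
        uses_only_rules(c) for c in tree[1:] if isinstance(c, tuple)
    )


print(build_tree("0110"))
```

For 0110 the first and last characters agree, so the string is split at the first balanced prefix, 01, and each half is handled by the S → 0S1 or S → 1S0 case.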
Proving Correctness of Grammars
Often we will need to strengthen the claim to cover the strings generated by every variable in the grammar.
Claim: Let L = { w | #_0(w) = #_1(w) }. Then L(G) = L, where the productions of G are:
  S → AB | BA | ε
  A → 0 | AS | SA
  B → 1 | BS | SB
Stronger Claim:
A derives exactly the strings w s.t. #_0(w) = #_1(w) + 1.
B derives exactly the strings w s.t. #_1(w) = #_0(w) + 1.
S derives exactly the strings w s.t. #_0(w) = #_1(w).
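A per-variable claim of this kind is easy to test on short strings before attempting the proof. The sketch below computes, for each string, the set of variables that derive it and compares that set with what the stronger claim predicts. It is a brute-force check under assumed names (`variables_deriving`, `expected`, `claim_holds_up_to`), and the fixed-point recognizer is one possible choice, not the lecture's.

```python
from itertools import product

GRAMMAR = {
    "S": [("A", "B"), ("B", "A"), ()],
    "A": [("0",), ("A", "S"), ("S", "A")],
    "B": [("1",), ("B", "S"), ("S", "B")],
}


def variables_deriving(grammar, w):
    """Set of variables that derive the whole string `w`."""
    n = len(w)
    spans = {var: set() for var in grammar}  # (i, j) such that var =>* w[i:j]
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for var, rules in grammar.items():
            for rhs in rules:
                for i in range(n + 1):
                    ends = {i}
                    for sym in rhs:
                        if sym in grammar:
                            ends = {j for e in ends for (s, j) in spans[sym] if s == e}
                        else:
                            ends = {e + 1 for e in ends if e < n and w[e] == sym}
                    for j in ends:
                        if (i, j) not in spans[var]:
                            spans[var].add((i, j))
                            changed = True
    return {var for var in grammar if (0, n) in spans[var]}


def expected(w):
    """Variables that should derive w according to the stronger claim."""
    diff = w.count("0") - w.count("1")
    return {0: {"S"}, 1: {"A"}, -1: {"B"}}.get(diff, set())


def claim_holds_up_to(max_len):
    return all(
        variables_deriving(GRAMMAR, w) == expected(w)
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
    )


print(claim_holds_up_to(6))
```

Checking all three variables at once mirrors the reason for strengthening the claim: the induction for S needs facts about A and B, and vice versa.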
Closure Properties for CFL
Union: If L1 and L2 are CFLs, so is L1 ∪ L2.
Let G1 = (Σ, V1, P1, S1) and G2 = (Σ, V2, P2, S2) with V1 ∩ V2 = Ø. Let G = (Σ, V, P, S), where S is a new variable, V = V1 ∪ V2 ∪ {S}, and P = P1 ∪ P2 ∪ { S → S1 | S2 }. Then L(G) = L(G1) ∪ L(G2).
Concatenation: If L1 and L2 are CFLs, so is L1 L2.
Let G1 = (Σ, V1, P1, S1) and G2 = (Σ, V2, P2, S2) with V1 ∩ V2 = Ø. Let G = (Σ, V, P, S), where S is a new variable, V = V1 ∪ V2 ∪ {S}, and P = P1 ∪ P2 ∪ { S → S1 S2 }. Then L(G) = L(G1) L(G2).
Kleene Star: If L1 is a CFL, so is L1*.
Let G1 = (Σ, V1, P1, S1). Let G = (Σ, V, P, S), where S is a new variable, V = V1 ∪ {S}, and P = P1 ∪ { S → ε | S S1 }. Then L(G) = L(G1)*.
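The three constructions translate almost literally into code that builds a new rule set from existing ones. The following is a minimal sketch under assumptions: grammars are dictionaries from variables to lists of right-hand sides, the function names are hypothetical, the example grammars `G1` and `G2` are made up for the demonstration, and the recognizer `generates` is a simple fixed-point check included only to try the results.

```python
def union(g1, s1, g2, s2, s="S"):
    """Rules for L(G1) ∪ L(G2); assumes disjoint variable sets and a fresh s."""
    assert not (g1.keys() & g2.keys()) and s not in g1 and s not in g2
    return {**g1, **g2, s: [(s1,), (s2,)]}


def concatenation(g1, s1, g2, s2, s="S"):
    """Rules for L(G1) L(G2); assumes disjoint variable sets and a fresh s."""
    assert not (g1.keys() & g2.keys()) and s not in g1 and s not in g2
    return {**g1, **g2, s: [(s1, s2)]}


def star(g1, s1, s="S"):
    """Rules for L(G1)*; assumes s is a fresh variable."""
    assert s not in g1
    return {**g1, s: [(), (s, s1)]}


def generates(grammar, start, w):
    """True if `start` derives `w` (fixed-point computation over substrings)."""
    n = len(w)
    spans = {var: set() for var in grammar}
    changed = True
    while changed:
        changed = False
        for var, rules in grammar.items():
            for rhs in rules:
                for i in range(n + 1):
                    ends = {i}
                    for sym in rhs:
                        if sym in grammar:
                            ends = {j for e in ends for (s, j) in spans[sym] if s == e}
                        else:
                            ends = {e + 1 for e in ends if e < n and w[e] == sym}
                    for j in ends:
                        if (i, j) not in spans[var]:
                            spans[var].add((i, j))
                            changed = True
    return (0, n) in spans[start]


G1 = {"X": [(), ("0", "X", "1")]}   # example: { 0^n 1^n | n >= 0 }
G2 = {"Y": [("1",), ("1", "Y")]}    # example: { 1^n | n >= 1 }

print(generates(union(G1, "X", G2, "Y"), "S", "111"))
print(generates(concatenation(G1, "X", G2, "Y"), "S", "00111"))
print(generates(star(G1, "X"), "S", "010011"))
```

The assertions at the top of each constructor encode the side conditions of the construction: the variable sets must be disjoint and the start symbol must be new.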
Closure Properties for CFL
CFLs are not closed under intersection or complement.
Intersection: L1 = { 0^i 1^j 0^k | i = j } and L2 = { 0^i 1^j 0^k | j = k } are CFLs. But it turns out that L1 ∩ L2 = { 0^i 1^j 0^k | i = j = k } is not a CFL!
Complement: If CFLs were closed under complementation then, since they are already closed under union, they would also be closed under intersection!
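The complement argument rests on De Morgan's law, which expresses an intersection using only union and complement. Written out, with the overline denoting complement with respect to Σ* (a notation assumed here):

```latex
L_1 \cap L_2 \;=\; \overline{\,\overline{L_1} \cup \overline{L_2}\,}
```

If complementation preserved context-freeness, the right-hand side would be a CFL whenever L1 and L2 are, contradicting the intersection example.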
Grammars
Rewriting rules for generating strings from a seed.
In an unrestricted grammar, the rules are of the form α → β, where α, β ∈ (Σ ∪ V)*.
Context-Free Grammar: Rewriting rules apply to individual variables (with no "context").
[Figure: nested classes of languages, from outermost to innermost: All languages; Languages with algorithms / unrestricted grammars; Context-Free Languages; Regular Languages.]