Chapter 5: Context-Free Languages

Peter Cappello
Department of Computer Science
University of California, Santa Barbara
Santa Barbara, CA 93106
cappello@cs.ucsb.edu

Please read the corresponding chapter before attending this lecture. These notes are supplemented with figures and with material that arises during the lecture in response to questions. Please report any errors in these notes to cappello@cs.ucsb.edu. I'll fix them immediately.

Based on An Introduction to Formal Languages and Automata, 3rd Ed., Peter Linz, Jones and Bartlett Publishers, Inc.
5.1 Context-Free Grammars

Def. 1.1: A grammar G is a 4-tuple G = (V, T, S, P), where
V is a finite set of objects called variables,
T is a finite set of objects called terminal symbols,
S ∈ V is called the start variable,
P is a finite set of productions.

Example: Let G = ({S}, {a, b}, S, P), where P has the following 2 productions: S → aSb, S → λ.
We denote a derivation, for example, of the word aabb as follows:

S ⇒ aSb ⇒ aaSbb ⇒ aabb.

Each ⇒ is spoken "derives", so the first two steps read "S derives aSb, which derives aaSbb". The strings S, aSb, aaSbb, and aabb are called sentential forms of the derivation. We may summarize the entire derivation using the notation S ⇒* aabb.

Def. 1.2: Let G = (V, T, S, P) be a grammar. Then, the set

L(G) = {w ∈ T* : S ⇒* w}

is the language generated by G.
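Def. 1.2 can be made concrete with a small search over sentential forms. The following is a minimal sketch, not part of the original notes: the names PRODUCTIONS, generate, and max_len are hypothetical, λ is written as the empty string, and uppercase letters are assumed to be variables. It always replaces the leftmost variable and prunes forms that already contain more than max_len terminals. It terminates for the example grammar, but not for every grammar (for instance, one with the production S → SS).

```python
from collections import deque

# Grammar from the example: S -> aSb | λ (λ is written as the empty string).
# Keys of the dictionary are the variables; every other symbol is a terminal.
PRODUCTIONS = {"S": ["aSb", ""]}


def generate(productions, start="S", max_len=6):
    """Return every terminal string of length <= max_len derivable from start."""
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable in the sentential form, if any.
        pos = next((i for i, c in enumerate(form) if c in productions), None)
        if pos is None:
            words.add(form)  # no variables left: the form is a word of L(G)
            continue
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Prune forms that already hold too many terminals.
            terminals = sum(1 for c in new_form if c not in productions)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(words, key=lambda w: (len(w), w))


print(generate(PRODUCTIONS))  # ['', 'ab', 'aabb', 'aaabbb']
```

The empty string in the output stands for λ, which is derived in one step by S → λ.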
Def. 5.1: A grammar G = (V, T, S, P) is called a context-free grammar (CFG) if all productions in P are of the form

A → x, where A ∈ V and x ∈ (V ∪ T)*.

A language L is called a context-free language (CFL) when there is a CFG G such that L = L(G).

Every regular grammar is context-free.

A context-free grammar is so called because when a variable A appears in a derivation, it can always be replaced with the right-hand side (RHS) of one of its productions (A → xyz, e.g.): its substitutability does not depend on the surrounding symbols, or context.
Examples of Context-Free Languages

Example 5.1: G = ({S}, {a, b}, S, P) with productions S → aSa, S → bSb, S → λ.

For example, S ⇒ bSb ⇒ bbSbb ⇒ bbaSabb ⇒ bbaabb.

In general, L(G) is what language (describe in English)?

We know that L(G) is not regular. Thus, since every regular language is context-free, the family of CFLs properly contains the family of regular languages.
Example 5.4: G = ({S}, {(, )}, S, P) with productions S → (S), S → SS, S → λ.

Is the string (()) in L(G)?
Describe L(G) in English.
What is the point, if any, of the 2nd production?
Do you think L(G) is regular?
What is L(G) ∩ {(}*{)}*?
Example: G_E = ({E, T, F}, {a, +, *, [, ]}, E, P) with productions

E → E + T | T,
T → T * F | F,
F → [E] | a.

Is [a + a] * a in L(G_E)?
Do you think L(G_E) is regular?

Consider the homomorphism h:
h(a) = λ
h(+) = λ
h(*) = λ
h([) = [
h(]) = ].

What is h(L(G_E)) ∩ {[}*{]}*?
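To see what the homomorphism h does to individual words, it can be applied symbol by symbol. This is only an illustrative sketch: the names H and apply_h are hypothetical, λ is written as the empty string, and the multiplication symbol is assumed to be *, as in the string [a + a] * a above.

```python
# The homomorphism h from the example: erase a, + and *, keep the brackets.
# λ is written as the empty string.
H = {"a": "", "+": "", "*": "", "[": "[", "]": "]"}


def apply_h(word):
    """Apply h to a word by concatenating the images of its symbols."""
    return "".join(H[symbol] for symbol in word)


print(apply_h("[a+a]*a"))      # []
print(apply_h("[[a]*[a+a]]"))  # [[][]]
```

Only the bracket structure of a word survives h, which is what the final question on this slide asks about.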
Leftmost and Rightmost Derivations

A nonlinear grammar has a production whose RHS has more than 1 variable. In a production with n variables on the RHS, there are n! orders of replacing those variables.
For example, let G = ({A, B, S}, {a, b}, S, P) with productions

1. S → AB
2. A → aaA
3. A → λ
4. B → Bb
5. B → λ

L(G) = {a^(2n) b^m : n, m ≥ 0}. In particular, aab ∈ L(G), as the following 2 distinct derivations show:

S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
S ⇒ AB ⇒ ABb ⇒ aaABb ⇒ aaAb ⇒ aab
To remove such differences, we can require that variables are replaced in a specific order.

Def. 5.2: A derivation is leftmost when in each step the leftmost variable in the sentential form is replaced. If in each step the rightmost variable in the sentential form is replaced, we call the derivation rightmost.

The following derivation is leftmost: S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab.
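Def. 5.2 can be sketched as a procedure that takes a sequence of production numbers and applies each one to the leftmost variable. The names PRODUCTIONS and leftmost_derivation are hypothetical, λ is written as the empty string, and variables are assumed to be uppercase letters. The production numbering follows the grammar with productions S → AB, A → aaA, A → λ, B → Bb, B → λ.

```python
# Productions of the example grammar, numbered 1 to 5.
# λ is written as the empty string; uppercase letters are variables.
PRODUCTIONS = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),
    4: ("B", "Bb"),
    5: ("B", ""),
}


def leftmost_derivation(production_numbers, start="S"):
    """Apply each numbered production to the leftmost variable in turn."""
    form = start
    steps = [form]
    for number in production_numbers:
        lhs, rhs = PRODUCTIONS[number]
        # Locate the leftmost variable (an uppercase letter).
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None or form[pos] != lhs:
            raise ValueError(f"production {number} does not apply to {form!r}")
        form = form[:pos] + rhs + form[pos + 1:]
        steps.append(form)
    return steps


print(" => ".join(leftmost_derivation([1, 2, 3, 4, 5])))
# S => AB => aaAB => aaB => aaBb => aab
```

The sequence 1, 4, 2, 5, 3 also derives aab, but not leftmost: after S ⇒ AB the leftmost variable is A, so production 4 is rejected by this sketch.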
Derivation Trees

(Illustrate a derivation tree for the word aab, using the previous grammar.)

Def. 5.3: Let G = (V, T, S, P) be a CFG. An ordered tree is a derivation tree for G when it has the following properties:

1. The root is labelled S.
2. Every leaf has a label from T ∪ {λ}.
3. Every interior vertex has a label from V.
4. If a vertex has label A ∈ V, and its children are labelled, from left to right, a_1, a_2, ..., a_n, then P has a production A → a_1 a_2 ⋯ a_n.
5. A leaf labelled λ has no siblings.

A tree with properties 3, 4, and 5, and in which property 2 is replaced by

2a. Every leaf has a label from V ∪ T ∪ {λ}

is called a partial derivation tree.
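As a rough illustration of Def. 5.3, the sketch below represents an ordered tree, computes its yield (the leaf labels read left to right, with λ omitted), and checks properties 4 and 5. The names Node, tree_yield, respects_productions, and PRODUCTIONS are hypothetical. The grammar is assumed to be the one with productions S → AB, A → aaA, A → λ, B → Bb, B → λ, with λ written as the empty string on right-hand sides.

```python
from dataclasses import dataclass, field

LAMBDA = "λ"

# Right-hand sides of each variable; "" stands for λ.
PRODUCTIONS = {"S": ["AB"], "A": ["aaA", ""], "B": ["Bb", ""]}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_yield(node):
    """Concatenate the leaf labels from left to right, omitting λ."""
    if not node.children:
        return "" if node.label == LAMBDA else node.label
    return "".join(tree_yield(child) for child in node.children)


def respects_productions(node):
    """Check properties 4 and 5 of Def. 5.3 at every interior vertex."""
    if not node.children:
        return True
    labels = [child.label for child in node.children]
    if LAMBDA in labels and len(labels) > 1:
        return False  # property 5: a leaf labelled λ has no siblings
    rhs = "" if labels == [LAMBDA] else "".join(labels)
    if rhs not in PRODUCTIONS.get(node.label, []):
        return False  # property 4: the children must spell a production's RHS
    return all(respects_productions(child) for child in node.children)


# A derivation tree for the word aab.
tree = Node("S", [
    Node("A", [Node("a"), Node("a"), Node("A", [Node(LAMBDA)])]),
    Node("B", [Node("B", [Node(LAMBDA)]), Node("b")]),
])

print(tree_yield(tree))            # aab
print(respects_productions(tree))  # True
```

Properties 1 to 3 are not checked here; they only constrain labels and could be added in the same style.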
Example: (Using G_E, illustrate a partial derivation tree and a derivation tree.)
Relation between Sentential Forms and Derivation Trees

Thm. 5.1: Let G = (V, T, S, P) be a CFG.
w ∈ L(G) if and only if there is a derivation tree of G whose yield is w.
If t_G is a partial derivation tree for G whose root is labelled S, then the yield of t_G is a sentential form of G.

Proof: We prove a part of what is claimed, and leave the remainder as an exercise.

We claim, using induction on the length of the derivation, that if s is a sentential form of G, then there is a partial derivation tree whose yield is s.

Basis n = 1: If the length of the derivation is 1, the derivation must be of the form S ⇒ a_1 ⋯ a_n. By the definition of derivation trees (rules 1 and 4), there is a partial derivation tree whose root is S and whose children, from left to right, are a_1, ..., a_n.

Induction hypothesis: Assume that for every sentential form derivable in n steps, there is a partial derivation tree whose yield is that sentential form.

Induction step: Any sentential form s derivable in n + 1 steps must be such that S ⇒* xAy, where x, y ∈ (V ∪ T)*, A ∈ V, in n steps (illustrate), and in the (n + 1)st step xAy ⇒ x a_1 ⋯ a_n y = s ∈ (V ∪ T)*.

By the I.H., there is a partial derivation tree with yield xAy. Because the (n + 1)st step replaces A by a_1 ⋯ a_n, the grammar must have a production of the form A → a_1 ⋯ a_n.
We see that by putting children a_1, ..., a_n under the leaf labelled A, as rule 4 of the definition of derivation trees allows, we get a partial derivation tree whose yield is x a_1 ⋯ a_n y = s. (Illustrate.)

By induction, the claim holds for every sentential form derivable in a finite number of steps.

As a consequence of the foregoing, since a derivation tree is a partial derivation tree whose leaves are labelled with terminals or λ, every w ∈ L(G) is the yield of some derivation tree.
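The construction in the proof can be sketched as a loop that grows a partial derivation tree by one derivation step at a time. The names Node, grow_tree, and tree_yield are hypothetical, λ-productions are written with an empty right-hand side, and the sketch assumes each step really does apply a production of the grammar (it does not check this).

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.children = []


def tree_yield(node):
    """Concatenate the leaf labels from left to right, omitting λ."""
    if not node.children:
        return "" if node.label == "λ" else node.label
    return "".join(tree_yield(child) for child in node.children)


def grow_tree(steps, start="S"):
    """Build a partial derivation tree from a list of derivation steps.

    Each step is (position, rhs): the variable at `position` of the current
    sentential form is replaced by `rhs` ("" stands for λ).
    """
    root = Node(start)
    frontier = [root]  # non-λ leaves, left to right; their labels spell the form
    for position, rhs in steps:
        leaf = frontier[position]
        children = [Node(symbol) for symbol in rhs]
        # Put the children under the chosen leaf; a λ-production
        # gets a single child labelled λ.
        leaf.children = children or [Node("λ")]
        frontier[position:position + 1] = children
    return root, "".join(node.label for node in frontier)


# S => AB => ABb => aaABb => aaAb => aab
steps = [(0, "AB"), (1, "Bb"), (0, "aaA"), (3, ""), (2, "")]
root, form = grow_tree(steps)
print(form, tree_yield(root))  # aab aab
```

After every step the yield of the tree equals the current sentential form, which is the invariant the induction establishes.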