Creating a Recursive Descent Parse Table

Recursive descent parsing is sometimes called LL parsing (Left-to-right examination of the input, Leftmost derivation).

Consider the following grammar:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | a

Compute the First/Follow sets.

Rules for forming First:
1. If X is a terminal, then First(X) = {X}.
2. First(ε) = {ε}.
3. If X → ε, then add ε to First(X).
4. If X is a nonterminal and X → Y1 Y2 ... Yn, then place a in First(X) if a is in First(Yi) and all earlier Y's have ε in their First sets. If all Yi contain ε in their First sets, then add ε to First(X).

Rules for forming Follow:
1. Place $ in Follow(S), where S is the start symbol and $ is the right end marker of the input.
2. If there is a production A → αBβ, then everything in First(β) except ε is placed in Follow(B).
3. If there is a production A → αB, or a production A → αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).

(Note: α refers to anything that precedes B on the right hand side of the production, and β refers to anything that follows B on the right hand side of the production.)

With these rules, cover the answers (below) and compute the First/Follow sets for the above grammar.

First E = { a ( }
First E' = { + ε }
First T = { a ( }
First T' = { * ε }
First F = { a ( }

Follow E = { ) $ }
Follow E' = { ) $ }
Follow T = { + ) $ }
Follow T' = { + ) $ }
Follow F = { + * ) $ }
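The First and Follow rules above can be applied mechanically by repeating them until no set changes. The following is a minimal Python sketch of that idea, not part of the course material. The names GRAMMAR, first_of_string, compute_first, and compute_follow are made up for this illustration, and an empty list stands for an ε right hand side.

```python
EPS = "ε"   # marker for the empty string
END = "$"   # right end marker of the input

# Grammar from the notes. Each right hand side is a list of symbols;
# an empty list stands for ε. (This encoding is an assumption of the sketch.)
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["a"]],
}
START = "E"


def first_of_string(symbols, first):
    """First set of a sequence Y1 Y2 ... Yn (First rules 1, 2, and 4)."""
    result = set()
    for sym in symbols:
        # A terminal's First set is just itself (rule 1).
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result          # a later Y cannot contribute
    result.add(EPS)                # every Yi can vanish (or the string is empty)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                 # repeat until nothing new is added
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)         # Follow rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in GRAMMAR.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in GRAMMAR:
                        continue   # Follow is only defined for nonterminals
                    beta_first = first_of_string(body[i + 1:], first)
                    new = beta_first - {EPS}      # Follow rule 2
                    if EPS in beta_first:
                        new |= follow[a]          # Follow rule 3
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    first = compute_first()
    follow = compute_follow(first)
    for nt in GRAMMAR:
        print(nt, "First:", sorted(first[nt]), "Follow:", sorted(follow[nt]))
```

Running the sketch is one way to check hand-computed sets: for this grammar, the First sets of E' and T' should contain ε, and Follow F should be the largest Follow set.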
What do you notice about the grammar?

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | a

It does the same thing as our favorite expression grammar,

E → E + T | T
T → T * F | F
F → (E) | a

but has been massaged so it works better with a recursive descent parser. Notice that the prime versions of the non-terminals (E' and T') have been named to show that they were derived from the original non-terminals. JavaCC (which we will experiment with) creates a recursive descent parser, and thus its grammar will need to be massaged. Study the rules for creating the parsing table to see why the original grammar had to be massaged.

Constructing a predictive parsing table

Create a table M with rows labeled with nonterminals and columns labeled with terminals plus $ (for no more input). The rows represent the current nonterminal you are trying to determine a production for. The columns represent the current input symbol.

1. For each production A → α of the grammar (where α is anything on the right hand side of the production), do steps 2, 3, and 4. Note that if α = Y0 Y1 Y2 ... Yn, then First(α) contains the terminals in First(Y0). If First(Y0) contains ε, then First(α) also contains the terminals of First(Y1), and so on. First(α) contains ε only if all Yi contain ε in their First sets.
2. For each terminal t in First(α), add A → α to M[A,t].
3. If ε is in First(α), add A → α to M[A,b] for each terminal b in Follow(A).
4. If ε is in First(α) and $ is in Follow(A), add A → α to M[A,$].
5. Consider each undefined entry of M to be an ERROR.

Using these rules, create a recursive descent table. Cover the table below and then compare your answers.

        a          +            *            (            )          $
E       E → TE'                              E → TE'
E'                 E' → +TE'                              E' → ε     E' → ε
T       T → FT'                              T → FT'
T'                 T' → ε       T' → *FT'                 T' → ε     T' → ε
F       F → a                                F → ( E )

Using the table, show how you could parse (a*a*a)+a$ with only one character of lookahead.
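The table construction and the one-symbol-lookahead parse can also be sketched in code. The Python below is only an illustration under stated assumptions: it hard-codes the First and Follow sets for this grammar (with ε in First E' and First T'), and the names PRODUCTIONS, first_of, build_table, and parse are invented for this sketch. It uses an explicit stack in place of recursive procedure calls, which makes the same decisions a recursive descent parser would make.

```python
EPS, END = "ε", "$"

# Productions of the massaged grammar; an empty list stands for ε.
PRODUCTIONS = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),
    ("F",  ["(", "E", ")"]),
    ("F",  ["a"]),
]

# First/Follow sets for this grammar, entered by hand (assumed input).
FIRST = {"E": {"a", "("}, "E'": {"+", EPS}, "T": {"a", "("},
         "T'": {"*", EPS}, "F": {"a", "("}}
FOLLOW = {"E": {")", END}, "E'": {")", END}, "T": {"+", ")", END},
          "T'": {"+", ")", END}, "F": {"+", "*", ")", END}}


def first_of(alpha):
    """First(α) for a right hand side α = Y0 Y1 ... Yn."""
    result = set()
    for sym in alpha:
        sym_first = FIRST.get(sym, {sym})   # a terminal's First is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table():
    """Apply steps 1-4; cells that are never filled are errors (step 5)."""
    table = {}
    for head, alpha in PRODUCTIONS:
        first_alpha = first_of(alpha)
        columns = first_alpha - {EPS}            # step 2
        if EPS in first_alpha:
            columns |= FOLLOW[head]              # steps 3 and 4 ($ included)
        for col in columns:
            if (head, col) in table:
                raise ValueError(f"two productions in M[{head},{col}]")
            table[(head, col)] = alpha
    return table


def parse(text, table):
    """Return (accepted, list of productions used) for the input text."""
    tokens = [c for c in text if not c.isspace()]
    if not tokens or tokens[-1] != END:
        tokens.append(END)
    stack = [END, "E"]           # start symbol on top of the end marker
    pos, used = 0, []
    while stack:
        top, look = stack.pop(), tokens[pos]
        if top in FIRST:                         # nonterminal: consult M
            alpha = table.get((top, look))
            if alpha is None:
                return False, used               # undefined entry = ERROR
            used.append(f"{top} → {' '.join(alpha) or EPS}")
            stack.extend(reversed(alpha))        # leftmost symbol ends on top
        elif top == look:                        # terminal: match and advance
            pos += 1
        else:
            return False, used
    return pos == len(tokens), used


if __name__ == "__main__":
    ok, used = parse("(a*a*a) +a$", build_table())
    print("accepted" if ok else "rejected")
    for production in used:
        print(production)
```

The list of productions printed by the sketch is a leftmost derivation of the input, which is the "L" in the second position of LL.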
Bottom Up Parsing

In a given right sentential form, the right hand side (RHS) that must be rewritten to get the previous right sentential form is called a handle.

Consider the following grammar:

E → E + T | T
T → T * F | F
F → (E) | a

Show the rightmost derivation for a+a*a:

E ⇒ E + T
  ⇒ E + T*F
  ⇒ E + T*a
  ⇒ E + F*a
  ⇒ E + a*a
  ⇒ T + a*a
  ⇒ F + a*a
  ⇒ a + a*a

Now, start at the last step of the rightmost derivation and go backwards. This is what we are trying to do with a bottom up parser. We need to first determine that the a becomes an F, then that the F becomes a T, etc.

In order to do bottom up parsing, we will rewrite the grammar only slightly so we can distinguish between the two productions shown on the same line. We will also give the productions numbers so we can refer to them:

1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → a

Your text contains the parsing table we will use. The parsing table is generated by a tool like YACC, and the generation is beyond the scope of this course. However, we need to be able to use the parsing table. Take a look at figure 4.3, in which you see a parse stack, the input, and the parsing table. This models what we do. We want to read the input one symbol at a time and decide what to do, given a history of what we have done in the past. The parse stack helps us remember these important details.

As we look at an input symbol, there are only two choices of action we can take:

1. We read the input symbol and shift it onto our stack along with a state. Thus, the action S4 means to shift the current symbol onto the stack and then push state 4 onto the stack.
2. We look at the input symbol but do not remove it from the input. Instead, we look at the top symbols of our stack, remove some, and place others on. This step is called a reduce step. The number of the reduce does NOT refer to a state. It refers to the production which you use. For R5 (for example), the steps are as follows:
a. Look at production 5, F → (E). In our case, there are three symbols on the right hand side. This tells us to remove three symbol/state pairs from our stack.
b. You should see a (, an E, and a ) (along with their states) on the top of the stack. Remove these.
c. They become an F (going backwards on the production).
d. Before you place the F on the stack, look at the exposed state on the stack. Use that exposed state and F to determine the new state (using the Goto part of the table).
e. That new state is then placed after F on the stack.
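The shift and reduce steps can be sketched as a small table-driven loop. The Python below is an illustration only. The ACTION and GOTO entries are an assumption: they follow the SLR table commonly printed in textbooks for this grammar, and the state numbers in your text's table may differ. The names ACTION, GOTO, PRODUCTIONS, and parse are invented for this sketch. The stack holds (symbol, state) pairs, as described above.

```python
# Numbered productions from the notes: number -> (left side, right side).
PRODUCTIONS = {
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["a"]),
}

# ASSUMED parsing table (common textbook SLR table for this grammar).
# "S4" = shift and push state 4, "R5" = reduce by production 5.
ACTION = {}
for state in (0, 4, 6, 7):
    ACTION[(state, "a")] = "S5"
    ACTION[(state, "(")] = "S4"
ACTION[(1, "+")] = "S6"
ACTION[(1, "$")] = "acc"
ACTION[(8, "+")] = "S6"
ACTION[(8, ")")] = "S11"
for state, reduce_action in ((2, "R2"), (9, "R1")):
    ACTION[(state, "*")] = "S7"
    for terminal in "+)$":
        ACTION[(state, terminal)] = reduce_action
for state, reduce_action in ((3, "R4"), (5, "R6"), (10, "R3"), (11, "R5")):
    for terminal in "+*)$":
        ACTION[(state, terminal)] = reduce_action

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3,
        (7, "F"): 10}


def parse(text):
    """Return (accepted, production numbers in the order they were reduced)."""
    tokens = [c for c in text if not c.isspace()]
    if not tokens or tokens[-1] != "$":
        tokens.append("$")
    stack = [(None, 0)]              # (symbol, state) pairs; start in state 0
    pos, reductions = 0, []
    while True:
        state = stack[-1][1]
        action = ACTION.get((state, tokens[pos]))
        if action is None:
            return False, reductions             # blank table entry: error
        if action == "acc":
            return pos == len(tokens) - 1, reductions
        number = int(action[1:])
        if action[0] == "S":
            # Shift: move the input symbol to the stack with the new state.
            stack.append((tokens[pos], number))
            pos += 1
        else:
            # Reduce: the input symbol is NOT consumed.
            head, body = PRODUCTIONS[number]
            del stack[len(stack) - len(body):]   # steps a and b: pop the RHS
            exposed = stack[-1][1]               # step d: exposed state
            stack.append((head, GOTO[(exposed, head)]))   # steps c and e
            reductions.append(number)


if __name__ == "__main__":
    accepted, reductions = parse("a+a*a$")
    print(accepted, reductions)
```

For a+a*a, the reductions come out in the reverse order of the productions used in the rightmost derivation, which is the sense in which a bottom up parser runs that derivation backwards.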