Syntactic Analysis
Chapter 4: Bottom-up Analysis
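Bottom-up Analysis

Attention: Many grammars are not LL(k)! One reason for that is left recursion.

Definition: A grammar $G$ is called left-recursive if $A \rightarrow^{+} A\,\beta$ for some $A \in N$, $\beta \in (T \cup N)^{*}$.

Example:

- $E \rightarrow E + T \mid T$
- $T \rightarrow T * F \mid F$
- $F \rightarrow (\,E\,) \mid \text{name} \mid \text{int}$

... is left-recursive. The alternatives of each nonterminal are numbered 0, 1, 2 from left to right.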
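The definition of left recursion can be checked mechanically. The following is a minimal sketch, not part of the lecture: the grammar encoding as a dictionary and the function names `nullable_nonterminals` and `left_recursive_nonterminals` are assumptions made for illustration. It collects, for every nonterminal, the nonterminals that can appear leftmost after one derivation step (skipping nullable symbols) and then tests whether a nonterminal can reach itself.

```python
from typing import Dict, List, Set

# Hypothetical encoding: nonterminal -> list of right-hand sides
Grammar = Dict[str, List[List[str]]]

def nullable_nonterminals(g: Grammar) -> Set[str]:
    """Nonterminals that can derive the empty word."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in g.items():
            if lhs not in nullable and any(
                all(sym in nullable for sym in rhs) for rhs in alternatives
            ):
                nullable.add(lhs)
                changed = True
    return nullable

def left_recursive_nonterminals(g: Grammar) -> Set[str]:
    """All nonterminals A with A =>+ A beta."""
    nullable = nullable_nonterminals(g)
    # direct[A]: nonterminals that can occur leftmost after one step from A
    direct: Dict[str, Set[str]] = {a: set() for a in g}
    for lhs, alternatives in g.items():
        for rhs in alternatives:
            for sym in rhs:
                if sym in g:
                    direct[lhs].add(sym)
                if sym not in nullable:
                    break  # symbols further right cannot become leftmost
    result: Set[str] = set()
    for a in g:
        seen: Set[str] = set()
        todo = list(direct[a])
        while todo:
            b = todo.pop()
            if b not in seen:
                seen.add(b)
                todo.extend(direct[b])
        if a in seen:
            result.add(a)
    return result

EXPR: Grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["name"], ["int"]],
}
print(sorted(left_recursive_nonterminals(EXPR)))  # ['E', 'T']
```

For the expression grammar the check reports E and T, the two nonterminals whose first alternative starts with the nonterminal itself.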
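Bottom-up Analysis

Theorem: Let a grammar $G$ be reduced and left-recursive; then $G$ is not LL(k) for any $k$.

Proof: Let $A \rightarrow A\,\beta \mid \alpha \in P$ and let $A$ be reachable from $S$, so that $A$ occurs with some right context $\gamma$ and $A\,\gamma \rightarrow^{*} A\,\beta^{n}\,\gamma$ for every $n \ge 0$.

Assumption: $G$ is LL(k). Then $k$ symbols of lookahead must separate the two alternatives of $A$ in each of these sentential forms:

$$\mathrm{First}_k(\alpha\,\beta^{n}\,\gamma) \cap \mathrm{First}_k(\alpha\,\beta^{n+1}\,\gamma) = \emptyset$$

Case 1: $\beta \rightarrow^{*} \epsilon$. Then both sets contain $\mathrm{First}_k(\alpha\,\gamma)$. Contradiction!

Case 2: $\beta \rightarrow^{*} w \neq \epsilon$. Then $\mathrm{First}_k(\alpha\,\beta^{k}\,\gamma) \cap \mathrm{First}_k(\alpha\,\beta^{k+1}\,\gamma) \neq \emptyset$, because both sentential forms derive words that start with a terminal word of $\alpha$ followed by $w^{k}$, which is at least $k$ symbols long. Contradiction!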
Shift-Reduce Parser

Idea (Donald Knuth): We delay the decision whether to reduce until we know whether the input matches the right-hand side of a rule!

Construction: Shift-Reduce parser $M_G^R$
- The input is shifted successively to the pushdown.
- If a complete right-hand side (a handle) is on top of the pushdown, it is replaced (reduced) by the corresponding left-hand side.
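Shift-Reduce Parser

Example: $S \rightarrow A\,B$, $A \rightarrow a$, $B \rightarrow b$

The pushdown automaton:
States: $q_0, f, a, b, A, B, S$; Start state: $q_0$; End state: $f$

| Pushdown top | Input | New pushdown top |
| --- | --- | --- |
| $q_0$ | $a$ | $q_0\,a$ |
| $a$ | $\epsilon$ | $A$ |
| $A$ | $b$ | $A\,b$ |
| $b$ | $\epsilon$ | $B$ |
| $A\,B$ | $\epsilon$ | $S$ |
| $q_0\,S$ | $\epsilon$ | $f$ |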
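Shift-Reduce Parser

Construction: In general, we create an automaton $M_G^R = (Q, T, \delta, q_0, F)$ with:
- $Q = T \cup N \cup \{q_0, f\}$ ($q_0, f$ fresh);
- $F = \{f\}$;
- Transitions:

$$
\begin{array}{rcll}
\delta & = & \{(q,\ x,\ q\,x) \mid q \in Q,\ x \in T\} & \text{// Shift-transitions} \\
 & \cup & \{(q\,\alpha,\ \epsilon,\ q\,A) \mid q \in Q,\ A \rightarrow \alpha \in P\} & \text{// Reduce-transitions} \\
 & \cup & \{(q_0\,S,\ \epsilon,\ f)\} & \text{// finish}
\end{array}
$$

Example-computation:

$$(q_0,\ a\,b) \vdash (q_0\,a,\ b) \vdash (q_0\,A,\ b) \vdash (q_0\,A\,b,\ \epsilon) \vdash (q_0\,A\,B,\ \epsilon) \vdash (q_0\,S,\ \epsilon) \vdash (f,\ \epsilon)$$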
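The three kinds of transitions can be tried out with a small simulation. This is a sketch under stated assumptions rather than the lecture's implementation: the names `RULES` and `accepting_run` are made up here, the non-determinism of the automaton is resolved by depth-first search over configurations, and the grammar is assumed to have no $\epsilon$-productions (otherwise the search would not terminate).

```python
from typing import List, Optional, Set, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)

# Grammar of the example slide: S -> A B, A -> a, B -> b
RULES: List[Rule] = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]

def accepting_run(word: str, rules: List[Rule], start: str = "S") -> Optional[List[str]]:
    """Search for an accepting computation of the shift-reduce automaton.

    Returns the list of moves, or None if no accepting computation exists.
    Assumes that no rule has an empty right-hand side.
    """
    seen: Set[Tuple[Tuple[str, ...], int]] = set()

    def search(stack: Tuple[str, ...], pos: int) -> Optional[List[str]]:
        if (stack, pos) in seen:
            return None
        seen.add((stack, pos))
        # Finish: the pushdown is q0 S and the input is consumed.
        if stack == ("q0", start) and pos == len(word):
            return ["finish"]
        # Reduce: replace a right-hand side on top of the pushdown.
        for lhs, rhs in rules:
            cut = len(stack) - len(rhs)
            if cut >= 1 and stack[cut:] == rhs:
                rest = search(stack[:cut] + (lhs,), pos)
                if rest is not None:
                    return [f"reduce {lhs} -> {' '.join(rhs)}"] + rest
        # Shift: move the next input symbol onto the pushdown.
        if pos < len(word):
            rest = search(stack + (word[pos],), pos + 1)
            if rest is not None:
                return [f"shift {word[pos]}"] + rest
        return None

    return search(("q0",), 0)

for move in accepting_run("ab", RULES) or []:
    print(move)
```

For the word `ab` the moves found are shift a, reduce A -> a, shift b, reduce B -> b, reduce S -> A B, finish, which mirrors the example computation. The search is only there to cope with the non-determinism; LR parsing replaces it by a deterministic choice.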
Shift-Reduce Parser

Observation:
- The sequence of reductions corresponds to a reverse rightmost derivation for the input.
- To prove correctness, we have to prove: $(\epsilon,\ w) \vdash^{*} (A,\ \epsilon)$ iff $A \rightarrow^{*} w$.
- The shift-reduce pushdown automaton $M_G^R$ is in general also non-deterministic.
- For a deterministic parsing algorithm, we have to identify computation states for reduction $\Longrightarrow$ LR-Parsing
Reverse Rightmost Derivations in Shift-Reduce-Parsers

Idea: Observe reverse rightmost derivations of $M_G^R$!

Input: counter * 2 + 40, scanned as name * int + int. With every reduction the parse tree grows bottom-up.

| Pushdown | Remaining input | Next move |
| --- | --- | --- |
| $(q_0)$ | counter * 2 + 40 | shift name |
| $(q_0\ \text{name})$ | * 2 + 40 | reduce $F \rightarrow \text{name}$ |
| $(q_0\ F)$ | * 2 + 40 | reduce $T \rightarrow F$ |
| $(q_0\ T)$ | * 2 + 40 | shift * |
| $(q_0\ T\ *)$ | 2 + 40 | shift int |
| $(q_0\ T * \text{int})$ | + 40 | reduce $F \rightarrow \text{int}$ |
| $(q_0\ T * F)$ | + 40 | reduce $T \rightarrow T * F$ |
| $(q_0\ T)$ | + 40 | reduce $E \rightarrow T$ |
| $(q_0\ E)$ | + 40 | shift + |
| $(q_0\ E\ +)$ | 40 | shift int |
| $(q_0\ E + \text{int})$ | | reduce $F \rightarrow \text{int}$ |
| $(q_0\ E + F)$ | | reduce $T \rightarrow F$ |
| $(q_0\ E + T)$ | | reduce $E \rightarrow E + T$ |
| $(q_0\ E)$ | | finish |
| $(f)$ | | |
Reverse Rightmost Derivations in Shift-Reduce-Parsers

Generic Observation: In a sequence of configurations of $M_G^R$

$$(q_0\,\alpha\,\gamma,\ v) \vdash (q_0\,\alpha\,B,\ v) \vdash^{*} (q_0\,S,\ \epsilon)$$

we call $\alpha\,\gamma$ a viable prefix for the complete item $[B \rightarrow \gamma\,\bullet]$.

Two configurations from the computation for counter * 2 + 40 illustrate this: with pushdown $(q_0\ T * F)$ and remaining input + 40, $T * F$ is a viable prefix for $[T \rightarrow T * F\,\bullet]$; with pushdown $(q_0\ E + F)$ and empty remaining input, $E + F$ is a viable prefix for $[T \rightarrow F\,\bullet]$.
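Bottom-up Analysis: Viable Prefix

$\alpha\,\gamma$ is viable for $[B \rightarrow \gamma\,\bullet]$ iff $S \rightarrow^{*}_{R} \alpha\,B\,v$

(Figure: derivation tree with the nodes $A_0$ (rule $i_0$), $\alpha_1\,A_1$ (rule $i_1$), $\alpha_2\,A_2$ (rule $i_2$), ..., $\alpha_m\,B$ (rule $i$) and $\gamma$ below $B$.)

... with $\alpha = \alpha_1 \ldots \alpha_m$

Conversely, for an arbitrary valid word $\alpha'$ we can determine the set of all later on possibly matching rules...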
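Bottom-up Analysis: Admissible Items

The item $[B \rightarrow \gamma\,\bullet\,\beta]$ is called admissible for $\alpha'$ iff $S \rightarrow^{*}_{R} \alpha\,B\,v$ with $\alpha' = \alpha\,\gamma$:

(Figure: derivation tree with the nodes $A_0$ (rule $i_0$), $\alpha_1\,A_1$ (rule $i_1$), $\alpha_2\,A_2$ (rule $i_2$), ..., $\alpha_m\,B$ (rule $i$) and $\gamma\,\beta$ below $B$.)

... with $\alpha = \alpha_1 \ldots \alpha_m$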
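Characteristic Automaton

Observation: The set of viable prefixes from $(N \cup T)^{*}$ for (admissible) items can be computed from the content of the shift-reduce parser's pushdown with the help of a finite automaton:

States: Items
Start state: $[S' \rightarrow \bullet\,S]$
Final states: $\{[B \rightarrow \gamma\,\bullet] \mid B \rightarrow \gamma \in P\}$
Transitions:
(1) $([A \rightarrow \alpha\,\bullet\,X\,\beta],\ X,\ [A \rightarrow \alpha\,X\,\bullet\,\beta])$, $X \in (N \cup T)$, $A \rightarrow \alpha\,X\,\beta \in P$;
(2) $([A \rightarrow \alpha\,\bullet\,B\,\beta],\ \epsilon,\ [B \rightarrow \bullet\,\gamma])$, $A \rightarrow \alpha\,B\,\beta,\ B \rightarrow \gamma \in P$;

The automaton $c(G)$ is called characteristic automaton for $G$.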
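The two transition rules translate almost literally into code. The sketch below is illustrative only: items are encoded as tuples `(A, right-hand side, dot position)`, `None` stands for an $\epsilon$-label, and the function name `characteristic_automaton` as well as the dictionary encoding of the grammar are assumptions. The grammar is the expression grammar in the form $F \rightarrow (\,E\,) \mid \text{int}$, extended by $S' \rightarrow E$.

```python
from typing import Dict, List, Optional, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int]        # (A, right-hand side, dot position)
Transition = Tuple[Item, Optional[str], Item]  # label None stands for epsilon
Grammar = Dict[str, List[Tuple[str, ...]]]

GRAMMAR: Grammar = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("int",)],
}

def characteristic_automaton(g: Grammar):
    """Return (items, transitions, final items) of c(G)."""
    items: List[Item] = [
        (a, rhs, dot)
        for a, alternatives in g.items()
        for rhs in alternatives
        for dot in range(len(rhs) + 1)
    ]
    transitions: List[Transition] = []
    for (a, rhs, dot) in items:
        if dot < len(rhs):
            x = rhs[dot]
            # Rule (1): move the dot over the symbol X.
            transitions.append(((a, rhs, dot), x, (a, rhs, dot + 1)))
            # Rule (2): epsilon-move to every initial item of a nonterminal X.
            for gamma in g.get(x, []):
                transitions.append(((a, rhs, dot), None, (x, gamma, 0)))
    finals: Set[Item] = {(a, rhs, dot) for (a, rhs, dot) in items if dot == len(rhs)}
    return items, transitions, finals

items, transitions, finals = characteristic_automaton(GRAMMAR)
print(len(items), len(transitions), len(finals))  # 20 29 7
```

For this grammar the automaton has 20 items, 13 consuming transitions, 16 $\epsilon$-transitions and 7 final items (one complete item per production).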
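Characteristic Automaton

For example:

- $E \rightarrow E + T \mid T$
- $T \rightarrow T * F \mid F$
- $F \rightarrow (\,E\,) \mid \text{int}$

(Figure: the characteristic automaton $c(G)$ for this grammar. Every production contributes a chain of item states in which the dot moves from left to right, for instance $[E \rightarrow \bullet\,E + T]$, $[E \rightarrow E\,\bullet + T]$, $[E \rightarrow E + \bullet\,T]$, $[E \rightarrow E + T\,\bullet]$; $\epsilon$-transitions lead from every item with the dot in front of a nonterminal to the initial items of that nonterminal.)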
Canonical LR(0)-Automaton

The canonical LR(0)-automaton $LR(G)$ is created from $c(G)$ by:
1. performing arbitrarily many $\epsilon$-transitions after every consuming transition
2. performing the powerset construction

... for example:

(Figure: transition graph of $LR(G)$ for the expression grammar with the states 0 to 11; the edges are labelled with grammar symbols.)
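Canonical LR(0)-Automaton

Example:

- $E \rightarrow E + T \mid T$
- $T \rightarrow T * F \mid F$
- $F \rightarrow (\,E\,) \mid \text{name} \mid \text{int}$

Therefore we determine (only the alternative $F \rightarrow \text{int}$ is listed; the items for $F \rightarrow \text{name}$ are analogous):

- $q_0 = \{[S' \rightarrow \bullet\,E],\ [E \rightarrow \bullet\,E + T],\ [E \rightarrow \bullet\,T],\ [T \rightarrow \bullet\,T * F],\ [T \rightarrow \bullet\,F],\ [F \rightarrow \bullet\,(\,E\,)],\ [F \rightarrow \bullet\,\text{int}]\}$
- $q_1 = \delta(q_0, E) = \{[S' \rightarrow E\,\bullet],\ [E \rightarrow E\,\bullet + T]\}$
- $q_2 = \delta(q_0, T) = \{[E \rightarrow T\,\bullet],\ [T \rightarrow T\,\bullet * F]\}$
- $q_3 = \delta(q_0, F) = \{[T \rightarrow F\,\bullet]\}$
- $q_4 = \delta(q_0, \text{int}) = \{[F \rightarrow \text{int}\,\bullet]\}$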
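Canonical LR(0)-Automaton

- $q_5 = \delta(q_0, (\,) = \{[F \rightarrow (\,\bullet\,E\,)],\ [E \rightarrow \bullet\,E + T],\ [E \rightarrow \bullet\,T],\ [T \rightarrow \bullet\,T * F],\ [T \rightarrow \bullet\,F],\ [F \rightarrow \bullet\,(\,E\,)],\ [F \rightarrow \bullet\,\text{int}]\}$
- $q_6 = \delta(q_1, +) = \{[E \rightarrow E + \bullet\,T],\ [T \rightarrow \bullet\,T * F],\ [T \rightarrow \bullet\,F],\ [F \rightarrow \bullet\,(\,E\,)],\ [F \rightarrow \bullet\,\text{int}]\}$
- $q_7 = \delta(q_2, *) = \{[T \rightarrow T * \bullet\,F],\ [F \rightarrow \bullet\,(\,E\,)],\ [F \rightarrow \bullet\,\text{int}]\}$
- $q_8 = \delta(q_5, E) = \{[F \rightarrow (\,E\,\bullet\,)],\ [E \rightarrow E\,\bullet + T]\}$
- $q_9 = \delta(q_6, T) = \{[E \rightarrow E + T\,\bullet],\ [T \rightarrow T\,\bullet * F]\}$
- $q_{10} = \delta(q_7, F) = \{[T \rightarrow T * F\,\bullet]\}$
- $q_{11} = \delta(q_8, )\,) = \{[F \rightarrow (\,E\,)\,\bullet]\}$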
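Canonical LR(0)-Automaton

Observation: The canonical LR(0)-automaton can be created directly from the grammar. For this we need a helper function $\delta_\epsilon$ ($\epsilon$-closure).

We define:

$$\delta_\epsilon(q) = q \cup \{[B \rightarrow \bullet\,\gamma] \mid \exists\, [A \rightarrow \alpha\,\bullet\,B'\,\beta'] \in q,\ \beta \in (N \cup T)^{*} : B' \rightarrow^{*} B\,\beta\}$$

States: Sets of items;
Start state: $\delta_\epsilon\{[S' \rightarrow \bullet\,S]\}$
Final states: $\{q \mid \exists\, A \rightarrow \alpha \in P : [A \rightarrow \alpha\,\bullet] \in q\}$
Transitions: $\delta(q, X) = \delta_\epsilon\{[A \rightarrow \alpha\,X\,\bullet\,\beta] \mid [A \rightarrow \alpha\,\bullet\,X\,\beta] \in q\}$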
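The direct construction can be sketched in a few lines. The code below is an illustration, not the lecture's implementation: `closure` plays the role of $\delta_\epsilon$ and `goto` the role of $\delta$, items are tuples `(A, right-hand side, dot position)`, and all names are assumptions. It uses the expression grammar in the form $F \rightarrow (\,E\,) \mid \text{int}$; the states are numbered in order of discovery, so the numbers need not coincide with $q_0, \ldots, q_{11}$ of the slides.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Item = Tuple[str, Tuple[str, ...], int]   # (A, right-hand side, dot position)
State = FrozenSet[Item]
Grammar = Dict[str, List[Tuple[str, ...]]]

GRAMMAR: Grammar = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("int",)],
}

def closure(items: Iterable[Item], g: Grammar) -> State:
    """Epsilon-closure: add [B -> . gamma] for every nonterminal B after a dot."""
    result = set(items)
    todo = list(result)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in g:
            for gamma in g[rhs[dot]]:
                new = (rhs[dot], gamma, 0)
                if new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)

def goto(q: State, symbol: str, g: Grammar) -> State:
    """delta(q, X): move the dot over X, then close."""
    moved = {(a, rhs, d + 1) for (a, rhs, d) in q if d < len(rhs) and rhs[d] == symbol}
    return closure(moved, g)

def canonical_lr0(g: Grammar, start: str = "S'"):
    """Return the list of states and the transition table of LR(G)."""
    states: List[State] = [closure({(start, g[start][0], 0)}, g)]
    delta: Dict[Tuple[int, str], int] = {}
    for q in states:  # the list grows while new states are discovered
        for symbol in sorted({rhs[d] for (_, rhs, d) in q if d < len(rhs)}):
            target = goto(q, symbol, g)
            if target not in states:
                states.append(target)
            delta[(states.index(q), symbol)] = states.index(target)
    return states, delta

states, delta = canonical_lr0(GRAMMAR)
print(len(states), len(delta))  # 12 22
```

The construction finds 12 states, matching the twelve item sets listed for the example, and 22 transitions.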
LR(0)-Parser

Idea for a parser:
- The parser manages a viable prefix $\alpha = X_1 \ldots X_m$ on the pushdown and uses $LR(G)$ to identify reduction spots.
- It can reduce with $A \rightarrow \gamma$ if $[A \rightarrow \gamma\,\bullet]$ is admissible for $\alpha$.

Optimization: We push the states instead of the $X_i$, so that the automaton does not have to process the pushdown's content anew all the time. Reduction with $A \rightarrow \gamma$ pops the uppermost $|\gamma|$ states and continues with the state then on top of the stack and input $A$.

Attention: This parser is only deterministic if each final state of the canonical LR(0)-automaton is conflict-free.
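LR(0)-Parser

... for example:

- $q_1 = \{[S' \rightarrow E\,\bullet],\ [E \rightarrow E\,\bullet + T]\}$
- $q_2 = \{[E \rightarrow T\,\bullet],\ [T \rightarrow T\,\bullet * F]\}$
- $q_3 = \{[T \rightarrow F\,\bullet]\}$
- $q_4 = \{[F \rightarrow \text{int}\,\bullet]\}$
- $q_9 = \{[E \rightarrow E + T\,\bullet],\ [T \rightarrow T\,\bullet * F]\}$
- $q_{10} = \{[T \rightarrow T * F\,\bullet]\}$
- $q_{11} = \{[F \rightarrow (\,E\,)\,\bullet]\}$

The final states $q_1, q_2, q_9$ contain more than one admissible item $\Longrightarrow$ non-deterministic!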
LR(0)-Parser

The construction of the LR(0)-parser:
States: $Q \cup \{f\}$ ($f$ fresh)
Start state: $q_0$
Final state: $f$
Transitions:
- Shift: $(p,\ a,\ p\,q)$ if $q = \delta(p, a)$
- Reduce: $(p\,q_1 \ldots q_m,\ \epsilon,\ p\,q)$ if $[A \rightarrow X_1 \ldots X_m\,\bullet] \in q_m$, $q = \delta(p, A)$
- Finish: $(q_0\,p,\ \epsilon,\ f)$ if $[S' \rightarrow S\,\bullet] \in p$

with $LR(G) = (Q, T, \delta, q_0, F)$.
LR(0)-Parser

Correctness:

We show: The accepting computations of an LR(0)-parser are in one-to-one correspondence with those of the shift-reduce parser $M_G^R$.

We conclude:
- The accepted language is exactly $L(G)$.
- The sequence of reductions of an accepting computation for a word $w \in T^{*}$ yields a reverse rightmost derivation of $G$ for $w$.
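LR(0)-Parser

Attention: Unfortunately, the LR(0)-parser is in general non-deterministic. We identify two reasons:
- Reduce-Reduce-Conflict: $[A \rightarrow \gamma\,\bullet],\ [A' \rightarrow \gamma'\,\bullet] \in q$ with $A \neq A' \lor \gamma \neq \gamma'$
- Shift-Reduce-Conflict: $[A \rightarrow \gamma\,\bullet],\ [A' \rightarrow \alpha\,\bullet\,a\,\beta] \in q$ with $a \in T$

for a state $q \in Q$. Those states are called LR(0)-unsuited.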
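Both conflict conditions are simple tests on a single item set. The following sketch classifies a state given as a set of items; the tuple encoding `(A, right-hand side, dot position)`, the function name `lr0_conflicts` and the hand-written states `Q1`, `Q2`, `Q3`, `Q9` (copied from the example item sets) are assumptions made for illustration.

```python
from typing import List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int]   # (A, right-hand side, dot position)

def lr0_conflicts(state: Set[Item], terminals: Set[str]) -> List[str]:
    """Name the kinds of LR(0)-conflicts present in one state."""
    # Complete items [A -> gamma .]; two different ones differ in A or in gamma.
    complete = {(a, rhs) for (a, rhs, dot) in state if dot == len(rhs)}
    # Items [A -> alpha . a beta] with a terminal a after the dot.
    can_shift = any(dot < len(rhs) and rhs[dot] in terminals for (_, rhs, dot) in state)
    found: List[str] = []
    if len(complete) > 1:
        found.append("reduce-reduce")
    if complete and can_shift:
        found.append("shift-reduce")
    return found

TERMINALS = {"+", "*", "(", ")", "int"}
Q1 = {("S'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
Q2 = {("E", ("T",), 1), ("T", ("T", "*", "F"), 1)}
Q3 = {("T", ("F",), 1)}
Q9 = {("E", ("E", "+", "T"), 3), ("T", ("T", "*", "F"), 1)}

for name, q in [("q1", Q1), ("q2", Q2), ("q3", Q3), ("q9", Q9)]:
    print(name, lr0_conflicts(q, TERMINALS))
```

The states $q_1$, $q_2$ and $q_9$ are reported with a shift-reduce conflict, while $q_3$ is conflict-free.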
Revisiting the Conflicts of the LR(0)-Automaton

What differentiates the particular reductions and shifts? While parsing counter * 2 + 40 with the grammar

- $E \rightarrow E + T \mid T$
- $T \rightarrow T * F \mid F$
- $F \rightarrow (\,E\,) \mid \text{name} \mid \text{int}$

the pushdown $(q_0\ T)$ occurs twice:
- Remaining input * 2 + 40: reduce with $E \rightarrow T$, or shift? Here the parser has to shift *, so that $T$ can later be extended by $T \rightarrow T * F$.
- Remaining input + 40: shift, or reduce with $E \rightarrow T$? Here the parser has to reduce.

Idea: Matching lookahead with right context matters!
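LR(k)-Grammars

Idea: Consider k-lookahead in conflict situations.

Definition: The reduced context-free grammar $G$ is called LR(k)-grammar, if for $\mathrm{First}_k(w) = \mathrm{First}_k(x)$ with:

$$
\begin{array}{l}
S \rightarrow^{*}_{R} \alpha\,A\,w \rightarrow \alpha\,\beta\,w \\
S \rightarrow^{*}_{R} \alpha'\,A'\,w' \rightarrow \alpha\,\beta\,x
\end{array}
$$

follows: $\alpha = \alpha' \land A = A' \land w' = x$

Strategy for testing grammars for the LR(k)-property:
1. Focus iteratively on all rightmost derivations $S \rightarrow^{*}_{R} \alpha\,X\,w \rightarrow \alpha\,\beta\,w$
2. Identify the handle $\alpha\,\beta$ in sentential forms $\alpha\,\beta\,w$
3. Determine the minimal $k$ such that $\mathrm{First}_k(w)$ associates $\beta$ with a unique $X \rightarrow \beta$ for non-prefix-free $\alpha\,\beta$s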
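LR(k)-Grammars

for example:

(1) $S \rightarrow A \mid B$, $A \rightarrow a\,A\,b \mid 0$, $B \rightarrow a\,B\,b\,b \mid 1$

... is not LL(k) for any $k$ but LR(0):
Let $S \rightarrow^{*}_{R} \alpha\,X\,w \rightarrow \alpha\,\beta\,w$. Then $\alpha\,\beta$ is of one of these forms:

$$A,\quad B,\quad a^{n}\,a\,A\,b,\quad a^{n}\,a\,B\,b\,b,\quad a^{n}\,0,\quad a^{n}\,1 \qquad (n \ge 0)$$

(2) $S \rightarrow a\,A\,c$, $A \rightarrow A\,b\,b \mid b$

... is also not LL(k) for any $k$ but again LR(0):
Let $S \rightarrow^{*}_{R} \alpha\,X\,w \rightarrow \alpha\,\beta\,w$. Then $\alpha\,\beta$ is of one of these forms:

$$a\,b,\quad a\,A\,b\,b,\quad a\,A\,c$$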
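LR(k)-Grammars

for example:

(3) $S \rightarrow a\,A\,c$, $A \rightarrow b\,b\,A \mid b$

... is not LR(0), but LR(1):
Let $S \rightarrow^{*}_{R} \alpha\,X\,w \rightarrow \alpha\,\beta\,w$ with $\{y\} = \mathrm{First}_k(w)$; then $\alpha\,\beta\,y$ is of one of these forms:

$$a\,b^{2n}\,b\,c,\quad a\,b^{2n}\,b\,b\,A\,c,\quad a\,A\,c$$

(4) $S \rightarrow a\,A\,c$, $A \rightarrow b\,A\,b \mid b$

... is not LR(k) for any $k \ge 0$:
Consider the rightmost derivations:

$$S \rightarrow^{*}_{R} a\,b^{n}\,A\,b^{n}\,c \rightarrow a\,b^{n}\,b\,b^{n}\,c$$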
LR(1)-Parsing

Idea: Let's equip items with 1-lookahead.

Definition (LR(1)-Item): An LR(1)-item is a pair $[B \rightarrow \alpha\,\bullet\,\beta,\ x]$ with

$$x \in \mathrm{Follow}_1(B) = \bigcup \{\mathrm{First}_1(\nu) \mid S \rightarrow^{*} \mu\,B\,\nu\}$$
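Admissible LR(1)-Items

The item $[B \rightarrow \alpha\,\bullet\,\beta,\ x]$ is admissible for $\gamma\,\alpha$ if:

$$S \rightarrow^{*}_{R} \gamma\,B\,w \quad \text{with} \quad \{x\} = \mathrm{First}_1(w)$$

(Figure: derivation tree with the nodes $S$ (rule $i_0$), $\gamma_0\,A_1$ (rule $i_1$), ..., $\gamma_{m-1}\,A_m$ (rule $i_m$), $\gamma_m\,B$ (rule $i$), with $\alpha\,\beta$ below $B$ and the remaining input $x\,w$ to its right.)

... with $\gamma_0 \ldots \gamma_m = \gamma$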
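The Characteristic LR(1)-Automaton

The set of admissible LR(1)-items for viable prefixes is again computed with the help of a finite automaton, $c(G, 1)$.

The automaton $c(G, 1)$:
States: LR(1)-items
Start state: $[S' \rightarrow \bullet\,S,\ \epsilon]$
Final states: $\{[B \rightarrow \gamma\,\bullet,\ x] \mid B \rightarrow \gamma \in P,\ x \in \mathrm{Follow}_1(B)\}$
Transitions:
(1) $([A \rightarrow \alpha\,\bullet\,X\,\beta,\ x],\ X,\ [A \rightarrow \alpha\,X\,\bullet\,\beta,\ x])$, $X \in (N \cup T)$
(2) $([A \rightarrow \alpha\,\bullet\,B\,\beta,\ x],\ \epsilon,\ [B \rightarrow \bullet\,\gamma,\ x'])$, $A \rightarrow \alpha\,B\,\beta,\ B \rightarrow \gamma \in P$, $x' \in \mathrm{First}_1(\beta) \odot_1 \{x\}$;

This automaton works like $c(G)$ but additionally manages a 1-prefix from $\mathrm{Follow}_1$ of the left-hand sides.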
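The Canonical LR(1)-Automaton

The canonical LR(1)-automaton $LR(G, 1)$ is created from $c(G, 1)$ by performing arbitrarily many $\epsilon$-transitions and then making the resulting automaton deterministic...

But again, it can be constructed directly from the grammar; analogously to LR(0), we need the $\epsilon$-closure $\delta_\epsilon$ as a helper function:

$$\delta_\epsilon(q) = q \cup \{[C \rightarrow \bullet\,\gamma,\ x] \mid \exists\, [A \rightarrow \alpha\,\bullet\,B\,\beta',\ x'] \in q,\ \beta \in (N \cup T)^{*} : B \rightarrow^{*} C\,\beta \ \land\ x \in \mathrm{First}_1(\beta\,\beta') \odot_1 \{x'\}\}$$

Then, we define:
States: Sets of LR(1)-items;
Start state: $\delta_\epsilon\{[S' \rightarrow \bullet\,S,\ \epsilon]\}$
Final states: $\{q \mid \exists\, A \rightarrow \alpha \in P : [A \rightarrow \alpha\,\bullet,\ x] \in q\}$
Transitions: $\delta(q, X) = \delta_\epsilon\{[A \rightarrow \alpha\,X\,\bullet\,\beta,\ x] \mid [A \rightarrow \alpha\,\bullet\,X\,\beta,\ x] \in q\}$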
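The LR(1) construction differs from the LR(0) one only in the lookahead that every item carries. The sketch below makes simplifying assumptions that are not part of the lecture: each item holds a single lookahead symbol, the empty string `""` stands for the lookahead $\epsilon$ (end of input), and the grammar has no $\epsilon$-productions, so that $\mathrm{First}_1(\beta') \odot_1 \{x'\}$ is $\mathrm{First}_1$ of the first symbol of $\beta'$ if $\beta'$ is non-empty and $\{x'\}$ otherwise. All function names are hypothetical; the grammar is the expression grammar in the form $F \rightarrow (\,E\,) \mid \text{int}$.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# (A, right-hand side, dot position, lookahead); "" marks the end of input
Item = Tuple[str, Tuple[str, ...], int, str]
State = FrozenSet[Item]
Grammar = Dict[str, List[Tuple[str, ...]]]

GRAMMAR: Grammar = {
    "S'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("int",)],
}

def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    """First_1 of every nonterminal (assumes no epsilon-productions)."""
    first: Dict[str, Set[str]] = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, alternatives in g.items():
            for rhs in alternatives:
                new = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def closure(items: Iterable[Item], g: Grammar, first: Dict[str, Set[str]]) -> State:
    """Epsilon-closure with propagation of 1-lookahead."""
    result = set(items)
    todo = list(result)
    while todo:
        _, rhs, dot, lookahead = todo.pop()
        if dot < len(rhs) and rhs[dot] in g:
            rest = rhs[dot + 1:]
            if not rest:
                lookaheads = {lookahead}      # nothing follows: inherit x'
            elif rest[0] in g:
                lookaheads = first[rest[0]]   # First_1 of the next nonterminal
            else:
                lookaheads = {rest[0]}        # the next terminal itself
            for gamma in g[rhs[dot]]:
                for x in lookaheads:
                    new = (rhs[dot], gamma, 0, x)
                    if new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)

def goto(q: State, symbol: str, g: Grammar, first: Dict[str, Set[str]]) -> State:
    moved = {(a, rhs, d + 1, x) for (a, rhs, d, x) in q if d < len(rhs) and rhs[d] == symbol}
    return closure(moved, g, first)

def canonical_lr1(g: Grammar, start: str = "S'") -> List[State]:
    first = first_sets(g)
    states: List[State] = [closure({(start, g[start][0], 0, "")}, g, first)]
    for q in states:  # the list grows while new states are discovered
        for symbol in sorted({rhs[d] for (_, rhs, d, _) in q if d < len(rhs)}):
            target = goto(q, symbol, g, first)
            if target not in states:
                states.append(target)
    return states

states = canonical_lr1(GRAMMAR)
print(len(states))  # 22
```

For this grammar the sketch finds 22 states, compared with 12 states of the canonical LR(0)-automaton. In the start state the item $[E \rightarrow \bullet\,T]$ appears with the lookaheads $\epsilon$ and $+$.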
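The Canonical LR(1)-Automaton

For example:

- $E \rightarrow E + T \mid T$
- $T \rightarrow T * F \mid F$
- $F \rightarrow (\,E\,) \mid \text{name} \mid \text{int}$

$\mathrm{First}_1(S') = \mathrm{First}_1(E) = \mathrm{First}_1(T) = \mathrm{First}_1(F) = \{\text{name},\ \text{int},\ (\,\}$

An item with a lookahead set abbreviates one item per element of the set. Only the alternative $F \rightarrow \text{int}$ is listed; the items for $F \rightarrow \text{name}$ are analogous.

- $q_0 = \{[S' \rightarrow \bullet\,E,\ \{\epsilon\}],\ [E \rightarrow \bullet\,E + T,\ \{\epsilon, +\}],\ [E \rightarrow \bullet\,T,\ \{\epsilon, +\}],\ [T \rightarrow \bullet\,T * F,\ \{\epsilon, +, *\}],\ [T \rightarrow \bullet\,F,\ \{\epsilon, +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{\epsilon, +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{\epsilon, +, *\}]\}$
- $q_1 = \delta(q_0, E) = \{[S' \rightarrow E\,\bullet,\ \{\epsilon\}],\ [E \rightarrow E\,\bullet + T,\ \{\epsilon, +\}]\}$
- $q_2 = \delta(q_0, T) = \{[E \rightarrow T\,\bullet,\ \{\epsilon, +\}],\ [T \rightarrow T\,\bullet * F,\ \{\epsilon, +, *\}]\}$
- $q_3 = \delta(q_0, F) = \{[T \rightarrow F\,\bullet,\ \{\epsilon, +, *\}]\}$
- $q_4 = \delta(q_0, \text{int}) = \{[F \rightarrow \text{int}\,\bullet,\ \{\epsilon, +, *\}]\}$
- $q_5 = \delta(q_0, (\,) = \{[F \rightarrow (\,\bullet\,E\,),\ \{\epsilon, +, *\}],\ [E \rightarrow \bullet\,E + T,\ \{), +\}],\ [E \rightarrow \bullet\,T,\ \{), +\}],\ [T \rightarrow \bullet\,T * F,\ \{), +, *\}],\ [T \rightarrow \bullet\,F,\ \{), +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{), +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{), +, *\}]\}$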
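The Canonical LR(1)-Automaton

- $q_5' = \delta(q_5, (\,) = \{[F \rightarrow (\,\bullet\,E\,),\ \{), +, *\}],\ [E \rightarrow \bullet\,E + T,\ \{), +\}],\ [E \rightarrow \bullet\,T,\ \{), +\}],\ [T \rightarrow \bullet\,T * F,\ \{), +, *\}],\ [T \rightarrow \bullet\,F,\ \{), +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{), +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{), +, *\}]\}$
- $q_6 = \delta(q_1, +) = \{[E \rightarrow E + \bullet\,T,\ \{\epsilon, +\}],\ [T \rightarrow \bullet\,T * F,\ \{\epsilon, +, *\}],\ [T \rightarrow \bullet\,F,\ \{\epsilon, +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{\epsilon, +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{\epsilon, +, *\}]\}$
- $q_7 = \delta(q_2, *) = \{[T \rightarrow T * \bullet\,F,\ \{\epsilon, +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{\epsilon, +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{\epsilon, +, *\}]\}$
- $q_8 = \delta(q_5, E) = \{[F \rightarrow (\,E\,\bullet\,),\ \{\epsilon, +, *\}],\ [E \rightarrow E\,\bullet + T,\ \{), +\}]\}$
- $q_9 = \delta(q_6, T) = \{[E \rightarrow E + T\,\bullet,\ \{\epsilon, +\}],\ [T \rightarrow T\,\bullet * F,\ \{\epsilon, +, *\}]\}$
- $q_{10} = \delta(q_7, F) = \{[T \rightarrow T * F\,\bullet,\ \{\epsilon, +, *\}]\}$
- $q_{11} = \delta(q_8, )\,) = \{[F \rightarrow (\,E\,)\,\bullet,\ \{\epsilon, +, *\}]\}$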
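The Canonical LR(1)-Automaton

The states reached inside parentheses are copies of the previous ones with the lookahead $)$ in place of $\epsilon$:

- $q_2' = \delta(q_5', T) = \{[E \rightarrow T\,\bullet,\ \{), +\}],\ [T \rightarrow T\,\bullet * F,\ \{), +, *\}]\}$
- $q_3' = \delta(q_5', F) = \{[T \rightarrow F\,\bullet,\ \{), +, *\}]\}$
- $q_4' = \delta(q_5', \text{int}) = \{[F \rightarrow \text{int}\,\bullet,\ \{), +, *\}]\}$
- $q_6' = \delta(q_8', +) = \{[E \rightarrow E + \bullet\,T,\ \{), +\}],\ [T \rightarrow \bullet\,T * F,\ \{), +, *\}],\ [T \rightarrow \bullet\,F,\ \{), +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{), +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{), +, *\}]\}$
- $q_7' = \delta(q_9', *) = \{[T \rightarrow T * \bullet\,F,\ \{), +, *\}],\ [F \rightarrow \bullet\,(\,E\,),\ \{), +, *\}],\ [F \rightarrow \bullet\,\text{int},\ \{), +, *\}]\}$
- $q_8' = \delta(q_5', E) = \{[F \rightarrow (\,E\,\bullet\,),\ \{), +, *\}],\ [E \rightarrow E\,\bullet + T,\ \{), +\}]\}$
- $q_9' = \delta(q_6', T) = \{[E \rightarrow E + T\,\bullet,\ \{), +\}],\ [T \rightarrow T\,\bullet * F,\ \{), +, *\}]\}$
- $q_{10}' = \delta(q_7', F) = \{[T \rightarrow T * F\,\bullet,\ \{), +, *\}]\}$
- $q_{11}' = \delta(q_8', )\,) = \{[F \rightarrow (\,E\,)\,\bullet,\ \{), +, *\}]\}$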
The Canonical LR(1)-Automaton

(Figure: transition graph of the canonical LR(1)-automaton for the expression grammar. It consists of the states 0 to 11 and of the primed copies 2' to 11'; the edges are labelled with grammar symbols.)
The Canonical LR(1)-Automaton

Discussion:
- In the example, the number of states was almost doubled... and it can become even worse.
- The conflicts in states $q_1, q_2, q_9$ are now resolved! E.g. we have for

$$q_9 = \{[E \rightarrow E + T\,\bullet,\ \{\epsilon, +\}],\ [T \rightarrow T\,\bullet * F,\ \{\epsilon, +, *\}]\}$$

with:

$$\{\epsilon, +\} \cap (\mathrm{First}_1(*\,F) \odot_1 \{\epsilon, +, *\}) = \{\epsilon, +\} \cap \{*\} = \emptyset$$
The LR(1)-Parser:

(Figure: the parser is driven by the two tables action and goto and produces the Output.)

- The goto-table encodes the transitions: $\text{goto}[q, X] = \delta(q, X) \in Q$
- The action-table describes for every state $q$ and possible lookahead $w$ the necessary action.
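The LR(1)-Parser

The construction of the LR(1)-parser:
States: $Q \cup \{f\}$ ($f$ fresh)
Start state: $q_0$
Final state: $f$
Transitions:
- Shift: $(p,\ a,\ p\,q)$ if $q = \text{goto}[p, a]$, $s = \text{action}[p, w]$
- Reduce: $(p\,q_1 \ldots q_{|\beta|},\ \epsilon,\ p\,q)$ if $[A \rightarrow \beta\,\bullet] \in q_{|\beta|}$, $q = \text{goto}[p, A]$, $[A \rightarrow \beta\,\bullet] = \text{action}[q_{|\beta|}, w]$
- Finish: $(q_0\,p,\ \epsilon,\ f)$ if $[S' \rightarrow S\,\bullet] \in p$

with $LR(G, 1) = (Q, T, \delta, q_0, F)$.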
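The LR(1)-Parser:

Possible actions are:
- shift // Shift-operation
- reduce ($A \rightarrow \gamma$) // Reduction with callback/output
- error // Error

... for example:

- $E \rightarrow E + T \mid T$
- $T \rightarrow T * F \mid F$
- $F \rightarrow (\,E\,) \mid \text{int}$

In the table, s stands for shift and an entry $A, i$ for a reduction with alternative $i$ of $A$ (counted from 0); empty entries are not shown on the slide for these rows.

| action | $\epsilon$ | int | ( | ) | + | * |
| --- | --- | --- | --- | --- | --- | --- |
| $q_1$ | $S', 0$ | | | | s | |
| $q_2$ | $E, 1$ | | | | $E, 1$ | s |
| $q_2'$ | | | | $E, 1$ | $E, 1$ | s |
| $q_3$ | $T, 1$ | | | | $T, 1$ | $T, 1$ |
| $q_3'$ | | | | $T, 1$ | $T, 1$ | $T, 1$ |
| $q_4$ | $F, 1$ | | | | $F, 1$ | $F, 1$ |
| $q_4'$ | | | | $F, 1$ | $F, 1$ | $F, 1$ |
| $q_9$ | $E, 0$ | | | | $E, 0$ | s |
| $q_9'$ | | | | $E, 0$ | $E, 0$ | s |
| $q_{10}$ | $T, 0$ | | | | $T, 0$ | $T, 0$ |
| $q_{10}'$ | | | | $T, 0$ | $T, 0$ | $T, 0$ |
| $q_{11}$ | $F, 0$ | | | | $F, 0$ | $F, 0$ |
| $q_{11}'$ | | | | $F, 0$ | $F, 0$ | $F, 0$ |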
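How the two tables drive the parser can be seen on a small hand-made example. The sketch below does not use the expression grammar but the tiny grammar $S \rightarrow A\,B$, $A \rightarrow a$, $B \rightarrow b$ from the shift-reduce example; the tables `ACTION` and `GOTO`, their state numbers and the function `parse` are written by hand for illustration and are assumptions. The empty string `""` stands for the lookahead $\epsilon$ (end of input), and the finish step is encoded as an `"accept"` action.

```python
from typing import Dict, List, Sequence, Tuple

# Productions, indexed by the numbers used in the reduce entries
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("S", ("A", "B")),   # 0
    ("A", ("a",)),       # 1
    ("B", ("b",)),       # 2
]

# action[state, lookahead]; missing entries mean "error"
ACTION: Dict[Tuple[int, str], Tuple[str, int]] = {
    (0, "a"): ("shift", -1),
    (3, "b"): ("reduce", 1),
    (2, "b"): ("shift", -1),
    (5, ""): ("reduce", 2),
    (4, ""): ("reduce", 0),
    (1, ""): ("accept", -1),
}

# goto[state, symbol] = delta(state, symbol), for terminals and nonterminals
GOTO: Dict[Tuple[int, str], int] = {
    (0, "S"): 1, (0, "A"): 2, (0, "a"): 3, (2, "B"): 4, (2, "b"): 5,
}

def parse(tokens: Sequence[str]) -> List[str]:
    """Return the reductions of an accepting run (a reverse rightmost derivation)."""
    stack = [0]                   # pushdown of states, start state on the bottom
    output: List[str] = []
    rest = list(tokens) + [""]    # input followed by the end marker
    pos = 0
    while True:
        entry = ACTION.get((stack[-1], rest[pos]))
        if entry is None:
            raise SyntaxError(f"unexpected {rest[pos]!r} in state {stack[-1]}")
        kind, rule = entry
        if kind == "shift":
            stack.append(GOTO[(stack[-1], rest[pos])])
            pos += 1
        elif kind == "reduce":
            lhs, rhs = RULES[rule]
            del stack[len(stack) - len(rhs):]       # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])    # continue with input lhs
            output.append(f"{lhs} -> {' '.join(rhs)}")
        else:
            return output

print(parse("ab"))  # ['A -> a', 'B -> b', 'S -> A B']
```

Read backwards, the emitted reductions S -> A B, B -> b, A -> a form a rightmost derivation of `ab`.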
The Canonical LR(1)-Automaton

In general: We identify two conflicts:
- Reduce-Reduce-Conflict: $[A \rightarrow \gamma\,\bullet,\ x],\ [A' \rightarrow \gamma'\,\bullet,\ x] \in q$ with $A \neq A' \lor \gamma \neq \gamma'$
- Shift-Reduce-Conflict: $[A \rightarrow \gamma\,\bullet,\ x],\ [A' \rightarrow \alpha\,\bullet\,a\,\beta,\ y] \in q$ with $a \in T$ and $x \in \{a\}$

for a state $q \in Q$. Such states are now called LR(1)-unsuited.

For arbitrary $k$, the condition of the Shift-Reduce-Conflict becomes $x \in \{a\} \odot_k \mathrm{First}_k(\beta) \odot_k \{y\}$, and such states are called LR(k)-unsuited.
Special LR(k)-Subclasses

Theorem: A reduced context-free grammar $G$ is LR(k) iff the canonical LR(k)-automaton $LR(G, k)$ has no LR(k)-unsuited states.

Discussion:
- Our example apparently is LR(1).
- In general, the canonical LR(k)-automaton has many more states than $LR(G) = LR(G, 0)$.
- Therefore in practice, subclasses of LR(k)-grammars are often considered, which only use $LR(G)$...
- For resolving conflicts, the items are assigned special lookahead sets:
  1. independent of the state itself $\Longrightarrow$ Simple LR(k)
  2. dependent on the state itself $\Longrightarrow$ LALR(k)
Syntactic Analysis
Chapter 5: Summary
Parsing Methods

(Figure: diagram of language classes with the labels: deterministic languages = LR(1) = ... = LR(k); LALR(k); SLR(k); LR(0); regular languages; LL(1); LL(k).)

Discussion:
- All context-free languages that can be parsed with a deterministic pushdown automaton can be characterized with an LR(1)-grammar.
- LR(0)-grammars describe all prefix-free deterministic context-free languages.
- The language classes of LL(k)-grammars form a hierarchy within the deterministic context-free languages.
Lexical and Syntactical Analysis:

Concept of specification and implementation:
- A regular expression such as 0 | [1-9][0-9]* is passed to a generator, which produces the scanner.
- A grammar such as E → E{op}E is passed to a generator, which produces the parser.

(Figures recapitulating the individual steps:)
- From Regular Expressions to Finite Automata
- From Finite Automata to Scanners
- Computation of lookahead sets
- From Item-Pushdown Automata to LL(1)-Parsers
- From characteristic to canonical Automata
- From Shift-Reduce-Parsers to LR(1)-Parsers