Syntactic Analysis, Chapter 4: Bottom-up Analysis

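Bottom-up Analysis
Attention: Many grammars are not LL(k)! One reason for this is left recursion.
Definition: A grammar $G$ is called left-recursive if $A \Rightarrow^+ A\,\beta$ for some $A \in N$ and $\beta \in (T \cup N)^*$.
Example: the grammar
$E \to E + T \mid T$
$T \to T * F \mid F$
$F \to (\,E\,) \mid \mathtt{name} \mid \mathtt{int}$
(the alternatives of each nonterminal are numbered 0, 1, 2 from left to right) is left-recursive.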

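Bottom-up Analysis
Theorem: Let a grammar $G$ be reduced and left-recursive. Then $G$ is not LL(k) for any $k$.
Proof: Let $A \to A\,\beta \mid \alpha \in P$, and let $A$ be reachable from $S$, i.e. $S \Rightarrow^*_L u\,A\,\gamma$. Applying $A \to A\,\beta$ $n$ times yields $S \Rightarrow^*_L u\,A\,\beta^n\gamma$.
Assumption: $G$ is LL(k). At this occurrence of $A$, the alternatives $A \to \alpha$ and $A \to A\,\beta$ must then be distinguishable by $k$ symbols of lookahead, so for all $n \geq 0$:
$\mathrm{First}_k(\alpha\,\beta^n\gamma) \cap \mathrm{First}_k(\alpha\,\beta^{n+1}\gamma) = \emptyset$
Case 1: $\beta \Rightarrow^* \varepsilon$: Then $\mathrm{First}_k(\alpha\,\beta^n\gamma) \subseteq \mathrm{First}_k(\alpha\,\beta^{n+1}\gamma)$, and the left-hand set is non-empty because $G$ is reduced. Contradiction!
Case 2: $\beta \Rightarrow^* w \neq \varepsilon$: Choose $\alpha \Rightarrow^* x$ and $\gamma \Rightarrow^* y$ (possible since $G$ is reduced). Because $|x\,w^k| \geq k$, the words $x\,w^k y$ and $x\,w^{k+1} y$ have the same prefix of length $k$, hence
$\mathrm{First}_k(\alpha\,\beta^k\gamma) \cap \mathrm{First}_k(\alpha\,\beta^{k+1}\gamma) \neq \emptyset$. Contradiction!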
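Left recursion can be detected mechanically. The following Python sketch is our own illustration, not part of the slides: it assumes a hypothetical dictionary encoding of grammars (nonterminal to list of right-hand sides, every other symbol counts as a terminal) and reports every nonterminal $A$ with $A \Rightarrow^+ A\,\beta$, including indirect left recursion and left recursion hidden behind nullable prefixes. By the theorem, a reduced grammar for which this set is non-empty cannot be LL(k).

```python
from typing import Dict, List, Set

# Hypothetical grammar encoding: nonterminal -> list of right-hand sides.
# Every symbol that is not a key of the dictionary counts as a terminal.
Grammar = Dict[str, List[List[str]]]


def nullable_nonterminals(g: Grammar) -> Set[str]:
    """Least fixpoint: the nonterminals that derive the empty word."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in g.items():
            if lhs not in nullable and any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable


def left_recursive(g: Grammar) -> Set[str]:
    """All nonterminals A with A =>+ A beta (direct or indirect)."""
    nullable = nullable_nonterminals(g)
    # Edge A -> B iff some rule A -> gamma B delta has a nullable gamma,
    # i.e. A => gamma B delta =>* B delta.
    succ: Dict[str, Set[str]] = {a: set() for a in g}
    for lhs, rhss in g.items():
        for rhs in rhss:
            for sym in rhs:
                if sym in g:
                    succ[lhs].add(sym)
                if sym not in nullable:
                    break
    result: Set[str] = set()
    for a in g:
        # Depth-first search: can A reach itself again in at least one step?
        todo, seen = list(succ[a]), set()
        while todo:
            b = todo.pop()
            if b == a:
                result.add(a)
                break
            if b not in seen:
                seen.add(b)
                todo.extend(succ[b])
    return result
```

For the expression grammar above, `left_recursive` reports `E` and `T`, while `F` is not left-recursive.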

Shift-Reduce Parser
Idea: We delay the decision whether to reduce until we know whether the input matches the right-hand side of a rule! (Donald Knuth)
Construction: the shift-reduce parser $M_G^R$
The input is shifted successively onto the pushdown.
If a complete right-hand side (a handle) is on top of the pushdown, it is replaced (reduced) by the corresponding left-hand side.

Shift-Reduce Parser
Example: The pushdown automaton for the grammar $S \to A\,B$, $A \to a$, $B \to b$:
States: $q_0, f, a, b, A, B, S$; start state: $q_0$; end state: $f$
Transitions (pushdown top, input, new pushdown top):
  q0      a    q0 a
  a       ε    A
  A       b    A b
  b       ε    B
  A B     ε    S
  q0 S    ε    f

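Shift-Reduce Parser
Construction: In general, we create an automaton $M_G^R = (Q, T, \delta, q_0, F)$ with:
$Q = T \cup N \cup \{q_0, f\}$ ($q_0, f$ fresh); $F = \{f\}$;
Transitions:
$\delta = \{(q, x, q\,x) \mid q \in Q, x \in T\}$ // shift transitions
$\quad \cup\ \{(q\,\alpha, \varepsilon, q\,A) \mid q \in Q, A \to \alpha \in P\}$ // reduce transitions
$\quad \cup\ \{(q_0\,S, \varepsilon, f)\}$ // finish
Example computation:
$(q_0, a\,b) \vdash (q_0\,a, b) \vdash (q_0\,A, b) \vdash (q_0\,A\,b, \varepsilon) \vdash (q_0\,A\,B, \varepsilon) \vdash (q_0\,S, \varepsilon) \vdash (f, \varepsilon)$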
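The nondeterminism of $M_G^R$ can be made tangible by exploring all shift and reduce choices. The sketch below is a hypothetical illustration (the function name `shift_reduce` and the encoding of rules as `(lhs, rhs)` pairs are ours): it searches the configurations depth-first, tries reductions before shifts, and returns one accepting sequence of configurations, where a pushdown is stored without the bottom state $q_0$. It assumes a grammar without ε-rules, since a reduction with an empty right-hand side could be applied forever.

```python
from typing import List, Optional, Sequence, Tuple

Rule = Tuple[str, Tuple[str, ...]]                 # (lhs, rhs), our own encoding
Config = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (pushdown above q0, remaining input)


def shift_reduce(rules: Sequence[Rule], start: str,
                 word: Sequence[str]) -> Optional[List[Config]]:
    """Depth-first exploration of the nondeterministic automaton M_G^R.
    Returns one accepting sequence of configurations, or None if the word
    is rejected. Assumes a grammar without epsilon-rules."""
    assert all(rhs for _, rhs in rules), "sketch assumes no epsilon-rules"
    w = tuple(word)
    visited = set()

    def search(stack: Tuple[str, ...], pos: int) -> Optional[List[Config]]:
        config = (stack, w[pos:])
        if config in visited:          # already explored (or on the current path)
            return None
        visited.add(config)
        if stack == (start,) and pos == len(w):
            return [config]            # finish: (q0 S, eps) |- (f, eps)
        for lhs, rhs in rules:         # reduce: (q alpha, eps, q A)
            n = len(rhs)
            if n <= len(stack) and stack[len(stack) - n:] == rhs:
                rest = search(stack[:len(stack) - n] + (lhs,), pos)
                if rest is not None:
                    return [config] + rest
        if pos < len(w):               # shift: (q, x, q x)
            rest = search(stack + (w[pos],), pos + 1)
            if rest is not None:
                return [config] + rest
        return None

    return search((), 0)
```

On the input `ab`, the pushdown contents along the returned path are the same as in the example computation. Such an exhaustive search may explore many configurations in general; identifying the right moves deterministically is exactly what LR parsing is about.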

Shift-Reduce Parser
Observation: The sequence of reductions corresponds to a reverse rightmost derivation for the input.
To prove correctness, we have to prove: $(\varepsilon, w) \vdash^* (A, \varepsilon)$ iff $A \Rightarrow^* w$.
The shift-reduce pushdown automaton $M_G^R$ is in general also non-deterministic.
For a deterministic parsing algorithm, we have to identify the computation states in which to reduce $\Longrightarrow$ LR parsing.

Reverse Rightmost Derivations in Shift-Reduce Parsers
Idea: Observe the reverse rightmost derivations of $M_G^R$! Example input: counter * 2 + 40, i.e. the tokens $\mathtt{name} * \mathtt{int} + \mathtt{int}$.
(Figure: parse tree of the input, nodes labelled with the rules $E^0, E^1, T^0, T^1, F^1, F^2$.)
  Pushdown          Input
  (q0)              counter * 2 + 40
  (q0 name)         * 2 + 40
  (q0 F)            * 2 + 40
  (q0 T)            * 2 + 40
  (q0 T *)          2 + 40
  (q0 T * int)      + 40
  (q0 T * F)        + 40
  (q0 T)            + 40
  (q0 E)            + 40
  (q0 E +)          40
  (q0 E + int)      ε
  (q0 E + F)        ε
  (q0 E + T)        ε
  (q0 E)            ε
  (f)               ε

Reverse Rightmost Derivations in Shift-Reduce Parsers
Generic observation: In a sequence of configurations of $M_G^R$
$(q_0\,\alpha\,\gamma, v) \vdash (q_0\,\alpha\,B, v) \vdash^* (q_0\,S, \varepsilon)$
we call $\alpha\,\gamma$ a viable prefix for the complete item $[B \to \gamma\,\bullet]$.
In the example (start symbol $E$), the configuration $(q_0\,T * F,\ +\,40)$ shows that $T * F$ is a viable prefix for $[T \to T * F\,\bullet]$, and the configuration $(q_0\,E + F,\ \varepsilon)$ shows that $E + F$ is a viable prefix for $[T \to F\,\bullet]$.

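Bottom-up Analysis: Viable Prefix
$\alpha\,\gamma$ is viable for $[B \to \gamma\,\bullet]$ iff $S \Rightarrow^*_R \alpha\,B\,v$.
(Figure: the path from the root $A_0 = S$ of the derivation tree down to $B$: $A_0 \to^{i_0} \alpha_1 A_1 \ldots$, $A_1 \to^{i_1} \alpha_2 A_2 \ldots$, …, $A_{m-1} \to^{i_{m-1}} \alpha_m B \ldots$, $B \to^{i} \gamma$, with $\alpha = \alpha_1 \ldots \alpha_m$.)
Conversely, for an arbitrary viable word $\alpha'$ we can determine the set of all rules that may match later on ...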

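Bottom-up Analysis: Admissible Items
The item $[B \to \gamma\,\bullet\,\beta]$ is called admissible for $\alpha'$ iff $S \Rightarrow^*_R \alpha\,B\,v$ with $\alpha' = \alpha\,\gamma$:
(Figure: the path from the root $A_0 = S$ down to $B$: $A_0 \to^{i_0} \alpha_1 A_1 \ldots$, $A_1 \to^{i_1} \alpha_2 A_2 \ldots$, …, $A_{m-1} \to^{i_{m-1}} \alpha_m B \ldots$, $B \to^{i} \gamma\,\beta$, with $\alpha = \alpha_1 \ldots \alpha_m$.)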

Characteristic Automaton
Observation: The set of viable prefixes from $(N \cup T)^*$ for (admissible) items can be computed from the content of the shift-reduce parser's pushdown with the help of a finite automaton:
States: items
Start state: $[S' \to \bullet\, S]$
Final states: $\{[B \to \gamma\,\bullet] \mid B \to \gamma \in P\}$
Transitions:
(1) $([A \to \alpha \bullet X \beta], X, [A \to \alpha X \bullet \beta])$, $X \in (N \cup T)$, $A \to \alpha X \beta \in P$;
(2) $([A \to \alpha \bullet B \beta], \varepsilon, [B \to \bullet\,\gamma])$, $A \to \alpha B \beta,\ B \to \gamma \in P$;
The automaton $c(G)$ is called the characteristic automaton for $G$.
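As a small sketch of how $c(G)$ can be run on a pushdown content, the code below assumes the expression grammar $E \to E + T \mid T$, $T \to T * F \mid F$, $F \to (\,E\,) \mid \mathtt{int}$ augmented with $S' \to E$, and an item encoding `(lhs, rhs, dot)` of our own. It tracks the set of currently reachable items (a subset simulation of the NFA): transitions of kind (2) are followed by `eps_closure`, transitions of kind (1) move the dot. The helper names are hypothetical.

```python
from typing import List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int]  # (lhs, rhs, dot position)

# Example grammar of the slides, augmented with S' -> E ("S'" is our naming).
RULES = [("S'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("int",))]
NONTERMINALS = {lhs for lhs, _ in RULES}


def eps_closure(items: Set[Item]) -> Set[Item]:
    """Follow transitions of kind (2): [A -> alpha . B beta] --eps--> [B -> . gamma]."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b, gamma in RULES:
                new = (b, gamma, 0)
                if b == rhs[dot] and new not in result:
                    result.add(new)
                    todo.append(new)
    return result


def run_characteristic_automaton(prefix: List[str]) -> Set[Item]:
    """Simulate c(G) on a pushdown content. The result is the set of items
    admissible for the prefix; it is empty iff the prefix is not viable."""
    current = eps_closure({("S'", ("E",), 0)})
    for x in prefix:
        # Transitions of kind (1): move the dot over the symbol x.
        moved = {(l, r, d + 1) for (l, r, d) in current if d < len(r) and r[d] == x}
        current = eps_closure(moved)
    return current


def complete_items(items: Set[Item]) -> Set[Item]:
    """The final states of c(G) among the given items: [B -> gamma .]."""
    return {it for it in items if it[2] == len(it[1])}
```

For the pushdown `T * F` the only complete admissible item is $[T \to T * F\,\bullet]$, and for `E + F` it is $[T \to F\,\bullet]$, matching the viable prefixes from the trace of counter * 2 + 40.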

Characteristic Automaton
For example: for $E \to E + T \mid T$, $T \to T * F \mid F$, $F \to (\,E\,) \mid \mathtt{int}$:
(Figure: the characteristic automaton $c(G)$, extended by $S' \to E$. Its states are all items of the grammar, e.g. $[S' \to \bullet E]$, $[E \to \bullet E + T]$, $[E \to E \bullet + T]$, …, $[F \to (\,E\,)\,\bullet]$, $[F \to \mathtt{int}\,\bullet]$; consuming transitions move the dot over one symbol, and ε-transitions lead from an item with the dot before a nonterminal $X$ to the items $[X \to \bullet \ldots]$.)

Canonical LR(0)-Automaton
The canonical LR(0)-automaton $LR(G)$ is created from $c(G)$ by:
1 performing arbitrarily many ε-transitions after every consuming transition
2 performing the powerset construction
... for example: (Figure: the canonical LR(0)-automaton of the expression grammar, with states 0 to 11 and transitions labelled E, T, F, (, ), int, + and *.)

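Canonical LR(0)-Automaton
Example: For $E \to E + T \mid T$, $T \to T * F \mid F$, $F \to (\,E\,) \mid \mathtt{int}$ we determine:
$q_0 = \{[S' \to \bullet E], [E \to \bullet E + T], [E \to \bullet T], [T \to \bullet T * F], [T \to \bullet F], [F \to \bullet (\,E\,)], [F \to \bullet\,\mathtt{int}]\}$
$q_1 = \delta(q_0, E) = \{[S' \to E\,\bullet], [E \to E\,\bullet + T]\}$
$q_2 = \delta(q_0, T) = \{[E \to T\,\bullet], [T \to T\,\bullet * F]\}$
$q_3 = \delta(q_0, F) = \{[T \to F\,\bullet]\}$
$q_4 = \delta(q_0, \mathtt{int}) = \{[F \to \mathtt{int}\,\bullet]\}$
$q_5 = \delta(q_0, \mathtt{(}) = \{[F \to (\,\bullet E\,)], [E \to \bullet E + T], [E \to \bullet T], [T \to \bullet T * F], [T \to \bullet F], [F \to \bullet (\,E\,)], [F \to \bullet\,\mathtt{int}]\}$
$q_6 = \delta(q_1, +) = \{[E \to E + \bullet T], [T \to \bullet T * F], [T \to \bullet F], [F \to \bullet (\,E\,)], [F \to \bullet\,\mathtt{int}]\}$
$q_7 = \delta(q_2, *) = \{[T \to T * \bullet F], [F \to \bullet (\,E\,)], [F \to \bullet\,\mathtt{int}]\}$
$q_8 = \delta(q_5, E) = \{[F \to (\,E\,\bullet\,)], [E \to E\,\bullet + T]\}$
$q_9 = \delta(q_6, T) = \{[E \to E + T\,\bullet], [T \to T\,\bullet * F]\}$
$q_{10} = \delta(q_7, F) = \{[T \to T * F\,\bullet]\}$
$q_{11} = \delta(q_8, \mathtt{)}) = \{[F \to (\,E\,)\,\bullet]\}$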

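Canonical LR(0)-Automaton
Observation: The canonical LR(0)-automaton can be created directly from the grammar. For this we need a helper function $\delta_\varepsilon^*$ (ε-closure):
$\delta_\varepsilon^*(q) = q \cup \{[B \to \bullet\,\gamma] \mid \exists\,[A \to \alpha \bullet B'\,\beta'] \in q,\ \beta \in (N \cup T)^* : B' \Rightarrow^* B\,\beta,\ B \to \gamma \in P\}$
We define:
States: sets of items;
Start state: $\delta_\varepsilon^*\{[S' \to \bullet\, S]\}$
Final states: $\{q \mid \exists A \to \alpha \in P : [A \to \alpha\,\bullet] \in q\}$
Transitions: $\delta(q, X) = \delta_\varepsilon^*\{[A \to \alpha X \bullet \beta] \mid [A \to \alpha \bullet X \beta] \in q\}$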
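The direct construction can be sketched as a worklist algorithm. In this hypothetical Python version (grammar augmented with $S' \to E$, item encoding `(lhs, rhs, dot)` of our own), `closure` plays the role of $\delta_\varepsilon^*$: it adds $[B \to \bullet\,\gamma]$ repeatedly, which covers the condition $B' \Rightarrow^* B\,\beta$ transitively. `goto` implements $\delta(q, X)$. States are numbered in discovery order, which need not match the numbering $q_0, \ldots, q_{11}$ on the slides.

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, Tuple[str, ...], int]  # (lhs, rhs, dot position)
State = FrozenSet[Item]

RULES = [("S'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("int",))]
NONTERMINALS = {lhs for lhs, _ in RULES}


def closure(items) -> State:
    """delta_eps^*: add [B -> . gamma] for each nonterminal B right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b, gamma in RULES:
                new = (b, gamma, 0)
                if b == rhs[dot] and new not in result:
                    result.add(new)
                    todo.append(new)
    return frozenset(result)


def goto(q: State, x: str) -> State:
    """delta(q, X) = delta_eps^*({[A -> alpha X . beta] | [A -> alpha . X beta] in q})."""
    return closure({(l, r, d + 1) for (l, r, d) in q if d < len(r) and r[d] == x})


def canonical_lr0() -> Tuple[List[State], Dict[Tuple[int, str], int]]:
    """Worklist construction of LR(G); returns the states and the transitions."""
    states: List[State] = [closure({("S'", ("E",), 0)})]
    delta: Dict[Tuple[int, str], int] = {}
    i = 0
    while i < len(states):
        for x in sorted({r[d] for (_, r, d) in states[i] if d < len(r)}):
            target = goto(states[i], x)
            if target not in states:
                states.append(target)
            delta[(i, x)] = states.index(target)
        i += 1
    return states, delta
```

For the expression grammar the sketch produces 12 states, with a start state of 7 items, as in the listing of $q_0, \ldots, q_{11}$.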

LR(0)-Parser
Idea for a parser:
The parser manages a viable prefix $\alpha = X_1 \ldots X_m$ on the pushdown and uses $LR(G)$ to identify reduction spots.
It can reduce with $A \to \gamma$ if $[A \to \gamma\,\bullet]$ is admissible for $\alpha$.
Optimization: We push the states instead of the $X_i$, so that the automaton does not have to process the pushdown's content anew every time.
Reduction with $A \to \gamma$ pops the topmost $|\gamma|$ states and continues with the state on top of the stack and input $A$.
Attention: This parser is only deterministic if each final state of the canonical LR(0)-automaton is conflict-free.

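LR(0)-Parser
... for example:
$q_1 = \{[S' \to E\,\bullet], [E \to E\,\bullet + T]\}$
$q_2 = \{[E \to T\,\bullet], [T \to T\,\bullet * F]\}$
$q_3 = \{[T \to F\,\bullet]\}$
$q_4 = \{[F \to \mathtt{int}\,\bullet]\}$
$q_9 = \{[E \to E + T\,\bullet], [T \to T\,\bullet * F]\}$
$q_{10} = \{[T \to T * F\,\bullet]\}$
$q_{11} = \{[F \to (\,E\,)\,\bullet]\}$
The final states $q_1, q_2, q_9$ contain more than one admissible item $\Longrightarrow$ non-deterministic!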

LR(0)-Parser
The construction of the LR(0)-parser:
States: $Q \cup \{f\}$ ($f$ fresh)
Start state: $q_0$
Final state: $f$
Transitions:
Shift: $(p, a, p\,q)$ if $q = \delta(p, a)$
Reduce: $(p\,q_1 \ldots q_m, \varepsilon, p\,q)$ if $[A \to X_1 \ldots X_m\,\bullet] \in q_m$, $q = \delta(p, A)$
Finish: $(q_0\,p, \varepsilon, f)$ if $[S' \to S\,\bullet] \in p$
with $LR(G) = (Q, T, \delta, q_0, F)$.
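A minimal sketch of this construction for the small grammar $S \to A\,B$, $A \to a$, $B \to b$ from the shift-reduce example, augmented with $S' \to S$. The LR(0)-automaton was computed by hand and its state numbers, like the names `DELTA`, `COMPLETE` and `lr0_parse`, are our own. As in the construction, the pushdown holds states rather than grammar symbols, a reduction pops $m$ states and continues with $\delta(p, A)$, and the finish step accepts on pushdown $q_0\,p$ with $[S' \to S\,\bullet] \in p$. This grammar has no LR(0)-unsuited states, so the parser is deterministic.

```python
from typing import Dict, List, Optional, Tuple

# Hand-built LR(0)-automaton for S' -> S, S -> A B, A -> a, B -> b
# (state numbering is ours):
#   q0 = {[S'->.S], [S->.AB], [A->.a]}   q1 = {[S'->S.]}
#   q2 = {[S->A.B], [B->.b]}             q3 = {[A->a.]}
#   q4 = {[S->AB.]}                      q5 = {[B->b.]}
DELTA: Dict[Tuple[int, str], int] = {
    (0, "S"): 1, (0, "A"): 2, (0, "a"): 3, (2, "B"): 4, (2, "b"): 5}
# State -> (lhs, length of rhs) of the complete item it contains.
COMPLETE: Dict[int, Tuple[str, int]] = {3: ("A", 1), 4: ("S", 2), 5: ("B", 1)}
FINISH_STATE = 1   # contains [S' -> S.]


def lr0_parse(word: str) -> Optional[List[str]]:
    """Return the left-hand sides of the reductions in order, or None on error."""
    stack = [0]                       # states are pushed instead of symbols
    pos = 0
    reductions: List[str] = []
    while True:
        top = stack[-1]
        if stack == [0, FINISH_STATE] and pos == len(word):
            return reductions         # finish: (q0 p, eps, f)
        if top in COMPLETE:
            lhs, m = COMPLETE[top]
            del stack[len(stack) - m:]               # pop m states
            stack.append(DELTA[(stack[-1], lhs)])    # continue with input A
            reductions.append(lhs)
        elif pos < len(word) and (top, word[pos]) in DELTA:
            stack.append(DELTA[(top, word[pos])])    # shift
            pos += 1
        else:
            return None
```

For `ab` the reductions are A, B, S, i.e. the reverse of the rightmost derivation $S \Rightarrow A\,B \Rightarrow A\,b \Rightarrow a\,b$.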

LR(0)-Parser
Correctness: We show that the accepting computations of an LR(0)-parser are in one-to-one correspondence with those of the shift-reduce parser $M_G^R$.
We conclude:
The accepted language is exactly $L(G)$.
The sequence of reductions of an accepting computation for a word $w \in T^*$ yields a reverse rightmost derivation of $G$ for $w$.

LR(0)-Parser
Attention: Unfortunately, the LR(0)-parser is in general non-deterministic. We identify two reasons:
Reduce-reduce conflict: $[A \to \gamma\,\bullet], [A' \to \gamma'\,\bullet] \in q$ with $A \neq A' \lor \gamma \neq \gamma'$
Shift-reduce conflict: $[A \to \gamma\,\bullet], [A' \to \alpha \bullet a\,\beta] \in q$ with $a \in T$
for a state $q \in Q$. Such states are called LR(0)-unsuited.
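Both kinds of conflicts can be detected per state. The sketch below (the name `lr0_conflicts` and the item encoding `(lhs, rhs, dot)` are ours) classifies an LR(0) state; the states $q_1$, $q_2$, $q_3$ and $q_9$ of the example are written out by hand from the listing of the canonical LR(0)-automaton.

```python
from typing import List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int]  # (lhs, rhs, dot position)
TERMINALS = {"+", "*", "(", ")", "int"}


def lr0_conflicts(q: Set[Item]) -> List[str]:
    """Why an LR(0) state is LR(0)-unsuited (an empty list means conflict-free)."""
    complete = [it for it in q if it[2] == len(it[1])]
    shift_items = [it for it in q if it[2] < len(it[1]) and it[1][it[2]] in TERMINALS]
    found = []
    if len(complete) > 1:          # two different complete items
        found.append("reduce-reduce")
    if complete and shift_items:   # a complete item next to [A' -> alpha . a beta]
        found.append("shift-reduce")
    return found
```

All three states $q_1$, $q_2$, $q_9$ report a shift-reduce conflict, while $q_3$ is conflict-free.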

Revisiting the Conflicts of the LR(0)-Automaton
What differentiates the individual reductions and shifts? Consider again the input counter * 2 + 40:
With pushdown $(q_0\,T)$ and remaining input $*\ 2 + 40$, the parser can either reduce with $E \to T$ ($E^1$) or keep $T$ as the left operand of $T \to T * F$ ($T^0$); here it has to shift $*$.
With pushdown $(q_0\,T)$ and remaining input $+\ 40$, the same choice arises; here it has to reduce with $E \to T$.
(Figure: parse tree of counter * 2 + 40, nodes labelled with the rules $E^0, E^1, T^0, T^1, F^1, F^2$.)
Idea: Matching the lookahead with the right context matters!

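LR(k)-Grammars
Idea: Consider $k$ symbols of lookahead in conflict situations.
Definition: The reduced context-free grammar $G$ is called an LR(k)-grammar if, for $\mathrm{First}_k(w) = \mathrm{First}_k(x)$ with
$S \Rightarrow^*_R \alpha\,A\,w \Rightarrow \alpha\,\beta\,w$
$S \Rightarrow^*_R \alpha'\,A'\,w' \Rightarrow \alpha\,\beta\,x$
it follows that $\alpha = \alpha' \land A = A' \land w' = x$.
Strategy for testing grammars for the LR(k)-property:
1 Focus iteratively on all rightmost derivations $S \Rightarrow^*_R \alpha\,X\,w \Rightarrow \alpha\,\beta\,w$
2 Identify the handle $\alpha\,\beta$ in the sentential forms $\alpha\,\beta\,w$
3 Determine the minimal $k$ such that $\mathrm{First}_k(w)$ associates $\beta$ with a unique rule $X \to \beta$, for those $\alpha\,\beta$ that are not prefix-free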

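LR(k)-Grammars
For example:
(1) $S \to A \mid B$, $A \to a\,A\,b \mid 0$, $B \to a\,B\,b\,b \mid 1$
... is not LL(k) for any $k$, but LR(0):
Let $S \Rightarrow^*_R \alpha\,X\,w \Rightarrow \alpha\,\beta\,w$. Then $\alpha\,\beta$ is of one of these forms:
$A$, $B$, $a^n\,a\,A\,b$, $a^n\,a\,B\,b\,b$, $a^n\,0$, $a^n\,1$ ($n \geq 0$)
(2) $S \to a\,A\,c$, $A \to A\,b\,b \mid b$
... is also not LL(k) for any $k$, but again LR(0):
Let $S \Rightarrow^*_R \alpha\,X\,w \Rightarrow \alpha\,\beta\,w$. Then $\alpha\,\beta$ is of one of these forms:
$a\,b$, $a\,A\,b\,b$, $a\,A\,c$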

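LR(k)-Grammars
For example:
(3) $S \to a\,A\,c$, $A \to b\,b\,A \mid b$
... is not LR(0), but LR(1):
Let $S \Rightarrow^*_R \alpha\,X\,w \Rightarrow \alpha\,\beta\,w$ with $\{y\} = \mathrm{First}_k(w)$. Then $\alpha\,\beta\,y$ is of one of these forms:
$a\,b^{2n}\,b\,c$, $a\,b^{2n}\,b\,b\,A\,c$, $a\,A\,c$
After $a\,b^{2n}\,b$, the final $b$ must be reduced to $A$ if the next symbol is $c$, but shifted past if it is $b$; one symbol of lookahead decides this.
(4) $S \to a\,A\,c$, $A \to b\,A\,b \mid b$
... is not LR(k) for any $k \geq 0$: Consider the rightmost derivations
$S \Rightarrow^*_R a\,b^n\,A\,b^n\,c \Rightarrow a\,b^n\,b\,b^n\,c$
The parser has to reduce after the $(n+1)$-th $b$, where the lookahead is $b^k$ for $n \geq k$, exactly as after the $n$-th $b$, where it must not reduce.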

LR(1)-Parsing
Idea: Let's equip items with 1-lookahead.
Definition (LR(1)-item): An LR(1)-item is a pair $[B \to \alpha \bullet \beta, x]$ with
$x \in \mathrm{Follow}_1(B) = \bigcup \{\mathrm{First}_1(\nu) \mid S \Rightarrow^* \mu\,B\,\nu\}$
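$\mathrm{Follow}_1$ can be computed as a least fixpoint. The following sketch is our own and assumes a grammar without ε-rules (true for the expression grammar), so that $\mathrm{First}_1$ of a right-hand side is determined by its first symbol. It writes `$` for the end of input, which the slides denote by ε; the grammar encoding and function names are hypothetical.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]
GRAMMAR: Grammar = {
    "S'": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["int"]],
}
END = "$"  # stands for the end of input, written epsilon on the slides


def first1(g: Grammar) -> Dict[str, Set[str]]:
    """First_1 of every nonterminal; assumes no epsilon-rules, so only the
    first symbol of each right-hand side matters."""
    first: Dict[str, Set[str]] = {a: set() for a in g}
    changed = True
    while changed:
        changed = False
        for a, rhss in g.items():
            for rhs in rhss:
                new = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow1(g: Grammar, start: str) -> Dict[str, Set[str]]:
    """Follow_1(B) = union of First_1(nu) over all S =>* mu B nu (least fixpoint)."""
    first = first1(g)
    follow: Dict[str, Set[str]] = {a: set() for a in g}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, rhss in g.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in g:
                        continue
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in g else {nxt}
                    else:
                        new = follow[a]   # B at the end inherits Follow_1(A)
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

For the example, $\mathrm{Follow}_1(E) = \{\varepsilon, +, )\}$ and $\mathrm{Follow}_1(T) = \mathrm{Follow}_1(F) = \{\varepsilon, +, ), *\}$ (with `$` for ε).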

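Admissible LR(1)-Items
The item $[B \to \alpha \bullet \beta, x]$ is admissible for $\gamma\,\alpha$ if $S \Rightarrow^*_R \gamma\,B\,w$ with $\{x\} = \mathrm{First}_1(w)$:
(Figure: the path from the root $S$ down to $B$: $S \to^{i_0} \gamma_0 A_1 \ldots$, $A_1 \to^{i_1} \gamma_1 \ldots$, …, $A_m \to^{i_m} \gamma_m B \ldots$, $B \to^{i} \alpha\,\beta$, where the remaining input $w$ starts with $x$ and $\gamma_0 \ldots \gamma_m = \gamma$.)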

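The Characteristic LR(1)-Automaton
The set of admissible LR(1)-items for viable prefixes is again computed with the help of a finite automaton, $c(G, 1)$.
The automaton $c(G, 1)$:
States: LR(1)-items
Start state: $[S' \to \bullet\, S, \varepsilon]$
Final states: $\{[B \to \gamma\,\bullet, x] \mid B \to \gamma \in P, x \in \mathrm{Follow}_1(B)\}$
Transitions:
(1) $([A \to \alpha \bullet X \beta, x], X, [A \to \alpha X \bullet \beta, x])$, $X \in (N \cup T)$
(2) $([A \to \alpha \bullet B \beta, x], \varepsilon, [B \to \bullet\,\gamma, x'])$, $A \to \alpha B \beta,\ B \to \gamma \in P$, $x' \in \mathrm{First}_1(\beta) \odot_1 \{x\}$ (where $\odot_1$ concatenates and truncates to length 1)
This automaton works like $c(G)$ but additionally manages a 1-prefix from $\mathrm{Follow}_1$ of the left-hand sides.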

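The Canonical LR(1)-Automaton
The canonical LR(1)-automaton $LR(G, 1)$ is created from $c(G, 1)$ by performing arbitrarily many ε-transitions and then making the resulting automaton deterministic ...
But again, it can be constructed directly from the grammar; analogously to LR(0), we need the ε-closure $\delta_\varepsilon^*$ as a helper function:
$\delta_\varepsilon^*(q) = q \cup \{[C \to \bullet\,\gamma, x] \mid \exists\,[A \to \alpha \bullet B\,\beta', x'] \in q,\ \beta \in (N \cup T)^* : B \Rightarrow^* C\,\beta,\ C \to \gamma \in P,\ x \in \mathrm{First}_1(\beta\,\beta') \odot_1 \{x'\}\}$
Then we define:
States: sets of LR(1)-items;
Start state: $\delta_\varepsilon^*\{[S' \to \bullet\, S, \varepsilon]\}$
Final states: $\{q \mid \exists A \to \alpha \in P : [A \to \alpha\,\bullet, x] \in q\}$
Transitions: $\delta(q, X) = \delta_\varepsilon^*\{[A \to \alpha X \bullet \beta, x] \mid [A \to \alpha \bullet X \beta, x] \in q\}$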
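Analogously to LR(0), the canonical LR(1)-automaton can be built by a worklist algorithm. This hypothetical sketch hard-codes $\mathrm{First}_1 = \{(, \mathtt{int}\}$ for all nonterminals of the expression grammar and relies on the absence of ε-rules, so that $\mathrm{First}_1(\beta) \odot_1 \{x\}$ is determined by the first symbol of $\beta$, or is $\{x\}$ if $\beta$ is empty. The condition $B \Rightarrow^* C\,\beta$ is again covered by iterating the closure. Item encoding, names and state numbering are ours; `$` stands for ε.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Item1 = Tuple[str, Tuple[str, ...], int, str]  # (lhs, rhs, dot, lookahead)
State1 = FrozenSet[Item1]

RULES = [("S'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("int",))]
NONTERMINALS = {lhs for lhs, _ in RULES}
# First_1 of each nonterminal, computed by hand for this grammar (no epsilon-rules).
FIRST1 = {a: {"(", "int"} for a in NONTERMINALS}
END = "$"  # end-of-input lookahead, written epsilon on the slides


def first_of(seq: Tuple[str, ...], la: str) -> Set[str]:
    """First_1(seq) concatenated with {la}; without epsilon-rules only seq[0] counts."""
    if not seq:
        return {la}
    return set(FIRST1[seq[0]]) if seq[0] in NONTERMINALS else {seq[0]}


def closure1(items) -> State1:
    """delta_eps^*: [A -> alpha . B beta, x] adds [B -> . gamma, x']
    for every x' in First_1(beta) (+) {x}."""
    result, todo = set(items), list(items)
    while todo:
        _, rhs, dot, la = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for x in first_of(rhs[dot + 1:], la):
                for b, gamma in RULES:
                    new = (b, gamma, 0, x)
                    if b == rhs[dot] and new not in result:
                        result.add(new)
                        todo.append(new)
    return frozenset(result)


def goto1(q: State1, sym: str) -> State1:
    """delta(q, X): move the dot over X, keep the lookahead, then close."""
    return closure1({(l, r, d + 1, a) for (l, r, d, a) in q if d < len(r) and r[d] == sym})


def canonical_lr1() -> Tuple[List[State1], Dict[Tuple[int, str], int]]:
    states: List[State1] = [closure1({("S'", ("E",), 0, END)})]
    delta: Dict[Tuple[int, str], int] = {}
    i = 0
    while i < len(states):
        for sym in sorted({r[d] for (_, r, d, _) in states[i] if d < len(r)}):
            target = goto1(states[i], sym)
            if target not in states:
                states.append(target)
            delta[(i, sym)] = states.index(target)
        i += 1
    return states, delta
```

For the expression grammar this yields 22 states compared with the 12 states of the LR(0)-automaton: the states reached inside parentheses reappear with the lookahead $)$ in place of ε.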

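The Canonical LR(1)-Automaton
For example: For $E \to E + T \mid T$, $T \to T * F \mid F$, $F \to (\,E\,) \mid \mathtt{int}$ we have
$\mathrm{First}_1(S') = \mathrm{First}_1(E) = \mathrm{First}_1(T) = \mathrm{First}_1(F) = \{\mathtt{(}, \mathtt{int}\}$.
An item written with a set of lookaheads stands for one LR(1)-item per lookahead:
$q_0 = \{[S' \to \bullet E, \{\varepsilon\}], [E \to \bullet E + T, \{\varepsilon, +\}], [E \to \bullet T, \{\varepsilon, +\}], [T \to \bullet T * F, \{\varepsilon, +, *\}], [T \to \bullet F, \{\varepsilon, +, *\}], [F \to \bullet (\,E\,), \{\varepsilon, +, *\}], [F \to \bullet\,\mathtt{int}, \{\varepsilon, +, *\}]\}$
$q_1 = \delta(q_0, E) = \{[S' \to E\,\bullet, \{\varepsilon\}], [E \to E\,\bullet + T, \{\varepsilon, +\}]\}$
$q_2 = \delta(q_0, T) = \{[E \to T\,\bullet, \{\varepsilon, +\}], [T \to T\,\bullet * F, \{\varepsilon, +, *\}]\}$
$q_3 = \delta(q_0, F) = \{[T \to F\,\bullet, \{\varepsilon, +, *\}]\}$
$q_4 = \delta(q_0, \mathtt{int}) = \{[F \to \mathtt{int}\,\bullet, \{\varepsilon, +, *\}]\}$
$q_5 = \delta(q_0, \mathtt{(}) = \{[F \to (\,\bullet E\,), \{\varepsilon, +, *\}], [E \to \bullet E + T, \{), +\}], [E \to \bullet T, \{), +\}], [T \to \bullet T * F, \{), +, *\}], [T \to \bullet F, \{), +, *\}], [F \to \bullet (\,E\,), \{), +, *\}], [F \to \bullet\,\mathtt{int}, \{), +, *\}]\}$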

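The Canonical LR(1)-Automaton
For example (continued; $q_5'$ is the copy of $q_5$ that is reached inside parentheses):
$q_5' = \delta(q_5, \mathtt{(}) = \{[F \to (\,\bullet E\,), \{), +, *\}], [E \to \bullet E + T, \{), +\}], [E \to \bullet T, \{), +\}], [T \to \bullet T * F, \{), +, *\}], [T \to \bullet F, \{), +, *\}], [F \to \bullet (\,E\,), \{), +, *\}], [F \to \bullet\,\mathtt{int}, \{), +, *\}]\}$
$q_6 = \delta(q_1, +) = \{[E \to E + \bullet T, \{\varepsilon, +\}], [T \to \bullet T * F, \{\varepsilon, +, *\}], [T \to \bullet F, \{\varepsilon, +, *\}], [F \to \bullet (\,E\,), \{\varepsilon, +, *\}], [F \to \bullet\,\mathtt{int}, \{\varepsilon, +, *\}]\}$
$q_7 = \delta(q_2, *) = \{[T \to T * \bullet F, \{\varepsilon, +, *\}], [F \to \bullet (\,E\,), \{\varepsilon, +, *\}], [F \to \bullet\,\mathtt{int}, \{\varepsilon, +, *\}]\}$
$q_8 = \delta(q_5, E) = \{[F \to (\,E\,\bullet\,), \{\varepsilon, +, *\}], [E \to E\,\bullet + T, \{), +\}]\}$
$q_9 = \delta(q_6, T) = \{[E \to E + T\,\bullet, \{\varepsilon, +\}], [T \to T\,\bullet * F, \{\varepsilon, +, *\}]\}$
$q_{10} = \delta(q_7, F) = \{[T \to T * F\,\bullet, \{\varepsilon, +, *\}]\}$
$q_{11} = \delta(q_8, \mathtt{)}) = \{[F \to (\,E\,)\,\bullet, \{\varepsilon, +, *\}]\}$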

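The Canonical LR(1)-Automaton
For example (continued): the states reached inside parentheses, which differ from $q_2, \ldots, q_{11}$ only in their lookaheads:
$q_2' = \delta(q_5, T) = \{[E \to T\,\bullet, \{), +\}], [T \to T\,\bullet * F, \{), +, *\}]\}$
$q_3' = \delta(q_5, F) = \{[T \to F\,\bullet, \{), +, *\}]\}$
$q_4' = \delta(q_5, \mathtt{int}) = \{[F \to \mathtt{int}\,\bullet, \{), +, *\}]\}$
$q_6' = \delta(q_8, +) = \{[E \to E + \bullet T, \{), +\}], [T \to \bullet T * F, \{), +, *\}], [T \to \bullet F, \{), +, *\}], [F \to \bullet (\,E\,), \{), +, *\}], [F \to \bullet\,\mathtt{int}, \{), +, *\}]\}$
$q_7' = \delta(q_9', *) = \{[T \to T * \bullet F, \{), +, *\}], [F \to \bullet (\,E\,), \{), +, *\}], [F \to \bullet\,\mathtt{int}, \{), +, *\}]\}$
$q_8' = \delta(q_5', E) = \{[F \to (\,E\,\bullet\,), \{), +, *\}], [E \to E\,\bullet + T, \{), +\}]\}$
$q_9' = \delta(q_6', T) = \{[E \to E + T\,\bullet, \{), +\}], [T \to T\,\bullet * F, \{), +, *\}]\}$
$q_{10}' = \delta(q_7', F) = \{[T \to T * F\,\bullet, \{), +, *\}]\}$
$q_{11}' = \delta(q_8', \mathtt{)}) = \{[F \to (\,E\,)\,\bullet, \{), +, *\}]\}$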

The Canonical LR(1)-Automaton
(Figure: the canonical LR(0)-automaton with states 0 to 11 next to the canonical LR(1)-automaton, which consists of the states 0 to 11 plus the copies 2' to 11' that are reached inside parentheses.)

The Canonical LR(1)-Automaton
Discussion:
In the example, the number of states was almost doubled ... and it can become even worse.
The conflicts in the states $q_1, q_2, q_9$ are now resolved! E.g., for
$q_9 = \{[E \to E + T\,\bullet, \{\varepsilon, +\}], [T \to T\,\bullet * F, \{\varepsilon, +, *\}]\}$
we have:
$\{\varepsilon, +\} \cap (\mathrm{First}_1(*\,F) \odot_1 \{\varepsilon, +, *\}) = \{\varepsilon, +\} \cap \{*\} = \emptyset$

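The LR(1)-Parser: action and goto
(Figure: the parser reads the input, consults the tables action and goto, and produces output.)
The goto-table encodes the transitions: $\mathrm{goto}[q, X] = \delta(q, X) \in Q$
The action-table describes, for every state $q$ and every possible lookahead $w$, the necessary action.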

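The LR(1)-Parser
The construction of the LR(1)-parser:
States: $Q \cup \{f\}$ ($f$ fresh)
Start state: $q_0$
Final state: $f$
Transitions:
Shift: $(p, a, p\,q)$ if $\mathrm{action}[p, w] = s$ and $q = \mathrm{goto}[p, a]$
Reduce: $(p\,q_1 \ldots q_{|\beta|}, \varepsilon, p\,q)$ if $[A \to \beta\,\bullet] \in q_{|\beta|}$, $\mathrm{action}[q_{|\beta|}, w] = [A \to \beta\,\bullet]$ and $q = \mathrm{goto}[p, A]$
Finish: $(q_0\,p, \varepsilon, f)$ if $[S' \to S\,\bullet] \in p$
with $LR(G, 1) = (Q, T, \delta, q_0, F)$, where $w$ is the current lookahead.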

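The LR(1)-Parser
Possible actions are:
shift // shift operation
reduce $(A \to \gamma)$ // reduction with callback/output
error // error
... for example, for $E \to E + T \mid T$, $T \to T * F \mid F$, $F \to (\,E\,) \mid \mathtt{int}$ (alternatives numbered 0, 1 from left to right; "E, 1" means reduce with alternative 1 of $E$, "s" means shift, an empty entry means error; only the states containing complete items are listed, and the columns for int and ( are empty for them):
  action   ε       )       +       *
  q1       S, 0            s
  q2       E, 1            E, 1    s
  q2'              E, 1    E, 1    s
  q3       T, 1            T, 1    T, 1
  q3'              T, 1    T, 1    T, 1
  q4       F, 1            F, 1    F, 1
  q4'              F, 1    F, 1    F, 1
  q9       E, 0            E, 0    s
  q9'              E, 0    E, 0    s
  q10      T, 0            T, 0    T, 0
  q10'             T, 0    T, 0    T, 0
  q11      F, 0            F, 0    F, 0
  q11'             F, 0    F, 0    F, 0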
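To illustrate the action/goto scheme on a small scale, here is a sketch of a table-driven LR(1)-parser for example (3), $S \to a\,A\,c$, $A \to b\,b\,A \mid b$, which is LR(1) but not LR(0). The tables were computed by hand from the canonical LR(1)-automaton with $S' \to S$ added; the state numbers, the tuple encoding of actions and the end marker `$` are our own choices, not taken from the slides.

```python
from typing import Dict, List, Optional, Tuple

# Hand-computed canonical LR(1) tables for example (3): S -> a A c, A -> b b A | b
# (with S' -> S added; state numbers are ours, "$" marks the end of input).
ACTION: Dict[Tuple[int, str], tuple] = {
    (0, "a"): ("shift", 2),
    (1, "$"): ("accept",),                         # [S' -> S ., $]
    (2, "b"): ("shift", 4),
    (3, "c"): ("shift", 5),
    (4, "b"): ("shift", 6),                        # [A -> b . b A, c]
    (4, "c"): ("reduce", "A", ("b",)),             # [A -> b ., c]
    (5, "$"): ("reduce", "S", ("a", "A", "c")),
    (6, "b"): ("shift", 4),
    (7, "c"): ("reduce", "A", ("b", "b", "A")),
}
GOTO: Dict[Tuple[int, str], int] = {(0, "S"): 1, (2, "A"): 3, (6, "A"): 7}


def lr1_parse(word: str) -> Optional[List[str]]:
    """Table-driven LR(1)-parser; returns the reductions in order, or None on error."""
    tokens = list(word) + ["$"]
    stack, pos, output = [0], 0, []
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:                      # empty table entry = error
            return None
        if entry[0] == "shift":
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "reduce":
            _, lhs, rhs = entry
            del stack[len(stack) - len(rhs):]  # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(f"{lhs} -> {' '.join(rhs)}")
        else:                                  # accept
            return output
```

State 4 contains both $[A \to b\,\bullet, c]$ and $[A \to b\,\bullet\,b\,A, c]$: the lookahead c selects the reduction and b the shift, which is the LR(0) conflict that one symbol of lookahead resolves.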

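The Canonical LR(1)-Automaton
In general: We identify two kinds of conflicts:
Reduce-reduce conflict: $[A \to \gamma\,\bullet, x], [A' \to \gamma'\,\bullet, x] \in q$ with $A \neq A' \lor \gamma \neq \gamma'$
Shift-reduce conflict: $[A \to \gamma\,\bullet, x], [A' \to \alpha \bullet a\,\beta, y] \in q$ with $a \in T$ and $x \in \{a\} \odot_k \mathrm{First}_k(\beta) \odot_k \{y\}$ (for $k = 1$ this means $x = a$)
for a state $q \in Q$. Such states are now called LR(k)-unsuited (for $k = 1$: LR(1)-unsuited).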
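For $k = 1$ the conflict conditions can be checked directly on the LR(1)-items of a state. The sketch below is a hypothetical helper (item encoding `(lhs, rhs, dot, lookahead)`, `$` for ε); it confirms for the state $q_9$ of the example that the reduction lookaheads $\{\varepsilon, +\}$ do not meet the shift symbol $*$.

```python
from typing import Dict, List, Set, Tuple

Item1 = Tuple[str, Tuple[str, ...], int, str]  # (lhs, rhs, dot, lookahead)
TERMINALS = {"+", "*", "(", ")", "int"}


def lr1_conflicts(q: Set[Item1]) -> List[str]:
    """Conflicts of an LR(1) state (k = 1); an empty list means not LR(1)-unsuited."""
    reduce_on: Dict[str, Set[Tuple[str, Tuple[str, ...]]]] = {}
    shift_on: Set[str] = set()
    for lhs, rhs, dot, la in q:
        if dot == len(rhs):
            reduce_on.setdefault(la, set()).add((lhs, rhs))
        elif rhs[dot] in TERMINALS:
            shift_on.add(rhs[dot])
    found = []
    if any(len(rules) > 1 for rules in reduce_on.values()):
        found.append("reduce-reduce")   # same lookahead x, different complete items
    if shift_on & set(reduce_on):
        found.append("shift-reduce")    # lookahead x equals a shiftable terminal a
    return found
```

Adding the lookahead $*$ to the complete item of $q_9$ would recreate the shift-reduce conflict of the LR(0)-automaton.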

Special LR(k)-Subclasses
Theorem: A reduced context-free grammar $G$ is LR(k) iff the canonical LR(k)-automaton $LR(G, k)$ has no LR(k)-unsuited states.
Discussion:
Our example apparently is LR(1).
In general, the canonical LR(k)-automaton has many more states than $LR(G) = LR(G, 0)$.
Therefore, in practice, subclasses of LR(k)-grammars are often considered that only use $LR(G)$ ...
For resolving conflicts, the items are assigned special lookahead sets:
1 independently of the state itself $\Longrightarrow$ Simple LR(k) (SLR(k))
2 depending on the state itself $\Longrightarrow$ LALR(k)
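As an illustration of the first variant (lookahead sets independent of the state, i.e. SLR(1)), the sketch below assigns every complete item $[A \to \gamma\,\bullet]$ of an LR(0) state the set $\mathrm{Follow}_1(A)$. The Follow sets of the expression grammar are written out by hand (`$` for the end of input); the function name and item encoding are our own. With these lookaheads the LR(0) conflicts in $q_1$, $q_2$ and $q_9$ disappear.

```python
from typing import Dict, List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int]  # (lhs, rhs, dot position)
TERMINALS = {"+", "*", "(", ")", "int"}
# Follow_1 sets of the example grammar, written out by hand ("$" = end of input).
FOLLOW: Dict[str, Set[str]] = {
    "S'": {"$"}, "E": {"$", "+", ")"},
    "T": {"$", "+", ")", "*"}, "F": {"$", "+", ")", "*"},
}


def slr1_conflicts(q: Set[Item]) -> List[str]:
    """Conflicts of an LR(0) state when each complete item [A -> gamma .]
    gets the state-independent lookahead set Follow_1(A) (SLR(1))."""
    complete = [(lhs, rhs) for (lhs, rhs, dot) in q if dot == len(rhs)]
    shifts = {rhs[dot] for (_, rhs, dot) in q if dot < len(rhs) and rhs[dot] in TERMINALS}
    found = []
    for i, (a, _) in enumerate(complete):
        if FOLLOW[a] & shifts:
            found.append("shift-reduce")
        for b, _ in complete[i + 1:]:
            if FOLLOW[a] & FOLLOW[b]:
                found.append("reduce-reduce")
    return found
```

Because SLR only uses the 12 states of $LR(G)$, it avoids the larger canonical LR(1)-automaton; the price is that Follow sets can be coarser than the lookaheads of the canonical construction.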

Syntactic Analysis, Chapter 5: Summary

Parsing Methods
(Figure: hierarchy diagram of the classes deterministic languages = LR(1) = … = LR(k), LALR(k), SLR(k), LR(0), LL(k), LL(1) and the regular languages.)
Discussion:
All context-free languages that can be parsed with a deterministic pushdown automaton can be characterized with an LR(1)-grammar.
LR(0)-grammars describe all prefix-free deterministic context-free languages.
The language classes of LL(k)-grammars form a hierarchy within the deterministic context-free languages.

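Lexical and Syntactical Analysis:
Concept of specification and implementation: (Figure: a generator turns a regular specification such as 0 | [1-9][0-9]* into a finite automaton, and a generator turns a grammar such as E → E {op} E into a parser.)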

Lexical and Syntactical Analysis:
(Figures: From Regular Expressions to Finite Automata; From Finite Automata to Scanners, illustrated on the input writeln("Hallo");.)

Lexical and Syntactical Analysis:
(Figures: Computation of lookahead sets $F_\varepsilon(S'), F_\varepsilon(E), F_\varepsilon(T), F_\varepsilon(F)$; From Item-Pushdown Automata to LL(1)-Parsers.)

Lexical and Syntactical Analysis: From characteristic to canonical Automata: S E E T T E E+ T T T F F E E T T F S E E T T F ( E ) F ( E ) F ( E ) F ( E ) F ( E ) F F + T E E + T E E+ T E E+ T F T T F T T F T T F 0 T E F ( ( F 1 4 3 5 2 T + F ( ( 6 E + 8 * T 11 ) 9 7 * F 10 From Shift-Reduce-Parsers to LR(1)-Parsers: S i 0 + T 1 6 9 E + T γ A 1 i 6 0 1 9 4 F + F 4 γ 1 A m i m 0 3 11 * ( F ( F 3 E ) 11 * 5 γ m B i ( 8 T ( F ( T E ) 5 8 F 2 ( *( 7 10 α β T F 2 x w * 7 10 action goto Output 151 / 338