Bottom-up parsing: general idea, LR(0), SLR, LR(1), LALR. To best exploit JavaCUP, you should understand its theoretical basis (LR parsing).

Top-down vs bottom-up

- Bottom-up parsing is more powerful than top-down parsing: it can process a more powerful class of grammars than LL (explained later).
- Bottom-up parsers are too hard to write by hand, but JavaCUP (and yacc) generate a parser from a specification.
- A bottom-up parser uses the rightmost derivation; a top-down parser uses the leftmost derivation.
- Less grammar transformation is required, so the grammar looks more natural.
- Intuition: a bottom-up parse postpones the decision about which production rule to apply until it has more data than was available to a top-down parse (explained later).

Bottom-up parsing

- Start with a string of terminals.
- Build up from the leaves of the parse tree.
- Apply productions backwards.
- When we reach the start symbol and have exhausted the input, we are done.
- Shift-reduce is a common bottom-up technique.

Example grammar:
  S → aABe
  A → Abc | b
  B → d

Reduce abbcde to S:
  abbcde
  aAbcde
  aAde
  aABe
  S

Note that in step 2 (aAbcde) the d must not be reduced to B.

The reduction is the reverse of the rightmost derivation
  S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

How do we get the right reduction steps?

Sentential form

- Sentential form: any string that can be derived from the start symbol. It can consist of terminals and non-terminals.
  Example: E ⇒ E+T ⇒ E+id ⇒ T+id ⇒ id+id
  Sentential forms: E+id, T+id, ...
- Right sentential form: a sentential form obtained by a rightmost derivation.
- Sentence: a sentential form with no non-terminals; id+id is a sentence.

Handles

  S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

Grammar:
  S → aABe
  A → Abc | b
  B → d

Informally, a handle of a sentential form is a substring that "can" be reduced.
- Abc is a handle of the right sentential form aAbcde, because A → Abc and, after Abc is replaced by A, the resulting string aAde is still a right sentential form.
- Is d a handle of aAbcde? No, because aAbcBe is not a right sentential form.

Formally, a handle of a right sentential form γ is a production A → β and a position in γ where the string β may be found and replaced by A.
- If S ⇒*rm αAw ⇒rm αβw, then A → β in the position after α is a handle of αβw.
- When the production A → β and the position are clear, we simply say that the substring β is a handle.

Handles in the expression example

Grammar:
  E → T + E | T
  T → int * T | int | (E)

Consider the string int * int + int. The rightmost derivation is
  E ⇒rm T+E ⇒rm T+T ⇒rm T+int ⇒rm int*T+int ⇒rm int*int+int

For an unambiguous grammar, there is exactly one handle for each right sentential form. The question is how to find the handle. Observation: the substring to the right of a handle contains only terminal symbols.

Shift-reduce parsing

- Break the input string into two parts: an undigested part and a semi-digested part. The left part of the input is partly processed; the right part is completely unprocessed.
  Example input: int foo (double n) { return (int) n+1 ; }
  (On the slide, the left part is labelled "shifted, partly reduced" and the right part "so far unprocessed".)
- Use a stack to keep track of the tokens seen so far and the rules already applied backwards (reductions).
  - Shift the next input token onto the stack.
  - When the top of the stack contains a "good" right-hand side of a production, reduce by that rule.
- Important fact: the handle is always at the top of the stack.

Shift-reduce main loop

- Shift: if we can't perform a reduction and there are tokens remaining in the unprocessed input, transfer a token from the input onto the stack.
- Reduce: if we can find a rule A → α, and the contents of the stack are βα for some β (β may be empty), reduce the stack to βA. The α is called a handle. Recognizing handles is key!
- Accept: S is at the top of the stack and the input is now empty; done.
- Error: all other cases.

Example 1

Grammar:
  S → E
  E → T | E + T
  T → id | (E)

Input string: (id)

  Parse stack | Remaining input | Parser action
              | (id)$           | Shift parenthesis onto stack
  (           | id)$            | Shift id onto stack
  (id         | )$              | Reduce: T → id (pop RHS of production, push LHS, input unchanged)
  (T          | )$              | Reduce: E → T
  (E          | )$              | Shift right parenthesis
  (E)         | $               | Reduce: T → (E)
  T           | $               | Reduce: E → T
  E           | $               | Reduce: S → E
  S           | $               | Done: accept

  S ⇒rm E ⇒rm T ⇒rm (E) ⇒rm (T) ⇒rm (id)

Shift-reduce example 2

Same grammar. Input: (id + id)

  Parse stack | Remaining input | Action
              | (id + id)$      | Shift (
  (           | id + id)$       | Shift id
  (id         | + id)$          | Reduce T → id
  (T          | + id)$          | Reduce E → T
  (E          | + id)$          | Shift +
  (E+         | id)$            | Shift id
  (E+id       | )$              | Reduce T → id
  (E+T        | )$              | Reduce E → E+T (ignore E → T)
  (E          | )$              | Shift )
  (E)         | $               | Reduce T → (E)
  T           | $               | Reduce E → T
  E           | $               | Reduce S → E
  S           | $               | Accept

The sequence of sentential forms is
  (id+id), (T+id), (E+id), (E+T), (E), T, E, S
Note that it is the reverse of the following rightmost derivation:
  S ⇒rm E ⇒rm T ⇒rm (E) ⇒rm (E+T) ⇒rm (E+id) ⇒rm (T+id) ⇒rm (id+id)

Conflicts during shift-reduce parsing

- Reduce/reduce conflict
    stack: ... (E+T      input: ) ...
  Which rule should we use, E → E+T or E → T?
- Shift/reduce conflict
    ifStat → if (E) S | if (E) S else S
    stack: ... if (E) S      input: else ...
  Both reduce and shift are applicable. What should we do next, reduce or shift?

LR(k) parsing

- Left-to-right scan, Rightmost derivation, with k-token lookahead.
  - L: left-to-right scanning of the input
  - R: constructing a rightmost derivation in reverse
  - k: number of input symbols used to select a parser action
- Most general parsing technique for deterministic grammars.
  - Efficient, table-based parsing
  - Parses by shift-reduce
  - Needs less grammar massaging than LL(1)
  - Can handle almost all programming language structures
  - LL ⊂ LR ⊂ CFG
- In general, not practical: the tables are too large (10^6 states for C++, Ada).
  - Common subsets: SLR, LALR(1).

LR parsing continued

- Data structures:
  - Stack of states {s}
  - Action table Action[s, a], a ∈ T
  - Goto table Goto[s, X], X ∈ N
- In LR parsing, we push whole states onto the stack.
- The stack of states keeps track of what we've seen so far (the left context): what we've shifted and reduced, and by what rules.
- Use the Action table to decide shift vs reduce.
- Use the Goto table to move to a new state.

Main loop of the LR parser

- The initial state S0 starts on top of the stack.
- Given the state St on top of the stack and the next input token a:
  - If Action[St, a] == shift Si:
    - push the new state Si onto the stack;
    - call yylex to get the next token.
  - If Action[St, a] == reduce by Y → X1 ... Xn:
    - pop n states to find Su on top of the stack;
    - push the new state Sv = Goto[Su, Y] onto the stack.
  - If Action[St, a] == accept: done!
  - If Action[St, a] == error: can't continue to a successful parse.

Example LR parse table

Grammar:
  (1) E → E + T
  (2) E → T
  (3) T → (E)
  (4) T → id

  State  |            Action            |  Goto
  on TOS | id   +    (    )    $        | E    T
  0      | S4        S3                 | S1   S2
  1      |      S5             accept   |
  2      | R2   R2   R2   R2   R2       |
  3      | S4        S3                 | S6   S2
  4      | R4   R4   R4   R4   R4       |
  5      | S4        S3                 |      S8
  6      |      S5        S7            |
  7      | R3   R3   R3   R3   R3       |
  8      | R1   R1   R1   R1   R1       |

- If Action[St, a] == shift: push the new state given by Action[St, a] onto the stack and call yylex to get the next token.
- If Action[St, a] == reduce by Y → X1 ... Xn: pop n states to find Su on top of the stack, then push the new state Sv = Goto[Su, Y] onto the stack.

We explain how to construct this table later.

Parse of id + (id) with this table:

  State stack   | Remaining input | Parser action
  S0            | id + (id)$      | Shift S4 onto state stack, move ahead in input
  S0S4          | + (id)$         | Reduce 4) T → id, pop state stack, goto S2, input unchanged
  S0S2          | + (id)$         | Reduce 2) E → T, goto S1
  S0S1          | + (id)$         | Shift S5
  S0S1S5        | (id)$           | Shift S3
  S0S1S5S3      | id)$            | Shift S4 (saw another id)
  S0S1S5S3S4    | )$              | Reduce 4) T → id, goto S2
  S0S1S5S3S2    | )$              | Reduce 2) E → T, goto S6
  S0S1S5S3S6    | )$              | Shift S7
  S0S1S5S3S6S7  | $               | Reduce 3) T → (E), goto S8
  S0S1S5S8      | $               | Reduce 1) E → E + T, goto S1
  S0S1          | $               | Accept
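The main loop and the example table can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the function name lr_parse, the string encoding of table entries ("s4", "r2", "acc") and the token spelling are choices made here, not part of the slides or of JavaCUP; the table contents are transcribed from the example above.

```python
# Grammar: (1) E -> E + T  (2) E -> T  (3) T -> ( E )  (4) T -> id
# Rule number -> (left-hand side, number of symbols on the right-hand side).
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1)}
TERMINALS = ["id", "+", "(", ")", "$"]

# Action table from the example; a missing entry means "error".
ACTION = {
    0: {"id": "s4", "(": "s3"},
    1: {"+": "s5", "$": "acc"},
    2: {t: "r2" for t in TERMINALS},
    3: {"id": "s4", "(": "s3"},
    4: {t: "r4" for t in TERMINALS},
    5: {"id": "s4", "(": "s3"},
    6: {"+": "s5", ")": "s7"},
    7: {t: "r3" for t in TERMINALS},
    8: {t: "r1" for t in TERMINALS},
}
GOTO = {0: {"E": 1, "T": 2}, 3: {"E": 6, "T": 2}, 5: {"T": 8}}


def lr_parse(tokens):
    """Return the rule numbers of the reductions performed, or None on error."""
    tokens = list(tokens) + ["$"]
    stack = [0]                 # stack of states, initial state on top
    pos = 0                     # index of the next input token
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            return None         # error entry: no successful parse
        if act == "acc":
            return reductions
        if act[0] == "s":       # shift: push the state, advance the input
            stack.append(int(act[1:]))
            pos += 1
        else:                   # reduce: pop |RHS| states, then follow Goto
            rule = int(act[1:])
            lhs, rhs_len = RULES[rule]
            del stack[-rhs_len:]
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(rule)


print(lr_parse(["id", "+", "(", "id", ")"]))
print(lr_parse(["id", "+", "+"]))
```

The list of reductions returned for id + (id) follows the order of the reduce steps in the trace above, and the second call runs into an empty Action entry.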

Types of LR parsers

- LR(k)
- SLR(k) -- Simple LR
- LALR(k) -- Lookahead LR
- k = number of symbols of lookahead
  - 0 or 1 in this class
  - the Dragon book has the general cases
- Start with the simplest: the LR(0) parser.

LR(0) parser

- Advantages: simplest to understand, smallest tables.
- Disadvantages: no lookahead, so too simple-minded for real parsers.
- A good case for seeing how to build tables, though. We'll use the LR(0) constructions in the other LR(k) parsers.
- The key to LR parsing is recognizing handles.
  - Handle: a sequence of symbols, encoded in the top stack states, representing the right-hand side of a rule we want to reduce by.

LR tables

- Given a grammar G, identify the possible states of the parser.
  - States encapsulate what we've seen and shifted and what has been reduced so far.
- Steps to construct the LR table:
  - Construct states using LR(0) configurations (or items).
  - Figure out the transitions between states.

Configuration

- A configuration (or item) is a rule of G with a dot in the right-hand side.
- If the rule A → XYZ is in the grammar, then the configurations are
    A → •XYZ
    A → X•YZ
    A → XY•Z
    A → XYZ•
- The dot represents what the parser has gotten on the stack in recognizing the production.
  - A → XYZ• means XYZ is on the stack. Reduce!
  - A → X•YZ means X has been shifted. To continue the parse, we must see a token that could begin a string derivable from Y.
- Notational convention:
  - X, Y, Z: a symbol, either terminal or non-terminal
  - a, b, c: a terminal
  - α, β, γ: a sequence of terminals or non-terminals

Set of configurations

- A → X•YZ means X has been shifted. To continue the parse, we must see a token that could begin a string derivable from Y.
  - That is, we need to see a token in First(Y) (or in Follow(Y) if Y ⇒ ε).
  - Formally, we need to see a token t such that Y ⇒* tβ for some β.
- Suppose Y → α | β is also in G. Then these configurations correspond to the same parse state:
    A → X•YZ
    Y → •α
    Y → •β
- Since the above configurations represent the same state, we can:
  - put them into a set together;
  - add all other equivalent configurations to achieve closure (algorithm later).
- This set represents one parser state: a state the parser can be in while parsing a string.

Transitions between states

- The parser goes from one state to another based on the symbols processed:
    { A → X•YZ, Y → •α, Y → •β }  --Y-->  { A → XY•Z }
- Model the parse as a finite automaton!
  - When a state (configuration set) has a dot at the end of an item, that is an FA accept state.
- Build the LR(0) parser based on this FA.

Constructing item sets and closure

- Starting configuration:
  - Augment the grammar with a symbol S'.
  - Add the production S' → S to the grammar.
  - The initial item set I0 gets S' → •S.
  - Perform Closure on S' → •S. (That completes the parser start state.)
- Compute the Successor function to make the next state (next item set).

Computing closure

Closure(I):
  1. Initially, every item in I is added to closure(I).
  2. If A → α•Bβ is in closure(I), then for all productions B → γ, add B → •γ.
  3. Repeat step 2 until the set gets no more additions.

Example

Grammar:
  (1) E → E + T
  (2) E → T
  (3) T → (E)
  (4) T → id

Given the configuration set { E → •E+T }, what is its closure?
  E → •E + T    by rule 1
  E → •T        by rule 2
  T → •(E)      by rules 2 and 3
  T → •id       by rules 2 and 3
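The three closure steps translate almost directly into code. The following Python sketch is an illustration, not the slides' own implementation: an item is encoded here as a hypothetical triple (lhs, rhs, dot), where dot is the position of the dot within rhs, and the names GRAMMAR, closure and show are chosen for this example.

```python
# Grammar: E -> E + T | T ; T -> ( E ) | id
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Close a set of LR(0) items (lhs, rhs, dot) under rule 2."""
    result = set(items)                      # step 1: start from I itself
    changed = True
    while changed:                           # step 3: repeat until stable
        changed = False
        for lhs, rhs, dot in list(result):
            # step 2: the dot stands in front of a non-terminal B
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result


def show(item):
    """Render an item with '.' standing for the dot."""
    lhs, rhs, dot = item
    return lhs + " -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])


for item in sorted(closure({("E", ("E", "+", "T"), 0)})):
    print(show(item))
```

Running it on { E → •E+T } yields the four items of the example; the order of the printed lines is just the sort order of the tuples.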

Building state transitions

- LR tables need to know what state to go to after a shift or a reduce.
- Given a set C and a symbol X, we define the set C' = Successor(C, X) as follows. For each configuration in C of the form Y → α•Xβ:
  1. add Y → αX•β to C';
  2. do closure on C'.
- Informally, move by symbol X from one item set to another:
  - move the • to the right of X in all items where the dot is before X;
  - remove all other items;
  - compute the closure.

    C  --X-->  C'

Successor example

Grammar:
  (1) E → E + T
  (2) E → T
  (3) T → (E)
  (4) T → id

Given I = { E → •E+T, E → •T, T → •(E), T → •id }, what is successor(I, "(")?
- Move the • after "(":
    T → (•E)
- Compute the closure:
    T → (•E)
    E → •E + T
    E → •T
    T → •(E)
    T → •id
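The successor function is a small addition on top of closure. The sketch below repeats the closure helper so that it is self-contained; as before, the (lhs, rhs, dot) encoding and the function names are assumptions made for this illustration.

```python
# Grammar: E -> E + T | T ; T -> ( E ) | id
GRAMMAR = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """Close a set of LR(0) items (lhs, rhs, dot)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result


def successor(items, symbol):
    """Move the dot over `symbol` where possible, drop the rest, then close."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(moved)


I = closure({("E", ("E", "+", "T"), 0)})
print(sorted(successor(I, "(")))
```

For a symbol that no item has after its dot, the result is the empty set, which corresponds to an error entry in the table.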

Construct the LR(0) table

- Construct F = {I0, I1, I2, ..., In}.
- State i is determined by Ii. The parsing actions for state i are:
  - If A → α• is in Ii, then set Action[i, a] to reduce A → α for all inputs a (if A is not S').
  - If S' → S• is in Ii, then set Action[i, $] to accept.
  - If A → α•aβ is in Ii and successor(Ii, a) = Ij, then set Action[i, a] to shift j (a is a terminal).
- The goto transitions for state i are constructed for all non-terminals A using the rule: if successor(Ii, A) = Ij, then Goto[i, A] = j.
- All entries not defined by the above rules are errors.
- The initial state I0 is the one constructed from S' → •S.

Steps of constructing the LR(0) table

1. Augment the grammar.
2. Draw the transition diagram:
   a. compute the configuration sets (item sets/states);
   b. compute the successors.
3. Fill in the Action table and the Goto table.

Augmented grammar:
  (0) E' → E
  (1) E → E + T
  (2) E → T
  (3) T → (E)
  (4) T → id

Item sets example

  Configuration set                   Successor
  I0:  E' → •E                        I1
       E → •E+T                       I1
       E → •T                         I2
       T → •(E)                       I3
       T → •id                        I4
  I1:  E' → E•                        Accept (dot at end of E' rule)
       E → E•+T                       I5
  I2:  E → T•                         Reduce 2 (dot at end)
  I3:  T → (•E)                       I6
       E → •E+T                       I6
       E → •T                         I2
       T → •(E)                       I3
       T → •id                        I4
  I4:  T → id•                        Reduce 4 (dot at end)
  I5:  E → E+•T                       I8
       T → •(E)                       I3
       T → •id                        I4
  I6:  T → (E•)                       I7
       E → E•+T                       I5
  I7:  T → (E)•                       Reduce 3 (dot at end)
  I8:  E → E+T•                       Reduce 1 (dot at end)

Transition diagram

The diagram draws the item sets above as nodes, with these edges:
  I0 --E--> I1    I0 --T--> I2    I0 --(--> I3    I0 --id--> I4
  I1 --+--> I5
  I3 --E--> I6    I3 --T--> I2    I3 --(--> I3    I3 --id--> I4
  I5 --T--> I8    I5 --(--> I3    I5 --id--> I4
  I6 --)--> I7    I6 --+--> I5
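The whole collection of item sets can be generated mechanically by applying successor repeatedly, starting from the closure of E' → •E. The Python sketch below is one possible way to do it, under the same assumed item encoding (lhs, rhs, dot); canonical_collection is a name chosen here. It numbers the states in breadth-first order, so the state numbers generally differ from I0..I8 above even though the sets and edges are the same.

```python
# Augmented grammar: E' -> E ; E -> E + T | T ; T -> ( E ) | id
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("(", "E", ")"), ("id",)],
}


def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return frozenset(result)


def successor(items, symbol):
    return closure({(l, r, d + 1) for l, r, d in items
                    if d < len(r) and r[d] == symbol})


def canonical_collection():
    """Return (states, edges); edges maps (state index, symbol) -> state index."""
    start = closure({("E'", ("E",), 0)})
    states = [start]
    edges = {}
    work = [start]
    while work:
        state = work.pop(0)
        # every symbol that appears right after a dot gives one transition
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for sym in sorted(symbols):
            target = successor(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            edges[(states.index(state), sym)] = states.index(target)
    return states, edges


states, edges = canonical_collection()
print(len(states), "states,", len(edges), "transitions")
```

From states and edges, the Action and Goto tables follow by the rules on the previous slide: an edge on a terminal becomes a shift, an edge on a non-terminal becomes a Goto entry, and a complete item becomes a reduce.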

The parsing table

  State  |            Action            |  Goto
  on TOS | id   +    (    )    $        | E    T
  0      | S4        S3                 | 1    2
  1      |      S5             accept   |
  2      | R2   R2   R2   R2   R2       |
  3      | S4        S3                 | 6    2
  4      | R4   R4   R4   R4   R4       |
  5      | S4        S3                 |      8
  6      |      S5        S7            |
  7      | R3   R3   R3   R3   R3       |
  8      | R1   R1   R1   R1   R1       |

Parsing an erroneous input

Grammar:
  (0) E' → E
  (1) E → E + T
  (2) E → T
  (3) T → (E)
  (4) T → id

  State stack | Input   | Parser action
  S0          | id + +$ | Shift S4
  S0 S4       | + +$    | Reduce 4) T → id, pop S4, goto S2
  S0 S2       | + +$    | Reduce 2) E → T, pop S2, goto S1
  S0 S1       | + +$    | Push S5
  S0 S1 S5    | +$      | No Action[S5, +]: error!

Subset construction and closure

[Figure: two transition diagrams over the item sets I0 to I4, relating closure to the subset construction; the items are not recoverable from the transcription.]

LR(0) grammar

A grammar is LR(0) if the following two conditions hold:
1. For any configuration set containing the item A → α•aβ, there is no complete item B → γ• in that set.
   - No shift/reduce conflict in any state.
   - In the table, for each state, either shift or reduce.
2. There is at most one complete item A → α• in each configuration set.
   - No reduce/reduce conflict.
   - In the table, for each state, use the same reduction rule for every input symbol.

Very few grammars meet the requirements to be LR(0).

Grammar:
  (0) S → E
  (1) E → E+T
  (2) E → T
  (3) T → id
  (4) T → (E)
  (5) T → id[E]

Incomplete diagram (item sets, numbered as in the parse table below; the diagram writes the augmenting rule as E' → E):
  I0:  E' → •E, E → •E+T, E → •T, T → •(E), T → •id, T → •id[E]
  I1:  E' → E•, E → E•+T
  I2:  E → T•
  I3:  T → (•E), E → •E+T, E → •T, T → •(E), T → •id, T → •id[E]
  I4:  T → id•, T → id•[E]
  I5:  E → E+•T, T → •(E), T → •id, T → •id[E]
  I6:  T → (E•), E → E•+T
  I7:  T → (E)•
  I8:  E → E+T•
  I9:  T → id[•E], E → •E+T, E → •T, T → •(E), T → •id, T → •id[E]
  I10: T → id[E•], E → E•+T
  I11: T → id[E]•

SLR parse table (incomplete)

  State  |               Action                 |  Goto
  on TOS | id   +    (    )    $      [    ]    | E    T
  0      | S4        S3                         | 1    2
  1      |      S5             accept           |
  2      |      R2        R2   R2          R2   |
  3      | S4        S3                         | 6    2
  4      |      R3        R3   R3     S9   R3   |
  5      | S4        S3                         |      8
  6      |      S5        S7                    |
  7      |      R4        R4   R4          R4   |
  8      |      R1        R1   R1          R1   |
  9      | S4        S3                         | 10   2
  10     |      S5                         S11  |
  11     |      R5        R5   R5          R5   |

(State 7 contains T → (E)•, which is rule 4 in this numbering, so its reduce entries are R4.)

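Parse of id[id] + id (each stack entry shows the state together with the symbol it was pushed for):

  State stack             | Input            | Parser action
  S0                      | id [ id ] + id $ | Shift S4
  S0 S4id                 | [ id ] + id $    | S9
  S0 S4id S9[             | id ] + id $      | Shift S4
  S0 S4id S9[ S4id        | ] + id $         | Reduce T → id
  S0 S4id S9[ S2T         | ] + id $         | Reduce E → T
  S0 S4id S9[ S10E        | ] + id $         | S11
  S0 S4id S9[ S10E S11]   | + id $           | Reduce T → id[E]
  S0 S2T                  | + id $           | Reduce E → T
  S0 S1E                  | + id $           | S5
  S0 S1E S5+              | id $             | S4
  S0 S1E S5+ S4id         | $                | Reduce T → id
  S0 S1E S5+ S8T          | $                | Reduce E → E+T
  S0 S1E                  | $                | Accept

A grammar with assignment

Grammar:
  S → E
  E → E+T | T | V=E
  T → id | (E) | id[E]
  V → id

Item sets:
  I0:  E' → •E, E → •E+T, E → •T, E → •V=E, T → •(E), T → •id, T → •id[E], V → •id
  I1:  E' → E•, E → E•+T      (from I0 on E)
  I2:  E → T•                 (from I0 on T)
  I4:  T → id•, T → id•[E], V → id•      (from I0 on id)

In I4:
- Shift/reduce conflict: T → id• and T → id•[E]
- Reduce/reduce conflict: T → id• and V → id•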

SLR parse table (incomplete)

Grammar:
  (0) S → E
  (1) E → E+T
  (2) E → T
  (3) E → V=E
  (4) T → id
  (5) T → (E)
  (6) T → id[E]
  (7) V → id

  State  |               Action                 |  Goto
  on TOS | id   +    (    )    $      [    ]    | E    T
  0      | S4        S3                         | 1    2
  1      |      S5             accept           |
  2      |      R2        R2   R2          R2   |
  3      | S4        S3                         | 6    2
  4      |      R4        R4   R4     S9   R4   |
  5      | S4        S3                         |      8
  6      |      S5        S7                    |
  7      |      R5        R5   R5          R5   |
  8      |      R1        R1   R1          R1   |
  9      |                                      |
  10     |                                      |
  11     |                                      |

LR(0) key points

- Construct the LR table:
  - Start with the augmented grammar (S' → S).
  - Generate items from the productions: insert the dot into all positions.
  - Generate item sets (or configuration sets) from the items; they are our parser states.
  - Generate state transitions from the function successor(state, symbol).
  - Build the Action and Goto tables from the states and transitions.
- The tables implement a shift-reduce parser.
- View the states and transitions as a finite automaton.
- An item represents how far the parser is in recognizing part of one rule's RHS.
- An item set combines the various paths the parser might have taken so far, which diverge as more input is parsed.
- LR(0) grammars are the easiest LR grammars to understand, but too simple to use in real-life parsing.

Simple LR(1) parsing: SLR

- LR(0)
  - One LR(0) state mustn't have both shift and reduce items, or two reduce items.
  - So any complete item (dot at end) must be in its own state; the parser will always reduce when in this state.
- SLR
  - Peek ahead at the input to see if a reduction is appropriate.
  - Before reducing by the rule A → XYZ, see if the next token is in Follow(A). Reduce only in that case. Otherwise, shift.

Construction for SLR tables

1. Construct F = {I0, I1, ..., In}, the LR(0) item sets.
2. State i is Ii. The parsing actions for the state are:
   a) If A → α• is in Ii, then set Action[i, a] to reduce A → α for all a in Follow(A) (A is not S').
   b) If S' → S• is in Ii, then set Action[i, $] to accept.
   c) If A → α•aβ is in Ii and successor(Ii, a) = Ij, then set Action[i, a] to shift j (a must be a terminal).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: if successor(Ii, A) = Ij, then Goto[i, A] = j.
4. All entries not defined by rules 2 and 3 are errors.
5. The initial state is the closure of the set with the item S' → •S.

Properties of SLR

- The pickier rule about setting the Action table is the only difference from LR(0) tables.
- If G is SLR it is unambiguous, but not vice versa.
- A state can have both shift and reduce items, if the Follow sets are disjoint.

SLR example

Grammar:
  E' → E
  E → E + T | T
  T → (E) | id | id[E]

Item set I0 and successor(I0, id):
  I0:  E' → •E, E → •E+T, E → •T, T → •(E), T → •id, T → •id[E]
  successor(I0, id):  T → id•, T → id•[E]

The LR(0) parser sees both a shift and a reduce, but the SLR parser consults the Follow set:
  Follow(T) = { +, ), ], $ }
so
  T → id•     means reduce on + or ) or ] or $
  T → id•[E]  means shift otherwise (e.g. on [ )
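The Follow sets that the SLR rule consults can be computed by a fixed-point iteration. The sketch below is a simplified illustration that assumes a grammar without ε-productions (true for the grammars in these slides up to this point); the dictionary encoding and the names first_sets and follow_sets are choices made here.

```python
def first_sets(grammar):
    """First sets for a grammar without epsilon-productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """Follow sets for a grammar without epsilon-productions."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                 # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue           # only non-terminals have Follow sets
                    if i + 1 < len(rhs):   # something follows sym in this rule
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:                  # sym ends the rule: inherit Follow(lhs)
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


ARRAY_GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("(", "E", ")"), ("id",), ("id", "[", "E", "]")],
}
print(sorted(follow_sets(ARRAY_GRAMMAR, "E'")["T"]))
```

With these sets, the reduce entries of an SLR table are filled only in the columns that belong to Follow of the rule's left-hand side.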

SLR example 2

Grammar:
  E' → E
  E → E + T | T | V = E
  T → (E) | id
  V → id

Item set I0 and successor(I0, id):
  I0:  E' → •E, E → •E+T, E → •T, E → •V=E, T → •(E), T → •id, V → •id
  successor(I0, id):  T → id•, V → id•

Two complete LR(0) items, so there is a reduce/reduce conflict under LR(0), but:
  Follow(T) = { +, ), $ }
  Follow(V) = { = }
The sets are disjoint, so there is no conflict: the two reductions get separate Action entries in the table.

SLR grammar

A grammar is SLR if the following two conditions hold:
- If the items A → α•aβ and B → γ• are in a state, then the terminal a ∉ Follow(B).
  - No shift/reduce conflict in any state. This means the successor function for a from that set either shifts to a new state or reduces, but not both.
- For any two complete items A → α• and B → β• in a state, the Follow sets must be disjoint (Follow(A) ∩ Follow(B) is empty).
  - No reduce/reduce conflict in any state. If more than one non-terminal could be reduced from this set, it must be possible to uniquely determine which one using only one token of lookahead.

Compare with the LR(0) grammar:
1. For any configuration set containing the item A → α•aβ, there is no complete item B → γ• in that set.
2. There is at most one complete item A → α• in each configuration set.

Note that LR(0) ⊂ SLR.

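SLR

Grammar:
  1. S' → S
  2. S → dca
  3. S → dAb
  4. A → c

In S3 there is a shift/reduce conflict: the action can be R4 or a shift. By looking at the Follow set of A, Follow(A) = { b }, the conflict is removed.

Item sets:
  S0:  S' → •S, S → •dca, S → •dAb
  S1:  S' → S•               (from S0 on S)
  S2:  S → d•ca, S → d•Ab, A → •c      (from S0 on d)
  S3:  S → dc•a, A → c•      (from S2 on c)
  S4:  S → dA•b              (from S2 on A)
  S5:  S → dca•              (from S3 on a)
  S6:  S → dAb•              (from S4 on b)

         |            Action            |  Goto
  State  | a    b    c    d    $        | S    A
  S0     |                S2            | 1
  S1     |                     accept   |
  S2     |           S3                 |      4
  S3     | S5   R4                      |
  S4     |      S6                      |
  S5     |                     R2       |
  S6     |                     R3       |

Parse trace

  State stack       | Input | Parser action
  S0                | dca$  | Shift S2
  S0 S2d            | ca$   | Shift S3
  S0 S2d S3c        | a$    | Shift S5
  S0 S2d S3c S5a    | $     | Reduce 2
  S0 S1S            | $     | Accept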

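Non-SLR example

Grammar:
  1. S' → S
  2. S → dca
  3. S → dAb
  4. S → Aa
  5. A → c

Item sets:
  S0:  S' → •S, S → •dca, S → •dAb, S → •Aa, A → •c
  S1:  S' → S•               (from S0 on S)
  S2:  S → d•ca, S → d•Ab, A → •c      (from S0 on d)
  S3:  S → dc•a, A → c•      (from S2 on c)
  S4:  S → dA•b              (from S2 on A)
  S5:  S → dca•              (from S3 on a)
  S6:  S → dAb•              (from S4 on b)
  S7:  S → A•a               (from S0 on A)
  S8:  S → Aa•               (from S7 on a)
  S9:  A → c•                (from S0 on c)

S3 has a shift/reduce conflict. Looking at Follow(A), both a and b are in the Follow set, so under column a we still don't know whether to reduce or shift.

The conflicting SLR parsing table

         |            Action            |  Goto
  State  | a      b    c    d    $      | S    A
  S0     |             S9   S2          | 1    7
  S1     |                       accept |
  S2     |             S3               |      4
  S3     | S5/R5  R5                    |
  S4     |        S6                    |
  S5     |                       R2     |
  S6     |                       R3     |
  S7     | S8                           |
  S8     |                       R4     |
  S9     | R5     R5                    |

Follow(A) = { a, b }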

LR(1) parsing

- Make the items carry more information.
- LR(1) item: A → X1...Xi • Xi+1...Xj , tok
- The terminal tok is the lookahead.
- Meaning:
  - we have the states for X1...Xi on the stack already;
  - we expect to put the states for Xi+1...Xj onto the stack and then reduce, but only if the token following Xj is tok;
  - tok can be $.
- This splits Follow(A) into separate cases.
- Items can be clustered notationally: [A → α•, a/b/c] means the three items
    [A → α•, a]
    [A → α•, b]
    [A → α•, c]
  Reduce α to A if the next token is a or b or c.
- { a, b, c } ⊆ Follow(A)

LR(1) item sets

- More items and more item sets than SLR.
- Closure: for each item [A → α•Bβ, a] in I, for each production B → γ in G, and for each terminal b in First(βa), add [B → •γ, b] to I. (Add only items with the correct lookahead.)
- Once we have a closed item set, use the LR(1) successor function to compute the transitions and the next item sets.

Example

Grammar:
  S' → S
  S → dca | dAb | Aa
  A → c

Initial item: [S' → •S, $]
What is the closure?
  [S → •dca, $]
  [S → •dAb, $]
  [S → •Aa, $]
  [A → •c, a]
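The LR(1) closure rule differs from the LR(0) one only in how the lookahead of the new items is chosen. The sketch below illustrates it for the example grammar. It assumes a grammar without ε-productions, so First(βa) is simply First of the first symbol of βa; the four-field item encoding (lhs, rhs, dot, lookahead) and the names first and closure_lr1 are assumptions made for this example.

```python
# Grammar: S' -> S ; S -> d c a | d A b | A a ; A -> c
GRAMMAR = {
    "S'": [("S",)],
    "S": [("d", "c", "a"), ("d", "A", "b"), ("A", "a")],
    "A": [("c",)],
}


def first(symbol, seen=frozenset()):
    """First set of one symbol, assuming no epsilon-productions."""
    if symbol not in GRAMMAR:
        return {symbol}                  # a terminal (or $) starts with itself
    if symbol in seen:
        return set()                     # already being expanded: adds nothing
    result = set()
    for prod in GRAMMAR[symbol]:
        result |= first(prod[0], seen | {symbol})
    return result


def closure_lr1(items):
    """Close a set of LR(1) items (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                     # dot at the end or before a terminal
        rest = rhs[dot + 1:] + (la,)     # the string "beta a" of the rule
        for b in first(rest[0]):
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


for item in sorted(closure_lr1({("S'", ("S",), 0, "$")})):
    print(item)
```

Only A → •c receives the lookahead a here, because it is introduced by S → •Aa, where the symbol after A is a.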

LR(1) successor function

- Given an item set I with [A → α•Xβ, a], add [A → αX•β, a] to the item set J.
- successor(I, X) is the closure of the set J.
- Similar to the successor function for LR(0), but we propagate the lookahead token of each item.

Example

  S0:  S' → •S, $
       S → •dca, $
       S → •dAb, $
       S → •Aa, $
       A → •c, a
  S1 (from S0 on S):  S' → S•, $
  S2 (from S0 on d):  S → d•ca, $
                      S → d•Ab, $
                      A → •c, b
  S9 (from S0 on c):  A → c•, a

LR(1) tables

Action table entries:
- If [A → α•, a] ∈ Ii, then set Action[i, a] to reduce by rule A → α (A is not S').
- If [S' → S•, $] ∈ Ii, then set Action[i, $] to accept.
- If [A → α•aβ, b] is in Ii and succ(Ii, a) = Ij, then set Action[i, a] to shift j. Here a is a terminal.

Goto entries:
- For each state I and each non-terminal A: if succ(Ii, A) = Ij, then Goto[i, A] = j.

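LR(1) diagram

Grammar:
  1. S' → S
  2. S → dca
  3. S → dAb
  4. S → Aa
  5. A → c

States:
  S0:  S' → •S, $;  S → •dca, $;  S → •dAb, $;  S → •Aa, $;  A → •c, a
  S1:  S' → S•, $                         (from S0 on S)
  S2:  S → d•ca, $;  S → d•Ab, $;  A → •c, b      (from S0 on d)
  S3:  S → dc•a, $;  A → c•, b            (from S2 on c)
  S4:  S → dA•b, $                        (from S2 on A)
  S5:  S → dca•, $                        (from S3 on a)
  S6:  S → dAb•, $                        (from S4 on b)
  S7:  S → A•a, $                         (from S0 on A)
  S8:  S → Aa•, $                         (from S7 on a)
  S9:  A → c•, a                          (from S0 on c)

Create the LR(1) parse table

         |            Action            |  Goto
  State  | a    b    c    d    $        | S    A
  S0     |           S9   S2            | 1    7
  S1     |                     accept   |
  S2     |           S3                 |      4
  S3     | S5   R5                      |
  S4     |      S6                      |
  S5     |                     R2       |
  S6     |                     R3       |
  S7     | S8                           |
  S8     |                     R4       |
  S9     | R5                           |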

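Another LR(1) example

Create the transition diagram for the grammar
  0) S' → S
  1) S → AA
  2) A → aA
  3) A → b

States:
  S0:  S' → •S, $;  S → •AA, $;  A → •aA, a/b;  A → •b, a/b
  S1:  S' → S•, $                          (from S0 on S)
  S2:  S → A•A, $;  A → •aA, $;  A → •b, $         (from S0 on A)
  S3:  S → AA•, $                          (from S2 on A)
  S4:  A → a•A, $;  A → •aA, $;  A → •b, $         (from S2 on a, and from S4 on a)
  S5:  A → b•, $                           (from S2 on b, and from S4 on b)
  S6:  A → aA•, $                          (from S4 on A)
  S7:  A → a•A, a/b;  A → •aA, a/b;  A → •b, a/b   (from S0 on a, and from S7 on a)
  S8:  A → aA•, a/b                        (from S7 on A)
  S9:  A → b•, a/b                         (from S0 on b, and from S7 on b)

Parse table

         |       Action       |  Goto
  State  | a    b    $        | S    A
  S0     | S7   S9            | 1    2
  S1     |           accept   |
  S2     | S4   S5            |      3
  S3     |           R1       |
  S4     | S4   S5            |      6
  S5     |           R3       |
  S6     |           R2       |
  S7     | S7   S9            |      8
  S8     | R2   R2            |
  S9     | R3   R3            |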

Parse trace

  Stack          | Remaining input | Parse action
  S0             | baab$           | S9
  S0S9           | aab$            | R3 A → b
  S0S2           | aab$            | S4
  S0S2S4         | ab$             | S4
  S0S2S4S4       | b$              | S5
  S0S2S4S4S5     | $               | R3 A → b
  S0S2S4S4S6     | $               | R2 A → aA
  S0S2S4S6       | $               | R2 A → aA
  S0S2S3         | $               | R1 S → AA
  S0S1           | $               | Accept

LR(1) grammar

A grammar is LR(1) if the following two conditions are satisfied for each configuration set:
- For each item [A → α•aβ, b] in the set, there is no item in the set of the form [B → γ•, a].
  - In the Action table, this translates to no shift/reduce conflict.
- If there are two complete items [A → α•, a] and [B → β•, b] in the set, then a and b must be different.
  - In the Action table, this translates to no reduce/reduce conflict.

Compare with the SLR grammar:
- For any item A → α•aβ in the set, with terminal a, there is no complete item B → γ• in that set with a in Follow(B).
- For any two complete items A → α• and B → β• in the set, the Follow sets must be disjoint.

Note that SLR(1) ⊂ LR(1), and LR(0) ⊂ SLR(1) ⊂ LR(1).

LR(1) tables continued

- LR(1) tables can get big: exponential in the size of the rules.
- Can we keep the additional power we got from going SLR → LR without the table explosion?
- LALR!
- We split SLR(1) states to get LR(1) states, maybe too aggressively.
- Try to merge item sets that are almost identical.
- Tricky bit: don't introduce shift/reduce or reduce/reduce conflicts.

LALR approach

- Just say LALR (it's always 1 in practice).
- Given the numerous LR(1) states for a grammar G, consider merging similar states to get fewer states.
- Candidates for merging:
  - same core (LR(0) items);
  - only differences in the lookaheads.
- Example:
    S1:   X → α•β, a/b/c
    S2:   X → α•β, c/d
    S12:  X → α•β, a/b/c/d

States with the same core items

Grammar:
  0) S' → S
  1) S → AA
  2) A → aA
  3) A → b

  S0:  S' → •S, $;  S → •AA, $;  A → •aA, a/b;  A → •b, a/b
  S1:  S' → S•, $
  S2:  S → A•A, $;  A → •aA, $;  A → •b, $
  S3:  S → AA•, $
  S4:  A → a•A, $;  A → •aA, $;  A → •b, $
  S5:  A → b•, $
  S6:  A → aA•, $
  S7:  A → a•A, a/b;  A → •aA, a/b;  A → •b, a/b
  S8:  A → aA•, a/b
  S9:  A → b•, a/b

The pairs S4/S7, S5/S9 and S6/S8 have the same core items and differ only in their lookaheads.

Merge the states

  S47:  A → a•A, a/b/$;  A → •aA, a/b/$;  A → •b, a/b/$
  S59:  A → b•, a/b/$
  S68:  A → aA•, a/b/$

Merge the states (result)

Grammar:
  0) S' → S
  1) S → AA
  2) A → aA
  3) A → b

  S0:   S' → •S, $;  S → •AA, $;  A → •aA, a/b;  A → •b, a/b
  S1:   S' → S•, $
  S2:   S → A•A, $;  A → •aA, $;  A → •b, $
  S3:   S → AA•, $
  S47:  A → a•A, a/b/$;  A → •aA, a/b/$;  A → •b, a/b/$
  S59:  A → b•, a/b/$
  S68:  A → aA•, a/b/$

Follow(A) = { a, b, $ }

After the merge

- What happened when we merged?
  - Three fewer states.
  - The lookaheads on the items were merged.
- In this case, the lookahead in the merged sets constitutes the entire Follow set.
- So, we made an SLR(1) grammar by merging.
- The result of a merge is usually not SLR(1).
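Merging states with the same core is a grouping operation over the LR(1) item sets. The Python sketch below illustrates it on the ten states of this example. The item encoding (lhs, rhs, dot, lookahead), the helper state, and the "+"-joined names of merged states are assumptions made for this illustration, and the sketch only merges the item sets; it does not redirect transitions or check for new conflicts.

```python
from collections import defaultdict


def state(*entries):
    """Build an LR(1) item set from (lhs, rhs, dot, lookaheads) entries.

    rhs and lookaheads are strings of one-character symbols.
    """
    return {(lhs, rhs, dot, la)
            for lhs, rhs, dot, las in entries
            for la in las}


# LR(1) states for S' -> S ; S -> AA ; A -> aA | b
LR1_STATES = {
    "S0": state(("S'", "S", 0, "$"), ("S", "AA", 0, "$"),
                ("A", "aA", 0, "ab"), ("A", "b", 0, "ab")),
    "S1": state(("S'", "S", 1, "$")),
    "S2": state(("S", "AA", 1, "$"), ("A", "aA", 0, "$"), ("A", "b", 0, "$")),
    "S3": state(("S", "AA", 2, "$")),
    "S4": state(("A", "aA", 1, "$"), ("A", "aA", 0, "$"), ("A", "b", 0, "$")),
    "S5": state(("A", "b", 1, "$")),
    "S6": state(("A", "aA", 2, "$")),
    "S7": state(("A", "aA", 1, "ab"), ("A", "aA", 0, "ab"), ("A", "b", 0, "ab")),
    "S8": state(("A", "aA", 2, "ab")),
    "S9": state(("A", "b", 1, "ab")),
}


def merge_same_core(states):
    """Union all states whose items agree once lookaheads are ignored."""
    groups = defaultdict(list)
    for name, items in states.items():
        core = frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in items)
        groups[core].append(name)
    return {"+".join(names): set().union(*(states[n] for n in names))
            for names in groups.values()}


merged = merge_same_core(LR1_STATES)
print(sorted(merged))
```

The merged state built from S4 and S7 carries the lookaheads a, b and $ on every item, matching S47 above.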

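A conflict after merging

Grammar:
  1) S → aBc | aCd | bBd | bCc
  2) B → e
  3) C → e

States:
  S0:  S' → •S, $;  S → •aBc, $;  S → •aCd, $;  S → •bBd, $;  S → •bCc, $
  S1:  S' → S•, $                                  (from S0 on S)
  S2:  S → a•Bc, $;  S → a•Cd, $;  B → •e, c;  C → •e, d      (from S0 on a)
  S3:  B → e•, c;  C → e•, d                       (from S2 on e)
  S4:  S → b•Bd, $;  S → b•Cc, $;  B → •e, d;  C → •e, c      (from S0 on b)
  S5:  B → e•, d;  C → e•, c                       (from S4 on e)

S3 and S5 have the same core. Merging them gives B → e•, c/d and C → e•, c/d, which is a reduce/reduce conflict on both c and d, although the LR(1) states themselves have no conflict.

Practical consideration

- Ambiguity in LR grammars G: G produces multiple rightmost derivations (i.e. we can build two different parse trees for one input string).
- Remember:
    E → E + E | E * E | (E) | id
  We added terms and factors to force an unambiguous parse with the correct precedence and associativity.
- What if we threw the grammar into an LR table-construction machine anyway?
  - Conflicts = multiple action entries for one cell.
  - We choose which entry to keep and toss the others.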

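Precedence and associativity in JavaCUP

Grammar:
  E → E + E | E * E | (E) | id

States:
  S0:  E' → •E, E → •E+E, E → •E*E, E → •(E), E → •id
  S1:  E' → E•, E → E•+E, E → E•*E                (from S0 on E)
  S2:  E → (•E), E → •E+E, E → •E*E, E → •(E), E → •id      (on "(")
  S3:  E → id•                                    (on id)
  S4:  E → E+•E, E → •E+E, E → •E*E, E → •(E), E → •id      (on +)
  S5:  E → E*•E, E → •E+E, E → •E*E, E → •(E), E → •id      (on *)
  S6:  E → (E•), E → E•+E, E → E•*E               (from S2 on E)
  S7:  E → E+E•, E → E•+E, E → E•*E               (from S4 on E)
  S8:  E → E*E•, E → E•+E, E → E•*E               (from S5 on E)
  S9:  E → (E)•                                   (from S6 on ")")

S7 and S8 each contain a complete item together with items that can shift + or *, so they have shift/reduce conflicts.

JavaCup grammar

  terminal PLUS, TIMES;
  precedence left PLUS;
  precedence left TIMES;
  E ::= E PLUS E | E TIMES E | ID

- What if the input is x+y+z? When shifting + conflicts with reducing a production containing +, choose reduce (PLUS is left-associative).
- What if the input is x+y*z? TIMES is declared after PLUS and therefore has higher precedence in CUP, so the parser shifts * rather than reducing E PLUS E.
- What if the input is x*y+z? The production E TIMES E has higher precedence than the lookahead +, so the parser reduces.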

Transition diagram for assignment expr

Grammar:
  S → id | V=E
  V → id
  E → V | n

States:
  S0:  S' → •S, $;  S → •id, $;  S → •V=E, $;  V → •id, =
  S1:  S' → S•, $                        (from S0 on S)
  S2:  S → id•, $;  V → id•, =           (from S0 on id)
  S3:  S → V•=E, $                       (from S0 on V)
  S4:  S → V=•E, $;  E → •V, $;  E → •n, $;  V → •id, $      (from S3 on =)
  S5:  S → V=E•, $                       (from S4 on E)
  S6:  E → V•, $                         (from S4 on V)
  S7:  E → n•, $                         (from S4 on n)
  S8:  V → id•, $                        (from S4 on id)

Why are there conflicts in some rules in assignments?

Non-LR(1) grammar:
  P → m | Pm | ε

States:
  S0:  S' → •P, $;  P → •m, $;  P → •Pm, $;  P → •, $;  P → •, m;  P → •m, m;  P → •Pm, m
  S1:  P → m•, $/m                       (from S0 on m)
  S2:  S' → P•, $                        (from S0 on P)

It is an ambiguous grammar. There are two rightmost/leftmost derivations for the sentence m:
  P ⇒ Pm ⇒ m
  P ⇒ m

  *** Shift/Reduce conflict found in state #0 between P ::= (*) and P ::= (*) m under symbol m

A slightly changed grammar, still not LR

Non-LR(1) grammar:
  P → m | mP | ε

States:
  S0:  S' → •P, $;  P → •m, $;  P → •mP, $;  P → •, $
  S1:  P → m•, $;  P → m•P, $;  P → •m, $;  P → •mP, $;  P → •, $      (on m)
  S2:  S' → P•, $                        (from S0 on P)
  S3:  P → mP•, $                        (from S1 on P)

It is an ambiguous grammar. There are two parse trees for the sentence m:
  P ⇒ mP ⇒ m
  P ⇒ m

  Reduce/Reduce conflict found in state #1 between P ::= m (*) and P ::= (*) under symbols: {EOF}

(Produced by javacup.)

Modified LR(1) grammar

LR(1) grammar:
  P → mP | ε

States:
  S0:  S' → •P, $;  P → •, $;  P → •mP, $
  S1:  S' → P•, $                        (from S0 on P)
  S2:  P → m•P, $;  P → •mP, $;  P → •, $      (on m)
  S3:  P → mP•, $                        (from S2 on P)

Note that there are no conflicts.

The derivation:
  P ⇒ mP ⇒ mmP ⇒ mm

Another way of changing to an LR(1) grammar

LR(1) grammar:
  P → Q | ε
  Q → m | mQ

States:
  S0:  S' → •P, $;  P → •Q, $;  P → •, $;  Q → •m, $;  Q → •mQ, $
  S1:  S' → P•, $                        (from S0 on P)
  S2:  Q → m•, $;  Q → m•Q, $;  Q → •m, $;  Q → •mQ, $      (on m)
  S3:  Q → mQ•, $                        (from S2 on Q)

LR grammars: comparison

  LR(0) ⊂ SLR(1) ⊂ LALR ⊂ LR(1) ⊂ CFG

           | Advantages                                            | Disadvantages
  LR(0)    | Smallest tables, easiest to build                     | Inadequate for many PL structures
  SLR(1)   | More inclusive, more information than LR(0)           | Many useful grammars are not SLR(1)
  LALR(1)  | Same size tables as SLR, more languages, efficient to build | Empirical, not mathematical
  LR(1)    | Most precise use of lookahead, most PL structures we want | Tables an order of magnitude larger than SLR(1)

The space of grammars

[Figure: nested regions labelled CFG, Unambiguous CFG, LR(1), LALR(1), SLR(1) and LR(0), with LL(1) also shown. A second copy of the figure, titled "What are used in practice", marks the classes used in practice.]

Verifying the language generated by a grammar

- To verify a grammar G against a language L:
  - every string generated by G is in L;
  - every string in L can be generated by G.
- Example: S → (S)S | ε
  - The language is all the strings of balanced parentheses, such as (), (()), ()(()()).
- Proof part 1: every sentence derived from S is balanced.
  - Basis: the empty string is balanced.
  - Induction: suppose that all derivations of fewer than n steps produce balanced sentences, and consider a leftmost derivation of n steps. Such a derivation must be of the form
      S ⇒ (S)S ⇒* (x)S ⇒* (x)y
    Here x and y are derived in fewer than n steps, so they are balanced, and therefore (x)y is balanced.
- Proof part 2: every balanced string can be derived from S.
  - Basis: the empty string can be derived from S.
  - Induction: suppose that every balanced string of length less than 2n can be derived from S, and consider a balanced string w of length 2n. w must start with (, and w can be written as (x)y, where x and y are balanced. Both x and y are shorter than 2n, so they can be derived from S, and hence S ⇒ (S)S ⇒* (x)y = w.

Hierarchy of grammars

- CFG is more powerful than RE.
- A "type n" grammar is more powerful than a "type n+1" grammar.
- Example: Σ = {a, b}
  - Language 1: any string consisting of a and b.
      A → aA | bA | ε
    Can be described by an RE.
  - Language 2: the palindromes consisting of a and b.
      A → aAa | bAb | a | b | ε
    Can be described by a CFG, but not by an RE.

[Figure: Languages 1 to 5 placed in nested regions labelled RG and CFG.]

- When a grammar is more powerful, it is not that it can describe a larger language. Instead, the power means the ability to restrict the set. A more powerful grammar can define a more complicated boundary between correct and incorrect sentences, and therefore more different languages.
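The claim from the verification slide, that S → (S)S | ε generates exactly the balanced-parenthesis strings, can also be checked by brute force on short strings. This is a sanity check and not a replacement for the induction proof. In the sketch below, balanced is the usual counting test and derivable tries every way of splitting w as (x)y; both names and the bound of 8 are choices made here.

```python
from functools import lru_cache
from itertools import product


def balanced(w):
    """Counting test: depth never goes negative and ends at zero."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


@lru_cache(maxsize=None)
def derivable(w):
    """True if S =>* w for the grammar S -> ( S ) S | epsilon."""
    if w == "":
        return True                       # S -> epsilon
    if w[0] != "(":
        return False                      # S -> ( S ) S must start with "("
    # try every position k of the ")" that closes the first "("
    return any(w[k] == ")" and derivable(w[1:k]) and derivable(w[k + 1:])
               for k in range(1, len(w)))


def agree_up_to(max_len):
    """Compare both tests on every string over {(, )} up to max_len."""
    return all(derivable("".join(p)) == balanced("".join(p))
               for n in range(max_len + 1)
               for p in product("()", repeat=n))


print(agree_up_to(8))
```

The two parts of the proof correspond to the two directions of the equality being tested: derivable implies balanced (part 1) and balanced implies derivable (part 2).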

Metaphoric comparison of grammars

- RE: draw the rose using straight lines (ruler and T-square suffice).
- CFG: approximate the outline by straight lines and circle segments (ruler and compasses).