Lecture 4: Lexical and Syntax Analysis

Similar documents
FSA. CmSc 365 Theory of Computation. Finite State Automata and Regular Expressions (Chapter 2, Section 2.3) ALPHABET operations: U, concatenation, *

The Course covers: Lexical Analysis Syntax Analysis Semantic Analysis Runtime environments Code Generation Code Optimization. CS 540 Spring 2013 GMU 2

CSE303 - Introduction to the Theory of Computing Sample Solutions for Exercises on Finite Automata

Last time: introduced our first computational model the DFA.

DFA (Deterministic Finite Automata) q a

Winter 2016 COMP-250: Introduction to Computer Science. Lecture 23, April 5, 2016

CS 6353 Compiler Construction, Homework #1. 1. Write regular expressions for the following informally described languages:

a b c cat CAT A B C Aa Bb Cc cat cat Lesson 1 (Part 1) Verbal lesson: Capital Letters Make The Same Sound Lesson 1 (Part 1) continued...

Finite Automata. d: Q S Q. Finite automaton is M=(Q, S, d, q 0, F) Ex: an FA that accepts all odd-length strings of zeros: q 0 q 1. q i. q k.

Notes on Finite Automata Department of Computer Science Professor Goldberg Textbooks: Introduction to the Theory of Computation by Michael Sipser

Lecture 11 Waves in Periodic Potentials Today: Questions you should be able to address after today s lecture:

INTEGRALS. Chapter 7. d dx. 7.1 Overview Let d dx F (x) = f (x). Then, we write f ( x)

Section 3: Antiderivatives of Formulas

TOPIC 5: INTEGRATION

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

Math 61 : Discrete Structures Final Exam Instructor: Ciprian Manolescu. You have 180 minutes.

Chapter 16. 1) is a particular point on the graph of the function. 1. y, where x y 1

Minimal DFA. minimal DFA for L starting from any other

Minimum Spanning Trees

Integration Continued. Integration by Parts Solving Definite Integrals: Area Under a Curve Improper Integrals

Designing finite automata II

Multi-Section Coupled Line Couplers

Enforcement of Opacity Security Properties Using Insertion Functions

Formal languages, automata, and theory of computation

COMP108 Algorithmic Foundations

CSC Design and Analysis of Algorithms. Example: Change-Making Problem

Ch 1.2: Solutions of Some Differential Equations

AUTOMATA AND LANGUAGES. Definition 1.5: Finite Automaton

CS 314 Principles of Programming Languages

CISC 4090 Theory of Computation

1 Introduction to Modulo 7 Arithmetic

Formal Concept Analysis

More on automata. Michael George. March 24 April 7, 2014

Overview HC9. Parsing: Top-Down & LL(1) Context-Free Grammars (1) Introduction. CFGs (3) Context-Free Grammars (2) Vertalerbouw HC 9: Ch.

Garnir Polynomial and their Properties

Scanner. Specifying patterns. Specifying patterns. Operations on languages. A scanner must recognize the units of syntax Some parts are easy:

CIVL 8/ D Boundary Value Problems - Rectangular Elements 1/7

S i m p l i f y i n g A l g e b r a SIMPLIFYING ALGEBRA.

Module graph.py. 1 Introduction. 2 Graph basics. 3 Module graph.py. 3.1 Objects. CS 231 Naomi Nishimura

Convert the NFA into DFA

A Simple Code Generator. Code generation Algorithm. Register and Address Descriptors. Example 3/31/2008. Code Generation

Assignment 1 Automata, Languages, and Computability. 1 Finite State Automata and Regular Languages

Finite Automata-cont d

CHAPTER 1 Regular Languages. Contents. definitions, examples, designing, regular operations. Non-deterministic Finite Automata (NFA)

CSE 373: AVL trees. Warmup: Warmup. Interlude: Exploring the balance invariant. AVL Trees: Invariants. AVL tree invariants review

Talen en Automaten Test 1, Mon 7 th Dec, h45 17h30

Let's start with an example:

# 1 ' 10 ' 100. Decimal point = 4 hundred. = 6 tens (or sixty) = 5 ones (or five) = 2 tenths. = 7 hundredths.

1.4 Nonregular Languages

Context-Free Grammars and Languages

Homework 3 Solutions

, between the vertical lines x a and x b. Given a demand curve, having price as a function of quantity, p f (x) at height k is the curve f ( x,

Intermediate Math Circles Wednesday, November 14, 2018 Finite Automata II. Nickolas Rollick a b b. a b 4

Syntax Analysis. Context-free grammar Top-down and bottom-up parsing. cs5363 1

Worked out examples Finite Automata

NFAs and Regular Expressions. NFA-ε, continued. Recall. Last class: Today: Fun:

First Midterm Examination

UNCORRECTED SAMPLE PAGES 4-1. Naming fractions KEY IDEAS. 1 Each shape represents ONE whole. a i ii. b i ii

QUESTIONS BEGIN HERE!

First Midterm Examination

Harvard University Computer Science 121 Midterm October 23, 2012

Chapter Five: Nondeterministic Finite Automata. Formal Language, chapter 5, slide 1

Java II Finite Automata I

CS 361 Meeting 12 10/3/18

CS 330 Formal Methods and Models

Instructions for Section 1

Lexical Analysis Part III

XML and Databases. Outline. Recall: Top-Down Evaluation of Simple Paths. Recall: Top-Down Evaluation of Simple Paths. Sebastian Maneth NICTA and UNSW

Table of contents: Lecture N Summary... 3 What does automata mean?... 3 Introduction to languages... 3 Alphabets... 3 Strings...

CSE 373: More on graphs; DFS and BFS. Michael Lee Wednesday, Feb 14, 2018

On the Role of Fitness, Precision, Generalization and Simplicity in Process Discovery

Present state Next state Q + M N

Thoery of Automata CS402

Non-deterministic Finite Automata

Lecture 08: Feb. 08, 2019

CMSC 330: Organization of Programming Languages

Linear Algebra Existence of the determinant. Expansion according to a row.

2. Lexical Analysis. Oscar Nierstrasz

CS 461, Lecture 17. Today s Outline. Example Run

This Week. Computer Graphics. Introduction. Introduction. Graphics Maths by Example. Graphics Maths by Example

FABER Formal Languages, Automata and Models of Computation

1 Nondeterministic Finite Automata

CS103B Handout 18 Winter 2007 February 28, 2007 Finite Automata

QUESTIONS BEGIN HERE!

V={A,B,C,D,E} E={ (A,D),(A,E),(B,D), (B,E),(C,D),(C,E)}

V={A,B,C,D,E} E={ (A,D),(A,E),(B,D), (B,E),(C,D),(C,E)}

Seven-Segment Display Driver

(a) v 1. v a. v i. v s. (b)

Theory of Computation Regular Languages. (NTU EE) Regular Languages Fall / 38

UNTYPED LAMBDA CALCULUS (II)

Using the Printable Sticker Function. Using the Edit Screen. Computer. Tablet. ScanNCutCanvas

b. How many ternary words of length 23 with eight 0 s, nine 1 s and six 2 s?

Section: Other Models of Turing Machines. Definition: Two automata are equivalent if they accept the same language.

CSI35 Chapter 11 Review

The transformation to right derivation is called the canonical reduction sequence. Bottom-up analysis

Abstract Interpretation: concrete and abstract semantics

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Converting Regular Expressions to Discrete Finite Automata: A Tutorial

Deterministic Finite Automata

a b b a pop push read unread

Transcription:

1 CS 515 Programming Languages and Compilers I
Lecture 4: Lexical and Syntax Analysis
Zheng (Eddy) Zhang, Rutgers University
Fall 2017, 9/26/2017

2 Lecture 4A: Lexical and Syntax Analysis II
(These lectures are based on the slides copyrighted by Keith Cooper and Linda Torczon from Rice University.)

3 Review: Three-Pass Compiler
[Figure: Front End → IR → Middle End → IR → Back End → machine code, with Errors reported along the way; inside the front end, stream of characters → Scanner → Parser → Semantic Elaboration → IR]
Scanner and Parser collaborate to check the syntax of the input program.
1. Scanner maps a stream of characters into words (words are the fundamental unit of syntax)
   - Scanner produces a stream of tokens for the parser
   - Token is <part of speech, word>
   - Part of speech is a unit in the grammar
2. Parser maps a stream of words into a sentence in a grammatical model of the input language


5 Review: The Compiler's Front End
Scanner looks at every character
- Converts a stream of characters to a stream of tokens, or classified words: <part of speech, word>
- Efficiency & scalability matter
- Scanner is the only part of the compiler that looks at every character
Parser looks at every token
- Determines if the stream of tokens forms a sentence in the source language
- Fits tokens to some syntactic model, or grammar, for the source language

6 Review: The Compiler's Front End
How can we automate the construction of scanners & parsers?
Scanner
- Specify syntax with regular expressions (REs)
- Construct a finite automaton & scanner from the regular expression
Parser
- Specify syntax with context-free grammars (CFGs)
- Construct a push-down automaton & parser from the CFG

7 Review: Automata
We can represent this code for recognizing "not" as an automaton:

    c ← next character
    if c = 'n' then {
        c ← next character
        if c = 'o' then {
            c ← next character
            if c = 't' then return <NOT, "not">
            else report error
        }
        else report error
    }
    else report error

O(1) per input character, if branches are O(1)
[Transition diagram for "not": s0 -n-> s1 -o-> s2 -t-> s3; any other character leads to the error state se. States drawn with double lines indicate success.]
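Here is a sketch of the same diagram turned into C. It is written as an explicit state machine, one `switch` step per character, rather than the nested `if`s above. The names `accepts_not` and `enum state` are illustrative. Unlike the nested-if code, it requires the whole input to be exactly `not`, so `nothing` is rejected.

```c
#include <stdbool.h>

/* States of the transition diagram for "not": s0 -n-> s1 -o-> s2 -t-> s3.
   SE is the error state reached on "any other character". */
enum state { S0, S1, S2, S3, SE };

/* Returns true if word is exactly "not".  Each character costs one switch
   step, so the work is O(1) per input character. */
bool accepts_not(const char *word) {
    enum state s = S0;
    for (const char *p = word; *p != '\0'; p++) {
        char c = *p;
        switch (s) {
        case S0: s = (c == 'n') ? S1 : SE; break;
        case S1: s = (c == 'o') ? S2 : SE; break;
        case S2: s = (c == 't') ? S3 : SE; break;
        default: s = SE; break;    /* from S3, any further character is an error */
        }
        if (s == SE) return false; /* there is no way out of the error state */
    }
    return s == S3;                /* S3 is the only accepting state */
}
```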

8 Review: Automata
To recognize a larger set of keywords is (relatively) easy
For example, the set of words { if, int, not, then, to, write }
[Figure: s0 -i-> s1 -f-> s2 (if); s1 -n-> s3 -t-> s4 (int); s0 -n-> s5 -o-> s6 -t-> s7 (not); s0 -t-> s8 -h-> s9 -e-> s10 -n-> s11 (then); s8 -o-> s12 (to); s0 -w-> s13 -r-> s14 -i-> s15 -t-> s16 -e-> s17 (write)]
- One-to-one map from final states to words (maybe parts of speech)
- Cost is O(1) per character
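The following is a minimal sketch, assuming ASCII input, that builds the keyword automaton as a trie at run time. The table layout and the names `build_keyword_dfa` and `match_keyword` are hypothetical. States are numbered in the order the keywords are listed, so the result reproduces the numbering in the diagram: 18 states, s0 through s17.

```c
#include <stddef.h>

#define MAX_STATES 64        /* enough for this keyword set */
#define NO_STATE (-1)        /* missing edge: the error state */

static const char *keywords[] = { "if", "int", "not", "then", "to", "write" };
#define N_KEYWORDS (sizeof keywords / sizeof keywords[0])

static int delta[MAX_STATES][128];   /* delta[s][c]: next state on ASCII c */
static int final_word[MAX_STATES];   /* keyword index of a final state, else -1 */
static int n_states;

/* Builds the trie-shaped DFA: state 0 is s0, and each new prefix of a
   keyword gets the next state number. */
void build_keyword_dfa(void) {
    for (int s = 0; s < MAX_STATES; s++) {
        final_word[s] = -1;
        for (int c = 0; c < 128; c++) delta[s][c] = NO_STATE;
    }
    n_states = 1;
    for (size_t w = 0; w < N_KEYWORDS; w++) {
        int s = 0;
        for (const char *p = keywords[w]; *p != '\0'; p++) {
            int c = (unsigned char)*p;
            if (delta[s][c] == NO_STATE) delta[s][c] = n_states++;
            s = delta[s][c];
        }
        final_word[s] = (int)w;      /* one-to-one map: final state -> word */
    }
}

/* Returns the index of the keyword equal to word, or -1.  O(1) per character. */
int match_keyword(const char *word) {
    int s = 0;
    for (const char *p = word; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c >= 128 || delta[s][c] == NO_STATE) return -1;
        s = delta[s][c];
    }
    return final_word[s];
}
```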

9 Review: Specifying An Automaton
We list the words
- Each word's spelling is unique and concise
- We can separate them with the | symbol, read as "or"
The specification: if | int | not | then | to | write is a simple regular expression for the language accepted by our automaton
There is some subtlety here. We introduced notation for concatenation and choice (or alternation).
- Concatenation: ab is a followed by b
- Alternation: a | b is either a or b


11 Review: Specifying An Automaton
Every Regular Expression corresponds to an automaton
- We can construct, automatically, an automaton from the RE
- We can construct, automatically, an RE from any automaton
- We can convert an automaton easily and directly into code
- The automaton and the code both have O(1) cost per input character
An RE specification leads directly to an efficient scanner

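12 Review: Unsigned Integers
The RE 0 | [1-9] [0-9]* corresponds to an automaton and an implementation
[Figure: s0 -0-> s1; s0 -[1-9]-> s2; s2 -[0-9]-> s2; s1 and s2 are final states (unsigned integer)]

    c ← next character
    n ← 0
    if c = '0'
        then return <CONSTANT, n>
    else if ('1' ≤ c ≤ '9') then {
        n ← atoi(c)
        c ← next character
        while ('0' ≤ c ≤ '9') {
            t ← atoi(c)
            n ← n * 10 + t
            c ← next character
        }
        return <CONSTANT, n>
    }
    else report error

The automaton and the code can be generated automatically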
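The automaton for 0 | [1-9] [0-9]* can be hand-coded in C roughly as follows. This is a sketch with these limits:
- `scan_unsigned` is a hypothetical name.
- It reads from a string rather than a character stream.
- It does not check for overflow.
- Like the RE, it stops after a leading 0, so `007` yields the constant 0 with one character consumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scans an unsigned integer matching 0 | [1-9][0-9]* at the start of s.
   On success stores the value in *value and the number of characters
   consumed in *len, and returns true.  Overflow is not checked. */
bool scan_unsigned(const char *s, unsigned long *value, size_t *len) {
    size_t i = 0;
    unsigned long n = 0;
    if (s[i] == '0') {                     /* s0 -0-> s1: the constant 0 */
        *value = 0;
        *len = 1;
        return true;
    }
    if (s[i] < '1' || s[i] > '9')          /* neither 0 nor 1-9: error */
        return false;
    while (s[i] >= '0' && s[i] <= '9') {   /* s2 loops on 0-9 */
        n = n * 10 + (unsigned long)(s[i] - '0');  /* digit value, like atoi(c) */
        i++;                               /* c <- next character */
    }
    *value = n;
    *len = i;
    return true;
}
```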

13 Review: Regular Expression
A formal definition: Regular Expressions over an Alphabet $\Sigma$
If $x \in \Sigma$, then x is an RE denoting the set $\{ x \}$, or the language $L = \{ x \}$
If x and y are REs then
1. xy is an RE denoting $L(x)L(y) = \{ pq \mid p \in L(x) \text{ and } q \in L(y) \}$
2. x | y is an RE denoting $L(x) \cup L(y)$
3. x* is an RE denoting $L(x)^* = \bigcup_{0 \le k < \infty} L(x)^k$ (Kleene Closure): the set of all strings that are zero or more concatenations of x
4. x+ is an RE denoting $L(x)^+ = \bigcup_{1 \le k < \infty} L(x)^k$ (Positive Closure): the set of all strings that are one or more concatenations of x
ε is an RE denoting the set containing only the empty string, $\{ \varepsilon \}$

14 Review: Regular Expression
How do these operators help?
- Symbols: if x is in Σ, then x is an RE, so the spelling of any letter in the alphabet is an RE
- Concatenation (xy): if we concatenate letters, the result is an RE (the spelling of words)
- Alternation (x | y): any finite list of words can be written as an RE, ( w0 | w1 | w2 | … | wn )
- Closure (x* and x+): we can use closure to write finite descriptions of infinite, but countable, sets
- ε: in practice, the empty string is often useful

15 Review: Regular Expression
Let the notation [0-9] be shorthand for (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9)
RE Examples:
- Non-negative integer: [0-9] [0-9]* or [0-9]+
- No leading zeros: 0 | [1-9] [0-9]*
- Algol-style Identifier: ([a-z] | [A-Z]) ([a-z] | [A-Z] | [0-9])*
- Decimal number: (0 | [1-9] [0-9]*) . [0-9]*
- Real number: ((0 | [1-9] [0-9]*) | (0 | [1-9] [0-9]*) . [0-9]*) E [0-9] [0-9]*
Each of these REs corresponds to an automaton and an implementation.

16 Review: From RE to Scanner
Construct scanners directly from REs using automata theory
- There are several ways to perform this construction
- The classic approach is a two-step method:
  1. Build automata for each piece of the RE using a simple template-driven method
     - Build a specific variation on an automaton that has transitions on ε and non-deterministic choice (multiple transitions from a state on the same symbol)
     - This construction is called Thompson's construction
  2. Convert the newly built automaton into a deterministic automaton
     - A deterministic automaton has no ε-transitions and all choices are single-valued
     - This construction is called the subset construction
- Given the deterministic automaton, minimize it to reduce the number of states
  - Minimization is a space optimization. Both the original automaton and the minimal one take O(1) time per character

17 Review: Thompson's Construction
The Key Idea
- For each RE symbol and operator, we have a small template
- Build them, in precedence order, and join them with ε-transitions
[Figure: template NFAs for a (S0 -a-> S1), for ab, for a | b, and for a*]
Precedence in REs: parentheses, then closure, then concatenation, then alternation


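23 Review: Thompson's Construction
Let's build an NFA for a ( b | c )*
1. a, b, & c: one two-state NFA per symbol (S0 -a-> S1, S0 -b-> S1, S0 -c-> S1)
2. b | c: a new start and final state joined to the NFAs for b and c by ε-transitions (states S0 through S5)
3. ( b | c )*: the closure template wrapped around the NFA from step 2 (states S0 through S7)
4. a ( b | c )*: the NFA for a concatenated with the NFA from step 3 (states S0 through S9)
Precedence in REs: parentheses, then closure, then concatenation, then alternation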
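Here is a small C sketch of Thompson's construction that makes the templates concrete. It builds the NFA for a ( b | c )* from the four templates, then simulates it by tracking sets of states. The array representation and the function names are assumptions for illustration. The result has the same shape and the same ten states as the NFA above, though the state numbers may differ from the slide.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NFA 64
#define EPS '\0'   /* label of an epsilon-transition (never a real input char) */

/* Every state built by these templates has at most two outgoing edges. */
static struct { char label[2]; int to[2]; int n; } nfa[MAX_NFA];
static int n_nfa;

typedef struct { int start, accept; } Frag;   /* an NFA fragment */

static int new_state(void) { nfa[n_nfa].n = 0; return n_nfa++; }

static Frag new_frag(void) {
    Frag f;
    f.start = new_state();
    f.accept = new_state();
    return f;
}

static void edge(int from, char label, int to) {
    int k = nfa[from].n++;
    nfa[from].label[k] = label;
    nfa[from].to[k] = to;
}

/* Template for a symbol c: start -c-> accept */
static Frag sym(char c) {
    Frag f = new_frag();
    edge(f.start, c, f.accept);
    return f;
}

/* Template for xy: an epsilon edge from x's accept state to y's start */
static Frag cat(Frag x, Frag y) {
    edge(x.accept, EPS, y.start);
    Frag f = { x.start, y.accept };
    return f;
}

/* Template for x | y: new start and accept states joined by epsilon edges */
static Frag alt(Frag x, Frag y) {
    Frag f = new_frag();
    edge(f.start, EPS, x.start);
    edge(f.start, EPS, y.start);
    edge(x.accept, EPS, f.accept);
    edge(y.accept, EPS, f.accept);
    return f;
}

/* Template for x*: skip x entirely, or repeat it through the back edge */
static Frag star(Frag x) {
    Frag f = new_frag();
    edge(f.start, EPS, x.start);
    edge(f.start, EPS, f.accept);
    edge(x.accept, EPS, x.start);
    edge(x.accept, EPS, f.accept);
    return f;
}

/* a ( b | c )*, built in precedence order: parentheses, closure, concatenation */
Frag build_example(void) {
    n_nfa = 0;
    Frag a = sym('a');
    Frag b = sym('b');
    Frag c = sym('c');
    Frag loop = star(alt(b, c));
    return cat(a, loop);
}

/* Adds s and every state reachable from s by epsilon edges to set. */
static void closure(bool set[], int s) {
    if (set[s]) return;
    set[s] = true;
    for (int k = 0; k < nfa[s].n; k++)
        if (nfa[s].label[k] == EPS) closure(set, nfa[s].to[k]);
}

/* Simulates the NFA by tracking the set of states it could be in. */
bool nfa_accepts(Frag m, const char *word) {
    bool cur[MAX_NFA] = { false };
    bool next[MAX_NFA];
    closure(cur, m.start);
    for (const char *p = word; *p != '\0'; p++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < n_nfa; s++)
            if (cur[s])
                for (int k = 0; k < nfa[s].n; k++)
                    if (nfa[s].label[k] == *p) closure(next, nfa[s].to[k]);
        memcpy(cur, next, sizeof cur);
    }
    return cur[m.accept];
}
```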


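27 Subset Construction
The Concept
- Build a simpler automaton (no ε-transitions, no multi-valued transitions) that simulates the behavior of the more complex automaton
- Each state in the new automaton represents a set of states in the original NFA
[Figure: the NFA for a ( b | c )* with states n0 to n9 (n0 -a-> n1, n4 -b-> n5, n6 -c-> n7, all other edges ε, n9 final), and the resulting DFA: d0 -a-> d1, b-edges from d1, d2, d3 to d2, and c-edges from d1, d2, d3 to d3]

    DFA state   NFA states
    d0          n0
    d1          n1, n2, n3, n4, n6, n9
    d2          n5, n8, n9, n3, n4, n6
    d3          n7, n8, n9, n3, n4, n6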
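The subset construction for this particular NFA can be sketched in C with each set of NFA states held as a bit mask (bit i stands for n_i). The NFA edges are hard-coded from the figure. The names and the index-based worklist are illustrative choices, not the lecture's code. Run on this NFA, the sketch should discover exactly the four DFA states d0 to d3 listed in the table.

```c
#include <stdbool.h>

#define N_NFA 10
#define MAX_DFA 16
typedef unsigned int Set;          /* bit i set <=> NFA state n_i is in the set */
#define BIT(i) (1u << (i))

/* eps[i]: states reachable from n_i by one epsilon edge.
   move_on[i][k]: states reachable from n_i on symbol k (0 = a, 1 = b, 2 = c). */
static const Set eps[N_NFA] = {
    [1] = BIT(2), [2] = BIT(3) | BIT(9), [3] = BIT(4) | BIT(6),
    [5] = BIT(8), [7] = BIT(8), [8] = BIT(3) | BIT(9),
};
static const Set move_on[N_NFA][3] = {
    [0] = { BIT(1), 0, 0 }, [4] = { 0, BIT(5), 0 }, [6] = { 0, 0, BIT(7) },
};

static Set eps_closure(Set s) {
    Set prev;
    do {                                   /* iterate to a fixed point */
        prev = s;
        for (int i = 0; i < N_NFA; i++)
            if (s & BIT(i)) s |= eps[i];
    } while (s != prev);
    return s;
}

static Set delta_nfa(Set s, int k) {       /* all moves on symbol k from s */
    Set out = 0;
    for (int i = 0; i < N_NFA; i++)
        if (s & BIT(i)) out |= move_on[i][k];
    return out;
}

Set dstates[MAX_DFA];   /* dstates[d] = NFA states represented by DFA state d */
int dtrans[MAX_DFA][3]; /* DFA transition table; -1 means the error state */
int n_dfa;

/* Returns the number of DFA states, or -1 if MAX_DFA is too small.
   States with index >= d form the worklist of unprocessed states. */
int subset_construction(void) {
    n_dfa = 0;
    dstates[n_dfa++] = eps_closure(BIT(0));
    for (int d = 0; d < n_dfa; d++) {
        for (int k = 0; k < 3; k++) {
            Set t = eps_closure(delta_nfa(dstates[d], k));
            dtrans[d][k] = -1;
            if (t == 0) continue;          /* no move: go to the error state */
            int found = -1;
            for (int j = 0; j < n_dfa; j++)
                if (dstates[j] == t) found = j;
            if (found < 0) {               /* a new DFA state */
                if (n_dfa == MAX_DFA) return -1;
                found = n_dfa;
                dstates[n_dfa++] = t;
            }
            dtrans[d][k] = found;
        }
    }
    return n_dfa;
}

/* A DFA state is final if it contains the NFA's final state n9. */
bool dfa_accepting(int d) { return (dstates[d] & BIT(9)) != 0; }
```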


36 Minimization
DFA minimization algorithms work by discovering states that are equivalent in their contexts and replacing multiple states with a single one
- Minimization reduces the number of states, but does not change the costs
[Figure: the DFA d0 to d3 for a ( b | c )* and the minimal DFA s0 -a-> s1, s1 -b | c-> s1]

    Minimal DFA State   Original DFA States
    s0                  d0
    s1                  d1, d2, d3
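One way to find the equivalent states is partition refinement. The sketch below uses Moore's algorithm as a simple stand-in; Hopcroft's algorithm is the faster and more common choice, and the slide does not name one. It starts from {final, non-final} and splits blocks until no block contains states that disagree on which blocks their transitions lead to. The DFA table is copied from the subset-construction result, and the implicit error state is treated as a block of its own.

```c
#include <stdbool.h>
#include <string.h>

#define N 4          /* DFA states d0..d3 */
#define K 3          /* symbols a, b, c */

/* DFA from the subset construction; -1 means "go to the error state". */
static const int delta[N][K] = {
    {  1, -1, -1 },   /* d0 */
    { -1,  2,  3 },   /* d1 */
    { -1,  2,  3 },   /* d2 */
    { -1,  2,  3 },   /* d3 */
};
static const bool accepting[N] = { false, true, true, true };

/* Block of a successor state; the error state gets its own block, -1. */
static int block_of(const int cls[N], int s) { return s < 0 ? -1 : cls[s]; }

/* True if s and t are in the same block and every transition leads to
   the same block. */
static bool same_signature(const int cls[N], int s, int t) {
    if (cls[s] != cls[t]) return false;
    for (int k = 0; k < K; k++)
        if (block_of(cls, delta[s][k]) != block_of(cls, delta[t][k]))
            return false;
    return true;
}

/* Fills cls[] with the block number of each state; returns the block count. */
int minimize(int cls[N]) {
    int n_blocks = 0;                      /* 0 forces at least two passes */
    for (int s = 0; s < N; s++) cls[s] = accepting[s] ? 1 : 0;
    for (;;) {
        int next[N];
        int count = 0;
        for (int s = 0; s < N; s++) {
            next[s] = -1;
            for (int t = 0; t < s && next[s] < 0; t++)
                if (same_signature(cls, s, t)) next[s] = next[t];
            if (next[s] < 0) next[s] = count++;   /* start a new block */
        }
        memcpy(cls, next, sizeof next);
        if (count == n_blocks) return count;      /* nothing split: stable */
        n_blocks = count;
    }
}
```

Refinement only ever splits blocks, so a pass that leaves the block count unchanged has reached the final partition.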


39 Implementing an Automaton
A common strategy is to simulate the DFA's execution
- Skeleton scanner + a table that encodes the automaton
- The scanner generator constructs the table
- The skeleton scanner does not change
Simple Skeleton Scanner:

    state ← s0
    char ← NextChar( )
    while (char ≠ EOF) {
        state ← δ[state, char]
        char ← NextChar( )
    }
    if (state is a final state)
        then report success
        else report an error

Transition table for our minimal DFA (se is the error state):

    δ    a    b    c
    s0   s1   se   se
    s1   se   s1   s1
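Here is a runnable C version of the skeleton scanner. It assumes the input is a NUL-terminated string, so `'\0'` plays the role of EOF. Only the `delta` table and the `column` function are specific to a ( b | c )*. A scanner generator would emit those and leave `run_dfa` unchanged. All names are illustrative.

```c
#include <stdbool.h>

/* Minimal DFA for a ( b | c )*: s0 -a-> s1, s1 -b|c-> s1.
   SE is the error state; once entered, the DFA stays there. */
enum { S0, S1, SE, N_STATES };

/* Maps a character to a column of the transition table. */
static int column(char ch) {
    switch (ch) {
    case 'a': return 0;
    case 'b': return 1;
    case 'c': return 2;
    default:  return 3;                    /* any other character */
    }
}

static const int delta[N_STATES][4] = {
    /*         a    b    c    other */
    /* s0 */ { S1,  SE,  SE,  SE },
    /* s1 */ { SE,  S1,  S1,  SE },
    /* se */ { SE,  SE,  SE,  SE },
};

/* The skeleton: this loop is the same for every DFA; only the table changes. */
bool run_dfa(const char *input) {
    int state = S0;
    for (const char *p = input; *p != '\0'; p++)   /* while (char != EOF) */
        state = delta[state][column(*p)];
    return state == S1;                            /* s1 is the final state */
}
```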

40 Automatic Scanner Construction
Scanner Generator
- Takes in a specification written as a collection of regular expressions
- Combines them into one RE using alternation ( | )
- Builds the minimal automaton (§2.4, §2.6.2 EAC)
- Emits the tables to drive a skeleton scanner (§2.5 EAC)
[Figure: specifications (as REs) → Scanner Generator → Tables; source code → Skeleton Scanner, driven by the Tables → <part of speech, word> pairs]

41 Automatic Scanner Construction
Scanner Generator
- As an alternative, the generator can produce code rather than tables
- Direct-coded scanners are ugly, but often faster than table-driven scanners
- Other than speed, the two are equivalent

42 Automaton versus Scanner
Automaton accepts or rejects a word
- Runs until it exhausts the input and accepts or rejects the stream
Scanner looks at the whole program and returns all of the tokens
- Must break the input stream into separate words
- Must capture and classify the lexeme
- Must decide when it has looked beyond the end of a word
Scanner generators build the automaton for a set of REs and then convert the automaton into an efficient scanner
Consider the RE r [0-9]+ and its minimal DFA: s0 -r-> s1 -[0-9]-> s2, with s2 -[0-9]-> s2


44 Table-Driven Scanner
The scanner does more than recognize a word
- Categorize the token and capture its spelling
- Build the DFA carefully such that each final state maps to a category
- Remember the characters at each transition to capture the spelling

    char ← next character
    state ← s0
    lexeme ← null string
    while (char ≠ EOF) {
        lexeme ← lexeme + char
        state ← δ[state, char]
        char ← next character
    }
    if (state ∈ SA)
        then result ← <PoS(state), lexeme>
        else result ← <invalid, "">
    return result

SA is the set of accepting states; PoS maps each accepting state to its part of speech.
Transition Table (δ) for r [0-9]+:

    δ    r    0,1,2,3,4,5,6,7,8,9   Other
    s0   s1   se                    se
    s1   se   s2                    se
    s2   se   s2                    se
    se   se   se                    se
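Here is the same idea in C for r [0-9]+. The lexeme is captured as characters are consumed, and `PoS` is represented as a per-state array. The category names `REGISTER` and `INVALID` are hypothetical labels. The caller-supplied lexeme buffer is an assumption of this sketch.

```c
#include <stddef.h>

enum { S0, S1, S2, SE, N_STATES };
enum category { INVALID, REGISTER };   /* hypothetical parts of speech */

/* Column of the transition table: 0 = 'r', 1 = a digit, 2 = anything else */
static int char_class(char ch) {
    if (ch == 'r') return 0;
    if (ch >= '0' && ch <= '9') return 1;
    return 2;
}

static const int delta[N_STATES][3] = {
    /*         r    0-9  other */
    /* s0 */ { S1,  SE,  SE },
    /* s1 */ { SE,  S2,  SE },
    /* s2 */ { SE,  S2,  SE },
    /* se */ { SE,  SE,  SE },
};

/* PoS(state): the category of each final state, INVALID for the others */
static const enum category pos[N_STATES] = { INVALID, INVALID, REGISTER, INVALID };

/* Runs the DFA over all of input, copying each character into lexeme as it
   is consumed.  cap is the size of lexeme and must be at least 1; an input
   that does not fit is reported as invalid. */
enum category scan_word(const char *input, char *lexeme, size_t cap) {
    int state = S0;
    size_t n = 0;
    for (const char *p = input; *p != '\0'; p++) {
        if (n + 1 >= cap) {            /* no room left for the terminator */
            state = SE;
            break;
        }
        lexeme[n++] = *p;              /* lexeme <- lexeme + char */
        state = delta[state][char_class(*p)];
    }
    lexeme[n] = '\0';
    if (pos[state] == INVALID)
        lexeme[0] = '\0';              /* result <- <invalid, ""> */
    return pos[state];
}
```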

45 Table-Driven Scanner
The scanner needs to recognize when one token ends and a new one begins
- Need to return all the words, in order, but one at a time
- Do not want to force blanks or delimiters everywhere
- Should x + y * z and x+y*z scan differently?

46 Recognizing Token Boundaries
Two potential solutions
- Require delimiters between every token: might be ugly and painful
- Run the automaton to an error or EOF, and then back up to a final state

    // recognize words
    state ← s0
    lexeme ← empty string
    clear stack
    push(bad)
    while (state ≠ se) do
        char ← next character
        lexeme ← lexeme + char
        if state ∈ SA then clear stack
        push(state)
        state ← δ(state, char)
    end
    // clean up final state
    while (state ∉ SA and state ≠ bad) do
        state ← pop()
        if state ≠ bad then
            truncate lexeme
            roll back the input one char
    end
    // report the results
    if (state ∈ SA)
        then result ← <PoS(state), lexeme>
        else result ← <invalid, "">
    return result
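Here is a C sketch of the back-up strategy for r [0-9]+. It follows the pseudocode, including the guard that popping the `bad` sentinel does not un-read a character. The lexeme is tracked as a length, and rolling back is a pointer decrement, which assumes the whole input is in memory. `next_word` and `STACK_MAX` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

enum { S0, S1, S2, SE, N_STATES };
#define BAD (-1)          /* sentinel pushed beneath the real states */
#define STACK_MAX 256

static int char_class(char ch) {       /* 0 = 'r', 1 = digit, 2 = other */
    if (ch == 'r') return 0;
    if (ch >= '0' && ch <= '9') return 1;
    return 2;
}

static const int delta[N_STATES][3] = {
    { S1, SE, SE }, { SE, S2, SE }, { SE, S2, SE }, { SE, SE, SE },
};

static bool accepting(int state) { return state == S2; }   /* SA = { s2 } */

/* Finds the longest prefix of *pos accepted by the DFA for r [0-9]+.
   On success returns its length (the lexeme is that many characters at the
   old *pos) and advances *pos past it; otherwise returns 0 and leaves *pos
   unchanged.  Scanning also stops after STACK_MAX characters. */
size_t next_word(const char **pos) {
    const char *p = *pos;
    int stack[STACK_MAX];
    int top = 0;
    int state = S0;
    size_t len = 0;

    stack[top++] = BAD;                       /* clear stack; push(bad) */
    while (state != SE && *p != '\0' && top < STACK_MAX) {
        char ch = *p++;                       /* char <- next character */
        len++;                                /* lexeme <- lexeme + char */
        if (accepting(state)) top = 0;        /* clear stack */
        stack[top++] = state;                 /* push(state) */
        state = delta[state][char_class(ch)];
    }
    /* back up to the most recent accepting state, if there was one */
    while (!accepting(state) && state != BAD) {
        state = stack[--top];
        if (state != BAD) {                   /* popping bad un-reads nothing */
            len--;                            /* truncate lexeme */
            p--;                              /* roll back the input one char */
        }
    }
    if (!accepting(state)) return 0;          /* <invalid, ""> */
    *pos = p;
    return len;
}
```

Usage: on "r12+r3", the first call returns 3 and leaves the input at "+". After the caller skips the "+", the next call returns 2.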

47 Table-Driven Scanner
The scanner needs to deal with the case when multiple matches are possible
- Two or more categories for the same exact string
- Two or more matches for a given string

48 Ambiguous Regular Expressions
Ambiguity is not always a bad thing
- Sometimes it is easier to specify an ambiguous RE, e.g., patterns for + ++ += * ** *= < <= = > >=
- The scanner needs to handle ambiguous REs in an appropriate way
Need an approach to specify the expected behavior
- Scanner generators assign priority or precedence by the order in which the patterns appear in the specification
- Among matches of the same length, the first matching pattern takes precedence
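For such ambiguities, lex and flex document this policy: take the longest match, and among matches of the same length, take the rule listed first. The C sketch below applies that policy to a hypothetical rule list that mixes keywords, an identifier pattern, and some of the + operators. Each rule reports how long a prefix it can match; a real generator would instead compute this with one combined DFA.

```c
#include <stddef.h>
#include <string.h>

/* A rule matches a literal string, or (lit == NULL) an identifier
   [a-z]([a-z]|[0-9])*.  Rule order is priority order. */
struct rule { const char *lit; const char *name; };

static const struct rule rules[] = {
    { "for",   "FOR" },
    { "while", "WHILE" },
    { NULL,    "IDENT" },
    { "+",     "PLUS" },
    { "++",    "INCR" },
    { "+=",    "PLUS_ASSIGN" },
};
#define N_RULES (sizeof rules / sizeof rules[0])

/* Length of the longest prefix of s matched by rule r (0 = no match). */
static size_t match_len(const struct rule *r, const char *s) {
    if (r->lit != NULL) {
        size_t n = strlen(r->lit);
        return strncmp(s, r->lit, n) == 0 ? n : 0;
    }
    size_t n = 0;
    if (s[0] >= 'a' && s[0] <= 'z') {
        n = 1;
        while ((s[n] >= 'a' && s[n] <= 'z') || (s[n] >= '0' && s[n] <= '9'))
            n++;
    }
    return n;
}

/* Longest match wins; a tie keeps the earlier rule.  Returns the rule name
   (NULL if nothing matched) and stores the match length in *len. */
const char *classify(const char *s, size_t *len) {
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < N_RULES; i++) {
        size_t n = match_len(&rules[i], s);
        if (n > best_len) {            /* strictly longer replaces the best */
            best_len = n;
            best = rules[i].name;
        }
    }
    *len = best_len;
    return best;
}
```

With these rules, "for" is classified as FOR because the keyword rule is listed before IDENT and both match three characters. "format" is an IDENT because that match is longer.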

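49 Example: Recognizing Keywords for and while
For an RE such as for | while | ( [a-z] ( [a-z] | [0-9] )* ) we want an automaton such as:
[Figure: s0 -f-> s2 -o-> s3 -r-> s4 spells for; s0 -w-> s5 -h-> s6 -i-> s7 -l-> s8 -e-> s9 spells while; s0 -([a-e] | [g-v] | [x-z])-> s10. Edges labeled [^o] from s2, [^r] from s3, [^h] from s5, [^i] from s6, [^l] from s7, [^e] from s8, and [a-z] | [0-9] from s4 lead to s10, which loops on [a-z] | [0-9].]
[^x] means any character in Σ except x
Final states: s4 → for, s9 → while, s10 → general identifier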

50 Ambiguous Regular Expressions
Most languages have both keywords and identifiers
- for might be the RE for the keyword for
- ( [a-z] ) ( [a-z] | [0-9] )* might be the RE for an identifier
- The input string for matches both REs
Keywords are sometimes handled as a special case
- Use the simpler RE and automaton: s0 -[a-z]-> s10, with s10 -[a-z] | [0-9]-> s10
- Build a hash table of identifiers & preload the keywords
- Recognize keywords during the hash lookup
The Tradeoff
- More states in the scanner versus lookup cost on keywords
- Encoding keywords in the RE has no significant runtime cost
- Either strategy works & the difference in runtime cost is tiny
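Here is a sketch of the keyword-as-special-case strategy in C. The simpler identifier automaton finds the word, and a lookup in preloaded keyword entries decides its category. To stay short, it uses a linear search over two entries in place of the hash table the slide describes. The enum names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum kind { NOT_A_WORD, IDENT, KW_FOR, KW_WHILE };

/* Preloaded keyword entries.  A real scanner would keep these in the same
   hash table as the identifiers; a linear search keeps the sketch short. */
static const struct { const char *spelling; enum kind kind; } keywords[] = {
    { "for",   KW_FOR },
    { "while", KW_WHILE },
};

/* Simpler automaton: s0 -[a-z]-> s10, s10 -[a-z]|[0-9]-> s10.
   Returns the length of the identifier at the start of s (0 if none). */
static size_t scan_ident(const char *s) {
    if (s[0] < 'a' || s[0] > 'z') return 0;
    size_t n = 1;
    while ((s[n] >= 'a' && s[n] <= 'z') || (s[n] >= '0' && s[n] <= '9'))
        n++;
    return n;
}

/* Classifies the word at the start of s; *len receives its length. */
enum kind classify_word(const char *s, size_t *len) {
    size_t n = scan_ident(s);
    *len = n;
    if (n == 0) return NOT_A_WORD;
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i].spelling) == n &&
            strncmp(s, keywords[i].spelling, n) == 0)
            return keywords[i].kind;   /* keyword recognized during the lookup */
    return IDENT;
}
```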

51 Automatic Generation of Scanners and Parsers
Scanner Generation Process
1. Write down the REs for the input language and connect them with |
2. Build a big, simple NFA
3. Build the DFA that simulates the NFA
4. Minimize the number of states in the DFA
5. Generate an implementation from the minimal DFA
Scanner Generators
1. lex, flex, jflex, and such all work along these lines
2. Algorithms are well-known and well-understood
3. Key issues: finding the longest match rather than the first match, and engineering the interface to the parser

52 Automatic Generation of Scanners and Parsers
- Design time: the syntax specifications are written as CFGs and the micro-syntax specifications as REs
- Compiler-build time: the parser generator and the scanner generator convert the specification to actual compiler code
- Compile time: translate an application in the specified language into an executable form; the front end runs stream of characters → Scanner → Parser → Semantic Elaboration → IR + annotations

53 Automatic Scanner Generation
flex is a scanner generator
- flex follows the input conventions of lex
- Takes a file that includes REs for the words
- Produces a table-driven scanner to recognize words and return tokens
flex works with bison, the GNU LR parser generator (LALR(1) tables by default)
- bison produces a file of definitions (*.tab.h) to connect scanner parts of speech to terminal symbols in the grammar
[Figure: specifications written as regular expressions → Scanner Generator → Scanner; source code → Scanner → <part of speech, word> pairs]

54 Use flex to Generate Scanners
A flex input file has three parts; sections are separated by the delimiter %%
Definitions section
- Holds #includes, declarations, and definitions (e.g., DIGIT [0-9])
- Mechanism to insert verbatim code at the top of the generated scanner code
Rules section
- Holds a series of rules, each consisting of a pattern and an action
- A pattern is just an RE
- An action is a fragment of C code: either a single statement or a block in braces
- Rules are prioritized in order of appearance
- Mechanism to insert arbitrary code into the scanning routine
User code section
- Holds code that is copied, verbatim, to the scanner output file
- This section is optional; it is a good place to put the main function
Sample Code:

/* Companion source code for "flex & bison", published by O'Reilly */
/* fb1-1 just like unix wc */
%option noyywrap
%{
#include <stdio.h>
#include <string.h>
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+   { words++; chars += strlen(yytext); }
\n          { chars++; lines++; }
.           { chars++; }
%%
int main(void)
{
    yylex();
    printf("%8d%8d%8d\n", lines, words, chars);
    return 0;
}

Code: /ilab/users/zz124/cs515_2017/compiler_samples/flexandbisonbasics

55 Specifying RE Patterns in flex
flex has a rich specification language

    a                        matches the character a
    .                        matches any character except \n
    [abc]                    is a class; it matches a, b, or c
    [abf-jZ]                 matches a, b, f, g, h, i, j, or Z; f-j is a numerical range in the collating sequence
    [^abc]                   is a negated class; it matches any character except a, b, or c
    [a-z]{-}[aeiou]          matches lowercase consonant letters (set subtraction)
    x*                       matches zero or more instances of x
    x+                       matches one or more instances of x
    x?                       matches zero or one instance of x
    x{3}                     matches exactly 3 x's, as would xxx
    x{3,}                    matches 3 or more instances of x
    x{3,5}                   matches 3, 4, or 5 instances of x
    {name}                   matches the pattern name defined in the definitions section of the file
    \n                       matches the newline character
    \a, \b, \f, \r, \t, \v   match their C escapes
    \x                       matches x; useful for matching operators in the RE notation, such as ^
    xy                       matches RE x followed by RE y
    x|y                      matches either RE x or RE y
    ^x                       matches x at the start of a line
    x$                       matches RE x at the end of a line
    x/y                      matches x only if it is followed by RE y, but does not match y (y is pushed back into the input stream)

And there are more options

56 Lecture 4B: Lexical and Syntax Analysis III


60 The Study of Parsing
Parsing is the process of discovering a derivation for some sentence
- Need a mathematical model of syntax: a context-free grammar G
- Need an algorithm for testing membership in L(G)


62 Specifying Syntax: Context-Free Grammars
A CFG is a four tuple, G = (S, N, T, P)
- S is the start symbol of the grammar; L(G) is the set of sentences that can be derived from S
- N is a set of nonterminal symbols or syntactic variables, here { Goal, List, Pair }
- T is a set of terminal symbols or words, here { (, ) }
- P is a set of productions or rewrite rules
Parentheses Grammar

    1  Goal → List
    2  List → List Pair
    3       | Pair
    4  Pair → ( List )
    5       | ( )

L(G) includes sentences such as ( ), (( )) ( ), & ( ) ( ) ( )
This grammar is written in a variant of Backus-Naur Form (BNF)


64 Specifying Syntax: Context-Free Grammars
Why not use regular expressions to specify PL syntax?
Regular languages, regular expressions, and DFAs are limited
- A DFA cannot count, as in $L = \{ p^k q^k \}$ or $L = \{ w c w^R \mid w \in \Sigma^* \}$
  - Finite states means no counting of an unbounded event
- Matching items in this fashion requires a stack
  - Push p's onto the stack; pop them off to match q's
  - An empty stack at the end indicates an equal number of p's and q's
Neither of these is a regular language
Does this limitation matter?
- Since $L = \{ p^k q^k \}$ is the essence of matching parentheses, brackets, and the like, this limitation is an important one
- Every programming language I know has some matching construct
Bottom line: REs for spelling, CFGs for syntax
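Here is the stack discipline described above, sketched in C for $L = \{ p^k q^k \}$. The fixed capacity is an assumption of the sketch, and it is telling. With a bound on the stack, the program is itself finite-state and handles k only up to that bound. Removing that limit is exactly what an unbounded stack (a push-down automaton) adds.

```c
#include <stdbool.h>

#define STACK_MAX 1024    /* assumed bound; longer runs of p's are rejected */

/* Accepts p^k q^k (k >= 0): push each p, pop one p for each q, and accept
   only if the input is used up and the stack is empty. */
bool accepts_pkqk(const char *s) {
    char stack[STACK_MAX];
    int top = 0;
    const char *c = s;
    while (*c == 'p') {                       /* push p's onto the stack */
        if (top == STACK_MAX) return false;
        stack[top++] = 'p';
        c++;
    }
    while (*c == 'q') {                       /* pop them off to match q's */
        if (top == 0 || stack[--top] != 'p') return false;  /* more q's than p's */
        c++;
    }
    return *c == '\0' && top == 0;            /* empty stack: equal counts */
}
```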


67 Parsing
The point of parsing is to discover a grammatical derivation for a sentence
A derivation consists of a series of rewrite steps
  S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn−1 ⇒ γn ⇒ sentence
- S is the start symbol of the grammar
- Each γi is a sentential form
  - If γ contains only terminal symbols, γ is a sentence in L(G)
  - If γ contains 1 or more nonterminals, γ is a sentential form
- To get γi from γi−1, expand some NT A ∈ γi−1 by using A → β
  - Replace the occurrence of A ∈ γi−1 with β to get γi
  - Replacing the leftmost NT at each step creates a leftmost derivation
  - Replacing the rightmost NT at each step creates a rightmost derivation
NT means nonterminal symbol
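The rewrite step "replace A in γi−1 with β" can be made concrete. This Python sketch is an assumption, not the lecture's code. The helper `derive` takes a list of rule numbers from the Parentheses Grammar and rewrites either the leftmost or the rightmost nonterminal at each step. It returns every sentential form, and it rejects a rule whose lhs does not match the selected nonterminal.

```python
NONTERMINALS = {"Goal", "List", "Pair"}
PRODUCTIONS = {
    1: ("Goal", ["List"]),
    2: ("List", ["List", "Pair"]),
    3: ("List", ["Pair"]),
    4: ("Pair", ["(", "List", ")"]),
    5: ("Pair", ["(", ")"]),
}


def derive(rule_numbers, rightmost=False, start="Goal"):
    """Apply the numbered productions in order; return every sentential form.

    Each step rewrites the leftmost nonterminal (or the rightmost one when
    rightmost=True); the chosen rule must have that nonterminal as its lhs.
    """
    form = [start]
    forms = [" ".join(form)]
    for n in rule_numbers:
        lhs, rhs = PRODUCTIONS[n]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("the form is already a sentence")
        i = positions[-1] if rightmost else positions[0]
        if form[i] != lhs:
            raise ValueError(f"rule {n} rewrites {lhs}, but the selected NT is {form[i]}")
        form[i:i + 1] = rhs          # replace the occurrence of A with beta
        forms.append(" ".join(form))
    return forms
```

With `rightmost=True`, the rule sequence 1, 2, 5, 3, 4, 3, 5 reproduces the rightmost derivation of `( ( ) ) ( )`. The leftmost derivation of the same sentence applies the rules in the order 1, 2, 3, 4, 3, 5, 5.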

68 Parsing
The point of parsing is to discover a grammatical derivation for a sentence
Parentheses Grammar: (1) Goal → List; (2) List → List Pair; (3) List → Pair; (4) Pair → ( List ); (5) Pair → ( )
Derivation of ( ):
  Rule  Sentential Form
        Goal
  1     List
  3     Pair
  5     ( )

74 Parsing
The point of parsing is to discover a grammatical derivation for a sentence
Parentheses Grammar: (1) Goal → List; (2) List → List Pair; (3) List → Pair; (4) Pair → ( List ); (5) Pair → ( )
Derivation of ( ( ) ) ( ):
  Rule  Sentential Form
        Goal
  1     List
  2     List Pair
  5     List ( )
  3     Pair ( )
  4     ( List ) ( )
  3     ( Pair ) ( )
  5     ( ( ) ) ( )
We denote this particular derivation: Goal ⇒* ( ( ) ) ( )

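81 Parsing
A derivation corresponds to a parse tree or syntax tree
Derivation of ( ( ) ) ( ) (rules 1, 2, 5, 3, 4, 3, 5), Goal ⇒* ( ( ) ) ( ), and its parse tree:
  Goal
  └─ List
     ├─ List
     │  └─ Pair
     │     ├─ (
     │     ├─ List
     │     │  └─ Pair
     │     │     ├─ (
     │     │     └─ )
     │     └─ )
     └─ Pair
        ├─ (
        └─ )
The derivation gives us the grammatical structure of the input sentence, which was missing in the DFA / RE based recognizers.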
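One way to see the correspondence between a derivation and a parse tree is to build the tree by hand as nested tuples. This Python sketch assumes a node representation of `(symbol, children)`, with leaves having no children. The names `node`, `leaf`, `frontier`, `count_interior` and `show` are hypothetical. Reading the leaves left to right gives back the sentence. Each interior node corresponds to one production applied in the derivation.

```python
def node(symbol, *children):
    """A parse-tree node: (grammar symbol, list of children)."""
    return (symbol, list(children))


def leaf(token):
    """A terminal symbol: a node with no children."""
    return (token, [])


# Parse tree for ( ( ) ) ( ), built by hand from the derivation above.
TREE = node("Goal",
            node("List",
                 node("List",
                      node("Pair",
                           leaf("("),
                           node("List", node("Pair", leaf("("), leaf(")"))),
                           leaf(")"))),
                 node("Pair", leaf("("), leaf(")"))))


def frontier(tree):
    """Read the leaves left to right: the sentence the tree derives."""
    symbol, children = tree
    if not children:
        return [symbol]
    leaves = []
    for child in children:
        leaves.extend(frontier(child))
    return leaves


def count_interior(tree):
    """Count interior nodes; each records one production application."""
    symbol, children = tree
    if not children:
        return 0
    return 1 + sum(count_interior(child) for child in children)


def show(tree, depth=0):
    """Print the tree with one symbol per line, indented by depth."""
    symbol, children = tree
    print("  " * depth + symbol)
    for child in children:
        show(child, depth + 1)
```

For this tree, `count_interior(TREE)` is 7, which matches the seven rewrite steps in the derivation of `( ( ) ) ( )`.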

83 Two Categories of Systematic Parsers
Top-down parsers (LL(1), recursive descent; see EAC 3.3)
- Start with the root of the parse tree and grow toward the leaves
- At each step, pick a production & try to match the input
- A bad pick may need to backtrack
- Some grammars are backtrack-free
Bottom-up parsers (LR(1), operator precedence; see EAC 3.4)
- Start at the leaves and grow toward the root
- As input is consumed, encode possibilities in an internal state
- Start in a state valid for legal first tokens
- We can make the process deterministic
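To make the top-down side concrete, here is a recursive-descent sketch for the Parentheses Grammar. It is written for illustration and is not taken from EAC. The rule List → List Pair is left-recursive, and a naive recursive-descent procedure for it would call itself forever. The sketch therefore parses List as Pair { Pair } with a loop and folds the result left, so the tree keeps the left-recursive shape. With that change, one token of lookahead is enough to choose each step, so the parser never backtracks. The names `parse_top_down` and `ParseError` are hypothetical.

```python
class ParseError(Exception):
    """Raised when the input is not a sentence of the parentheses grammar."""


def parse_top_down(text):
    """Recursive-descent parse; returns nested (symbol, children) tuples."""
    toks = [c for c in text if not c.isspace()] + ["EOF"]
    pos = 0

    def peek():
        return toks[pos]

    def expect(token):
        nonlocal pos
        if toks[pos] != token:
            raise ParseError(f"expected {token!r} at position {pos}, found {toks[pos]!r}")
        pos += 1
        return (token, [])

    def parse_pair():
        # Pair -> ( List ) | ( ): after '(', one token of lookahead picks the rule.
        lparen = expect("(")
        if peek() == ")":
            return ("Pair", [lparen, expect(")")])
        inner = parse_list()
        return ("Pair", [lparen, inner, expect(")")])

    def parse_list():
        # List -> List Pair | Pair is left-recursive; parse Pair { Pair }
        # and fold left so the tree keeps the left-recursive shape.
        tree = ("List", [parse_pair()])
        while peek() == "(":
            tree = ("List", [tree, parse_pair()])
        return tree

    goal = ("Goal", [parse_list()])
    if peek() != "EOF":
        raise ParseError(f"unexpected {peek()!r} at position {pos}")
    return goal
```

The tree returned for `(())()` has the same shape as the parse tree built from the derivation of `( ( ) ) ( )`.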

85 Bottom-up Parsers
How does a bottom-up parser build (discover) a derivation?
- The parser uses a stack to hold grammar symbols, both terminal symbols & nonterminal symbols
- In essence, it pushes tokens onto the stack until the stack holds the right-hand side (rhs) of some production (rewrite rule)
- When it finds an rhs at the top of the stack, it rewrites the rhs with the rule's lhs (called a reduction or reduce action)
How does it recognize an rhs, or (more precisely) the next rhs in the derivation of the input sentence?
- For the moment, assume that we have an oracle to answer that question
- Applying the oracle to the stack yields one of three results:
  - A production <lhs:rhs> whose rhs is at the top of the stack
  - An indication that the stack is consistent with some future rhs
  - An indication that the stack cannot lead to an rhs

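86 A conceptual bottom-up, shift-reduce parser
  push $ onto the stack
  word ← NextToken()
  repeat until (top of stack = S and word = EOF)
    action ← oracle(stack, word)
    if (action is "future" & word ≠ EOF) then
      push word
      word ← NextToken()
    else if (action is <lhs:rhs>) then
      pop |rhs| symbols off the stack
      push lhs onto the stack
    else
      report an error and halt (do not report success)
  report success
Here: $ is used as an invalid symbol; S is the goal symbol of the grammar G; push() and pop() implement a simple stack
A shift-reduce parser has four kinds of actions:
- Shift: the next word is moved from the input to the stack
- Reduce: the TOS (top of stack) is the rhs of a production; pop the rhs off the stack and push the lhs onto the stack
- Error: report the problem to the user
- Accept: (normal exit from the loop) report success and stop parsing
Shift, Accept, & Error are O(1); Reduce is O(|rhs|)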
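The conceptual loop can be run once an oracle is supplied. The `oracle` below is an assumption: it is hand-written for the Parentheses Grammar alone by inspecting the top of the stack. A real LR(1) parser derives the same knowledge from its ACTION and GOTO tables. The oracle returns `"future"`, a reduction `(lhs, |rhs|, rule number)`, or `"error"`. The names `oracle` and `shift_reduce_parse` are hypothetical.

```python
def oracle(stack, word):
    """Hand-written handle finder for the parentheses grammar (illustrative)."""
    top = stack[-1]
    if top == ")":
        if stack[-2] == "(":
            return ("Pair", 2, 5)            # Pair -> ( )
        if stack[-3:] == ["(", "List", ")"]:
            return ("Pair", 3, 4)            # Pair -> ( List )
        return "error"
    if top == "Pair":
        if stack[-2] == "List":
            return ("List", 2, 2)            # List -> List Pair
        return ("List", 1, 3)                # List -> Pair
    if top == "List":
        if word == "EOF":
            return ("Goal", 1, 1) if stack == ["$", "List"] else "error"
        # '(' can start another Pair; ')' is only valid inside ( List ... ).
        if word == "(" or (word == ")" and stack[-2] == "("):
            return "future"
        return "error"
    if top in ("$", "("):
        if word == "(" or (top == "(" and word == ")"):
            return "future"
        return "error"
    return "error"


def shift_reduce_parse(text):
    """Run the conceptual shift-reduce loop; return (accepted, trace)."""
    words = [c for c in text if not c.isspace()] + ["EOF"]
    pos = 0
    stack = ["$"]                  # $ marks the bottom; never a grammar symbol
    word = words[pos]
    trace = []                     # (stack snapshot, word, oracle's answer)
    while not (stack[-1] == "Goal" and word == "EOF"):
        action = oracle(stack, word)
        trace.append((" ".join(stack), word, action))
        if action == "future" and word != "EOF":
            stack.append(word)     # shift
            pos += 1
            word = words[pos]
        elif isinstance(action, tuple):
            lhs, rhs_len, _rule = action
            del stack[-rhs_len:]   # pop |rhs| symbols
            stack.append(lhs)      # push lhs
        else:
            return False, trace    # error exit: success is not reported
    return True, trace
```

On `( ( ) ) ( )` the loop takes 13 steps. The reductions come out as rules 5, 3, 4, 3, 5, 2, 1. Read backwards, that sequence is the rightmost derivation of the same sentence, which shows that a bottom-up parser discovers a rightmost derivation in reverse.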

92 A conceptual bottom-up, shift-reduce parser
Consider the input stream ( )
Parentheses Grammar: (1) Goal → List; (2) List → List Pair; (3) List → Pair; (4) Pair → ( List ); (5) Pair → ( )
  Step  Stack      Word  Oracle's answer
  0     $          (     some future rhs
  1     $ (        )     some future rhs
  2     $ ( )      EOF   <Pair : ( )>     rule 5
  3     $ Pair     EOF   <List : Pair>    rule 3
  4     $ List     EOF   <Goal : List>    rule 1
  5     $ Goal     EOF   exits while loop & reports success

98 More Shifts and Reductions
Consider the input stream ( ( ) ) ( ), a more complex example
Parentheses Grammar: (1) Goal → List; (2) List → List Pair; (3) List → Pair; (4) Pair → ( List ); (5) Pair → ( )
  Step  Stack          Word  Oracle's answer
  0     $              (     some future rhs
  1     $ (            (     some future rhs
  2     $ ( (          )     some future rhs
  3     $ ( ( )        )     <Pair, ( )>          rule 5
  4     $ ( Pair       )     <List, Pair>         rule 3
  5     $ ( List       )     some future rhs
  6     $ ( List )     (     <Pair, ( List )>     rule 4
  7     $ Pair         (     <List, Pair>         rule 3
  8     $ List         (     some future rhs
  9     $ List (       )     some future rhs
  10    $ List ( )     EOF   <Pair, ( )>          rule 5
  11    $ List Pair    EOF   <List, List Pair>    rule 2
  12    $ List         EOF   <Goal, List>         rule 1
  13    $ Goal         EOF   exits while loop & reports success
The key is recognizing the difference between a future rhs, an error, and a reduction
In an LR(1) parser, the oracle is encoded into two parse tables, named ACTION and GOTO

99 How does this work?
An LR(1) parser encodes state information on the stack
That state information threads together the reduce actions to ensure that the parser builds a derivation.
- At each point, the parser is in a state that represents all of the possible outcomes
- From the combination of the state, the stack, and the next word, it can decide whether to shift, reduce, accept, or throw an error (We will see examples.)
All of that knowledge is encoded into the ACTION and GOTO tables
The LR(1) table construction simulates the set of parser states when it builds the ACTION and GOTO tables
See § 3.4 of the EAC book

100 An LR(1) Skeleton Parser
  stack.push(INVALID);
  stack.push(s0);                       // initial state
  word ← NextWord();
  loop forever {
    s ← stack.top();
    if ( ACTION[s,word] == "reduce A → β" ) then {
      stack.popnum( 2*|β| );            // pop RHS off stack
      s ← stack.top();
      stack.push( A );                  // push LHS, A
      stack.push( GOTO[s,A] );          // push next state
    }
    else if ( ACTION[s,word] == "shift si" ) then {
      stack.push(word); stack.push( si );
      word ← NextWord();
    }
    else if ( ACTION[s,word] == "accept" & word == EOF ) then break;
    else throw a syntax error;
  }
  report success;
The Skeleton LR(1) Parser
- follows the basic shift-reduce scheme from previous slides
- relies on a stack & a scanner
- stacks <symbol,state> pairs
- the ACTION table encodes the shift, reduce, accept, error decision
- GOTO threads reduce actions together to form potential sentences in L(G)
- shifts |words| times
- reduces |derivation| times
- accepts at most once
- detects errors at the earliest possible point

101 LR(1) Tables for the Parenthesis Grammar
Parentheses Grammar: (1) Goal → List; (2) List → List Pair; (3) List → Pair; (4) Pair → ( List ); (5) Pair → ( )
            ACTION               GOTO
  State     (      )      EOF    List   Pair
  0         s 3                  1      2
  1         s 3           acc           4
  2         r 3           r 3
  3         s 7    s 8           5      6
  4         r 2           r 2
  5         s 7    s 10                 9
  6         r 3    r 3
  7         s 7    s 12          11     6
  8         r 5           r 5
  9         r 2    r 2
  10        r 4           r 4
  11        s 7    s 13                 9
  12        r 5    r 5
  13        r 4    r 4
Action Table Entries
  Entry     Meaning
  s 3       shift & go to state 3
  r 2       reduce by production 2
  acc       accept the sentence
  (blank)   syntax error
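The skeleton parser and these tables combine into a working recognizer. This Python sketch transcribes the ACTION and GOTO tables into dictionaries and follows the skeleton loop. It stacks <symbol, state> pairs, pops 2*|β| entries on a reduce, and treats a missing ACTION entry as a syntax error. The dictionary layout and the names `lr1_parse` and `ParseError` are assumptions made for illustration. In real use, a parser generator builds the tables rather than a person typing them in.

```python
# ("s", n) = shift and go to state n; ("r", p) = reduce by production p.
ACTION = {
    0: {"(": ("s", 3)},
    1: {"(": ("s", 3), "EOF": "acc"},
    2: {"(": ("r", 3), "EOF": ("r", 3)},
    3: {"(": ("s", 7), ")": ("s", 8)},
    4: {"(": ("r", 2), "EOF": ("r", 2)},
    5: {"(": ("s", 7), ")": ("s", 10)},
    6: {"(": ("r", 3), ")": ("r", 3)},
    7: {"(": ("s", 7), ")": ("s", 12)},
    8: {"(": ("r", 5), "EOF": ("r", 5)},
    9: {"(": ("r", 2), ")": ("r", 2)},
    10: {"(": ("r", 4), "EOF": ("r", 4)},
    11: {"(": ("s", 7), ")": ("s", 13)},
    12: {"(": ("r", 5), ")": ("r", 5)},
    13: {"(": ("r", 4), ")": ("r", 4)},
}
GOTO = {
    0: {"List": 1, "Pair": 2},
    1: {"Pair": 4},
    3: {"List": 5, "Pair": 6},
    5: {"Pair": 9},
    7: {"List": 11, "Pair": 6},
    11: {"Pair": 9},
}
PRODUCTIONS = {1: ("Goal", 1), 2: ("List", 2), 3: ("List", 1),
               4: ("Pair", 3), 5: ("Pair", 2)}       # rule -> (lhs, |rhs|)


class ParseError(Exception):
    """Raised when ACTION has no entry for (state, word)."""


def lr1_parse(text):
    """Table-driven LR(1) parse; returns the reductions in the order applied."""
    words = [c for c in text if not c.isspace()] + ["EOF"]
    pos = 0
    stack = ["INVALID", 0]            # <symbol, state> pairs; state 0 is initial
    word = words[pos]
    reductions = []
    while True:
        s = stack[-1]
        act = ACTION[s].get(word)     # a blank entry means syntax error
        if act == "acc":              # acc only appears under EOF
            return reductions
        if act is None:
            raise ParseError(f"syntax error at token {pos} ({word!r}) in state {s}")
        kind, n = act
        if kind == "r":
            lhs, rhs_len = PRODUCTIONS[n]
            del stack[-2 * rhs_len:]  # pop 2*|rhs| entries (symbol + state each)
            s = stack[-1]
            stack.append(lhs)
            stack.append(GOTO[s][lhs])
            reductions.append(n)
        else:                         # shift: push word and the new state
            stack.append(word)
            stack.append(n)
            pos += 1
            word = words[pos]
```

`lr1_parse("( )")` performs reductions 5 and 3 and then accepts, which matches the trace of the skeleton parser on `( )`. Because state 1 accepts on EOF, the rule Goal → List is never reduced explicitly. On `(()` the table rejects as soon as EOF arrives in state 12.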

102 The Parenthesis Language: Parsing ( )
Trace of the Skeleton Parser's action on ( )
  State  Lookahead  Stack              Handle         Action
  -      (          $ 0                none           -
  0      (          $ 0                none           shift 3
  3      )          $ 0 ( 3            none           shift 8
  8      EOF        $ 0 ( 3 ) 8        Pair → ( )     reduce 5
  2      EOF        $ 0 Pair 2         List → Pair    reduce 3
  1      EOF        $ 0 List 1         Goal → List    accept

103 Summary: LR(1) Parsers
- The parser generator builds a model of all possible states of the parser
- If, in each state, the shift / reduce / accept decision can be made with just the left context and 1 word of lookahead, the grammar is an LR(1) grammar
  - There are many grammars that are not LR(1) grammars
  - There are LR(1) grammars for all deterministically parsable languages
- We will look at the kinds of problems that make a grammar non-LR(1)
Properties we want in a grammar
- The grammar must be unambiguous
- The grammar must enforce the desired structure or meaning
  For example, precedence in arithmetic expressions: 1 + 2 * 3 = 7, not 9
- The grammar must have that odd 1-word lookahead property
- Some properties are language design, some are grammar design
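The precedence point can be shown with a grammar that encodes it in its structure. The grammar below is a common textbook layering and is an assumption, not taken from these slides: Expr → Expr + Term | Term, Term → Term * Factor | Factor, and Factor → num. Because * sits one level deeper than +, 2 * 3 is grouped before the addition. This Python sketch evaluates the grammar by recursive descent, replacing left recursion with loops. The names `evaluate`, `expr`, `term` and `factor` are hypothetical.

```python
import re


def evaluate(text):
    """Evaluate + and * over integers using a precedence-layered grammar."""
    tokens = re.findall(r"\d+|\S", text) + ["EOF"]
    pos = 0

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                 # Factor -> num
        tok = advance()
        if not tok.isdigit():
            raise ValueError(f"expected a number, found {tok!r}")
        return int(tok)

    def term():                   # Term -> Term * Factor | Factor
        value = factor()
        while tokens[pos] == "*":
            advance()
            value *= factor()
        return value

    def expr():                   # Expr -> Expr + Term | Term
        value = term()
        while tokens[pos] == "+":
            advance()
            value += term()
        return value

    result = expr()
    if tokens[pos] != "EOF":
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```

With this grammar, `evaluate("1 + 2 * 3")` gives 7. A flat rule such as Expr → Expr op Expr would allow both groupings, and that is the ambiguity the slide warns against.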

104 Reading
Engineering A Compiler (EAC)
- Chapter 2.4 (Generation of automaton and the code)
- Chapter 2.4.2 (Thompson's Construction)
- Chapter 2.4.3 (Subset Construction)
- Chapter 2.4.4 and Chapter 2.6.3 (DFA Minimization)
- Chapter 2.5 (Implementing an Automaton)
- Chapter 3.2.1 (Use of CFG for PL syntax)
- Chapter 3.3 (Top-down parsers)
- Chapter 3.4 (Bottom-up parsers)