Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Similar documents
Syntax Analysis Part I

Syntax Analysis Part I

Syntax Analysis Part III

Top-Down Parsing and Intro to Bottom-Up Parsing

Syntactic Analysis. Top-Down Parsing

CSE302: Compiler Design

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

THEORY OF COMPILATION

Introduction to Bottom-Up Parsing

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

Syntax Analysis - Part 1. Syntax Analysis

Parsing -3. A View During TD Parsing

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Bottom-Up Parsing. Ÿ rm E + F *idÿ rm E +id*idÿ rm T +id*id. Ÿ rm F +id*id Ÿ rm id + id * id

Compiler Design Spring 2017

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Bottom-Up Syntax Analysis

Top-Down Parsing, Part II

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Compiling Techniques

Syntax Analysis (Part 2)

CS415 Compilers Syntax Analysis Top-down Parsing

EXAM. Please read all instructions, including these, carefully NAME : Problem Max points Points 1 10 TOTAL 100

LR2: LR(0) Parsing. LR Parsing. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

CONTEXT FREE GRAMMAR AND

Computer Science 160 Translation of Programming Languages

Context free languages

CS 406: Bottom-Up Parsing

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

Exercises. Exercise: Grammar Rewriting

Context-Free Grammars (and Languages) Lecture 7

CA Compiler Construction

Creating a Recursive Descent Parse Table

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16)

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Computing if a token can follow

CS 314 Principles of Programming Languages

Computer Science 160 Translation of Programming Languages

Bottom up parsing. General idea LR(0) SLR LR(1) LALR To best exploit JavaCUP, should understand the theoretical basis (LR parsing);

Follow sets. LL(1) Parsing Table

Compiling Techniques

Compiling Techniques

Compiler Principles, PS4

Compiler Design 1. LR Parsing. Goutam Biswas. Lect 7

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Shift-Reduce parser E + (E + (E) E [a-z] In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Compiling Techniques

ΕΠΛ323 - Θεωρία και Πρακτική Μεταγλωττιστών. Lecture 7a Syntax Analysis Elias Athanasopoulos

LL(1) Grammar and parser

I 1 : {S S } I 2 : {S X ay, Y X } I 3 : {S Y } I 4 : {X b Y, Y X, X by, X c} I 5 : {X c } I 6 : {S Xa Y, Y X, X by, X c} I 7 : {X by } I 8 : {Y X }

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 3. Y.N. Srikant

Context Free Grammars

LR(1) Parsers Part III Last Parsing Lecture. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

Lecture 11 Context-Free Languages

INF5110 Compiler Construction

Announcements. H6 posted 2 days ago (due on Tue) Midterm went well: Very nice curve. Average 65% High score 74/75

Chapter 4: Bottom-up Analysis 106 / 338

Parsing VI LR(1) Parsers

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

MA/CSSE 474 Theory of Computation

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412

LR(1) Parsers Part II. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

Why augment the grammar?

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Compiler Design. Spring Syntactic Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Syntax Directed Transla1on

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Intro to Theory of Computation

Compiler Construction Lectures 13 16

CYK Algorithm for Parsing General Context-Free Grammars

Parsing with Context-Free Grammars

CPS 220 Theory of Computation

CS415 Compilers Syntax Analysis Bottom-up Parsing

Context-Free Grammars and Languages. We have seen that many languages cannot be regular. Thus we need to consider larger classes of langs.

Einführung in die Computerlinguistik

INF5110 Compiler Construction

Pushdown Automata: Introduction (2)

CISC4090: Theory of Computation

CS5371 Theory of Computation. Lecture 7: Automata Theory V (CFG, CFL, CNF)

Foundations of Informatics: a Bridging Course

Context-free Grammars and Languages

Announcement: Midterm Prep

The Parser. CISC 5920: Compiler Construction Chapter 3 Syntactic Analysis (I) Grammars (cont d) Grammars

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

Compiler Principles, PS7

CS20a: summary (Oct 24, 2002)

Computational Models - Lecture 4

This lecture covers Chapter 5 of HMU: Context-free Grammars

Pushdown Automata (Pre Lecture)

Transcription:

1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van ngelen, Flora State University, 2007 Position of a Parser in the Compiler Model 2 Source Program Lexical Analyzer Lexical error oken, tokenval Get next token Parser and rest of front-end Syntax error Semantic error Intermediate representation Symbol able 3 he Parser A parser implements a C-F grammar he role of the parser is twofold: 1. o check syntax (= string recognizer) And to report syntax errors accurately 2. o invoke semantic actions For static semantics checking, e.g. type checking of expressions, functions, etc. For syntax-directed translation of the source code to an intermediate representation 1

4 Syntax-Directed ranslation One of the major roles of the parser is to produce an intermediate representation (IR) of the source program using syntax-directed translation methods Possible IR output: Abstract syntax trees (ASs) Control-flow graphs (CFGs) with triples, three-address code, or register transfer list notation WHIRL (SGI Pro64 compiler) has 5 IR levels! 5 rror Handling A good compiler should assist in entifying and locating errors Lexical errors: important, compiler can easily recover and continue Syntax errors: most important for compiler, can almost always recover Static semantic errors: important, can sometimes recover Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required Logical errors: hard or impossible to detect 6 Prefix Viable-Prefix Property he viable-prefix property of LL/LR parsers allows early detection of syntax errors Goal: detection of an error as soon as possible without further consuming unnecessary input How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language rror is detected here for (;) Prefix rror is detected here DO 10 I = 1;0 2

7 rror Recovery Strategies Panic mode Discard input until a token in a set of designated ronizing tokens is found Phrase-level recovery Perform local correction on the input to repair the error rror productions Augment grammar with productions for erroneous constructs Global correction Choose a minimal sequence of changes to obtain a global least-cost correction 8 Grammars (Recap) Context-free grammar is a 4-tuple G = (N,, P, S) where is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form α β where α (N )* N (N )* and β (N )* S N is a designated start symbol 9 Notational Conventions Used erminals a,b,c, specific terminals: 0, 1,, Nonterminals A,B,C, N specific nonterminals: expr, term, stmt Grammar symbols X,Y,Z (N ) Strings of terminals u,v,w,x,y,z * Strings of grammar symbols α,β,γ (N )* 3

10 Derivations (Recap) he one-step derivation is defined by α A β α γ β where A γ is a production in the grammar In addition, we define is leftmost lm if α does not contain a nonterminal is rightmost rm if β does not contain a nonterminal ransitive closure * (zero or more steps) Positive closure (one or more steps) he language generated by G is defined by L(G) = {w * S w} 11 Derivation (xample) Grammar G = ({}, {,*,(,),-,}, P, ) with productions P = * ( ) - xample derivations: - - rm rm rm * * * Chomsky Hierarchy: Language Classification A grammar G is sa to be Regular if it is right linear where each production is of the form A w B or A w or left linear where each production is of the form A B w or A w Context free if each production is of the form A α where A N and α (N )* Context sensitive if each production is of the form α A β α γ β where A N, α,γ,β (N )*, γ > 0 Unrestricted 12 4

13 Chomsky Hierarchy L(regular) L(context free) L(context sensitive) L(unrestricted) Where L() = { L(G) G is of type } hat is: the set of all languages generated by grammars G of type xamples: very finite language is regular! (construct a FSA for strings in L(G)) L 1 = { a n b n n 1 } is context free L 2 = { a n b n c n n 1 } is context sensitive 14 Parsing Universal (any C-F grammar) Cocke-Younger-Kasimi arley op-down (C-F grammar with restrictions) Recursive descent (predictive parsing) LL (Left-to-right, Leftmost derivation) methods Bottom-up (C-F grammar with restrictions) Operator precedence parsing LR (Left-to-right, Rightmost derivation) methods SLR, canonical LR, LALR 15 op-down Parsing LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing Grammar: ( ) - Leftmost derivation: lm lm lm 5

16 Left Recursion (Recap) Productions of the form A A α β γ are left recursive When one of the productions in a grammar is left recursive then a predictive parser loops forever on certain inputs General Left Recursion limination Method 17 Arrange the nonterminals in some order A 1, A 2,, A n for i = 1,, n do for j = 1,, i-1 do replace each A i A j γ with A i δ 1 γ δ 2 γ δ k γ where A j δ 1 δ 2 δ k enddo eliminate the immediate left recursion in A i enddo Immediate Left-Recursion limination Method 18 Rewrite every left-recursive production A A α β γ A δ into a right-recursive production: A β A R γ A R A R α A R δ A R ε 6

xample Left Recursion lim. A B C a B C A A b C A B C C a Choose arrangement: A, B, C 19 i = 1: i = 2, j = 1: i = 3, j = 1: i = 3, j = 2: nothing to do B C A A b B C A B C b a b (imm) B C A B R a b B R B R C b B R ε C A B C C a C B C B a B C C a C B C B a B C C a C C A B R C B a b B R C B a B C C a (imm) C a b B R C B C R a B C R a C R C R A B R C B C R C C R ε 20 Left Factoring When a nonterminal has two or more productions whose right-hand ses start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing Replace productions A α β 1 α β 2 α β n γ with A α A R γ A R β 1 β 2 β n 21 Predictive Parsing liminate left recursion from grammar Left factor the grammar Compute FIRS and FOLLOW wo variants: Recursive (recursive calls) Non-recursive (table-driven) 7

22 FIRS (Revisited) FIRS(α) = { the set of terminals that begin all strings derived from α } FIRS(a) = {a} if a FIRS(ε) = {ε} FIRS(A) = A α FIRS(α) for A α P FIRS(X 1 X 2 X k ) = if for all j = 1,, i-1 : ε FIRS(X j ) then add non-ε in FIRS(X i ) to FIRS(X 1 X 2 X k ) if for all j = 1,, k : ε FIRS(X j ) then add ε to FIRS(X 1 X 2 X k ) FOLLOW FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A } FOLLOW(A) = for all (B α A β) P do add FIRS(β)\{ε} to FOLLOW(A) for all (B α A β) P and ε FIRS(β) do add FOLLOW(B) to FOLLOW(A) for all (B α A) P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol S then add to FOLLOW(A) 23 LL(1) Grammar A grammar G is LL(1) if it is not left recursive and for each collection of productions A α 1 α 2 α n for nonterminal A the following holds: 1. FIRS(α i ) FIRS(α j ) = for all i j 2. if α i * ε then 2.a. α j * ε for all i j 2.b. FIRS(α j ) FOLLOW(A) = for all i j 24 8

25 Non-LL(1) xamples Grammar S S a a S a S a S a R ε R S ε S a R a R S ε Not LL(1) because: Left recursive FIRS(a S) FIRS(a) For R: S * ε and ε * ε For R: FIRS(S) FOLLOW(R) Recursive Descent Parsing (Recap) 26 Grammar must be LL(1) very nonterminal has one (recursive) procedure responsible for parsing the nonterminal s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information Using FIRS and FOLLOW to Write a Recursive Descent Parser 27 expr term rest rest term rest - term rest ε term procedure rest(); begin if lookahead in FIRS( term rest) then match( ); term(); rest() else if lookahead in FIRS(- term rest) then match( - ); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; FIRS( term rest) = { } FIRS(- term rest) = { - } FOLLOW(rest) = { } 9

Non-Recursive Predictive Parsing: able-driven Parsing Given an LL(1) grammar G = (N,, P, S) construct a table M[A,a] for A N, a and use a driver program with a stack 28 input a b stack X Y Z Predictive parsing program (driver) Parsing table M output Constructing an LL(1) Predictive Parsing able 29 for each production A α do for each a FIRS(α) do add A α to M[A,a] enddo if ε FIRS(α) then for each b FOLLOW(A) do add A α to M[A,b] enddo endif enddo Mark each undefined entry in M error xample able R R ε R * F R ε F ( ) A α F ( ) FIRS(α) ε ( FOLLOW(A) ( ) R R ) ε ) ( ) R * F R * ) ) * ) * ) 30 * ( ) R R R R R * F R F F ( ) 10

Ambiguous grammar S i t S S R a S R e S ε b rror: duplicate table entry S S R a S a LL(1) Grammars are Unambiguous b b e S R ε S R e S A α S i t S S R S a S R e S S R ε b i S i t S S R FIRS(α) i a e ε b 31 FOLLOW(A) e e e e t t S R ε Predictive Parsing Program (Driver) push() push(s) a := lookahead repeat X := pop() if X is a terminal or X = then match(x) // moves to next token and a := lookahead else if M[X,a] = X Y 1 Y 2 Y k then push(y k, Y k-1,, Y 2, Y 1 ) // such that Y 1 is on top else endif until X = invoke actions and/or produce IR output error() 32 xample able-driven Parsing 33 Stack R R R F R R R R R R R R R F R R R R R R F* R R F R R R R R Input * * * * * * * * * * * * Production applied R R R * F R 11

34 Panic Mode Recovery Add ronizing actions to undefined entries based on FOLLOW Pro: Can be automated Cons: rror messages are needed FOLLOW() = { ) } FOLLOW( R ) = { ) } FOLLOW() = { ) } FOLLOW( R ) = { ) } FOLLOW(F) = { * ) } R R F R R * R * F R ( F ( ) ) : the driver pops current nonterminal A and skips input till token or skips input until one of FIRS(A) is found Phrase-Level Recovery 35 Change input stream by inserting missing tokens For example: is changed into * Pro: Can be automated Cons: Recovery not always intuitive Can then continue here R R F insert * R R * R * F R ( F ( ) ) insert *: driver inserts missing * and retries the production 36 rror Productions R R ε R * F R ε F ( ) Add error production : R F R to ignore missing *, e.g.: Pro: Powerful recovery method Cons: Cannot be automated R R F R F R R R * R * F R ( F ( ) ) 12