Compiling Techniques

Similar documents
Compiling Techniques

Compiling Techniques

Syntactic Analysis. Top-Down Parsing

CS 314 Principles of Programming Languages

CS415 Compilers Syntax Analysis Top-down Parsing

Compiling Techniques

Top-Down Parsing and Intro to Bottom-Up Parsing

Compiling Techniques

CSE302: Compiler Design

Syntax Analysis Part I

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Syntax Analysis Part III

INF5110 Compiler Construction

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16)

Computer Science 160 Translation of Programming Languages

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Syntax Analysis Part I

Recursive descent for grammars with contexts

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

Compiler construction

The Parser. CISC 5920: Compiler Construction Chapter 3 Syntactic Analysis (I) Grammars (cont d) Grammars

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

Course Script INF 5110: Compiler construction

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 3. Y.N. Srikant

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Introduction to Bottom-Up Parsing

Compiler Principles, PS4

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Introduction to Bottom-Up Parsing

LL(1) Grammar and parser

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

Context-Free Grammars (and Languages) Lecture 7

THEORY OF COMPILATION

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CA Compiler Construction

Exercises. Exercise: Grammar Rewriting

CS153: Compilers Lecture 5: LL Parsing

Context free languages

INF5110 Compiler Construction

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

Chapter 4: Bottom-up Analysis 106 / 338

Syntax Analysis - Part 1. Syntax Analysis

Context Free Grammars

CONTEXT FREE GRAMMAR AND

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Announcement: Midterm Prep

Compiler Construction Lectures 13 16

From EBNF to PEG. Roman R. Redziejowski. Giraf s Research

Creating a Recursive Descent Parse Table

Context-Free Grammars and Languages. We have seen that many languages cannot be regular. Thus we need to consider larger classes of langs.

1. For the following sub-problems, consider the following context-free grammar: S A$ (1) A xbc (2) A CB (3) B yb (4) C x (6)

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

Lecture Notes on Inductive Definitions

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

Talen en Compilers. Johan Jeuring , period 2. October 29, Department of Information and Computing Sciences Utrecht University

Lecture Notes on Inductive Definitions

Compiler Principles, PS7

Review. Earley Algorithm Chapter Left Recursion. Left-Recursion. Rule Ordering. Rule Ordering

Introduction to ANTLR (ANother Tool for Language Recognition)

CS 406: Bottom-Up Parsing

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Intro to Theory of Computation

Top-Down Parsing, Part II

Parsing VI LR(1) Parsers

CS20a: summary (Oct 24, 2002)

Simplification and Normalization of Context-Free Grammars

Parsing -3. A View During TD Parsing

1. For the following sub-problems, consider the following context-free grammar: S AB$ (1) A xax (2) A B (3) B yby (5) B A (6)

Compila(on* **0368/3133*(Semester*A,*2013/14)*

From EBNF to PEG. Roman R. Redziejowski

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Computational Models - Lecture 3

Computer Science 160 Translation of Programming Languages

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

From EBNF to PEG. Roman R. Redziejowski. Concurrency, Specification and Programming Berlin 2012

Context-Free Grammars and Languages

Bottom-Up Syntax Analysis

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

LR(1) Parsers Part III Last Parsing Lecture. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

Parsing with Context-Free Grammars

Plan for Today and Beginning Next week (Lexical Analysis)

Context-Free Languages (Pre Lecture)

Einführung in die Computerlinguistik

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

CS415 Compilers Syntax Analysis Bottom-up Parsing

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.

Lecture 11 Context-Free Languages

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412

FLAC Context-Free Grammars

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Pushdown Automata: Introduction (2)

Computational Models - Lecture 4 1

Compiler Design. Spring Syntactic Analysis. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Transcription:

Lecture 5: Top-Down Parsing 26 September 2017

The Parser Context-Free Grammar (CFG) Lexer Source code Scanner char Tokeniser token Parser AST Semantic Analyser AST IR Generator IR Errors Checks the stream of words/tokens produced by the lexer for grammatical correctness Determine if the input is syntactically well formed Guides checking at deeper levels than syntax Used to build an IR representation of the code

Table of contents Context-Free Grammar (CFG) 1 Context-Free Grammar (CFG) Definition RE to CFG 2 Main idea Writing a Parser Left Recursion 3 Need for lookahead LL(1) property LL(K)

Definition RE to CFG Specifying syntax with a grammar Use Context-Free Grammar (CFG) to specify syntax Contex-Free Grammar definition A Context-Free Grammar G is a quadruple (S, N, T, P) where: S is a start symbol N is a set of non-terminal symbols T is a set of terminal symbols or words P is a set of production or rewrite rules where only a single non-terminal is allowed on the left-hand side P : N (N T )

Definition RE to CFG From Regular Expression to Context-Free Grammar Kleene closure A : replace A to A rep in all production rules and add A rep = A A rep ɛ as a new production rule Positive closure A + : replace A + to A rep in all production rules and add A rep = A A rep A as a new production rule Option [A]: replace [A] to A opt in all production rules and add A opt = A ɛ as a new production rule

Definition RE to CFG Example: function call f u n c a l l ::= IDENT ( [ IDENT (, IDENT) ] ) after removing the option: f u n c a l l ::= IDENT ( a r g l i s t ) a r g l i s t ::= IDENT (, IDENT) ɛ after removing the closure: f u n c a l l ::= IDENT ( a r g l i s t ) a r g l i s t ::= IDENT a r g r e p ɛ a r g r e p ::=, IDENT a r g r e p ɛ

Main idea Writing a Parser Left Recursion Steps to derive a syntactic analyser for a context free grammar expressed in an EBNF style: convert all the regular expressions as seen; Implement a function for each non-terminal symbol A. This function recognises sentences derived from A; Recursion in the grammar corresponds to recursive calls of the created functions. This technique is called recursive-descent parsing or predictive parsing.

Main idea Writing a Parser Left Recursion Parser class (pseudo-code) Token c u r r e n t T o k e n ; void e r r o r ( TokenClass... e x p e c t e d ) {/... / boolean a c c e p t ( TokenClass... e x p e c t e d ) { r e t u r n ( c u r r e n t T o k e n e x p e c t e d ) ; Token e x p e c t ( TokenClass... e x p e c t e d ) { Token token = c u r r e n t T o k e n ; i f ( a c c e p t ( e x p e c t e d ) ) { nexttoken ( ) ; // m o d i f i e s c u r r e n t T o k e n r e t u r n token ; e l s e e r r o r ( e x p e c t e d ) ;

Main idea Writing a Parser Left Recursion CFG for function call f u n c a l l ::= IDENT ( a r g l i s t ) a r g l i s t ::= IDENT a r g r e p ɛ a r g r e p ::=, IDENT a r g r e p ɛ Recursive-Descent Parser void p a r s e F u n C a l l ( ) { e x p e c t (IDENT ) ; e x p e c t (LPAR ) ; p a r s e A r g L i s t ( ) ; e x p e c t (RPAR ) ; void p a r s e A r g L i s t ( ) { i f ( a c c e p t (IDENT ) ) { nexttoken ( ) ; parseargrep ( ) ; void parseargrep ( ) { i f ( a c c e p t (COMMA) ) { nexttoken ( ) ; e x p e c t (IDENT ) ; parseargrep ( ) ;

Be aware of infinite recursion! Main idea Writing a Parser Left Recursion Left Recursion E ::= E + T T The parser would recurse indefinitely! Luckily, we can transform this grammar to: E ::= T ( + T)

Removing Left Recursion Main idea Writing a Parser Left Recursion You can use the following rule to remove left recursion: A Aα 1 Aα 2... Aα m β 1 β 2... β n where first(β i ) first(a) = and ε / first(α i ) can be rewritten into: A β 1 A β 2 A... β n A A α 1 A α 2 A... α m A ε Hint: Use this to deal with arrayaccess and fieldaccess for the coursework

Need for lookahead LL(1) property LL(K) Consider the following bit of grammar stmt ::= a s s i g n ; f u n c a l l ; f u n c a l l ::= IDENT ( a r g l i s t ) a s s i g n ::= IDENT = l e x p void p a r s e A s s i g n ( ) { e x p e c t (IDENT ) ; e x p e c t (EQ ) ; p a r s e L e x p ( ) ; void parsestmt ( ) {??? void p a r s e F u n C a l l ( ) { e x p e c t (IDENT ) ; e x p e c t (LPAR ) ; p a r s e A r g L i s t ( ) ; e x p e c t (RPAR ) ; If the parser picks the wrong production, it may have to backtrack. Alternative is to look ahead to pick the correct production.

Need for lookahead LL(1) property LL(K) How much lookahead is needed? In general, an arbitrarily large amount Fortunately: Large subclasses of CFGs can be parsed with limited lookahead Most programming language constructs fall in those subclasses Among the interesting subclasses are LL(1) grammars. LL(1) Left-to-Right parsing; Leftmost derivation; (i.e. apply production for leftmost non-terminal first) 1 symbol lookahead.

Need for lookahead LL(1) property LL(K) Basic idea: given A α β, the parser should be able to choose between α and β. First sets For some symbol α N T, define First(α) as the set of symbols that appear first in some string that derives from α: x First(α) iif α xγ, for some γ The LL(1) property: if A α and A β both appear in the grammar, we would like: First(α) First(β) = This would allow the parser to make the correct choice with a lookahead of exactly one symbol! (almost, see next slide!)

Need for lookahead LL(1) property LL(K) What about ɛ-productions (the ones that consume no symbols)? If A α and A β and ɛ First(α), then we need to ensure that First(β) is disjoint from Follow(α). Follow(α) is the set of all terminal symbols in the grammar that can legally appear immediately after α. (See EaC 3.3 for details on how to build the First and Follow sets.) Let s define First + (α) as: First(α) Follow(α), if ɛ First(α) First(α) otherwise LL(1) grammar A grammar is LL(1) iff A α and B β implies: First + (α) First + (β) =

Need for lookahead LL(1) property LL(K) Given a grammar that has the LL(1) property: each non-terminal symbols appearing on the left hand side is recognised by a simple routine; the code is both simple and fast. Predictive Parsing Grammar with the LL(1) property are called predictive grammars because the parser can predict the correct expansion at each point. Parsers that capitalise on the LL(1) property are called predictive parsers. One kind of predictive parser is the recursive descent parser.

Need for lookahead LL(1) property LL(K) Sometimes, we might need to lookahead one or more tokens. LL(2) Grammar Example stmt ::= a s s i g n ; f u n c a l l ; f u n c a l l ::= IDENT ( a r g l i s t ) a s s i g n ::= IDENT = l e x p void parsestmt ( ) { i f ( a c c e p t (IDENT ) ) { i f ( lookahead ( 1 ) == LPAR) p a r s e F u n C a l l ( ) ; e l s e i f ( lookahead ( 1 ) == EQ) p a r s e A s s i g n ( ) ; e l s e e r r o r ( ) ; e l s e e r r o r ( ) ;

Next lecture Context-Free Grammar (CFG) Need for lookahead LL(1) property LL(K) More about LL(1) & LL(k) languages and grammars Dealing with ambiguity Left-factoring Bottom-up parsing