Compiling Techniques

Similar documents
Compiling Techniques

Compiling Techniques

Compiling Techniques

Syntactic Analysis. Top-Down Parsing

Compiling Techniques

Top-Down Parsing and Intro to Bottom-Up Parsing

CS 314 Principles of Programming Languages

CS415 Compilers Syntax Analysis Top-down Parsing

CSE302: Compiler Design

Syntax Analysis Part I

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

Syntax Analysis Part III

INF5110 Compiler Construction

Computer Science 160 Translation of Programming Languages

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16)

The Parser. CISC 5920: Compiler Construction Chapter 3 Syntactic Analysis (I) Grammars (cont d) Grammars

CA Compiler Construction

Compiler construction

Syntax Analysis Part I

Recursive descent for grammars with contexts

Course Script INF 5110: Compiler construction

THEORY OF COMPILATION

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Introduction to Bottom-Up Parsing

Chapter 4: Bottom-up Analysis 106 / 338

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Introduction to Bottom-Up Parsing

Context Free Grammars

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Announcement: Midterm Prep

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing

CS153: Compilers Lecture 5: LL Parsing

LL(1) Grammar and parser

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

Context free languages

Syntax Analysis: Context-free Grammars, Pushdown Automata and Parsing Part - 3. Y.N. Srikant

Compiler Principles, PS4

INF5110 Compiler Construction

Creating a Recursive Descent Parse Table

Parsing VI LR(1) Parsers

CONTEXT FREE GRAMMAR AND

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Computer Science 160 Translation of Programming Languages

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412

LR(1) Parsers Part III Last Parsing Lecture. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

Syntax Analysis - Part 1. Syntax Analysis

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

Pushdown Automata: Introduction (2)

Context-Free Grammars (and Languages) Lecture 7

Review. Earley Algorithm Chapter Left Recursion. Left-Recursion. Rule Ordering. Rule Ordering

An Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia

Context-Free Grammars and Languages. We have seen that many languages cannot be regular. Thus we need to consider larger classes of langs.

Exercises. Exercise: Grammar Rewriting

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Introduction to ANTLR (ANother Tool for Language Recognition)

Compiler Construction Lectures 13 16

Syntax Analysis, VI Examples from LR Parsing. Comp 412

Talen en Compilers. Johan Jeuring , period 2. October 29, Department of Information and Computing Sciences Utrecht University

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

From EBNF to PEG. Roman R. Redziejowski. Giraf s Research

Bottom-Up Syntax Analysis

Bottom-Up Parsing. Ÿ rm E + F *idÿ rm E +id*idÿ rm T +id*id. Ÿ rm F +id*id Ÿ rm id + id * id

CS20a: summary (Oct 24, 2002)

Compiler Principles, PS7

CS415 Compilers Syntax Analysis Bottom-up Parsing

Lecture Notes on Inductive Definitions

CS626: NLP, Speech and the Web. Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 14: Parsing Algorithms 30 th August, 2012

Lecture Notes on Inductive Definitions

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

CS 406: Bottom-Up Parsing

From EBNF to PEG. Roman R. Redziejowski. Concurrency, Specification and Programming Berlin 2012

Context-Free Grammars and Languages

Parsing with Context-Free Grammars

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

Parsing -3. A View During TD Parsing

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Intro to Theory of Computation

Computational Models - Lecture 3

LR(1) Parsers Part II. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved.

Top-Down Parsing, Part II

Lecture 11 Context-Free Languages

Introduction to Computational Linguistics

Follow sets. LL(1) Parsing Table

Context-Free Languages (Pre Lecture)

Simplification and Normalization of Context-Free Grammars

Syntax Analysis, VII The Canonical LR(1) Table Construction. Comp 412 COMP 412 FALL Chapter 3 in EaC2e. source code. IR IR target.

Computability Theory

CS453 Intro and PA1 1

Bottom-up Analysis. Theorem: Proof: Let a grammar G be reduced and left-recursive, then G is not LL(k) for any k.

CMPT-825 Natural Language Processing. Why are parsing algorithms important?

Plan for Today and Beginning Next week (Lexical Analysis)

1. For the following sub-problems, consider the following context-free grammar: S AB$ (1) A xax (2) A B (3) B yby (5) B A (6)

Transcription:

Lecture 5: Top-Down Parsing 6 October 2015

The Parser Context-Free Grammar (CFG) Lexer Source code Scanner char Tokeniser token Parser AST Semantic Analyser AST IR Generator IR Errors Checks the stream of words/tokens produced by the lexer for grammatical correctness Determine if the input is syntactically well formed Guides checking at deeper levels than syntax Used to build an IR representation of the code

Table of contents Context-Free Grammar (CFG) 1 Context-Free Grammar (CFG) Definition RE to CFG 2 Main idea Parser interface Example 3 LL(1) property LL(K)

Definition RE to CFG Specifying syntax with a grammar Use Context-Free Grammar (CFG) to specify syntax Contex-Free Grammar definition A Context-Free Grammar G is a quadruple (S,N,T,P) where: S is a start symbol N is a set of non-terminal symbols T is a set of terminal symbols or words P is a set f production or rewrite rules where only a single non-terminal is allowed on the left-hand side (lhs) P : N (N T )

Definition RE to CFG From Regular Expression to Context-Free Grammar Kleene closure A : replace A to A rep in all production rules and add A rep = A A rep ɛ as a new production rule Positive closure A + : replace A + to A rep in all production rules and add A rep = A A rep A as a new production rule Option [A]: replace [A] to A opt in all production rules and add A opt = A ɛ as a new production rule

Definition RE to CFG Example: function call f u n c a l l ::= IDENT ( [ IDENT (, IDENT) ] ) after removing the option: f u n c a l l ::= IDENT ( a r g l i s t ) a r g l i s t ::= IDENT (, IDENT) ɛ after removing the closure: f u n c a l l ::= IDENT ( a r g l i s t ) a r g l i s t ::= IDENT a r g r e p ɛ a r g r e p ::=, IDENT a r g r e p ɛ

Main idea Parser interface Example To derive a syntactic analyser for a context free grammar express in an EBNF style: convert all the regular expressions as seen Implement a function for each non-terminal A. This function recognises sentences derived from A Recursion in the grammar corresponds to recursive calls of the created functions This technique is called recursive-descent parsing or predictive parsing.

Main idea Parser interface Example Parser class (pseudo-code) Token token ; // c u r r e n t token void e r r o r ( TokenClass... e x p e c t e d ) {... boolean a c c e p t ( TokenClass... e x p e c t e d ) { r e t u r n ( token e x p e c t e d ) ; Token e x p e c t ( TokenClass... e x p e c t e d ) { Token r e s u l t = token ; i f ( a c c e p t ( e x p e c t e d ) ) { nexttoken ( ) ; r e t u r n r e s u l t ; e l s e e r r o r ( e x p e c t e d ) ;

Main idea Parser interface Example CFG for function call f u n c a l l ::= IDENT ( a r g l i s t ) a r g l i s t ::= IDENT a r g r e p ɛ a r g r e p ::=, IDENT a r g r e p ɛ Recursive-Descent Parser void p a r s e F u n C a l l ( ) e x p e c t (IDENT ) ; e x p e c t (LPAR ) ; p a r s e A r g L i s t ( ) ; e x p e c t (RPAR ) ; void p a r s e A r g L i s t ( ) i f ( a c c e p t (IDENT ) ) { nexttoken ( ) ; parseargrep ( ) ; void parseargrep ( ) i f ( a c c e p t (COMMA) ) { nexttoken ( ) ; e x p e c t (IDENT ) ; parseargrep ( ) ;

Be aware of infinite recursion! Main idea Parser interface Example Left Recursion E ::= E + T T The parser would recurse indefinitely! Luckily, we can transform this grammar to: E ::= T ( + T)

Main idea Parser interface Example Consider the following bit of grammar stmt ::= a s s i g n ; f u n c a l l ; f u n c a l l ::= IDENT ( a r g l i s t ) a s s i g n ::= IDENT = l e x p void p a r s e A s s i g n ( ) { e x p e c t (IDENT ) ; e x p e c t (EQ ) ; p a r s e L e x p ( ) ; void parsestmt ( ) {??? void p a r s e F u n C a l l ( ) { e x p e c t (IDENT ) ; e x p e c t (LPAR ) ; p a r s e A r g L i s t ( ) ; e x p e c t (RPAR ) ; If it picks the wrong production, the parser may have to backtrack. Alternative is to look ahead in input to pick correctly.

LL(1) property LL(K) How much lookahead is needed? In general, an arbitrarily large amount Fortunately: Large subclasses of CFGs can be parsed with limited lookahead Most programming language constructs fall in those subclasses Among the interesting subclasses are LL(1) grammars. LL(1) Left-to-Right parsing; Leftmost derivation; 1 symbol lookahead.

LL(1) property LL(K) Basic idea: given A α β, the parser should be able to choose between α and β. First sets For some rhs α G, define First(α) as the set of tokens that appear as the first symbol in some sting that derives from α: x First(α) iif α xγ, for some γ The LL(1) property: if A α and A β both appear in the grammar, we would like: First(α) First(β) = This would allow the parser to make the correct choice with a lookahead of exactly one symbol! (almost, see next slide!)

LL(1) property LL(K) What about ɛ-productions (the ones that consume no token)? If A α and A β and ɛ First(α), then we need to ensure that First(β) is disjoint from Follow(α). Follow(α) is the set of all words in the grammar that can legally appear immediately after an α (see EaC 3.3 for details on how to build the First and Follow sets). Let s define First + (α) as: First(α) Follow(α), if ɛ First(α) First(α) otherwise LL(1) grammar A grammar is LL(1) iff A α and B β implies: First + (α) First + (β) =

LL(1) property LL(K) Given a grammar that has the LL(1) property: can write a simple routine to recognise each lhs code is both simple and fast Predictive Parsing Grammar with the LL(1) property are called predictive grammars because the parser can predict the correct expansion at each point. Parsers that capitalise on the LL(1) property are called predictive parsers. One kind of predictive parser is the recursive descent parser.

LL(1) property LL(K) Sometimes, we might need to lookahead one or more tokens. LL(2) Grammar Example stmt ::= a s s i g n ; f u n c a l l ; f u n c a l l ::= IDENT ( a r g l i s t ) a s s i g n ::= IDENT = l e x p void parsestmt ( ) { i f ( a c c e p t (IDENT ) ) { i f ( lookahead ( 1 ) == LPAR) p a r s e F u n C a l l ( ) ; e l s e i f ( lookahead ( 1 ) == EQ) p a r s e A s s i g n ( ) ; e l s e e r r o r ( ) ; e l s e e r r o r ( ) ;

Next lecture Context-Free Grammar (CFG) LL(1) property LL(K) More about LL(1) & LL(k) languages and grammars Dealing with ambiguity Left-factoring Bottom-up parsing