Compiling Techniques

Similar documents
Compiling Techniques

Compiling Techniques

Compiling Techniques

Compiling Techniques

INF5110 Compiler Construction

CS 314 Principles of Programming Languages

CFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.

Compiling Techniques

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

CSCI Compiler Construction

Syntax Directed Transla1on

Syntactic Analysis. Top-Down Parsing

CS415 Compilers Syntax Analysis Top-down Parsing

2.4 Parsing. Computer Science 332. Compiler Construction. Chapter 2: A Simple One-Pass Compiler : Parsing. Top-Down Parsing

Context-Free Grammars (and Languages) Lecture 7

INF5110 Compiler Construction

Introduction to ANTLR (ANother Tool for Language Recognition)

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16)

Bottom-Up Syntax Analysis

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CA Compiler Construction

Top-Down Parsing and Intro to Bottom-Up Parsing

Parsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario

Properties of Context-Free Languages

Computer Science 160 Translation of Programming Languages

Recursive descent for grammars with contexts

Syntax Analysis Part I

Syntax Analysis Part I

THEORY OF COMPILATION

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Announcement: Midterm Prep

Algebraic Dynamic Programming

Predictive parsing as a specific subclass of recursive descent parsing complexity comparisons with general parsing

LL(1) Grammar and parser

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

Structure. Background. them to their higher order equivalents. functionals in Standard ML source code and converts

Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Specification of Chemical Formulæ in XL with Operator Overloading

Bottom-Up Parsing. Ÿ rm E + F *idÿ rm E +id*idÿ rm T +id*id. Ÿ rm F +id*id Ÿ rm id + id * id

CONTEXT FREE GRAMMAR AND

Introduction to Computational Linguistics

Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars

Syntax Analysis - Part 1. Syntax Analysis

Syntax Analysis Part III

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

COSE312: Compilers. Lecture 17 Intermediate Representation (2)

Compiler construction

Course Script INF 5110: Compiler construction

Follow sets. LL(1) Parsing Table

Context-Free Grammars: Normal Forms

Pushdown Automata: Introduction (2)

Lecture 11 Context-Free Languages

Intro to Theory of Computation

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

The Life Cycle of Grammarware. CWI Scientific Meeting Vadim Zaytsev, SWAT, CWI 2012

EDA045F: Program Analysis LECTURE 10: TYPES 1. Christoph Reichenbach

Suppose h maps number and variables to ɛ, and opening parenthesis to 0 and closing parenthesis

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

CYK Algorithm for Parsing General Context-Free Grammars

Introduction to Theory of Computing

[4203] Compiler Theory Sheet # 3-Answers

Introduction to Bottom-Up Parsing

Computing if a token can follow

Meta-programming & you

Even More on Dynamic Programming

Declarative Programming Techniques

Automata Theory CS F-08 Context-Free Grammars

Notes for Comp 497 (Comp 454) Week 10 4/5/05

Lambda Calculus! Gunnar Gotshalks! LC-1

Declarative Programming Techniques

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

Pushdown Automata (Pre Lecture)

1. For the following sub-problems, consider the following context-free grammar: S AA$ (1) A xa (2) A B (3) B yb (4)

Functional Big-step Semantics

From EBNF to PEG. Roman R. Redziejowski. Concurrency, Specification and Programming Berlin 2012

Finite Automata and Formal Languages TMV026/DIT321 LP Useful, Useless, Generating and Reachable Symbols

5/10/16. Grammar. Automata and Languages. Today s Topics. Grammars Definition A grammar G is defined as G = (V, T, P, S) where:

Introduction and Motivation. Introduction and Motivation. Introduction to Computability. Introduction and Motivation. Theory. Lecture5: Context Free

n Top-down parsing vs. bottom-up parsing n Top-down parsing n Introduction n A top-down depth-first parser (with backtracking)

COMP 3161/9161 Week 2

d(ν) = max{n N : ν dmn p n } N. p d(ν) (ν) = ρ.

Computer Science 160 Translation of Programming Languages

Chomsky Normal Form for Context-Free Gramars

First Class Traversal Idioms with Nested Attribute Grammars

Introduction to Bottom-Up Parsing

Mathematical Foundations of Programming. Nicolai Kraus. Draft of February 15, 2018

Context Free Grammars

Accumulators. A Trivial Example in Oz. A Trivial Example in Prolog. MergeSort Example. Accumulators. Declarative Programming Techniques

Introduction to Bottom-Up Parsing

Parsing. Unger s Parser. Introduction (1) Unger s parser [Grune and Jacobs, 2008] is a CFG parser that is

Context-free Grammars and Languages

TAFL 1 (ECS-403) Unit- III. 3.1 Definition of CFG (Context Free Grammar) and problems. 3.2 Derivation. 3.3 Ambiguity in Grammar

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

Lecture Notes on Inductive Definitions

1. For the following sub-problems, consider the following context-free grammar: S A$ (1) A xbc (2) A CB (3) B yb (4) C x (6)

From EBNF to PEG. Roman R. Redziejowski. Giraf s Research

Transcription:

Lecture 7: Abstract Syntax 13 October 2015

Table of contents Syntax Tree 1 Syntax Tree Semantic Actions Examples Abstract Grammar 2 Internal Representation AST Builder 3 Visitor Processing

Semantic Actions Examples Abstract Grammar A parser does more than simply recognising syntax. It can: evaluate code (interpreter) emit code (simple compiler) build an internal representation of the program (multi-pass compiler) In general, a parser performs semantic actions: recursive descent parser: integrate the actions with the parsing functions bottom-up parser (automatically generated): add actions to the grammar

Syntax Tree Semantic Actions Examples Abstract Grammar In a multi-pass compiler, the parser builds a syntax tree which is used by the subsequent passes A syntax tree can be: a concrete syntax tree (or parse tree) if it directly corresponds to the context-free grammar an abstract syntax tree if it corresponds to a simplified (or abstract) grammar The abstract syntax tree (AST) is usually used in compilers.

Semantic Actions Examples Abstract Grammar Example: Concrete Syntax Tree (Parse Tree) Example: CFG for arithmetic expressions (EBNF form) Expr ::= Term ( ( + ) Term ) Term ::= F a c t o r ( ( / ) F a c t o r ) F a c t o r ::= number ( Expr ) After removal of EBNF syntax Expr :: = Term Terms Terms ::= ( + ) Term Terms ɛ Term ::= F a c t o r F a c t o r s F a c t o r s ::= ( / ) F a c t o r F a c t o r s ɛ F a c t o r ::= number ( Expr ) After further simplification Expr ::= Term (( + ) Expr ɛ) Term ::= F a c t o r ( ( / ) Term ɛ) F a c t o r ::= number ( Expr )

Semantic Actions Examples Abstract Grammar Example: Concrete Syntax Tree (Parse Tree) CFG for arithmetic expressions Expr ::= Term (( + ) Expr ɛ) Term ::= F a c t o r ( ( / ) Term ɛ) F a c t o r ::= number ( Expr ) Concrete Syntax Tree for 5 3 Term Term Factor Factor number The concrete syntax tree contains a lot of unnecessary information. number 3 5

Semantic Actions Examples Abstract Grammar It is possible to simplify the concrete syntax tree to remove the redundant information. For instance parenthesis are not necessary. Exercise 1 Write the concrete syntax tree for 3 (4 + 5) 2 Simplify the tree.

Abstract Grammar Syntax Tree Semantic Actions Examples Abstract Grammar These simplifications leads to a new simpler context-free grammar caller Abstract Grammar. Example: abstract grammar for arithmetic expressions Expr ::= BinOp i n t L i t e r a l BinOp :: = Expr Op Expr Op ::= add sub mul d i v 5 3 BinOp intliteral(5) intliteral(3) mul This is called an

Semantic Actions Examples Abstract Grammar Example: abstract grammar for arithmetic expressions Expr ::= BinOp i n t L i t e r a l BinOp :: = Expr Op Expr Op ::= add sub mul d i v Note that for given concrete grammar, there exist numerous abstract grammar: Expr ::= AddOp SubOp MulOp DivOp i n t L i t e r a l AddOp :: = Expr add Expr SubOp :: = Expr sub Expr MulOp :: = Expr mul Expr DivOp ::= Expr d i v Expr We pick the most suitable grammar for the compiler.

Syntax Tree Internal Representation AST Builder The (AST) forms the main intermediate representation of the compiler s front-end. For each non-terminal or terminal in the abstract grammar, we define a class. If a non-terminal has any alternative on the rhs (right hand side), then the class is abstract (cannot instantiate it). The terminal or non-terminal appearing on the rhs are subclasses of the non-terminal on the lhs. The sub-trees are represented as instance variable in the class. Each non-abstract class has a unique constructor. If a terminal does not store any information, then we can use an Enum type in Java instead of a class.

Internal Representation AST Builder Example: abstract grammar for arithmetic expressions Expr ::= BinOp i n t L i t e r a l BinOp :: = Expr Op Expr Op ::= add sub mul d i v Corresponding Java Classes a b s t r a c t c l a s s Expr { c l a s s I n t L i t e r a l extends Expr { i n t i ; I n t L i t e r a l ( i n t i ) {... c l a s s BinOp extends Expr { Op op ; Expr l h s ; Expr r h s ; BinOp (Op op, Expr l h s, Expr r h s ) {... enum Op {ADD, SUB, MUL, DIV

Internal Representation AST Builder CFG for arithmetic expressions Expr ::= Term (( + ) Expr ɛ) Term ::= F a c t o r ( ( / ) Term ɛ) F a c t o r ::= number ( Expr ) Current Parser (class) Expr p a r s e E x p r ( ) { parseterm ( ) ; i f ( a c c e p t (PLUS MINUS ) ) nexttoken ( ) ; p a r s e E x p r ( ) ; Expr parseterm ( ) { p a r s e F a c t o r ( ) ; i f ( a c c e p t (TIMES DIV ) ) nexttoken ( ) ; parseterm ( ) ; Expr p a r s e F a c t o r ( ) { i f ( a c c e p t (LPAR) ) p a r s e E x p r ( ) ; e x p e c t (RPAR ) ; e l s e e x p e c t (NUMBER) ;

Internal Representation AST Builder AST building (modified Parser) Current Parser v o i d p a r s e E x p r ( ) { parseterm ( ) ; i f ( a c c e p t (PLUS MINUS ) ) nexttoken ( ) ; p a r s e E x p r ( ) ; Expr p a r s e E x p r ( ) { Expr l h s = parseterm ( ) ; i f ( a c c e p t (PLUS MINUS ) ) Op op ; i f ( token == PLUS) op = ADD; e l s e // token == MINUS op = SUB ; nexttoken ( ) ; Expr r h s = p a r s e E x p r ( ) ; r e t u r n new BinOp ( op, l h s, r h s ) ; r e t u r n l h s ;

Internal Representation AST Builder AST building (modified Parser) Current Parser v o i d parseterm ( ) { p a r s e F a c t o r ( ) ; i f ( a c c e p t (TIMES DIV ) ) nexttoken ( ) ; parseterm ( ) ; Expr parseterm ( ) { Expr l h s = p a r s e F a c t o r ( ) ; i f ( a c c e p t (TIMES DIV ) ) Op op ; i f ( token == TIMES) op = MUL; e l s e // token == DIV op = DIV ; nexttoken ( ) ; Expr r h s = parseterm ( ) ; r e t u r n new BinOp ( op, l h s, r h s ) ; r e t u r n l h s ;

Internal Representation AST Builder AST building (modified Parser) Current Parser v o i d p a r s e F a c t o r ( ) { i f ( a c c e p t (LPAR ) ) p a r s e E x p r ( ) ; e x p e c t (RPAR ) ; e l s e e x p e c t (NUMBER) ; Expr p a r s e F a c t o r ( ) { i f ( a c c e p t (LPAR ) ) Expr e = p a r s e E x p r ( ) ; e x p e c t (RPAR ) ; r e t u r n e ; e l s e I n t L i t e r a l i l = parsenumber ( ) ; r e t u r n i l ; I n t L i t e r a l parsenumber ( ) { Token n = e x p e c t (NUMBER) ; i n t i = I n t e g e r. p a r s e I n t ( n. data ) ; r e t u r n new I n t L i t e r a l ( i ) ;

Compiler Pass Syntax Tree Visitor Processing AST pass An AST pass is an action that process the AST in a single traversal. An pass can for instance: assign a type to each node of the AST perform an optimisation generate code It is important to ensure that the different passes can access the AST in a flexible way. An inefficient solution would be to use instanceof to find the type of syntax node Example i f ( t r e e i n s t a n c e o f I n t L i t e r a l ) ( ( I n t L i t e r a l ) t r e e ). i ;

Visitor Processing Using this technique, a compiler pass is represented by a function f () in each of the AST classes. The method is abstract if the class is abstract To process an instance of an AST class e, we simply call e. f (). The exact behaviour will depends on the concrete class implementations Example for the arithmetic expression A pass to print the AST: String tostr () A pass to evaluate the AST: int eval ()

Visitor Processing a b s t r a c t c l a s s Expr { a b s t r a c t S t r i n g t o S t r ( ) ; a b s t r a c t i n t e v a l ( ) ; c l a s s I n t L i t e r a l extends Expr { i n t i ; S t r i n g t o S t r ( ) { r e t u r n +i ; i n t e v a l ( ) { r e t u r n i ; c l a s s BinOp extends Expr { Op op ; Expr l h s ; Expr r h s ; S t r i n g t o S t r ( ) { r e t u r n l h s. t o S t r ( ) + op. name ( ) + r h s. t o S t r ( ) ; i n t e v a l ( ) { s w i t c h ( op ) { case ADD: l h s. e v a l ( ) + r h s. e v a l ( ) ; break ; case SUB : l h s. e v a l ( ) r h s. e v a l ( ) ; break ; case MUL: l h s. e v a l ( ) r h s. e v a l ( ) ; break ; case DIV : l h s. e v a l ( ) / r h s. e v a l ( ) ; break ;

Visitor Processing Main class c l a s s Main { v o i d main ( S t r i n g [ ] a r g s ) { Expr e x p r = E x p r P a r s e r. p a r s e ( s o m e i n p u t f i l e ) ; S t r i n g s t r = e x p r. t o S t r ( ) ; i n t r e s u l t = e x p r. e v a l ( ) ;

Visitor Processing Syntax Tree Visitor Processing With this technique, all the methods from a pass are grouped in a visitor. For this, need a language that implements single dispatch: the method is chosen based on the dynamic type of the object (the AST node) The visitor design pattern allows us to implement double dispatch, the method is chosen based on: the dynamic type of the object (the AST node) the dynamic type of the argument (the visitor) Note that if the language supports pattern matching, it is not needed to use a visitor since double-dispatch can be implemented more effectively.

Single vs. double dispatch Visitor Processing In Java: Single dispatch c l a s s A { void p r i n t ( ) { System. out. p r i n t ( A ) ; c l a s s B extends A { void p r i n t ( ) { System. out. p r i n t ( B ) ; A a = new A ( ) ; B b = new B ( ) ; a. p r i n t ( ) ; // o u t p u t s A b. p r i n t ( ) ; // o u t p u t s B

Single vs. double dispatch Visitor Processing In Java: Double dispatch (Java does not support double dispatch) c l a s s A { c l a s s B extends A { c l a s s P r i n t ( ) { void p r i n t (A a ) { System. out. p r i n t ( A ) ; void p r i n t (B b ) { System. out. p r i n t ( B ) ; A a = new A ( ) ; B b = new B ( ) ; A b2 = new B ( ) ; P r i n t p = new P r i n t ( ) ; p. p r i n t ( a ) ; // o u t p u t s A p. p r i n t ( b ) ; // o u t p u t s B p. p r i n t ( b2 ) ; // o u t p u t s A

Visitor Processing Visitor Interface i n t e r f a c e V i s i t o r <T> { T v i s i t I n t L i t e r a l ( I n t L i t e r a l i l ) ; T v i s i t B i n O p ( BinOp bo ) ; Modified AST classes a b s t r a c t c l a s s Expr { a b s t r a c t T a c c e p t ( V i s i t o r <T> v ) ; c l a s s I n t L i t e r a l extends Expr {... T a c c e p t ( V i s i t o r <T> v ) { r e t u r n v. v i s i t I n t L i t e r a l ( t h i s ) ; c l a s s BinOp extends Expr {... T a c c e p t ( V i s i t o r <T> v ) { r e t u r n v. v i s i t B i n O p ( t h i s ) ;

Visitor Processing ToStr Visitor ToStr implements V i s i t o r <S t r i n g > { S t r i n g v i s i t I n t L i t e r a l ( I n t L i t e r a l i l ) { r e t u r n + i l. i ; S t r i n g v i s i t B i n O p ( BinOp bo ) { r e t u r n bo. l h s. a c c e p t ( t h i s ) + bo. op. name ( ) + bo. r h s. a c c e p t ( t h i s Eval Visitor E v a l implements V i s i t o r <I n t e g e r > { I n t e g e r v i s i t I n t L i t e r a l ( I n t L i t e r a l i l ) { r e t u r n i l. i ; I n t e g e r v i s i t B i n O p ( BinOp bo ) { s w i t c h ( bo. op ) { case ADD: l h s. a c c e p t ( t h i s ) + r h s. a c c e p t ( t h i s ) ; break ; case SUB : l h s. a c c e p t ( t h i s ) r h s. a c c e p t ( t h i s ) ; break ; case MUL: l h s. a c c e p t ( t h i s ) r h s. a c c e p t ( t h i s ) ; break ; case DIV : l h s. a c c e p t ( t h i s ) / r h s. a c c e p t ( t h i s ) ; break ;

Visitor Processing Main class c l a s s Main { v o i d main ( S t r i n g [ ] a r g s ) { Expr e x p r = E x p r P a r s e r. p a r s e ( s o m e i n p u t f i l e ) ; S t r i n g s t r = e x p r. a c c e p t ( new ToStr ( ) ) ; i n t r e s u l t = e x p r. a c c e p t ( new E v a l ( ) ) ;

Extensibility Syntax Tree Visitor Processing With an AST, there can extensions in two dimensions: 1 Adding a new AST node For the object-oriented processing this means add a new sub-class In the case of the visitor, need to add a new method in every visitor 2 Adding a new pass For the object-oriented processing, this means adding a function in every single AST node classes For the visitor case, simply create a new visitor

Picking the right design Visitor Processing Facilitate extensibility: the object-oriented design makes it easy to add new type of AST node the visitor-based scheme makes it easy to write new passes Facilitate modularity: the object-oriented design allows for code and data to be stored in the AST node and be shared between phases (e.g., types) the visitor design allows for code and data to be shared among the methods of the same pass

Next lecture Syntax Tree Visitor Processing Context-sensitive Analysis