Compiling Techniques

Similar documents
Compiling Techniques

Instruction Selection

Register Allocation. Maryam Siahbani CMPT 379 4/5/2016 1

CSCI-564 Advanced Computer Architecture

Material Covered on the Final

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

Automata Theory CS S-12 Turing Machine Modifications

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

Special Nodes for Interface

Binary addition example worked out

Measuring Goodness of an Algorithm. Asymptotic Analysis of Algorithms. Measuring Efficiency of an Algorithm. Algorithm and Data Structure

CA Compiler Construction

Quantum Computing Virtual Machine. Author: Alexandru Gheorghiu Scientific advisor: PhD. Lorina Negreanu

Register Alloca.on. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

Computer Science Introductory Course MSc - Introduction to Java

- Why aren t there more quantum algorithms? - Quantum Programming Languages. By : Amanda Cieslak and Ahmana Tarin

Compilers. Lexical analysis. Yannis Smaragdakis, U. Athens (original slides by Sam

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Compiling Techniques

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Functional Big-step Semantics

Part I: Definitions and Properties

Parallelism and Machine Models

Computer Systems on the CERN LHC: How do we measure how well they re doing?

Multicore Semantics and Programming

EE 660: Computer Architecture Out-of-Order Processors

Compiler Design Spring 2017

CS 700: Quantitative Methods & Experimental Design in Computer Science

Lecture 3, Performance

Lecture 3, Performance

Unit 6: Branch Prediction

Lecture: Pipelining Basics

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

Compiling Techniques

Memory Elements I. CS31 Pascal Van Hentenryck. CS031 Lecture 6 Page 1

P is the class of problems for which there are algorithms that solve the problem in time O(n k ) for some constant k.

CSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits

EECS150 - Digital Design Lecture 24 - Arithmetic Blocks, Part 2 + Shifters

EECS 579: Logic and Fault Simulation. Simulation

CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Catie Baker Spring 2015

Stacks. Definitions Operations Implementation (Arrays and Linked Lists) Applications (system stack, expression evaluation) Data Structures 1 Stacks

Chapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

COMPUTER-AIDED DESIGN FOR NEXT-GENERATION QUANTUM COMPUTING SYSTEMS

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.

Formal Verification Methods 1: Propositional Logic

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

CSC 5170: Theory of Computational Complexity Lecture 4 The Chinese University of Hong Kong 1 February 2010

Digital Control of Electric Drives

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

CSE 373: Data Structures and Algorithms Pep Talk; Algorithm Analysis. Riley Porter Winter 2017

Divisible Load Scheduling

The L Machines are very high-level, in two senses:

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Engineering for Compatibility

1 ListElement l e = f i r s t ; / / s t a r t i n g p o i n t 2 while ( l e. next!= n u l l ) 3 { l e = l e. next ; / / next step 4 } Removal

COVER SHEET: Problem#: Points

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application

CS470: Computer Architecture. AMD Quad Core

An Automotive Case Study ERTSS 2016

Cost/Performance Tradeoffs:

CMSC 313 Lecture 15 Good-bye Assembly Language Programming Overview of second half on Digital Logic DigSim Demo

Insert Sorted List Insert as the Last element (the First element?) Delete Chaining. 2 Slide courtesy of Dr. Sang-Eon Park

CMSC 330: Organization of Programming Languages. Pushdown Automata Parsing

On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work

Announcements. Project #1 grades were returned on Monday. Midterm #1. Project #2. Requests for re-grades due by Tuesday

CMSC 313 Lecture 16 Announcement: no office hours today. Good-bye Assembly Language Programming Overview of second half on Digital Logic DigSim Demo

DETERMINING THE VARIABLE QUANTUM TIME (VQT) IN ROUND ROBIN AND IT S IMPORTANCE OVER AVERAGE QUANTUM TIME METHOD

Automated design of floating-point logarithm functions on integer processors

CSE370: Introduction to Digital Design

Boolean circuits. Lecture Definitions

Computational Complexity

Numerical Methods Lecture 2 Simultaneous Equations

EECS150 - Digital Design Lecture 21 - Design Blocks

(a) Definition of TMs. First Problem of URMs

Clock-driven scheduling

HW8. Due: November 1, 2018

Informatique Fondamentale IMA S8

Equalities and Uninterpreted Functions. Chapter 3. Decision Procedures. An Algorithmic Point of View. Revision 1.0

Data Structures in Java

Lecture Overview SSA Maximal SSA Semipruned SSA o Placing -functions o Renaming Translation out of SSA Using SSA: Sparse Simple Constant Propagation

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

Lecture 11. Advanced Dividers

4. (3) What do we mean when we say something is an N-operand machine?

[2] Predicting the direction of a branch is not enough. What else is necessary?

CODE GENERATION REGISTER ALLOCATION. Goal. Interplay between. Translate intermediate code into target code

Lecture 13. Real-Time Scheduling. Daniel Kästner AbsInt GmbH 2013

Room Allocation Optimisation at the Technical University of Denmark

CMP N 301 Computer Architecture. Appendix C

UML. Design Principles.

Complexity: Some examples

Extensibility Patterns: Extension Access

Embedded Systems 23 BF - ES

ORI 390Q Models and Analysis of Manufacturing Systems First Exam, fall 1994

Transcription:

Lecture 11: Introduction to 13 November 2015

Table of contents 1 Introduction Overview The Backend The Big Picture 2 Code Shape

Overview Introduction Overview The Backend The Big Picture Source code FrontEnd Middle End BackEnd Machine Code Errors Front-end Lexer Parser AST builder Semantic Analyser Middle-end Optimizations (Compiler Optimisations course)

The Back end Introduction Overview The Backend The Big Picture Instruction Selection Register Allocation Instruction Scheduling Machine code Errors Translate into target machine code Choose instructions to implement each operation Decide which value to keep in registers Ensure conformance with system interfaces Automation has been less successful in the back end

Instruction Selection Introduction Overview The Backend The Big Picture Instruction Selection Register Allocation Instruction Scheduling Machine code Errors Mapping the into assembly code (in our case AST to Java ByteCode) Assumes a fixed storage mapping & code shape Combining operations, using address modes

Register Allocation Introduction Overview The Backend The Big Picture Instruction Selection Register Allocation Instruction Scheduling Machine code Errors Deciding which value reside in a register Minimise amount of spilling

Instruction Scheduling Introduction Overview The Backend The Big Picture Instruction Selection Register Allocation Instruction Scheduling Machine code Errors Avoid hardware stalls and interlocks Reordering operations to hide latencies Use all functional units productively Instruction scheduling is an optimisation Improves quality of the code. Not strictly required.

The Big Picture Introduction Overview The Backend The Big Picture How hard are these problems? Instruction selection Can make locally optimal choices, with automated tool Global optimality is NP-Complete Instruction scheduling Single basic block heuristic work quickly General problem, with control flow NP-Complete Register allocation Single basic block, no spilling, 1 register size linear time Whole procedure is NP-Complete (graph colouring algorithm) These three problems are tightly coupled! However, conventional wisdom says we lose little by solving these problems independently.

Overview The Backend The Big Picture How to solve these problems? Instruction selection Use some form of pattern matching Assume enough registers or target important values Instruction scheduling Within a block, list scheduling is close to optimal Across blocks, build framework to apply list scheduling Register allocation Start from virtual registers & map enough into k With targeting, focus on good priority heuristic Approximate solutions Will be important to define good metrics for close, good, enough,....

Generating Code for Register-Based Machine The key code quality issue is holding values in registers when can a value be safely allocated to a register? When only 1 name can reference its value Pointers, parameters, aggregate & arrays all cause trouble when should a value be allocated to a register? when it is both safe & profitable Encoding this knowledge into the assign a virtual register to anything that go into one load or store the others at each reference Register allocation is key All this relies on a strong register allocator.

Register-based machine Most real physical machine are register-based Instruction operates on registers. The number of architecture register available to the compiler can vary from processor to processors. The first phase of code generation usually assumes an unlimited numbers of registers (virtual registers). Later phases (register allocator) converts these virtual register to the finite set of available physical architectural registers (more on this in lecture on register allocation).

Generating Code for Register-Based Machine Memory x y Example: x+y l o a d I @x r1 // l o a d the a d d r e s s o f x i n t o r1 loada r1 r2 // now v a l u e o f x i s i n r2 l o a d I @y r3 // l o a d the a d d r e s s o f y i n t o r3 loada r3 r4 // now v a l u e o f y i s i n r4 add r2, r4 r5 // r5 c o n t a i n s x+y

Exercise Write down the list of assembly instructions for x+(y*3) Exercise Assuming you have an instruction muli (multiply immediate), rewrite the previous example. This illustrate the instruction selection problem (more on this in following lectures).

Visitor Implementation for binary operators Binary operators R e g i s t e r v i s i t B i n O p ( BinOp bo ) { R e g i s t e r l h s R e g = bo. l h s. a c c e p t ( t h i s ) ; R e g i s t e r l h s R e g = bo. r h s. a c c e p t ( t h i s ) ; R e g i s t e r r e s u l t = n e x t R e g i s t e r ( ) ; switch ( bo. op ) { case ADD: emit ( add l h s R e g. i d rhsreg. i d r e s u l t. i d ) ; break ; case MUL: emit ( mul l h s R e g. i d rhsreg. i d r e s u l t. i d ) ; break ;... } r e t u r n r e s u l t ; }

Visitor Implementation for variables x l o a d I @x r1 // l o a d the a d d r e s s o f x i n t o r1 loada r1 r2 // now v a l u e o f x i s i n r2 Var R e g i s t e r v i s i t V a r ( Var v ) { R e g i s t e r addrreg = n e x t R e g i s t e r ( ) ; R e g i s t e r r e s u l t = n e x t R e g i s t e r ( ) ; emit ( l o a d I v. a d d r e s s addrreg. i d ) ; emit ( loada addrreg. i d r e s u l t. i d ) ; r e t u r n r e s u l t ; }

Visitor Implementation for integer literals 3 l o a d I 3 r1 IntLiteral R e g i s t e r v i s i t I n t L i t e r a l ( I n t L i t e r a l i t ) { R e g i s t e r r e s u l t = n e x t R e g i s t e r ( ) ; emit ( l o a d I i t. v a l u e r e s u l t. i d ) ; r e t u r n r e s u l t ; }

Generating Code for Stack-Based Virtual Machine Stack-based machine don t use registers but instead perform computation using an operand stack (e.g., Java Bytecode, see previous lecture). Makes it easier to write a code generator, no need to handle register allocation. Producing the real register-based machine code is delayed to runtime (jit-compiler).

Local variables 0 1 x y Example: x+y using Java Bytecode i l o a d 0 // push x onto the s t a c k i l o a d 1 // push y onto the s t a c k i a d d // top o f s t a c k c o n t a i n s x+y

Binary operators Operand stack lhs value rhs value bo value... Void v i s i t B i n O p ( BinOp bo ) { bo. l h s. a c c e p t ( t h i s ) ; bo. r h s. a c c e p t ( t h i s ) ; switch ( bo. op ) { case ADD: emit ( i a d d ) ; break ; case MUL: emit ( i m ul ) ; break ;... } r e t u r n n u l l ; }

Identifiers and Integer Literals Void v i s i t V a r ( Var v ) { emit ( i l o a d v. l o c a l V a r I d ) ; } Void v i s i t I n t L i t e r a l ( I n t L i t e r a l i t ) { i n t v a l u e = i t. v a l u e ; i f ( i t. v a l u e < 32768) emit ( s i p u s h i t. v a l u e ) ; e l s e... } Exercise Unfortunately, Java ByteCode does not have an instruction for pushing full integer. Need to generate code for that. Complete the code above.

Next lecture: Code Shape Conditions Function calls Loops If statement