CSE P 501 Compilers. Value Numbering & Op;miza;ons Hal Perkins Winter UW CSE P 501 Winter 2016 S-1

Similar documents
Beyond Superlocal: Dominators, Data-Flow Analysis, and DVNT. COMP 506 Rice University Spring target code. source code OpJmizer

Lecture 10: Data Flow Analysis II

Lecture 11: Data Flow Analysis III

Lecture 4 Introduc-on to Data Flow Analysis

: Advanced Compiler Design Compu=ng DF(X) 3.4 Algorithm for inser=on of φ func=ons 3.5 Algorithm for variable renaming

Register Alloca.on. CMPT 379: Compilers Instructor: Anoop Sarkar. anoopsarkar.github.io/compilers-class

Scalar Optimisation Part 1

Construction of Static Single-Assignment Form

Generalizing Data-flow Analysis

Dataflow Analysis. A sample program int fib10(void) { int n = 10; int older = 0; int old = 1; Simple Constant Propagation

Predicate abstrac,on and interpola,on. Many pictures and examples are borrowed from The So'ware Model Checker BLAST presenta,on.

Lecture Overview SSA Maximal SSA Semipruned SSA o Placing -functions o Renaming Translation out of SSA Using SSA: Sparse Simple Constant Propagation

What is SSA? each assignment to a variable is given a unique name all of the uses reached by that assignment are renamed

Material Covered on the Final

Computing Static Single Assignment (SSA) Form

Unit 3: Ra.onal and Radical Expressions. 3.1 Product Rule M1 5.8, M , M , 6.5,8. Objec.ve. Vocabulary o Base. o Scien.fic Nota.

CSC D70: Compiler Optimization Static Single Assignment (SSA)

Lecture 5 Introduction to Data Flow Analysis

Polynomials and Gröbner Bases

ENEE350 Lecture Notes-Weeks 14 and 15

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on

CSCI 1010 Models of Computa3on. Lecture 02 Func3ons and Circuits

Dataflow Analysis Lecture 2. Simple Constant Propagation. A sample program int fib10(void) {

CS 161: Design and Analysis of Algorithms

CMSC 631 Program Analysis and Understanding. Spring Data Flow Analysis

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on. Instructor: Wei-Min Shen

Probability and Structure in Natural Language Processing

MIT Loop Optimizations. Martin Rinard

Topic-I-C Dataflow Analysis

Bellman s Curse of Dimensionality

Tainted Flow Analysis on e-ssaform

n CS 160 or CS122 n Sets and Functions n Propositions and Predicates n Inference Rules n Proof Techniques n Program Verification n CS 161

ECE 5775 (Fall 17) High-Level Digital Design Automation. Control Flow Graph

Graphical Models. Lecture 10: Variable Elimina:on, con:nued. Andrew McCallum

Lesson 3-1: Solving Linear Systems by Graphing

Long Short- Term Memory (LSTM) M1 Yuichiro Sawai Computa;onal Linguis;cs Lab. January 15, Deep Lunch

Parallelizing Gaussian Process Calcula1ons in R

Parameter Es*ma*on: Cracking Incomplete Data

CSE546: SVMs, Dual Formula5on, and Kernels Winter 2012

Speeding up characteristic 2: I. Linear maps II. The Å(Ò) game III. Batching IV. Normal bases. D. J. Bernstein University of Illinois at Chicago

Introduc)on to Ar)ficial Intelligence

Verifying the LLVM. Steve Zdancewic DeepSpec Summer School 2017

COSE312: Compilers. Lecture 17 Intermediate Representation (2)

Scalar Optimisation Part 2

Crystal Structures: Bulk and Slab Calcula3ons

Dataflow analysis. Theory and Applications. cs6463 1

Compiler Design. Data Flow Analysis. Hwansoo Han

Par$$oned Elias- Fano indexes. Giuseppe O)aviano Rossano Venturini

Introduction to data-flow analysis. Data-flow analysis. Control-flow graphs. Data-flow analysis. Example: liveness. Requirements

Dataflow Analysis. Chapter 9, Section 9.2, 9.3, 9.4

Data flow analysis. DataFlow analysis

Saturday, April 23, Dependence Analysis

Exact data mining from in- exact data Nick Freris

Probability and Structure in Natural Language Processing

CSC D70: Compiler Optimization Dataflow-2 and Loops

CSE 473: Ar+ficial Intelligence

Data Flow Analysis. Lecture 6 ECS 240. ECS 240 Data Flow Analysis 1

Unit 2. Projec.ons and Subspaces

Static Analysis of Programs: A Heap-Centric View

MA/CS 109 Lecture 7. Back To Exponen:al Growth Popula:on Models

Deriva'on of The Kalman Filter. Fred DePiero CalPoly State University EE 525 Stochas'c Processes

More belief propaga+on (sum- product)

Topics on Compilers

CSE 373: Data Structures and Algorithms Pep Talk; Algorithm Analysis. Riley Porter Winter 2017

Minimum Edit Distance. Defini'on of Minimum Edit Distance

Principles of AI Planning

DRAFT. Algebraic computation models. Chapter 14

Data-Flow Analysis. Compiler Design CSE 504. Preliminaries

Methodological Foundations of Biomedical Informatics (BMSC-GA 4449) Optimization

CS 243 Lecture 11 Binary Decision Diagrams (BDDs) in Pointer Analysis

DFA of non-distributive properties

Normaliza)on and Func)onal Dependencies

Bayesian Networks. Why Joint Distributions are Important. CompSci 570 Ron Parr Department of Computer Science Duke University

Adventures in Dataflow Analysis

The ultraviolet structure of N=8 supergravity at higher loops

IC3 and Beyond: Incremental, Inductive Verification

(b). Identify all basic blocks in your three address code. B1: 1; B2: 2; B3: 3,4,5,6; B4: 7,8,9; B5: 10; B6: 11; B7: 12,13,14; B8: 15;

Basics of Linear Temporal Proper2es

CSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits

Understanding IC3. Aaron R. Bradley. ECEE, CU Boulder & Summit Middle School. Understanding IC3 1/55

SAT-Based Verification with IC3: Foundations and Demands

: Compiler Design. 9.5 Live variables 9.6 Available expressions. Thomas R. Gross. Computer Science Department ETH Zurich, Switzerland

The Mysteries of Quantum Mechanics

Static Program Analysis

Compiling Techniques

State Space Representa,ons and Search Algorithms

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Variable Elimination (VE) Barak Sternberg

Electricity & Magnetism Lecture 10: Kirchhoff s Rules

Lecture 2: Divide and conquer and Dynamic programming

Register Allocation. Maryam Siahbani CMPT 379 4/5/2016 1

Decidable and undecidable languages

Long-Short Term Memory and Other Gated RNNs

Least Square Es?ma?on, Filtering, and Predic?on: ECE 5/639 Sta?s?cal Signal Processing II: Linear Es?ma?on

Principles of AI Planning

CSE 331 Winter 2018 Reasoning About Code I

Precise Interprocedural Analysis using Random Interpretation

Spectral Algorithms for Graph Mining and Analysis. Yiannis Kou:s. University of Puerto Rico - Rio Piedras NSF CAREER CCF

CS 553 Compiler Construction Fall 2006 Homework #2 Dominators, Loops, SSA, and Value Numbering

Proofs Sec*ons 1.6, 1.7 and 1.8 of Rosen

Transcription:

CSE P 501 Compilers Value Numbering & Op;miza;ons Hal Perkins Winter 2016 UW CSE P 501 Winter 2016 S-1

Agenda Op;miza;on (Review) Goals Scope: local, superlocal, regional, global (intraprocedural), interprocedural Control flow graphs (reminder) Value numbering Dominators Ref.: Cooper/Torczon ch. 8 UW CSE P 501 Winter 2016 S-2

Code Improvement (1) Pick a bewer algorithm(!) Use machine resources efficiently Instruc;ons, registers More later UW CSE P 501 Winter 2016 S-3

Code Improvement (2) Local op;miza;ons basic blocks Algebraic simplifica;ons Constant folding Common subexpression elimina;on (i.e., redundancy elimina;on) Dead code elimina;on Specialize computa;on based on context etc., etc., UW CSE P 501 Winter 2016 S-4

Code Improvement (3) Global op;miza;ons Code mo;on Moving invariant computa;ons out of loops Strength reduc;on (replace mul;plica;ons by repeated addi;ons, for example) Global common subexpression elimina;on Global register alloca;on Many others UW CSE P 501 Winter 2016 S-5

Op;miza;on None of these improvements are truly op;mal Hard problems (in theory- of- computa;on sense) Proofs of op;mality assume ar;ficial restric;ons Best we can do is to improve things Most (much?) (some?) of the ;me Realis;cally: try to do bewer for common idioms both in the code and on the machine UW CSE P 501 Winter 2016 S-6

Op;miza;on Phase Goal Discover, at compile ;me, informa;on about the run;me behavior of the program, and use that informa;on to improve the generated code UW CSE P 501 Winter 2016 S-7

A First Running Example: Redundancy Elimina;on An expression x+y is redundant at a program point iff, along every path from the procedure s entry, it has been evaluated and its cons;tuent subexpressions (x and y) have not been redefined If the compiler can prove the expression is redundant: Can store the result of the earlier evalua;on Can replace the redundant computa;on with a reference to the earlier (stored) result UW CSE P 501 Winter 2016 S-8

Common PaWern for Code Improvement Typical for most compiler op;miza;ons First, discover opportuni;es through program analysis Then, modify the IR to take advantage of the opportuni;es Historically, goal usually was to decrease execu;on ;me Other possibili;es: reduce space, power, UW CSE P 501 Winter 2016 S-9

Issues (1) Safety transforma;on must not change program meaning Must generate correct results Can t generate spurious errors Op;miza;ons must be conserva;ve Large part of analysis goes towards proving safety Can pay off to speculate (be op;mis;c) but then need to recover if reality is different UW CSE P 501 Winter 2016 S-10

Issues (2) Profi;bility If a transforma;on is possible, is it profitable? Example: loop unrolling Can increase amount of work done on each itera;on, i.e., reduce loop overhead Can eliminate duplicate opera;ons done on separate itera;ons UW CSE P 501 Winter 2016 S-11

Issues (3) Downside risks Even if a transforma;on is generally worthwhile, need to think about poten;al problems For example: Transforma;on might need more temporaries, pukng addi;onal pressure on registers Increased code size could cause cache misses, or, in bad cases, increase page working set UW CSE P 501 Winter 2016 S-12

Example: Value Numbering Technique for elimina;ng redundant expressions: assign an iden;fying number VN(n) to each expression VN(x+y)=VN(j) if x+y and j have the same value Use hashing over value numbers for effeciency Old idea (Balke 1968, Ershov 1954) Invented for low- level, linear IRs Equivalent methods exist for tree IRs, e.g., build a DAG UW CSE P 501 Winter 2016 S-13

Uses of Value Numbers Improve the code Replace redundant expressions Simplify algebraic iden;;es Discover, fold, and propagate constant valued expressions UW CSE P 501 Winter 2016 S-14

Local Value Numbering Algorithm For each opera;on o = <op, o1,o2> in a block 1. Get value numbers for operands from hash lookup 2. Hash <op, VN(o1), VN(o2)> to get a value number for o (If op is commuta;ve, sort VN(o1), VN(o2) first) 3. If o already has a value number, replace o with a reference to the value 4. If o1 and o2 are constant, evaluate o at compile ;me and replace with an immediate load If hashing behaves well, this runs in linear ;me UW CSE P 501 Winter 2016 S-15

Example Code a = x + y b = x + y a = 17 c = x + y RewriWen UW CSE P 501 Winter 2016 S-16

Bug in Simple Example If we use the original names, we get in trouble when a name is reused Solu;ons Be clever about which copy of the value to use (e.g., use c=b in last statement) Create an extra temporary Rename around it (best!) UW CSE P 501 Winter 2016 S-17

Renaming Idea: give each value a unique name a i j means i th defini;on of a with VN = j Somewhat complex nota;on, but meaning is clear This is the idea behind SSA (Sta;c Single Assignment) Popular modern IR exposes many opportuni;es for op;miza;ons UW CSE P 501 Winter 2016 S-18

Example Revisited Code a = x + y b = x + y a = 17 c = x + y RewriWen UW CSE P 501 Winter 2016 S-19

Simple Extensions to Value Numbering Constant folding Add a bit that records when a value is constant Evaluate constant values at compile ;me Replace op with load immediate Algebraic iden;;es: x+0, x*1, x- x, Many special cases Switch on op to narrow down checks needed Replace result with input VN UW CSE P 501 Winter 2016 S-20

Larger Scopes This algorithm works on straight- line blocks of code (basic blocks) Best possible results for single basic blocks Loses all informa;on when control flows to another block To go further we need to represent mul;ple blocks of code and the control flow between them UW CSE P 501 Winter 2016 S-21

Control Flow Graph (CFG) reminder Nodes: basic blocks Key property: all statements executed sequen;ally if any are Edges: include a directed edge from n1 to n2 if there is any possible way for control to transfer from block n1 to n2 during execu;on UW CSE P 501 Winter 2016 S-22

Op;miza;on Categories (1) Local methods Usually confined to basic blocks Simplest to analyze and understand Most precise informa;on UW CSE P 501 Winter 2016 S-23

Op;miza;on Categories (2) Superlocal methods Operate over Extended Basic Blocks (EBBs) An EBB is a set of blocks b 1, b 2,, b n where b 1 has mul;ple predecessors and each of the remaining blocks b i (2 i n) have only b i- 1 as its unique predecessor The EBB is entered only at b 1, but may have mul;ple exits A single block b i can be the head of mul;ple EBBs (these EBBs form a tree rooted at b i ) Use informa;on discovered in earlier blocks to improve code in successors UW CSE P 501 Winter 2016 S-24

Op;miza;on Categories (3) Regional methods Operate over scopes larger than an EBB but smaller than an en;re procedure/ func;on/ method Typical example: loop body Difference from superlocal methods is that there may be merge points in the graph (i.e., a block with two or more predecessors) Facts true at merge point are facts known to be true on all possible paths to that point UW CSE P 501 Winter 2016 S-25

Op;miza;on Categories (4) Global methods Operate over en;re procedures Some;mes called intraprocedural methods Mo;va;on is that local op;miza;ons some;mes have bad consequences in larger context Procedure/method/func;on is a natural unit for analysis, separate compila;on, etc. Almost always need global data- flow analysis informa;on for these UW CSE P 501 Winter 2016 S-26

Op;miza;on Categories (5) Whole- program methods Operate over more than one procedure Some;mes called interprocedural methods Challenges: name scoping and parameter binding issues at procedure boundaries Classic examples: inline method subs;tu;on, interprocedural constant propaga;on Common in aggressive JIT compilers and op;mizing compilers for object- oriented languages UW CSE P 501 Winter 2016 S-27

Value Numbering Revisited Local Value Numbering 1 block at a ;me Strong local results No cross- block effects Missed opportuni;es B p = c + d r = c + d A D m = a + b n = a + b C e = b + 18 s = a + b u = e + f q = a + b r = c + d E e = a + 17 t = c + d u = e + f G y = a + b z = c + d F v = a + b w = c + d x = e + f UW CSE P 501 Winter 2016 S-28

Superlocal Value Numbering Idea: apply local method to EBBs {A,B}, {A,C,D}, {A,C,E} Final info from A is ini;al info for B, C; final info from C is ini;al for D, E Gets reuse from ancestors Avoid reanalyzing A, C Doesn t help with F, G B p = c + d r = c + d G A D y = a + b z = c + d m = a + b n = a + b C e = b + 18 s = a + b u = e + f F q = a + b r = c + d E v = a + b w = c + d x = e + f e = a + 17 t = c + d u = e + f UW CSE P 501 Winter 2016 S-29

SSA Name Space (from before) Code RewriWen a 3 0 = x 1 0 + y 2 0 a 3 0 = x 1 0 + y 2 0 b 3 0 = x 1 0 + y 2 0 b 3 0 = a 3 0 a 4 1 = 17 a 4 1 = 17 c 3 0 = x 1 0 + y 2 0 c 3 0 = a 3 0 Unique name for each defini;on Name ó VN a 0 3 is available to assign to c 0 3 UW CSE P 501 Winter 2016 S-30

SSA Name Space Two Principles Each name is defined by exactly one opera;on Each operand refers to exactly one defini;on Need to deal with merge points Add Φ func;ons at merge points to reconcile names Use subscripts on variable names for uniqueness UW CSE P 501 Winter 2016 S-31

Superlocal Value Numbering with All Bells & Whistles Finds more redundancies LiWle extra cost S;ll does nothing for F and G B A p 0 = c 0 + d 0 r 0 = c 0 + d 0 D m 0 = a 0 + b 0 n 0 = a 0 + b 0 C e 0 = b 0 + 18 s 0 = a 0 + b 0 u 0 = e 0 + f 0 q 0 = a 0 + b 0 r 1 = c 0 + d 0 E e 1 = a 0 + 17 t 0 = c 0 + d 0 u 1 = e 1 + f 0 G r 2 = Φ(r 0,r 1 ) y 0 = a 0 + b 0 z 0 = c 0 + d 0 F e 2 = Φ(e 0,e 1 ) u 2 = Φ(u 0,u 1 ) v 0 = a 0 + b 0 w 0 = c 0 + d 0 x 0 = e 2 + f 0 UW CSE P 501 Winter 2016 S-32

Larger Scopes S;ll have not helped F and G Problem: mul;ple predecessors Must decide what facts hold in F and in G For G, combine B & F? Merging states is expensive Fall back on what we know B A p 0 = c 0 + d 0 r 0 = c 0 + d 0 G D r 2 = Φ(r 0,r 1 ) y 0 = a 0 + b 0 z 0 = c 0 + d 0 m 0 = a 0 + b 0 n 0 = a 0 + b 0 C e 0 = b 0 + 18 s 0 = a 0 + b 0 u 0 = e 0 + f 0 F q 0 = a 0 + b 0 r 1 = c 0 + d 0 E e 2 = Φ(e 0,e 1 ) u 2 = Φ(u 0,u 1 ) v 0 = a 0 + b 0 w 0 = c 0 + d 0 x 0 = e 2 + f 0 e 1 = a 0 + 17 t 0 = c 0 + d 0 u 1 = e 1 + f 0 UW CSE P 501 Winter 2016 S-33

Dominators Defini;on x dominates y iff every path from the entry of the control- flow graph to y includes x By defini;on, x dominates x Associate a Dom set with each node Dom(x) 1 Many uses in analysis and transforma;on Finding loops, building SSA form, code mo;on UW CSE P 501 Winter 2016 S-34

Immediate Dominators For any node x, there is a y in Dom(x) closest to x This is the immediate dominator of x Nota;on: IDom(x) UW CSE P 501 Winter 2016 S-35

Dominator Sets Block Dom IDom A m 0 = a 0 + b 0 n 0 = a 0 + b 0 B p 0 = c 0 + d 0 r 0 = c 0 + d 0 C q 0 = a 0 + b 0 r 1 = c 0 + d 0 D e 0 = b 0 + 18 s 0 = a 0 + b 0 u 0 = e 0 + f 0 E e 1 = a 0 + 17 t 0 = c 0 + d 0 u 1 = e 1 + f 0 Note that the IDOM rela;on defines a tree! G r 2 = Φ(r 0,r 1 ) y 0 = a 0 + b 0 z 0 = c 0 + d 0 F e 2 = Φ(e 0,e 1 ) u 2 = Φ(u 0,u 1 ) v 0 = a 0 + b 0 w 0 = c 0 + d 0 x 0 = e 2 + f 0 UW CSE P 501 Winter 2016 S-36

Dominator Value Numbering S;ll looking for a way to handle F and G Idea: Use info from IDom(x) to start analysis of x Use C for F and A for G Dominator VN Technique (DVNT) B A p 0 = c 0 + d 0 r 0 = c 0 + d 0 G D r 2 = Φ(r 0,r 1 ) y 0 = a 0 + b 0 z 0 = c 0 + d 0 m 0 = a 0 + b 0 n 0 = a 0 + b 0 C e 0 = b 0 + 18 s 0 = a 0 + b 0 u 0 = e 0 + f 0 F q 0 = a 0 + b 0 r 1 = c 0 + d 0 E e 2 = Φ(e 0,e 1 ) u 2 = Φ(u 0,u 1 ) v 0 = a 0 + b 0 w 0 = c 0 + d 0 x 0 = e 2 + f 0 e 1 = a 0 + 17 t 0 = c 0 + d 0 u 1 = e 1 + f 0 UW CSE P 501 Winter 2016 S-37

DVNT algorithm Use superlocal algorithm on extended basic blocks Use scoped hash tables & SSA name space as before Start each node with table from its IDOM No values flow along back edges (i.e., loops) Constant folding, algebraic iden;;es as before UW CSE P 501 Winter 2016 S-38

Dominator Value Numbering Advantages Finds more redundancy LiWle extra cost Shortcomings Misses some opportuni;es (common calcula;ons in ancestors that are not IDOMs) Doesn t handle loops or other back edges B A p 0 = c 0 + d 0 r 0 = c 0 + d 0 G D r 2 = Φ(r 0,r 1 ) y 0 = a 0 + b 0 z 0 = c 0 + d 0 m 0 = a 0 + b 0 n 0 = a 0 + b 0 C e 0 = b 0 + 18 s 0 = a 0 + b 0 u 0 = e 0 + f 0 F q 0 = a 0 + b 0 r 1 = c 0 + d 0 E e 2 = Φ(e 0,e 1 ) u 2 = Φ(u 0,u 1 ) v 0 = a 0 + b 0 w 0 = c 0 + d 0 x 0 = e 2 + f 0 e 1 = a 0 + 17 t 0 = c 0 + d 0 u 1 = e 1 + f 0 UW CSE P 501 Winter 2016 S-39

The Story So Far Local algorithm Superlocal extension Some local methods extend cleanly to superlocal scopes Dominator VN Technique (DVNT) All of these propagate along forward edges None are global UW CSE P 501 Winter 2016 S-40

Coming AWrac;ons Data- flow analysis Provides global solu;on to redundant expression analysis Catches some things missed by DVNT, but misses some others Generalizes to many other analysis problems, both forward and backward Loops SSA for general transforma;ons UW CSE P 501 Winter 2016 S-41