Introduction to data-flow analysis. Data-flow analysis. Control-flow graphs. Data-flow analysis. Example: liveness. Requirements

Similar documents
Data flow analysis. DataFlow analysis

Generalizing Data-flow Analysis

Dataflow Analysis. Chapter 9, Section 9.2, 9.3, 9.4

MIT Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Dataflow analysis. Theory and Applications. cs6463 1

Dataflow Analysis. Dragon book, Chapter 9, Section 9.2, 9.3, 9.4

Data-Flow Analysis. Compiler Design CSE 504. Preliminaries

Scalar Optimisation Part 2

Iterative Dataflow Analysis

Definition: A binary relation R from a set A to a set B is a subset R A B. Example:

Mathematics 222a Quiz 2 CODE 111 November 21, 2002

DFA of non-distributive properties

CMSC 631 Program Analysis and Understanding. Spring Data Flow Analysis

Section Summary. Relations and Functions Properties of Relations. Combining Relations

Compiler Design. Data Flow Analysis. Hwansoo Han

Dataflow Analysis. A sample program int fib10(void) { int n = 10; int older = 0; int old = 1; Simple Constant Propagation

Data Flow Analysis. Lecture 6 ECS 240. ECS 240 Data Flow Analysis 1

Intra-procedural Data Flow Analysis Backwards analyses

Axioms of Kleene Algebra

Definition: A binary relation R from a set A to a set B is a subset R A B. Example:

Topic-I-C Dataflow Analysis

The Meet-Over-All-Paths Solution to Data-Flow Problems

Relations. We have seen several types of abstract, mathematical objects, including propositions, predicates, sets, and ordered pairs and tuples.

Register machines L2 18

We define a dataflow framework. A dataflow framework consists of:

LOWELL JOURNAL. MUST APOLOGIZE. such communication with the shore as Is m i Boimhle, noewwary and proper for the comfort

Reading: Chapter 9.3. Carnegie Mellon

Dataflow Analysis Lecture 2. Simple Constant Propagation. A sample program int fib10(void) {

Goal. Partially-ordered set. Game plan 2/2/2013. Solving fixpoint equations

Dataflow Analysis - 2. Monotone Dataflow Frameworks

CS422 - Programming Language Design

A DARK GREY P O N T, with a Switch Tail, and a small Star on the Forehead. Any

Chapter 1. Sets and Mappings

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

LESSON 7.1 FACTORING POLYNOMIALS I

Packet #5: Binary Relations. Applied Discrete Mathematics

Monotone Data Flow Analysis Framework

Data Flow Analysis (I)

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Multivariable Calculus and Matrix Algebra-Summer 2017

Chapter 2 Algorithms and Computation

Math 203A - Solution Set 1

Probabilistic Model Checking and [Program] Analysis (CO469)

CSC Discrete Math I, Spring Relations

Math 203A - Solution Set 1

Sets and Motivation for Boolean algebra

2.2 Some Consequences of the Completeness Axiom

CSC D70: Compiler Optimization Dataflow-2 and Loops

Lecture 5 Introduction to Data Flow Analysis

Scalar Optimisation Part 1

The least element is 0000, the greatest element is 1111.

3.1 Symmetry & Coordinate Graphs

Lecture 11: Data Flow Analysis III

{a, b, c} {a, b} {a, c} {b, c} {a}

Neatest and Promptest Manner. E d i t u r ami rul)lihher. FOIt THE CIIILDIIES'. Trifles.

Factor: x 2 11x + 30 = 0. Factoring Quadratic Equations. Fractional and Negative Exponents

MATH 403 MIDTERM ANSWERS WINTER 2007

CSCI 1010 Models of Computa3on. Lecture 02 Func3ons and Circuits

Static Program Analysis

Adding and Subtracting Polynomials

Contents. Chapter 2 Digital Circuits Page 1 of 30

Lecture Overview SSA Maximal SSA Semipruned SSA o Placing -functions o Renaming Translation out of SSA Using SSA: Sparse Simple Constant Propagation

CSC D70: Compiler Optimization Static Single Assignment (SSA)

Model Checking & Program Analysis

Dirty Quant Workshop

Zangwill s Global Convergence Theorem

Additional Practice Lessons 2.02 and 2.03

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

(4.2) Equivalence Relations. 151 Math Exercises. Malek Zein AL-Abidin. King Saud University College of Science Department of Mathematics

1 Introduction to information theory

Topology Exercise Sheet 2 Prof. Dr. Alessandro Sisto Due to March 7

Chapter 2. Vectors and Vector Spaces

Polynomials. In many problems, it is useful to write polynomials as products. For example, when solving equations: Example:

The Law of Averages. MARK FLANAGAN School of Electrical, Electronic and Communications Engineering University College Dublin

CP405 Theory of Computation

Procedure for Graphing Polynomial Functions

Finding k disjoint paths in a directed planar graph

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

Equivalence relations

Groups and Symmetries

0 Sets and Induction. Sets

SUMMER MATH PACKET ADVANCED ALGEBRA A COURSE 215

CSE 167: Introduction to Computer Graphics Lecture #2: Linear Algebra Primer

EC-121 Digital Logic Design

IOAN ŞERDEAN, DANIEL SITARU

MIT Loop Optimizations. Martin Rinard

Course 311: Michaelmas Term 2005 Part III: Topics in Commutative Algebra

Vectors, dot product, and cross product

Lecture Notes: Selected Topics in Discrete Structures. Ulf Nilsson

What is SSA? each assignment to a variable is given a unique name all of the uses reached by that assignment are renamed

Notes for Comp 497 (454) Week 10

Answer: A. Answer: C. 3. If (G,.) is a group such that a2 = e, a G, then G is A. abelian group B. non-abelian group C. semi group D.

MAD 3105 PRACTICE TEST 2 SOLUTIONS

Static Problem Set 2 Solutions

X-MA2C01-1: Partial Worked Solutions

Chapter 7: Exponents

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

D-MATH Algebra I HS18 Prof. Rahul Pandharipande. Solution 1. Arithmetic, Zorn s Lemma.

Mathematical Preliminaries. Sipser pages 1-28

Lecture 8: Basic convex analysis

Transcription:

Data-flow analysis Michel Schinz based on material by Erik Stenman and Michael Schwartzbach Introduction to data-flow analysis Data-flow analysis Example: liveness Data-flow analysis is a global analysis framework that can be used to compute or, more precisely, approximate various properties of programs. The results of those analysis can be used to perform several optimisations, for example: common sub-expression elimination, dead-code elimination, constant propagation, register allocation, etc. A variable is said to be live at a given point if its value will be read later. While liveness is clearly undecidable, a conservative approximation can be computed using dataflow analysis. This approximation can then be used, for example, to allocate registers: a set of variables that are never live at the same time can share a single register. 3 4 Requirements Data-flow analysis requires the program to be represented as a control flow graph (CFG). To compute properties about the program, it assigns values to the nodes of the CFG. Those values must be related to each other by a special kind of partial order called a lattice. We therefore start by introducing control flow graphs and lattice theory. Control-flow graphs 5

Control-flow graph CFG example x!12 A control flow graph (CFG) is a graphical representation of a program. The nodes of the CFG are the statements of that program. The edges of the CFG represent the flow of control: there is an edge from n 1 to n 2 if and only if control can flow immediately from n 1 to n 2. That is, if the statements of n 1 and n 2 can be executed in direct succession. x!y y!5 if x<y x!2*y z!x/y 7 8 Predecessors and successors Basic block In the CFG, the set of the immediate predecessors of a node n is written pred(n). Similarly, the set of the immediate successors of a node n is written succ(n). A basic block is a maximal sequence of statements for which control flow is purely linear. That is, control always enters a basic block from the top its first instruction and leaves from the bottom its last instruction. Basic blocks are often used as the nodes of a CFG, in order to reduce its size. 9 10 CFG example (basic blocks) x!12 y!5 if x<y Lattice theory x!y x!2*y z!x/y 11

Partial order Partial order example A partial order is a mathematical structure (S,!)composed of a set S and a binary relation! on S, satisfying the following conditions: 1. reflexivity: x S, x! x 2. transitivity: x,y,z S, x! y $ y! z x! z 3. anti-symmetry: x,y S, x! y $ y! x x = y In Java, the set of types along with the subtyping relation form a partial order. According to that order, the type String is smaller (i.e. a subtype) of the type Object. The type String and Integer are not comparable: none of them is a subtype of the other. 13 14 Upper bound Lower bound Given a partial order (S,!) and a set X S, y S is an upper bound for X, written X! y, if x X, x! y. A least upper bound (lub) for X, written "X, is defined by: X! "X $ y S, X! y "X! y Notice that a least upper bound does not always exist. Given a partial order (S,!) and a set X S, y S is a lower bound for X, written y! X, if x X, y! x. A greatest lower bound for X, written #X, is defined by: #X! X $ y S, y! X y! #X Notice that a greatest lower bound does not always exist. 15 16 Lattice Finite partial orders A lattice is a partial order L = (S,!) for which "X and #X exist for all X S. A lattice has a unique greatest element, written! and pronounced top, defined as!= "S. It also has a unique smallest element, written and pronounced bottom, defined as = #S. The height of a lattice is the length of the longest path from to!. A partial order (S,!) is finite if the set S contains a finite number of elements. For such partial orders, the lattice requirements reduce to the following:! and exist, every pair of elements x,y in S has a least upper bound written x " y as well as a greatest lower bound written x # y. 17 18

Cover relation Hasse diagram In a partial order (S,!), we say that an element y covers another element x if: (x % y) ( z S, x! z % y x = z) where x % y x! y x! y. Intuitively, y covers x if y is the smallest element greater than x. A partial order can be represented graphically by a Hasse diagram. In such a diagram, the elements of the set are represented by dots. If an element y covers an element x, then the dot of y is placed above the dot of x, and a line is drawn to connect the two dots. 19 20 Hasse diagram example Hasse diagram for the partial order (S,!) where S = { 0, 1,, 7 } and x! y (x & y) = x 7 (111) bitwise and Partial order examples Which of the following partial orders are lattices? 1 2 3 3 (011) 6 (110) 5 (101) 2 (010) 1 (001) 4 (100) 4 5 6 0 (000) 21 22 Monotone function Fixed points A function f : L " L is monotone if and only if: x,y S, x! y f(x)! f(y) This does not imply that f is increasing, as constant functions are also monotone. Viewed as functions, # and " are monotone in both arguments. 24

Fixed point theorem Fixed points and equations Definition: a value v is a fixed point of a function f if and only if f(v) = v. Fixed point theorem: In a lattice L with finite height, every monotone function f has a unique least fixed point fix(f), and it is given by: fix(f) = " f( ) " f 2 ( ) " f 3 ( ) " Fixed points are interesting as they enable us to solve systems of equations of the following form: x 1 = F 1(x 1,, x n) x 2 = F 2(x 1,, x n) x n = F n(x 1,, x n) where x 1,..., x n are variables, and F 1,..., F n : L n " L are monotone functions. Such a system has a unique least solution that is the least fixed point of the composite function F : L n " L n defined as: F(x 1,, x n) = (F 1(x 1,, x n),, F n(x 1,, x n)) 25 26 Fixed points and inequations Systems of inequations of the following form: x 1! F 1(x 1,, x n) x 2! F 2(x 1,, x n) x n! F n(x 1,, x n) can be solved similarly by observing that x! y x = x # y and rewriting the inequations. Data-flow analysis 27 Overview Example: liveness Data-flow analysis works on a control-flow graph and a lattice L. The lattice can either be fixed for all programs, or depend on the analysed one. A variable v n ranging over the values of L is attached to every node n of the CFG. A set of (in)equations for these variables are then extracted from the CFG according to the analysis being performed and solved using the fixed point technique. As we have seen, liveness is a property that can be approximated using data-flow analysis. The lattice to use in that case is L = { P(V), } where V is the set of variables appearing in the analysed program, and P is the power set operator (set of all subsets). 29 30

Example: liveness Example: liveness For a program containing three variables x, y and z, the lattice for liveness is the following: {x,y,z} {x,y} {x,z} {y,z} {x} {y} {z} To every node n in the CFG, we attach a variable v n giving the set of variables live before that node. The value of that variable is given by: v n = (v s1 & v s2 & \ written(n)) & read(n) where s 1, s 2, are the successors of n, read(n) is the set of program variables read by n, and written(n) is the set of variables written by n. {} 31 32 Example: liveness CFG constraints solution 1 x!read-int 2 y!read-int 3 if (x < y) 4 z!x 5 z!y 6 print-int z v 1 = v 2 \ { x } v 2 = v 3 \ { y } v 3 = v 4 v 5 { x, y } v 4 = v 6 { x } \ { z } v 5 = v 6 { y } \ { z } v 6 = { z } v 1 = { } v 2 = { x } v 3 = { x, y } v 4 = { x } v 5 = { y } v 6 = { z } Fixed point algorithm To solve the data-flow constraints, we construct the composite function F and compute its least fixed point by iteration. F(x 1, x 2, x 3, x 4, x 5, x 6) = (x 2\{x}, x 3\{y}, x 4&x 5&{x,y}, x 6&{x}\{z}, x 6&{y}\{z}, {z}) Iteration x 1 x 2 x 3 x 4 x 5 x 6 0 {"} {"} {"} {"} {"} {"} 1 { } { } { x, y } { x } { y } { z } 2 { } { x"} { x, y } { x } { y } { z } 3 { } { x"} { x, y } { x } { y } { z } 33 34 Work-list algorithm Work-list algorithm Computing the fixed point by simple iteration as we did works, but is wasteful as the information for all nodes is recomputed at every iteration. It is possible to do better by remembering, for every variable v, the set dep(v) of the variables whose value depends on the value of v itself. Then, whenever the value of some variable v changes, we only re-compute the value of the variables that belong to dep(v). x 1 = x 2 = = x n = q = [ v 1,, v n ] while (q! []) assume q = [ v i, ] y = F i (x 1,, x n) q = q.tail if (y! x i) for (v dep(v i)) if (v q) q.append(v) x i = y 35 36

Work-list example: liveness Work-list improvements q x1 x2 x3 x4 x5 x6 [v1,v2,v3,v4,v5,v6] {} {} {} {} {} {} [v2,v3,v4,v5,v6] {} {} {} {} {} {} [v3,v4,v5,v6] {} {} {} {} {} {} [v4,v5,v6,v2] {} {} {x,y} {} {} {} [v5,v6,v2,v3] {} {} {x,y} {x} {} {} [v6,v2,v3,v3] {} {} {x,y} {x} {y} {} [v2,v3,v4,v5] {} {} {x,y} {x} {y} {z} [v3,v4,v5,v1] {} {x} {x,y} {x} {y} {z} [v4,v5,v1] {} {x} {x,y} {x} {y} {z} [v5,v1] {} {x} {x,y} {x} {y} {z} [v1] {} {x} {x,y} {x} {y} {z} [] {} {x} {x,y} {x} {y} {z} In our liveness example, the work-list algorithm would have terminated in only six iterations if the initial queue had been reversed! This is due to the fact that liveness analysis is a backward analysis: the value of variable v n depends on the successors of n. For such analysis, it is better to organise the queue with the latest nodes first. 37 38 Working with basic blocks Liveness example revisited CFG constraints solution Until now, we considered that the CFG nodes were single instructions. In practice, basic blocks tend to be used as nodes, to reduce the size of the CFG. When data-flow analysis is performed on a CFG composed of basic blocks, a variable is attached to every block, not to every instruction. Computing the result of the analysis for individual instructions is however trivial. 1 x!read-int y!read-int if (x < y) 2 z!x 3 z!y v 1 = v 2 v 3 \ { x, y } v 2 = v 4 { x } \ { z } v 3 = v 4 { y } \ { z } v 4 = { z } v 1 = { } v 2 = { x } v 3 = { y } v 4 = { z } 4 print-int z 39 40 Available expressions Analysis example #2: available expressions A non-trivial expression in a program is available at some point if its value has already been computed earlier. Data-flow analysis can be used to approximate the set of expressions available at all program points. The result from that analysis can then be used to eliminate common subexpressions, for example. 42

Intuitions Equations We will compute the set of expressions available after every node of the CFG. Intuitively, an expression e is available after some node n if: it is available after all predecessors of n, or it is defined by n itself, and not killed by n. A node n kills an expression e if it gives a new value to a variable used by e. For example, the assignment x!y kills all expressions that use x, like x+1. To approximate available expressions, we attach to every node n of the CFG a variable v n containing the set of expressions available after it. Then we derive constraints from the CFG nodes, which have the form: v n = (v p1 v p2 \ kill(n)) gen(n) where gen(n) is the set of expressions computed by n, and kill(n) the set of expressions killed by n. 43 44 Example CFG constraints solution 1 if a<b 2 x!a+b 3 x!d+e 4 y!x+1 5 y!a+b 6 z!a+b 7 t!x+1 v 1={a<b} v 2={a+b} (v 1#x) v 3={d+e} (v 1#x) v 4={x+1} (v 2#y) v 5={a+b} (v 3#y) v 6={a+b} (v 4 v 5)#z v 7={x+1} v 5#t v 1={a<b} v 2={a+b, a<b} v 3={d+e, a<b} v 4={x+1, a+b, a<b} v 5={a+b, d+e, a<b} v 6={a+b, a<b} v 7={x+1, a+b, a<b} v n=set of expressions live after node n. Notation: S#x = S \ all expressions using variable x. Analysis example #3: very busy expressions 45 Very busy expressions Intuitions An expression is very busy at some program point if it will definitely be evaluated before its value changes. Data-flow analysis can approximate the set of very busy expressions for all program points. The result of that analysis can then be used to perform code hoisting: the computation of a very busy expression e can be performed at the earliest point where it is busy. We will compute the set of very busy expressions before every node of the CFG. Intuitively, an expression e is very busy before node n if it is evaluated by n, or if it is very busy in all successors of n, and it is not killed by n. 47 48

Equations Example CFG constraints solution To approximate very busy expressions, we attach to each node n of the CFG a variable v n containing the set of expressions that are very busy before it. Then we derive constraints from the CFG nodes, which have the form: v n = (v s1 v s2 \ kill(n)) gen(n) where gen(n) is the set of expressions computed by n, and kill(n) the set of expressions killed by n. 1 x!100 2 y!a+b 3 x!x-1 4 if x>0 5 print y v 1=v 2#x v 2={a+b} v 3#y v 3={x-1} v 4#x v 4={x>0} (v 5 v 2) v 5={} v 1={a+b} v 2={a+b,x-1} v 3={x-1} v 4={x>0} v 5={} v n=set of expressions very busy before node n. Notation: S#x = S \ all expressions using variable x. 49 50 Reaching definitions Analysis example #4: reaching definitions The reaching definitions for a program point are the assignments that may have defined the values of variables at that point. Data-flow analysis can approximate the set of reaching definitions for all program points. These sets can then be used to perform constant propagation, for example. 52 Intuitions We will compute the set of reaching definitions after every node of the CFG. This set will be represented as a set of CFG node identifiers. Intuitively, the reaching definitions after a node n are all the reaching definitions of the predecessors of n, minus those that define a variable defined by n itself, plus n itself. Equations To approximate reaching definitions, we attach to node n of the CFG a variable v n containing the set of definitions (CFG nodes) that can reach n. For a node n that is not an assignment, the reaching definitions are simply those of its predecessors: v n = (v p1 v p2 ) For a node n that is an assignment, the equation is more complicated: v n = (v p1 v p2 ) \ kill(n) { n } where kill(n) are the definitions killed by n, i.e. those which define the same variable as n itself. For example, a definition like x!y kills all expressions of the form x! 53 54

Example 1 2 3 4 CFG x!100 z!0 z!z+3 x!x-1 constraints v 1={1} v 2=v 1#z {2} v 3=(v 2 v 5)#z {3} v 4=v 3#x {4} v 5=v 4 v 6=v 5 solution v 1={1} v 2={1,2} v 3={1,3,4} v 4={3,4} v 5={3,4} v 6={3,4} v n=set of reaching definitions after node n. Putting data-flow analyses to work 5 if x>0 Notation: S#x = S \ all nodes defining variable x. 6 print z 55 Using data-flow analysis Dead-code elimination Once a particular data-flow analysis has been conducted, its result can be used to optimise the analysed program. We will quickly examine some transformations that can be performed using the data-flow analysis presented before. Useless assignments can be eliminated using liveness analysis, as follows: Whenever a CFG node n is of the form x!e, and x is not live after n, then the assignment is useless and node n can be removed. 57 58 CSE Constant propagation Common sub-expressions can be eliminated using availability information, as follows: Whenever a CFG node n computes an expression of the form x op y and x op y is available before n, then the computation within n can be replaced by a reference to the previously-computed value. Constant propagation can be performed using the result of reaching definitions analysis, as follows: When a CFG node n uses a value x and the only definition of x reaching n has the form x!c where c is a constant, then the use of x in n can be replaced by c. 59 60

Copy propagation Copy propagation very similar to constant propagation can be performed using the result of reaching definitions analysis, as follows: When a CFG node n uses a value x, and the only definition of x reaching n has the form x!y where y is a variable, and y is not redefined on any path leading to n, then the use of x in n can be replaced by y. 61