Data-Flow Analysis. Compiler Design CSE 504. Preliminaries

Similar documents
CMSC 631 Program Analysis and Understanding. Spring Data Flow Analysis

Dataflow Analysis Lecture 2. Simple Constant Propagation. A sample program int fib10(void) {

Dataflow Analysis. A sample program int fib10(void) { int n = 10; int older = 0; int old = 1; Simple Constant Propagation

Lecture 5 Introduction to Data Flow Analysis

Data flow analysis. DataFlow analysis

Compiler Design. Data Flow Analysis. Hwansoo Han

Reading: Chapter 9.3. Carnegie Mellon

Data Flow Analysis. Lecture 6 ECS 240. ECS 240 Data Flow Analysis 1

CSC D70: Compiler Optimization Dataflow-2 and Loops

Introduction to data-flow analysis. Data-flow analysis. Control-flow graphs. Data-flow analysis. Example: liveness. Requirements

Lecture 4 Introduc-on to Data Flow Analysis

Topic-I-C Dataflow Analysis

Dataflow Analysis. Chapter 9, Section 9.2, 9.3, 9.4

Compiler Design Spring 2017

MIT Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

COSE312: Compilers. Lecture 17 Intermediate Representation (2)

Dataflow Analysis. Dragon book, Chapter 9, Section 9.2, 9.3, 9.4

What is SSA? each assignment to a variable is given a unique name all of the uses reached by that assignment are renamed

3/11/18. Final Code Generation and Code Optimization

Goal. Partially-ordered set. Game plan 2/2/2013. Solving fixpoint equations

Data Flow Analysis (I)

Dataflow analysis. Theory and Applications. cs6463 1

Department of Computer Science and Engineering, IIT Kanpur, INDIA. Code Optimization. Sanjeev K Aggarwal Code Optimization 1 of I

: Advanced Compiler Design Compu=ng DF(X) 3.4 Algorithm for inser=on of φ func=ons 3.5 Algorithm for variable renaming

Generalizing Data-flow Analysis

Static Program Analysis using Abstract Interpretation

CSC D70: Compiler Optimization Static Single Assignment (SSA)

Space-aware data flow analysis

n CS 160 or CS122 n Sets and Functions n Propositions and Predicates n Inference Rules n Proof Techniques n Program Verification n CS 161

Formal Methods in Software Engineering

Dataflow Analysis - 2. Monotone Dataflow Frameworks

Compiler Design Spring 2017

Model Checking & Program Analysis

Lecture Overview SSA Maximal SSA Semipruned SSA o Placing -functions o Renaming Translation out of SSA Using SSA: Sparse Simple Constant Propagation

Inductive Definitions

Lecture Notes: Selected Topics in Discrete Structures. Ulf Nilsson

The non-logical symbols determine a specific F OL language and consists of the following sets. Σ = {Σ n } n<ω

Scalar Optimisation Part 2

: Compiler Design. 9.5 Live variables 9.6 Available expressions. Thomas R. Gross. Computer Science Department ETH Zurich, Switzerland

Programming Languages and Types

(b). Identify all basic blocks in your three address code. B1: 1; B2: 2; B3: 3,4,5,6; B4: 7,8,9; B5: 10; B6: 11; B7: 12,13,14; B8: 15;

Foundations of Abstract Interpretation

Tutorial on Mathematical Induction

Introduction to Kleene Algebras

Interprocedural Analysis: Sharir-Pnueli s Call-strings Approach

Notes on Abstract Interpretation

Equivalence relations

We define a dataflow framework. A dataflow framework consists of:

Adventures in Dataflow Analysis

Mechanics of Static Analysis

Lecture 11: Data Flow Analysis III

Static Program Analysis

Mathematical Preliminaries. Sipser pages 1-28

COSE312: Compilers. Lecture 14 Semantic Analysis (4)

Models of Computation. by Costas Busch, LSU

On the Effectiveness of Symmetry Breaking

AAA616: Program Analysis. Lecture 3 Denotational Semantics

Register machines L2 18

DFA of non-distributive properties

Gerwin Klein, June Andronick, Ramana Kumar S2/2016

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Semantics and Verification of Software

ECE 5775 (Fall 17) High-Level Digital Design Automation. Control Flow Graph

Regular Expressions. Definitions Equivalence to Finite Automata

Principles of AI Planning

UNIT-III REGULAR LANGUAGES

Verifying the LLVM. Steve Zdancewic DeepSpec Summer School 2017

Introduction to Theory of Computing

Lecture 10: Data Flow Analysis II

Intra-procedural Data Flow Analysis Backwards analyses

Principles of AI Planning

MIT Loop Optimizations. Martin Rinard

Tape encoding of lists of numbers

EQUIVALENCE RELATIONS (NOTES FOR STUDENTS) 1. RELATIONS

Extremal Solutions of Inequations over Lattices with Applications to Supervisory Control 1

Flow grammars a flow analysis methodology

Propositional Logic: Models and Proofs

Topics on Compilers

Parts 3-6 are EXAMPLES for cse634

Mohammad Farshi Department of Computer Science Yazd University. Computer Science Theory. (Master Course) Yazd Univ.

Proving Fixed Points

Monotone Data Flow Analysis Framework

CSC 344 Algorithms and Complexity. Proof by Mathematical Induction

A scheme developed by Du Pont to figure out

A marks are for accuracy and are not given unless the relevant M mark has been given (M0 A1 is impossible!).

Embedded Systems 15. REVIEW: Aperiodic scheduling. C i J i 0 a i s i f i d i

Program Analysis. Lecture 5. Rayna Dimitrova WS 2016/2017

Organization of a Modern Compiler. Middle1

Computational Logic Fundamentals (of Definite Programs): Syntax and Semantics

Completeness of Star-Continuity

Static Program Analysis. Seidl/Wilhelm/Hack: Compiler Design Analysis and Transformation, Springer Verlag, 2012

Review. Principles of Programming Languages. Equality. The Diamond Property. The Church-Rosser Theorem. Corollaries. CSE 230: Winter 2007

2. Transience and Recurrence

Jacques Fleuriot. Automated Reasoning Rewrite Rules

Principles of Program Analysis: Control Flow Analysis

Optimizing Intra-Task Voltage Scheduling Using Data Flow Analysis Λ

Example: Use a direct argument to show that the sum of two even integers has to be even. Solution: Recall that an integer is even if it is a multiple

Class 5. Review; questions. Data-flow Analysis Review. Slicing overview Problem Set 2: due 9/3/09 Problem Set 3: due 9/8/09

CS 455/555: Mathematical preliminaries

A should be R-closed: R(A) A. We would like this to hold because we would like every element that the rules say should be in A to actually be in A.

Transcription:

Data-Flow Analysis Compiler Design CSE 504 1 Preliminaries 2 Live Variables 3 Data Flow Equations 4 Other Analyses Last modifled: Thu May 02 2013 at 09:01:11 EDT Version: 1.3 15:28:44 2015/01/25 Compiled at 22:06 on 2015/02/13 Compiler Design Data-Flow Analysis CSE 504 1 / 20 Program Analysis Preliminaries The compiler needs to understand properties of a program (e.g. the set of variables live at a program point). This information should be computed at compile time, with incomplete information on the values the program computes, and without executing the program itself! This information is likely to be approximate: in general, at compile time, we will not know which sequence of instructions will be executed. Data-Flow Analysis is a standard way to formulate intra-procedural program analysis. Compiler Design Data-Flow Analysis CSE 504 2 / 20

Preliminaries Control Flow Graphs When we try to deduce properties of a procedure, we first build a control flow graph (CFG). Nodes of a CFG are Basic Blocks. Edges indicate which blocks can follow which other blocks. A Basic Block is a sequence of instructions such that: There are no jumps/branches in the sequence except as the last instruction. For all jumps/branches in the program, the target is the first instruction in some basic block. In other words, no jump lands in the middle of a basic block. Compiler Design Data-Flow Analysis CSE 504 3 / 20 Preliminaries Example of CFGs B1: 1. i = 1 B2: 2. j = 1 Branches only at the end of a block. Branch destinations only at beginning of a block. B3: 3. t1 = 10 * i 4. t2 = t1 + j 5. t3 = 4 * t2 6. a[t3] = 0 7. j = j + 1 8. if j < 10 goto (3) B4: 9. i = i + 1 10. if i < 10 goto (2) B5: 11. i = 1 B6: 12. t4 = 10 * i 13. t5 = t4 + i 14. a[t5] = 1 15. i = i + 1 16. if i < 10 goto (12) Exit: Compiler Design Data-Flow Analysis CSE 504 4 / 20

Live Variables Live Variables Consider the problem of finding the set of live variables at some program point. A variable is live after a statement s in the program, if it is used in a statement s, and there is a control flow path from s to s. Example: 1. i = 1 2. j = 1 3. t1 = 10 * i 4. t2 = t1 + j 5. t3 = 4 * t2 6. a[t3] = 0 7. j = j + 1.. Variable t3 is live after statement 5 since it is used in statement 6. Variable j is also live after statement 5 since it is used in statement 7. Compiler Design Data-Flow Analysis CSE 504 5 / 20 Live Variables Live Variable Analysis (1) Let def (s) be the set of all variables defined by statement s (e.g. the lhs variable in an assignment statement). Let use(s) be the set of all variables used by statement s (e.g. the variables on the rhs of an assignment statement). succ(s): the set of statements that immediately follow statement s. The above definitions for def, use, and succ can be extended for whole blocks as well. def (B): set of variables defined in block b. use(b): set of variables used, but not defined earlier, in block b. succ(b): set of blocks that immediately succeed block B. Compiler Design Data-Flow Analysis CSE 504 6 / 20

Live Variables Live Variable Analysis (2) B1: 1. i = 1 B2: 2. j = 1 B3: 3. t1 = 10 * i 4. t2 = t1 + j 5. t3 = 4 * t2 6. a[t3] = 0 7. j = j + 1 8. if j < 10 goto (3) B4: 9. i = i + 1 10. if i < 10 goto (2) B5: 11. i = 1 B6: 12. t4 = 10 * i 13. t5 = t4 + i 14. a[t5] = 1 15. i = i + 1 16. if i < 10 goto (12) Exit: Block Succ Def Use 1 {2} {i} {} 2 {3} {j} {} 3 {3,4} {t1, t2, t3, j} {a,i, j} 4 {2,5} {i} {i} 5 {6} {i} {} 6 {6,Exit} {t4,t5,i} {a,i} Compiler Design Data-Flow Analysis CSE 504 7 / 20 Live Variables Live Variable Analysis (3) Out(s): the set of variables live just after statement s. In(s): the set of variables live just before statement s. The above definitions for Out and In can be readily extended for blocks. Observe that: If a variable is used by a statement, then it must be live before the statement. If a variable is live immediately after a statement, then it must be live before the statement as well, unless it is defined by the statement. For a statement s, if a variable is live before any of its successors, then it must be live after s. From these observations, we get: In(s) = use(s) (Out(s) def (s)) Out(s) = In(t) t succ(s) Compiler Design Data-Flow Analysis CSE 504 8 / 20

Live Variables Live Variable Analysis (4) In(s) = use(s) (Out(s) def (s)) Out(s) = In(t) t succ(s) Let a be a variable that is needed after the procedure exits (e.g. it is a global variable). Then, In(Exit) = {a}. Block Succ Def Use In Out 1 {2} {i} {} Out(1) {i} In(2) 2 {3} {j} {} Out(2) {j} In(3) 3 {3,4} {t1,t2,t3,j} {a,i, j} {a,i,j} Out(4) {t1,t2,t3,j} In(3) In(4) 4 {2,5} {i} {i} {i} Out(4) {i} In(2) In(5) 5 {6} {i} {} Out(5) {i} In(6) 6 {6,Exit} {t4,t5,i} {a,i} {a,i} Out(6) {t4,t5,i} In(6) In(Exit) Exit {} {} {} {a} Compiler Design Data-Flow Analysis CSE 504 9 / 20 Live Variables Live Variable Analysis (5) The equations for In and Out form a set of simultaneous set equations. For this analysis, we require the least solution to these equations. Consider the equations relating In(6), Out(6) and In(Exit): In(6) = {a, i} Out(6) {t4, t5, i} Out(6) = In(6) In(Exit) In(Exit) = {a} There are many solutions to these equations: 1 In(6) = Out(6) = {a, i}, and In(Exit) = {a}. 2 In(6) = Out(6) = {a, i, t3}, and In(Exit) = {a}. 3. Of these, (1) is the least. In fact, it can be shown that every solution will contain (1). Compiler Design Data-Flow Analysis CSE 504 10 / 20

Data Flow Equations Solutions to Data Flow Equations Data flow analysis is formulated in terms of finding the least (or sometimes, the greatest) solution to a set of simultaneous equations. The flow equations can be written as X = F (X ), where X is a vector of In s and Out s. Solutions X such that X = F (X ) are fixed points of F. The smallest X such that X = F (X ) is called the least fixed point of F. Compiler Design Data-Flow Analysis CSE 504 11 / 20 Partial Orders Data Flow Equations Let U be a finite set, and let D = P(U), i.e. the powerset of U. Let D n = D D (n times) D, i.e., an n-dimensional cartesian space over P(U). We can define partial order among vectors of sets such that X X if, and only if, for all components of the vector, X i X i. It is easy to verify that is a partial order: it is reflexive, transitive and anti-symmetric. Let be a n-vector of empty sets. Clearly, X for all X D n. Let be a n-vector of U. Observe that X for all X D n. (D n, ) is a complete lattice with as the least element and as the greatest element. Vectors X (0), X (1),..., X (i) is called a chain if X (0) X (1) X (i). Note all chains in (D n, ) are finite, since U is finite. Compiler Design Data-Flow Analysis CSE 504 12 / 20

Monotone Functions Data Flow Equations Let F : D n D n (i.e. a function from D n to D n ). A function F is monotone over partial order if, for every X and X such that X X, we have F (X ) F (X ). Note the definition of monotonicity. It says the function returns smaller values if it is given smaller argument values. It is not necessary that the returned values must be smaller than the argument values! It is easy to see that the flow equations for live variable analysis defines a monotone function. There is a simple way to show the existence of fixed points, and to compute the Least/Greatest Fixed Points of a monotone function. Tarski-Knaster Theorem: Given a complete lattice L and a monotone function G : L L, the fixed points of G form a complete lattice. Consequently, there exist both least and greatest fixed points. Compiler Design Data-Flow Analysis CSE 504 13 / 20 Data Flow Equations Computing Least Fixed Point (1) Kleene s Fixed Point Theorem: Construct a sequence X (0), X (1),..., X (i),..., where X 0 = and X (i+1) = F (X (i) ). This sequence forms a chain. X (0) = X (1). If X (i) X (i+1), then X (i+1) X (i+2). X (i+1) = F (X (i) ) Since X (i) X (i+1), by monotonicity of F, F (X (i) ) F (X (i+1) ). X (i+2) = F (X (i+1) ) Since all chains over are finite, consider the last element of the chain X (n). X (n) = F (X (n) ), otherwise it is not the last element. So, X (n) is a fixed point of F. Compiler Design Data-Flow Analysis CSE 504 14 / 20

Data Flow Equations Computing Least Fixed Point (2) Consider the sequence X (0), X (1),..., X (i),..., X (n), where X 0 = and X (i+1) = F (X (i) ). X (n) is the least fixed point of F. We already know that X (n) is a fixed point of F. Let Y be any fixed point of F. Clearly, X (0) = Y. If X (i) Y, since F is monotone, X (i+1) = F (X (i) ) F (Y ) = Y (since Y is a fixed point). Hence, by induction, for all elements of the chain X (i) Y. In particular, X (n) Y, is at least as small as any fixed point Y of F, and hence is the least fixed point. Compiler Design Data-Flow Analysis CSE 504 15 / 20 Data Flow Equations Computing the Greatest Fixed Point Consider the sequence X (0), X (1),..., X (i),..., X (n), where X 0 = and X (i+1) = F (X (i) ). Note the starting point of this sequence: the greatest element in the lattice. By an argument similar to the one we used for the least fixed point, X (n) can be shown to be the greatest fixed point of F. Compiler Design Data-Flow Analysis CSE 504 16 / 20

Data Flow Equations Live Variable Analysis Revisited Set Eqn 0 1 2 3 In(1) Out(1) {i} {} {a} {a} {a} Out(1) In(2) {} {a,i} {a,i} {a,i} In(2) Out(2) {j} {} {a,i} {a,i} {a,i} Out(2) In(3) {} {a,i,j} {a,i,j} {a,i,j} In(3) {a,i,j} Out(3) {t1,t2,t3,j} {} {a,i,j} {a,i,j} {a,i,j} Out(3) In(3) In(4) {} {a,i} {a,i,j} {a,i,j} In(4) {i} Out(4) {i} {} {a,i} {a,i} {a,i} Out(4) In(2) In(5) {} {a} {a,i} {a,i} In(5) Out(5) {i} {} {a} {a} {a} Out(5) In(6) {} {a,i} {a,i} {a,i} In(6) {a,i} Out(6) {t4,t5,i} {} {a,i} {a,i} {a,i} Out(6) In(6) In(Exit) {} {a} {a,i} {a,i} In(Exit) {a} {} {a} {a} {a} Compiler Design Data-Flow Analysis CSE 504 17 / 20 Reaching Definitions Other Analyses An assignment of the form x = e for some expression e is said to define x. A definition at statement s 1 reaches another statement s 2 if: there is some control flow path from s 1 to s 2, such that there is no other definition of x on the path from s 1 to s 2. Let In(s) be the set of all definitions that reach s. Let Out(s) be the set of all definitions that reach all the immediate successors of s. Then Out(s) = gen(s) (In(s) kill(s)), where gen(s) is the set of definitions generated by s, and kill(s)) is the set of definitions with the same lhs variables as those in s. In(s) = t pred (s) Out(t) Compiler Design Data-Flow Analysis CSE 504 18 / 20

Other Analyses Reaching Definitions vs. Live Variables Live Variables: In and Out are the smallest sets such that In(s) = use(s) (Out(s) def (s)) Out(s) = In(t) t succ(s) Reaching Definitions: In and Out are the smallest sets such that In(s) = Out(t) t pred (s) Out(s) = gen(s) (In(s) kill(s)) The form of equations is identical, and they can be computed using the same procedure, except: Live Variables are best computed backwards through the flow graph (information goes from successors to predecessors). Reaching Definitions are best computed forwards through the flow graph (information goes from predecessors to successors). Compiler Design Data-Flow Analysis CSE 504 19 / 20 Available Expressions Other Analyses An expression e is available at statement s if, for every path that reaches s 1, there is some statement s where e is evaluated. Let In(s) be the set of all expressions available immediately before s is evaluated. Let Out(s) be the set of all expressions available immediately after s is evaluated. Then Out(s) = gen(s) (In(s) kill(s)), where gen(s) is the set of all expressions evaluated in s, and kill(s) is the set of all expressions that use the lhs variables defined in s. In(s) = t pred (s) Out(t) In and Out are the greatest sets that satisfy the above equations. Compiler Design Data-Flow Analysis CSE 504 20 / 20