Topics on Compilers

Assignment 2 (4541.775 Topics on Compilers) - Sample Solution

1. Test for dependences on S. Write down the subscripts. Which positions are separable, which are coupled? Which dependence test would you apply to each position?

a) for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
       for (i=0; i<100; i++) {
S        A[i+1,j+1,k+1] = A[i,j,1] + c;
       }
     }
   }

   <i+1, i>: separable, strong SIV test
   <j+1, j>: separable, strong SIV test
   <k+1, 1>: separable, weak-zero SIV test

b) for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
       for (i=0; i<100; i++) {
S        A[i+1,j+k+1,k+1] = A[i,j,k] + c;
       }
     }
   }

   <i+1, i>: separable, strong SIV test (Δi = 1, direction vector <)
   <j+k+1, j>, <k+1, k>: coupled, Delta test:
     <k+1, k>: strong SIV test: k+1 = k + Δk, so Δk = 1, direction vector <
     <j+k+1, j>: j + k + 1 = j + Δj, so Δj = k + 1. Since k >= 0, Δj > 0, direction vector <

c) for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
       for (i=0; i<100; i++) {
S        A[i+1,j+k+1,i] = A[i,j,2] + c;
       }
     }
   }

   <i+1, i>, <i, 2>: coupled, Delta test:
     <i, 2>: weak-zero SIV test: i = 2
     <i+1, i>: Δi = 1, so the dependence is between the iterations i = 2 and i + Δi = 3.
   <j+k+1, j>: separable, solved as in (b) above. Note that k is unconstrained, i.e., the direction vector for k is (*).
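The single-subscript tests named above are mechanical enough to spell out in code. The following Python sketch (not part of the original solution; the function names, the subscript encoding and the 0..99 bounds are illustrative assumptions) shows the ZIV, strong SIV and weak-zero SIV tests for a single subscript pair and re-derives the results for part a).

# Hedged sketch of the single-subscript dependence tests used above.
# A pair is modelled as <a*i + c1, a*i + c2> (or a constant) for one loop
# index i with bounds L..U; names and return conventions are illustrative.

def ziv_test(c1, c2):
    """Both subscripts constant: dependence iff the constants are equal."""
    return c1 == c2

def strong_siv_test(a, c1, c2, L, U):
    """Pair <a*i + c1, a*i + c2> with the same coefficient a != 0.
    Dependence iff the distance d = (c1 - c2) / a is an integer with
    |d| <= U - L; the direction follows from the sign of d."""
    if (c1 - c2) % a != 0:
        return None                       # no integer solution: no dependence
    d = (c1 - c2) // a
    if abs(d) > U - L:
        return None                       # distance exceeds the iteration range
    return d, '=' if d == 0 else ('<' if d > 0 else '>')

def weak_zero_siv_test(a, c1, c2, L, U):
    """Pair <a*i + c1, c2>, i.e. one side does not depend on i.
    Dependence iff i = (c2 - c1) / a is an integer within the bounds;
    the direction is unconstrained (*)."""
    if (c2 - c1) % a != 0:
        return None
    i = (c2 - c1) // a
    return (i, '*') if L <= i <= U else None

# Part a), assuming loop bounds 0..99:
print(strong_siv_test(1, 1, 0, 0, 99))     # <i+1, i>  -> (1, '<')
print(strong_siv_test(1, 1, 0, 0, 99))     # <j+1, j>  -> (1, '<')
print(weak_zero_siv_test(1, 1, 1, 0, 99))  # <k+1, 1>  -> (0, '*'), i.e. k = 0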

2. Compute the entire set of direction vectors for all potential dependences in the loop. What type of dependence are we dealing with? What test would we apply to test for dependence?

   for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
       for (i=0; i<100; i++) {
S1       A[i+1,j+4,k+1] = B[i,j,k] + c;
S2       B[i+j,5,k+1] = A[2,k,k] + c;
       }
     }
   }

   S1 δ S2 on A: <i+1, 2>, <j+4, k>, <k+1, k>
     <i+1, 2>: separable, the weak-zero SIV test yields i = 1, direction unbound.
     <j+4, k>, <k+1, k>: coupled, applying the Delta test:
       <k+1, k>: the strong SIV test yields a distance of Δk = 1.
       <j+4, k>: propagating the distance constraint for k yields j + 4 = k + Δk = k + 1, i.e. j = k - 3, direction unbound.
   D = (<, *, *). This is in every possible case a loop-carried level-1 true dependence.

   S2 δ S1 on B: <i+j, i>, <5, j>, <k+1, k>
     <k+1, k>: separable, the strong SIV test yields a distance of Δk = 1.
     <i+j, i>, <5, j>: coupled, applying the Delta test:
       <5, j>: the weak-zero SIV test yields j = 5 (unconstrained direction).
       <i+j, i>: i + j = i + Δi, so Δi = j. Since j >= 0, the direction is either = or <.
   D = (<, *, =/<). Again, this is in every possible scenario a loop-carried level-1 true dependence.

3. Construct valid breaking conditions for the following examples.

a) for (i=0; i<100; i++) {
S    A[i+ix] = A[i] + c;
   }

   We have a dependence if i + ix = i' for two iterations i, i' of the loop. Taking the loop bounds into consideration, this is the case exactly when -100 < ix < 100. This is the breaking condition, i.e.,

   if ((-100 < ix) && (ix < 100)) {
     // the dependence manifests
   } else {
     // the loop can be directly parallelized
   }
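The breaking condition in a) can also be checked empirically: whenever -100 < ix < 100, executing the iterations in a different order can change the result; outside that range it cannot. The small Python sketch below is my addition, not part of the original solution; the array size and the offset of 100 are only there to keep indices in range for negative ix.

# Hedged sketch: empirical check of the breaking condition from 3a
# (loop bounds 0..99 assumed). The loop is run in serial and in reversed
# order; a difference in the results means the dependence manifests.
def run(ix, order):
    A = list(range(400))
    for i in order:
        A[100 + i + ix] = A[100 + i] + 1   # offset 100 keeps indices non-negative
    return A

def order_sensitive(ix):
    """True if running the loop backwards changes the result, i.e. a
    loop-carried dependence manifests for this value of ix."""
    return run(ix, range(100)) != run(ix, reversed(range(100)))

print(order_sensitive(5))      # True:  -100 < 5 < 100
print(order_sensitive(150))    # False: written and read ranges do not overlap
print(order_sensitive(-100))   # False: the breaking condition fails right at the boundary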

b) for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
       for (i=0; i<100; i++) {
S        A[i+1,j+1,k+1] = A[i,jx,1] + c;
       }
     }
   }

   Determining the distance vector for S: <i+1, i>, <j+1, jx>, <k+1, 1>
     <k+1, 1>: dependence only if k = 0 (from the weak-zero SIV test)
     <i+1, i>: dependence between iterations i and i+1, i.e., the distance is 1 (strong SIV test)
     <j+1, jx>: dependence if j+1 = jx. Considering the loop bounds, this is only possible if jx is in [1...100].

   Here, it is best to treat the breaking condition on k separately, and only test for jx if the breaking condition for k does not hold:

   if (k > 0) {
     // no dependence possible, parallelize at will
   } else {
     // A level-1 loop-carried dependence exists, but the level-2 dependence may or
     // may not manifest itself
     if ((0 < jx) && (jx < 101)) {
       // the level-2 dependence exists
     } else {
       // no level-2 dependence, parallelize the j/i loop at will
     }
   }

4. Construct the CFG for the following program. Which blocks are included in the dominance frontier of variable a defined at line 18?

   01 procedure foo(n: integer): integer;
   02 var i,sum: integer;
   03 begin
   04   a := 0;
   05   b := 1;
   06   sum := 0;
   07   if (n < 0) then begin
   08     a := 5
   09   end;
   10   if (n = 1001) then begin
   11     Result := 0;
   12     Exit
   13   end;
   14   i := 0;
   15   while (i < n) do begin
   16     sum := sum + a;
   17     if (i < 5) then begin
   18       a := b + a;
   19       if (sum > 100) then begin
   20         b := 2
   21       end else begin
   22         b := 3
   23       end
   24     end;
   25     if (a > 20) then break;
   26     i := i+1
   27   end;
   28   Result := sum;
   29 end;

   CFG (block numbers as used below, with the program line numbers of the statements they contain):

   entry
   block 1:  04 a := 0;  05 b := 1;  06 sum := 0;  07 if (n < 0)
   block 2:  08 a := 5
   block 3:  10 if (n = 1001)
   block 4:  11 Result := 0;  12 Exit
   block 5:  14 i := 0
   block 6:  15 while (i < n)
   block 7:  16 sum := sum + a;  17 if (i < 5)
   block 8:  18 a := b + a;  19 if (sum > 100)
   block 9:  20 b := 2
   block 10: 22 b := 3
   block 11: 25 if (a > 20) then break
   block 12: 26 i := i + 1
   block 13: 28 Result := sum
   exit

   Edges: entry -> 1, exit; 1 -> 2, 3; 2 -> 3; 3 -> 4, 5; 4 -> exit; 5 -> 6; 6 -> 7, 13; 7 -> 8, 11; 8 -> 9, 10; 9 -> 11; 10 -> 11; 11 -> 12, 13; 12 -> 6; 13 -> exit.

   Block 8 (which contains the statement 18 a := b + a) only dominates blocks 9 and 10. The only successor of these blocks is block 11, which is therefore the only block in the dominance frontier of block 8.
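Not part of the original solution: a short Python sketch that recomputes the dominators and dominance frontiers for this CFG with the straightforward iterative formulation, confirming DF(block 8) = { block 11 }. The entry -> exit edge is taken from the dominator tree used in problem 5 (exit is a child of entry there); the block labels B1..B13 simply mirror the numbering above.

# Hedged sketch: dominators and dominance frontiers for the CFG above.
succ = {
    'entry': ['B1', 'exit'],
    'B1': ['B2', 'B3'], 'B2': ['B3'],
    'B3': ['B4', 'B5'], 'B4': ['exit'], 'B5': ['B6'],
    'B6': ['B7', 'B13'], 'B7': ['B8', 'B11'],
    'B8': ['B9', 'B10'], 'B9': ['B11'], 'B10': ['B11'],
    'B11': ['B12', 'B13'], 'B12': ['B6'], 'B13': ['exit'], 'exit': [],
}
nodes = list(succ)
pred = {n: [p for p in nodes if n in succ[p]] for n in nodes}

# Iterative dominator computation: dom(n) = {n} U intersection of dom(p) over preds p.
dom = {n: set(nodes) for n in nodes}
dom['entry'] = {'entry'}
changed = True
while changed:
    changed = False
    for n in nodes:
        if n == 'entry':
            continue
        new = {n} | set.intersection(*(dom[p] for p in pred[n]))
        if new != dom[n]:
            dom[n], changed = new, True

# DF(x) = { y : x dominates some predecessor of y, but x does not strictly dominate y }
df = {x: {y for y in nodes
          if any(x in dom[p] for p in pred[y]) and not (x in dom[y] and x != y)}
      for x in nodes}

print(sorted(df['B8']))   # ['B11'] -- block 11 is the dominance frontier of block 8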

5. Construct the SSA graph for the program shown in the previous problem.

   Dominator tree (children listed per node):

   entry: exit, 1
   1: 2, 3
   3: 4, 5
   5: 6
   6: 7, 13
   7: 8, 11
   8: 9, 10
   11: 12

   Placement of Φ nodes for variable a:

   init:      worklist = { 1, 2, 8 }
   node = 1;  worklist = { 2, 8 };  DF(1) = { exit }
   node = 2;  worklist = { 8 };     DF(2) = { 3 };       insert Φ node into 3, add 3 to worklist ({ 8, 3 })
   node = 8;  worklist = { 3 };     DF(8) = { 11 };      insert Φ node into 11, add 11 to worklist ({ 3, 11 })
   node = 3;  worklist = { 11 };    DF(3) = { exit }
   node = 11; worklist = { };       DF(11) = { 6, 13 };  insert Φ nodes into 6 and 13, add 6, 13 to worklist ({ 6, 13 })
   node = 6;  worklist = { 13 };    DF(6) = { 6, exit }; block 6 already contains a Φ node, so nothing is added
   node = 13; worklist = { };       DF(13) = { exit }
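Not part of the original solution: the worklist pass above written out as a small self-contained Python sketch. The dominance-frontier sets are hard-coded from the trace (they could equally be taken from the sketch in problem 4), and the 'exit' entries are skipped because no code is placed there.

# Hedged sketch of the Φ-placement worklist for variable a.
df = {1: {'exit'}, 2: {3}, 3: {'exit'}, 6: {6, 'exit'}, 8: {11},
      11: {6, 13}, 13: {'exit'}}
defs_of_a = [1, 2, 8]                    # blocks with an ordinary assignment to a

def place_phis(defs, df):
    has_phi = set()
    worklist = list(defs)
    on_worklist = set(defs)
    while worklist:
        node = worklist.pop(0)
        for y in df.get(node, set()):
            if y == 'exit':
                continue                 # no Φ nodes are placed in the exit block
            if y not in has_phi:
                has_phi.add(y)           # insert a Φ node for a at the start of block y
                if y not in on_worklist:
                    on_worklist.add(y)
                    worklist.append(y)
    return has_phi

print(sorted(place_phis(defs_of_a, df)))   # [3, 6, 11, 13], as in the trace above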

Renaming variables (only considering a):

   count = 0; stack = { }
   search(entry);

   entry: no assignments, no Φ nodes in successors, search children { exit, 1 }
   exit:  no assignments, no successors, no children
   1:     assignment to a (line 4): a -> a0, count = 1, stack = { 0 }
          Φ node in successor 3: replace first operand by a0: a = Φ(a0, a)
          search children { 2, 3 }
   2:     assignment to a (line 8): a -> a1, count = 2, stack = { 0, 1 }
          Φ node in successor 3: replace second operand by a1: a = Φ(a0, a1)
          no children, pop stack = { 0 }
   3:     Φ node assignment: a -> a2 = Φ(a0, a1), count = 3, stack = { 0, 2 }
          no Φ nodes in successors, search children { 4, 5 }
   4:     no assignments, no Φ nodes in successors, no children
   5:     no assignments
          Φ node in successor 6: replace first operand by a2: a = Φ(a2, a)
          search children { 6 }
   6:     Φ node assignment: a -> a3 = Φ(a2, a), count = 4, stack = { 0, 2, 3 }
          Φ node in successor 13: replace first operand by a3: a = Φ(a3, a)
          search children { 7, 13 }
   13:    Φ node assignment: a -> a4 = Φ(a3, a), count = 5, stack = { 0, 2, 3, 4 }
          no Φ nodes in successors, no children, pop stack = { 0, 2, 3 }
   7:     replace use of a (line 16) by a3: sum := sum + a3; no assignment to a
          Φ node in successor 11: replace first operand by a3: a = Φ(a3, a, a)
          search children { 8, 11 }
   8:     replace use of a (line 18) by a3: a := b + a3
          assignment to a (line 18): a -> a5, count = 6, stack = { 0, 2, 3, 5 }
          no Φ nodes in successors, search children { 9, 10 }
   9:     no uses of or assignments to a
          Φ node in successor 11: replace second operand by a5: a = Φ(a3, a5, a)
          no children
   10:    no uses of or assignments to a
          Φ node in successor 11: replace third operand by a5: a = Φ(a3, a5, a5)
          no children, pop stack = { 0, 2, 3 }
   11:    Φ node assignment: a -> a6 = Φ(a3, a5, a5), count = 7, stack = { 0, 2, 3, 6 }
          replace use of a (line 25) by a6: if (a6 > 20)
          Φ node in successor 13: replace second operand by a6: a4 = Φ(a3, a6)
          search children { 12 }
   12:    no uses of or assignments to a
          Φ node in successor 6: replace second operand by a6: a3 = Φ(a2, a6)
          no children
          pop stack = { 0, 2, 3 }   (leaving node 11)
          pop stack = { 0, 2 }      (leaving node 6)
          pop stack = { 0 }         (leaving node 3)
          pop stack = { }           (leaving node 1)
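Not part of the original solution: a compact Python sketch of the stack-based renaming walk above, restricted to variable a. The CFG, the dominator-tree children (in the visit order used by the trace) and the per-block uses/defs of a are transcribed from the previous problems; running it reproduces the names a0..a6 and the Φ operands derived above.

# Hedged sketch: renaming of variable a over the dominator tree.
cfg = {'entry': ['B1', 'exit'], 'B1': ['B2', 'B3'], 'B2': ['B3'],
       'B3': ['B4', 'B5'], 'B4': ['exit'], 'B5': ['B6'],
       'B6': ['B7', 'B13'], 'B7': ['B8', 'B11'], 'B8': ['B9', 'B10'],
       'B9': ['B11'], 'B10': ['B11'], 'B11': ['B12', 'B13'],
       'B12': ['B6'], 'B13': ['exit'], 'exit': []}
preds = {n: [p for p in cfg if n in cfg[p]] for n in cfg}
# dominator-tree children, in the visit order used by the trace above
dtree = {'entry': ['exit', 'B1'], 'B1': ['B2', 'B3'], 'B3': ['B4', 'B5'],
         'B5': ['B6'], 'B6': ['B13', 'B7'], 'B7': ['B8', 'B11'],
         'B8': ['B9', 'B10'], 'B11': ['B12']}
ops = {'B1': ['def'], 'B2': ['def'], 'B7': ['use'], 'B8': ['use', 'def'],
       'B11': ['use']}                      # ordinary uses/defs of a, in program order
phi_blocks = ['B3', 'B6', 'B11', 'B13']     # result of the Φ placement above
phi_args = {b: ['a?'] * len(preds[b]) for b in phi_blocks}
phi_def, uses, stack, count = {}, {}, [], 0

def new_name():
    global count
    name = 'a%d' % count
    count += 1
    stack.append(name)
    return name

def rename(block):
    pushed = len(stack)
    if block in phi_blocks:                  # the Φ node defines a new name first
        phi_def[block] = new_name()
    for op in ops.get(block, []):
        if op == 'use':
            uses.setdefault(block, []).append(stack[-1])
        else:
            new_name()
    for s in cfg[block]:                     # fill Φ operands in the successors
        if s in phi_blocks and stack:
            phi_args[s][preds[s].index(block)] = stack[-1]
    for child in dtree.get(block, []):
        rename(child)
    del stack[pushed:]                       # pop everything pushed in this block

rename('entry')
for b in phi_blocks:
    print('%s: %s = phi(%s)' % (b, phi_def[b], ', '.join(phi_args[b])))
print('uses of a:', uses)
# B3: a2 = phi(a0, a1)    B6: a3 = phi(a2, a6)
# B11: a6 = phi(a3, a5, a5)    B13: a4 = phi(a3, a6)
# uses of a: {'B7': ['a3'], 'B8': ['a3'], 'B11': ['a6']}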

Final code (only variable a in SSA form):

   01 procedure foo(n: integer): integer;
   02 var i,sum: integer;
   03 begin
   04   a0 := 0;
   05   b := 1;
   06   sum := 0;
   07   if (n < 0) then begin
   08     a1 := 5
   09   end;
        a2 = Φ(a0, a1);
   10   if (n = 1001) then begin
   11     Result := 0;
   12     Exit
   13   end;
   14   i := 0;
   15   a3 = Φ(a2, a6); while (i < n) do begin
   16     sum := sum + a3;
   17     if (i < 5) then begin
   18       a5 := b + a3;
   19       if (sum > 100) then begin
   20         b := 2
   21       end else begin
   22         b := 3
   23       end
   24     end;
        a6 = Φ(a3, a5, a5);
   25     if (a6 > 20) then break;
   26     i := i+1
   27   end;
        a4 = Φ(a3, a6);
   28   Result := sum;
   29 end;

6. Use the algorithm vectorize from chapter 4 to vectorize the following code:

   for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
S1     B[1,j,k] = A[1,j-1,k];
       for (i=0; i<100; i++) {
S2       A[i+1,j,k] = B[i,100-j,k] + c;
       }
     }
   }

   1. Construct the dependence graph.

   S1 δ S2 on B: <k, k>, <j, 100-j>, <1, i>
     <k, k>: strong SIV, D = (=)
     <j, 100-j>: weak-crossing SIV, D = (*)
     <1, i>: ZIV: loop-independent true dependence
   D = (=, *) = { (=, <), (=, =), (=, >) }

   S2 δ S1 on A: <k, k>, <j, j-1>, <i+1, 1>
     <k, k>: strong SIV, D = (=)
     <j, j-1>: strong SIV, D = (<)
     <i+1, 1>: ZIV: loop-independent anti-dependence
   D = (=, <)

   The dependence graph therefore contains edges from S1 to S2 (loop-independent and carried at level 2) and an edge from S2 to S1 carried at level 2, so S1 and S2 form a cycle.

   2. Call vectorize() for the whole loop nest.

   vectorize(loop nest, level=1, D):
     S1 and S2 are strongly connected and form the only such region, hence π1 = { S1, S2 }.
     π1 is cyclic, so the level-1 loop is generated around a recursive call:

       for (k=0; k<100; k++) {
         vectorize(π1, level=2, D)
       }

   vectorize(j,i-loop, level=2, D):
     Strip all dependences carried at levels < 2 (there are none), so the only π block is still { S1, S2 }.
     π1 is still cyclic, so the level-2 loop is generated around a recursive call:

       for (j=0; j<100; j++) {
         vectorize(π1, level=3, D)
       }

   vectorize(i-loop, level=3, D):
     Strip all dependences carried at levels < 3. This removes the level-2 edges and leaves only the loop-independent dependences from S1 to S2, so there are no strongly connected components and we obtain two π blocks: { S1 } and { S2 }.
     Topological sorting: π1 = { S1 }, π2 = { S2 }.

     π1 is not cyclic; S1 has 2 enclosing loops, so it is vectorized in 2 - 3 + 1 = 0 dimensions (i.e., no dimension, it remains a scalar statement):

       B[1,j,k] = A[1,j-1,k];

     π2 is not cyclic; S2 has 3 enclosing loops, so it is vectorized in 3 - 3 + 1 = 1 dimension:

       A[1:100,j,k] = B[0:99,100-j,k] + c;

   Complete code:

   for (k=0; k<100; k++) {
     for (j=0; j<100; j++) {
S1     B[1,j,k] = A[1,j-1,k];
S2     A[1:100,j,k] = B[0:99,100-j,k] + c;
     }
   }
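Not part of the original solution: a minimal Python sketch of the recursion that the trace above walks through, in the spirit of the vectorize/codegen algorithm from chapter 4. The dependence edges and loop depths are transcribed from the analysis above; rewriting subscripts into vector notation (A[1:100,j,k]) is deliberately left out, and all names are my own.

# Hedged sketch of the codegen recursion: strip dependences carried at outer
# levels, form strongly connected components, run cyclic components inside a
# sequential loop at the current level and vectorize the rest.
loops = {1: 'k', 2: 'j', 3: 'i'}                 # loop index per nesting level
depth = {'S1': 2, 'S2': 3}                       # number of loops enclosing each statement
body  = {'S1': 'B[1,j,k] = A[1,j-1,k];',
         'S2': 'A[i+1,j,k] = B[i,100-j,k] + c;'}
# dependence edges as (source, sink, carrying level); None = loop-independent
deps = [('S1', 'S2', 2), ('S1', 'S2', None), ('S2', 'S1', 2)]

def sccs(nodes, edges):
    """Kosaraju SCC; components come out in topological order of the condensation."""
    fwd = {n: [d for s, d, _ in edges if s == n] for n in nodes}
    rev = {n: [s for s, d, _ in edges if d == n] for n in nodes}
    order, seen = [], set()
    def dfs(v, adj, out):
        seen.add(v)
        for w in adj[v]:
            if w not in seen:
                dfs(w, adj, out)
        out.append(v)
    for n in nodes:
        if n not in seen:
            dfs(n, fwd, order)
    seen, comps = set(), []
    for n in reversed(order):
        if n not in seen:
            comp = []
            dfs(n, rev, comp)
            comps.append(comp)
    return comps

def codegen(stmts, level, deps, indent=0):
    pad = '  ' * indent
    live = [e for e in deps if e[0] in stmts and e[1] in stmts
            and (e[2] is None or e[2] >= level)]   # strip deps carried at levels < level
    for comp in sccs(stmts, live):
        cyclic = len(comp) > 1 or any(s == d == comp[0] for s, d, _ in live)
        if cyclic:
            x = loops[level]
            print('%sfor (%s=0; %s<100; %s++) {' % (pad, x, x, x))
            codegen([s for s in stmts if s in comp], level + 1, live, indent + 1)
            print(pad + '}')
        else:
            dims = depth[comp[0]] - level + 1      # number of dimensions to vectorize
            print('%s%s  // vectorized in %d dimension(s)' % (pad, body[comp[0]], dims))

codegen(['S1', 'S2'], 1, deps)

Running the sketch prints the same loop structure as the hand-derived result: sequential k and j loops, S1 kept scalar (0 dimensions) and S2 marked for vectorization in 1 dimension along i.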