CS553 Lecture: Stencil Computations and Loop Transformations

Loop Transformations

Announcements
- PA2 due Friday
- Midterm is Wednesday next week, in class, one week from today

Today
- Recall stencil computations
- Intro to loop transformations
- Data dependences between loop iterations

Stencil Computations

- Used to solve partial differential equations, in graphics, and in cellular automata
- Computations operate over some mesh or grid
- The computation modifies a value over time or as part of a relaxation to find a steady state
- Each computation has some nearest-neighbor data dependence pattern
- The coefficients multiplied by each neighbor can be constant or variable

1D data, 1D time: Stencil Computation, version 1   <demo in class>

    // assume A[0][i] initialized to some values
    for (t = 1; t < T+1; t++) {
      for (i = 1; i < n-1; i++) {
        A[t][i] = (1.0/3.0) * (A[t-1][i-1] + A[t-1][i] + A[t-1][i+1]);
      }
    }

1D Stencil Computation (take 2)

1D data, 2D time: Stencil Computation, version 2   <demo in class>

    // assume A[i] initialized to some values
    for (t = 0; t < T; t++) {
      for (i = 1; i < n-1; i++) {
        A[i] = (1.0/3.0) * (A[i-1] + A[i] + A[i+1]);
      }
    }

Analysis
- Are version 1 and version 2 computing the same thing? (see the sketch below)
- What is the operational intensity of version 1 versus version 2?
- What parallelism is there in version 1 versus version 2?
- Where is the data reuse in version 1 versus version 2?
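Note that version 2 as written is not, in general, equivalent to version 1: when it computes A[i] it reads A[i-1], which has already been overwritten in the current time step. Below is a minimal double-buffered sketch of my own (the names old, N, and T are assumptions, not from the slides) that keeps version 2's small storage footprint while reading only previous-step values, like version 1:

    #include <string.h>

    #define N 16      /* number of points (assumed) */
    #define T 8       /* number of time steps (assumed) */

    void stencil_double_buffered(double A[N]) {
        double old[N];
        for (int t = 0; t < T; t++) {
            memcpy(old, A, sizeof old);    /* snapshot of the previous time step */
            for (int i = 1; i < N - 1; i++)
                A[i] = (1.0 / 3.0) * (old[i-1] + old[i] + old[i+1]);
        }
    }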

Jacobi in SWM code (Stencil Computation with Explicit Weights)

Source: David Randall's research group

    DO j = 2, jm-1
      DO i = 2, im-1
        x2(i,j) = area_inv(i,j)*                                 &
            ((x1(i+1,j  )-x1(i,j))*laplacian_wghts(1,i,j) +      &
             (x1(i+1,j+1)-x1(i,j))*laplacian_wghts(2,i,j) +      &
             (x1(i,  j+1)-x1(i,j))*laplacian_wghts(3,i,j) +      &
             (x1(i-1,j  )-x1(i,j))*laplacian_wghts(4,i,j) +      &
             (x1(i-1,j-1)-x1(i,j))*laplacian_wghts(5,i,j) +      &
             (x1(i,  j-1)-x1(i,j))*laplacian_wghts(6,i,j))
      ENDDO
    ENDDO

The Problem: Mapping programs to architectures

- Goal: keep each core as busy as possible
- Challenge: improve data locality (reuse data while it is in cache) and leverage parallelism

[Figure from "Modeling Parallel Computers as Memory Hierarchies" by B. Alpern, L. Carter, and J. Ferrante, 1993]
[Figure: machine topology diagram generated 9/24/14 by running: lstopo --output-format pdf > out.pdf]

Loop Transformations

Used to improve data locality and expose parallelism

Compiler requirements
- Determine which loop transformations, in which order, are legal (i.e., satisfy the dependences)
- Determine which loop transformations, in which order, will improve performance

Loop permutation serves as the case study in the next set of slides

Loop Permutation for Improved Locality

Sample code (assume Fortran's column-major order array layout):

    do j = 1,6                    do i = 1,5
      do i = 1,5                    do j = 1,6
        A(j,i) = A(j,i)+1             A(j,i) = A(j,i)+1

Order in which the elements A(j,i) are visited (rows are j = 1..6, columns are i = 1..5):

    j outer (poor cache locality)        i outer (good cache locality)
     1   2   3   4   5                    1   7  13  19  25
     6   7   8   9  10                    2   8  14  20  26
    11  12  13  14  15                    3   9  15  21  27
    16  17  18  19  20                    4  10  16  22  28
    21  22  23  24  25                    5  11  17  23  29
    26  27  28  29  30                    6  12  18  24  30

With column-major layout each column of A is contiguous in memory, so the i-outer/j-inner
version visits consecutive memory locations, while the j-outer/i-inner version strides
through memory by the column length on every access.
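The summary at the end of these notes also mentions memory layout for C. As a side-by-side illustration (my own sketch, not from the slides; the array shape and names are assumptions), here is the same idea in C, where arrays are row-major, so the roles of the two loop orders flip: keeping the last subscript in the innermost loop gives stride-1 accesses.

    #define M 6
    #define K 5

    /* Row-major C layout: A[j][k] and A[j][k+1] are adjacent in memory. */
    void poor_locality(double A[M][K]) {
        for (int k = 0; k < K; k++)          /* last subscript in the OUTER loop */
            for (int j = 0; j < M; j++)
                A[j][k] += 1.0;              /* stride-K accesses: poor locality */
    }

    void good_locality(double A[M][K]) {
        for (int j = 0; j < M; j++)
            for (int k = 0; k < K; k++)      /* last subscript in the INNER loop */
                A[j][k] += 1.0;              /* stride-1 accesses: good locality */
    }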

Loop Permutation Legality

Sample code:

    do j = 1,6                    do i = 1,5
      do i = 1,5                    do j = 1,6
        A(j,i) = A(j,i)+1             A(j,i) = A(j,i)+1

Why is this legal?
- There are no loop-carried dependences, so we can arbitrarily change the order in which
  iterations execute.

Does a loop always have to have NO inter-iteration dependences for loop permutation to be legal?

Data Dependences

Recall
- A data dependence defines an ordering relationship between two statements
- When executing statements, data dependences must be respected to preserve correctness

Example: can the statements on the left be reordered as on the right?

    s1  a := 5;              s1  a := 5;
    s2  b := a + 1;     ?    s3  a := 6;
    s3  a := 6;              s2  b := a + 1;

Dependences and Loops

Loop-independent dependences: dependences within the same loop iteration

    do i = 1,100
      A(i) = B(i)+1
      C(i) = A(i)*2

Loop-carried dependences: dependences that cross loop iterations

    do i = 1,100
      A(i) = B(i)+1
      C(i) = A(i-1)*2
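To make the difference concrete, here is a small self-contained C sketch of my own (array sizes and names are assumptions): in the first loop every iteration reads only the A value it just wrote, so iterations could run in any order or in parallel; in the second loop each iteration reads A[i-1] written by the previous iteration, so the order must be preserved.

    #include <stdio.h>

    #define N 100

    int main(void) {
        double A[N+1], B[N+1], C[N+1];
        for (int i = 0; i <= N; i++) { A[i] = 0.0; B[i] = i; }

        /* loop-independent dependence: A[i] is written and then read in the
           same iteration, so the iterations are independent of one another */
        for (int i = 1; i <= N; i++) {
            A[i] = B[i] + 1;
            C[i] = A[i] * 2;
        }

        /* loop-carried dependence: iteration i reads A[i-1], which was written
           by iteration i-1, so the iterations must execute in order */
        for (int i = 1; i <= N; i++) {
            A[i] = B[i] + 1;
            C[i] = A[i-1] * 2;
        }

        printf("C(1) = %g, C(N) = %g\n", C[1], C[N]);
        return 0;
    }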

Data Dependence Terminology

We say statement s2 depends on s1
- True (flow) dependence: s1 writes memory that s2 later reads
- Anti-dependence: s1 reads memory that s2 later writes
- Output dependence: s1 writes memory that s2 later writes
- Input dependence: s1 reads memory that s2 later reads

Notation: s1 δ s2
- s1 is called the source of the dependence
- s2 is called the sink or target
- s1 must be executed before s2
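The four kinds can be seen in a few lines of straight-line C (an illustrative fragment of my own; the statements and variable names are made up for the example):

    void dependence_kinds(void) {
        int x, y, z, w;
        x = 5;          /* s1 */
        y = x + 1;      /* s2: flow (true) dependence s1 δ s2 -- s1 writes x, s2 reads x */
        x = 7;          /* s3: anti-dependence s2 δ s3        -- s2 reads x, s3 writes x */
        x = 9;          /* s4: output dependence s3 δ s4      -- both write x            */
        z = x * 2;      /* s5: flow dependence s4 δ s5        -- s4 writes x, s5 reads x */
        w = x + z;      /* s6: input dependence s5 δ s6       -- s5 and s6 both read x   */
        (void)y; (void)w;   /* silence unused-variable warnings */
    }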

Yet Another Loop Permutation Example

Consider another example:

    do i = 1,n                    do j = 1,n
      do j = 1,n                    do i = 1,n
        C(i,j) = C(i+1,j-1)           C(i,j) = C(i+1,j-1)

Execution order before and after the permutation:

    Before                            After
    (1,1)  C(1,1) = C(2,0)            (1,1)  C(1,1) = C(2,0)
    (1,2)  C(1,2) = C(2,1)            (2,1)  C(2,1) = C(3,0)
    ...                               ...
    (2,1)  C(2,1) = C(3,0)   δa       (1,2)  C(1,2) = C(2,1)   δf

Data Dependences and Loops

How do we identify dependences in loops?

    do i = 1,5
      A(i) = A(i-1)+1

Simple view
- Imagine that all loops are fully unrolled
- Examine data dependences as before

    A(1) = A(0)+1
    A(2) = A(1)+1
    A(3) = A(2)+1
    A(4) = A(3)+1
    A(5) = A(4)+1

Problems
- Impractical and often impossible
- Loses the loop structure

Iteration Spaces

Idea: explicitly represent the iterations of a loop nest

Example

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1

[Figure: iteration space of this loop nest, axes i and j, with arrows showing the dependences
between iterations]

Iteration space
- A set of tuples that represents the iterations of a loop
- We can visualize the dependences in an iteration space
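One low-tech way to "visualize" an iteration space is simply to enumerate it. The following C sketch (my own, not from the slides) walks the iteration space of the example above and, for each iteration (i,j), prints the iteration it depends on:

    #include <stdio.h>

    int main(void) {
        /* iteration space of:  do i = 1,6 ; do j = 1,5 ; A(i,j) = A(i-1,j-1)+1 */
        for (int i = 1; i <= 6; i++) {
            for (int j = 1; j <= 5; j++) {
                int src_i = i - 1, src_j = j - 1;   /* iteration that wrote A(i-1,j-1) */
                if (src_i >= 1 && src_j >= 1)
                    printf("(%d,%d) depends on (%d,%d)\n", i, j, src_i, src_j);
                else
                    printf("(%d,%d) has no dependence source inside the space\n", i, j);
            }
        }
        return 0;
    }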

Distance Vectors

Idea
- Concisely describe the dependence relationships between iterations of an iteration space
- For each dimension of an iteration space, the distance is the number of iterations between
  accesses to the same memory location

Definition: v = i_T - i_S  (target iteration vector minus source iteration vector)

Example

    do i = 1,6          (outer loop)
      do j = 1,5        (inner loop)
        A(i,j) = A(i-1,j-2)+1

    Distance vector: (1,2)

[Figure: iteration space for this example, axes i and j]
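The distance vector for this example can also be recovered by brute force: pair every write with every read of the same array element and subtract the iteration vectors (target minus source). A small C sketch of my own, using the loop bounds from the slide:

    #include <stdio.h>

    /* For A(i,j) = A(i-1,j-2)+1 with i = 1..6, j = 1..5:
       the write in iteration (si,sj) targets A(si,sj); the read in iteration
       (ti,tj) targets A(ti-1,tj-2).  A dependence exists when they coincide. */
    int main(void) {
        for (int si = 1; si <= 6; si++)
          for (int sj = 1; sj <= 5; sj++)
            for (int ti = 1; ti <= 6; ti++)
              for (int tj = 1; tj <= 5; tj++)
                if (ti - 1 == si && tj - 2 == sj)
                    printf("source (%d,%d) -> target (%d,%d): distance (%d,%d)\n",
                           si, sj, ti, tj, ti - si, tj - sj);
        return 0;   /* every line printed ends with distance (1,2) */
    }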

Distance Vectors and Loop Transformations

Idea: any transformation we perform on the loop must respect the dependences

Example

    do i = 2,7
      do j = 3,7
        A(i,j) = A(i-1,j-2)+1

[Figure: iteration space for this example, axes i and j]

Can we permute the i and j loops?

Distance Vectors and Loop Transformations

Idea: any transformation we perform on the loop must respect the dependences

Example

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i-1,j-2)+1

[Figure: iteration space for this example, axes i and j]

Can we permute the i and j loops?  Yes

Distance Vectors: Legality

Definition
- A dependence vector v is lexicographically nonnegative when its leftmost nonzero entry is
  positive or all of its elements are zero
      Yes: (0,0,0), (0,1), (0,2,-2)
      No:  (-1), (0,-2), (0,-1,1)
- A dependence vector is legal when it is lexicographically nonnegative (assuming that indices
  increase as we iterate)

Why are lexicographically negative distance vectors illegal?
What are legal direction vectors?
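The definition translates directly into code. A minimal C check of my own (the function name is an assumption): scan the vector left to right and let the first nonzero entry decide.

    #include <stdbool.h>
    #include <stdio.h>

    /* Lexicographically nonnegative: leftmost nonzero entry positive, or all zero. */
    bool lex_nonnegative(const int *v, int n) {
        for (int k = 0; k < n; k++) {
            if (v[k] > 0) return true;     /* first nonzero entry is positive */
            if (v[k] < 0) return false;    /* first nonzero entry is negative */
        }
        return true;                       /* all entries are zero */
    }

    int main(void) {
        int yes[] = {0, 2, -2};            /* legal   */
        int no[]  = {0, -1, 1};            /* illegal */
        printf("%d %d\n", lex_nonnegative(yes, 3), lex_nonnegative(no, 3));  /* prints 1 0 */
        return 0;
    }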

Example where permutation is not legal

Sample code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1

[Figure: iteration space for this example, axes i and j]

Kind of dependence: flow
Distance vector: (1,-1)

Exercise

Sample code

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i-1,j+1)+1

[Figure: iteration space for this example, axes i and j]

Kind of dependence: anti
Distance vector: (1,-1)

Loop-Carried Dependences

Definition: a dependence D = (d_1,...,d_n) is carried at loop level i if d_i is the first
nonzero element of D

Example

    do i = 1,6
      do j = 1,6
        A(i,j) = B(i-1,j)+1
        B(i,j) = A(i,j-1)*2

Distance vectors
- (0,1) for the accesses to A
- (1,0) for the accesses to B

Loop-carried dependences
- The j loop carries the dependence due to A
- The i loop carries the dependence due to B
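In code, the carrying level is just the position of the first nonzero distance. A small C helper of my own (the name and 1-based convention are assumptions, not from the slides):

    /* Returns the 1-based loop level that carries distance vector d of length n,
       or 0 if every entry is zero (a loop-independent dependence). */
    int carrying_level(const int *d, int n) {
        for (int k = 0; k < n; k++)
            if (d[k] != 0)
                return k + 1;
        return 0;
    }

    /* For the slide's example: (0,1) -> level 2 (the j loop carries the dependence on A),
       and (1,0) -> level 1 (the i loop carries the dependence on B). */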

Direction Vector

Definition
- A direction vector serves the same purpose as a distance vector when less precision is
  required or available
- Element i of a direction vector is <, >, or = based on whether the source of the dependence
  precedes, follows, or is in the same iteration as the target in loop i

Example

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1

    Direction vector: (<,<)
    Distance vector:  (1,1)

[Figure: iteration space for this example, axes i and j]
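Each element of a direction vector is determined by the sign of the corresponding distance, as in this small C sketch of my own (the encoding is an assumption):

    #include <stdio.h>

    /* '<' : the source precedes the target in this loop (positive distance)
       '=' : the source and target are in the same iteration (zero distance)
       '>' : the source follows the target in this loop (negative distance) */
    char direction(int distance) {
        if (distance > 0) return '<';
        if (distance < 0) return '>';
        return '=';
    }

    int main(void) {
        int d[2] = {1, 1};   /* the distance vector from the example above */
        printf("(%c,%c)\n", direction(d[0]), direction(d[1]));   /* prints (<,<) */
        return 0;
    }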

Legality of Loop Permutation

Case analysis of the direction vectors

(=,=)  The dependence is loop independent, so it is unaffected by permutation.

(=,<)  The dependence is carried by the j loop. After permutation the dependence will be (<,=),
       so it will still be carried by the j loop; the dependence relations do not change.

(<,=)  The dependence is carried by the i loop. After permutation the dependence will be (=,<),
       so it will still be carried by the i loop; the dependence relations do not change.

Legality of Loop Interchange (cont.)

Case analysis of the direction vectors (cont.)

(<,<)  The dependence distance is positive in both dimensions. After permutation it will still
       be positive in both dimensions, so the dependence relations do not change.

(<,>)  The dependence is carried by the outer loop. After interchange the dependence will be
       (>,<), which changes the dependences and results in an illegal direction vector, so the
       interchange is illegal.

(>,*) and (=,>)  Such direction vectors are not possible for the original loop.
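The whole case analysis amounts to this: interchange swaps the two entries of every dependence vector, and it is legal only if every swapped vector is still lexicographically nonnegative. A hedged C sketch of my own, encoding '<' as +1, '=' as 0, and '>' as -1:

    #include <stdbool.h>
    #include <stdio.h>

    static bool lex_nonneg2(int first, int second) {
        if (first != 0) return first > 0;
        if (second != 0) return second > 0;
        return true;
    }

    /* Interchanging the two loops swaps the vector entries; the interchange is
       legal for this dependence if the swapped vector stays lexicographically
       nonnegative. */
    bool interchange_legal(int d1, int d2) {
        return lex_nonneg2(d2, d1);
    }

    int main(void) {
        printf("(<,<): %d\n", interchange_legal(+1, +1));   /* 1: legal   */
        printf("(=,<): %d\n", interchange_legal( 0, +1));   /* 1: legal   */
        printf("(<,>): %d\n", interchange_legal(+1, -1));   /* 0: illegal */
        return 0;
    }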

Loop Interchange Example

Consider the (<,>) case:

    do i = 1,n                    do j = 1,n
      do j = 1,n                    do i = 1,n
        C(i,j) = C(i+1,j-1)           C(i,j) = C(i+1,j-1)

    Before                            After
    (1,1)  C(1,1) = C(2,0)            (1,1)  C(1,1) = C(2,0)
    (1,2)  C(1,2) = C(2,1)            (2,1)  C(2,1) = C(3,0)
    ...                               ...
    (2,1)  C(2,1) = C(3,0)   δa       (1,2)  C(1,2) = C(2,1)   δf

Before the interchange, iteration (1,2) reads C(2,1) and iteration (2,1) writes it later
(an anti-dependence); after the interchange the write happens before the read (a flow
dependence), so the value C(1,2) receives changes and the interchange is illegal.

Concepts

Stencil computations
- PDEs, graphics, and cellular automata
- Memory-bandwidth bound
- Lots of parallelism

Loop transformations
- Memory layout for Fortran and C
- Loop permutation and when it is applicable
- Data dependences, including distance vectors, loop-carried dependences, and direction vectors

Next Time

Study for the midterm
- Do example problems from the slides.
- Do example problems from previous midterms and finals.
- Ask if you cannot determine which problems are relevant for our class so far.

Lecture: midterm review

Next Time

Reading
- "Advanced Compiler Optimizations for Supercomputers" by Padua and Wolfe

Lecture: Parallelization and Performance Optimization of Applications