The Problem: Mapping programs to architectures

Similar documents
Compiling for Parallelism & Locality. Example. Announcement Need to make up November 14th lecture. Last time Data dependences and loops

Common loop optimizations. Example to improve locality. Why Dependence Analysis. Data Dependence in Loops. Goal is to find best schedule:

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Announcements PA2 due Friday Midterm is Wednesday next week, in class, one week from today

MEM 255 Introduction to Control Systems Review: Basics of Linear Algebra

8.4 COMPLEX VECTOR SPACES AND INNER PRODUCTS

Feb 14: Spatial analysis of data fields

Structure and Drive Paul A. Jensen Copyright July 20, 2003

PHYS 705: Classical Mechanics. Calculus of Variations II

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

A 2D Bounded Linear Program (H,c) 2D Linear Programming

Example: (13320, 22140) =? Solution #1: The divisors of are 1, 2, 3, 4, 5, 6, 9, 10, 12, 15, 18, 20, 27, 30, 36, 41,

Feature Selection: Part 1

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Design and Analysis of Algorithms

Report on Image warping

The Study of Teaching-learning-based Optimization Algorithm

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Lecture 11. minimize. c j x j. j=1. 1 x j 0 j. +, b R m + and c R n +

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

Unit 5: Quadratic Equations & Functions

Lecture 3. Ax x i a i. i i

Calculation of time complexity (3%)

Problem Set 9 Solutions

Unque Sets Orented Parallelzaton of Loops wth Non-unform Dependences Example : Example : Example : do =, do =, do =, do =, do =, do =, A( + ; ) = A( +

p 1 c 2 + p 2 c 2 + p 3 c p m c 2

Numerical Heat and Mass Transfer

2.3 Nilpotent endomorphisms

Homework 9 Solutions. 1. (Exercises from the book, 6 th edition, 6.6, 1-3.) Determine the number of distinct orderings of the letters given:

Fundamental loop-current method using virtual voltage sources technique for special cases

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence)

Lecture 21: Numerical methods for pricing American type derivatives

The Minimum Universal Cost Flow in an Infeasible Flow Network

Discussion 11 Summary 11/20/2018

VQ widely used in coding speech, image, and video

On the Multicriteria Integer Network Flow Problem

8.6 The Complex Number System

18.1 Introduction and Recap

a b a In case b 0, a being divisible by b is the same as to say that

MMA and GCMMA two methods for nonlinear optimization

Relaxation Methods for Iterative Solution to Linear Systems of Equations

Lecture 10 Support Vector Machines II

Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them?

THEOREMS OF QUANTUM MECHANICS

Math 261 Exercise sheet 2

VARIATION OF CONSTANT SUM CONSTRAINT FOR INTEGER MODEL WITH NON UNIFORM VARIABLES

Lecture 4: Universal Hash Functions/Streaming Cont d

A Simple Inventory System

A New Refinement of Jacobi Method for Solution of Linear System Equations AX=b

Lecture 3: Dual problems and Kernels

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique

CHAPTER 14 GENERAL PERTURBATION THEORY

Solution of Linear System of Equations and Matrix Inversion Gauss Seidel Iteration Method

Inductance Calculation for Conductors of Arbitrary Shape

5 The Rational Canonical Form

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

Perron Vectors of an Irreducible Nonnegative Interval Matrix

Support Vector Machines CS434

THE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens

HMMT February 2016 February 20, 2016

Mathematical Preparations

Lecture 17: Lee-Sidford Barrier

Difference Equations

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

U.C. Berkeley CS294: Spectral Methods and Expanders Handout 8 Luca Trevisan February 17, 2016

Min Cut, Fast Cut, Polynomial Identities

RELIABILITY ASSESSMENT

Lecture Randomized Load Balancing strategies and their analysis. Probability concepts include, counting, the union bound, and Chernoff bounds.

NP-Completeness : Proofs

Computing Correlated Equilibria in Multi-Player Games

Support Vector Machines

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

The Second Anti-Mathima on Game Theory

Vector Norms. Chapter 7 Iterative Techniques in Matrix Algebra. Cauchy-Bunyakovsky-Schwarz Inequality for Sums. Distances. Convergence.

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

Introduction to Algorithms

Basic Regular Expressions. Introduction. Introduction to Computability. Theory. Motivation. Lecture4: Regular Expressions

CSE 252C: Computer Vision III

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Computer-Aided Circuit Simulation and Verification. CSE245 Fall 2004 Professor:Chung-Kuan Cheng

Grid Generation around a Cylinder by Complex Potential Functions

Bernoulli Numbers and Polynomials

Section 8.3 Polar Form of Complex Numbers

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan

COMPLEX NUMBERS AND QUADRATIC EQUATIONS

C/CS/Phy191 Problem Set 3 Solutions Out: Oct 1, 2008., where ( 00. ), so the overall state of the system is ) ( ( ( ( 00 ± 11 ), Φ ± = 1

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Errors for Linear Systems

Linear Approximation with Regularization and Moving Least Squares

Lecture Notes on Linear Regression

10-701/ Machine Learning, Fall 2005 Homework 3

Linear Feature Engineering 11

Open string operator quantization

CS 331 DESIGN AND ANALYSIS OF ALGORITHMS DYNAMIC PROGRAMMING. Dr. Daisy Tang

= z 20 z n. (k 20) + 4 z k = 4

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

The Geometry of Logit and Probit

Compilers. Spring term. Alfonso Ortega: Enrique Alfonseca: Chapter 4: Syntactic analysis

An Interactive Optimisation Tool for Allocation Problems

Transcription:

Complng for Parallelsm & Localty!Last tme! SSA and ts uses!today! Parallelsm and localty! Data dependences and loops CS553 Lecture Complng for Parallelsm & Localty 1 The Problem: Mappng programs to archtectures Goal: keep each core as busy as possble. Challenge: get the data to the core when t needs t From Modelng Parallel Computers as Memory Herarches by B. Alpern and L. Carter and J. Ferrante, 1993. From Sequoa: Programmng the Memory Herarchy by Fatahalan et al., 2006. CS553 Lecture Complng for Parallelsm & Localty 2

! A(,)! A()! A() Example 1: Loop Permutaton for Improved Localty!Sample code: Assume Fortran s Column Maor Order array layout = 1,6 = 1,5 = A(,)+1 do = 1,5 = 1,6! A(,) = A(,)+1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 28 30 poor cache localty good cache localty CS553 Lecture Complng for Parallelsm & Localty 3 1 7 13 19 25 2 8 14 20 26 3 9 15 21 27 4 10 16 22 28 5 11 17 23 29 6 12 18 24 30 Example 2: Parallelzaton!Can we parallelze the followng loops? = 1,100 = A()+1 1 2 3 4 5... Yes = 1,100 = A(-1)+1 1 2 3 4 5... No CS553 Lecture Complng for Parallelsm & Localty 4

! s! s! s! A() Data Dependences!Recall! A data dependence defnes orderng relatonshp two between statements! In executng statements, data dependences must be respected to preserve correctness!example 1 a := 5; s 1 a := 5;!? 2 b := a + 1; s 3 a := 6; 3 a := 6; s 2 b := a + 1; CS553 Lecture Complng for Parallelsm & Localty 5 Data Dependences and Loops!How do we dentfy dependences n loops? = 1,5 = A(-1)+1 A(1) = A(0)+1!Smple vew! Imagne that all loops are fully unrolled! Examne data dependences as before!problems!!impractcal and often mpossble!!lose loop structure A(2) = A(1)+1 A(3) = A(2)+1 A(4) = A(3)+1 A(5) = A(4)+1 CS553 Lecture Complng for Parallelsm & Localty 6

! A()! C()! A() Concepts needed for automatng loop transformatons!questons! How do we determne f a transformaton or parallelzaton s legal?! What abstracton do we use for loops?! How do we represent transformatons and parallelzaton?! How do we generate the transformed code?! How do we determne when a transformaton s gong to be benefcal?!today! Basc abstractons for loops and dependences and computng dependences!thursday! Abstractons for loop transformatons and determnng ther legalty! Code generaton after performng a loop transformaton CS553 Lecture Complng for Parallelsm & Localty 7 Dependences and Loops!Loop-ndependent dependences = 1,100 = B()+1 = A()*2 Dependences wthn the same loop teraton!loop-carred dependences = 1,100 = B()+1 C() = A(-1)*2 Dependences that cross loop teratons CS553 Lecture Complng for Parallelsm & Localty 8

!...! A(f(!...! Heurstcs! Use Dependence Testng n General!General code 1 = l 1,h 1 n = l n,h n 1,..., n )) =... A(g( 1,..., n ))!There exsts a dependence between teratons I=( 1,..., n ) and J=( 1,..., n ) when! f(i) = g(j)! (l 1,...l n ) < I,J < (h 1,...,h n )! I < J or J < I, where < s lexcographcally less CS553 Lecture Data Dependence Analyss 9 Algorthms for Solvng the Dependence Problem can say NO or MAYBE! GCD test (Baneree76,Towle76): determnes whether nteger soluton s possble, no bounds checkng! Baneree test (Baneree 79): checks real bounds! Independent-varables test (pg. 820): useful when nequaltes are not coupled! I-Test (Kong et al. 90): nteger soluton n real bounds! Lambda test (L et al. 90): all dmensons smultaneously! Delta test (Goff et al. 91): pattern matches for effcency! Power test (Wolfe et al. 92): extended GCD and Fourer Motzkn combnaton some form of Fourer-Motzkn elmnaton for ntegers, exponental worst-case! Parametrc Integer Programmng (Feautrer91)! Omega test (Pugh92) CS553 Lecture Data Dependence Analyss 10

! A(a*+c Dependence Testng!Consder the followng code = 1,5 A(3*+2) = A(2*+1)+1!Queston! How do we determne whether one array reference depends on another across teratons of an teraton space? CS553 Lecture Data Dependence Analyss 11 Dependence Testng: Smple Case!Sample code = l,h 1 ) =... A(a*+c 2 )!Dependence?! a* 1 +c 1 = a* 2 +c 2, or! a* 1 a* 2 = c 2 -c 1! Soluton may exst f a dvdes c 2 -c 1 CS553 Lecture Data Dependence Analyss 12

! A(a! A(4* GCD Test!Idea! Generalze test to lnear functons of terators/nducton varables!code = l,h = l,h 1 * + a 2 * + a 0 ) =... A(b 1 * + b 2 * + b 0 )...!Agan! a 1 * 1 - b 1 * 2 + a 2 * 1 b 2 * 2 = b 0 a 0! Soluton exsts f gcd(a 1,a 2,b 1,b 2 ) dvdes b 0 a 0 CS553 Lecture Data Dependence Analyss 13 Example!Code = l,h = l,h + 2* + 1) =... A(6* + 2* + 4)...!gcd(4,-6,2,-2) = 2!Does 2 dvde 4-1? CS553 Lecture Data Dependence Analyss 14

! A(,) Baneree Test for (=L; <=U; ++) {! x[a0 + a1*] =...!... = x[b0 + b1*]! }! Does a0 + a1* = b0 + b1* for some real and? If so then (a1* - b1* ) = (b0 - a0)! Determne upper and lower bounds on (a1* - b1* ) for (=1; <=5; ++) { x[+5] = x[]; } upper bound = a1*max() - b1 * mn( ) = 4 lower bound = a1*mn() - b1*max( ) = -4 b_0 - a_0 = CS553 Lecture Data Dependence Analyss 15 Example 1: Loop Permutaton (reprse)!sample code = 1,6 = 1,5 = A(,)+1 do = 1,5 = 1,6! A(,) = A(,)+1!Why s ths legal?! No loop-carred dependences, so we can arbtrarly change order of teraton executon CS553 Lecture Complng for Parallelsm & Localty 16

! A()! A()! A(,) Example 2: Parallelzaton (reprse)!why can t ths loop be parallelzed? = 1,100 = A(-1)+1!Why can ths loop be parallelzed? 1 2 3 4 5... Loop carred dependence = 1,100 = A()+1 1 2 3 4 5... No loop carred dependence, No soluton to dependence problem CS553 Lecture Complng for Parallelsm & Localty 17 Iteraton Spaces!Idea! Explctly represent the teratons of a loop nest!example = 1,6 = 1,5 = A(-1,-1)+1!Iteraton Space! A set of tuples that represents the teratons of a loop! Can vsualze the dependences n an teraton space Iteraton Space CS553 Lecture Complng for Parallelsm & Localty 18

! A(,) Dstance Vectors!Idea! Concsely descrbe dependence relatonshps between teratons of an teraton space! For each dmenson of an teraton space, the dstance s the number of teratons between accesses to the same memory locaton!defnton! v = T - S!Example!do = 1,6 = 1,5! A(,) = A(-1,-2)+1! outer loop!dstance Vector: (1,2) nner loop CS553 Lecture Complng for Parallelsm & Localty 19 Dstance Vectors and Loop Transformatons!Idea! Any transformaton we perform on the loop must respect the dependences!example = 1,6 = 1,5 = A(-1,-2)+1!Can we permute the and loops? CS553 Lecture Complng for Parallelsm & Localty 20

! A(,) Dstance Vectors and Loop Transformatons!Idea! Any transformaton we perform on the loop must respect the dependences!example = 1,5 = 1,6 = A(-1,-2)+1!Can we permute the and loops?! Yes CS553 Lecture Complng for Parallelsm & Localty 21 Dstance Vectors: Legalty!Defnton! A dependence vector, v, s lexcographcally nonnegatve when the leftmost entry n v s postve or all elements of v are zero Yes: (0,0,0), (0,1), (0,2,-2) No: (-1), (0,-2), (0,-1,1)! A dependence vector s legal when t s lexcographcally nonnegatve (assumng that ndces ncrease as we terate)!why are lexcographcally negatve dstance vectors llegal?!what are legal drecton vectors? CS553 Lecture Complng for Parallelsm & Localty 22

! A(2*+2)! 2*! (yes, Data Dependence Termnology!We say statement s 2 depends on s 1! True (flow) dependence: s 1 wrtes memory that s 2 later reads! Ant-dependence: s 1 reads memory that s 2 later wrtes! Output dependences: s 1 wrtes memory that s 2 later wrtes! Input dependences: s 1 reads memory that s 2 later reads!notaton: s 1 " s 2! s 1 s called the source of the dependence! s 2 s called the snk or target! s 1 must be executed before s 2 CS553 Lecture Complng for Parallelsm & Localty 23 Example!Code 1 = l,h = A(2*-2)+1!Dependence? 1 2* 2 = -2 2 = -4 2 dvdes -4) 2!Knd of dependence?! Ant? 2 + d = 1 # d = -2!!Flow? 1 + d = 2 # d = 2 CS553 Lecture Data Dependence Analyss 24

! A(,)! A(,) Example Sample code = 1,6 = 1,5 = A(-1,+1)+1!Knd of dependence: Flow!Dstance vector: (1,!1) CS553 Lecture Complng for Parallelsm & Localty 25 Exercse Sample code = 1,5 = 1,6 = A(-1,+1)+1!Knd of dependence: Ant!Dstance vector: (1, -1) CS553 Lecture Complng for Parallelsm & Localty 26

! A()! A()! h[,0]! f[,0]! for! f[,]! EE[,]! h[,]! h[,]! p[,]! HH[,]! score[,] Example 2: Parallelzaton (reprse)!why can t ths loop be parallelzed? = 1,100 = A(-1)+1 1 2 3 4 5... Dstance Vector: (1)!Why can ths loop be parallelzed? = 1,100 = A()+1 1 2 3 4 5... Dstance Vector: (0) CS553 Lecture Complng for Parallelsm & Localty 27 Proten Strng Matchng Example!q = k_1!!r = k_2!!score[,] = 0 for the whole array!!for =1 to n1-1! = p[,0] = 0! = -q! =1 to n0-1! = max(f[,-1],h[,-1]-q)-r! = max(ee[-1,],hh[-1,],-q)-r! = p[,-1] + pam2[aa1[],aa0[]]! = max( max(0,ee[,]), max(f[,],h[,])! = HH[-1,]! = h[,]! = max(score[,-1],h[,])!! endfor!!endfor!!return score[n1-1,n0-1]! CS553 Lecture Complng for Parallelsm & Localty 28

! A(,)! B(,)! A(,)! B(,) Loop-Carred Dependences!Defnton! A dependence D=(d 1,...d n ) s carred at loop level f d s the frst nonzero element of D!Example = 1,6 = 1,6 = B(-1,)+1 = A(,-1)*2!Dstance vectors:!loop-carred dependences (0,1) for accesses to A (1,0) for accesses to B! The loop carres dependence due to A! The loop carres dependence due to B CS553 Lecture Data Dependence Analyss 29 Parallelzaton!Idea! Each teraton of a loop may be executed n parallel f that loop carres no dependences!example (dfferent from last slde) = 1,5 = 1,6 = B(-1,-1)+1 = A(,-1)*2!Parallelze loop? Dstance Vectors: (1,0) for A (flow) (1,1) for B (flow) CS553 Lecture Data Dependence Analyss 30 Iteraton Space

! t! C()!! A(,) Loop-Carred, Storage-Related Dependences!Problem! Loop-carred dependences nhbt parallelsm! Scalar references result n loop-carred dependences!example = 1,6 = A() + B() = t + 1/t!Can ths loop be parallelzed?!what knd of dependences are these? No. Ant dependences. Conventon for these sldes: Arrays start wth upper case letters, scalars do not CS553 Lecture Data Dependence Analyss 31 Drecton Vector!Defnton! A drecton vector serves the same purpose as a dstance vector when less precson s requred or avalable! Element of a drecton vector s <, >, or = based on whether the source of the dependence precedes, follows or s n the same teraton as the target n loop!example = 1,6 = 1,5 = A(-1,-1)+1!Drecton vector:!dstance vector: (<,<) (1,1) CS553 Lecture Data Dependence Analyss 32

! T()! C()! Removng False Dependences wth Scalar Expanson!Idea! Elmnate false dependences by ntroducng extra storage!example = 1,6 = A() + B() = T() + 1/T()!Can ths loop be parallelzed?!dsadvantages? CS553 Lecture Data Dependence Analyss 33 Scalar Expanson Detals!Restrctons! The loop must be a countable loop.e. The loop trp count must be ndependent of the body of the loop! The expanded scalar must have no upward exposed uses n the loop do = 1,6 prnt(t) t = A() + B() C() = t + 1/t!! Nested loops may requre much more storage!! When the scalar s lve after the loop, we must move the correct array value nto the scalar CS553 Lecture Data Dependence Analyss 34

Concepts!Improve performance by...! mprovng data localty! parallelzng the computaton Data Dependence Testng! general formulaton of the problem! GCD test and Baneree test Data Dependences! teraton space! dstance vectors and drecton vectors! loop carred Transformaton legalty! must respect data dependences CS553 Lecture Data Dependence Analyss 35 Next Tme!Readng! Ch 11.4-11.6!Lecture! Abstractons for loop transformatons and checkng ther legalty! Code generaton after transformatons have been performed CS553 Lecture Complng for Parallelsm & Localty 36