Compiling for Parallelism & Locality

Last time
  - SSA and its uses
Today
  - Parallelism and locality
  - Data dependences and loops

CS553 Lecture: Compiling for Parallelism & Locality

The Problem: Mapping programs to architectures

Goal: keep each core as busy as possible.
Challenge: get the data to the core when it needs it.

Figures from "Modeling Parallel Computers as Memory Hierarchies" by B. Alpern, L. Carter, and J. Ferrante, 1993, and from "Sequoia: Programming the Memory Hierarchy" by Fatahalian et al., 2006.

Example 1: Loop Permutation for Improved Locality

Sample code: assume Fortran's column-major order array layout

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i,j)+1
      enddo
    enddo

Order in which the elements of A are visited (poor cache locality):

     1  2  3  4  5
     6  7  8  9 10
    11 12 13 14 15
    16 17 18 19 20
    21 22 23 24 25
    26 27 28 29 30

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i,j)+1
      enddo
    enddo

Order in which the elements of A are visited (good cache locality):

     1  7 13 19 25
     2  8 14 20 26
     3  9 15 21 27
     4 10 16 22 28
     5 11 17 23 29
     6 12 18 24 30

Example 2: Parallelization

Can we parallelize the following loops?

    do i = 1,100
      A(i) = A(i)+1
    enddo

  Yes

    do i = 1,100
      A(i) = A(i-1)+1
    enddo

  No

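To make the locality argument of Example 1 concrete, the following is a minimal Python sketch that computes the memory offsets touched by the two loop orders for a 6x5 array stored in column-major order. The function names and the 0-based offsets are assumptions made for this illustration; they are not part of the lecture.

```python
ROWS, COLS = 6, 5  # extents of the array in Example 1


def column_major_offset(row, col, rows=ROWS):
    """Linear offset of element (row, col), 1-based, in column-major storage."""
    return (col - 1) * rows + (row - 1)


def visit_order(outer_is_row):
    """Offsets touched, in execution order, by one of the two loop orders."""
    if outer_is_row:
        # Row loop outermost, column loop innermost: walks along a row.
        return [column_major_offset(r, c)
                for r in range(1, ROWS + 1) for c in range(1, COLS + 1)]
    # Column loop outermost, row loop innermost: walks down a column.
    return [column_major_offset(r, c)
            for c in range(1, COLS + 1) for r in range(1, ROWS + 1)]


def strides(offsets):
    """Differences between consecutive offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(strides(visit_order(outer_is_row=True))[:4])   # stride 6 inside a row
    print(strides(visit_order(outer_is_row=False))[:4])  # stride 1 inside a column
```

With the row loop innermost, every access is adjacent in memory to the previous one. That is the "good cache locality" case.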
Data Dependences

Recall
  - A data dependence defines an ordering relationship between two statements
  - In executing statements, data dependences must be respected to preserve correctness

Example: are these two orderings equivalent?

    s1  a := 5;              s1  a := 5;
    s2  b := a + 1;    =?    s3  a := 6;
    s3  a := 6;              s2  b := a + 1;

Data Dependences and Loops

How do we identify dependences in loops?

    do i = 1,5
      A(i) = A(i-1)+1
    enddo

Simple view
  - Imagine that all loops are fully unrolled
  - Examine data dependences as before

    A(1) = A(0)+1
    A(2) = A(1)+1
    A(3) = A(2)+1
    A(4) = A(3)+1
    A(5) = A(4)+1

Problems
  - Impractical and often impossible
  - Lose loop structure

Concepts needed for automating loop transformations

Questions
  - How do we determine if a transformation or parallelization is legal?
  - What abstraction do we use for loops?
  - How do we represent transformations and parallelization?
  - How do we generate the transformed code?
  - How do we determine when a transformation is going to be beneficial?

Today
  - Basic abstractions for loops and dependences and computing dependences

Thursday
  - Abstractions for loop transformations and determining their legality
  - Code generation after performing a loop transformation

Dependences and Loops

Loop-independent dependences

    do i = 1,100
      A(i) = B(i)+1
      C(i) = A(i)*2
    enddo

  Dependences within the same loop iteration

Loop-carried dependences

    do i = 1,100
      A(i) = B(i)+1
      C(i) = A(i-1)*2
    enddo

  Dependences that cross loop iterations

Dependence Testing in General

General code

    do i_1 = l_1,h_1
      ...
      do i_n = l_n,h_n
        ... A(f(i_1,...,i_n)) = ... A(g(i_1,...,i_n))
      enddo
      ...
    enddo

There exists a dependence between iterations I=(i_1,...,i_n) and J=(j_1,...,j_n) when
  - f(I) = g(J)
  - (l_1,...,l_n) <= I, J <= (h_1,...,h_n), componentwise
  - I < J or J < I, where < is lexicographically less

CS553 Lecture: Data Dependence Analysis

Algorithms for Solving the Dependence Problem

Heuristics (can say NO or MAYBE)
  - GCD test (Banerjee76, Towle76): determines whether an integer solution is possible, no bounds checking
  - Banerjee test (Banerjee 79): checks real bounds
  - Independent-variables test (pg. 820): useful when inequalities are not coupled
  - I-Test (Kong et al. 90): integer solution in real bounds
  - Lambda test (Li et al. 90): all dimensions simultaneously
  - Delta test (Goff et al. 91): pattern matches for efficiency
  - Power test (Wolfe et al. 92): extended GCD and Fourier-Motzkin combination

Use some form of Fourier-Motzkin elimination for integers (exponential worst case)
  - Parametric Integer Programming (Feautrier91)
  - Omega test (Pugh92)

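For small, constant loop bounds, the general dependence condition (f(I) = g(J), both iterations inside the bounds, and I different from J) can be checked by exhaustive enumeration. The sketch below is only an illustration of that definition. The function name and calling convention are assumptions, and the cost grows with the square of the iteration count, which is the expense the tests listed above are designed to avoid.

```python
from itertools import product


def find_dependences(f, g, lower, upper):
    """Return all pairs (I, J) with I != J, both inside the inclusive
    bounds, such that the write subscript f(I) equals the read subscript g(J)."""
    space = list(product(*(range(lo, hi + 1) for lo, hi in zip(lower, upper))))
    return [(I, J) for I in space for J in space
            if I != J and f(*I) == g(*J)]


if __name__ == "__main__":
    # A(i) = A(i-1)+1 for i = 1..5: iteration i writes what iteration i+1 reads.
    print(find_dependences(lambda i: i, lambda i: i - 1, [1], [5]))
    # A(i) = A(i)+1 for i = 1..5: no two distinct iterations touch the same element.
    print(find_dependences(lambda i: i, lambda i: i, [1], [5]))
```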
Dependence Testing

Consider the following code

    do i = 1,5
      A(3*i+2) = A(2*i+1)+1
    enddo

Question
  - How do we determine whether one array reference depends on another across iterations of an iteration space?

Dependence Testing: Simple Case

Sample code

    do i = l,h
      A(a*i+c_1) = ... A(a*i+c_2)
    enddo

Dependence?
  - a*i_1 + c_1 = a*i_2 + c_2, or
  - a*i_1 - a*i_2 = c_2 - c_1
  - A solution may exist if a divides c_2 - c_1

GCD Test

Idea
  - Generalize the test to linear functions of iterators/induction variables

Code

    do i = l_i,h_i
      do j = l_j,h_j
        A(a_1*i + a_2*j + a_0) = ... A(b_1*i + b_2*j + b_0) ...
      enddo
    enddo

Again
  - a_1*i_1 - b_1*i_2 + a_2*j_1 - b_2*j_2 = b_0 - a_0
  - An integer solution exists if gcd(a_1,a_2,b_1,b_2) divides b_0 - a_0 (loop bounds are not checked)

Example

Code

    do i = l_i,h_i
      do j = l_j,h_j
        A(4*i + 2*j + 1) = ... A(6*i + 2*j + 4) ...
      enddo
    enddo

  - gcd(4,-6,2,-2) = 2
  - Does 2 divide 4-1? No, 3 is odd, so the two references never touch the same element.

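The GCD test above reduces to a few lines of arithmetic. The following Python sketch assumes each subscript is given as a list of coefficients plus a constant term. The function name and argument layout are choices made for this illustration, not a standard API.

```python
from functools import reduce
from math import gcd


def gcd_test(write_coeffs, read_coeffs, a0, b0):
    """Return False when the GCD test proves independence, and True when an
    integer solution MAY exist (loop bounds are ignored)."""
    g = reduce(gcd, list(write_coeffs) + list(read_coeffs), 0)
    if g == 0:
        # All coefficients are zero: both subscripts are constants.
        return a0 == b0
    return (b0 - a0) % g == 0


if __name__ == "__main__":
    # A(4*i + 2*j + 1) = ... A(6*i + 2*j + 4): gcd is 2, and 2 does not divide 3.
    print(gcd_test([4, 2], [6, 2], a0=1, b0=4))   # False
    # A(2*i + 2) = A(2*i - 2) + 1: gcd is 2, and 2 divides -4.
    print(gcd_test([2], [2], a0=2, b0=-2))        # True
```

A `True` result only means "maybe": the solution may lie outside the loop bounds, which is what bounds-aware tests such as the Banerjee test check.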
Banerjee Test

    for (i=L; i<=U; i++) {
      x[a0 + a1*i] = ...
      ... = x[b0 + b1*i]
    }

  - Does a0 + a1*i = b0 + b1*i' for some real i and i'? If so, then (a1*i - b1*i') = (b0 - a0)
  - Determine upper and lower bounds on (a1*i - b1*i')

    for (i=1; i<=5; i++) {
      x[i+5] = x[i];
    }

  - upper bound = a1*max(i) - b1*min(i') = 4
  - lower bound = a1*min(i) - b1*max(i') = -4
  - b0 - a0 = 0 - 5 = -5, which lies outside [-4, 4], so there is no dependence

Example 1: Loop Permutation (reprise)

Sample code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i,j)+1
      enddo
    enddo

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i,j)+1
      enddo
    enddo

Why is this legal?
  - No loop-carried dependences, so we can arbitrarily change the order of iteration execution

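Returning to the Banerjee test, the bounds check for a single loop can be sketched as follows. This is a simplified, single-loop version written for illustration. The function names are assumptions, and it takes the minimum and maximum over the loop endpoints so that negative coefficients are also handled.

```python
def banerjee_bounds(a1, b1, lo, hi):
    """Real-valued lower and upper bounds of a1*i - b1*i2 for lo <= i, i2 <= hi."""
    write_terms = (a1 * lo, a1 * hi)    # extremes of a1*i
    read_terms = (-b1 * lo, -b1 * hi)   # extremes of -b1*i2
    return (min(write_terms) + min(read_terms),
            max(write_terms) + max(read_terms))


def banerjee_test(a0, a1, b0, b1, lo, hi):
    """Return False when the bounds prove independence, True for 'maybe'."""
    lower, upper = banerjee_bounds(a1, b1, lo, hi)
    return lower <= b0 - a0 <= upper


if __name__ == "__main__":
    # x[i+5] = x[i] for i = 1..5: bounds are [-4, 4], and b0 - a0 = -5.
    print(banerjee_bounds(1, 1, 1, 5))          # (-4, 4)
    print(banerjee_test(5, 1, 0, 1, 1, 5))      # False
```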
Example 2: Parallelization (reprise)

Why can't this loop be parallelized?

    do i = 1,100
      A(i) = A(i-1)+1
    enddo

  Loop-carried dependence

Why can this loop be parallelized?

    do i = 1,100
      A(i) = A(i)+1
    enddo

  No loop-carried dependence: there is no solution to the dependence problem

Iteration Spaces

Idea
  - Explicitly represent the iterations of a loop nest

Example

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

Iteration Space
  - A set of tuples that represents the iterations of a loop
  - Can visualize the dependences in an iteration space

[Figure: iteration space]

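An iteration space and its dependences can be enumerated directly for small bounds. The sketch below builds the set of tuples for a 6x5 loop nest and lists the dependence edges produced by a constant offset of (1,1) between the iteration that writes an element and the iteration that reads it, as in the example above. The function names are assumptions made for this illustration.

```python
from itertools import product


def iteration_space(bounds):
    """All iteration tuples of a loop nest, in execution (lexicographic) order.
    bounds is a list of inclusive (low, high) pairs, outermost loop first."""
    return list(product(*(range(lo, hi + 1) for lo, hi in bounds)))


def dependence_edges(space, offset):
    """Pairs (source, target) with target = source + offset, both in the space."""
    points = set(space)
    edges = []
    for src in space:
        dst = tuple(s + d for s, d in zip(src, offset))
        if dst in points:
            edges.append((src, dst))
    return edges


if __name__ == "__main__":
    space = iteration_space([(1, 6), (1, 5)])
    edges = dependence_edges(space, (1, 1))
    print(len(space), len(edges))
    print(edges[:3])
```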
Distance Vectors

Idea
  - Concisely describe dependence relationships between iterations of an iteration space
  - For each dimension of an iteration space, the distance is the number of iterations between accesses to the same memory location

Definition
  - v = i^T - i^S (target iteration minus source iteration)

Example

    do i = 1,6            ! outer loop
      do j = 1,5          ! inner loop
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

  Distance vector: (1,2)

Distance Vectors and Loop Transformations

Idea
  - Any transformation we perform on the loop must respect the dependences

Example

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

Can we permute the i and j loops?

Distance Vectors and Loop Transformations

Idea
  - Any transformation we perform on the loop must respect the dependences

Example

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i-1,j-2)+1
      enddo
    enddo

Can we permute the i and j loops?
  - Yes

Distance Vectors: Legality

Definition
  - A dependence vector, v, is lexicographically nonnegative when the leftmost nonzero entry in v is positive or all elements of v are zero
      Yes: (0,0,0), (0,1), (0,2,-2)
      No: (-1), (0,-2), (0,-1,1)
  - A dependence vector is legal when it is lexicographically nonnegative (assuming that indices increase as we iterate)

Why are lexicographically negative distance vectors illegal?

What are legal direction vectors?

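The legality definition translates directly into code. The sketch below checks lexicographic nonnegativity and uses it to decide whether a loop permutation keeps every distance vector legal. The function names, and the encoding of a permutation as a tuple of old loop positions, are assumptions made for this illustration.

```python
def is_lex_nonnegative(v):
    """True if the first nonzero entry of v is positive, or v is all zeros."""
    for d in v:
        if d > 0:
            return True
        if d < 0:
            return False
    return True


def permute(v, order):
    """Reorder a distance vector; order[k] is the old position of new loop k."""
    return tuple(v[k] for k in order)


def permutation_is_legal(distance_vectors, order):
    """A permutation is legal if every permuted distance vector stays legal."""
    return all(is_lex_nonnegative(permute(v, order)) for v in distance_vectors)


if __name__ == "__main__":
    # Distance vector (1,2): interchanging the two loops gives (2,1), still legal.
    print(permutation_is_legal([(1, 2)], (1, 0)))    # True
    # A vector such as (1,-1) would become (-1,1) and make the interchange illegal.
    print(permutation_is_legal([(1, -1)], (1, 0)))   # False
```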
Data Dependence Terminology

We say statement s2 depends on s1
  - True (flow) dependence: s1 writes memory that s2 later reads
  - Anti-dependence: s1 reads memory that s2 later writes
  - Output dependence: s1 writes memory that s2 later writes
  - Input dependence: s1 reads memory that s2 later reads

Notation: s1 δ s2
  - s1 is called the source of the dependence
  - s2 is called the sink or target
  - s1 must be executed before s2

Example

Code

    do i = l,h
      A(2*i+2) = A(2*i-2)+1
    enddo

Dependence?
  - 2*i_1 + 2 = 2*i_2 - 2, so 2*i_1 - 2*i_2 = -2 - 2 = -4 (yes, 2 divides -4)

Kind of dependence?
  - Anti? i_2 + d = i_1, so d = -2
  - Flow? i_1 + d = i_2, so d = 2 (the distance is positive, so this is a flow dependence)

Example

Sample code

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

  Kind of dependence: Flow
  Distance vector: (1,-1)

Exercise

Sample code

    do j = 1,5
      do i = 1,6
        A(i,j) = A(i-1,j+1)+1
      enddo
    enddo

  Kind of dependence: Anti
  Distance vector: (1,-1)

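The Example and Exercise above use the same statement and differ only in loop order, which is what changes a flow dependence into an anti dependence. The sketch below classifies such a dependence for the restricted case where each subscript is a loop variable plus a constant. That restriction, the function name, and the argument encoding are all assumptions made for this illustration.

```python
def classify(write_off, read_off, loop_dims):
    """Classify the dependence between a write A(x + w) and a read A(x + r).

    write_off, read_off: constant offsets, one per array dimension.
    loop_dims: for each loop, outermost first, the array dimension that its
               loop variable indexes.
    Returns (kind, distance_vector) with the distance taken from source to target.
    """
    # Same element when read_iter + r == write_iter + w in every dimension,
    # so read_iter - write_iter == w - r.
    raw = tuple(write_off[d] - read_off[d] for d in loop_dims)
    first = next((x for x in raw if x != 0), 0)
    if first > 0:
        return ("flow", raw)                       # the read happens later
    if first < 0:
        return ("anti", tuple(-x for x in raw))    # the read happens earlier
    return ("loop-independent", raw)


if __name__ == "__main__":
    # Write A(i,j), read A(i-1,j+1); loop over the first subscript outermost.
    print(classify((0, 0), (-1, 1), loop_dims=(0, 1)))   # ('flow', (1, -1))
    # Same statement, loop over the second subscript outermost.
    print(classify((0, 0), (-1, 1), loop_dims=(1, 0)))   # ('anti', (1, -1))
```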
Example 2: Parallelization (reprise)

Why can't this loop be parallelized?

    do i = 1,100
      A(i) = A(i-1)+1
    enddo

  Distance vector: (1)

Why can this loop be parallelized?

    do i = 1,100
      A(i) = A(i)+1
    enddo

  Distance vector: (0)

Protein String Matching Example

    q = k_1
    r = k_2
    score[i,j] = 0 for the whole array
    for i=1 to n1-1
      h[i,0] = p[i,0] = 0
      f[i,0] = -q
      for j=1 to n0-1
        f[i,j] = max(f[i,j-1],h[i,j-1]-q)-r
        EE[i,j] = max(EE[i-1,j],HH[i-1,j],-q)-r
        h[i,j] = p[i,j-1] + pam2[aa1[i],aa0[j]]
        h[i,j] = max( max(0,EE[i,j]), max(f[i,j],h[i,j]))
        p[i,j] = HH[i-1,j]
        HH[i,j] = h[i,j]
        score[i,j] = max(score[i,j-1],h[i,j])
      endfor
    endfor
    return score[n1-1,n0-1]

Loop-Carried Dependences

Definition
  - A dependence D=(d_1,...,d_n) is carried at loop level i if d_i is the first nonzero element of D

Example

    do i = 1,6
      do j = 1,6
        A(i,j) = B(i-1,j)+1
        B(i,j) = A(i,j-1)*2
      enddo
    enddo

Distance vectors
  - (0,1) for accesses to A
  - (1,0) for accesses to B

Loop-carried dependences
  - The j loop carries the dependence due to A
  - The i loop carries the dependence due to B

Parallelization

Idea
  - Each iteration of a loop may be executed in parallel if that loop carries no dependences

Example (different from last slide)

    do j = 1,5
      do i = 1,6
        A(i,j) = B(i-1,j-1)+1
        B(i,j) = A(i,j-1)*2
      enddo
    enddo

Distance vectors
  - (1,0) for A (flow)
  - (1,1) for B (flow)

Can either loop be parallelized?
  - Both dependences are carried by the outer j loop, so the inner i loop carries none and its iterations can run in parallel

[Figure: iteration space]

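The definition of "carried at loop level" and the parallelization rule can be sketched as follows. Loop levels are numbered from 1 for the outermost loop. The function names are assumptions made for this illustration.

```python
def carried_level(distance):
    """Loop level (1 = outermost) of the first nonzero entry, or None when the
    dependence is loop-independent (all entries are zero)."""
    for level, d in enumerate(distance, start=1):
        if d != 0:
            return level
    return None


def parallel_loops(distance_vectors, depth):
    """Levels of the loops that carry no dependence and may run in parallel."""
    carried = {carried_level(d) for d in distance_vectors}
    return [level for level in range(1, depth + 1) if level not in carried]


if __name__ == "__main__":
    # Distance vectors (0,1) and (1,0): each loop carries one dependence.
    print(parallel_loops([(0, 1), (1, 0)], depth=2))   # []
    # Distance vectors (1,0) and (1,1): both carried by the outer loop.
    print(parallel_loops([(1, 0), (1, 1)], depth=2))   # [2]
```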
Loop-Carried, Storage-Related Dependences

Problem
  - Loop-carried dependences inhibit parallelism
  - Scalar references result in loop-carried dependences

Example

    do i = 1,6
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

Can this loop be parallelized?
  - No.

What kind of dependences are these?
  - Anti dependences.

Convention for these slides: arrays start with upper case letters, scalars do not

Direction Vector

Definition
  - A direction vector serves the same purpose as a distance vector when less precision is required or available
  - Element i of a direction vector is <, >, or = based on whether the source of the dependence precedes, follows, or is in the same iteration as the target in loop i

Example

    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

  Direction vector: (<,<)
  Distance vector: (1,1)

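When a distance vector is known, the corresponding direction vector follows from the sign of each entry. A minimal sketch, with the function name assumed for this illustration:

```python
def direction_vector(distance):
    """Map each distance entry to '<' (source iteration precedes the target),
    '>' (source follows the target), or '=' (same iteration) for that loop."""
    return tuple('<' if d > 0 else '>' if d < 0 else '=' for d in distance)


if __name__ == "__main__":
    print(direction_vector((1, 1)))    # ('<', '<')
    print(direction_vector((1, -1)))   # ('<', '>')
    print(direction_vector((0, 2)))    # ('=', '<')
```

The reverse mapping is not possible in general: a direction vector records only the sign of each distance, which is why it is the less precise of the two abstractions.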
Removing False Dependences with Scalar Expansion

Idea
  - Eliminate false dependences by introducing extra storage

Example

    do i = 1,6
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)
    enddo

Can this loop be parallelized?

Disadvantages?

Scalar Expansion Details

Restrictions
  - The loop must be a countable loop, i.e. the loop trip count must be independent of the body of the loop
  - The expanded scalar must have no upward exposed uses in the loop

        do i = 1,6
          print(t)
          t = A(i) + B(i)
          C(i) = t + 1/t
        enddo

  - Nested loops may require much more storage
  - When the scalar is live after the loop, we must move the correct array value into the scalar

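The effect of scalar expansion can be checked with a small experiment: once the scalar t is replaced by an array T, the iterations no longer share storage, so executing them in any order gives the same result as the original sequential loop. The Python sketch below uses 0-based lists and an explicit iteration order to stand in for parallel execution. Both choices, and the function names, are assumptions made for this illustration.

```python
def scalar_version(A, B):
    """Original loop: every iteration reuses the single scalar t."""
    C = [0.0] * len(A)
    for i in range(len(A)):
        t = A[i] + B[i]
        C[i] = t + 1 / t
    return C


def expanded_version(A, B, order):
    """Scalar-expanded loop: T[i] is private to iteration i, so the
    iterations may be executed in any order."""
    n = len(A)
    T = [0.0] * n   # the extra storage introduced by the expansion
    C = [0.0] * n
    for i in order:
        T[i] = A[i] + B[i]
        C[i] = T[i] + 1 / T[i]
    return C


if __name__ == "__main__":
    A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    B = [1.0] * 6
    reversed_order = list(reversed(range(6)))
    print(scalar_version(A, B) == expanded_version(A, B, reversed_order))
```

The array T costs one element per iteration, which is the storage disadvantage the slide asks about.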
Concepts

Improve performance by...
  - improving data locality
  - parallelizing the computation

Data Dependence Testing
  - general formulation of the problem
  - GCD test and Banerjee test

Data Dependences
  - iteration space
  - distance vectors and direction vectors
  - loop carried

Transformation legality
  - must respect data dependences

Next Time

Reading
  - Ch 11.4-11.6

Lecture
  - Abstractions for loop transformations and checking their legality
  - Code generation after transformations have been performed