15-745 Lecture 6: Data Dependence in Loops
Copyright Seth Goldstein, 2008 (15-745, 2005-8). Based on slides from Allen & Kennedy.

Common loop optimizations
- Hoisting of loop-invariant computations: pre-compute before entering the loop.
- Elimination of induction variables: change p = i*w + b to p = b, p += w, when w and b are invariant.
- Loop unrolling: to improve scheduling of the loop body.
- Software pipelining: to improve scheduling of the loop body.
- Loop permutation: to improve cache memory performance.
All of these require understanding data dependences.

Why Dependence Analysis
The goal is to find the best schedule:
- Improve memory locality
- Increase parallelism
- Decrease scheduling stalls
Before we schedule, we need to know the possible legal schedules and the impact of each schedule on performance.

Example to improve locality
    for i = 0 to N
      for j = 0 to M
        A[j] = f(A[j]);
Unroll to see deps:
    A[0] = f(A[0])
    A[1] = f(A[1])
    A[2] = f(A[2])
    ...
    A[M] = f(A[M])
    A[0] = f(A[0])
    ...
Is there a better schedule?
[Figure: iteration space]
Example to improve locality (continued)
Interchanging the two loops gives a transformed iteration space:
    for j = 0 to M
      for i = 0 to N
        A[j] = f(A[j]);
[Figure: old and new iteration spaces, each point labeled with the element it accesses, A[0] to A[3]]

What about
    for i = 0 to N
      for j = 0 to M
        A[j] = f(A[j]);
        B[i] = f(B[i]);
Is there a better schedule?
Unroll to see deps:
    A[0] = f(A[0])
    B[0] = f(B[0])
    A[1] = f(A[1])
    B[0] = f(B[0])
    ...
    A[M] = f(A[M])
    B[0] = f(B[0])
    A[0] = f(A[0])
    B[1] = f(B[1])
    ...
[Figure: iteration space, each point labeled with the elements of A and B it accesses, A[0] to A[3] and B[0] to B[3]]
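The interchange in the first example can be tried out directly. The following is a minimal sketch, not the lecture's code: `apply_schedule`, the stand-in function `f`, and the small bounds are all assumptions made for illustration, and the ranges are half-open where the slides use inclusive bounds. It runs the same loop body under both visitation orders and compares the results.

```python
def apply_schedule(a, schedule, f):
    """Run A[j] = f(A[j]) once for every (i, j) point, in the given order."""
    a = list(a)
    for _i, j in schedule:
        a[j] = f(a[j])
    return a

def f(x):
    # Hypothetical stand-in for the slides' f.
    return 2 * x + 1

N, M = 2, 4
original = [(i, j) for i in range(N) for j in range(M)]       # i outer, j inner
interchanged = [(i, j) for j in range(M) for i in range(N)]   # j outer, i inner

start = [0, 1, 2, 3]
same_result = apply_schedule(start, original, f) == apply_schedule(start, interchanged, f)

# Order in which elements of A are touched under each schedule.
touched_original = [j for _i, j in original]
touched_interchanged = [j for _i, j in interchanged]
```

In the interchanged order all accesses to one element of A are adjacent, which is the locality improvement the iteration-space picture illustrates.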
But, what if
    for i = 0 to N
      for j = 1 to M
        A[j] = f(A[j-1]);
Can we reschedule?
Unroll to see deps:
    A[1] = f(A[0])
    A[2] = f(A[1])
    A[3] = f(A[2])
    ...
    A[M] = f(A[M-1])
    A[1] = f(A[0])
    A[2] = f(A[1])
    A[3] = f(A[2])
    ...
[Figure: iteration space, each point labeled with the element written and the element read: A[1] A[0], A[2] A[1], A[3] A[2], A[4] A[3]]

So, how do we know when/how?
- When should we transform a loop?
- What transforms are legal?
- How should we transform the loop?
Dependence information helps with all three questions.
In short:
- Determine all dependence information.
- Use dependence information to analyze the loop.
- Guide transformations using dependence info.
Key is: any (reordering) transformation that preserves every dependence in a program preserves the meaning of the program.
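To see why this loop cannot be rescheduled freely, here is a small sketch under stated assumptions: `run`, the stand-in `f`, and the starting array are hypothetical. It executes A[j] = f(A[j-1]) once with the original inner-loop order and once with the inner loop reversed, a reordering that does not preserve the flow dependence from A[j-1] to A[j].

```python
def f(x):
    # Hypothetical stand-in for the slides' f.
    return x + 1

def run(a, n_outer, j_order):
    """Execute A[j] = f(A[j-1]) for each outer iteration, visiting j in j_order."""
    a = list(a)
    for _ in range(n_outer):
        for j in j_order:
            a[j] = f(a[j - 1])
    return a

start = [1, 0, 0, 0]
forward = run(start, 1, range(1, 4))        # original schedule: j = 1, 2, 3
backward = run(start, 1, range(3, 0, -1))   # reversed inner loop: j = 3, 2, 1
```

The two schedules produce different arrays, so reversing the inner loop is not a legal transformation here.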
Dependences in Loops
- A loop-independent data dependence occurs between accesses in the same loop iteration.
- A loop-carried data dependence occurs between accesses across different loop iterations.
There is a data dependence between access a at iteration i-k and access b at iteration i when:
- a and b access the same memory location,
- there is a path from a to b, and
- either a or b is a write.

Defining Dependences
    Flow dependence     W -> R    δf    (true)
    Anti-dependence     R -> W    δa    (false)
    Output dependence   W -> W    δo    (false)

    S1) a = 0;
    S2) b = a;
    S3) c = a + d + e;
    S4) d = b;
    S5) b = 5 + e;

Example Dependences
For the statements S1 to S5 above:
    source   type   target   due to
    S1       δf     S2       a
    S1       δf     S3       a
    S2       δf     S4       b
    S3       δa     S4       d
    S4       δa     S5       b
    S2       δo     S5       b
These are scalar dependences. The same idea holds for memory accesses.
What can we do with this information?
Why are anti- and output dependences called false dependences?
[Figure: dependence graph over statements 1 to 5]

Data Dependence in Loops
(slide from 15-411, Fall '01, Seth Copen Goldstein, 2001)
- Dependence can flow across iterations of the loop.
- Dependence information is annotated with iteration information.
- If the dependence is across iterations it is loop carried, otherwise loop independent.
    for (i=0; i<n; i++) {
      A[i] = B[i];
      B[i+1] = A[i];
    }
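The scalar dependence table for S1 to S5 can be recomputed mechanically from each statement's write and read sets. This is a minimal sketch for straight-line code only; the tuple encoding and the helper `dependences` are assumptions for illustration, not part of the lecture.

```python
# (name, variables written, variables read) for each statement, in program order.
stmts = [
    ("S1", {"a"}, set()),             # a = 0
    ("S2", {"b"}, {"a"}),             # b = a
    ("S3", {"c"}, {"a", "d", "e"}),   # c = a + d + e
    ("S4", {"d"}, {"b"}),             # d = b
    ("S5", {"b"}, {"e"}),             # b = 5 + e
]

def dependences(stmts):
    """Return the set of (source, type, target, variable) for straight-line code."""
    deps = set()
    for s, (src, w_s, r_s) in enumerate(stmts):
        for t in range(s + 1, len(stmts)):
            snk, w_t, r_t = stmts[t]
            # Variables overwritten strictly between source and sink hide the pair.
            killed = set().union(*[w for _, w, _ in stmts[s + 1:t]])
            deps |= {(src, "flow", snk, v) for v in (w_s & r_t) - killed}
            deps |= {(src, "anti", snk, v) for v in (r_s & w_t) - killed}
            deps |= {(src, "output", snk, v) for v in (w_s & w_t) - killed}
    return deps

table = dependences(stmts)
```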
Data Dependence in Loops (annotated)
    for (i=0; i<n; i++) {
      A[i] = B[i];
      B[i+1] = A[i];
    }
- δf loop carried: B[i+1] is written in one iteration and read as B[i] in the next.
- δf loop independent: A[i] is written and then read in the same iteration.

Data Dependence
There is a data dependence from statement S1 to statement S2 (S2 depends on S1) if:
1. Both statements access the same memory location and at least one of them stores onto it, and
2. There is a feasible run-time execution path from S1 to S2.

We need to characterize the dependence information in terms of the loop iterations involved in the dependence, so we need a way to talk about iterations of a loop.
- Iteration vector: a label for a loop iteration using the induction variables.
- Iteration space: the set of all possible iteration vectors for a loop.
- Lexicographic order: the order of the iterations.

Iteration Space
Every iteration generates a point in an n-dimensional space, where n is the depth of the loop nest.
    for (i=0; i<n; i++) {
      ...
    }

    for (i=0; i<n; i++)
      for (j=0; j<4; j++) {
        ...
      }
[Figure: iteration-space points for the one-deep and two-deep loops] (T. Mowry)

Iteration Vectors
- Need to consider the nesting level of a loop.
- The nesting level of a loop is equal to one more than the number of loops that enclose it.
- Given a nest of n loops, the iteration vector $i$ of a particular iteration of the innermost loop is a vector of integers that contains the iteration numbers for each of the loops in order of nesting level.
- Thus, the iteration vector is $\{i_1, i_2, \ldots, i_n\}$, where $i_k$, $1 \le k \le n$, represents the iteration number for the loop at nesting level $k$.
Ordering of Iteration Vectors
- Define an ordering for iteration vectors.
- Use an intuitive, lexicographic order.
- Iteration $i$ precedes iteration $j$, denoted $i < j$, iff:
  1. $i[1:n-1] < j[1:n-1]$, or
  2. $i[1:n-1] = j[1:n-1]$ and $i_n < j_n$.
  Equivalently, at the first position $k$ where the two vectors differ, $i_k < j_k$.

Example Iteration Space / Visitation Order in Iteration Space
    for i = 0 to N-1
      for j = 0 to N-1
        A[i][j] = B[i][j];
- Each position represents an iteration.
- Note: iteration space is not data space.
[Figure: iteration space and the order in which its points are visited] (T. Mowry)
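The lexicographic ordering of iteration vectors can be written down in a few lines. This is an illustrative sketch; `precedes` and `iteration_space` are hypothetical helper names, and loops are assumed to run from 0 with unit stride.

```python
from itertools import product

def precedes(i, j):
    """True iff iteration vector i comes before j in lexicographic order."""
    for ik, jk in zip(i, j):
        if ik != jk:
            return ik < jk   # decided at the first position where they differ
    return False             # identical vectors: neither precedes the other

def iteration_space(trip_counts):
    """All iteration vectors of a rectangular nest, in execution order."""
    return list(product(*(range(n) for n in trip_counts)))

space = iteration_space((2, 3))   # two-deep nest: i in 0..1, j in 0..2
```

Executing a loop nest visits the iteration space in exactly this order, which is why consecutive elements of `space` always satisfy `precedes`.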
Formal Def of Loop Dependence
There exists a dependence from statement S1 to statement S2 in a common nest of loops iff there exist two iteration vectors $i$ and $j$ for the nest, such that
(1) (a) $i < j$, or (b) $i = j$ and there is a path from S1 to S2 in the body of the loop,
(2) statement S1 accesses memory location M on iteration $i$ and statement S2 accesses location M on iteration $j$, and
(3) one of these accesses is a write.
Case 1a is loop carried; case 1b is loop independent.
S1 is the source of the dependence; S2 is the sink or target of the dependence.

Dependence Distance
Using iteration vectors and the definition of dependence, we can determine the distance of a dependence. In an n-deep loop nest, if
- S1 is the source in iteration $i$, and
- S2 is the sink in iteration $j$,
then the distance of the dependence is represented with a distance vector $D$: a vector of length n, where $d_k = j_k - i_k$.

Distance Vector
    for (i=0; i<n; i++) {
      A[i] = B[i];
      B[i+1] = A[i];
    }

    i=0:   A[0] = B[0];   B[1] = A[0];
    i=1:   A[1] = B[1];   B[2] = A[1];
    i=2:   A[2] = B[2];   B[3] = A[2];
The distance vector is the difference between the target and source iterations: $d = I_t - I_s$.
It is exactly the distance of the dependence, i.e., $I_s + d = I_t$.

Example of Distance Vectors
    for (i=0; i<n; i++)
      for (j=0; j<m; j++) {
        A[i,j] = ;        = A[i,j];
        B[i,j+1] = ;      = B[i,j];
        C[i+1,j] = ;      = C[i,j+1];
      }
[Figure: iteration space for i = 0..2, j = 0..2; each point lists the elements it writes and reads, e.g. at (0,2): A[0,2] written and read, B[0,3] written and B[0,2] read, C[1,2] written and C[0,3] read] (T. Mowry)
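For the two-deep A/B/C example, distance vectors can be found by brute force: enumerate every pair of iterations, keep the pairs where the write and the read touch the same element and the write comes first, and subtract. The sketch below assumes small fixed bounds and treats the write as preceding the read within one iteration, as in the slide's statement order; `distance_vectors` is a hypothetical helper, not a practical dependence test.

```python
from itertools import product

def distance_vectors(write_sub, read_sub, n, m):
    """Distances (sink minus source) from each write to reads of the same element."""
    space = list(product(range(n), range(m)))
    dists = set()
    for src in space:
        for snk in space:
            # src <= snk: the write happens in the same or an earlier iteration.
            if src <= snk and write_sub(*src) == read_sub(*snk):
                dists.add(tuple(t - s for s, t in zip(src, snk)))
    return dists

dist_a = distance_vectors(lambda i, j: (i, j), lambda i, j: (i, j), 3, 3)
dist_b = distance_vectors(lambda i, j: (i, j + 1), lambda i, j: (i, j), 3, 3)
dist_c = distance_vectors(lambda i, j: (i + 1, j), lambda i, j: (i, j + 1), 3, 3)
```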
Example of Distance Vectors (results)
    for (i=0; i<n; i++)
      for (j=0; j<m; j++) {
        A[i,j] = ;        = A[i,j];
        B[i,j+1] = ;      = B[i,j];
        C[i+1,j] = ;      = C[i,j+1];
      }
- A yields (0, 0)
- B yields (0, 1)
- C yields (1, -1)
[Figure: iteration space for i = 0..2, j = 0..2, with the dependences drawn between points] (T. Mowry)

Direction Vectors
Less precise than distance vectors, but often good enough. In an n-deep loop nest, if
- S1 is the source in iteration $i$, and
- S2 is the sink in iteration $j$,
then the distance vector $F$ is a vector of length n, where $f_k = j_k - i_k$, and the direction vector is also a vector of length n, where
- $d_k$ is "<" if $f_k > 0$, i.e., $i_k < j_k$,
- $d_k$ is "=" if $f_k = 0$, i.e., $i_k = j_k$,
- $d_k$ is ">" if $f_k < 0$, i.e., $i_k > j_k$.

Example of Direction Vectors
For the same A/B/C loop nest:
- A yields (=, =)
- B yields (=, <)
- C yields (<, >)
(T. Mowry)

Direction Vectors
Example:
    DO I = 1, N
      DO J = 1, M
        DO K = 1, L
    S1    A(I+1, J, K-1) = A(I, J, K) + 10
        ENDDO
      ENDDO
    ENDDO
S1 has a true dependence on itself.
Distance vector: (1, 0, -1)
Direction vector: (<, =, >)
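Turning a distance vector into a direction vector is a component-wise sign test. A short sketch, with `direction` as a hypothetical helper name:

```python
def direction(distance):
    """Map each distance component to '<', '=' or '>' (sink relative to source)."""
    return tuple("<" if d > 0 else "=" if d == 0 else ">" for d in distance)

dir_a = direction((0, 0))
dir_b = direction((0, 1))
dir_c = direction((1, -1))
dir_s1 = direction((1, 0, -1))   # the Fortran example: A(I+1, J, K-1) = A(I, J, K) + 10
```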
Note on vectors
A dependence cannot exist if it has a direction vector whose leftmost non-"=" component is not "<", as this would imply that the sink of the dependence occurs before the source. Likewise, the first non-zero distance in a distance vector must be positive.

The Key
- Any reordering transformation that preserves every dependence in a program preserves the meaning of the program.
- A reordering transformation may change the order of execution but does not add or remove statements.

Main Theme: Finding Data Dependences
- Determining whether dependences exist between two subscripted references to the same array in a loop nest.
- Several tests to detect these dependences.
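The rule in the note on vectors is easy to state as a predicate. This is an illustrative sketch; `is_plausible` is a hypothetical name, and an all-"=" vector is taken to denote a loop-independent dependence.

```python
def is_plausible(direction):
    """True iff the leftmost non-'=' component is '<' (or every component is '=')."""
    for component in direction:
        if component != "=":
            return component == "<"
    return True   # all '=': source and sink are in the same iteration

ok = is_plausible(("<", "=", ">"))      # carried by the outermost loop
bad = is_plausible(("=", ">", "<"))     # would put the sink before the source
same_iter = is_plausible(("=", "="))
```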
The General Problem
    DO i_1 = L_1, U_1
      DO i_2 = L_2, U_2
        ...
          DO i_n = L_n, U_n
    S1      A(f_1(i_1,...,i_n), ..., f_m(i_1,...,i_n)) = ...
    S2      ... = A(g_1(i_1,...,i_n), ..., g_m(i_1,...,i_n))
A dependence exists from S1 to S2 if there exist iteration vectors $\alpha$ and $\beta$ such that
- $\alpha < \beta$ (control flow requirement), and
- $f_i(\alpha) = g_i(\beta)$ for all $i$, $1 \le i \le m$ (common access requirement).

Basics: Conservative Testing
- Consider only linear subscript expressions.
- Finding integer solutions to a system of linear Diophantine equations (within the loop bounds) is NP-complete.
- The most common approximation is conservative testing, i.e., see if you can assert "no dependence exists between two subscripted references of the same array".
- Never incorrect, may be less than optimal.

Basics: Indices and Subscripts
- Index: index variable for some loop surrounding a pair of references.
- Subscript: a PAIR of subscript positions in a pair of array references.
For example:
    A(I,j) = A(I,k) + C
<I,I> is the first subscript; <j,k> is the second subscript.

Basics: Complexity
A subscript is said to be
- ZIV if it contains no index (zero index variable),
- SIV if it contains only one index (single index variable),
- MIV if it contains more than one index (multiple index variable).
For example:
    A(5,I+1,j) = A(1,I,k) + C
The first subscript is ZIV, the second subscript is SIV, and the third subscript is MIV.
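The ZIV/SIV/MIV classification only needs the set of loop indices that appear on each side of a subscript pair. The encoding below (one set of index names per side) and the name `classify` are assumptions made for this sketch.

```python
def classify(lhs_indices, rhs_indices):
    """Classify a subscript pair by how many distinct loop indices it mentions."""
    count = len(set(lhs_indices) | set(rhs_indices))
    if count == 0:
        return "ZIV"
    return "SIV" if count == 1 else "MIV"

# A(5, I+1, j) = A(1, I, k) + C, one entry per subscript position.
pairs = [(set(), set()), ({"I"}, {"I"}), ({"j"}, {"k"})]
kinds = [classify(lhs, rhs) for lhs, rhs in pairs]
```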
Basics: Separability
- A subscript is separable if its indices do not occur in other subscripts.
- If two different subscripts contain the same index they are coupled.
For example:
    A(I+1,j) = A(k,j) + C        Both subscripts are separable
    A(I,j,j) = A(I,j,k) + C      Second and third subscripts are coupled

Basics: Coupled Subscript Groups
Why are they important? Coupling can cause imprecision in dependence testing.
    DO I = 1, 100
    S1  A(I+1,I) = B(I) + C
    S2  D(I) = A(I,I) * E
    ENDDO
Tested separately, the first subscript suggests a dependence of distance 1 and the second a dependence of distance 0. No single pair of iterations satisfies both, so there is in fact no dependence.

Dependence Testing: Overview
- Partition subscripts of a pair of array references into separable and coupled groups.
- Classify each subscript as ZIV, SIV or MIV. The reason for classification is to reduce the complexity of the tests.
- For each separable subscript apply a single subscript test. Continue until independence is proven.
- Deal with coupled groups.
- If independent, done.
- Otherwise, merge all direction vectors computed in the previous steps into a single set of direction vectors.

Step 1: Subscript Partitioning
Partitions the subscripts into separable and minimal coupled groups.
Notations:
    // S is a set of m subscript pairs S_1, S_2, ..., S_m, each enclosed in
    // n loops with indexes I_1, I_2, ..., I_n, which is to be
    // partitioned into separable or minimal coupled groups.
    // P is an output variable, containing the set of partitions
    // n_p is the number of partitions
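The imprecision in the coupled-group example (S1 writes A(I+1,I), S2 reads A(I,I)) can be checked by exhaustive search over the 100 iterations. This is only a sketch to make the point concrete; `solutions` and the two constraint lambdas are hypothetical names, with i the writing iteration and k the reading iteration.

```python
def solutions(lo, hi, constraints):
    """All (i, k) iteration pairs in [lo, hi] that satisfy every constraint."""
    return [(i, k)
            for i in range(lo, hi + 1)
            for k in range(lo, hi + 1)
            if all(c(i, k) for c in constraints)]

first = lambda i, k: i + 1 == k    # first subscript pair <I+1, I>
second = lambda i, k: i == k       # second subscript pair <I, I>

only_first = solutions(1, 100, [first])
only_second = solutions(1, 100, [second])
both = solutions(1, 100, [first, second])
```

Each subscript on its own admits solutions, but the two together admit none, which is what a test that handles the coupled group as a whole can discover.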
Subscript Partitioning Algorithm
    procedure partition(S, P, n_p)
      n_p = m;
      for i := 1 to m do P_i = {S_i};
      for i := 1 to n do begin
        k := <none>
        for each remaining partition P_j do
          if there exists s ∈ P_j such that s contains I_i then
            if k = <none> then k = j;
            else begin
              P_k = P_k ∪ P_j; discard P_j; n_p = n_p - 1;
            end
      end
    end partition

Step 2: Classify as ZIV/SIV/MIV
Easy step: just count the number of different indices in a subscript.

Step 3: Applying Single Subscript Tests
- ZIV Test
- SIV Test
  - Strong SIV Test
  - Weak SIV Test
    - Weak-zero SIV
    - Weak Crossing SIV
- SIV Tests in Complex Iteration Spaces

ZIV Test
    DO j = 1, 100
    S   A(e1) = A(e2) + B(j)
    ENDDO
e1 and e2 are constants or loop-invariant symbols.
If (e1 - e2) != 0, no dependence exists.
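The partition procedure above translates fairly directly into Python. The sketch below is one possible rendering under assumptions: each subscript pair is represented only by the set of loop indices it mentions, and partitions are returned as lists of subscript positions (0-based) rather than through output parameters.

```python
def partition(subscripts, loop_indices):
    """Group subscript positions into separable and minimal coupled groups."""
    parts = [[s] for s in range(len(subscripts))]   # P_i = {S_i}
    for index in loop_indices:
        first_hit = None          # plays the role of k in the pseudocode
        remaining = []
        for part in parts:
            if any(index in subscripts[s] for s in part):
                if first_hit is None:
                    first_hit = part
                    remaining.append(part)
                else:
                    first_hit.extend(part)   # P_k = P_k union P_j; discard P_j
            else:
                remaining.append(part)
        parts = remaining
    return parts

# A(I, j, j) = A(I, j, k) + C: second and third subscripts share j.
coupled = partition([{"I"}, {"j"}, {"j", "k"}], ["I", "j", "k"])
# A(I+1, j) = A(k, j) + C: no index is shared between subscripts.
separable = partition([{"I", "k"}, {"j"}], ["I", "j", "k"])
```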
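Strong SIV Test
Strong SIV subscripts are of the form $\langle a i + c_1,\; a i + c_2 \rangle$.
For example, the following are strong SIV subscripts:
    <i+1, i>,   <4i+2, 4i+4>

Strong SIV Test Example
    DO k = 1, 100
      DO j = 1, 100
    S1  A(j+1,k) = ...
    S2  ... = A(j,k) + 32
      ENDDO
    ENDDO

Strong SIV Test
$$d = i' - i = \frac{c_1 - c_2}{a}$$
A dependence exists if $d$ is an integer and $|d| \le U - L$.
[Figure: geometric view of the strong SIV test]

Weak SIV Tests
Weak SIV subscripts are of the form $\langle a_1 i + c_1,\; a_2 i + c_2 \rangle$.
For example, the following are weak SIV subscripts:
    <i+1, 5>,   <2i+1, i+5>,   <2i+1, -2i>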
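A compact sketch of the strong SIV test follows. It assumes the coefficient a is non-zero and that the loop runs from `lower` to `upper` with unit stride; `strong_siv` is a hypothetical helper that returns the distance d, or None when the test proves independence.

```python
from fractions import Fraction

def strong_siv(a, c1, c2, lower, upper):
    """Strong SIV test for the subscript pair <a*i + c1, a*i + c2>."""
    d = Fraction(c1 - c2, a)               # d = i' - i = (c1 - c2) / a
    if d.denominator != 1:                  # not an integer: no dependence
        return None
    if abs(d) > upper - lower:              # distance exceeds the iteration range
        return None
    return int(d)

d_example = strong_siv(1, 1, 0, 1, 100)     # <j+1, j> in DO j = 1, 100
d_fraction = strong_siv(4, 2, 4, 1, 100)    # <4i+2, 4i+4>: d = -1/2
d_too_far = strong_siv(1, 200, 0, 1, 100)   # d = 200 exceeds U - L = 99
```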
Geometric view of weak SIV
[Figure: geometric view of weak SIV]

Weak-zero SIV Test
Special case of weak SIV where one of the coefficients of the index is zero.
The test consists merely of checking whether the solution is an integer and is within the loop bounds:
$$i = \frac{c_2 - c_1}{a_1}$$
[Figure: geometric view of the weak-zero SIV test]

Weak-zero SIV & Loop Peeling
    DO i = 1, N
    S1  Y(i, N) = Y(1, N) + Y(N, N)
    ENDDO
can be loop peeled to
    Y(1, N) = Y(1, N) + Y(N, N)
    DO i = 2, N-1
    S1  Y(i, N) = Y(1, N) + Y(N, N)
    ENDDO
    Y(N, N) = Y(1, N) + Y(N, N)
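The weak-zero SIV test reduces to solving one linear equation and checking the answer. In this sketch the subscript pair is assumed to have the form <a1*i + c1, c2> with a1 non-zero; `weak_zero_siv` is a hypothetical helper that returns the single iteration involved in the dependence, or None.

```python
from fractions import Fraction

def weak_zero_siv(a1, c1, c2, lower, upper):
    """Weak-zero SIV test for the subscript pair <a1*i + c1, c2>."""
    i = Fraction(c2 - c1, a1)              # solve a1*i + c1 = c2
    if i.denominator == 1 and lower <= i <= upper:
        return int(i)
    return None

N = 10
hit_first = weak_zero_siv(1, 0, 1, 1, N)   # Y(i, N) against Y(1, N)
hit_last = weak_zero_siv(1, 0, N, 1, N)    # Y(i, N) against Y(N, N)
```

For the peeling example the only iterations involved are the first and the last, which is why peeling those two iterations leaves a loop with no carried dependence.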
Weak-crossing SIV Test
Special case of weak SIV where the coefficients of the index are equal in magnitude but opposite in sign.
The test consists merely of checking whether the solution index
$$i = \frac{c_2 - c_1}{2 a_1}$$
is
1. within the loop bounds, and
2. either an integer or has a non-integer part equal to 1/2.
[Figure: geometric view of the weak-crossing SIV test]

Weak-crossing SIV & Loop Splitting
    DO i = 1, N
    S1  A(i) = A(N-i+1) + C
    ENDDO
This loop can be split into
    DO i = 1, (N+1)/2
      A(i) = A(N-i+1) + C
    ENDDO
    DO i = (N+1)/2 + 1, N
      A(i) = A(N-i+1) + C
    ENDDO

Complex Iteration Spaces
- Till now we have applied the tests only to rectangular iteration spaces.
- These tests can also be extended to apply to triangular or trapezoidal loops.
- Triangular: one of the loop bounds is a function of at least one other loop index.
- Trapezoidal: both of the loop bounds are functions of at least one other loop index.
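The weak-crossing SIV test can be sketched the same way. The subscript pair is assumed to have the form <a1*i + c1, -a1*i + c2> with a1 non-zero; `weak_crossing_siv` is a hypothetical helper that returns the crossing point when a dependence is possible and None otherwise.

```python
from fractions import Fraction

def weak_crossing_siv(a1, c1, c2, lower, upper):
    """Weak-crossing SIV test for the subscript pair <a1*i + c1, -a1*i + c2>."""
    crossing = Fraction(c2 - c1, 2 * a1)    # i = (c2 - c1) / (2 * a1)
    in_bounds = lower <= crossing <= upper
    # Integer or half-integer means 2 * crossing is a whole number.
    whole_or_half = (2 * crossing).denominator == 1
    return crossing if in_bounds and whole_or_half else None

N = 10
cross = weak_crossing_siv(1, 0, N + 1, 1, N)   # A(i) = A(N-i+1) + C
```

With N = 10 the crossing point is 11/2, between iterations 5 and 6, which matches the split point (N+1)/2 used in the loop-splitting example.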
Next Time
- Complex iteration spaces
- MIV tests
- Tests in coupled groups
- Merging direction vectors