
Loop-independent vs. loop-carried dependences [§3.2]

Loop-carried dependence: dependence exists across iterations; i.e., if the loop is removed, the dependence no longer exists.

Loop-independent dependence: dependence exists within an iteration; i.e., if the loop is removed, the dependence still exists.

Example:

    for (i=1; i<n; i++) {
      S1: a[i] = a[i-1] + 1;
      S2: b[i] = a[i];
    }

    for (i=1; i<n; i++)
      for (j=1; j<n; j++)
        S3: a[i][j] = a[i][j-1] + 1;

    for (i=1; i<n; i++)
      for (j=1; j<n; j++)
        S4: a[i][j] = a[i-1][j] + 1;

S1[i] →T S1[i+1]: loop-carried
S1[i] →T S2[i]: loop-independent
S3[i,j] →T S3[i,j+1]: loop-carried on the for j loop; no loop-carried dependence in the for i loop
S4[i,j] →T S4[i+1,j]: no loop-carried dependence in the for j loop; loop-carried on the for i loop

Iteration-space Traversal Graph (ITG) [§3.2.1]

The ITG shows graphically the order of traversal in the iteration space. This is sometimes called the happens-before relationship. In an ITG,
  o A node represents a point in the iteration space.
  o A directed edge indicates the next point that will be encountered after the current point is traversed.

Example:

    for (i=1; i<4; i++)
      for (j=1; j<4; j++)
        S3: a[i][j] = a[i][j-1] + 1;

Lecture 5 Architecture of Parallel Computers
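The traversal order that an ITG depicts can also be listed mechanically. The following is a minimal sketch, not taken from the notes: itg_order is a hypothetical helper that records the points of a square iteration space in the order the loop nest visits them, so that consecutive entries correspond to ITG edges.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not from the notes): stores in
   pts[k] = {i, j} the points visited by
       for (i = lo; i < hi; i++) for (j = lo; j < hi; j++)
   in traversal order, and returns how many points were visited.
   pts must have room for (hi - lo) * (hi - lo) entries. */
size_t itg_order(int lo, int hi, int pts[][2]) {
    size_t k = 0;
    for (int i = lo; i < hi; i++) {
        for (int j = lo; j < hi; j++) {
            pts[k][0] = i;   /* row index of the k-th visited point */
            pts[k][1] = j;   /* column index of the k-th visited point */
            k++;
        }
    }
    return k;
}
```

Calling itg_order(1, 4, pts) covers the 3-by-3 example; each row is traversed left to right before the next row starts.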

[Figure: ITG for the loop nest above, with axes i and j]

Loop-carried Dependence Graph (LDG)

The LDG shows the true/anti/output dependence relationship graphically.
  o A node is a point in the iteration space.
  o A directed edge represents the dependence.

Example:

    for (i=1; i<4; i++)
      for (j=1; j<4; j++)
        S3: a[i][j] = a[i][j-1] + 1;

00 Edward F. Gehringer, CSC/ECE 506 Lecture Notes, Spring 00
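The edges of the LDG for S3 can be derived by tracking which iteration last wrote each array element. This is a minimal sketch under stated assumptions: ldg_true_edges_s3 and DIM are hypothetical names, and only true dependences are recorded.

```c
#include <stddef.h>

#define DIM 4   /* assumption: iteration space is 1..DIM-1 in i and j, as in the example */

/* Hypothetical helper: records the true-dependence edges of
       S3: a[i][j] = a[i][j-1] + 1;
   as edges[k] = {src_i, src_j, dst_i, dst_j} and returns the edge count. */
size_t ldg_true_edges_s3(int edges[][4]) {
    int written[DIM][DIM] = {{0}};   /* 1 once iteration (i,j) has written a[i][j] */
    size_t k = 0;
    for (int i = 1; i < DIM; i++) {
        for (int j = 1; j < DIM; j++) {
            /* S3 reads a[i][j-1]; if an earlier iteration wrote it, there is
               a true dependence from that writer to the current iteration. */
            if (written[i][j - 1]) {
                edges[k][0] = i; edges[k][1] = j - 1;
                edges[k][2] = i; edges[k][3] = j;
                k++;
            }
            written[i][j] = 1;       /* S3 writes a[i][j] */
        }
    }
    return k;
}
```

Every recorded edge stays within one row, which matches the observation that S3 carries a dependence on the for j loop only.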

[Figure: LDG for the loop nest above, with axes i and j]

Another example:

    for (i=1; i<=n; i++)
      for (j=1; j<=n; j++)
        S1: a[i][j] = a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j];

    for (i=1; i<=n; i++)
      for (j=1; j<=n; j++) {
        S2: a[i][j] = b[i][j] + c[i][j];
        S3: b[i][j] = a[i][j-1] * d[i][j];
      }

  o Draw the ITG.
  o List all the dependence relationships.

Note that there are two loop nests in the code. The first involves S1. The other involves S2 and S3.

What do we know about the ITG for these nested loops?

Dependence relationships for Loop Nest 1

True dependences:
  o S1[i,j] →T S1[i,j+1]
  o S1[i,j] →T S1[i+1,j]
Output dependences:
  o None
Anti-dependences:
  o S1[i,j] →A S1[i+1,j]
  o S1[i,j] →A S1[i,j+1]

Exercise: Suppose we dropped off the first half of S1, so we had

    S1: a[i][j] = a[i-1][j] + a[i+1][j];

or the last half, so we had

    S1: a[i][j] = a[i][j-1] + a[i][j+1];

Which of the dependences would still exist?
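The classification above depends only on whether the element being read is written by an earlier or a later iteration. A minimal sketch of that rule follows, assuming loops that count upward with i as the outer loop and bounds wide enough that the neighbouring iteration exists; classify_read is a hypothetical helper, not from the notes.

```c
/* Hypothetical helper: for a statement that writes a[i][j] and reads
   a[i+di][j+dj], returns
     'T' if the element was written by an earlier iteration (true dependence),
     'A' if it will be overwritten by a later iteration (anti-dependence),
     '0' if the read and the write fall in the same iteration. */
char classify_read(int di, int dj) {
    if (di < 0 || (di == 0 && dj < 0)) return 'T';  /* (i+di, j+dj) runs before (i, j) */
    if (di > 0 || (di == 0 && dj > 0)) return 'A';  /* (i+di, j+dj) runs after (i, j) */
    return '0';
}
```

For example, the read of a[i][j-1] corresponds to classify_read(0, -1) and the read of a[i+1][j] to classify_read(1, 0). The same calls can be used to check an answer to the exercise.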

Draw the LDG for Loop Nest 1.

[Figure: LDG for Loop Nest 1]

Note: each edge represents both true and anti-dependences.

Dependence relationships for Loop Nest 2

True dependences:
  o S2[i,j] →T S3[i,j+1]
Output dependences:
  o None
Anti-dependences:
  o S2[i,j] →A S3[i,j] (loop-independent dependence)

Draw the LDG for Loop Nest 2.

[Figure: LDG for Loop Nest 2]

Note: each edge represents only true dependences.

Why are there no vertical edges in this graph?

Why is the anti-dependence not shown on the graph?

Finding parallel tasks across iterations [§3.2.2]

Analyze loop-carried dependences:
  o Dependences must be enforced (especially true dependences; other dependences can be removed by privatization).
  o There are opportunities for parallelism when some dependences are not present.

Example 1

    for (i=2; i<=n; i++)
      S1: a[i] = a[i-2];

LDG:

[Figure: LDG for Example 1]
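The LDG for Example 1 can be derived mechanically. The sketch below is an illustration under stated assumptions, not part of the notes: ldg_chains is a hypothetical helper that records, for each iteration, the iteration whose write it reads, and counts the iterations that have no predecessor.

```c
/* Hypothetical helper for Example 1 (a[i] = a[i-2], i = 2..n).
   pred[i] receives the iteration that produced a[i-2], or 0 when a[i-2]
   is an initial value not written inside the loop. pred needs n+1 entries.
   Returns the number of chain heads, i.e. independent chains in the LDG. */
int ldg_chains(int n, int pred[]) {
    int heads = 0;
    for (int i = 2; i <= n; i++) {
        if (i - 2 >= 2) {
            pred[i] = i - 2;   /* true dependence S1[i-2] ->T S1[i] */
        } else {
            pred[i] = 0;       /* reads a[0] or a[1], which the loop never writes */
            heads++;
        }
    }
    return heads;
}
```

Each chain head starts a sequence of iterations that depend only on one another, so the return value is the number of tasks that could run independently.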

We can divide the loop into two parallel tasks (one with odd iterations and another with even iterations):

    for (i=2; i<=n; i+=2)
      S1: a[i] = a[i-2];

    for (i=3; i<=n; i+=2)
      S1: a[i] = a[i-2];

Example 2

    for (i=0; i<n; i++)
      for (j=0; j<n; j++)
        S3: a[i][j] = a[i][j-1] + 1;

LDG:

[Figure: LDG for Example 2, with axes i and j]

How many parallel tasks are there here?

Example 3

    for (i=1; i<=n; i++)
      for (j=1; j<=n; j++)
        S1: a[i][j] = a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j];

LDG:

[Figure: LDG for Example 3, with axes i and j]

Note: each edge represents both true and anti-dependences.

Identify which nodes are not dependent on each other. In each anti-diagonal, the nodes are independent of each other.

[Figure: LDG with anti-diagonals marked]

Note: each edge represents both true and anti-dependences.

We need to rewrite the code to iterate over anti-diagonals:

    Calculate number of anti-diagonals
    for each anti-diagonal do
      Calculate the number of points in the current anti-diagonal
      for each point in the current anti-diagonal do
        Compute the value of the current point in the matrix

Parallelize the loops highlighted above (the loop over the points in the current anti-diagonal, written as for_all below).

    for (i=1; i <= 2*n-1; i++) {   // 2n-1 anti-diagonals
      if (i <= n) {
        points = i;                // number of points in anti-diag
        row = i;                   // first pt (row,col) in anti-diag
        col = 1;                   // note that row+col = i+1 always
      }
      else {
        points = 2*n - i;
        row = n;
        col = i-n+1;               // note that row+col = i+1 always
      }
      for_all (k=1; k <= points; k++) {
        a[row][col] = ...          // update a[row][col]
        row--; col++;
      }
    }
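The anti-diagonal traversal can be checked against the original row-major loop nest. The following is a minimal sequential sketch, assuming a fixed grid size GRID, a one-cell border around the array, and unsigned arithmetic so that overflow wraps; sweep_rowmajor and sweep_antidiag are hypothetical names. Unlike the for_all loop above, row and col are computed from k instead of being updated incrementally, so the inner iterations share no state and could be handed to separate threads.

```c
#define GRID 6   /* assumption: interior points are 1..GRID; the array has a border */

typedef unsigned int cell;

/* Reference version: the original row-major loop nest. */
void sweep_rowmajor(cell a[GRID + 2][GRID + 2]) {
    for (int i = 1; i <= GRID; i++)
        for (int j = 1; j <= GRID; j++)
            a[i][j] = a[i][j-1] + a[i][j+1] + a[i-1][j] + a[i+1][j];
}

/* Same update, visiting anti-diagonals d = 1 .. 2*GRID-1 in order. */
void sweep_antidiag(cell a[GRID + 2][GRID + 2]) {
    for (int d = 1; d <= 2 * GRID - 1; d++) {
        int points = (d <= GRID) ? d : 2 * GRID - d;   /* points on this anti-diagonal */
        int row0   = (d <= GRID) ? d : GRID;           /* first point: bottom-left end */
        int col0   = (d <= GRID) ? 1 : d - GRID + 1;
        /* Points on one anti-diagonal do not depend on each other. */
        for (int k = 0; k < points; k++) {
            int row = row0 - k;
            int col = col0 + k;
            a[row][col] = a[row][col-1] + a[row][col+1]
                        + a[row-1][col] + a[row+1][col];
        }
    }
}
```

Running both functions on identical copies of an array should give identical results, because the anti-diagonal order respects every true and anti-dependence of the original nest.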

DOACROSS Parallelism [§3.2.3]

Suppose we have this code:

    for (i=1; i<=N; i++) {
      S: a[i] = a[i-1] + b[i] * c[i];
    }

Can we execute anything in parallel?

Well, we can't run the iterations of the for loop in parallel, because S[i] →T S[i+1] (there is a loop-carried dependence). But notice that the b[i]*c[i] part has no loop-carried dependence.

This suggests breaking up the loop into two:

    for (i=1; i<=N; i++) {
      S1: temp[i] = b[i] * c[i];
    }
    for (i=1; i<=N; i++) {
      S2: a[i] = a[i-1] + temp[i];
    }

The first loop is parallelizable. The second is not.

Execution time: N × (T_S1 + T_S2)

What is a disadvantage of this approach?

Here is how to solve this problem:

    post(0);
    for (i=1; i<=N; i++) {
      S1: temp = b[i] * c[i];
      wait(i-1);
      S2: a[i] = a[i-1] + temp;
      post(i);
    }

What is the execution time now?

Parallelism across statements in a loop [§3.2.4]

Identify dependences in a loop body. If there are independent statements, can split/distribute the loops.
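Splitting a loop is safe only when no dependence is violated. As a sanity check, the temp[] split of the DOACROSS loop above can be compared with the original loop. This is a minimal sequential sketch, assuming integer data and a fixed length LEN; fused and split are hypothetical names.

```c
#define LEN 8   /* assumption: N in the notes; arrays are indexed 0..LEN */

/* Original loop: S: a[i] = a[i-1] + b[i] * c[i]. */
void fused(int a[LEN + 1], const int b[LEN + 1], const int c[LEN + 1]) {
    for (int i = 1; i <= LEN; i++)
        a[i] = a[i - 1] + b[i] * c[i];
}

/* Split version: S1 has no loop-carried dependence, S2 keeps the a[i-1] chain. */
void split(int a[LEN + 1], const int b[LEN + 1], const int c[LEN + 1]) {
    int temp[LEN + 1];
    for (int i = 1; i <= LEN; i++)   /* S1: iterations are independent */
        temp[i] = b[i] * c[i];
    for (int i = 1; i <= LEN; i++)   /* S2: iterations must run in order */
        a[i] = a[i - 1] + temp[i];
}
```

Both functions should leave the same values in a[]; only the first loop of split is a candidate for parallel execution.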

Example:

    for (i=0; i<n; i++) {
      S1: a[i] = b[i+1] * a[i-1];
      S2: b[i] = b[i] * coef;
      S3: c[i] = 0.5 * (c[i] + a[i]);
      S4: d[i] = d[i-1] * d[i];
    }

Loop-carried dependences:

Loop-indep. dependences:

Note that S4 has no dependences with other statements.

S1[i] →A S2[i+1] implies that S2 at iteration i+1 must be executed after S1 at iteration i. Hence, the dependence is not violated if all S2s are executed after all S1s.

After loop distribution:

    for (i=0; i<n; i++) {
      S1: a[i] = b[i+1] * a[i-1];
      S2: b[i] = b[i] * coef;
      S3: c[i] = 0.5 * (c[i] + a[i]);
    }
    for (i=0; i<n; i++) {
      S4: d[i] = d[i-1] * d[i];
    }

Each loop is a parallel task. This is called function parallelism, and can be distinguished from data parallelism, which we saw in DOALL and DOACROSS. Further transformations can be performed (see p. 44 of text).

Characteristics of function parallelism:

Can use function parallelism along with data parallelism when data parallelism is limited.

DOPIPE Parallelism [§3.2.5]

Another strategy for loop-carried dependences is pipelining the statements in the loop.

Consider this situation:

    for (i=2; i<=N; i++) {
      S1: a[i] = a[i-1] + b[i];
      S2: c[i] = c[i] + a[i];
    }

Loop-carried dependences:

Loop-indep. dependences:

To parallelize, we just need to make sure the two statements are executed in sync:

    for (i=2; i<=N; i++) {
      a[i] = a[i-1] + b[i];
      post(i);
    }

    for (i=2; i<=N; i++) {
      wait(i);
      c[i] = c[i] + a[i];
    }

Question: What is the difference between DOACROSS and DOPIPE?
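One way to approach the question is to compare simple timing models of the two schemes. The sketch below is an illustration under strong assumptions, not a measurement: post and wait cost nothing, every S1 takes ts1 and every S2 takes ts2, DOACROSS gets one processor per iteration, and DOPIPE gets one processor per statement. The function names are hypothetical.

```c
static long max_l(long x, long y) { return x > y ? x : y; }

/* DOACROSS model: S1 of every iteration starts at time 0 on its own
   processor; S2 of iteration i cannot start before post(i-1). */
long doacross_time(int n, long ts1, long ts2) {
    long posted = 0;                          /* post(0) happens at time 0 */
    for (int i = 1; i <= n; i++)
        posted = max_l(ts1, posted) + ts2;    /* wait(i-1); S2; post(i) */
    return posted;
}

/* DOPIPE model: one processor runs every S1 in order and posts; a second
   processor waits for each post and runs every S2 in order. */
long dopipe_time(int n, long ts1, long ts2) {
    long produced = 0, consumed = 0;
    for (int i = 1; i <= n; i++) {
        produced += ts1;                              /* S1; post(i) */
        consumed = max_l(produced, consumed) + ts2;   /* wait(i); S2 */
    }
    return consumed;
}

/* Baseline: all n iterations on a single processor. */
long sequential_time(int n, long ts1, long ts2) {
    return (long)n * (ts1 + ts2);
}
```

Here n is the number of iterations. In this model DOACROSS distributes iterations across processors and synchronizes between them, while DOPIPE distributes statements and keeps each statement's iterations in order on one processor.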