The DOACROSS statement is a parallel loop similar to DOALL, but it allows producer-consumer type of synchronization. Synchronization is allowed only from lower to higher iterations, since it is assumed that lower iterations are selected first by the implicit tasks. If synchronization were not from lower to higher iterations, deadlock could occur. Assume, for example, that the first iteration waits at point w for an event from the second iteration. If there were only one implicit task, it would wait forever at w, since there is no context switching.
Examples of DOACROSS

* example 1. no delay *

    post(ev(0))
    doacross i = 1, n
      a(i) = b(i) + c(i)
      post(ev(i))
      wait(ev(i-1))
      x(i) = a(i-1) + 2
    end doacross

Time lines:

    P1: a(1) x(1) a(4) x(4)
    P2: a(2) x(2) a(5) x(5)
    P3: a(3) x(3) a(6) x(6)
* example 2. delay between consecutive iterations *

    post(ev(0))
    doacross i = 1, n
      wait(ev(i-1))
      a(i) = b(i) + a(i-1)
      post(ev(i))
      x(i) = a(i) + 2
    end doacross

Time lines:

    P1: a(1) x(1) a(4) x(4)
    P2: a(2) x(2) a(5) x(5)
    P3: a(3) x(3) a(6)
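The post/wait pattern of example 2 can be mimicked with ordinary threads. The following is a minimal sketch in Python, not the DOACROSS construct itself: it assumes the arrays are called a, b and x, that a(0) = 0, that one `threading.Event` per iteration stands in for ev(i), and that a small pool of threads plays the role of the implicit tasks. The names `doacross_example2`, `num_tasks` and `next_iter` are hypothetical.

```python
import threading


def doacross_example2(b, num_tasks=3):
    """Run a(i) = b(i) + a(i-1); x(i) = a(i) + 2 for i = 1..n with post/wait."""
    n = len(b) - 1                      # b[1..n] are used; b[0] is ignored
    a = [0] * (n + 1)                   # assumption: a(0) = 0
    x = [0] * (n + 1)
    ev = [threading.Event() for _ in range(n + 1)]
    ev[0].set()                         # post(ev(0)) before the loop

    next_iter = [1]                     # next iteration to hand out
    lock = threading.Lock()

    def task():
        while True:
            with lock:                  # lower iterations are selected first
                i = next_iter[0]
                next_iter[0] += 1
            if i > n:
                return
            ev[i - 1].wait()            # wait(ev(i-1))
            a[i] = b[i] + a[i - 1]
            ev[i].set()                 # post(ev(i))
            x[i] = a[i] + 2             # may overlap with later iterations

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a, x
```

Because iterations are handed out in increasing order, the task holding iteration i only ever waits on an iteration that has already been started, so the sketch also terminates with `num_tasks=1`.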
* example 3. delay between non-consecutive iterations *

    post(ev(0))
    post(ev(1))
    doacross i = 2, n
      wait(ev(i-2))
      a(i) = b(i) + a(i-2)
      post(ev(i))
      x(i) = a(i) + 2
    end doacross

Time lines:

    P1: a(1) x(1) a(7)  x(7)
    P2: a(2) x(2) a(8)  x(8)
    P3: a(3) x(3) a(9)  x(9)
    P4: a(4) x(4) a(10) x(10)
    P5: a(5) x(5) a(11)
    P6: a(6) x(6) a(12)
* example 4. doubly nested loop *

    doacross i = 1, n
      integer j
      do j = 1, n
        wait(ev(i-1,j))
        a(i,j) = a(i-1,j) + a(i,j-1)
        post(ev(i,j))
      end do
    end doacross

Time lines:

    P1: a(1,1) a(1,2) a(1,3) a(1,4)
    P2:        a(2,1) a(2,2) a(2,3)
    P3:               a(3,1) a(3,2)
    P4:                      a(4,1)
Execution time of DOACROSS when ordered critical sections have constant execution time.

Consider the loop

    doacross i=1,n
    c$order a
    c$endorder a
    c$order b ...
    c$order c ...
    c$order d ...
    c$order e ...
    end doacross

Assume its execution time lines have the following form:
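[Figure: staggered time lines of consecutive iterations, each executing sections a, b, c, d, e]

which in terms of performance is equivalent to the following time lines:

    iteration 1:           a b c d e
    iteration 2: II        a b c d e
    iteration 3: II II     a b c d e
    iteration 4: II II II  a b c d e

where a constant delay II between the start of consecutive iterations is evident. This delay is equal to the time of the longest ordered critical section (i.e. II = T(c) in this case).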
The execution time of the previous loop using n processors is

    T(a) + T(b) + n T(c) + T(d) + T(e)

as can be seen next:

[Figure: time lines on n processors; T(a)+T(b), followed by n T(c) = n II, followed by T(d)+T(e)]

In general, the execution time when there are as many processors as iterations is

    n II + (B - II) = (n-1) II + B

where B is the execution time of the whole loop body. The speedup is therefore

    S_p = n B / [(n-1) II + B] ≈ B/II   (for large n)
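As a quick numerical check of the two expressions above, here is a small Python sketch. The function names `doacross_time_n_procs` and `speedup`, and the parameter names `body_time` (for B) and `delay` (for II), are illustrative choices, not part of the original material.

```python
def doacross_time_n_procs(n, body_time, delay):
    """Execution time with as many processors as iterations: (n-1)*II + B."""
    return (n - 1) * delay + body_time


def speedup(n, body_time, delay):
    """Speedup over sequential execution, n*B / ((n-1)*II + B)."""
    return n * body_time / doacross_time_n_procs(n, body_time, delay)


if __name__ == "__main__":
    # With B = 10 and II = 2 the speedup approaches B/II = 5 as n grows.
    for n in (10, 100, 10_000):
        print(n, round(speedup(n, 10, 2), 3))
```

The bound B/II shows why it pays to keep the longest ordered critical section short relative to the whole body.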
When there are p < n processors, the execution time of the loop depends on whether B >= pII or not.

Case 1: B >= pII

If p = 3, for the previous loop we have:

    T(loop) = ⌈n/3⌉ B + T(c) ((n-1) mod 3)

[Figure: time lines on three processors; each processor executes whole iterations back to back, with starts staggered by II]

In general the formula is:

    T(loop) = ⌈n/p⌉ B + II ((n-1) mod p)
Case 2: B < pII

For the previous loop, and in general, we have

    T(loop) = n II + B - II = (n-1) II + B

[Figure: time lines; T(a)+T(b), followed by n T(c) = n II, followed by T(d)+T(e), with B - II marked]
From the previous two statements we have that

    T(loop) = if B >= pII then (⌈n/p⌉ - 1) B + II ((n-1) mod p) + B
              else (n-1) II + B

but

    n-1 = p (⌈n/p⌉ - 1) + (n-1) mod p

therefore

    T(loop) = if B >= pII then (⌈n/p⌉ - 1) B + II ((n-1) mod p) + B
              else (p (⌈n/p⌉ - 1) + (n-1) mod p) II + B

and

    T(loop) = (⌈n/p⌉ - 1) max(B, pII) + II ((n-1) mod p) + B
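The closed form can be checked against the two cases and against a simple timing model. The Python sketch below is only a sanity check under stated assumptions: iterations are assigned to processors cyclically, iteration i may start II time units after iteration i-1 started, and it must also wait for iteration i-p (on the same processor) to finish. The function names are hypothetical.

```python
def t_loop(n, p, B, II):
    """Closed form: (ceil(n/p) - 1) * max(B, p*II) + II * ((n-1) mod p) + B."""
    rounds = -(-n // p)                 # integer ceil(n/p)
    return (rounds - 1) * max(B, p * II) + II * ((n - 1) % p) + B


def t_loop_by_cases(n, p, B, II):
    """The two cases stated separately."""
    rounds = -(-n // p)
    if B >= p * II:                     # case 1: the processors are the bottleneck
        return rounds * B + II * ((n - 1) % p)
    return (n - 1) * II + B             # case 2: the delay II is the bottleneck


def t_loop_simulated(n, p, B, II):
    """Assumed model: start(i) >= start(i-1) + II and start(i) >= finish(i-p)."""
    start = [0] * (n + 1)               # start[1] = 0
    for i in range(2, n + 1):
        start[i] = start[i - 1] + II
        if i > p:
            start[i] = max(start[i], start[i - p] + B)
    return start[n] + B
```

For example, with n = 7, p = 3, B = 10 and II = 2 all three functions give 30; with p = n the closed form reduces to (n-1) II + B.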
Cyclic Dependences -- DOPIPE

Assume a loop with two or more dependence cycles (strongly connected components or π-blocks). The first approach developed for concurrentization of do loops is illustrated below:

    do i=1,n
      a(i) = c(i) + a(i-1)
      b(i) = a(i) + b(i-1)
    end do

is translated into

    cobegin
      do i=1,n
        a(i) = c(i) + a(i-1)
        V(σ)
      end do
    //
      do i=1,n
        P(σ)
        b(i) = a(i) + b(i-1)
      end do
    coend
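The cobegin/coend translation maps naturally onto two threads and a counting semaphore. The following Python sketch is an illustration only: it assumes the arrays are named a, b and c, that a(0) = b(0) = 0, and that `threading.Semaphore` plays the role of σ, with `release` as V and `acquire` as P. The name `dopipe` is hypothetical.

```python
import threading


def dopipe(c):
    """Two pipeline stages: a(i) = c(i) + a(i-1), then b(i) = a(i) + b(i-1)."""
    n = len(c) - 1                      # c[1..n] are used; c[0] is ignored
    a = [0] * (n + 1)                   # assumption: a(0) = 0
    b = [0] * (n + 1)                   # assumption: b(0) = 0
    sigma = threading.Semaphore(0)

    def stage1():                       # first π-block
        for i in range(1, n + 1):
            a[i] = c[i] + a[i - 1]
            sigma.release()             # V(σ): a(i) is now available

    def stage2():                       # second π-block
        for i in range(1, n + 1):
            sigma.acquire()             # P(σ): wait until a(i) has been produced
            b[i] = a[i] + b[i - 1]

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a, b
```

Each stage is sequential internally, so its own cycle is respected; only the dependence between the two stages needs synchronization.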
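i.e. to take a loop with two or more π-blocks such as:

[Figure: dependence graph of a loop with several π-blocks]

and execute collections of π-blocks on separate processors in pipeline fashion:

[Figure: time lines of the π-blocks executing on separate processors as a pipeline]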
4.18.1 Execution time of DOPIPE

Assume the dependence graph shown to the right. Assume also that

    T(c) = max(T(a), T(b), T(c), T(d), T(e))

Then the execution time of the DOPIPE on 4 processors is

    T(a) + T(b) + n T(c) + T(d) + T(e)

[Figure: dependence graph and pipeline time lines; T(a)+T(b), followed by n T(c), followed by T(d)+T(e)]
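DOPIPE and Loop Distribution

Assume a loop with the dependence graph shown on the right.

[Figure: dependence graph of the loop]

The loop could be distributed to produce:

    do i=1,n
      ...
    end do
    do i=1,n
      ...
    end do

The first loop could be transformed into a DOALL, and the second into a DOPIPE. The resulting time lines would be:

[Figure: time lines of the DOALL followed by the DOPIPE]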
However, executing the original loop as a DOPIPE produces the same execution time with fewer processors (if the number of iterations is greater than 4):

[Figure: time lines of the original loop executed as a DOPIPE]
Problems with DOPIPE

1. Processor allocation is fixed at compile-time, i.e. loops are compiled for a fixed number of processors.

Example 1: A loop with the dependence graph shown to the right could be compiled for three processors as:

    cobegin
      do i=1,n ... end do
    //
      do i=1,n ... end do
    //
      do i=1,n ... end do
    coend

but for two processors it should be compiled as

    cobegin
      do i=1,n ... end do
    //
      do i=1,n ... end do
    coend
Example 2: The loop can be translated into

    cobegin
      do i=1,n ... end do
    //
      do i=1,n ... end do
    //
      do i=1,n ... end do
    coend

or into

    cobegin
      do i=1,n ... end do
    //
      do i=1,n,2
        cobegin ... // ... coend
      end do
    //
      do i=1,n ... end do
    coend

If the execution time of a statement is unknown (e.g. it includes a while loop), it is not possible to decide at compile-time how many copies of it to run in parallel.
2. There is the need to do packing, which is NP-hard.

Partition: Given a set A ⊂ Z+, is there a subset A' ⊆ A such that Σ(A') = Σ(A - A')?

DOPIPE translation: Given a loop with the following dependence graph

[Figure: dependence graph with nodes a_1, a_2, ..., a_n and b]

with T(b) = (T(a_1) + T(a_2) + ... + T(a_n))/2, compute an optimal schedule of the loop on 3 processors.

Clearly, solving the DOPIPE translation problem also solves Partition.
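To make the Partition question concrete, here is a small Python sketch that decides it for a list of positive integers by collecting all reachable subset sums. It illustrates the decision problem only, not a DOPIPE scheduler; the function name `has_partition` is hypothetical, and its running time grows with the magnitude of the numbers, which is consistent with the problem being NP-hard in general.

```python
def has_partition(values):
    """Return True if the positive integers in values can be split into two
    groups with equal sums."""
    total = sum(values)
    if total % 2:                       # an odd total can never be split evenly
        return False
    reachable = {0}                     # sums obtainable from some subset
    for v in values:
        reachable |= {s + v for s in reachable}
    return total // 2 in reachable
```

One reading of the reduction, stated here as an assumption: if the extra statement keeps one processor busy for half of the total time, the remaining statements fit on the other two processors within that time exactly when their times can be split into two equal halves, which is the question `has_partition` answers.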
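3. Cycles force sequential execution

Example 3

    do i=3,n
      S: a(i) = b(i-2) - 1
      T: b(i) = a(i-3) * k
    end do

[Figure: dependence graph with a cycle between S and T]

Example 4

    do i=1,n
      do j=1,n
        S: a(i,j) = a(i-1,j) + a(i,j-1)
      end do
    end do

[Figure: dependence graph with a cycle on S]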
Cyclic dependences -- DOACROSS

A loop with cyclic dependences can be transformed into a DOACROSS as shown next:

    do i=1,n
      a(i) = c(i) + a(i-1)
      b(i) = a(i) + b(i-1)
    end do

becomes

    c$doacross order(a,b), share(a,b,c)
    do i=1,n
    c$order a
      a(i) = c(i) + a(i-1)
    c$endorder a
    c$order b
      b(i) = a(i) + b(i-1)
    c$endorder b
    end do

DOACROSS has the advantage that all implicit tasks execute the same code. This facilitates code assignment. Other advantages of the DOACROSS construct over the DOPIPE construct are illustrated in the following examples.
Example 1: The same translation works for two or three processors:

[Figure: time lines for two processors and for three processors]
Example 2: Increasing the number of processors improves performance.
Example 3: When the following loop is executed as a doacross on two processors

    do i=1,n
      S: a(i) = b(i-2) - 1
      T: b(i) = a(i-3) * k
    end do

we get the following time lines (S_i stands for statement S in iteration i):

    Proc. 1: S_1 T_1 S_3 T_3
    Proc. 2: S_2 T_2 S_4 T_4

Cycle shrinking takes place automatically. This is also true in the case of multiply-nested loops, where all that is needed is to use a tuple as the loop index, as in

    doacross (i,j,k) = [1..n_1]..[1..n_2]..[1..n_3]
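The time lines of Example 3 can be reproduced with a small timing model. The Python sketch below makes several assumptions that are not in the original: every statement takes one time unit, iterations are assigned to processors cyclically, S_i must wait for T of iteration i-2, and T_i must wait for S of iteration i-3. The name `doacross_makespan` is hypothetical.

```python
def doacross_makespan(n, p=2):
    """Finish time of iterations 1..n of the S/T loop on p processors."""
    proc_free = [0] * p                 # time at which each processor is free
    fin_S, fin_T = {}, {}
    for i in range(1, n + 1):
        proc = (i - 1) % p              # assumed cyclic assignment
        # S_i reads the value written by T in iteration i-2
        start_S = max(proc_free[proc], fin_T.get(i - 2, 0))
        fin_S[i] = start_S + 1
        # T_i reads the value written by S in iteration i-3
        start_T = max(fin_S[i], fin_S.get(i - 3, 0))
        fin_T[i] = start_T + 1
        proc_free[proc] = fin_T[i]
    return max(fin_T.values())
```

Under these assumptions four iterations finish at time 4 on two processors, against 8 on one, because neither dependence ever makes a processor wait.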
Example 4: The following loop

    do i=1,n
      do j=1,n
        S: a(i,j) = a(i-1,j) + a(i,j-1)
      end do
    end do

can be translated into the following doacross loop:

    doacross (i,j) = [1..n]..[1..n]
      wait(ev(i-1,j)); wait(ev(i,j-1))
      S: a(i,j) = a(i-1,j) + a(i,j-1)
      post(ev(i,j))
    end doacross
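The two waits in this loop produce a wavefront over the iteration space. The Python sketch below computes the earliest time step at which each instance S(i,j) can run, assuming unit-time statements, enough processors, and that the boundary events ev(0,j) and ev(i,0) are posted before the loop starts. The name `wavefront_steps` is hypothetical.

```python
def wavefront_steps(n):
    """Earliest time step of S(i,j) when it waits on S(i-1,j) and S(i,j-1)."""
    step = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            ready = 0                   # boundary events are assumed posted
            if i > 1:
                ready = max(ready, step[(i - 1, j)])   # wait(ev(i-1,j))
            if j > 1:
                ready = max(ready, step[(i, j - 1)])   # wait(ev(i,j-1))
            step[(i, j)] = ready + 1
    return step
```

Under these assumptions S(i,j) runs at step i+j-1, so all instances on the same anti-diagonal execute together and the n*n instances complete in 2n-1 steps.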
The iteration space of the previous loop is:

    S_1,1  S_1,2  S_1,3  S_1,4
    S_2,1  S_2,2  S_2,3  S_2,4
    S_3,1  S_3,2  S_3,3  S_3,4
    S_4,1  S_4,2  S_4,3  S_4,4

and its time lines when executed on n processors are:

[Figure: time lines of the doacross loop on n processors]
Statement Scheduling and DOACROSS Execution Time.

Consider the following dependence graph for the body of a singly-nested do loop.

[Figure: dependence graph over statements S_1, S_2, S_3, S_4, S_5]

When the DOACROSS body has the original statement order, there is no speedup (S_1 of iteration i+1 cannot start executing until S_5 of iteration i completes execution). When the body is permuted into the order S_1 S_4 S_5 S_2 S_3, there will be speedup, as shown in the following time lines:

[Figure: overlapped time lines of consecutive iterations, each executing S_1 S_4 S_5 S_2 S_3]
Selecting an optimum statement ordering to minimize the delay is NP-hard (Cytron's PhD thesis). When the doacross is a doubly-nested do loop, the order of the indices (if the loops are interchangeable) also influences the execution time (Tang et al. 1988).
Cyclic dependences -- Loop Pipelining

This method assumes that there are no if statements and that all dependence distances are 0 or 1 (Aiken and Nicolau 1988). It proceeds by (greedy) scheduling of the body of the loop in parallel for the first iteration, then for the second, and so on until a pattern is detected. Once the pattern is detected, parallel code can be easily generated, as illustrated next. (The numbers next to the arcs represent dependence distances.)

[Figure: dependence graph of the loop body statements, with arcs labeled by dependence distances 0 or 1]

The resulting program can be executed on a VLIW machine or on an asynchronous multiprocessor.
       iteration
time   1      2      3      4      5      6      7
   1   ABC    A      A      -      -      -      -
   2   DEFI   I      I      -      -      -      -
   3   GHJKL  CK     KL     A      -      -      -
   4   M      BDM    M      I      -      -      -
   5   N      EFGN   FN     KL     -      -      -
   6   PQR    PQR    CPQR   M      A      -      -
   7          HJ     DJ     FN     I      -      -
   8                 BG     PQR    KL     -      -
   9                 E      J      M      A      -
  10                 H      C      FN     I      -
  11                        BD     PQR    KL     -
  12                        EG     J      M      A
  13                        H      C      FN     I
  14                               BD     PQR    KL
  15                               EG     J      M
  16                               H      C      FN
  17                                      BD     PQR
  18                                      EG     J
  19                                      H      C
  20                                             BD
  21                                             EG
  22                                             H

Final Program Graph (subscripts are iteration offsets):

    H_1  C_2  F_3 N_3  I_4
    B_2 D_2  P_3 Q_3 R_3  K_4 L_4
    E_2 G_2  J_3  M_4  A_5
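The repeating pattern can be expanded mechanically. The Python sketch below assumes that the final program graph consists of three parallel instructions, that the number attached to each statement is an iteration offset, and that each repetition of the pattern advances every iteration number by one. The names `KERNEL`, `expand` and `base` are hypothetical.

```python
# Each instruction is a list of (statement, iteration offset) pairs.
KERNEL = [
    [("H", 1), ("C", 2), ("F", 3), ("N", 3), ("I", 4)],
    [("B", 2), ("D", 2), ("P", 3), ("Q", 3), ("R", 3), ("K", 4), ("L", 4)],
    [("E", 2), ("G", 2), ("J", 3), ("M", 4), ("A", 5)],
]


def expand(kernel, repetitions, base=0):
    """Unroll the kernel: repetition r runs statement s of iteration base+r+offset."""
    rows = []
    for r in range(repetitions):
        for instruction in kernel:
            rows.append([(s, base + r + off) for s, off in instruction])
    return rows


if __name__ == "__main__":
    # With base=2 the first three rows correspond to time steps 10-12 of the table.
    for row in expand(KERNEL, 2, base=2):
        print(" ".join(f"{s}{it}" for s, it in row))
```

With `base=2`, the first expanded row holds H of iteration 3, C of iteration 4, F and N of iteration 5 and I of iteration 6, which matches time step 10 of the schedule; every statement appears exactly once in the kernel, so each repetition completes one full iteration's worth of work.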