Key ompiler lgorithms (for emee systems) Peter Mrweel University of Dortmun, Germny P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri
0 0. 0.0 Universität Dortmun Opertions/Wtt [MOPS/mW].0µ ASIC Reonfigurle Computing Proessors 0.µ 0.2µ Motivtion: Wishes n Relity Amient Intelligene 0.3µ Avoi poor esign tehniques! 0.07µ Tehnology P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Results for DSPStone enhmrk Cyle overhe [ n] 8.0 7.0 6.0.0 4.0 3.0 2.0.0 TIC ADI20 DSP600 Fri 2
Reson for ompilerprolems: Applitionoriente Arhitetures Applition: u..: y[j] = i=0 x[ji]*[i] i: 0 i n: y i [j] = y i [j] x[ji]*[i] Arhiteture: Exmple: Dt pth ADSP20x D x P n Aressregisters A0, A, A2.. i, ji Aress genertion unit (AGU) AX AY,,.. AR AF x[ji] MX *, MR MY [i] MF x[ji]*[i] y i [j] Prllelism Deite registers No mthing ompiler ineffiient oe P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3
Exploittion of prllel ress omputtions Aress Register File Generi ress genertion unit (AGU) moel Instrution Fiel Memory / Moify Register File P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Prmeters: k = # ress registers m = # moify registers Cost metri for AGU opertions: Opertion Fri 4 ost immeite AR lo immeite AR moify utoinrement/ 0 erement AR = MR 0
Aress pointer ssignment (APA) Given: Memory lyout ssemly oe (without ress oe) 0 2 3 r i r i lt r mpy r ltp i mpy r sl r ltp i mpy r p sl r How to ess r? time Aress pointer ssignment (APA) is the suprolem of fining n llotion of ress registers for given memory lyout n given sheule. P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri
Generl pproh: Minimum Cost Cirultion Prolem Let G = (V,E,u,), with (V,E): irete grph u: E R 0 is pity funtion, : E R is ost funtion; n = V, m = E. Definition: g: E R 0 is lle irultion if it stisfies : v V: w V:(v,w) E g(v,w)= w V:(w,v) E g(w,v) (flow onservtion) g is fesile if (v,w) E: g(v,w) u(v,w) (pity onstrints) The ost of irultion g is (g) = (v,w) E (v,w) g(v,w). There my e lower oun on the flow through n ege. The minimum ost irultion prolem is to fin fesile irultion of minimum ost. [K.D. Wyne: A Polynomil Comintoril Algorithm for Generlize Minimum Cost Flow, http://www.s.prineton.eu/ ~wyne/ ppers/ rtio.pf] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 6
Mpping APA to the Minimum Cost Cirultion Prolem Assemly sequene* lt r mpy r ltp i mpy i mpy r sl r ltp i mpy r p time sl r AR S 0 0 0 u(t S)= AR AR2 Flow into n out of vrile noes must e. Reple vrile noes y ege with lower oun= to otin pure irultion prolem irultion selete itionl eges of originl grph (only smples shown) * C2x proessor from ti Vriles T i r r i resses [C. Geotys: DSP Aress Optimiztion Using A Minimum Cost Cirultion Tehnique, ICCAD, 997] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 7
Results oring to Geotys Optimize oe size Originl oe size Limite to si loks P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 8
Beyon si loks: hnling rry referenes in loops Exmple: for (i=2; i<=n; i) {.. B[i] /*A2 */.. B[i] /*A */.. B[i2] /*A2 */.. B[i] /*A */.. B[i3] /*A2 */.. B[i] } /*A */ Cost for rossing loop ounries onsiere. Referene: A. Bsu, R. Leupers, P. Mrweel: Arry Inex Allotion uner Register Constrints, Int. Conf. on VLSI Design, Go/Ini, 999 Control steps Offsets P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 2 3 4 6 7 8 9 3 2 A X X X X A2 X X X 0 2 3 4 Fri 9 X X
Offset ssignment prolem (OA) Effet of optimise memory lyout Let's ssume tht we n moify the memory lyout offset ssignment prolem. (k,m,r)oa is the prolem of generting memory lyout whih minimizes the ost of ressing vriles, with k: numer of ress registers m: numer of moify registers r: the offset rnge The se (,0,) is lle simple offset ssignment (SOA), the se (k,0,) is lle generl offset ssignment (GOA). P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 0
SOA exmple Effet of optimise memory lyout Vriles in si lok: Aess sequene: V = {,,, } S = (,,,,, ) 0 2 3 Lo AR, ; AR = 2 ; AR = 3 ; AR = 2 ; AR ; AR ; 0 2 3 Lo AR,0 ; AR ; AR =2 ; AR ; AR ; AR ; ost: 4 ost: 2 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri
SOA exmple: Aess sequene, ess grph n Hmiltonin pths ess sequene: 2 ess grph 2 mximum weighte pth= mx. weighte Hmilton pth overing (MW HC) 0 2 3 memory lyout SOA use s uiling lok for more omplex situtions signifint interest in goo SOA lgorithms P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [Brtley, 992; Lio, 99] Fri 2
Nïve SOA Noes re e in the orer in whih they re use in the progrm. Exmple: Aess sequene: S = (,,,,, ) 0 0 0 0 2 3 0 2 3 memory lyout P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3
Lio s lgorithm Similr to Kruskl s spnning tree lgorithms:. Sort eges of ess grph G=(V,E) oring to their weight 2. Construt new grph G =(V,E ), strting with E =0 3. Selet n ege e of G of highest weight; If this ege oes not use yle in G n oes not use ny noe in G to hve egree > 2 then this noe to E otherwise isr e. 4. Goto 3 s long s not ll eges from G hve een selete n s long s G hs less thn the mximum numer of eges ( V ). Exmple: Aess sequene: S=(,,,,, ) Impliit eges of weight 0 for ll unonnete noes. G 2 2 (,) (,) (,) (,) P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 0 0 0 0 G 2 Fri 4
Lio s lgorithm on more omplex grph e f f G 2 f G 2 f e e 2 2 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri
P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Universität Dortmun Fri 6 Universität Dortmun Universität Dortmun Lio s lgorithm with tie rek heuristi Motivtion 6 4 6 4 6 4 6 4 Cost 9 6 4 6 4 Cost 6 [ Leupers]
Tierek funtion Aess grph (V,E,w), for eh ege e with weight w(e): T( e) w( e ) e E, e e 6 e 4 e' e' T(e)= In se of lterntive equlweight eges: give priority to the ege with lower weight T(e), euse this will (generlly) leve more freeom for seleting further highweight eges 6 4 e e' T(e)= P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 7
Comprison of ifferent SOA lgorithms Overhe [%] Runtime [mses] Delrtion Orer 67.09 0 Lio, s presente 4.34 0.67 Lio, with tie rek heuristis for eges of sme weight 0.6 0.68 Atri: itertive improvement of initil SOA solution 2.28 4.6 Sme s Atri, with tierek heuristi e 0. 23 Geneti lgorithm [Leupers] 0 8296.26 Brnh & Boun (for simple ses) 0 >> P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 8
Vrile olesing SOA (CSOA) Exmple: Aess sequene (v,x,v,u,v,u,v,y,u,y,u,x,u,x,u,x,u) v y x u Interferene grph y v 3 Aess grph y u 4 6 v x 2 Aess grph y, v Itertive extension of Hmilton pth n olesing Memory lyout u x 7 2 u 6 x y,v u x P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 9
Offset osts reltive to Lio [%] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [Desiree Ottoni et l.: Improving Offset Assignment through Simultneous Vrile Colesing, SCOPES, 2003] Fri 20
Memory size reutions [%] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [D. Ottoni et l.: Improving Offset Assignment through Simultneous Vrile Colesing, SCOPES, 2003] Fri 2
Integrte Sheuling n SOA: ShSOA Result for SOA CSOA P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 22
Integrte Sheuling n SOA: ShSOA Result for ShSOA [Y. Choi, T. Kim, Aress Assignment Comine with Sheuling in DSP Coe Genertion, DAC, 2002, ACM] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 23
GOA: Aress ssignment for k ress registers Two ses. All esses to vrile one with the sme ress register Prtitioning of vriles into sets 2. Allotion of ress registers for eh ess to vrile. Better results, ut more omplex. A A2 A A2 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 f j f j 2 2! Fri 24 t t
GOA with vrile prtitioning: Aress ssignment for k ress registers 6 See: k eges of highest weight. 4 e 2 7 4 f Itertion: A ege(s) with smllest inrementl ost for eges not overe. Ege not overe Disjoint sets of vrs GOA k SOA [Leupers & Mrweel, ICCAD, 996] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 2
Optimise use of m moifyregisters Given: Sequene of MoifyVlues: Lo AR,0 AR =2 AR =4 AR =3 AR =2 AR =3 AR =2 AR =4 AR =2 AR =4 No seprte ress lultion yle if onstnts re ontine in moify registers Moify registers re ontiners, onstnts represent ontents. Try to ssign ontents to ontiners suh tht ontentmisses our s less frequent s possile. Equivlent to pges n pge frmes. Optiml pge replement lgorithm of Bely n e pplie (reple pge with lrgest forwr istne) P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 26 [Leupers]
Optimising the use of m moify registers Given: sequene of moify vlues 2 4 3 2 3 2 4 2 4 MR Lrgest forwr istne MR2 Lrgest forwr istne P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [Leupers&Mrweel, ICCAD, 996] Fri 27
Integrte memory llotion/ Moify register ssignment Stnr pproh Memory llotion Simultneous Optimistion using geneti lgorithm: Perentge of sve ress omputtion yles (Sequene length,# Vriles) (00,90) (00,0) M register ssignment (00,0) (0,40) (0,20) k=m=8 k=m=4 k=m=2 (0,0) 0 20 40 60 80 % P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 28 [Leupers, Dvi; ISSS 98]
Work on lrger utoinrement/erement rnges Coesize [kb] s funtion of the numer of ress registers (k) n the inrement rnges (r), ol numers inite minimum Meiumsize progrm k r 2 3 4 23. 0.3 8.7 03.0 07.2 2 22.9 07.0 07. 04.9.4 3 6.4 0.2 04.7 04.4.3 4 8.6 0.8.4.3 8.2 4.4 04.7.3.3 8.2 Smll progrm k r 2 3 4 29.7 2.73 26.70 26.47 28.3 2 27.86 26.9 28.26 28.26 30.0 3 2.92 26.33 28.22 28.22 30.0 4 30.4 28.22 30.0 30.0 3.98 26.70 28.22 30.0 30.0 3.98 (k=3, r=) n (k=2, r=3) re useful esigns. Lrger rnges o not reue oe size. Refererene: Sursnm, Lio, Devs: Anlysis n Aress Arithmeti Cpilities in Custom DSP Arhitetures, DAC, 997, p. 287292 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 29
(Some) Generliztions of SOA Sh SOA Lim Sh GOA Choi SOA Lio, Leupers GOA Leupers CSOA N/A Ottoni P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 30
Comprison of ifferent pprohes for (k,m,r)oa Author(s) Soure Prolem (k,m,r) APA OA A S M Approh Brtley Lio (,0,) Leupers @ ICCAD 96 (k,0,) Hmilton pth tie rek Geotys ICCAD 97 (k,0,r) Network flow Sursnm DAC 97 (k,0,r) Effet of r on oe size Bsu @ VLSI 97 (k,0,) Fous on rrys in loops Leupers @ ISSS 98 (k,m,) Geneti lgorithm Lim CODES 200 (,0,) VAG eges minimize uring sheuling Choi, Kim DAC 2002 (k,0,) Ottoni SCOPES 2003 (,0,) Vrile olesing Wess SCOPES 2004 (k,0,r) Iterte APAGOA yles k= ress regs.; m=moif. regs.; r: offset rnge; APA: ress pointer ss.; OA: offset ss.; A: ess se (), prtitione vriles (); S: integrte sheuling (); M: moulo ressing; @: t U. Dortm P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3
Multiple memory nks Smple hrwre XMem From AGU or immeite X0 X Y0 Y Multiplier YMem From AGU or immeite Shifter ALU A B Shifter Prllel moves possile if using ifferent memory nks P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 32
Multiple memory nks Constrint grph genertion Preompte oe (symoli vriles n registers) Move v0,r0 v,r Move v2,r2 v3,r3 Constrint grph v0 v {XMem, YMem} r0 r {X0,X,Y0, Y, A, B} Do not ssign to sme register v2 v3 {XMem, YMem} r2 r3 {X0,X,Y0, Y, A,B} Links mintine, more onstrints... P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 33
Multiple memory nks Coe size reution through simulte nneling pm rv2 rv lms fir Convolution iir iqu Rel upte Complex upte Complex multiply 0 20 % 40 % 60 % [Sursnm, Mlik, 99] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 34
Summry Exploiting some of the speil fetures of emee proessors in ompilers utoinrement n erement moes ress pointer ssignment prolem (APA) simple offset ssignment prolem (SOA) SOA with vrile olesing integrte sheuling generl offset ssignment prolem use of moify registers multiple memory nks P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3