Key compiler algorithms (for embedded systems)

Similar documents
Technology Mapping Method for Low Power Consumption and High Performance in General-Synchronous Framework

CS 491G Combinatorial Optimization Lecture Notes

Counting Paths Between Vertices. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs. Isomorphism of Graphs

Now we must transform the original model so we can use the new parameters. = S max. Recruits

Mid-Term Examination - Spring 2014 Mathematical Programming with Applications to Economics Total Score: 45; Time: 3 hours

Common intervals of genomes. Mathieu Raffinot CNRS LIAFA

Lecture 6: Coding theory

The DOACROSS statement

CIT 596 Theory of Computation 1. Graphs and Digraphs

22: Union Find. CS 473u - Algorithms - Spring April 14, We want to maintain a collection of sets, under the operations of:

Algebra 2 Semester 1 Practice Final

CSE 332. Sorting. Data Abstractions. CSE 332: Data Abstractions. QuickSort Cutoff 1. Where We Are 2. Bounding The MAXIMUM Problem 4

Implication Graphs and Logic Testing

Coding Techniques. Manjunatha. P. Professor Dept. of ECE. June 28, J.N.N. College of Engineering, Shimoga.

Laboratory for Foundations of Computer Science. An Unfolding Approach. University of Edinburgh. Model Checking. Javier Esparza

Algorithm Design and Analysis

Metaheuristics for the Asymmetric Hamiltonian Path Problem

for all x in [a,b], then the area of the region bounded by the graphs of f and g and the vertical lines x = a and x = b is b [ ( ) ( )] A= f x g x dx

Metodologie di progetto HW Technology Mapping. Last update: 19/03/09

I 3 2 = I I 4 = 2A

SOLUTIONS TO ASSIGNMENT NO The given nonrecursive signal processing structure is shown as

Solutions for HW9. Bipartite: put the red vertices in V 1 and the black in V 2. Not bipartite!

Solutions to Problem Set #1

Numbers and indices. 1.1 Fractions. GCSE C Example 1. Handy hint. Key point

Resources. Introduction: Binding. Resource Types. Resource Sharing. The type of a resource denotes its ability to perform different operations

Algorithm Design and Analysis

Part 4. Integration (with Proofs)

Monochromatic Plane Matchings in Bicolored Point Set

The Minimum Label Spanning Tree Problem: Illustrating the Utility of Genetic Algorithms

Lecture 11 Binary Decision Diagrams (BDDs)

A Primer on Continuous-time Economic Dynamics

6.5 Improper integrals

Data Structures LECTURE 10. Huffman coding. Example. Coding: problem definition

Review of Gaussian Quadrature method

CS261: A Second Course in Algorithms Lecture #5: Minimum-Cost Bipartite Matching

arxiv: v1 [cs.dm] 24 Jul 2017

Engr354: Digital Logic Circuits

Eigenvectors and Eigenvalues

XML and Databases. Outline. 1. Top-Down Evaluation of Simple Paths. 1. Top-Down Evaluation of Simple Paths. 1. Top-Down Evaluation of Simple Paths

Lesson 2.1 Inductive Reasoning

Graph Theory. Simple Graph G = (V, E). V={a,b,c,d,e,f,g,h,k} E={(a,b),(a,g),( a,h),(a,k),(b,c),(b,k),...,(h,k)}

CS 2204 DIGITAL LOGIC & STATE MACHINE DESIGN SPRING 2014

( ) { } [ ] { } [ ) { } ( ] { }

Subsequence Automata with Default Transitions

Factorising FACTORISING.

I1 = I2 I1 = I2 + I3 I1 + I2 = I3 + I4 I 3

Global alignment. Genome Rearrangements Finding preserved genes. Lecture 18

AP CALCULUS Test #6: Unit #6 Basic Integration and Applications

Section 2.3. Matrix Inverses

MCH T 111 Handout Triangle Review Page 1 of 3

System Validation (IN4387) November 2, 2012, 14:00-17:00

Outline Data Structures and Algorithms. Data compression. Data compression. Lossy vs. Lossless. Data Compression

We will see what is meant by standard form very shortly

Separable discrete functions: recognition and sufficient conditions

NON-DETERMINISTIC FSA

2.4 Theoretical Foundations

EE 108A Lecture 2 (c) W. J. Dally and P. Levis 2

Lesson 2.1 Inductive Reasoning

y = c 2 MULTIPLE CHOICE QUESTIONS (MCQ's) (Each question carries one mark) is...

Vidyalankar S.E. Sem. III [CMPN] Discrete Structures Prelim Question Paper Solution

Necessary and sucient conditions for some two. Abstract. Further we show that the necessary conditions for the existence of an OD(44 s 1 s 2 )

Aperiodic tilings and substitutions

Project 6: Minigoals Towards Simplifying and Rewriting Expressions

On a Class of Planar Graphs with Straight-Line Grid Drawings on Linear Area

Solutions to Chapte 7: Channel Coding

CS 360 Exam 2 Fall 2014 Name

Preview 11/1/2017. Greedy Algorithms. Coin Change. Coin Change. Coin Change. Coin Change. Greedy algorithms. Greedy Algorithms

Unit 4. Combinational Circuits

Chapter 2 Finite Automata

B Veitch. Calculus I Study Guide

Computing all-terminal reliability of stochastic networks with Binary Decision Diagrams

A Lower Bound for the Length of a Partial Transversal in a Latin Square, Revised Version

1 This diagram represents the energy change that occurs when a d electron in a transition metal ion is excited by visible light.

Genetic Programming. Outline. Evolutionary Strategies. Evolutionary strategies Genetic programming Summary

Fast Frequent Free Tree Mining in Graph Databases

Surds and Indices. Surds and Indices. Curriculum Ready ACMNA: 233,

6. Suppose lim = constant> 0. Which of the following does not hold?

Nondeterminism and Nodeterministic Automata

CS415 Compilers. Lexical Analysis and. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

MATH 122, Final Exam

Section 1.3 Triangles

Chapter 4 State-Space Planning

Solids of Revolution

Solutions to Assignment 1

Welcome. Balanced search trees. Balanced Search Trees. Inge Li Gørtz

Area and Perimeter. Area and Perimeter. Solutions. Curriculum Ready.

THE EXISTENCE-UNIQUENESS THEOREM FOR FIRST-ORDER DIFFERENTIAL EQUATIONS.

CSC2542 State-Space Planning

Efficient Parameterized Algorithms for Data Packing

Precise timing analysis for direct-mapped caches

Generalization of 2-Corner Frequency Source Models Used in SMSIM

A Transformation Based Algorithm for Reversible Logic Synthesis

Towards Efficient Consistency Enforcement for Global Constraints in Weighted Constraint Satisfaction

Ranking Generalized Fuzzy Numbers using centroid of centroids

University of Sioux Falls. MAT204/205 Calculus I/II

ILLUSTRATING THE EXTENSION OF A SPECIAL PROPERTY OF CUBIC POLYNOMIALS TO NTH DEGREE POLYNOMIALS

Situation Calculus. Situation Calculus Building Blocks. Sheila McIlraith, CSC384, University of Toronto, Winter Situations Fluents Actions

March eq Implementing Additional Reasoning into an Efficient Look-Ahead SAT Solver

Sections 5.3: Antiderivatives and the Fundamental Theorem of Calculus Theory:

Let s divide up the interval [ ab, ] into n subintervals with the same length, so we have

Transcription:

Key ompiler lgorithms (for emee systems) Peter Mrweel University of Dortmun, Germny P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri

0 0. 0.0 Universität Dortmun Opertions/Wtt [MOPS/mW].0µ ASIC Reonfigurle Computing Proessors 0.µ 0.2µ Motivtion: Wishes n Relity Amient Intelligene 0.3µ Avoi poor esign tehniques! 0.07µ Tehnology P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Results for DSPStone enhmrk Cyle overhe [ n] 8.0 7.0 6.0.0 4.0 3.0 2.0.0 TIC ADI20 DSP600 Fri 2

Reson for ompilerprolems: Applitionoriente Arhitetures Applition: u..: y[j] = i=0 x[ji]*[i] i: 0 i n: y i [j] = y i [j] x[ji]*[i] Arhiteture: Exmple: Dt pth ADSP20x D x P n Aressregisters A0, A, A2.. i, ji Aress genertion unit (AGU) AX AY,,.. AR AF x[ji] MX *, MR MY [i] MF x[ji]*[i] y i [j] Prllelism Deite registers No mthing ompiler ineffiient oe P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3

Exploittion of prllel ress omputtions Aress Register File Generi ress genertion unit (AGU) moel Instrution Fiel Memory / Moify Register File P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Prmeters: k = # ress registers m = # moify registers Cost metri for AGU opertions: Opertion Fri 4 ost immeite AR lo immeite AR moify utoinrement/ 0 erement AR = MR 0

Aress pointer ssignment (APA) Given: Memory lyout ssemly oe (without ress oe) 0 2 3 r i r i lt r mpy r ltp i mpy r sl r ltp i mpy r p sl r How to ess r? time Aress pointer ssignment (APA) is the suprolem of fining n llotion of ress registers for given memory lyout n given sheule. P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri

Generl pproh: Minimum Cost Cirultion Prolem Let G = (V,E,u,), with (V,E): irete grph u: E R 0 is pity funtion, : E R is ost funtion; n = V, m = E. Definition: g: E R 0 is lle irultion if it stisfies : v V: w V:(v,w) E g(v,w)= w V:(w,v) E g(w,v) (flow onservtion) g is fesile if (v,w) E: g(v,w) u(v,w) (pity onstrints) The ost of irultion g is (g) = (v,w) E (v,w) g(v,w). There my e lower oun on the flow through n ege. The minimum ost irultion prolem is to fin fesile irultion of minimum ost. [K.D. Wyne: A Polynomil Comintoril Algorithm for Generlize Minimum Cost Flow, http://www.s.prineton.eu/ ~wyne/ ppers/ rtio.pf] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 6

Mpping APA to the Minimum Cost Cirultion Prolem Assemly sequene* lt r mpy r ltp i mpy i mpy r sl r ltp i mpy r p time sl r AR S 0 0 0 u(t S)= AR AR2 Flow into n out of vrile noes must e. Reple vrile noes y ege with lower oun= to otin pure irultion prolem irultion selete itionl eges of originl grph (only smples shown) * C2x proessor from ti Vriles T i r r i resses [C. Geotys: DSP Aress Optimiztion Using A Minimum Cost Cirultion Tehnique, ICCAD, 997] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 7

Results oring to Geotys Optimize oe size Originl oe size Limite to si loks P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 8

Beyon si loks: hnling rry referenes in loops Exmple: for (i=2; i<=n; i) {.. B[i] /*A2 */.. B[i] /*A */.. B[i2] /*A2 */.. B[i] /*A */.. B[i3] /*A2 */.. B[i] } /*A */ Cost for rossing loop ounries onsiere. Referene: A. Bsu, R. Leupers, P. Mrweel: Arry Inex Allotion uner Register Constrints, Int. Conf. on VLSI Design, Go/Ini, 999 Control steps Offsets P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 2 3 4 6 7 8 9 3 2 A X X X X A2 X X X 0 2 3 4 Fri 9 X X

Offset ssignment prolem (OA) Effet of optimise memory lyout Let's ssume tht we n moify the memory lyout offset ssignment prolem. (k,m,r)oa is the prolem of generting memory lyout whih minimizes the ost of ressing vriles, with k: numer of ress registers m: numer of moify registers r: the offset rnge The se (,0,) is lle simple offset ssignment (SOA), the se (k,0,) is lle generl offset ssignment (GOA). P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 0

SOA exmple Effet of optimise memory lyout Vriles in si lok: Aess sequene: V = {,,, } S = (,,,,, ) 0 2 3 Lo AR, ; AR = 2 ; AR = 3 ; AR = 2 ; AR ; AR ; 0 2 3 Lo AR,0 ; AR ; AR =2 ; AR ; AR ; AR ; ost: 4 ost: 2 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri

SOA exmple: Aess sequene, ess grph n Hmiltonin pths ess sequene: 2 ess grph 2 mximum weighte pth= mx. weighte Hmilton pth overing (MW HC) 0 2 3 memory lyout SOA use s uiling lok for more omplex situtions signifint interest in goo SOA lgorithms P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [Brtley, 992; Lio, 99] Fri 2

Nïve SOA Noes re e in the orer in whih they re use in the progrm. Exmple: Aess sequene: S = (,,,,, ) 0 0 0 0 2 3 0 2 3 memory lyout P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3

Lio s lgorithm Similr to Kruskl s spnning tree lgorithms:. Sort eges of ess grph G=(V,E) oring to their weight 2. Construt new grph G =(V,E ), strting with E =0 3. Selet n ege e of G of highest weight; If this ege oes not use yle in G n oes not use ny noe in G to hve egree > 2 then this noe to E otherwise isr e. 4. Goto 3 s long s not ll eges from G hve een selete n s long s G hs less thn the mximum numer of eges ( V ). Exmple: Aess sequene: S=(,,,,, ) Impliit eges of weight 0 for ll unonnete noes. G 2 2 (,) (,) (,) (,) P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 0 0 0 0 G 2 Fri 4

Lio s lgorithm on more omplex grph e f f G 2 f G 2 f e e 2 2 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri

P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Universität Dortmun Fri 6 Universität Dortmun Universität Dortmun Lio s lgorithm with tie rek heuristi Motivtion 6 4 6 4 6 4 6 4 Cost 9 6 4 6 4 Cost 6 [ Leupers]

Tierek funtion Aess grph (V,E,w), for eh ege e with weight w(e): T( e) w( e ) e E, e e 6 e 4 e' e' T(e)= In se of lterntive equlweight eges: give priority to the ege with lower weight T(e), euse this will (generlly) leve more freeom for seleting further highweight eges 6 4 e e' T(e)= P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 7

Comprison of ifferent SOA lgorithms Overhe [%] Runtime [mses] Delrtion Orer 67.09 0 Lio, s presente 4.34 0.67 Lio, with tie rek heuristis for eges of sme weight 0.6 0.68 Atri: itertive improvement of initil SOA solution 2.28 4.6 Sme s Atri, with tierek heuristi e 0. 23 Geneti lgorithm [Leupers] 0 8296.26 Brnh & Boun (for simple ses) 0 >> P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 8

Vrile olesing SOA (CSOA) Exmple: Aess sequene (v,x,v,u,v,u,v,y,u,y,u,x,u,x,u,x,u) v y x u Interferene grph y v 3 Aess grph y u 4 6 v x 2 Aess grph y, v Itertive extension of Hmilton pth n olesing Memory lyout u x 7 2 u 6 x y,v u x P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 9

Offset osts reltive to Lio [%] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [Desiree Ottoni et l.: Improving Offset Assignment through Simultneous Vrile Colesing, SCOPES, 2003] Fri 20

Memory size reutions [%] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [D. Ottoni et l.: Improving Offset Assignment through Simultneous Vrile Colesing, SCOPES, 2003] Fri 2

Integrte Sheuling n SOA: ShSOA Result for SOA CSOA P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 22

Integrte Sheuling n SOA: ShSOA Result for ShSOA [Y. Choi, T. Kim, Aress Assignment Comine with Sheuling in DSP Coe Genertion, DAC, 2002, ACM] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 23

GOA: Aress ssignment for k ress registers Two ses. All esses to vrile one with the sme ress register Prtitioning of vriles into sets 2. Allotion of ress registers for eh ess to vrile. Better results, ut more omplex. A A2 A A2 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 f j f j 2 2! Fri 24 t t

GOA with vrile prtitioning: Aress ssignment for k ress registers 6 See: k eges of highest weight. 4 e 2 7 4 f Itertion: A ege(s) with smllest inrementl ost for eges not overe. Ege not overe Disjoint sets of vrs GOA k SOA [Leupers & Mrweel, ICCAD, 996] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 2

Optimise use of m moifyregisters Given: Sequene of MoifyVlues: Lo AR,0 AR =2 AR =4 AR =3 AR =2 AR =3 AR =2 AR =4 AR =2 AR =4 No seprte ress lultion yle if onstnts re ontine in moify registers Moify registers re ontiners, onstnts represent ontents. Try to ssign ontents to ontiners suh tht ontentmisses our s less frequent s possile. Equivlent to pges n pge frmes. Optiml pge replement lgorithm of Bely n e pplie (reple pge with lrgest forwr istne) P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 26 [Leupers]

Optimising the use of m moify registers Given: sequene of moify vlues 2 4 3 2 3 2 4 2 4 MR Lrgest forwr istne MR2 Lrgest forwr istne P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 [Leupers&Mrweel, ICCAD, 996] Fri 27

Integrte memory llotion/ Moify register ssignment Stnr pproh Memory llotion Simultneous Optimistion using geneti lgorithm: Perentge of sve ress omputtion yles (Sequene length,# Vriles) (00,90) (00,0) M register ssignment (00,0) (0,40) (0,20) k=m=8 k=m=4 k=m=2 (0,0) 0 20 40 60 80 % P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 28 [Leupers, Dvi; ISSS 98]

Work on lrger utoinrement/erement rnges Coesize [kb] s funtion of the numer of ress registers (k) n the inrement rnges (r), ol numers inite minimum Meiumsize progrm k r 2 3 4 23. 0.3 8.7 03.0 07.2 2 22.9 07.0 07. 04.9.4 3 6.4 0.2 04.7 04.4.3 4 8.6 0.8.4.3 8.2 4.4 04.7.3.3 8.2 Smll progrm k r 2 3 4 29.7 2.73 26.70 26.47 28.3 2 27.86 26.9 28.26 28.26 30.0 3 2.92 26.33 28.22 28.22 30.0 4 30.4 28.22 30.0 30.0 3.98 26.70 28.22 30.0 30.0 3.98 (k=3, r=) n (k=2, r=3) re useful esigns. Lrger rnges o not reue oe size. Refererene: Sursnm, Lio, Devs: Anlysis n Aress Arithmeti Cpilities in Custom DSP Arhitetures, DAC, 997, p. 287292 P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 29

(Some) Generliztions of SOA Sh SOA Lim Sh GOA Choi SOA Lio, Leupers GOA Leupers CSOA N/A Ottoni P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 30

Comprison of ifferent pprohes for (k,m,r)oa Author(s) Soure Prolem (k,m,r) APA OA A S M Approh Brtley Lio (,0,) Leupers @ ICCAD 96 (k,0,) Hmilton pth tie rek Geotys ICCAD 97 (k,0,r) Network flow Sursnm DAC 97 (k,0,r) Effet of r on oe size Bsu @ VLSI 97 (k,0,) Fous on rrys in loops Leupers @ ISSS 98 (k,m,) Geneti lgorithm Lim CODES 200 (,0,) VAG eges minimize uring sheuling Choi, Kim DAC 2002 (k,0,) Ottoni SCOPES 2003 (,0,) Vrile olesing Wess SCOPES 2004 (k,0,r) Iterte APAGOA yles k= ress regs.; m=moif. regs.; r: offset rnge; APA: ress pointer ss.; OA: offset ss.; A: ess se (), prtitione vriles (); S: integrte sheuling (); M: moulo ressing; @: t U. Dortm P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3

Multiple memory nks Smple hrwre XMem From AGU or immeite X0 X Y0 Y Multiplier YMem From AGU or immeite Shifter ALU A B Shifter Prllel moves possile if using ifferent memory nks P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 32

Multiple memory nks Constrint grph genertion Preompte oe (symoli vriles n registers) Move v0,r0 v,r Move v2,r2 v3,r3 Constrint grph v0 v {XMem, YMem} r0 r {X0,X,Y0, Y, A, B} Do not ssign to sme register v2 v3 {XMem, YMem} r2 r3 {X0,X,Y0, Y, A,B} Links mintine, more onstrints... P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 33

Multiple memory nks Coe size reution through simulte nneling pm rv2 rv lms fir Convolution iir iqu Rel upte Complex upte Complex multiply 0 20 % 40 % 60 % [Sursnm, Mlik, 99] P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 34

Summry Exploiting some of the speil fetures of emee proessors in ompilers utoinrement n erement moes ress pointer ssignment prolem (APA) simple offset ssignment prolem (SOA) SOA with vrile olesing integrte sheuling generl offset ssignment prolem use of moify registers multiple memory nks P. Mrweel, Univ. Dortmun/Informtik 2 ICD/ES, 2006 Fri 3