Instruction Selection for Compilers that Target Architectures with Echo Instructions
  Philip Brisk (philip@cs.ucla.edu)
  Ani Nahapetian (ani@cs.ucla.edu)
  Majid Sarrafzadeh (majid@cs.ucla.edu)
  Embedded and Reconfigurable Systems Lab
  Computer Science Department
  University of California, Los Angeles

Outline
  Code Compression
  LZ77 Compression
  Echo Instructions
  Compiler Framework
  Experimental Results
  Summary

Code Compression
  Why Compress a Program?
    Silicon Requirements for on-chip ROM
    Power Consumption
    Cost to Fabricate
    Cost Paid by the Consumer
  Are there Costs to Decompression?
    Performance Overhead
    Dedicated Hardware (or Software)
    Longer CPU Pipelines
    Increased Branch Misprediction Penalty

LZ77 Compression
  To Compress a String, Identify Repeated Substrings and Replace Each with a Pointer (Offset, Length of Sequence)
  [Figure: the example string ABCDBCABCDBACABCDBADAABCDBDC, shown before and after compression, with the pointer fields labeled; a pointer occupies one character for the length.]
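The pointer replacement described on this slide can be sketched in a few lines of Python. This is a minimal greedy variant written for illustration, not the encoder behind the slide's figure: the function names, the `window` size, and the `min_len` threshold are all assumptions.

```python
def lz77_compress(text, window=32, min_len=3):
    """Greedy LZ77 sketch: emit literals or (offset, length) back-references.

    window and min_len are illustrative assumptions, not values from the slides.
    """
    out, i = [], 0
    while i < len(text):
        best_off, best_len = 0, 0
        for start in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy).
            while i + length < len(text) and text[start + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - start, length
        if best_len >= min_len:
            out.append((best_off, best_len))  # pointer: (offset, length)
            i += best_len
        else:
            out.append(text[i])               # literal character
            i += 1
    return out


def lz77_decompress(tokens):
    """Expand literals and (offset, length) pointers back into a string."""
    buf = []
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                buf.append(buf[-off])  # copy one character at a time so overlaps work
        else:
            buf.append(tok)
    return "".join(buf)
```

Applied to the slide's example string, the repeated substring starting at position 6 becomes a pointer back to the start of the string, and decompression restores the original text.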

Echo Instructions
  Decompresses LZ77-compressed Programs with Minimal Hardware Requirements
    2 Dedicated Registers: R1, R2
    1 Decrementer with = 0 Test (NOR)
  Echo(Offset, N)
    1. Save PC and N in R1 and R2
    2. Branch to PC - Offset
    3. Execute the next N instructions
    4. Return to the Call Point
    5. Restore PC from R1
  (Fraser, Microsoft 02) (Lau, CASES 03)
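The five steps of Echo(Offset, N) can be traced with a small interpreter. The sketch below is an illustration under several assumptions: instructions are 4 bytes wide, an echo is encoded as the tuple `("echo", offset, n)`, the echoed region contains no further echo, and the function name `run` and the example addresses are hypothetical.

```python
def run(program, start):
    """Return the non-echo instructions executed, in order.

    program maps a byte address to an instruction (4-byte instructions assumed).
    An echo is encoded as ("echo", offset, n); this encoding is an assumption.
    """
    trace, pc = [], start
    while pc in program:
        instr = program[pc]
        if isinstance(instr, tuple) and instr[0] == "echo":
            _, offset, n = instr
            r1, r2 = pc, n          # step 1: save PC and N in R1 and R2
            target = pc - offset    # step 2: branch to PC - Offset
            while r2 != 0:          # step 3: execute N instructions (decrement, test = 0)
                trace.append(program[target])
                target += 4
                r2 -= 1
            pc = r1 + 4             # steps 4-5: restore PC from R1, resume after the echo
        else:
            trace.append(instr)
            pc += 4
    return trace


body = ["$1 <- $2 + $3", "$11 <- $7 * $8", "$8 <- $7 * $1",
        "$1 <- $11 / $8", "$1 <- $8 + 1"]

# Uncompressed: the five-instruction body appears twice, back to back.
uncompressed = {100 + 4 * i: ins for i, ins in enumerate(body + body)}

# Compressed: the second copy is replaced by one echo that points 20 bytes back.
compressed = {100 + 4 * i: ins for i, ins in enumerate(body)}
compressed[120] = ("echo", 20, 5)
```

Both programs execute the same instruction stream; the compressed one stores six instructions instead of ten.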

Substring Matching
  (Fraser, SCC
  Before: the same five-instruction sequence appears at addresses 100-116, 340-356, and 404-420
    $1 <- $2 + $3
    $11 <- $7 * $8
    $8 <- $7 * $1
    $1 <- $11 / $8
    $1 <- $8 + 1
  After: the copy at 100-116 is kept; the second copy is replaced with Echo(240, 5) and the third with Echo(304, 5)

Reschedule/Rename
  Addresses 100-116 (original)
    $1 <- $2 + $3
    $11 <- $7 * $8
    $8 <- $7 * $1
    $1 <- $11 / $8
    $1 <- $8 + 1
  Addresses 340-356 (same computation, different registers)
    $10 <- $5 + $4
    $11 <- $9 * $6
    $6 <- $9 * $10
    $10 <- $11 / $6
    $10 <- $6 + 10
  Addresses 404-420 (same computation, first two instructions reordered)
    $11 <- $7 * $8
    $1 <- $2 + $3
    $8 <- $7 * $1
    $1 <- $11 / $8
    $1 <- $8 + 1
  Rename (Cooper, PLDI 99) (Debray, TOPLAS 00)
    $4 : $3, $5 : $2, $6 : $8, $9 : $7, $10 : $1, $11 : $11
  Reschedule (Lau, CASES 03)
  After renaming and rescheduling, all three copies match the original sequence
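One way to detect that two sequences differ only by register names is to rewrite each into a canonical form, numbering registers in order of first appearance. The sketch below is an illustration, not the renaming procedure of the cited papers; the tuple encoding `(dst, op, src1, src2)` and the name `canonical` are assumptions. It uses the first four instructions of the three sequences on this slide.

```python
def canonical(instrs):
    """Rename registers to r0, r1, ... in order of first appearance.

    instrs: list of (dst, op, src1, src2); names starting with "$" are registers.
    """
    names, out = {}, []
    for dst, op, *srcs in instrs:
        renamed = []
        for reg in (dst, *srcs):
            if reg.startswith("$"):
                names.setdefault(reg, f"r{len(names)}")
                renamed.append(names[reg])
            else:
                renamed.append(reg)  # immediates are left untouched
        out.append((renamed[0], op, *renamed[1:]))
    return out


original = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"),
            ("$8", "*", "$7", "$1"), ("$1", "/", "$11", "$8")]
renamed = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"),
           ("$6", "*", "$9", "$10"), ("$10", "/", "$11", "$6")]
rescheduled = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"),
               ("$8", "*", "$7", "$1"), ("$1", "/", "$11", "$8")]
```

The original and renamed sequences share a canonical form, while the rescheduled one does not: renaming alone cannot see through a change in instruction order.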

DFG Isomorphism
  [Figure: the three instruction sequences from the previous slide (original at 100-116, renamed at 340-356, rescheduled at 404-420) drawn next to a data flow graph with nodes +, *, *, /, +.]
  Our Approach
    Represent Sequences as Data Flow Graphs
    Identify Repeated Isomorphic Subgraphs
    Replace Subgraphs with Echo Instructions
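A data flow graph abstracts away both register names and instruction order, which is why the three sequences become comparable. The sketch below builds a DFG from a straight-line sequence and derives an order-independent signature. The encoding `(dst, op, src1, src2)`, the function names, and the signature scheme are assumptions; the signature treats every live-in register and immediate as the same placeholder `in`, so it is a quick structural check and not a full isomorphism test. The example uses the first four instructions of each sequence.

```python
def build_dfg(instrs):
    """Return (ops, edges) for a straight-line sequence.

    edges holds (producer_index, consumer_index, operand_position).
    """
    last_def, ops, edges = {}, [], []
    for i, (dst, op, *srcs) in enumerate(instrs):
        ops.append(op)
        for pos, src in enumerate(srcs):
            if src in last_def:              # value produced earlier in the sequence
                edges.append((last_def[src], i, pos))
        last_def[dst] = i
    return ops, edges


def signature(instrs):
    """Sorted list of per-node expression strings; independent of names and order."""
    ops, edges = build_dfg(instrs)
    producers = {(c, pos): p for p, c, pos in edges}
    memo = {}

    def sig(i):
        if i not in memo:
            n_operands = len(instrs[i]) - 2
            args = [sig(producers[(i, pos)]) if (i, pos) in producers else "in"
                    for pos in range(n_operands)]
            memo[i] = f"{ops[i]}({','.join(args)})"
        return memo[i]

    return sorted(sig(i) for i in range(len(instrs)))


original = [("$1", "+", "$2", "$3"), ("$11", "*", "$7", "$8"),
            ("$8", "*", "$7", "$1"), ("$1", "/", "$11", "$8")]
renamed = [("$10", "+", "$5", "$4"), ("$11", "*", "$9", "$6"),
           ("$6", "*", "$9", "$10"), ("$10", "/", "$11", "$6")]
rescheduled = [("$11", "*", "$7", "$8"), ("$1", "+", "$2", "$3"),
               ("$8", "*", "$7", "$1"), ("$1", "/", "$11", "$8")]
```

All three sequences yield the same signature, whereas swapping the operands of the division changes it.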

Isomorphic Subgraph Identification
  Edge Contraction (Kastner, ICCAD 01)
  Consider a New Subgraph for Each DFG Edge
  [Figure: example DFG annotated with a count for each edge type.]
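Treating every DFG edge as a candidate two-node subgraph amounts to grouping edges by the opcodes at their endpoints and counting each group. A minimal sketch, assuming nodes are indexed integers with an opcode label and edges are (producer, consumer) pairs; the function name and the example graph are illustrative.

```python
from collections import Counter


def edge_type_counts(ops, edges):
    """Count edges by (producer opcode, consumer opcode)."""
    return Counter((ops[p], ops[c]) for p, c in edges)


# DFG of the five-instruction sequence used earlier in the slides:
# node 0: +, node 1: *, node 2: *, node 3: /, node 4: +
ops = ["+", "*", "*", "/", "+"]
edges = [(0, 2), (1, 3), (2, 3), (2, 4)]
counts = edge_type_counts(ops, edges)
```

Here the edge type ("*", "/") occurs twice and every other type once, so it would be the first candidate pattern to examine.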

Isomorphic Subgraph Identification
  Compute an Independent Set for Each Edge Type
    NP-Complete Problem
    Iterative Improvement Algorithm (Kirovski, DAC 98)
  [Figure: example DFG annotated with the independent-set size for each edge type.]
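Two instances of an edge type conflict when they share a node, since a node can belong to only one template. The sketch below picks a conflict-free set with a single greedy pass. It is a simple stand-in written for illustration, not the iterative improvement algorithm cited on the slide; the function name and example graph are assumptions.

```python
def greedy_disjoint_edges(ops, edges, edge_type):
    """Pick instances of edge_type so that no two chosen edges share a node."""
    used, chosen = set(), []
    for p, c in edges:
        if (ops[p], ops[c]) == edge_type and p not in used and c not in used:
            chosen.append((p, c))
            used.update((p, c))
    return chosen


ops = ["+", "*", "*", "/", "+"]
edges = [(0, 2), (1, 3), (2, 3), (2, 4)]
picked = greedy_disjoint_edges(ops, edges, ("*", "/"))
```

Both ("*", "/") edges end at node 3, so only one of them can be kept. A greedy pass depends on edge order and gives no optimality guarantee.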

Isomorphic Subgraph Identification
  Replace Most Frequently Occurring Pattern with a Template
  [Figure legend]
    Original DFG Edge
    Data Dependencies Incident on Templates
    Data Dependencies that Cross Template Boundaries

Isomorphic Subgraph Identification
  Edge Contraction in the Presence of Templates
    Generate New Templates Along Bold Edges
    Test for Template Equivalence is DAG Isomorphism
    Used the Publicly Available VF2 Algorithm
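For templates of only a few nodes, the equivalence test can be illustrated by brute force: try every node mapping that preserves opcodes and check that it maps one edge set onto the other. This sketch is not VF2, which prunes the search far more effectively; it assumes each graph lists every edge once, and the function name and example graphs are illustrative.

```python
from itertools import permutations


def dag_isomorphic(ops_a, edges_a, ops_b, edges_b):
    """Brute-force labeled-DAG isomorphism; practical only for tiny graphs."""
    if sorted(ops_a) != sorted(ops_b) or len(edges_a) != len(edges_b):
        return False
    target = set(edges_b)
    n = len(ops_a)
    for perm in permutations(range(n)):
        labels_match = all(ops_a[i] == ops_b[perm[i]] for i in range(n))
        if labels_match and {(perm[p], perm[c]) for p, c in edges_a} == target:
            return True
    return False


# DFG of the original five-instruction sequence.
ops_a = ["+", "*", "*", "/", "+"]
edges_a = [(0, 2), (1, 3), (2, 3), (2, 4)]

# DFG of the rescheduled sequence (first two instructions swapped).
ops_b = ["*", "+", "*", "/", "+"]
edges_b = [(1, 2), (0, 3), (2, 3), (2, 4)]
```

The two graphs are isomorphic under the mapping that swaps nodes 0 and 1. The search is factorial in the number of nodes, which is one reason a dedicated algorithm is used in practice.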

Register Allocation
  Isomorphic Templates Must Have Identical Usage of Registers
  [Figure labels: Registers; Shuffle or Spill Code]

Register Allocation
  Code Reuse Constraints May Work Against Code Size
    Each Template Eliminates 3 Instrs.
    5 Shuffle/Spill Ops. are Required
  The General Problem is Very Complicated
    Existing Allocation Techniques Are Not Applicable
  [Figure labels: Registers; Shuffle or Spill Code]
  Present Status: The Allocator is a Work-in-

Isomorphic Subgraph Identification
  After Register Allocation, Replace Subgraphs with Echo Instructions
  [Figure: DFG with three subgraphs replaced by Echo nodes.]

Experimental Framework
  Our Goal is to Evaluate the Effectiveness of Subgraph Identification
  Built Subgraph Identification into the Machine-SUIF Framework
    Pass Placed Between Instruction Selection and Register Allocation
    Current Implementation Supports Alpha as Target
    Allows for Future Integration with SimpleScalar Simulator

Experimental Methodology
  Without Allocation in Place, We Cannot:
    Estimate Where Shuffle/Spill Code Will be Inserted at Template Boundaries
    Determine Which Copy Instructions Will be Coalesced
  But We Can:
    Make Assumptions Regarding the Starting Point for Register Allocation

Two Approaches to Coalescing
  Pessimistic Coalescing (Most Allocators)
    Begin with All Copy Instructions in Place
    Coalesce Copies When Safe
  Optimistic Coalescing (Park & Moon, PACT 98)
    Initially Coalesce ALL Copy Instructions
    Re-Introduce Coalesced Copies to Avoid Spilling Live Ranges Whenever Possible

Assumptions and Measurement
  Pessimistic Assumption
    No Copy Instructions are Coalesced
  Optimistic Assumption
    ALL Copy Instructions are Coalesced
  Compute the Number of DFG Operations Before and After Compression Step
    PU: Pessimistic, Uncompressed
    OU: Optimistic, Uncompressed
    PC: Pessimistic, Compressed
    OC: Optimistic, Compressed
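The two assumptions bound the operation count from above and below: under the pessimistic one every copy stays in the count, under the optimistic one none does. A minimal sketch of that counting rule, assuming operations are given as opcode strings with `"copy"` marking a register-to-register copy; the function name and the sample list are made up for illustration and are not measured data.

```python
def count_dfg_ops(ops, assumption):
    """Count DFG operations under one of the two coalescing assumptions."""
    if assumption == "pessimistic":   # no copies coalesced: every copy remains
        return len(ops)
    if assumption == "optimistic":    # all copies coalesced: no copy remains
        return sum(1 for op in ops if op != "copy")
    raise ValueError(f"unknown assumption: {assumption}")


# Hypothetical operation list, for illustration only.
sample = ["+", "copy", "*", "copy", "/"]
upper = count_dfg_ops(sample, "pessimistic")
lower = count_dfg_ops(sample, "optimistic")
```

Running the same count on the code before and after the compression step gives the four figures PU, OU, PC, and OC.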

Benchmarks
  Taken from the MediaBench and MiBench Application Suites

  Benchmark    Description
  ADPCM        Adaptive Differential Pulse Code Modulation
  Blowfish     Symmetric Block Cipher with Variable Key Length
  Epic         Image Data Compression Utility
  G721         Voice Compression
  JPEG         Image Compression and Decompression
  MPEG2 Dec    MPEG2 Decoder
  MPEG2 Enc    MPEG2 Encoder
  Pegwit       Public Key Encryption and Authentication

Experimental Results
  [Bar chart: DFG Ops (y-axis, 0 to 90000) per benchmark (ADPCM, Blowfish, Epic, G721, JPEG, MPEG2 Dec., MPEG2 Enc., Pegwit), with four bars each: PU, PC, OU, OC.]
    PU: Pessimistic, Uncompressed
    OU: Optimistic, Uncompressed
    PC: Pessimistic, Compressed
    OC: Optimistic, Compressed

Runtime Considerations
  Algorithm Ran Efficiently (a few seconds) for Most Benchmarks
  Several Notable Exceptions
    Four Common Features
      Large DFGs
      User-Defined Macros
      Unrolled Loops
      Cyclic Shifting of Parameters
  sha1.c (Pegwit)
    One DFG
    Compilation Time Was in Excess of 3 Hrs

Runtime Considerations
  sha1.c

    #define R0(v, w, x, y, z, i) { z += ...; w = ...; }

    void SHA1Transform(unsigned long state[5], ...)
    {
        unsigned long a = state[0], b = state[1], c = state[2],
                      d = state[3], e = state[4];

        R0(a, b, c, d, e, 0);
        R0(e, a, b, c, d, 1);
        R0(d, e, a, b, c, 2);
        R0(c, d, e, a, b, 3);
        R0(b, c, d, e, a, 4);
        R0(a, b, c, d, e, 5);
        ...
        R0(a, b, c, d, e, 15);
    }

Runtime Considerations
  sha1.c
  Compilation Time Was Reduced to Seconds

    #define R0(v, w, x, y, z, i) { z += ...; w = ...; }

    void SHA1Transform(unsigned long state[5], ...)
    {
        unsigned long a = state[0], b = state[1], c = state[2],
                      d = state[3], e = state[4], tmp;

        for (unsigned long i = 0; i < 16; i++) {
            R0(a, b, c, d, e, i);
            tmp = e; e = d; d = c; c = b; b = a; a = tmp;
        }
    }
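The rewrite replaces sixteen macro calls with cyclically shifted parameters by a loop that always passes (a, b, c, d, e) and rotates the variables afterwards. The Python sketch below checks that the two forms perform the same updates. Since the slides elide the body of R0, `r0` here is a hypothetical stand-in that, like the macro, updates its z and w arguments; all function names are assumptions.

```python
MASK = 0xFFFFFFFF


def r0(s, v, w, x, y, z, i):
    """Hypothetical stand-in for the R0 macro: updates s[z] and s[w] in place."""
    s[z] = (s[z] + (s[v] ^ s[x] ^ s[y]) + i) & MASK
    s[w] = ((s[w] << 30) | (s[w] >> 2)) & MASK


def unrolled_calls(state):
    """Sixteen calls whose parameter lists shift cyclically: (a,b,c,d,e), (e,a,b,c,d), ..."""
    s = list(state)
    for i in range(16):
        k = i % 5
        v, w, x, y, z = [(j - k) % 5 for j in range(5)]
        r0(s, v, w, x, y, z, i)
    return s


def rolled_loop(state):
    """Fixed parameter list, followed by a rotation of the five variables."""
    s = list(state)
    for i in range(16):
        r0(s, 0, 1, 2, 3, 4, i)
        s = [s[4]] + s[:4]   # tmp = e; e = d; d = c; c = b; b = a; a = tmp
    return s


start = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
u = unrolled_calls(start)
r = rolled_loop(start)
realigned = [r[(m + 16) % 5] for m in range(5)]
```

The loop computes the same five values, but after sixteen rotations they sit one position away from where the unrolled calls leave them (16 mod 5 = 1), which is why the comparison realigns them first. The slides do not show the code that follows the loop, so how the original accounts for this offset is not known from the text.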

Conclusion
  Echo Instructions
    Compression at a Minimal Hardware Cost
    Performance Overhead is Two Branches per Echo
  Compiler Optimization
    Identify Redundancy via Subgraph Isomorphism
    New Challenges for Register Allocation
  Experiments
    Significant Redundancy Observed in