Lecture 13: Cache Basics and Cache Performance
Computer Engineering 585, Fall 2002

What Is Memory Hierarchy?
[Figure: a typical memory hierarchy today]
Here we focus on the L1/L2/L3 caches and main memory.

Why Memory Hierarchy?
[Figure: relative performance (log scale, 1 to 1000) versus year, 1980 to 2000. CPU (µProc): 60%/yr (Moore's Law). DRAM: 7%/yr. The processor-memory performance gap grows 50%/year.]
1980: no cache in µproc. 1989: first Intel µproc with a cache on chip. 1995: 2-level cache on chip.
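The 50%/year figure follows from the two improvement rates: if processor performance grows by a factor of 1.60 per year and DRAM by 1.07, their ratio grows by 1.60 / 1.07 per year. A minimal check in Python, using only the rates on the slide:

```python
CPU_GROWTH = 1.60   # processor performance: +60% per year (from the slide)
DRAM_GROWTH = 1.07  # DRAM performance: +7% per year (from the slide)

def gap_growth_per_year(cpu=CPU_GROWTH, dram=DRAM_GROWTH):
    """Yearly growth factor of the processor/DRAM performance ratio."""
    return cpu / dram

growth = gap_growth_per_year()
print(f"gap grows {100 * (growth - 1):.1f}% per year")
```

This prints about 49.5%, which the slide rounds to 50%.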

Generations of Microprocessors
Time of a full cache miss in instructions executed:
1st Alpha: 340 ns / 5.0 ns = 68 clks × 2, or 136 instructions
2nd Alpha: 266 ns / 3.3 ns = 80 clks × 4, or 320 instructions
3rd Alpha: 180 ns / 1.7 ns = 108 clks × 6, or 648 instructions
1/2X latency × 3X clock rate × 3X instr/clock ⇒ 4.5X
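The cost of a miss in lost instructions is the miss time in clocks multiplied by the number of instructions the processor can issue per clock. A small sketch reproducing the slide's table; the clock counts and issue widths are copied from the slide rather than recomputed from the nanosecond figures:

```python
# (clocks per full miss, instructions per clock) as printed on the slide.
GENERATIONS = {
    "1st Alpha": (68, 2),   # 340 ns / 5.0 ns
    "2nd Alpha": (80, 4),   # 266 ns / 3.3 ns
    "3rd Alpha": (108, 6),  # 180 ns / 1.7 ns
}

def instructions_per_miss(miss_clocks: int, issue_width: int) -> int:
    """Peak instructions that could have been executed during one full cache miss."""
    return miss_clocks * issue_width

for name, (clocks, width) in GENERATIONS.items():
    print(name, instructions_per_miss(clocks, width))

# First-order scaling from the 1st to the 3rd generation, as on the slide:
# half the latency, 3x the clock rate, 3x the instructions per clock.
scaling = 0.5 * 3 * 3
print(scaling)
```

The actual ratio 648 / 136 ≈ 4.8 is close to the 4.5X first-order estimate.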

Area Costs of Caches
Processor: % area (≈ cost) / % transistors (≈ power) devoted to caches
Intel 80386: 0% / 0%
Alpha 21164: 37% / 77%
StrongARM SA-110: 61% / 94%
Pentium Pro: 64% / 88% (2 dies per package: Proc/I$/D$ + L2$)
Itanium: 92%
Caches store redundant data only to close the performance gap.

What Exactly Is a Cache?
Small, fast storage used to improve the average access time to slow memory; usually made of SRAM. It exploits locality, both spatial and temporal.
In computer architecture, almost everything is a cache!
Beyond architecture: file cache, browser cache, proxy cache.
Here we focus on L1 and L2 caches (L3 optional) as buffers to main memory.

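Spatial locality pays off because a miss brings in a whole block, not a single byte. A minimal sketch, assuming 32-byte blocks (as in the 1 KB example) and an infinitely large cache, so that only first-touch (compulsory) misses are counted; the function name is illustrative:

```python
def blocks_touched(addresses, block_size=32):
    """Distinct cache blocks referenced: the compulsory misses an
    infinitely large cache would take for this access stream."""
    return len({addr // block_size for addr in addresses})

sequential = range(0, 1024)        # 1024 consecutive byte addresses
strided = range(0, 1024 * 32, 32)  # 1024 accesses, each 32 bytes apart

print(blocks_touched(sequential), blocks_touched(strided))
```

The sequential stream touches only 32 blocks, so 31 of every 32 accesses hit; the strided stream touches a new block on every access. Repeating the sequential scan (temporal locality) adds no new blocks at all.
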
Example: 1 KB Direct Mapped Cache
Assume a cache of $2^N$ bytes, $2^K$ blocks, and a block size of $2^M$ bytes; $N = M + K$ (number of blocks times block size).
The cache stores the tag, data, and valid bit for each block.
[Figure: a 32-bit block address split into Cache Tag (bits 31 to 10, example 0x50, stored as part of the cache state), Cache Index (bits 9 to 5, example 0x01), and Block offset (bits 4 to 0, example 0x00); a cache array of Valid Bit, Cache Tag, and Cache Data entries 0 to 31 holding Byte 0 to 31, Byte 32 to 63, ..., Byte 992 to 1023.]
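For this cache, $M = 5$ and $K = 5$, so the low 5 address bits are the block offset, the next 5 bits are the index, and the remaining bits are the tag. A short sketch of that split, reproducing the figure's example fields (tag 0x50, index 0x01, offset 0x00):

```python
OFFSET_BITS = 5  # 32-byte blocks: 2^M bytes with M = 5
INDEX_BITS = 5   # 32 blocks: 2^K blocks with K = 5, so 2^(M+K) = 1 KB

def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, index, offset) of an address for the 1 KB direct mapped cache."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                 # bits 4..0
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)  # bits 9..5
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                 # bits 31..10
    return tag, index, offset

addr = (0x50 << 10) | (0x01 << 5) | 0x00
tag, index, offset = split_address(addr)
print(hex(addr), hex(tag), index, offset)
```

The index picks one of the 32 entries; a hit requires that entry's valid bit to be set and its stored tag to equal 0x50.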

Four Questions About Cache Design
Block placement: where can a block be placed?
Block identification: how is a block found in the cache?
Block replacement: if a new block is to be fetched, which of the existing blocks should be replaced (if there are multiple choices)?
Write policy: what happens on a write?

Where Can a Block Be Placed?
What is a block: memory space is divided into blocks, just as the cache is divided; a memory block is the basic unit to be cached.
Direct mapped cache: there is only one place in the cache to buffer a given memory block.
N-way set associative cache: N places for a given memory block. It works like N direct mapped caches operating in parallel, reducing miss rates at the cost of increased complexity, cache access time, and power consumption.
Fully associative cache: a memory block can be put anywhere in the cache.
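All three placement policies can be described by one rule: the set is the block number modulo the number of sets, and the block may go in any way of that set. A direct mapped cache has one way per set; a fully associative cache has a single set. A sketch for an 8-block cache and memory block 12; the slot numbering (set s occupies slots s·ways through s·ways + ways − 1) is an illustrative assumption:

```python
def candidate_slots(block_number: int, num_blocks: int, ways: int) -> list[int]:
    """Cache slots where a memory block may be placed.

    ways == 1          -> direct mapped
    ways == num_blocks -> fully associative
    otherwise          -> ways-way set associative
    """
    num_sets = num_blocks // ways
    set_index = block_number % num_sets
    # Hypothetical numbering: set s occupies slots s*ways .. s*ways + ways - 1.
    return [set_index * ways + w for w in range(ways)]

for ways in (1, 2, 8):
    print(ways, candidate_slots(12, num_blocks=8, ways=ways))
```

Block 12 has one candidate slot when direct mapped (12 mod 8 = 4), two slots in set 0 when two-way (12 mod 4 = 0), and all eight slots when fully associative.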

Set Associative Cache Example
Two-way set associative cache: the cache index selects a set of two blocks; the two tags in the set are compared to the input tag in parallel; data is selected based on the tag comparison.
Set associative or direct mapped? Discussed later.
[Figure: two-way set associative cache. Each way holds Valid, Cache Tag, and Cache Data entries selected by the cache index; the address tag is compared against both ways' tags; the compare outputs drive Sel0/Sel1 of a mux that selects the cache block and are ORed to produce Hit.]

How to Find a Cached Block
Direct mapped cache: the stored tag for the cache block matches the input tag.
Fully associative cache: any of the N stored tags matches the input tag.
Set associative cache: any of the K stored tags for the cache set matches the input tag.
Cache hit latency is determined by both tag comparison and data access.
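The lookup rules above can be written as one routine: the index selects a set, and a hit is any valid way in that set whose stored tag equals the address tag. A sketch, assuming a hypothetical representation of the cache as a list of sets, each a list of (valid, tag) pairs; hardware compares all tags in parallel, while this loop checks them in turn:

```python
from typing import List, Optional, Tuple

Line = Tuple[bool, Optional[int]]  # (valid bit, stored tag)

def lookup(cache: List[List[Line]], addr: int, block_size: int) -> Optional[int]:
    """Return the way that hits for addr, or None on a miss.

    len(cache) is the number of sets; len(cache[i]) is the associativity.
    One way per set models a direct mapped cache; one set models a fully associative one.
    """
    num_sets = len(cache)
    block = addr // block_size
    index = block % num_sets  # the cache index selects the set
    tag = block // num_sets   # the remaining high-order bits form the tag
    for way, (valid, stored_tag) in enumerate(cache[index]):
        if valid and stored_tag == tag:
            return way
    return None

# Two sets, two ways, 32-byte blocks; way 1 of set 1 holds tag 3.
cache = [[(False, None), (False, None)] for _ in range(2)]
cache[1][1] = (True, 3)
print(lookup(cache, 224, 32), lookup(cache, 32, 32))
```

Address 224 is block 7, which maps to set 1 with tag 3 and hits in way 1; address 32 is block 1 (set 1, tag 0) and misses.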

Which Block to Replace?
Direct mapped cache: not an issue.
For set associative or fully associative caches:
Random: select candidate blocks randomly from the cache set.
LRU (Least Recently Used): replace the block that has been unused for the longest time.
FIFO (First In, First Out): replace the oldest block.
LRU usually performs the best, but it is hard (and expensive) to implement.
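A software model of LRU for a single set is easy, which contrasts with the hardware cost of tracking exact recency order. A sketch of one set using Python's `collections.OrderedDict`, tracking tags only (no data); the class name is hypothetical:

```python
from collections import OrderedDict
from typing import Optional, Tuple

class LRUSet:
    """One cache set with LRU replacement; only tags are tracked."""

    def __init__(self, ways: int):
        self.ways = ways
        self.tags = OrderedDict()  # ordered from least to most recently used

    def access(self, tag: int) -> Tuple[bool, Optional[int]]:
        """Return (hit, evicted_tag)."""
        if tag in self.tags:
            self.tags.move_to_end(tag)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.tags) == self.ways:
            evicted, _ = self.tags.popitem(last=False)  # evict least recently used
        self.tags[tag] = None
        return False, evicted

s = LRUSet(ways=2)
for t in [1, 2, 1, 3, 2]:
    print(t, s.access(t))
```

Accessing tag 1 again before tag 3 arrives makes tag 2 the LRU victim. Dropping the `move_to_end` call on hits turns this into FIFO, which evicts tag 1 instead.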

What Happens on Writes?
Where is the data written if the block is found in the cache?
Write through: new data is written to both the cache block and the lower-level memory. This helps to maintain cache consistency.
Write back: new data is written only to the cache block; the lower-level memory is updated when the block is replaced. A dirty bit indicates whether the block must be written back. This helps to reduce memory traffic.
What happens if the block is not found in the cache?
Write allocate: fetch the block into the cache, then write the data (usually combined with write back).
No-write allocate: do not fetch the block into the cache (usually combined with write through).
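The traffic difference between the two policies shows up even in a tiny model. The sketch below assumes a hypothetical 4-block direct mapped cache that receives only stores; write through is paired with no-write allocate (every store also goes to memory), and write back with write allocate (a dirty bit per block, with fetch traffic not modeled):

```python
def count_memory_writes(stores, policy, num_blocks=4, block_size=32):
    """Count writes sent to lower-level memory for a stream of store addresses.

    write-through: every store is also written to memory (counts word writes).
    write-back: a store marks its block dirty; dirty blocks are written back
    when replaced or flushed at the end (counts block write-backs).
    """
    if policy == "write-through":
        return len(stores)
    if policy != "write-back":
        raise ValueError(f"unknown policy: {policy}")
    tags = [None] * num_blocks
    dirty = [False] * num_blocks
    writebacks = 0
    for addr in stores:
        block = addr // block_size
        index, tag = block % num_blocks, block // num_blocks
        if tags[index] != tag:       # miss: allocate the block in the cache
            if dirty[index]:
                writebacks += 1      # evicted dirty block goes back to memory
            tags[index] = tag
        dirty[index] = True          # data written only to the cache block
    writebacks += sum(dirty)         # flush blocks still dirty at the end
    return writebacks

print(count_memory_writes([0] * 100, "write-through"),
      count_memory_writes([0] * 100, "write-back"))
```

One hundred stores to the same address cause 100 memory writes under write through but a single block write-back under write back. Alternating stores to addresses 0 and 128, which conflict in this cache, erase most of that advantage.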

Real Example: Alpha 21264 Caches
64KB 2-way set associative instruction cache (I-cache)
64KB 2-way set associative data cache (D-cache)

Alpha 21264 Data Cache
D-cache: 64K 2-way set associative
[Figure: Alpha 21264 data cache organization]
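Capacity, associativity, and block size together fix how many sets there are and therefore how many address bits form the index and the block offset. A sketch of that arithmetic for power-of-two caches; the slide gives the Alpha 21264 D-cache's capacity and associativity but not its block size, so the 64-byte block used below is an assumption:

```python
def cache_geometry(capacity_bytes: int, ways: int, block_bytes: int) -> tuple[int, int, int]:
    """Return (number of sets, index bits, block-offset bits) for a power-of-two cache."""
    for v in (capacity_bytes, ways, block_bytes):
        if v <= 0 or v & (v - 1):
            raise ValueError("sizes must be powers of two")
    num_sets = capacity_bytes // (ways * block_bytes)
    return num_sets, num_sets.bit_length() - 1, block_bytes.bit_length() - 1

# 1 KB direct mapped cache with 32-byte blocks (the earlier example)
print(cache_geometry(1024, ways=1, block_bytes=32))
# 64 KB 2-way cache as on the slide; the 64-byte block size is assumed
print(cache_geometry(64 * 1024, ways=2, block_bytes=64))
```

The first call reproduces the 5 index bits and 5 offset bits of the 1 KB example. Under the assumed block size, the 64 KB two-way cache has 512 sets.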

Cache Performance
Calculate the average memory access time (AMAT):
$$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$
Example: hit time = 1 cycle, miss penalty = 100 cycles, miss rate = 4%; then AMAT = 1 + 100 × 4% = 5 cycles.
Calculate the cache's impact on processor performance:
$$\text{CPU time} = (\text{CPU execution cycles} + \text{Memory stall cycles}) \times \text{Cycle time}$$
$$\text{CPU time} = \text{IC} \times \left(\text{CPI}_{\text{execution}} + \frac{\text{Memory stall cycles}}{\text{Instruction}}\right) \times \text{Cycle time}$$
Note: cycles spent on cache hits are usually counted into execution cycles.
If the clock cycle is identical, a better AMAT means better performance.
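Both formulas translate directly into code. In the sketch below, memory stall cycles per instruction are modeled as memory accesses per instruction × miss rate × miss penalty (hits are already counted in $\text{CPI}_{\text{execution}}$). The `cpu_time` usage numbers are hypothetical:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(ic: int, cpi_execution: float, accesses_per_instr: float,
             miss_rate: float, miss_penalty: float, cycle_time: float) -> float:
    """IC x (CPI_execution + memory stall cycles per instruction) x cycle time."""
    stalls_per_instr = accesses_per_instr * miss_rate * miss_penalty
    return ic * (cpi_execution + stalls_per_instr) * cycle_time

# The slide's AMAT example: 1-cycle hit, 4% miss rate, 100-cycle penalty.
print(amat(1, 0.04, 100))
# Hypothetical program: 10^6 instructions, CPI 1.0, 1.5 accesses per instruction, 1 ns cycle.
print(cpu_time(1_000_000, 1.0, 1.5, 0.04, 100, 1.0))
```

In the hypothetical case, each instruction stalls 1.5 × 0.04 × 100 = 6 cycles on average, so memory stalls dominate the execution CPI of 1.0.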

Example: Evaluating Split Inst/Data Cache
Unified vs. split instruction/data cache (Harvard architecture).
Example on page 406/407: which design is better?
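A comparison weights the AMAT of instruction and data accesses by how often each occurs. The sketch below is not the textbook example. It assumes that a single-ported unified cache makes data accesses wait an extra cycle while they compete with instruction fetch, and all numbers in the usage lines are hypothetical:

```python
def amat_split(frac_instr, hit_time, mr_instr, mr_data, miss_penalty):
    """Split I/D caches: instruction and data accesses use separate caches."""
    frac_data = 1 - frac_instr
    return (frac_instr * (hit_time + mr_instr * miss_penalty)
            + frac_data * (hit_time + mr_data * miss_penalty))

def amat_unified(frac_instr, hit_time, mr_unified, miss_penalty, data_port_stall):
    """Unified cache: one miss rate; data accesses pay data_port_stall extra cycles
    because they compete with instruction fetch for one port (an assumption)."""
    frac_data = 1 - frac_instr
    return (frac_instr * (hit_time + mr_unified * miss_penalty)
            + frac_data * (hit_time + data_port_stall + mr_unified * miss_penalty))

# Hypothetical parameters, chosen only for illustration.
params = dict(frac_instr=0.75, hit_time=1, miss_penalty=50)
print(amat_split(mr_instr=0.005, mr_data=0.06, **params))
print(amat_unified(mr_unified=0.02, data_port_stall=1, **params))
```

With these made-up values, the split design wins (about 1.94 vs. 2.25 cycles). A lower unified miss rate or a dual-ported unified cache could reverse the outcome.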

Disadvantage of Set Associative Cache
Compare an n-way set associative cache with a direct mapped cache:
It has n comparators vs. 1 comparator.
It has extra MUX delay for the data.
Data comes after the hit/miss decision and set selection.
In a direct mapped cache, the cache block is available before the hit/miss decision: use the data assuming the access is a hit, and recover if it is found otherwise.
[Figure: two-way set associative cache datapath (Valid, Cache Tag, and Cache Data per way; two tag comparators; Sel0/Sel1 mux; Hit = OR of the compares).]

Example: Evaluating Set Associative Cache

Evaluating Cache Performance for Out-of-Order Processors
Recall: AMAT = hit time + miss rate × miss penalty.
In the context of OOO processors, it is very difficult to define a miss penalty that fits this simple model.
We may assume that a certain percentage of the miss latency overlaps with other work, so that
$$\text{Miss penalty} = (1 - \text{Overlap}) \times \text{Full miss latency}$$
Cache hit time can also be overlapped.

Simple Example
Consider an OOO processor in the previous example (slide 18):
Slow clock (1.25× base cycle time)
Direct mapped cache
Overlapping degree of 30%
$$\text{Average miss penalty} = 70\% \times 75\,\text{ns} = 52.5\,\text{ns}$$
$$\text{AMAT} = 1.0 \times 1.25 + (0.014 \times 52.5) = 1.99\,\text{ns}$$
$$\text{CPU time} = \text{IC} \times (2 \times 1.0 \times 1.25 + 1.5 \times 0.014 \times 52.5) = 3.60 \times \text{IC}$$
Compare: 3.58 for in-order + direct mapped, 3.63 for in-order + two-way associative.
This is only a simplified example; the ideal CPI could be improved by OOO execution.
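The arithmetic of this example can be checked with a few lines of Python. The defaults are the slide's numbers. Reading 2 as the base CPI, 1.5 as memory accesses per instruction, 75 ns as the full miss latency, and 0.014 as the miss rate is an interpretation, because the earlier example (slide 18) that defines them is not reproduced here:

```python
def ooo_estimate(base_cycle_ns=1.0, clock_factor=1.25, full_miss_ns=75.0,
                 overlap=0.30, miss_rate=0.014, base_cpi=2.0, accesses_per_instr=1.5):
    """Return (effective miss penalty, AMAT, CPU time per instruction), all in ns."""
    cycle_ns = base_cycle_ns * clock_factor                 # slower clock
    miss_penalty_ns = (1 - overlap) * full_miss_ns          # only the non-overlapped part stalls
    amat_ns = cycle_ns + miss_rate * miss_penalty_ns        # hit time = one cycle
    time_per_instr_ns = (base_cpi * cycle_ns
                         + accesses_per_instr * miss_rate * miss_penalty_ns)
    return miss_penalty_ns, amat_ns, time_per_instr_ns

penalty, amat_ns, per_instr = ooo_estimate()
print(round(penalty, 2), round(amat_ns, 3), round(per_instr, 4))
```

The results (52.5 ns, 1.985 ns, and 3.6025 ns per instruction) round to the slide's 52.5 ns, 1.99 ns, and 3.60 × IC. Raising `overlap` shows how quickly the OOO design could overtake the in-order ones.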