SOFTWARE. Computer Architecture Topics. Shared Memory, Message Passing, Data Parallelism. Network Interfaces. Interconnection Network


Transcription:

Lecture 1: Cost/Performance, DLX, Pipelining, Caches, Branch Prediction
Prof. Fred Chong
ECS 250A Computer Architecture, Winter 1999
(Slides based upon Patterson UCB CS252 Spring 1998)
FTC.W99

Computer Architecture Is ...
"the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." Amdahl, Blaauw, and Brooks, 1964
[Figure label: SOFTWARE]

Computer Architecture's Changing Definition
- 1950s to 1960s: Computer Architecture Course = Computer Arithmetic
- 1970s to mid 1980s: Computer Architecture Course = Instruction Set Design, especially ISA appropriate for compilers
- 1990s: Computer Architecture Course = Design of CPU, memory system, I/O system, Multiprocessors

Computer Architecture Topics
[Figure: topic map]
- Input/Output and Storage: Disks, WORM, Tape; RAID; Emerging Technologies; Interleaving; Bus protocols
- Memory Hierarchy: DRAM, L2 Cache, L1 Cache; Coherence, Bandwidth, Latency
- VLSI
- Instruction Set Architecture: Addressing, Protection, Exception Handling
- Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, DSP

Computer Architecture Topics
[Figure: processors (P) and memories (M) connected through a switch (S) / Interconnection Network]
- Processor-Memory-Switch
- Multiprocessors, Networks and Interconnections
- Shared Memory, Message Passing, Data Parallelism
- Network Interfaces
- Topologies, Routing, Bandwidth, Latency, Reliability

ECS 250A Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st Century.
[Figure: Computer Architecture (Instruction Set Design, Organization, Hardware) surrounded by Applications, Operating Systems, Technology, Parallelism, Measurement & Evaluation, Programming Languages, Interface Design (ISA), History]

Topic Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 2nd Ed., 1996.
- Performance/Cost, DLX, Pipelining, Caches, Branch Prediction
- ILP, Loop Unrolling, Scoreboarding, Tomasulo, Dynamic Branch Prediction
- Trace Scheduling, Speculation
- Vector Processors, DSPs
- Memory Hierarchy
- I/O
- Interconnection Networks
- Multiprocessors

ECS250A: Staff
Instructor: Fred Chong
Office: EUII-3031, chong@cs
Office Hours: Mon 4-6pm or by appt.
T. A.: Diana Keen
Office: EUII-2239, keen@cs
TA Office Hours: Fri 1-3pm
Class: Mon 6:10-9pm
Text: Computer Architecture: A Quantitative Approach, Second Edition (1996)
Web page: http://arch.cs.ucdavis.edu/~chong/250a/
Lectures available online before 1PM day of lecture
Newsgroup: ucd.class.ecs250a{.d}

Grading
- Problem Sets: 35%
- 1 In-class exam (prelim simulation): 20%
- Project Proposals and Drafts: 10%
- Project Final Report: 25%
- Project Poster Session (CS colloquium): 10%

VLSI Transistors
[Figure: transistor with terminals A, B and gate G]

CMOS Inverter
[Figure: inverter schematic, In to Out]

CMOS NAND Gate
[Figure: NAND gate schematic, inputs A and B, output C]

Integrated Circuits Costs

$\text{IC cost} = \dfrac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}}$

$\text{Die cost} = \dfrac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$

$\text{Dies per wafer} = \dfrac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \dfrac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies}$

$\text{Die yield} = \text{Wafer yield} \times \left(1 + \dfrac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha}\right)^{-\alpha}$

Die cost goes roughly with (die area)^4.

Real World Examples

Chip          Metal   Line    Wafer   Defects  Area   Dies/   Yield  Die
              layers  width   cost    /cm^2    mm^2   wafer          cost
386DX         2       0.90    $900    1.0      43     360     71%    $4
486DX2        3       0.80    $1200   1.0      81     181     54%    $12
PowerPC 601   4       0.80    $1700   1.3      121    115     28%    $53
HP PA 7100    3       0.80    $1300   1.0      196    66      27%    $73
DEC Alpha     3       0.70    $1500   1.2      234    53      19%    $149
SuperSPARC    3       0.70    $1700   1.6      256    48      13%    $272
Pentium       3       0.80    $1500   1.5      296    40      9%     $417

From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15

Cost/Performance: What is the Relationship of Cost to Price?
- Component Costs
- Direct Costs (add 25% to 40%): recurring costs such as labor, purchasing, scrap, warranty
- Gross Margin (add 82% to 186%): nonrecurring costs such as R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average Discount to get List Price (add 33% to 66%): volume discounts and/or retailer markup
[Figure: share of List Price. Average Discount 25% to 40%; Gross Margin 34% to 39%; Direct Cost 6% to 8%; Component Cost 15% to 33%. Avg. Selling Price is List Price minus Average Discount.]

Chip Prices (August 1993)
Assume purchase of 10,000 units.

Chip          Area   Mfg.   Price   Multi-  Comment
              mm^2   cost           plier
386DX         43     $9     $31     3.4     Intense Competition
486DX2        81     $35    $245    7.0     No Competition
PowerPC 601   121    $77    $280    3.6
DEC Alpha     234    $202   $1231   6.1     Recoup R&D?
Pentium       296    $473   $965    2.0     Early in shipments

Summary: Price vs. Cost
[Figure: two bar charts for Mini, W/S, and PC. One shows Average Discount, Gross Margin, Direct Costs, and Component Costs as a percentage (0% to 100%) of price; the other shows bar values 4.7, 3.8, 3.5, 2.5, 1.8, 1.5 on a 0 to 5 scale.]

Technology Trends: Microprocessor Capacity
[Figure: "Moore's Law", transistors per chip (1,000 to 100,000,000) vs. year (1970 to 2000) for i4004, i8080, i8086, i80286, i80386, i80486, Pentium]
- Alpha 21264: 15 million
- Pentium Pro: 5.5 million
- PowerPC 620: 6.9 million
- Alpha 21164: 9.3 million
- Sparc Ultra: 5.2 million
CMOS improvements:
- Die size: 2X every 3 yrs
- Line width: halve / 7 yrs
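The die cost formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the table: the function names are invented here, the 15 cm wafer diameter is an assumption (it happens to reproduce the dies-per-wafer column for the 386DX and 486DX2), and alpha = 3 is an assumed value because the slides do not state one.

```python
import math


def dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies=0):
    """Gross dies per wafer: wafer area over die area, minus edge loss and test dies."""
    wafer_area = math.pi * (wafer_diam_cm / 2) ** 2
    edge_loss = math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss - test_dies


def die_yield(defects_per_cm2, die_area_cm2, alpha=3.0, wafer_yield=1.0):
    """Fraction of good dies; alpha=3.0 is an assumed process parameter."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2, alpha=3.0):
    """Wafer cost spread over the good dies on the wafer."""
    good_dies = dies_per_wafer(wafer_diam_cm, die_area_cm2) * die_yield(
        defects_per_cm2, die_area_cm2, alpha
    )
    return wafer_cost / good_dies


if __name__ == "__main__":
    # 386DX row: 43 mm^2 = 0.43 cm^2, 1.0 defects/cm^2, $900 wafer.
    # The 15 cm wafer is an assumption, not a figure from the slides.
    print(round(dies_per_wafer(15.0, 0.43)))
    print(round(die_yield(1.0, 0.43), 2))
    print(round(die_cost(900, 15.0, 0.43, 1.0), 2))
```

The computed yield and cost will not match the table exactly, since the table comes from a published estimate whose yield model parameters are not given in the slides.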

Memory Capacity (Single Chip DRAM)
[Figure: DRAM size in bits (1,000 to 1,000,000,000) vs. year (1970 to 2000)]

year   size (Mb)   cyc time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns

Technology Trends (Summary)

        Capacity         Speed (latency)
Logic   2x in 3 years    2x in 3 years
DRAM    4x in 3 years    2x in 10 years
Disk    4x in 3 years    2x in 10 years

Processor Performance Trends
[Figure: relative performance (0.1 to 1000, log scale) vs. year (1965 to 2000) for Supercomputers, Mainframes, Minicomputers, and Microprocessors]

Processor Performance (1.35X before, 1.55X now)
[Figure: performance (0 to 1200) vs. year (87 to 97), trend line 1.54X/yr. Points: Sun-4/260, MIPS M/120, MIPS M2000, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600]

Performance Trends (Summary)
- Workstation performance (measured in Spec Marks) improves roughly 50% per year (2X every 18 months)
- Improvement in cost performance estimated at 70% per year

Computer Engineering Methodology
[Figure: Technology Trends]

Computer Engineering Methodology
[Figure: design cycle, built up over three slides]
- Evaluate Existing Systems for Bottlenecks (inputs: Benchmarks, Technology Trends)
- Simulate New Designs and Organizations (inputs: Workloads)
- Implement Next Generation System (input: Implementation Complexity)

Measurement Tools
- Benchmarks, Traces, Mixes
- Hardware: Cost, delay, area, power estimation
- Simulation (many levels): ISA, RT, Gate, Circuit
- Queuing Theory
- Rules of Thumb
- Fundamental "Laws"/Principles

The Bottom Line: Performance (and Cost)

Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAD/Sud Concorde   3 hours       1350 mph   132          178,200

- Time to run the task (ExTime): execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance): throughput, bandwidth

The Bottom Line: Performance (and Cost)
"X is n times faster than Y" means

$n = \dfrac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$

- Speed of Concorde vs. Boeing 747
- Throughput of Boeing 747 vs. Concorde
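The "n times faster" definition can be applied to the two planes directly. The sketch below is only an illustration; the function names are made up for this example, and throughput is taken as passengers times speed, which is how the pmph column in the table works out.

```python
def times_faster(extime_y, extime_x):
    """How many times faster X is than Y: ExTime(Y) / ExTime(X)."""
    return extime_y / extime_x


def throughput_pmph(passengers, speed_mph):
    """Passenger-miles per hour, the throughput measure used in the table."""
    return passengers * speed_mph


if __name__ == "__main__":
    # Latency view: Concorde (3 h) vs. Boeing 747 (6.5 h), DC to Paris.
    print(times_faster(6.5, 3.0))

    # Throughput view: Boeing 747 vs. Concorde.
    boeing = throughput_pmph(470, 610)
    concorde = throughput_pmph(132, 1350)
    print(boeing, concorde, boeing / concorde)
```

The same pair of machines ranks differently depending on whether latency or throughput is the metric, which is the point of the slide.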

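Amdahl's Law
Speedup due to enhancement E:

$\text{Speedup}(E) = \dfrac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \dfrac{\text{Performance w/ } E}{\text{Performance w/o } E}$

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.

Amdahl's Law

$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$

$\text{Speedup}_{\text{overall}} = \dfrac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$

Amdahl's Law
Floating point instructions improved to run 2X, but only 10% of actual instructions are FP.

$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$

$\text{Speedup}_{\text{overall}} = \dfrac{1}{0.95} = 1.053$

Metrics of Performance
[Figure: system levels, each with its typical metric]
- Application: answers per month, operations per second
- Programming Language, Compiler, ISA: (millions) of instructions per second (MIPS); (millions) of (FP) operations per second (MFLOP/s)
- Datapath, Control: megabytes per second
- Function Units, Transistors, Wires, Pins: cycles per second (clock rate)

Aspects of CPU Performance

$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$

               Inst Count   CPI   Clock Rate
Program        X
Compiler       X            (X)
Inst. Set      X            X
Organization                X     X
Technology                        X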
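As a quick check of the floating point example, Amdahl's Law can be written as a small Python function. This is a sketch with an invented function name, not part of the original slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up by a given factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # FP instructions run 2X faster, but are only 10% of the instructions.
    print(round(amdahl_speedup(0.10, 2.0), 3))

    # Even an enormous FP speedup is capped near 1 / (1 - 0.10).
    print(round(amdahl_speedup(0.10, 1e9), 3))
```

The second call illustrates the usual reading of the law: the unenhanced 90% bounds the overall speedup at about 1.11, no matter how fast the enhanced part becomes.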

Cycles Per Instruction
"Average Cycles per Instruction":

$\text{CPI} = \dfrac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \dfrac{\text{Cycles}}{\text{Instruction Count}}$

$\text{CPU time} = \text{CycleTime} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$

"Instruction Frequency":

$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i \quad \text{where } F_i = \dfrac{I_i}{\text{Instruction Count}}$

Invest resources where time is spent!

Example: Calculating CPI
Base Machine (Reg / Reg), typical mix:

Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
Total                    1.5

SPEC: System Performance Evaluation Cooperative
- First Round 1989: 10 programs yielding a single number ("SPECmarks")
- Second Round 1992: SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  Compiler flags unlimited. March 93 flags for DEC 4000 Model 610:
  spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)")
  wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
  nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
- Third Round 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  "benchmarks useful for 3 years"
  Single flag setting for all programs: SPECint_base95, SPECfp_base95

How to Summarize Performance
- Arithmetic mean (weighted arithmetic mean) tracks execution time: $\sum_i T_i / n$ or $\sum_i W_i \times T_i$
- Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $n / \sum_i (1/R_i)$ or $1 / \sum_i (W_i/R_i)$, with weights $W_i$ that sum to 1
- Normalized execution time is handy for scaling performance (e.g., X times faster than SPARCstation 10)
- But do not take the arithmetic mean of normalized execution times; use the geometric mean: $\left(\prod_i x_i\right)^{1/n}$

SPEC First Round
- One program: 99% of time in a single line of code
- New front-end compiler could improve dramatically
[Figure: SPEC ratio (0 to 800) per benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv]

Impact of Means on SPECmark89 for IBM 550

             Ratio to VAX      Time              Weighted Time
Program      Before   After    Before   After    Before   After
gcc          30       29       49       51       8.91     9.22
espresso     35       34       65       67       7.64     7.86
spice        47       47       510      510      5.69     5.69
doduc        46       49       41       38       5.81     5.45
nasa7        78       144      258      140      3.43     1.86
li           34       34       183      183      7.86     7.86
eqntott      40       40       28       28       6.68     6.68
matrix300    78       730      58       6        3.43     0.37
fpppp        90       87       34       35       2.97     3.07
tomcatv      133      138      20       19       2.01     1.94
Mean         54       72       124      108      54.42    49.99
             Geometric         Arithmetic        Weighted Arith.
             Ratio 1.33        Ratio 1.16        Ratio 1.09
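The CPI formula and the choice of mean can both be checked numerically. The sketch below uses invented helper names; it takes the instruction mix from the CPI example and the "After" ratio column and both "Time" columns from the SPECmark89 table.

```python
import math


def cpi_from_mix(mix):
    """CPI as the sum of frequency * cycles over instruction classes."""
    return sum(freq * cycles for freq, cycles in mix.values())


def time_share(mix, op):
    """Fraction of execution time spent in one instruction class."""
    freq, cycles = mix[op]
    return freq * cycles / cpi_from_mix(mix)


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (frequency, cycles) per class, from the Base Machine example.
MIX = {"ALU": (0.5, 1), "Load": (0.2, 2), "Store": (0.1, 2), "Branch": (0.2, 2)}

# Columns copied from the SPECmark89 table for the IBM 550.
TIME_BEFORE = [49, 65, 510, 41, 258, 183, 28, 58, 34, 20]
TIME_AFTER = [51, 67, 510, 38, 140, 183, 28, 6, 35, 19]
RATIO_AFTER = [29, 34, 47, 49, 144, 34, 40, 730, 87, 138]

if __name__ == "__main__":
    print(cpi_from_mix(MIX), round(time_share(MIX, "ALU"), 2))
    print(round(arithmetic_mean(TIME_BEFORE) / arithmetic_mean(TIME_AFTER), 2))
    print(round(geometric_mean(RATIO_AFTER)))
```

The geometric mean of ratios is dominated far less by the single matrix300 outlier than an arithmetic mean of the same ratios would be, yet it still reports a larger improvement than the arithmetic mean of execution times, which is why the slide treats the choice of mean as significant.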

Performance Evaluation
- "For better or worse, benchmarks shape a field"
- Good products are created when you have good benchmarks and good ways to summarize performance
- Given that sales is in part a function of performance relative to competition, investment in improving a product follows what the performance summary reports
- If benchmarks/summary are inadequate, then one must choose between improving the product for real programs vs. improving the product to get more sales; sales almost always wins!
- Execution time is the measure of computer performance!

Instruction Set Architecture (ISA)
[Figure: the instruction set as the layer between software and hardware]

Interface Design
A good interface:
- Lasts through many implementations (portability, compatibility)
- Is used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels
[Figure: several uses above the Interface, implementations imp 1, imp 2, imp 3 below it over time]

Evolution of Instruction Sets
- Single Accumulator (EDSAC 1950)
- Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
- Separation of Programming Model from Implementation
  - High-level Language Based (B5000 1963)
  - Concept of a Family (IBM 360 1964)
- General Purpose Register Machines
  - Complex Instruction Sets (Vax, Intel 432 1977-80)
  - Load/Store Architecture (CDC 6600, Cray 1 1963-76)
- RISC (Mips, Sparc, HP-PA, IBM RS6000, ... 1987)

Evolution of Instruction Sets
- Major advances in computer architecture are typically associated with landmark instruction set designs. Ex: Stack vs. GPR (System 360)
- Design decisions must take into account: technology, machine organization, programming languages, compiler technology, operating systems
- And they in turn influence these

A "Typical" RISC
- 32-bit fixed format instruction (3 formats)
- 32 32-bit GPR (R0 contains zero, DP take pair)
- 3-address, reg-reg arithmetic instruction
- Single address mode for load/store: base + displacement, no indirection
- Simple branch conditions
- Delayed branch
See: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

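Example: MIPS
Instruction formats (bit positions in parentheses):
- Register-Register: Op (31-26), Rs1 (25-21), Rs2 (20-16), Rd (15-11), (10-6), Opx (5-0)
- Register-Immediate: Op (31-26), Rs1 (25-21), Rd (20-16), immediate (15-0)
- Branch: Op (31-26), Rs1 (25-21), Rs2/Opx (20-16), immediate (15-0)
- Jump / Call: Op (31-26), target (25-0)

Summary, #1
- Designing to Last through Trends

          Capacity         Speed
  Logic   2x in 3 years    2x in 3 years
  DRAM    4x in 3 years    2x in 10 years
  Disk    4x in 3 years    2x in 10 years

- 6 yrs to graduate => 16X CPU speed, DRAM/Disk size
- Time to run the task: execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ...: throughput, bandwidth
- "X is n times faster than Y" means

$\dfrac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$

Summary, #2
- Amdahl's Law:

$\text{Speedup}_{\text{overall}} = \dfrac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$

- CPI Law:

$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$

- Execution time is the REAL measure of computer performance!
- Good products are created when you have good benchmarks and good ways to summarize performance
- Die cost goes roughly with (die area)^4
- Can the PC industry support engineering/research investment?

Pipelining: It's Natural!
Laundry Example
- Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
- Washer takes 30 minutes
- Dryer takes 40 minutes
- "Folder" takes 20 minutes

Sequential Laundry
[Figure: timeline from 6 PM to Midnight, task order A, B, C, D, each load taking 30 + 40 + 20 minutes in turn]
- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP
[Figure: timeline from 6 PM, task order A, B, C, D, with segments of 30, 40, 40, 40, 40, 20 minutes]
- Pipelined laundry takes 3.5 hours for 4 loads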
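The laundry timings can be reproduced with a small simulation. This is a sketch under stated assumptions: each stage (washer, dryer, folder) handles one load at a time, a finished load can wait between stages for as long as needed, and the function names are invented for this example.

```python
def sequential_time(stage_minutes, loads):
    """Total time when each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Total time when a load enters a stage as soon as both it and the stage are free."""
    if not stage_minutes or loads == 0:
        return 0
    stage_free_at = [0] * len(stage_minutes)
    for _ in range(loads):
        ready_at = 0  # when this load is ready for its next stage
        for stage, duration in enumerate(stage_minutes):
            start = max(ready_at, stage_free_at[stage])
            ready_at = start + duration
            stage_free_at[stage] = ready_at
    return stage_free_at[-1]


if __name__ == "__main__":
    stages = [30, 40, 20]  # washer, dryer, folder (minutes)
    print(sequential_time(stages, 4) / 60)  # hours
    print(pipelined_time(stages, 4) / 60)   # hours
```

With four loads the pipelined total works out to 30 + 4 x 40 + 20 minutes: the 40-minute dryer, the slowest stage, sets the rate once the pipeline is full.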

Pipelining Lessons
[Figure: pipelined laundry timeline from 6 PM, task order A, B, C, D, with segments of 30, 40, 40, 40, 40, 20 minutes]
- Pipelining doesn't help latency of a single task, it helps throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to "fill" the pipeline and time to "drain" it reduce speedup

Computer Pipelines
- Execute billions of instructions, so throughput is what matters
- DLX desirable features: all instructions same length, registers located in the same place in the instruction format, memory operands only in loads or stores

5 Steps of DLX Datapath
Figure 3.1, Page 130
[Figure: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labels: IR, LMD]

Pipelined DLX Datapath
Figure 3.4, page 137
[Figure: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back]
- Data stationary control: local decode for each instruction phase / pipeline stage

Visualizing Pipelining
Figure 3.3, Page 133
[Figure: instruction order vs. time (clock cycles)]

It's Not That Easy for Computers
- Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
- Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
- Data hazards: instruction depends on the result of a prior instruction still in the pipeline (missing sock)
- Control hazards: pipelining of branches & other instructions; stall the pipeline until the hazard is resolved, leaving "bubbles" in the pipeline

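One Memory Port/Structural Hazards
Figure 3.6, Page 142
[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) vs. time (clock cycles)]

One Memory Port/Structural Hazards
Figure 3.7, Page 143
[Figure: instruction order (Load, Instr 1, Instr 2, stall, Instr 3) vs. time (clock cycles)]

Speed Up Equation for Pipelining

$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instr}$

$\text{Speedup} = \dfrac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \dfrac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$

With Ideal CPI = 1:

$\text{Speedup} = \dfrac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \dfrac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}$

Example: Dual-port vs. Single-port
- Machine A: dual ported memory
- Machine B: single ported memory, but its pipelined implementation has a 1.05 times faster clock rate
- Ideal CPI = 1 for both
- Loads are 40% of instructions executed

$\text{SpeedUp}_A = \dfrac{\text{Pipeline Depth}}{1 + 0} \times \dfrac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{pipe}}} = \text{Pipeline Depth}$

$\text{SpeedUp}_B = \dfrac{\text{Pipeline Depth}}{1 + 0.4 \times 1} \times \dfrac{\text{clock}_{\text{unpipe}}}{\text{clock}_{\text{unpipe}} / 1.05} = \dfrac{\text{Pipeline Depth}}{1.4} \times 1.05 = 0.75 \times \text{Pipeline Depth}$

$\dfrac{\text{SpeedUp}_A}{\text{SpeedUp}_B} = \dfrac{\text{Pipeline Depth}}{0.75 \times \text{Pipeline Depth}} = 1.33$

Machine A is 1.33 times faster.

Data Hazard on R1
Figure 3.9, page 147
[Figure: instruction order vs. time (clock cycles), stages IF, ID/RF, EX, MEM, WB]
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

Three Generic Data Hazards
Instr_I followed by Instr_J
- Read After Write (RAW): Instr_J tries to read an operand before Instr_I writes it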
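The dual-port vs. single-port comparison follows directly from the speedup equation. In the sketch below the function name is invented, and the pipeline depth of 5 is an assumption chosen to match the DLX pipeline; the ratio between the two machines does not depend on it.

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup over the unpipelined machine.

    clock_ratio is (unpipelined clock cycle) / (pipelined clock cycle).
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


if __name__ == "__main__":
    depth = 5  # assumed; any depth gives the same A/B ratio

    # Machine A: dual ported memory, no structural stalls.
    speedup_a = pipeline_speedup(depth, stall_cpi=0.0)

    # Machine B: loads are 40% of instructions and each one stalls 1 cycle,
    # but the clock is 1.05 times faster.
    speedup_b = pipeline_speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)

    print(speedup_a, speedup_b, round(speedup_a / speedup_b, 2))
```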

Three Generic Data Hazards
Instr_I followed by Instr_J
- Write After Read (WAR): Instr_J tries to write an operand before Instr_I reads it
  - Instr_I gets the wrong operand
- Can't happen in the DLX 5 stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5

Three Generic Data Hazards
Instr_I followed by Instr_J
- Write After Write (WAW): Instr_J tries to write an operand before Instr_I writes it
  - Leaves the wrong result (Instr_I's, not Instr_J's)
- Can't happen in the DLX 5 stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5
- Will see WAR and WAW in later, more complicated pipes

Forwarding to Avoid Data Hazard
Figure 3.10, Page 149
[Figure: instruction order vs. time (clock cycles)]
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

HW Change for Forwarding
Figure 3.20, Page 161

Data Hazard Even with Forwarding
Figure 3.12, Page 153
[Figure: instruction order vs. time (clock cycles)]
  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9

Data Hazard Even with Forwarding
Figure 3.13, Page 154
[Figure: instruction order vs. time (clock cycles), same instruction sequence]
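The three hazard definitions reduce to simple checks on which registers each instruction reads and writes. The sketch below is illustrative only: the `Instr` type, the function name, and the register names are assumptions made for this example, and the function reports register dependences between a pair of instructions; whether a dependence turns into an actual hazard depends on the pipeline, as the DLX notes above explain.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def dependences(instr_i: Instr, instr_j: Instr) -> Set[str]:
    """Classify dependences where instr_i comes before instr_j in program order."""
    found = set()
    if instr_i.dest is not None and instr_i.dest in instr_j.srcs:
        found.add("RAW")  # J reads what I writes
    if instr_j.dest is not None and instr_j.dest in instr_i.srcs:
        found.add("WAR")  # J writes what I reads
    if instr_i.dest is not None and instr_i.dest == instr_j.dest:
        found.add("WAW")  # both write the same register
    return found


if __name__ == "__main__":
    add = Instr("add", "r1", ("r2", "r3"))
    sub = Instr("sub", "r4", ("r1", "r3"))
    print(dependences(add, sub))  # sub reads r1, which add writes

    # Hypothetical pair showing the other two cases.
    mul = Instr("mul", "r2", ("r5", "r6"))
    print(dependences(add, mul))  # mul overwrites r2, which add reads
    print(dependences(add, Instr("or", "r1", ("r8", "r9"))))
```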

Software Scheduling to Avoid Load Hazards
Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
  LW  Rb,b
  LW  Rc,c
  ADD Ra,Rb,Rc
  SW  a,Ra
  LW  Re,e
  LW  Rf,f
  SUB Rd,Re,Rf
  SW  d,Rd

Fast code:
  LW  Rb,b
  LW  Rc,c
  LW  Re,e
  ADD Ra,Rb,Rc
  LW  Rf,f
  SW  a,Ra
  SUB Rd,Re,Rf
  SW  d,Rd

Control Hazard on Branches
Three Stage Stall

Branch Stall Impact
- If CPI = 1, 30% branch, stall 3 cycles => new CPI = 1.9!
- Two part solution:
  - Determine branch taken or not sooner, AND
  - Compute taken branch address earlier
- DLX branch tests if register = 0 or ≠ 0
- DLX Solution:
  - Move Zero test to ID/RF stage
  - Adder to calculate new PC in ID/RF stage
  - 1 clock cycle penalty for branch versus 3

Pipelined DLX Datapath
Figure 3.22, page 163
[Figure: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc.; Memory Access; Write Back]
- This is the correct 1 cycle latency implementation!

Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
- Execute successor instructions in sequence
- "Squash" instructions in pipeline if branch actually taken
- Advantage of late pipeline state update
- 47% of DLX branches not taken on average
- PC+4 already calculated, so use it to get the next instruction
#3: Predict Branch Taken
- 53% of DLX branches taken on average
- But haven't calculated branch target address in DLX
  - DLX still incurs 1 cycle branch penalty
  - Other machines: branch target known before outcome

Four Branch Hazard Alternatives
#4: Delayed Branch
- Define branch to take place AFTER a following instruction:
    branch instruction
    sequential successor 1
    sequential successor 2
    ...
    sequential successor n
    branch target if taken
  (branch delay of length n)
- 1 slot delay allows proper decision and branch target address in 5 stage pipeline
- DLX uses this
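The benefit of the reordered code can be checked by counting load-use stalls in each sequence. This is a minimal sketch under a simplifying assumption: with forwarding, the only stall is one cycle when an instruction uses a register loaded by the instruction immediately before it. The tuple encoding, the function name, and the register names for d and e are choices made for this example.

```python
def load_use_stalls(program):
    """Count one-cycle stalls: an instruction reads a register that the
    immediately preceding LW loads.

    program is a list of (op, dest, srcs) tuples.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

if __name__ == "__main__":
    print(load_use_stalls(SLOW), load_use_stalls(FAST))
```

Both sequences contain the same eight instructions; the fast version simply places an independent instruction after each load whose result is needed next.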

Delayed Branch
- Where to get instructions to fill the branch delay slot?
  - Before branch instruction
  - From the target address: only valuable when branch taken
  - From fall through: only valuable when branch not taken
  - Cancelling branches allow more slots to be filled
- Compiler effectiveness for single branch delay slot:
  - Fills about 60% of branch delay slots
  - About 80% of instructions executed in branch delay slots are useful in computation
  - About 50% (60% x 80%) of slots usefully filled
- Delayed Branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar)

Evaluating Branch Alternatives

$\text{Pipeline speedup} = \dfrac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$

Scheduling scheme    Branch penalty   CPI    speedup v. unpipelined   speedup v. stall
Stall pipeline       3                1.42   3.5                      1.0
Predict taken        1                1.14   4.4                      1.26
Predict not taken    1                1.09   4.5                      1.29
Delayed branch       0.5              1.07   4.6                      1.31

Conditional & Unconditional = 14%, 65% change PC

Pipelining Summary
- Just overlap tasks; easy if tasks are independent
- Speed Up ≤ Pipeline Depth; if ideal CPI is 1, then:

$\text{Speedup} = \dfrac{\text{Pipeline Depth}}{1 + \text{Pipeline stall CPI}} \times \dfrac{\text{Clock Cycle Unpipelined}}{\text{Clock Cycle Pipelined}}$

- Hazards limit performance on computers:
  - Structural: need more HW resources
  - Data (RAW, WAR, WAW): need forwarding, compiler scheduling
  - Control: delayed branch, prediction
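The CPI column of the branch alternatives table can be recomputed from the branch frequency and the per-scheme penalty. The sketch below uses invented names; it assumes a 5 stage pipeline (the DLX depth) for the speedup figures and, for "predict not taken", charges the 1 cycle penalty only to the 65% of branches that change the PC.

```python
def branch_cpi(branch_freq, avg_penalty, ideal_cpi=1.0):
    """CPI including the average stall cycles caused by branches."""
    return ideal_cpi + branch_freq * avg_penalty


def branch_speedup(depth, branch_freq, avg_penalty):
    """Pipeline speedup over the unpipelined machine, with branch stalls only."""
    return depth / (1.0 + branch_freq * avg_penalty)


BRANCH_FREQ = 0.14   # conditional and unconditional branches
TAKEN_FRACTION = 0.65  # fraction of branches that change the PC

# Average penalty in cycles per branch for each scheme.
SCHEMES = {
    "Stall pipeline": 3.0,
    "Predict taken": 1.0,
    "Predict not taken": TAKEN_FRACTION * 1.0,  # assumed: penalty only when taken
    "Delayed branch": 0.5,
}

if __name__ == "__main__":
    depth = 5  # assumed DLX pipeline depth
    for name, penalty in SCHEMES.items():
        cpi = branch_cpi(BRANCH_FREQ, penalty)
        speedup = branch_speedup(depth, BRANCH_FREQ, penalty)
        print(f"{name:18s} CPI={cpi:.2f} speedup={speedup:.2f}")

    # The earlier "Branch Stall Impact" case: 30% branches, 3 cycle stall.
    print(round(branch_cpi(0.30, 3.0), 2))
```

The speedups printed here are rounded to two decimals, so they can differ in the last digit from the one-decimal figures in the table.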