- PDF Free Download

Size: px

Start display at page:

Download ""

Hugo Davis
5 years ago
Views:

1 CS 52 Computer rchitecture and Engineering Lecture 4 - Pipelining Krste sanovic Electrical Engineering and Computer Sciences University of California at Berkeley Last =me in Lecture 3 Microcoding became less aeracfve as gap beten RM and ROM speeds reduced Complex instrucfon sets difficult to pipeline, so difficult to increase performance as gate count grew Load- Store RISC ISs designed for efficient pipelined implementafons Very similar to verfcal microcode Inspired by earlier Cray machines (more on these later) Iron Law explains architecture design space Trade instrucfons/program, cycles/instrucfon, and Fme/cycle 2 CS252 S05

2 n Ideal Pipeline stage stage 2 stage 3 stage 4 ll objects go through the same stages No sharing of resources beten any two stages PropagaFon delay through all pipeline stages is equal The scheduling of an object entering the pipeline is not affected by the objects in other stages These condi+ons generally hold for industrial assembly lines, but instruc+ons depend on each other! 3 To pipeline RISC- V: Pipelined RISC- V First build RISC- V without pipelining with CPI= Next, add pipeline registers to reduce cycle Fme while maintaining CPI= 4 CS252 S05 2

3 Lecture 3: Unpipelined path for RISC- V Sel br rind jabs pc+4 RegWriteEn MemWrite WBSel clk clk inst. rs rd Control Br Logic Bcomp? clk OpCode WSel Sel FuncSel Op2Sel 5 Lecture 3: Hardwired Control Table Opcode Sel Op2Sel FuncSel MemWr RFWen WBSel WSel Sel i LW SW BEQ true BEQ false J JL JLR * Reg Func no yes rd pc+4 IType 2 Op no yes rd pc+4 IType 2 + no yes Mem rd pc+4 BsType 2 + yes no * * pc+4 BrType 2 * * no no * * BrType 2 * * no no * * * * * no no * * pc+4 jabs * * * no yes X jabs * * * no yes rd rind br Op2Sel= Reg / WBSel = / Mem / WSel = rd / X Sel = pc+4 / br / rind / jabs 6 CS252 S05 3

4 Pipelined path. rs rd fetch phase decode & Reg- fetch phase Clock period can be reduced by dividing the execufon of an instrucfon into mulfple cycles t C > max {t IM, t RF, t, t DM, t RW } ( = t DM probably) Hover, CPI will increase unless instruc+ons are pipelined execute phase memory phase write - back phase 7 Iron Law of Processor Performance Time = rucfons Cycles Time Program Program * rucfon * Cycle rucfons per program depends on source code, compiler technology, and IS Cycles per instrucfons (CPI) depends on IS and µarchitecture Time per cycle depends upon the µarchitecture and base technology Lecture 2 Lecture 3 Lecture 4 Microarchitecture CPI cycle Fme Microcoded > short Single- cycle unpipelined long Pipelined short 8 CS252 S05 4

5 CPI Examples Microcoded machine 7 cycles 5 cycles 0 cycles 2 3 Time Unpipelined machine Pipelined machine 3 instrucfons, 22 cycles, CPI= instrucfons, 3 cycles, CPI= 3 instrucfons, 3 cycles, CPI= 5- stage pipeline CPI 5!!! 9 Technology ssump=ons small amount of very fast memory (caches) backed up by a large, slor memory Fast (at least for integers) MulFported Register files (slor!) Thus, the following Fming assumpfon is reasonable t IM t RF t t DM t RW 5- stage pipeline will be focus of our detailed design - some commercial designs have over 30 pipeline stages to do an integer add! 0 CS252 S05 5

6 . I- Fetch (IF) 5- Stage Pipelined Execu=on rs rd Decode, Reg. Fetch (ID) Execute (EX) (M) +me t0 t t2 t3 t4 t5 t6 t7.... instrucfon IF ID EX M WB instrucfon2 IF 2 ID 2 EX 2 M 2 WB 2 instrucfon3 IF 3 ID 3 EX 3 M 3 WB 3 instrucfon4 IF 4 ID 4 EX 4 M 4 WB 4 instrucfon5 IF 5 ID 5 EX 5 M 5 WB 5 Write - Back (WB). I- Fetch (IF) Resources 5- Stage Pipelined Execu=on Resource Usage Diagram rs rd ws Decode, Reg. Fetch (ID) +me t0 t t2 t3 t4 t5 t6 t7.... IF I I 2 I 3 I 4 I 5 ID I I 2 I 3 I 4 I 5 EX I I 2 I 3 I 4 I 5 M I I 2 I 3 I 4 I 5 WB I I 2 I 3 I 4 I 5 Execute (EX) (M) Write - Back (WB) 2 CS252 S05 6

7 Pipelined Execu=on: ruc=ons inst rs rd B R MD MD2 Not quite correct! We need an ruc+on Reg () for each stage 3 Pipelined RISC- V path without jumps F D E M W inst rs rd RegWriteEn B MD FuncSel MD2 WSel MemWrite WBSel R Sel Op2Sel Control Points Need to Be Connected 4 CS252 S05 7

8 ruc=ons interact with each other in pipeline n instrucfon in the pipeline may need a resource being used by another instrucfon in the pipeline à structural hazard n instrucfon may depend on something produced by an earlier instrucfon Dependence may be for a data value à data hazard Dependence may be for the next instrucfon s ess à control hazard (branches, excep+ons) 5 Resolving Structural Hazards Structural hazard occurs when two instrucfons need same hardre resource at same Fme Can resolve in hardre by stalling ner instrucfon Fll older instrucfon finished with resource structural hazard can alys be avoided by adding more hardre to design E.g., if two instrucfons both need a port to memory at same Fme, could avoid hazard by adding second port to memory Our 5- stage pipeline has no structural hazards by design Thanks to RISC- V IS, which s designed for pipelining 6 CS252 S05 8

9 Hazards x4 x x inst rs rd B R MD MD2... x x0 + 0 x4 x x is stale. Oops! 7 Resolving Hazards () Strategy : Wait for the result to be available by freezing earlier pipeline stages è interlocks 8 CS252 S05 9

10 Feedback to Resolve Hazards FB FB 2 FB 3 FB 4 stage stage 2 stage 3 stage 4 Later stages provide dependence informafon to earlier stages which can stall (or kill) instruc+ons Controlling a pipeline in this manner works provided the instruction at stage i+ can complete without any interference from instructions in stages to i (otherwise deadlocks may occur) 9 Interlocks to resolve Hazards Stall Condition inst... x x0 + 0 x4 x rs rd B MD MD2 R 20 CS252 S05 0

11 Stalled Stages and Pipeline Bubbles time t0 t t2 t3 t4 t5 t6 t7.... (I ) x (x0) + 0 IF ID EX M WB (I 2 ) x4 (x) + 7 IF 2 ID 2 ID 2 ID 2 ID 2 EX 2 M 2 WB 2 (I 3 ) IF 3 IF 3 IF 3 IF 3 ID 3 EX 3 M 3 WB 3 (I 4 ) stalled stages IF 4 ID 4 EX 4 M 4 WB 4 (I 5 ) IF 5 ID 5 EX 5 M 5 WB 5 Resource Usage time t0 t t2 t3 t4 t5 t6 t7.... IF I I 2 I 3 I 3 I 3 I 3 I 4 I 5 ID I I 2 I 2 I 2 I 2 I 3 I 4 I 5 EX I I 2 I 3 I 4 I 5 M I I 2 I 3 I 4 I 5 WB I I 2 I 3 I 4 I 5 - pipeline 2 Interlock Control Logic stall C stall rs? inst rs rd B MD MD2 Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions. R 22 CS252 S05

12 Interlock Control Logic ignoring jumps & branches stall C stall rs? re re2 C dest C dest C re inst rs rd Should alys stall if an rs field matches some rd? not every instrucfon writes a register not every instrucfon reads a register re B MD MD2 C dest R 23 Source & Des=na=on Registers rd rs func0 opcode rd rs [:0] func3 opcode I/LW/JLR [:7] rs [6:0] func3 opcode SW/Bcond Jump offset[24:0] opcode source(s) des+na+on rd rs func0 rs, rd I rd rs op imm rs rd LW rd M [rs + imm] rs rd SW M [rs + imm] rs, - Bcond rs, rs, - true: + imm false: + 4 J + imm - - JL x, + imm - x JLR rd, rs + imm rs rd 24 CS252 S05 2

13 C dest ws = Case opcode JL X else rd Deriving the Stall Signal = Case opcode, i, LW,JLR (ws 0) JL on... off C re re = Case opcode, i, LW, SW, Bcond, JLR J, JL re2 = Case opcode, SW,Bcond... C stall stall = ((rs D =ws E ). E + (rs D =ws M ). M + (rs D =ws W ). W ). re D + (( D =ws E ). E + ( D =ws M ). M + ( D =ws W ). W ). re2 D on off on off 25 Hazards due to Loads & Stores Stall Condi+on What if x+7 = x3+5? inst... M[x+7] x2 x4 M[x3+5]... rs rd B MD MD2 Is there any possible data hazard in this instruc+on sequence? R 26 CS252 S05 3

14 Load & Store Hazards... M[x+7] x2 x4 M[x3+5]... x+7 = x3+5 data hazard Hover, the hazard is avoided because our memory system completes writes in one cycle! Load/Store hazards are somefmes resolved in the pipeline and somefmes in the memory system itself. More on this later in the course. 27 CS52 dministrivia Quiz on Feb 9 will cover PS, Lab, lectures - 5, and associated readings. 28 CS252 S05 4

15 Resolving Hazards (2) Strategy 2: Route data as soon as possible axer it is calculated to the earlier pipeline stage à bypass 29 Bypassing time t0 t t2 t3 t4 t5 t6 t7.... (I ) x x0 + 0 IF ID EX M WB (I 2 ) x4 x + 7 IF 2 ID 2 ID 2 ID 2 ID 2 EX 2 M 2 WB 2 (I 3 ) IF 3 IF 3 IF 3 IF 3 ID 3 EX 3 M 3 (I 4 ) stalled stages IF 4 ID 4 EX 4 (I 5 ) IF 5 ID 5 Each stall or kill introduces a in the pipeline CPI > new datapath, i.e., a bypass, can get the data from the output of the to its input time t0 t t2 t3 t4 t5 t6 t7.... (I ) x x0 + 0 IF ID EX M WB (I 2 ) x4 x + 7 IF 2 ID 2 EX 2 M 2 WB 2 (I 3 ) IF 3 ID 3 EX 3 M 3 WB 3 (I 4 ) IF 4 ID 4 EX 4 M 4 WB 4 (I 5 ) IF 5 ID 5 EX 5 M 5 WB 5 30 CS252 S05 5

16 stall ing a Bypass x4 x... x... E M W inst D rs rd Src B MD MD2 R... When does this bypass help? (I ) x x0 + 0 x M[x0 + 0] JL 500 (I 2 ) x4 x + 7 yes x4 x + 7 no x4 x + 7 no 3 The Bypass Signal Deriving it from the Stall Signal stall = ( ((rs D =ws E ). E + (rs D =ws M ). M + (rs D =ws W ). W ).re D +(( D =ws E ). E + ( D =ws M ). M + ( D =ws W ). W ).re2 D ) ws = Case opcode JL X else rd Src = (rs D =ws E ). E.re D = Case opcode, i, LW, JLR (ws 0) JL on... off Is this correct? No because only and i instrucfons can benefit from this bypass Split E into two components: - bypass, - stall 32 CS252 S05 6

17 Bypass and Stall Signals Split E into two components: - bypass, - stall - bypass E = Case opcode E, i (ws 0)... off - stall E = Case opcode E LW, JLR (ws 0) JL on... off Src = (rs D =ws E ).- bypass E. re D stall = ((rs D =ws E ).- stall E + (rs D =ws M ). M + (rs D =ws W ). W ). re D +(( D = ws E ). E + ( D = ws M ). M + ( D = ws W ). W ). re2 D 33 Fully Bypassed path stall for JL,... Src E M W inst Is there s+ll a need for the stall signal? D rs rd BSrc B MD MD2 stall = (rs D =ws E ). (opcode E =LW E ).(ws E 0 ).re D + ( D =ws E ). (opcode E =LW E ).(ws E 0 ).re2 D R 34 CS252 S05 7

18 Time Pipeline CPI Examples 2 3 Measure from when first instruc+on finishes to when last instruc+on in sequence finishes. 3 instrucfons finish in 3 cycles CPI = 3/3 = 2 Bubble 3 Bubble 2 Bubble instrucfons finish in 4 cycles CPI = 4/3 =.33 3 instrucfons finish in 5cycles CPI = 5/3 = Resolving Hazards (3) Strategy 3: Speculate on the dependence! Two cases: Guessed correctly è do nothing Guessed incorrectly è kill and restart. We ll later see examples of this approach in more complex processors. 36 CS252 S05 8

19 Specula=on that load value=zero stall for JL,... Src E M W inst D rs rd BSrc Guess_zero 0 B MD MD2 R Guess_zero= (rs D =ws E ). (opcode E =LW E ).(ws E 0 ).re D lso need to add circuitry to remember that this s a guess and flush pipeline if load not zero! Not worth doing in prac+ce why? 37 Control Hazards What do need to calculate next? For Jumps Opcode, and offset For Jump Register Opcode, Register value, and For CondiFonal Branches Opcode, Register (for condifon), and offset For all other instrucfons Opcode and ( and have to know it s not one of above ) 38 CS252 S05 9

20 Calcula=on Bubbles time t0 t t2 t3 t4 t5 t6 t7.... (I ) x x0 + 0 IF ID EX M WB (I 2 ) x3 x2 + 7 IF 2 IF 2 ID 2 EX 2 M 2 WB 2 (I 3 ) IF 3 IF 3 ID 3 EX 3 M 3 WB 3 (I 4 ) IF 4 IF 4 ID 4 EX 4 M 4 WB 4 Resource Usage time t0 t t2 t3 t4 t5 t6 t7.... IF I - I 2 - I 3 - I 4 ID I - I 2 - I 3 - I 4 EX I - I 2 - I 3 - I 4 M I - I 2 - I 3 - I 4 WB I - I 2 - I 3 - I 4 - pipeline 39 Speculate next ess is +4 Src (pc+4 / jabs / rind/ br) stall E M Jump? I 04 inst I 2 I 096 DD I 2 00 J 304 I 3 04 DD I DD kill jump instruction kills (not stalls) the following instruction How? 40 CS252 S05 20

21 Pipelining Jumps Src (pc+4 / jabs / rind/ br) stall To kill a fetched instruction -- Insert a mux before E M Jump? I 2 I inst Src D I 2 ny interaction beten stall and jump? I 096 DD I 2 00 J 304 I 3 04 DD I DD kill Src D = Case opcode D J, JL... IM 4 Jump Pipeline Diagrams time t0 t t2 t3 t4 t5 t6 t7.... (I ) 096: DD IF ID EX M WB (I 2 ) 00: J 304 IF 2 ID 2 EX 2 M 2 WB 2 (I 3 ) 04: DD IF (I 4 ) 304: DD IF 4 ID 4 EX 4 M 4 WB 4 Resource Usage time t0 t t2 t3 t4 t5 t6 t7.... IF I I 2 I 3 I 4 I 5 ID I I 2 - I 4 I 5 EX I I 2 - I 4 I 5 M I I 2 - I 4 I 5 WB I I 2 - I 4 I 5 - pipeline 42 CS252 S05 2

22 Pipelining Condi=onal Branches Src (pc+4 / jabs / rind / br) stall E M BEQ? I Taken? Src D 04 inst I 2 I 096 DD I 2 00 BEQ x,x I 3 04 DD I DD Branch condifon is not known unfl the execute stage what ac+on should be taken in the decode stage? 43 Pipelining Condi=onal Branches Src (pc+4 / jabs / rind / br) stall? E Bcond? M 08 inst Src D I 096 DD I 2 00 BEQ x,x I 3 04 DD I DD I 3 I 2 I Taken? If the branch is taken - kill the two following instrucfons - the instrucfon at the decode stage is not valid stall signal is not valid 44 CS252 S05 22

23 Pipelining Condi=onal Branches Src (pc+4/jabs/rind/br) stall Src E E Bcond? M Jump? I 2 I Taken? 08 inst Src D I : 096 DD I 2: 00 BEQ x,x I 3: 04 DD I 4: 304 DD I 3 If the branch is taken - kill the two following instrucfons - the instrucfon at the decode stage is not valid stall signal is not valid 45 Branch Pipeline Diagrams (resolved in execute stage) time t0 t t2 t3 t4 t5 t6 t7.... (I ) 096: DD IF ID EX M WB (I 2 ) 00: BEQ +200 IF 2 ID 2 EX 2 M 2 WB 2 (I 3 ) 04: DD IF 3 ID (I 4 ) 08: IF (I 5 ) 304: DD IF 5 ID 5 EX 5 M 5 WB 5 Resource Usage time t0 t t2 t3 t4 t5 t6 t7.... IF I I 2 I 3 I 4 I 5 ID I I 2 I 3 - I 5 EX I I I 5 M I I I 5 WB I I I 5 - pipeline 46 CS252 S05 23

24 cknowledgements These slides contain material developed and copyright by: rvind (MIT) Krste sanovic (MIT/UCB) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB) David PaEerson (UCB) MIT material derived from course UCB material derived from course CS CS252 S05 24

Computer Architecture ELEC2401 & ELEC3441

Last Time Pipeline Hazard Computer Architecture ELEC2401 & ELEC3441 Lecture 8 Pipelining (3) Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering Structural Hazard Hazard Control