6.823, L8--1
Simple Instruction Pipelining
Laboratory for Computer Science
M.I.T.
http://www.csg.lcs.mit.edu/6.823

Pipelined Harvard Datapath

Stages: I-fetch | decode & Reg-fetch | execute | memory | write-back

Clock period can be reduced by dividing the execution of an instruction into multiple cycles:

    t_C > max {t_IM, t_RF, t_ALU, t_DM, t_RW} = t_DM (probably)

However, CPI will increase unless instructions are pipelined.

How to divide the datapath into stages

Suppose memory is significantly slower than other stages. In particular, suppose

    t_IM = t_DM = 10 units
    t_ALU = 5 units
    t_RF = t_RW = 1 unit

Since the slowest stage determines the clock, it may be possible to combine some stages without any loss of performance.

Minimizing Critical Path

Stages: I-fetch | decode & Reg-fetch & execute | memory | write-back

    t_C > max {t_IM, t_RF + t_ALU, t_DM, t_RW}

The write-back stage takes much less time than the other stages. Suppose we combined it with the memory stage ⇒ the critical path increases by 10% (t_DM + t_RW = 11 units instead of 10).
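
The effect of merging stages on the clock period can be checked numerically. The following is a minimal sketch, assuming each pipeline stage is modelled as a tuple of datapath components evaluated back to back in one cycle; the names `DELAYS` and `clock_period` are illustrative and not part of the lecture.

```python
# Component delays from the slide, in arbitrary time units.
DELAYS = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}

def clock_period(stages, delays=DELAYS):
    """Lower bound on t_C: the delay of the slowest stage.

    Each stage is a tuple of components whose delays add up
    because they are traversed within a single cycle.
    """
    return max(sum(delays[c] for c in stage) for stage in stages)

five_stage = [("IM",), ("RF",), ("ALU",), ("DM",), ("RW",)]
# Register fetch merged with execute: 1 + 5 is still below 10.
merged_rf_alu = [("IM",), ("RF", "ALU"), ("DM",), ("RW",)]
# Write-back merged into the memory stage as well: 10 + 1 = 11.
merged_dm_rw = [("IM",), ("RF", "ALU"), ("DM", "RW")]

print(clock_period(five_stage))     # 10
print(clock_period(merged_rf_alu))  # 10
print(clock_period(merged_dm_rw))   # 11
```

Merging register fetch with execute is free because memory still sets the clock, while folding write-back into the memory stage lengthens the critical path from 10 to 11 units.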

Speedup by Pipelining (ignoring hazards)

For the 4-stage pipeline, given
    t_IM = t_DM = 10 units, t_ALU = 5 units, t_RF = t_RW = 1 unit
t_C could be reduced from 27 units to 10 units ⇒ speedup = 2.7

However, if
    t_IM = t_DM = t_ALU = t_RF = t_RW = 5 units
the same 4-stage pipeline can reduce t_C from 25 units to 10 units ⇒ speedup = 2.5

But, since t_IM = t_DM = t_ALU = t_RF = t_RW, it is possible to achieve higher speedup with more stages in the pipeline.
A 5-stage pipeline can reduce t_C from 25 units to 5 units ⇒ speedup = 5

An Ideal Pipeline

    stage 1 → stage 2 → stage 3 → stage 4

All objects go through the same stages
No sharing of resources between any two stages
Propagation delay through all pipeline stages is equal
The scheduling of an object entering the pipeline is not affected by the objects in other stages

These conditions generally hold for industrial assembly lines. An instruction pipeline, however, cannot satisfy the last condition. Why?
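
The speedup figures quoted for the 4-stage and 5-stage pipelines can be reproduced with a short calculation. This is a sketch under the slide's own assumption that hazards are ignored, so speedup is simply the unpipelined clock period (the sum of all component delays) divided by the pipelined one (the slowest stage). The function and variable names are illustrative.

```python
def speedup(delays, stages):
    """Ratio of unpipelined to pipelined clock period, ignoring hazards."""
    unpipelined = sum(delays.values())
    pipelined = max(sum(delays[c] for c in stage) for stage in stages)
    return unpipelined / pipelined

# 4-stage pipeline: register fetch and execute share one stage.
FOUR_STAGE = [("IM",), ("RF", "ALU"), ("DM",), ("RW",)]
FIVE_STAGE = [("IM",), ("RF",), ("ALU",), ("DM",), ("RW",)]

slow_memory = {"IM": 10, "RF": 1, "ALU": 5, "DM": 10, "RW": 1}
balanced = dict.fromkeys(["IM", "RF", "ALU", "DM", "RW"], 5)

print(speedup(slow_memory, FOUR_STAGE))  # 27 / 10
print(speedup(balanced, FOUR_STAGE))     # 25 / 10
print(speedup(balanced, FIVE_STAGE))     # 25 / 5
```

With balanced component delays, every merged stage wastes clock time, which is why the finer 5-stage split does better.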

How Instructions can Interact with each other in a pipeline

An instruction in the pipeline may need a resource being used by another instruction in the pipeline ⇒ structural hazard
An instruction may produce data that is needed by a later instruction ⇒ data hazard
In the extreme case, an instruction may determine the next instruction to be executed ⇒ control hazard (branches, interrupts, ...)

Feedback to Resolve Hazards

[Figure: stages 1 to 4 in sequence, with feedback signals FB1 to FB4 running back to earlier stages]

Controlling the pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i (otherwise deadlocks may occur)
Feedback to previous stages is used to stall or kill instructions

Technology Assumptions

We will assume
    a small amount of very fast memory (caches) backed up by a large, slower memory
    a fast ALU (at least for integers)
    multiported register files (slower!)

This makes the following timing assumption valid:

    t_IM ≈ t_RF ≈ t_ALU ≈ t_DM ≈ t_RW

A 5-stage pipelined Harvard architecture will be the focus of our detailed design

5-Stage Pipelined Execution

Stages: I-fetch (IF) | decode & Reg-fetch (ID) | execute (EX) | memory (M) | write-back (W)

time          t0   t1   t2   t3   t4   t5   t6   t7   ....
instruction1  IF1  ID1  EX1  M1   W1
instruction2       IF2  ID2  EX2  M2   W2
instruction3            IF3  ID3  EX3  M3   W3
instruction4                 IF4  ID4  EX4  M4   W4
instruction5                      IF5  ID5  EX5  M5   W5
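
The staircase pattern of the 5-stage execution diagram follows from one rule: with no hazards, instruction i (counting from 0) occupies stage s during cycle i + s. A minimal sketch that generates the diagram from that rule is given below; `ideal_schedule` is an illustrative helper, not something defined in the lecture.

```python
STAGES = ["IF", "ID", "EX", "M", "W"]

def ideal_schedule(n_instructions, stages=STAGES):
    """Return rows[i][t]: the stage instruction i occupies in cycle t.

    Assumes an ideal pipeline: one instruction enters per cycle and
    nothing ever stalls. Empty strings mark cycles in which the
    instruction is not in the pipeline.
    """
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows

for i, row in enumerate(ideal_schedule(5), start=1):
    print(f"instruction{i}  " + " ".join(f"{cell:<3}" for cell in row))
```

Reading a column of the result rather than a row gives the resource view: from cycle t4 onward, every stage is busy with a different instruction.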

5-Stage Pipelined Execution: Resource Usage Diagram

Stages: I-fetch (IF) | decode & Reg-fetch (ID) | execute (EX) | memory (M) | write-back (W)

Resources
time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   I4   I5
EX              I1   I2   I3   I4   I5
M                    I1   I2   I3   I4   I5
W                         I1   I2   I3   I4   I5

Pipelined Execution: ALU Instructions

[Figure: pipelined datapath executing ALU instructions, annotated "not quite correct!"]
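
Pipelined Execution: Need for Several IRs

[Figure: pipelined datapath with an instruction register (IR) in each of the later stages]

IRs and Control points

[Figure: pipelined datapath with the control points driven from the per-stage IRs]

Are control points connected properly?
    - Load/Store instructions
    - ALU instructions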
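
Pipelined Harvard Datapath without interlocks and jumps

[Figure: pipelined datapath with per-stage IRs and the control signals RegWrite, OpSel, MemWrite, RegDst, WBSrc, ExtSel, BSrc]

Hardwired Control Equations: Harvard Datapath - pipelined

Each signal is decoded from the opcode held in the stage where the signal is used (D, E, M or W).

ExtSel = Case opcode_D
    ALUi, LW, SW, BEQZ, BNEZ  ⇒ sExt16
    ALUui                     ⇒ uExt16
    J, JAL                    ⇒ sExt26

BSrc = Case opcode_D
    ALU                       ⇒ Reg
    ALUi, LW, SW              ⇒ extended immediate

OpSel = Case opcode_E
    ALU                       ⇒ Func
    ALUi                      ⇒ Op
    LW, SW                    ⇒ +
    BEQZ, BNEZ                ⇒ 0?

MemWrite = Case opcode_M
    SW                        ⇒ on
    ...                       ⇒ off

WBSrc = Case opcode_M
    ALU, ALUi                 ⇒ ALU
    LW                        ⇒ Mem
    JAL, JALR                 ⇒ PC

RegDst = Case opcode_W
    ALU                       ⇒ rf3
    ALUi, LW                  ⇒ rf2
    JAL, JALR                 ⇒ R31

RegWrite = Case opcode_W
    ALU, ALUi, LW, JAL, JALR  ⇒ on
    ...                       ⇒ off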
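
The control equations can be read as one decoder per pipeline stage, each looking only at the opcode in its own IR. The sketch below illustrates that reading. It assumes opcode classes are represented as strings (`"ALU"`, `"ALUi"`, `"LW"`, and so on), that `"Imm"` names the extended-immediate input of the B multiplexer, and that JAL/JALR write R31; the helper `case` and the function `control_signals` are illustrative names, not part of the lecture.

```python
def case(opcode, arms, default=None):
    """Return the value of the first arm whose opcode set contains opcode."""
    for opcodes, value in arms:
        if opcode in opcodes:
            return value
    return default

def control_signals(op_d, op_e, op_m, op_w):
    """Decode each control signal from the opcode in the stage that uses it.

    op_d, op_e, op_m, op_w are the opcode classes currently held in the
    decode, execute, memory and write-back stages respectively.
    """
    return {
        # Decode stage
        "ExtSel": case(op_d, [({"ALUi", "LW", "SW", "BEQZ", "BNEZ"}, "sExt16"),
                              ({"ALUui"}, "uExt16"),
                              ({"J", "JAL"}, "sExt26")]),
        "BSrc": case(op_d, [({"ALU"}, "Reg"),
                            ({"ALUi", "LW", "SW"}, "Imm")]),  # assumed name
        # Execute stage
        "OpSel": case(op_e, [({"ALU"}, "Func"), ({"ALUi"}, "Op"),
                             ({"LW", "SW"}, "+"), ({"BEQZ", "BNEZ"}, "0?")]),
        # Memory stage
        "MemWrite": case(op_m, [({"SW"}, True)], default=False),
        "WBSrc": case(op_m, [({"ALU", "ALUi"}, "ALU"), ({"LW"}, "Mem"),
                             ({"JAL", "JALR"}, "PC")]),
        # Write-back stage
        "RegDst": case(op_w, [({"ALU"}, "rf3"), ({"ALUi", "LW"}, "rf2"),
                              ({"JAL", "JALR"}, "R31")]),  # assumed link reg
        "RegWrite": case(op_w, [({"ALU", "ALUi", "LW", "JAL", "JALR"}, True)],
                         default=False),
    }

# Four different instructions in flight at once: LW in D, ALU in E,
# SW in M, ALUi in W. Each signal follows its own stage's instruction.
signals = control_signals("LW", "ALU", "SW", "ALUi")
print(signals)
```

In a single cycle the signals belong to four different instructions, which is why one IR per stage is needed.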

Data Hazards

[Figure: pipelined datapath with IRs in stages D, E, M and W]

    ...
    r1 ← (r0) + 10
    r4 ← (r1) + 17
    ...

Oops! The second instruction reads r1 before the first has written it.

Resolving Data Hazards

1. Freeze earlier pipeline stages until the data becomes available ⇒ interlocks
2. If data is available somewhere in the datapath, provide a bypass to get it to the right stage

Interlocks to resolve Data Hazards

[Figure: pipelined datapath with IRs in stages D, E, M and W; a Stall Condition signal holds the earlier stages and inserts a nop into the execute stage]

    ...
    r1 ← (r0) + 10
    r4 ← (r1) + 17
    ...

Stalled Stages and Pipeline Bubbles

time                  t0   t1   t2   t3   t4   t5   t6   t7   ....
(I1) r1 ← (r0) + 10   IF1  ID1  EX1  M1   W1
(I2) r4 ← (r1) + 17        IF2  ID2  ID2  ID2  ID2  EX2  M2   W2
(I3)                            IF3  IF3  IF3  IF3  ID3  EX3  M3   W3
(I4)                                                IF4  ID4  EX4  M4   W4
(I5)                                                     IF5  ID5  EX5  M5   W5

The repeated ID2 and IF3 entries are the stalled stages.

Resource Usage

time  t0   t1   t2   t3   t4   t5   t6   t7   ....
IF    I1   I2   I3   I3   I3   I3   I4   I5
ID         I1   I2   I2   I2   I2   I3   I4   I5
EX              I1   nop  nop  nop  I2   I3   I4   I5
M                    I1   nop  nop  nop  I2   I3   I4   I5
W                         I1   nop  nop  nop  I2   I3   I4   I5

nop ⇒ pipeline bubble
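
The stall diagram can be reproduced with a small cycle-by-cycle model of the interlock. This is a sketch under stated assumptions: an instruction is a `(name, dest, sources)` tuple, the decode stage stalls whenever one of its sources matches the destination of an instruction in EX, M or W, and I3 to I5 are hypothetical instructions that touch no registers (the slide does not say what they are). The names `simulate`, `hazard` and `PROGRAM` are illustrative.

```python
NOP = ("nop", None, ())  # bubble: writes nothing, reads nothing

def hazard(decode, older):
    """True if the instruction in decode reads a register that `older` writes."""
    return (decode is not None and older is not None
            and older[1] is not None and older[1] in decode[2])

def simulate(program, n_cycles):
    """Return, per cycle, the names occupying (IF, ID, EX, M, W)."""
    fetch = decode = ex = mem = wb = None
    next_pc = 0
    history = []
    for _ in range(n_cycles):
        stall = any(hazard(decode, older) for older in (ex, mem, wb))
        if stall:
            # IF and ID hold their instructions; a bubble enters EX.
            ex, mem, wb = NOP, ex, mem
        else:
            incoming = program[next_pc] if next_pc < len(program) else None
            next_pc += 1
            fetch, decode, ex, mem, wb = incoming, fetch, decode, ex, mem
        history.append(tuple(s[0] if s else "" for s in (fetch, decode, ex, mem, wb)))
    return history

PROGRAM = [
    ("I1", "r1", ("r0",)),  # r1 <- (r0) + 10
    ("I2", "r4", ("r1",)),  # r4 <- (r1) + 17
    ("I3", None, ()),       # hypothetical independent instructions
    ("I4", None, ()),
    ("I5", None, ()),
]

for t, row in enumerate(simulate(PROGRAM, 9)):
    print(f"t{t}: " + " ".join(f"{name:<3}" for name in row))
```

I2 waits in decode until I1 has passed through write-back, so three bubbles follow I1 down the pipeline, matching the resource usage table.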

Interlock Control Logic (worksheet)

[Figure: pipelined datapath with a stall unit C_stall that takes rf1 and rf2 from the decode stage and a "?" input from each later stage; C_dest decodes the destination register]

Compare the source registers of the instruction in the decode stage with the destination registers of the uncommitted instructions.

Interlock Control Logic (ignoring jumps & branches)

[Figure: C_stall compares rf1 and rf2 against the destination specifier ws and write enable we of stages E, M and W, qualified by the read enables re1 and re2 from C_re; a C_dest block in each stage produces ws and we]

Should we always stall if the rs field matches some rd?
    not every instruction writes registers ⇒ we
    not every instruction reads registers ⇒ re

Source & Destination Registers

R-type:  op  rf1  rf2  rf3  func
I-type:  op  rf1  rf2  immediate16
J-type:  op  immediate26

                                                   source(s)   destination
ALU    rf3 ← (rf1) func (rf2)                      rf1, rf2    rf3
ALUi   rf2 ← (rf1) op imm                          rf1         rf2
LW     rf2 ← M[(rf1) + imm]                        rf1         rf2
SW     M[(rf1) + imm] ← (rf2)                      rf1, rf2
BZ     cond (rf1)  true:  (PC) ← (PC) + imm        rf1
                   false: (PC) ← (PC) + 4
J      (PC) ← (PC) + imm
JAL    r31 ← (PC), (PC) ← (PC) + imm                           31
JR     (PC) ← (rf1)                                rf1
JALR   r31 ← (PC), (PC) ← (rf1)                    rf1         31

Deriving the Stall Signal

C_dest
    ws = Case opcode
        ALU                       ⇒ rf3
        ALUi, LW                  ⇒ rf2
        JAL, JALR                 ⇒ 31
    we = Case opcode
        ALU, ALUi, LW, JAL, JALR  ⇒ on
        ...                       ⇒ off

C_re
    re1 = Case opcode
        ALU, ALUi, ____           ⇒ on
        ____                      ⇒ off
    re2 = Case opcode
        ____                      ⇒ on
        ____                      ⇒ off

C_stall
    stall = ____

Stall if a source register of the instruction in the decode stage matches the destination register of an uncommitted instruction.
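
The Stall Signal

C_dest
    ws = Case opcode
        ALU                       ⇒ rf3
        ALUi, LW                  ⇒ rf2
        JAL, JALR                 ⇒ 31
    we = Case opcode
        ALU, ALUi, LW, JAL, JALR  ⇒ (ws ≠ 0)
        ...                       ⇒ off

C_re
    re1 = Case opcode
        ALU, ALUi, LW, SW, BZ, JR, JALR  ⇒ on
        J, JAL                           ⇒ off
    re2 = Case opcode
        ALU, SW                          ⇒ on
        ...                              ⇒ off

C_stall
    stall = ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
          + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D

Here ws is the destination register specifier, we the write enable, and re1/re2 the read enables for the two source fields; subscripts name the pipeline stage.

This is not the full story!

Hazards due to Loads & Stores

[Figure: pipelined datapath with IRs in stages D, E, M and W, and the Stall Condition logic]

    ...
    M[(r1)+7] ← (r2)
    r4 ← M[(r3)+5]
    ...

Is there any possible data hazard in this instruction sequence?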
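
The stall equation translates almost directly into code. The following is a minimal sketch, assuming instructions are represented by an opcode class plus the three register fields, that the destination specifier and write enable are called `ws` and `we`, and that JAL/JALR write register 31. `Instr`, `c_dest`, `c_re` and `stall` are illustrative names.

```python
from collections import namedtuple

# Register fields default to 0 when an instruction format lacks them.
Instr = namedtuple("Instr", "opcode rf1 rf2 rf3", defaults=(0, 0, 0))

def c_dest(instr):
    """Return (ws, we): destination register and write enable."""
    if instr is None:
        return 0, False
    if instr.opcode == "ALU":
        ws = instr.rf3
    elif instr.opcode in ("ALUi", "LW"):
        ws = instr.rf2
    elif instr.opcode in ("JAL", "JALR"):
        ws = 31  # assumed link register
    else:
        return 0, False
    return ws, ws != 0  # writes to r0 never create a hazard

def c_re(instr):
    """Return (re1, re2): does the instruction read rf1 / rf2?"""
    re1 = instr.opcode in ("ALU", "ALUi", "LW", "SW", "BZ", "JR", "JALR")
    re2 = instr.opcode in ("ALU", "SW")
    return re1, re2

def stall(d, e, m, w):
    """Stall decode if a source it reads is written by an instruction in E, M or W."""
    re1, re2 = c_re(d)
    older = [c_dest(x) for x in (e, m, w)]
    hit1 = any(we and d.rf1 == ws for ws, we in older)
    hit2 = any(we and d.rf2 == ws for ws, we in older)
    return (hit1 and re1) or (hit2 and re2)

i1 = Instr("ALUi", rf1=0, rf2=1)   # r1 <- (r0) + 10
i2 = Instr("ALUi", rf1=1, rf2=4)   # r4 <- (r1) + 17
print(stall(i2, i1, None, None))   # I2 in decode, I1 in execute
print(stall(Instr("J", rf1=1), i1, None, None))  # J reads no registers
```

The read enables matter: the bits that happen to sit in the rf1 position of a J instruction may match a pending destination, but J never reads a register, so no stall is raised.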

Hazards due to Loads & Stores: depends on the memory system?

[Figure: pipelined datapath with IRs in stages D, E, M and W, and the Stall Condition logic]

    M[(r1)+7] ← (r2)
    r4 ← M[(r3)+5]

(r1)+7 = (r3)+5 ⇒ data hazard

However, the hazard is avoided because our memory system completes writes in one cycle!

Complications due to Jumps

[Figure: fetch end of the pipelined datapath with a stall signal, a nop input to the IR, and a kill signal; instructions I1 and I2 in flight]

Assuming no delay slot:

    I1   096   ADD
    I2   100   J 200
    I3   104   ADD     (kill)
    I4   304   ADD