Simple Instruction-Pipelining (cont.) Pipelining Jumps


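6.823, L9: Simple Instruction-Pipelining (cont.) + Interrupts
Updated March 6, 2000
Laboratory for Computer Science, M.I.T.

Pipelining Jumps

[Figure: pipelined datapath with the next-PC muxes PCSrc1 (j / ~j) and PCSrc2 (PC-relative or register-indirect target), the stall signal, a "Jump?" test in the decode stage, and the IRSrc_D mux. Example sequence: ADD; 100: J 200; ADD; ADD. There is no delay slot, so the instruction fetched after the jump is killed.]

- To kill the fetched instruction, insert a mux before the decode-stage instruction register (the IRSrc_D mux).
- Any interaction between stall and jump? No.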

Control Equations for Muxes (Jumps Only)

PCSrc1 = Case opcode_D
    J, JAL, JR, JALR => j_D
    ...              => ~j

PCSrc2 = Case opcode_D
    J, JAL   => PC-relative target
    JR, JALR => register-indirect target

IRSrc_D = Case opcode_D
    J, JAL, JR, JALR => nop
    ...              => IM

Pipelining Conditional Branches

[Figure: the same datapath with a "BEQZ?" test in the decode stage and a "zero?" test in the execute stage. Example sequence: ADD; 100: BEQZ r1, 200; ADD; ADD. No delay slot.]

- The branch condition is not known until the execute stage.
- What action should be taken in the decode stage?

Conditional Branches: Solution 1

[Figure: PCSrc1 now selects among j_D / j_E / ~j, and the "zero?" test in the execute stage drives the kill muxes. Example sequence: ADD; 100: BEQZ r1, 200; ADD; ADD. No delay slot.]

If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid, so the stall signal is not valid

New Stall Signal

In these equations "." is AND, "+" is OR, and "!" is NOT; ws_X and we_X are the destination register and write enable of the instruction in stage X.

stall = ( ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
        + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D )
        . !((opcode_E = BEQZ).z + (opcode_E = BNEZ).!z)

Control Equations for Muxes (Solution 1)

Give priority to the older instruction, i.e., the execute-stage instruction over the decode-stage instruction.

PCSrc1 = Case opcode_E
    BEQZ.z, BNEZ.!z => j_E
    ...             => Case opcode_D
                           J, JAL, JR, JALR => j_D
                           ...              => ~j

PCSrc2 = Case opcode_D
    J, JAL   => PC-relative target
    JR, JALR => register-indirect target

IRSrc_D = Case opcode_E
    BEQZ.z, BNEZ.!z => nop
    ...             => (Case opcode_D
                           J, JAL, JR, JALR => nop
                           ...              => IM)

IRSrc_E = Case opcode_E
    BEQZ.z, BNEZ.!z => nop
    ...             => stall.nop + !stall.IR_D

Conditional Branches: Solution 2

Test for zero at the decode stage.

[Figure: the "zero?" test moves to the decode stage, next to the "BEQZ?" test. Example sequence: ADD; 100: BEQZ r1, 200; ADD; ADD. No delay slot.]

- Need to kill only one instruction!
- Wouldn't work if DLX had general branch conditions (e.g., r1 > r2)?
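The priority rule in the solution 1 equations for PCSrc1 can be sketched in Python. This is an illustrative model only: the function name `pc_src1` and the string encodings of opcodes and mux selections are assumptions of this sketch, not part of the lecture.

```python
# Opcodes treated as unconditional jumps in the decode stage.
JUMPS = {"J", "JAL", "JR", "JALR"}


def pc_src1(opcode_e: str, z: bool, opcode_d: str) -> str:
    """Select the next-PC source, giving priority to the older (execute-stage) instruction.

    Returns "jE" for a taken branch in execute, "jD" for a jump in decode,
    and "~j" for sequential fetch. The string labels are hypothetical.
    """
    branch_taken = (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)
    if branch_taken:
        return "jE"          # execute-stage branch wins over anything in decode
    if opcode_d in JUMPS:
        return "jD"          # otherwise a decode-stage jump redirects fetch
    return "~j"              # otherwise fetch sequentially
```

For example, `pc_src1("BEQZ", True, "J")` selects the branch target even though a jump sits in the decode stage, because that jump is about to be killed.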

Conditional Branches: Solution 3 (Delayed Branches)

[Figure: the datapath for the delayed-branch scheme. Example sequence: ADD; 100: BEQZ r1, 200; ADD; ADD.]

- Change the semantics of branches and jumps.
- Need not kill any instructions!

To Delay or Not to Delay?

Delay slots:
- complicate ISA specification and programming
- may simplify interlock logic
- clock time?

So why have delay slots? Hint: consider instruction issue opportunities, assuming 15% of the instructions are control instructions.
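The hint about issue opportunities can be made concrete with a little arithmetic. The sketch below assumes an otherwise ideal pipeline that issues one instruction per cycle; only the 15% figure comes from the slide, and the lost-slot counts per control instruction are hypothetical scenarios chosen for illustration.

```python
def cycles_per_useful_instruction(control_fraction: float, lost_slots_per_control: float) -> float:
    """Average cycles per useful instruction when each control instruction wastes some issue slots."""
    return 1.0 + control_fraction * lost_slots_per_control


CONTROL_FRACTION = 0.15  # from the slide

# Hypothetical scenarios (assumptions of this sketch):
kill_two = cycles_per_useful_instruction(CONTROL_FRACTION, 2.0)   # two slots lost per control instruction
kill_one = cycles_per_useful_instruction(CONTROL_FRACTION, 1.0)   # one slot lost per control instruction
half_filled = cycles_per_useful_instruction(CONTROL_FRACTION, 0.5)  # delay slot usefully filled half the time
```

Under these assumptions, every slot a compiler fills with useful work recovers part of the 15% of issue opportunities that would otherwise be bubbles.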

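Pipelining Delayed Jumps & Links

[Figure: the datapath extended with the hardware holding the return address for linking, which feeds the GPRs. Example sequence: ADD; 100: JAL 200; ADD; ADD.]

Complete Control Logic

[Figure: the datapath with the complete control logic. Control blocks C_re, C_stall, and one C_dest per later stage (E, M, W) take rf1, rf2, re1, and re2 from the decode stage and produce the stall signal. The decoded control signals shown are PCSrc1, PCSrc2, ExtSel, OpSel, MemWrite, and WBSrc, together with the "zero?" test and the GPRs.]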

Hardwired Control Equations (Ignoring Jumps and Branches)

ExtSel = Case opcode_D
    ALUi, LW, SW, BEQZ, BNEZ => sExt16
    ALUui                    => uExt16
    J, JAL                   => sExt26

BSrc = Case opcode_D
    ALU          => Reg
    ALUi, LW, SW => Imm

OpSel = Case opcode_E
    ALU        => Func
    ALUi       => Op
    LW, SW     => +
    BEQZ, BNEZ => 0?

MemWrite = Case opcode_M
    SW  => on
    ... => off

WBSrc = Case opcode_M
    ALU, ALUi => ALU
    LW        => Mem
    JAL, JALR => PC

RegDst = Case opcode_W
    ALU       => rf3
    ALUi, LW  => rf2
    JAL, JALR => R31

RegWrite = Case opcode_W
    ALU, ALUi, LW => (ws != 0)
    JAL, JALR     => on
    ...           => off

The Stall Signal

C_dest:
ws = Case opcode
    ALU       => rf3
    ALUi, LW  => rf2
    JAL, JALR => R31

we = Case opcode
    ALU, ALUi, LW => (ws != 0)
    JAL, JALR     => on
    ...           => off

C_re:
re1 = Case opcode
    ALU, ALUi, LW, SW, BZ, JR, JALR => on
    J, JAL                          => off

re2 = Case opcode
    ALU, SW => on
    ...     => off

C_stall:
stall = ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
      + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D
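The stall signal can be read as "some later stage will still write a register that the decode stage needs to read". A minimal Python sketch follows, assuming `ws` is the destination register and `we` the write enable of the instruction in a stage; the class and function names are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    ws: int    # destination register of the instruction in this stage
    we: bool   # True if that instruction will write ws


@dataclass
class Decode:
    rf1: int   # first source register field
    rf2: int   # second source register field
    re1: bool  # True if the instruction actually reads rf1
    re2: bool  # True if the instruction actually reads rf2


def stall(d: Decode, e: Stage, m: Stage, w: Stage) -> bool:
    """Interlock without bypassing: stall while any later stage still owes a write to a needed register."""
    def write_pending(reg: int) -> bool:
        return any(reg == s.ws and s.we for s in (e, m, w))

    return (write_pending(d.rf1) and d.re1) or (write_pending(d.rf2) and d.re2)


# Example: r1 <- (r0) + 10 is in execute while r4 <- (r1) + 17 is in decode.
idle = Stage(ws=0, we=False)
hazard = stall(Decode(rf1=1, rf2=4, re1=True, re2=False), Stage(ws=1, we=True), idle, idle)
```

In the example, `hazard` is true because the decode-stage instruction reads r1 while the execute-stage instruction has yet to write it.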

Control Equations for Muxes (Delayed Jumps and Branches)

PCSrc1 = Case opcode_D
    BEQZ.zero?, BNEZ.!zero? => j
    J, JAL, JR, JALR        => j
    ...                     => ~j

PCSrc2 = Case opcode_D
    BEQZ.zero?, BNEZ.!zero? => PC-relative target
    J, JAL                  => PC-relative target
    JR, JALR                => register-indirect target
    ...                     => *

IRSrc_E = stall.nop + !stall.IR_D

Bypassing

Without a bypass, the dependent instruction I2 waits in decode until I1 has written r1 (the repeated ID2 and IF3 entries are stalled stages):

time                     t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 <- (r0) + 10     IF1  ID1  EX1  MA1  WB1
(I2) r4 <- (r1) + 17          IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                               IF3  IF3  IF3  IF3  ID3  EX3  MA3
(I4)                                                   IF4  ID4  EX4
(I5)                                                        IF5  ID5

Each stall or kill introduces a bubble in the pipeline, so CPI > 1.

A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input:

time                     t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 <- (r0) + 10     IF1  ID1  EX1  MA1  WB1
(I2) r4 <- (r1) + 17          IF2  ID2  EX2  MA2  WB2
(I3)                               IF3  ID3  EX3  MA3  WB3
(I4)                                    IF4  ID4  EX4  MA4  WB4
(I5)                                         IF5  ID5  EX5  MA5  WB5
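The two timing diagrams can be reproduced with a small scheduling model. This is a sketch under stated assumptions: every instruction is an ALU operation, the pipeline is in order with stages IF, ID, EX, MA, WB, an interlocked consumer leaves ID only in the cycle after its producer's WB, and I3 to I5 are independent instructions invented for the example (the slide does not say what they are).

```python
def schedule(instrs, bypass):
    """Return stage timing for each instruction in an in-order five-stage pipeline.

    instrs is a list of (dest, sources) register numbers; all are assumed to be ALU ops.
    Each result row gives the first IF cycle, first and last ID cycle, and the EX/MA/WB cycles.
    """
    rows = []
    last_writer = {}  # register number -> index of its most recent producer
    for i, (dest, sources) in enumerate(instrs):
        if i == 0:
            if_start, id_start = 0, 1
        else:
            prev = rows[i - 1]
            if_start = prev["id_start"]    # fetch starts when the previous instruction enters ID
            id_start = prev["id_end"] + 1  # decode starts when the previous instruction leaves ID
        id_end = id_start
        for reg in sources:
            if reg != 0 and reg in last_writer:  # r0 always reads as zero in DLX
                producer = rows[last_writer[reg]]
                # With the bypass, the consumer may leave ID while the producer is in EX;
                # without it, the consumer waits until the producer has finished WB.
                ready = producer["ex"] if bypass else producer["wb"] + 1
                id_end = max(id_end, ready)
        rows.append({"if_start": if_start, "id_start": id_start, "id_end": id_end,
                     "ex": id_end + 1, "ma": id_end + 2, "wb": id_end + 3})
        if dest != 0:
            last_writer[dest] = i
    return rows


# I1: r1 <- (r0) + 10, I2: r4 <- (r1) + 17, then three hypothetical independent instructions.
program = [(1, [0]), (4, [1]), (5, [0]), (6, [0]), (7, [0])]
interlocked = schedule(program, bypass=False)
bypassed = schedule(program, bypass=True)
stall_cycles = sum(r["id_end"] - r["id_start"] for r in interlocked)
```

With these assumptions I2 occupies ID from t2 through t5 when interlocked and only t2 when bypassed, matching the two diagrams.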

Adding Bypasses

[Figure: the datapath with a bypass path from the ALU output back to the ASrc mux at the ALU input; the stall logic is unchanged. Example: (I1) r1 <- (r0) + 10 in the execute stage, (I2) r4 <- (r1) + 17 in the decode stage.]

Of course you can add many more bypasses!

The Bypass Signal: Deriving It from the Stall Signal

C_dest:
ws = Case opcode
    ALU       => rf3
    ALUi, LW  => rf2
    JAL, JALR => R31

we = Case opcode
    ALU, ALUi, LW => (ws != 0)
    JAL, JALR     => on
    ...           => off

C_re:
re1 = Case opcode
    ALU, ALUi, LW, SW, BZ, JR, JALR => on
    J, JAL                          => off

re2 = Case opcode
    ALU, SW => on
    ...     => off

C_stall:
stall = ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
      + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D

C_bypass:
ASrc = (rf1_D = ws_E).we_E.re1_D

Is this correct?

Usefulness of a Bypass

[Figure: the datapath with the ALU-output bypass into the ASrc mux.]

Consider:
    (I1) r1 <- (r0) + 10
    (I2) r4 <- (r1) + 17

Where can this bypass help? Compare with:
    r1 <- M[(r0) + 10]
    r4 <- (r1) + 17
and:
    JAL 500
    r4 <- (r31) + 17

Bypass and Stall Signals

we_E has to be split into two components:

we-bypass_E = ((opcode_E = ALU) + (opcode_E = ALUi)).(ws_E != 0)
we-stall_E  = (opcode_E = LW).(ws_E != 0) + (opcode_E = JAL) + (opcode_E = JALR)

ASrc = (rf1_D = ws_E).we-bypass_E.re1_D

stall = ((rf1_D = ws_E).we-stall_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
      + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D

Fully Bypassed Datapath

[Figure: the datapath with bypasses from every later stage into both ALU source muxes.]

Is there still a need for the stall signal?

stall = (rf1_D = ws_E).(opcode_E = LW).(ws_E != 0).re1_D
      + (rf2_D = ws_E).(opcode_E = LW).(ws_E != 0).re2_D

Why an Instruction May Not Be Dispatched Every Cycle

- Full bypassing may be too expensive to implement.
- Loads may cause bubbles if there are no load delay slots.
- Conditional branches may cause bubbles if there are no branch delay slots.
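With full bypassing, the only remaining interlock in the equation above is the load-use case: a load in the execute stage has no data to forward yet. A minimal Python sketch, assuming `ws_e` is the destination register of the execute-stage instruction; the function name and the string opcode encoding are hypothetical.

```python
def load_use_stall(rf1: int, rf2: int, re1: bool, re2: bool, opcode_e: str, ws_e: int) -> bool:
    """Stall only when the decode stage reads the register a load in execute is about to produce."""
    load_pending = opcode_e == "LW" and ws_e != 0
    return load_pending and ((rf1 == ws_e and re1) or (rf2 == ws_e and re2))
```

For example, `load_use_stall(1, 4, True, False, "LW", 1)` is true, while the same operands behind an ALU instruction do not stall because the ALU result can be bypassed.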

Interrupts alter normal flow of control

[Figure: control leaves the program's instruction stream for an interrupt handler (HI_1, HI_2, ..., HI_n).]

An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from the program's point of view.

Causes of Interrupts

An interrupt is an event that requests the attention of the processor.

Asynchronous: an external event
- input/output device service request
- timer expiration
- power disruptions, hardware failure

Synchronous: an internal event (a.k.a. exceptions)
- undefined opcode, privileged instruction
- arithmetic overflow, FPU exception
- misaligned memory access
- virtual memory exceptions: page faults, TLB misses, protection violations
- traps: system calls (i.e., jumps into kernel code)

Asynchronous Interrupts: Invoking the Interrupt Handler

An I/O device requests attention by asserting one of the prioritized interrupt request lines.

When the processor decides to process the interrupt:
- it stops the current program at instruction I_i, completing all the instructions up to I_i (precise interrupt)
- it saves the PC of instruction I_i+1 in a special register (EPC)
- it disables interrupts and transfers control to a designated interrupt handler

Interrupt Handler

- To allow nested interrupts, EPC is saved before enabling interrupts:
  - need an instruction to move EPC into the GPRs
  - need a way to mask further interrupts, at least until EPC can be saved
- There is a status register which indicates the cause of the interrupt; it must be visible to an interrupt handler.
- The return from an interrupt handler is a simple indirect jump, but it usually involves:
  - enabling interrupts
  - restoring the processor to user mode
  - restoring hardware status and control state
  so a special return-from-exception instruction (RFE) is provided.

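Asyn. Control Transfer: The Start

[Figure: the pipeline holds I_i, I_i-1, I_i-2, and I_i-3; the fetch stage holds the PC of I_i+1, and the handler entry (IH) is available as a next-PC source.]

- The interrupt is to be processed after I_i.
- The PC of I_i+1 must be saved here for the interrupt handler.

Asyn. Control Transfer: Step

[Figure: the pipeline holds I_i, I_i-1, and I_i-2; the PC of I_i+1 has been saved and the fetch stage now points at IH.]

- The controller must insert nops before IH is fetched.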

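Asyn. Control Transfer: Step

[Figure: the fetch stage has advanced to IH+8, with the handler instructions at IH and IH+4 in the pipeline; the PC of I_i+1 remains saved.]

- The interrupt handler can begin execution.

Datapath for Asyn. Interrupts

[Figure: the datapath with a microcontroller (µcontroller) sequencing the interrupt.]

- Interrupt control is a multicycle operation.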

Asyn. Control Transfer: Delay Slot

[Figure: the pipeline holds I_i, I_i-1, I_i-2, and I_i-3, where I_i is a jump; the fetch stage holds the PC of I_i+1.]

- The interrupt is to be processed after I_i, and I_i is a jump.
- Can execution resume correctly by storing just the PC of I_i+1?

DLX ISA-Specific Solution

On an interrupt, if I_i+1 is in a delay slot, save the PC of I_i instead of the PC of I_i+1. Execution can then always be correctly resumed from the saved PC.

Why? Hint: can a jump instruction be re-executed?

                     J    JR   BZ   JAL  JALR
modify state?
OK to re-execute?
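The DLX-specific rule can be sketched as a choice of which PC to save. This Python sketch assumes 4-byte instructions (consistent with the example addresses 096, 100, 104) and treats every jump and branch as having a delay slot; the names `pc_to_save`, `CONTROL_OPCODES`, and `INSTR_BYTES` are hypothetical.

```python
# Instructions whose successor sits in a delay slot (assumption: all jumps and branches).
CONTROL_OPCODES = {"J", "JR", "JAL", "JALR", "BEQZ", "BNEZ"}
INSTR_BYTES = 4  # assumed fixed instruction size


def pc_to_save(pc_i: int, opcode_i: str) -> int:
    """Pick the resume address when an interrupt is taken after instruction I_i."""
    if opcode_i in CONTROL_OPCODES:
        # I_i+1 is in a delay slot: resume at the jump itself so it is re-executed
        # and the delay-slot instruction is followed by the jump target again.
        return pc_i
    # Otherwise resume at the next sequential instruction.
    return pc_i + INSTR_BYTES
```

For example, `pc_to_save(100, "J")` returns 100, whereas `pc_to_save(100, "ADD")` returns 104. Re-executing the jump is what the table's "OK to re-execute?" row asks the reader to justify.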

Asyn. Control Transfer: DLX Hack

[Figure: the same pipeline state, with I_i a jump; the PC of I_i is the one saved.]

- In case of a jump, save the PC of I_i instead of the PC of I_i+1.

Synchronous Interrupts

- A synchronous interrupt (exception) is caused by a particular instruction.
- In general, the instruction cannot be completed and needs to be restarted after the exception has been handled. This requires undoing the effect of one or more partially executed instructions.
- In case of a trap (system call), the instruction is considered to have been completed. It is a special jump instruction involving a change to the privilege mode.
- Correct implementation of exceptions is quite difficult.
