6.823, L6--1 Implementing the Controller
Laboratory for Computer Science
M.I.T.
http://www.csg.lcs.mit.edu/6.823

6.823, L6--2 Harvard-Style Datapath for DLX
[Figure: Harvard-style DLX datapath with separate instruction and data memories. Control inputs: OpCode, zero?. Control signals: PCSrc1 ( j / ~j ), PCSrc2 ( PCR / RInd ), RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc.]
6.823, L6--3 Single-Cycle Hardwired Control: Harvard architecture
We will assume the clock period is sufficiently long for all of the following steps to be completed:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. data fetch if required
5. register write-back setup time

t_C > t_IFetch + t_RFetch + t_ALU + t_DMem + t_RWB

At the rising edge of the following clock, the PC, the register file and the memory are updated.

6.823, L6--4 Hardwired Control is pure Combinational Logic
Inputs: op code, zero?
Outputs of the combinational logic: ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, PCSrc1, PCSrc2
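
The single-cycle timing constraint above can be made concrete with a small calculation. This is only a sketch: the delay values are hypothetical numbers chosen for illustration, not figures from the lecture, and `min_clock_period` is a name introduced here.

```python
# Hypothetical component delays in nanoseconds (assumed, not from the lecture).
delays_ns = {
    "IFetch": 2.0,  # instruction fetch
    "RFetch": 1.0,  # decode and register fetch
    "ALU": 2.0,     # ALU operation
    "DMem": 2.0,    # data fetch if required
    "RWB": 1.0,     # register write-back setup time
}


def min_clock_period(delays):
    """Lower bound on t_C for a single-cycle design.

    Every step must finish within one clock period, so the bound is the
    sum of the individual step delays.
    """
    return sum(delays.values())


t_c = min_clock_period(delays_ns)
print(f"t_C must exceed {t_c} ns")
```

Because every instruction pays for every step, the slowest instruction class (the one that uses the data memory) sets the clock for all of them.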
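6.823, L6--5 ALU Control & Immediate Extension
[Figure: ALU control and immediate-extension logic. Inst<5:0> (Func) and Inst<31:26> (Opcode) feed a Decode Map that produces the ALU operation. OpSel ( Func, Op, +, 0? ) chooses the source of the ALU operation; ExtSel ( sExt16, uExt16, sExt26, High16 ) chooses the immediate extension.]

6.823, L6--6 Hardwired Control Table
Columns: Opcode, ExtSel, BSrc, OpSel, MemWrite, RegWrite, WBSrc, RegDst, PCSrc1, PCSrc2
Rows (to be filled in): ALU, ALUu, ALUi, ALUiu, LW, SW, BEQZ (taken), BEQZ (~taken), J, JAL, JR, JALR

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rf2 / rf3 / R31
PCSrc1 = j / ~j
PCSrc2 = PCR / RInd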
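
The four immediate-extension choices listed for ExtSel above (sign-extended 16-bit, unsigned 16-bit, sign-extended 26-bit, and High 16) can be sketched as a pure function of the instruction word. The selector spellings `sExt16`, `uExt16`, `sExt26` and `High16`, the helper names, and the reading of High 16 as "place the 16-bit immediate in the upper half of the word" are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Treat the low `bits` of value as two's complement; return a 32-bit word."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):  # sign bit set: subtract 2**bits
        value -= 1 << bits
    return value & MASK32


def extend(inst, ext_sel):
    """Immediate extension selected by ext_sel (selector names are assumed)."""
    if ext_sel == "sExt16":
        return sign_extend(inst & 0xFFFF, 16)
    if ext_sel == "uExt16":
        return inst & 0xFFFF
    if ext_sel == "sExt26":
        return sign_extend(inst & 0x3FFFFFF, 26)
    if ext_sel == "High16":
        # Assumed meaning: immediate goes to bits 31..16, low half is zero.
        return (inst & 0xFFFF) << 16
    raise ValueError(f"unknown ExtSel: {ext_sel}")


print(hex(extend(0x0000FFFF, "sExt16")))
print(hex(extend(0x0000FFFF, "uExt16")))
```

The same 16 instruction bits give different operands depending on ExtSel, which is why the selector has to come from the opcode decode.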
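6.823, L6--7 Hardwired Control worksheet
[Figure: Harvard DLX datapath annotated with the possible values of each control signal. Instruction fields: inst<25:21>, inst<20:16>, inst<15:11>, inst<25:0>, inst<31:26><5:0>.]
PCSrc1 = j / ~j
PCSrc2 = PCR / RInd
WBSrc = ALU / Mem / PC
RegDst = rf2 / rf3 / R31
ExtSel = sExt16 / uExt16 / sExt26 / High16
OpSel = Func / Op / + / 0?
BSrc = Reg / Imm

6.823, L6--8 Hardwired Control Table: Harvard DLX

Opcode         ExtSel  BSrc  OpSel  MemWrite  RegWrite  WBSrc  RegDst  PCSrc1  PCSrc2
ALU            *       Reg   Func   no        yes       ALU    rf3     ~j      *
ALUu           *       Reg   Func   no        yes       ALU    rf3     ~j      *
ALUi           sExt16  Imm   Op     no        yes       ALU    rf2     ~j      *
ALUiu          uExt16  Imm   Op     no        yes       ALU    rf2     ~j      *
LW             sExt16  Imm   +      no        yes       Mem    rf2     ~j      *
SW             sExt16  Imm   +      yes       no        *      *       ~j      *
BEQZ (zero=1)  sExt16  *     0?     no        no        *      *       j       PCR
BEQZ (zero=0)  sExt16  *     0?     no        no        *      *       ~j      *
J              sExt26  *     *      no        no        *      *       j       PCR
JAL            sExt26  *     *      no        yes       PC     R31     j       PCR
JR             *       *     *      no        no        *      *       j       RInd
JALR           *       *     *      no        yes       PC     R31     j       RInd

* = don't care

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rf2 / rf3 / R31
PCSrc1 = j / ~j
PCSrc2 = PCR / RInd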
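
Since the hardwired control table is a pure function of the opcode and the zero flag, it can be sketched as a lookup table. In the sketch below, don't-care entries are modelled as `None`, and entries that are not legible in the table (notably the MemWrite and RegWrite columns and the ALU / Imm / PC / PCR values) are filled in from the control equations, so treat them as assumptions. The names `TABLE`, `FIELDS` and `control` are introduced here for illustration.

```python
X = None  # don't care

FIELDS = ("ExtSel", "BSrc", "OpSel", "MemWrite", "RegWrite",
          "WBSrc", "RegDst", "PCSrc2")

# One row per opcode; PCSrc1 is computed separately because it depends on zero.
TABLE = {
    "ALU":   (X,        "Reg", "Func", False, True,  "ALU", "rf3", X),
    "ALUu":  (X,        "Reg", "Func", False, True,  "ALU", "rf3", X),
    "ALUi":  ("sExt16", "Imm", "Op",   False, True,  "ALU", "rf2", X),
    "ALUiu": ("uExt16", "Imm", "Op",   False, True,  "ALU", "rf2", X),
    "LW":    ("sExt16", "Imm", "+",    False, True,  "Mem", "rf2", X),
    "SW":    ("sExt16", "Imm", "+",    True,  False, X,     X,     X),
    "BEQZ":  ("sExt16", X,     "0?",   False, False, X,     X,     "PCR"),
    "BNEZ":  ("sExt16", X,     "0?",   False, False, X,     X,     "PCR"),
    "J":     ("sExt26", X,     X,      False, False, X,     X,     "PCR"),
    "JAL":   ("sExt26", X,     X,      False, True,  "PC",  "R31", "PCR"),
    "JR":    (X,        X,     X,      False, False, X,     X,     "RInd"),
    "JALR":  (X,        X,     X,      False, True,  "PC",  "R31", "RInd"),
}

JUMPS = {"J", "JAL", "JR", "JALR"}


def control(opcode, zero=False):
    """Return the control signals for one instruction as a dict."""
    signals = dict(zip(FIELDS, TABLE[opcode]))
    taken = (opcode in JUMPS
             or (opcode == "BEQZ" and zero)
             or (opcode == "BNEZ" and not zero))
    signals["PCSrc1"] = "j" if taken else "~j"
    if not taken:
        signals["PCSrc2"] = X  # target selection is irrelevant when not jumping
    return signals


print(control("LW"))
print(control("BEQZ", zero=True))
```

A table like this maps directly onto a ROM or PLA; the only state-free input besides the opcode is the zero flag.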
6.823, L6--9 Hardwired Control Equations: Harvard DLX
ExtSel = Case opcode
    ALUi, LW, SW, BEQZ, BNEZ -> sExt16
    ALUiu -> uExt16
    J, JAL -> sExt26
BSrc = Case opcode
    ALU, ALUu -> Reg
    ALUi, ALUiu, LW, SW -> Imm
OpSel = Case opcode
    ALU, ALUu -> Func
    ALUi, ALUiu -> Op
    LW, SW -> +
    BEQZ, BNEZ -> 0?
MemWrite = SW
WBSrc = Case opcode
    ALU, ALUu, ALUi, ALUiu -> ALU
    LW -> Mem
    JAL, JALR -> PC
RegDst = Case opcode
    ALU, ALUu -> rf3
    ALUi, ALUiu, LW -> rf2
    JAL, JALR -> R31
RegWrite = ALU + ALUu + ALUi + ALUiu + LW + JAL + JALR
PCSrc1 = J + JAL + JR + JALR + BEQZ.zero + BNEZ.!zero
PCSrc2 = Case opcode
    BEQZ, BNEZ, J, JAL -> PCR
    JR, JALR -> RInd

6.823, L6--10 Datapath & Control: Harvard DLX
[Figure: Harvard DLX datapath with the hardwired controller attached. The controller takes OpCode (inst<31:26><5:0>) and zero? and drives PCSrc1, PCSrc2, RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel and BSrc. Instruction fields: inst<25:21>, inst<20:16>, inst<15:11>, inst<25:0>.]
6.823, L6--11 Harvard vs. Princeton Microarchitecture
[Figure: Harvard: separate Instruction Memory and Data Memory. Princeton: one memory holding both instructions and data.]

6.823, L6--12 Multi-cycle Execution: Princeton Architecture
Instruction Execution
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation
5. write back

Maybe steps 2 and 3 can be combined, and steps 4 and 5 can be combined, but not steps 1 and 4, because both use the single memory.
6.823, L6--13 Princeton Microarchitecture
[Figure: Princeton DLX datapath with a single memory. Control inputs: OpCode, zero?. Control signals: PCen, PCSrc1, PCSrc2, RegWrite, MemWrite, WBSrc, IRen, RegDst, ExtSel, OpSel, BSrc, AddrSrc.]

6.823, L6--14 Two-State Controller: Princeton Architecture
fetch phase: instruction fetch
    AddrSrc=PC, IRen=on, PCen=off, Wen=off
execute phase: instruction decode, register fetch, execute, (memory access), (write back)
    AddrSrc=ALU, IRen=off, PCen=on, Wen=on
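
The two-state controller above can be sketched as a tiny state machine that alternates between the fetch and execute phases. The signal names AddrSrc, IRen, PCen and Wen and their per-state values are reconstructed from the slides; the function names and the string encoding of the states are assumptions made for this sketch.

```python
FETCH, EXECUTE = "I-Fetch", "Execute"


def outputs(state):
    """Control outputs that depend only on the controller state."""
    return {
        "AddrSrc": "PC" if state == FETCH else "ALU",  # memory address source
        "IRen": state == FETCH,    # latch the fetched instruction
        "PCen": state == EXECUTE,  # update the PC only after executing
        "Wen": state == EXECUTE,   # allow register/memory writes
    }


def next_state(state):
    """1-bit toggle: fetch -> execute -> fetch -> ..."""
    return EXECUTE if state == FETCH else FETCH


def run(cycles, state=FETCH):
    """Return the (state, outputs) trace for a number of clock cycles."""
    trace = []
    for _ in range(cycles):
        trace.append((state, outputs(state)))
        state = next_state(state)
    return trace


for state, signals in run(4):
    print(state, signals)
```

Every instruction takes exactly two cycles in this scheme, which is where CPI = 2 for the simple Princeton controller comes from.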
6.823, L6--15 Hardwired Controller: Princeton Architecture
[Figure: the old combinational logic (Harvard) takes op code and zero? and produces ExtSel, BSrc, OpSel, WBSrc, RegDest, PCSrc1, PCSrc2, MemWrite and RegWrite. A 1-bit toggle flip-flop S (I-fetch / Execute) feeds new combinational logic that produces Wen, PCen, IRen and AddrSrc.]

6.823, L6--16 Hardwired Control Equations: Harvard and Princeton DLX
ExtSel = Case opcode
    ALUi, LW, SW, BEQZ, BNEZ -> sExt16
    ALUiu -> uExt16
    J, JAL -> sExt26
BSrc = Case opcode
    ALU, ALUu -> Reg
    ALUi, ALUiu, LW, SW -> Imm
OpSel = Case opcode
    ALU, ALUu -> Func
    ALUi, ALUiu -> Op
    LW, SW -> +
    BEQZ, BNEZ -> 0?
MemWrite = Case opcode
    SW -> on
    ... -> off
WBSrc = Case opcode
    ALU, ALUu, ALUi, ALUiu -> ALU
    LW -> Mem
    JAL, JALR -> PC
RegDst = Case opcode
    ALU, ALUu -> rf3
    ALUi, ALUiu, LW -> rf2
    JAL, JALR -> R31
RegWrite = Case opcode
    ALU, ALUu, ALUi, ALUiu, LW, JAL, JALR -> on
    ... -> off
6.823, L6--17 Hardwired Control Equations: Harvard and Princeton DLX
PCSrc1 = Case opcode
    J, JAL, JR, JALR -> jump
    BEQZ.zero -> jump
    BNEZ.!zero -> jump
    ... -> don't jump
PCSrc2 = Case opcode
    BEQZ, BNEZ, J, JAL -> PCR
    JR, JALR -> RInd

Princeton Controller
PCen = (S == Execute)
Wen = (S == Execute)
IRen = (S == I-Fetch)
AddrSrc = Case S
    I-fetch -> PC
    Execute -> ALU

6.823, L6--18 Clock Period
t_{C-Princeton} > max{ t_M, t_RF + t_ALU + t_M + t_WB }
so
t_{C-Princeton} > t_RF + t_ALU + t_M + t_WB

while in the hardwired Harvard architecture
t_{C-Harvard} > t_M + t_RF + t_ALU + t_M + t_WB

Which will execute instructions faster?
6.823, L6--19 Clock Rate vs CPI
Suppose t_M >> t_RF + t_ALU + t_WB. Then
t_{C-Princeton} ≈ 0.5 * t_{C-Harvard}
CPI_Princeton = 2
CPI_Harvard = 1
so there is no difference in performance.

However, it is possible to design a controller for the Princeton architecture with CPI < 2. How?

CPI = Clock cycles Per Instruction

6.823, L6--20 Princeton Microarchitecture (redrawn)
[Figure: Princeton microarchitecture redrawn as a fetch phase followed by an execute phase, with the PC incremented by 0x4. The memory drawn in both phases is the same memory (mux not shown).]
Only one of the phases is active in any cycle, so a lot of the datapath is not in use at any given time.
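
The clock-rate versus CPI comparison above can be checked numerically: time per instruction is CPI times the clock period. The delay values below are hypothetical and chosen so that the memory delay dominates; the variable and function names are introduced here for illustration.

```python
def time_per_instruction(cpi, t_clock):
    """Average time per instruction = CPI * clock period."""
    return cpi * t_clock


# Hypothetical delays in nanoseconds, with t_M much larger than the rest.
t_m, t_rf, t_alu, t_wb = 10.0, 0.5, 0.5, 0.5

# Harvard single-cycle: instruction memory and data memory in one period.
t_c_harvard = t_m + t_rf + t_alu + t_m + t_wb

# Princeton two-state: the longer of the fetch and execute phases.
t_c_princeton = max(t_m, t_rf + t_alu + t_m + t_wb)

harvard = time_per_instruction(1, t_c_harvard)
princeton = time_per_instruction(2, t_c_princeton)

print(f"Harvard:   {harvard} ns per instruction")
print(f"Princeton: {princeton} ns per instruction")
print(f"clock ratio: {t_c_princeton / t_c_harvard:.2f}")
```

With these assumed numbers the Princeton clock is a little over half the Harvard clock, so two Princeton cycles cost about the same as one Harvard cycle; the gap closes further as the memory delay grows relative to the other terms.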
6.823, L6--21 Princeton Microarchitecture
Can an instruction be issued in every cycle?
[Figure: redrawn Princeton microarchitecture with fetch and execute phases; PC incremented by 0x4.]
The next instruction can be fetched in the execute phase of the current instruction unless IR contains a Load or Store instruction, in which case we may need to stall the instruction fetch. How?

6.823, L6--22 Stalling the Instruction-Fetch: Princeton Microarchitecture
[Figure: redrawn Princeton microarchitecture with a stall? signal; PC incremented by 0x4.]
When a stall condition is indicated, don't enable the PC and set the Mem Addr mux to ALU, ... what about IR?
6.823, L6--23 Injecting a NOP
[Figure: redrawn Princeton microarchitecture with a stall? signal and a nop input to IR; PC incremented by 0x4.]
When a stall condition is indicated, delay the instruction fetch by stalling the PC and insert a NOP in IR on the next cycle.
Does this affect branch target calculations?

6.823, L6--24 Pipelined Princeton Architecture
[Figure: pipelined Princeton microarchitecture with fetch and execute phases and a stall? signal; PC incremented by 0x4.]
If we can implement the control properly:
Clock: t_{C-Princeton} > t_RF + t_ALU + t_M
CPI: (1 - f) + 2f cycles per instruction, where f is the fraction of instructions that cause a stall
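
The CPI expression above, (1 - f) + 2f, can be evaluated for a few stall fractions to see how quickly loads and stores erode the benefit of overlapping fetch and execute. The function name and the sample values of f are assumptions chosen for illustration.

```python
def pipelined_princeton_cpi(f):
    """CPI when a fraction f of instructions stall the fetch for one cycle.

    Non-stalling instructions take 1 cycle, stalling ones take 2.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    return (1 - f) + 2 * f


for f in (0.0, 0.25, 0.5, 1.0):
    print(f"f = {f:.2f}  ->  CPI = {pipelined_princeton_cpi(f):.2f}")
```

The expression simplifies to 1 + f, so CPI ranges from 1 (no loads or stores) up to 2 (every instruction stalls), the latter matching the simple two-state controller.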
6.823, L6--25 Hardwired Controller: Princeton Architecture - redrawn
[Figure: a single block of combinational logic takes op code, zero? and the state S, and produces ExtSel, BSrc, OpSel, WBSrc, RegDest, PCSrc1, PCSrc2, MemWrite, RegWrite, PCen, IRen, AddrSrc and the next state.]

6.823, L6--26 Pipelined Harvard Datapath
[Figure: Harvard datapath divided into stages: fetch (Inst. Memory), decode & Reg-fetch, execute, memory (Data Memory), write-back.]
The clock period can be reduced by dividing the execution of an instruction into multiple cycles:
t_C > max{ t_IM, t_RF, t_ALU, t_DM, t_RW } = t_DM (probably)

However, CPI will increase unless instructions are pipelined.
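
The multi-cycle clock bound above is the maximum of the stage delays rather than their sum. A short sketch, using hypothetical stage delays that are not from the lecture, shows which stage sets the clock and how the bound compares with the single-cycle one; the names below are introduced for illustration.

```python
# Hypothetical stage delays in nanoseconds (assumed for illustration).
stage_ns = {"IM": 2.0, "RF": 1.0, "ALU": 2.0, "DM": 3.0, "RW": 1.0}


def multicycle_clock_bound(stages):
    """Lower bound on t_C when each stage gets its own cycle."""
    return max(stages.values())


def bottleneck(stages):
    """Name of the slowest stage, which determines the clock period."""
    return max(stages, key=stages.get)


single_cycle = sum(stage_ns.values())
multi_cycle = multicycle_clock_bound(stage_ns)

print(f"single-cycle bound: {single_cycle} ns")
print(f"multi-cycle bound:  {multi_cycle} ns (set by {bottleneck(stage_ns)})")
```

The shorter clock only pays off if several instructions are in flight at once; executed one at a time, an instruction would need five of these shorter cycles.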
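6.823, L6--27 Datapath for ALU Instructions
Datapath with a 3-ported GPR:
rf3 <- (rf1) func (rf2)
[Figure: register file addressed by rf1, rf2 and rf3 feeding the ALU directly.]

Datapath with a single-ported GPR and a shared bus:
[Figure: registers A and B feed the ALU; the register file, addressed by rf1 / rf2 / rf3 through RegSel, connects to a shared Bus. Signals shown: lda, ldb, RegSel, enALU, RegWrt, enReg.]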