Implementing the Controller. Harvard-Style Datapath for DLX

6.823, L6--1
Implementing the Controller
Laboratory for Computer Science, M.I.T.
http://www.csg.lcs.mit.edu/6.823

6.823, L6--2
Harvard-Style Datapath for DLX

[Figure: Harvard-style DLX datapath with separate instruction and data memories. Control signals shown: PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc. The control block takes OpCode and the zero flag z as inputs.]

6.823, L6--3
Single-Cycle Hardwired Control: Harvard architecture

We will assume the clock period is sufficiently long for all of the following steps to be completed:

1. instruction fetch
2. decode and register fetch
3. ALU operation
4. data fetch if required
5. register write-back setup time

$t_C > t_{IFetch} + t_{RFetch} + t_{ALU} + t_{DMem} + t_{RWB}$

At the rising edge of the following clock, the register file and the memory are updated.

6.823, L6--4
Hardwired Control is pure Combinational Logic

[Figure: a combinational-logic block with the op code as input and the control signals ExtSel, BSrc, OpSel, MemWrite, WBSrc, RegDst, RegWrite, PCSrc1 and PCSrc2 as outputs.]

6.823, L6--5
ALU Control & Immediate Extension

[Figure: Inst<5:0> (Func) and Inst<31:26> (Opcode) feed a decode map; OpSel selects the ALU operation from (Func, Op, +, 0?). ExtSel selects the immediate extension from (sExt16, uExt16, sExt26, High16).]

6.823, L6--6
Hardwired Control Table (blank worksheet)

Columns: Opcode, ExtSel, BSrc, OpSel, MemWrite, RegWrite, WBSrc, RegDst, PCSrc1, PCSrc2
Rows: ALU, ALUu, ALUi, ALUui, LW, SW, BEQZ (taken), BEQZ (~taken), J, JAL, JR, JALR

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rf2 / rf3 / R31
PCSrc1 = j / ~j
PCSrc2 = PCR / RInd

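6.823, L6--7
Hardwired Control worksheet

[Figure: Harvard DLX datapath annotated with the choices for each control signal: PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), RegWrite, MemWrite, WBSrc (ALU / Mem / PC), RegDst (rf2 / rf3 / R31), ExtSel (sExt16 / uExt16 / sExt26 / High16), OpSel (Func / Op / + / 0?), BSrc (Reg / Imm). Instruction fields used: inst<25:21>, inst<20:16>, inst<15:11>, inst<25:0>, inst<31:26> and inst<5:0>. The control block takes OpCode and z as inputs.]

6.823, L6--8
Hardwired Control Table: Harvard DLX

Opcode     ExtSel  BSrc  OpSel  MemWrite  RegWrite  WBSrc  RegDst  PCSrc1  PCSrc2
ALU        *       Reg   Func   no        yes       ALU    rf3     ~j      *
ALUu       *       Reg   Func   no        yes       ALU    rf3     ~j      *
ALUi       sExt16  Imm   Op     no        yes       ALU    rf2     ~j      *
ALUui      uExt16  Imm   Op     no        yes       ALU    rf2     ~j      *
LW         sExt16  Imm   +      no        yes       Mem    rf2     ~j      *
SW         sExt16  Imm   +      yes       no        *      *       ~j      *
BEQZ z=1   sExt16  *     0?     no        no        *      *       j       PCR
BEQZ z=0   sExt16  *     0?     no        no        *      *       ~j      *
J          sExt26  *     *      no        no        *      *       j       PCR
JAL        sExt26  *     *      no        yes       PC     R31     j       PCR
JR         *       *     *      no        no        *      *       j       RInd
JALR       *       *     *      no        yes       PC     R31     j       RInd

(* = don't care)

BSrc = Reg / Imm
WBSrc = ALU / Mem / PC
RegDst = rf2 / rf3 / R31
PCSrc1 = j / ~j
PCSrc2 = PCR / RInd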

6.823, L6--9
Hardwired Control Equations: Harvard DLX

ExtSel = Case opcode
    ALUi, LW, SW, BEQZ, BNEZ -> sExt16
    ALUui -> uExt16
    J, JAL -> sExt26

BSrc = Case opcode
    ALU -> Reg
    ALUi, LW, SW -> Imm

OpSel = Case opcode
    ALU -> Func
    ALUi -> Op
    LW, SW -> +
    BEQZ, BNEZ -> 0?

MemWrite = SW

WBSrc = Case opcode
    ALU, ALUi -> ALU
    LW -> Mem
    JAL, JALR -> PC

RegDst = Case opcode
    ALU -> rf3
    ALUi, LW -> rf2
    JAL, JALR -> R31

RegWrite = ALU + ALUi + LW + JAL + JALR

PCSrc1 = J + JAL + JR + JALR + BEQZ.z + BNEZ.!z

PCSrc2 = Case opcode
    BEQZ, BNEZ, J, JAL -> PCR
    JR, JALR -> RInd

6.823, L6--10
Datapath & Control: Harvard DLX

[Figure: the Harvard DLX datapath with the control block attached. Inputs to control: OpCode and z. Outputs: PCSrc1, PCSrc2, RegWrite, MemWrite, WBSrc, RegDst, ExtSel, OpSel, BSrc.]
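The Harvard DLX control equations above can be read as a pure function from the opcode class and the zero flag to the control signals. The following Python sketch is one way to express that, not the course's implementation. It assumes the opcode class names of the control table (ALU, ALUu, ALUi, ALUui, ...) and the expanded signal names ExtSel, PCSrc1 and PCSrc2; the helper `pick` and the function name `harvard_control` are hypothetical, and `None` stands for a don't-care entry.

```python
def pick(opcode, cases):
    """Return the value of the first case whose opcode set contains opcode."""
    for opcodes, value in cases:
        if opcode in opcodes:
            return value
    return None  # don't care


def harvard_control(opcode, z=False):
    """Combinational control: the outputs depend only on opcode and z."""
    reg_ops = {"ALU", "ALUu"}      # register-register ALU classes
    imm_ops = {"ALUi", "ALUui"}    # register-immediate ALU classes
    links = {"JAL", "JALR"}        # jumps that write the return address
    # PCSrc1 = J + JAL + JR + JALR + BEQZ.z + BNEZ.!z
    jump = (opcode in {"J", "JAL", "JR", "JALR"}
            or (opcode == "BEQZ" and z)
            or (opcode == "BNEZ" and not z))
    return {
        "ExtSel": pick(opcode, [
            ({"ALUi", "LW", "SW", "BEQZ", "BNEZ"}, "sExt16"),
            ({"ALUui"}, "uExt16"),
            ({"J", "JAL"}, "sExt26"),
        ]),
        "BSrc": pick(opcode, [(reg_ops, "Reg"),
                              (imm_ops | {"LW", "SW"}, "Imm")]),
        "OpSel": pick(opcode, [(reg_ops, "Func"), (imm_ops, "Op"),
                               ({"LW", "SW"}, "+"),
                               ({"BEQZ", "BNEZ"}, "0?")]),
        "MemWrite": opcode == "SW",
        "RegWrite": opcode in reg_ops | imm_ops | {"LW"} | links,
        "WBSrc": pick(opcode, [(reg_ops | imm_ops, "ALU"),
                               ({"LW"}, "Mem"), (links, "PC")]),
        "RegDst": pick(opcode, [(reg_ops, "rf3"),
                                (imm_ops | {"LW"}, "rf2"),
                                (links, "R31")]),
        "PCSrc1": "j" if jump else "~j",
        # For an untaken branch PCSrc2 is a don't care, so PCR is harmless.
        "PCSrc2": pick(opcode, [({"BEQZ", "BNEZ", "J", "JAL"}, "PCR"),
                                ({"JR", "JALR"}, "RInd")]),
    }


if __name__ == "__main__":
    for op in ("ALU", "LW", "SW", "JALR"):
        print(op, harvard_control(op))
```

Because the function keeps no state between calls, it mirrors the point that hardwired control for the single-cycle machine is pure combinational logic.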

6.823, L6--11
Harvard vs. Princeton Microarchitecture

[Figure: the Harvard microarchitecture, with separate instruction and data memories, next to the Princeton microarchitecture, which uses a single memory for both instructions and data.]

6.823, L6--12
Multi-cycle Execution: Princeton Architecture

Instruction Execution

1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation
5. write back

Maybe steps 2 and 3 can be combined, and steps 4 and 5 can be combined, but not steps 1 and 4 because of ...

6.823, L6--13
Princeton Microarchitecture

[Figure: Princeton DLX datapath with a single memory. Control signals shown: PCen, PCSrc1, PCSrc2, RegWrite, MemWrite, WBSrc, IRen, RegDst, ExtSel, OpSel, BSrc, AddrSrc. The control block takes OpCode and z as inputs.]

6.823, L6--14
Two-State Controller: Princeton Architecture

State 1: instruction fetch
    AddrSrc = PC, IRen = on, PCen = off, Wen = off

State 2: instruction decode, register fetch, ALU execute, (memory access), (write back)
    AddrSrc = ALU, IRen = off, PCen = on, Wen = on
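The two-state controller above can be sketched as a tiny state machine in Python. This is an illustrative sketch only: the signal names PCen, IRen and AddrSrc are assumed expansions of the names in the slide, the values follow the two state annotations, and the function names `princeton_signals`, `next_state` and `run` are hypothetical.

```python
FETCH, EXECUTE = "I-fetch", "Execute"


def princeton_signals(state):
    """Phase-dependent signals as a function of the 1-bit state S."""
    return {
        "PCen": state == EXECUTE,   # PC advances only in the execute state
        "Wen": state == EXECUTE,    # memory writes only in the execute state
        "IRen": state == FETCH,     # IR is loaded only in the fetch state
        "AddrSrc": "ALU" if state == EXECUTE else "PC",
    }


def next_state(state):
    """1-bit toggle flip-flop: alternate between the two states."""
    return EXECUTE if state == FETCH else FETCH


def run(cycles, state=FETCH):
    """Return the (state, signals) pair seen in each clock cycle."""
    trace = []
    for _ in range(cycles):
        trace.append((state, princeton_signals(state)))
        state = next_state(state)
    return trace


if __name__ == "__main__":
    for state, signals in run(4):
        print(state, signals)
```

Each instruction occupies one fetch cycle followed by one execute cycle, which is why this controller needs two clock cycles per instruction.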

6.823, L6--15
Hardwired Controller: Princeton Architecture

[Figure: the op code feeds the old combinational logic (Harvard), which produces ExtSel, BSrc, OpSel, WBSrc, RegDst, PCSrc1, PCSrc2, MemWrite and RegWrite. A 1-bit toggle flip-flop holds the state S (I-fetch / Execute). New combinational logic produces Wen, PCen, IRen and AddrSrc.]

6.823, L6--16
Hardwired Control Equations: Harvard and Princeton DLX

ExtSel = Case opcode
    ALUi, LW, SW, BEQZ, BNEZ -> sExt16
    ALUui -> uExt16
    J, JAL -> sExt26

BSrc = Case opcode
    ALU -> Reg
    ALUi, LW, SW -> Imm

OpSel = Case opcode
    ALU -> Func
    ALUi -> Op
    LW, SW -> +
    BEQZ, BNEZ -> 0?

MemWrite = Case opcode
    SW -> on
    ... -> off

WBSrc = Case opcode
    ALU, ALUi -> ALU
    LW -> Mem
    JAL, JALR -> PC

RegDst = Case opcode
    ALU -> rf3
    ALUi, LW -> rf2
    JAL, JALR -> R31

RegWrite = Case opcode
    ALU, ALUi, LW, JAL, JALR -> on
    ... -> off

6.823, L6--17
Hardwired Control Equations: Harvard and Princeton DLX

PCSrc1 = Case opcode
    J, JAL, JR, JALR -> jump
    BEQZ.z -> jump
    BNEZ.!z -> jump
    ... -> don't jump

PCSrc2 = Case opcode
    BEQZ, BNEZ, J, JAL -> PCR
    JR, JALR -> RInd

Princeton Controller

PCen = (S == Execute)
Wen = (S == Execute)
IRen = (S == I-Fetch)
AddrSrc = Case S
    Execute -> ALU
    I-fetch -> PC

6.823, L6--18
Clock Period

$t_{C\text{-Princeton}} > \max\{t_M,\ t_{RF} + t_{ALU} + t_M + t_{WB}\}$

$t_{C\text{-Princeton}} > t_{RF} + t_{ALU} + t_M + t_{WB}$

while in the hardwired Harvard architecture

$t_{C\text{-Harvard}} > t_M + t_{RF} + t_{ALU} + t_M + t_{WB}$

Which will execute instructions faster?
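One way to explore the question is to plug numbers into the two clock-period bounds and multiply by the cycles each design spends per instruction. The Python sketch below does that under stated assumptions: the delay values are hypothetical (arbitrary time units), the Harvard design is taken to need one cycle per instruction, and the two-state Princeton controller is taken to need two (one fetch cycle plus one execute cycle). Function names are illustrative.

```python
def t_c_harvard(t_m, t_rf, t_alu, t_wb):
    """Single-cycle Harvard bound: fetch, register read, ALU, data memory, write-back."""
    return t_m + t_rf + t_alu + t_m + t_wb


def t_c_princeton(t_m, t_rf, t_alu, t_wb):
    """Two-state Princeton bound: the longer of the fetch and execute states."""
    return max(t_m, t_rf + t_alu + t_m + t_wb)


def time_per_instruction(cycles_per_instruction, clock_period):
    """Average time per instruction = cycles per instruction x clock period."""
    return cycles_per_instruction * clock_period


# Hypothetical delays, chosen only for illustration.
delays = dict(t_m=10.0, t_rf=1.0, t_alu=2.0, t_wb=1.0)

harvard_time = time_per_instruction(1, t_c_harvard(**delays))
princeton_time = time_per_instruction(2, t_c_princeton(**delays))

if __name__ == "__main__":
    print("Harvard:  ", harvard_time)
    print("Princeton:", princeton_time)
```

With these particular numbers the Harvard design comes out slightly ahead; making the memory delay larger relative to the other delays brings the two results closer together.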

6.823, L6--19
Clock Rate vs CPI

Suppose $t_M \gg t_{RF} + t_{ALU} + t_{WB}$. Then

$t_{C\text{-Princeton}} \approx 0.5\, t_{C\text{-Harvard}}$

$CPI_{Princeton} = 2$
$CPI_{Harvard} = 1$

No difference in performance.

However, it is possible to design a controller for the Princeton architecture with CPI < 2. How?

CPI = Clock cycles Per Instruction

6.823, L6--20
Princeton Microarchitecture (redrawn)

[Figure: the Princeton datapath redrawn as a fetch phase followed by an execute phase; the memory drawn in each phase is the same one (mux not shown).]

Only one of the phases is active in any cycle, so a lot of the datapath is not in use at any given time.

6.823, L6--21
Princeton Microarchitecture: Can an instruction be issued in every cycle?

[Figure: the redrawn Princeton datapath with its fetch phase and execute phase.]

The next instruction can be fetched in the execute phase of the current instruction unless the IR contains a Load or Store instruction. In that case we may need to stall the instruction fetch. How?

6.823, L6--22
Stalling the Instruction-Fetch: Princeton Microarchitecture

[Figure: the redrawn Princeton datapath with a "stall?" signal added.]

When a stall condition is indicated, don't enable the PC and set the memory address mux to ALU. What about the IR?

6.823, L6--23
Injecting a NOP

[Figure: the redrawn Princeton datapath with the "stall?" signal and a nop input at the IR.]

When a stall condition is indicated, delay the instruction fetch by stalling the PC, and insert a NOP in the IR on the next cycle.

Does this affect branch target calculations?

6.823, L6--24
Pipelined Princeton Architecture

[Figure: the pipelined Princeton datapath with fetch phase, execute phase and "stall?" signal.]

If we can implement the control properly:

Clock: $t_{C\text{-Princeton}} > t_{RF} + t_{ALU} + t_M$

CPI: $(1 - f) + 2f$ cycles per instruction, where $f$ is the fraction of instructions that cause a stall.
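The CPI expression for the pipelined Princeton design can be checked with a few values of f. The Python sketch below simply evaluates (1 - f) + 2f; the function name and the sample fractions are hypothetical and serve only as an illustration.

```python
def pipelined_princeton_cpi(f):
    """Average cycles per instruction when a fraction f of instructions stall.

    Non-stalling instructions take 1 cycle and stalling ones take 2,
    so CPI = (1 - f) * 1 + f * 2.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    return (1 - f) + 2 * f


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 1.0):   # illustrative stall fractions
        print(f, pipelined_princeton_cpi(f))
```

The expression simplifies to 1 + f, so the CPI ranges from 1 when no instruction stalls to 2 when every instruction does, which is the CPI of the two-state controller.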

6.823, L6--25
Hardwired Controller: Princeton Architecture (redrawn)

[Figure: a single combinational-logic block takes the op code and the current state S as inputs and produces ExtSel, BSrc, OpSel, WBSrc, RegDst, PCSrc1, PCSrc2, MemWrite, RegWrite, PCen, IRen, AddrSrc and the next state.]

6.823, L6--26
Pipelined Harvard Datapath

[Figure: Harvard datapath divided into five phases: fetch, decode & Reg-fetch, execute, memory and write-back, with separate instruction and data memories.]

The clock period can be reduced by dividing the execution of an instruction into multiple cycles:

$t_C > \max\{t_{IM},\ t_{RF},\ t_{ALU},\ t_{DM},\ t_{RW}\} = t_{DM}$ (probably)

However, CPI will increase unless instructions are pipelined.
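The clock-period bound for the multi-cycle Harvard datapath says that the slowest phase sets the clock. A minimal Python sketch of that rule follows; the delay values are hypothetical (arbitrary time units) and were chosen so that the data memory is the slowest phase, matching the slide's "probably".

```python
def pipelined_clock_period(stage_delays):
    """Return (slowest stage name, its delay): the lower bound on the clock period."""
    slowest = max(stage_delays, key=stage_delays.get)
    return slowest, stage_delays[slowest]


# Hypothetical per-phase delays, for illustration only.
stage_delays = {
    "t_IM": 8.0,    # instruction memory
    "t_RF": 1.0,    # register fetch
    "t_ALU": 2.0,   # ALU operation
    "t_DM": 10.0,   # data memory
    "t_RW": 1.0,    # register write-back
}

if __name__ == "__main__":
    stage, bound = pipelined_clock_period(stage_delays)
    print("clock period must exceed", bound, "set by", stage)
```

With these numbers the single-cycle bound would be the sum of the delays (22.0), while the multi-cycle bound is only 10.0, at the cost of several cycles per instruction unless the phases are overlapped.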

6.823, L6--27
Datapath for ALU Instructions

[Figure: datapath with a 3-ported GPR, addressed by rf1, rf2 and rf3, computing rf3 <- (rf1) func (rf2).]

[Figure: datapath with a single-ported GPR and a shared bus. Labels shown: lda, ldb, RegSel (rf1, rf2, rf3), registers A and B, data, RegWrt, enReg, Bus.]