Design of Digital Circuits, Lecture 14: Microprogramming. Prof. Onur Mutlu, ETH Zurich, Spring 2017, 7 April 2017



Agenda for Today & Next Few Lectures
- Single-cycle Microarchitectures
- Multi-cycle and Microprogrammed Microarchitectures
- Pipelining
- Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, ...
- Out-of-Order Execution
- Issues in OoO Execution: Load-Store Handling, ...

Readings for This Week
- P&P, Chapter 4
  - Microarchitecture
- P&P, Revised Appendix C
  - Microarchitecture of the LC-3b
  - Appendix A (LC-3b ISA) will be useful in following this
- H&H, Chapter 7.4 (keep reading)
- Optional
  - Maurice Wilkes, "The Best Way to Design an Automatic Calculating Machine," Manchester Univ. Computer Inaugural Conf., 1951.

Multi-Cycle Microarchitectures

Remember: Multi-Cycle Microarchitecture
AS = Architectural (programmer visible) state at the beginning of an instruction
Step 1: Process part of instruction in one clock cycle
Step 2: Process part of instruction in the next clock cycle
AS' = Architectural (programmer visible) state at the end of a clock cycle

One Example Multi-Cycle Microarchitecture

Remember: Single-Cycle MIPS Processor
[Figure: single-cycle MIPS datapath and control unit: PC, instruction memory, register file, ALU, data memory, sign extend, and the PCPlus4, PCBranch, and PCJump paths. Control signals: Jump, MemtoReg, MemWrite, Branch, ALUControl[2:0], ALUSrc, RegDst, RegWrite, PCSrc.]

Remember: Complete Multi-cycle Processor
[Figure: multi-cycle MIPS datapath with a shared instruction/data memory, the registers Instr, Data, A, B, and ALUOut, and a control unit driving IorD, MemWrite, IRWrite, PCWrite, Branch, PCSrc, ALUControl[2:0], ALUSrcB[1:0], ALUSrcA, RegWrite, RegDst, and MemtoReg. PCEn enables the PC register.]

Control Unit
[Figure: the control unit has two parts. The Main Controller (FSM) takes Opcode[5:0] and produces the multiplexer selects (MemtoReg, RegDst, IorD, PCSrc, ALUSrcB[1:0], ALUSrcA), the register enables (IRWrite, MemWrite, PCWrite, Branch, RegWrite), and ALUOp[1:0]. The ALU Decoder takes ALUOp[1:0] and Funct[5:0] and produces ALUControl[2:0].]

Main Controller FSM: Fetch
- S0: Fetch (entered on Reset): IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
[Figure: multi-cycle datapath annotated for the fetch state; RegDst and MemtoReg are don't-cares (X).]

Main Controller FSM: Decode
- S0: Fetch -> S1: Decode
[Figure: multi-cycle datapath annotated for the decode state; unused selects are don't-cares (X).]

Main Controller FSM: Address Calculation
- S1: Decode -> S2: MemAdr if Op = LW or Op = SW
- S2: MemAdr: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

Main Controller FSM: lw
- S2: MemAdr -> S3: MemRead if Op = LW
- S3: MemRead: IorD = 1
- S3: MemRead -> S4: Mem Writeback: RegDst = 0, MemtoReg = 1, RegWrite

Main Controller FSM: sw
- S2: MemAdr -> S5: MemWrite if Op = SW
- S5: MemWrite: IorD = 1, MemWrite

Main Controller FSM: R-Type
- S1: Decode -> S6: Execute if Op = R-type
- S6: Execute: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- S6: Execute -> S7: ALU Writeback: RegDst = 1, MemtoReg = 0, RegWrite

Main Controller FSM: beq
- S1: Decode now also drives the ALU: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- S1: Decode -> S8: Branch if Op = BEQ
- S8: Branch: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Complete Multi-cycle Controller FSM
[Figure: FSM with states S0 through S8. S4, S5, S7, and S8 each return to S0: Fetch.]

Main Controller FSM: addi
- S1: Decode -> S9: ADDI Execute if Op = ADDI
- S9: ADDI Execute: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- S9: ADDI Execute -> S10: ADDI Writeback: RegDst = 0, MemtoReg = 0, RegWrite

Extended Functionality: j
[Figure: multi-cycle datapath extended for jump. PCJump is formed from Instr[25:0] shifted left by 2 and PC[31:28], and feeds a third input of the PC source multiplexer, so PCSrc becomes PCSrc[1:0].]

Control FSM: j
- S1: Decode -> S11: Jump if Op = J
- S11: Jump: PCSrc = 10, PCWrite
- With the 2-bit PCSrc, S0: Fetch uses PCSrc = 00 and S8: Branch uses PCSrc = 01
- S10 and S11 return to S0: Fetch
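The state transitions of the controller FSM can be sketched as a lookup table. The following is a minimal Python sketch, not the lecture's implementation: the state names follow the slides (S0 through S11), while the dictionary layout and the helper names `next_state` and `states_for` are assumptions made for illustration. Walking the table from Fetch back to Fetch gives the number of cycles each instruction class takes.

```python
# Next-state table for the multi-cycle MIPS controller FSM.
# A string value is an unconditional successor; a dict value selects the
# successor by opcode class (hypothetical encoding, for illustration only).
NEXT_STATE = {
    "S0_Fetch": "S1_Decode",
    "S1_Decode": {
        "lw": "S2_MemAdr", "sw": "S2_MemAdr", "R-type": "S6_Execute",
        "beq": "S8_Branch", "addi": "S9_ADDIExecute", "j": "S11_Jump",
    },
    "S2_MemAdr": {"lw": "S3_MemRead", "sw": "S5_MemWrite"},
    "S3_MemRead": "S4_MemWriteback",
    "S4_MemWriteback": "S0_Fetch",
    "S5_MemWrite": "S0_Fetch",
    "S6_Execute": "S7_ALUWriteback",
    "S7_ALUWriteback": "S0_Fetch",
    "S8_Branch": "S0_Fetch",
    "S9_ADDIExecute": "S10_ADDIWriteback",
    "S10_ADDIWriteback": "S0_Fetch",
    "S11_Jump": "S0_Fetch",
}


def next_state(state, op):
    """Return the successor of `state` for an instruction of class `op`."""
    successor = NEXT_STATE[state]
    return successor[op] if isinstance(successor, dict) else successor


def states_for(op):
    """Return the states visited while executing one instruction of class `op`."""
    path, state = [], "S0_Fetch"
    while True:
        path.append(state)
        state = next_state(state, op)
        if state == "S0_Fetch":  # back at Fetch: the instruction is done
            return path


if __name__ == "__main__":
    for op in ("lw", "sw", "R-type", "beq", "addi", "j"):
        print(op, len(states_for(op)), states_for(op))
```

Each visited state costs one clock cycle in this model, which assumes memory responds in a single cycle.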

Multi-cycle Performance: CPI
- Instructions take different numbers of cycles:
  - 3 cycles: beq, j
  - 4 cycles: R-type, sw, addi
  - 5 cycles: lw
  - Realistic?
- CPI is a weighted average, e.g., for the SPECINT2000 benchmark:
  - 25% loads
  - 10% stores
  - 11% branches
  - 2% jumps
  - 52% R-type
- $\text{Average CPI} = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12$
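The weighted average can be checked with a few lines of Python. This is a small sketch assuming the SPECINT2000 mix of 25% loads, 10% stores, 11% branches, 2% jumps, and 52% R-type, with the cycle counts from the slide; the names `CYCLES`, `MIX`, and `average_cpi` are illustrative.

```python
# Cycles per instruction class in the multi-cycle design.
CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "r_type": 4}

# Fraction of dynamic instructions in each class (assumed SPECINT2000 mix).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "r_type": 0.52}


def average_cpi(cycles, mix):
    """Weighted average of per-class cycle counts."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction mix must sum to 1")
    return sum(mix[name] * cycles[name] for name in mix)


if __name__ == "__main__":
    print(f"Average CPI = {average_cpi(CYCLES, MIX):.2f}")
```

Changing `MIX` shows how sensitive the average is to the share of loads, which are the slowest class.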

Multi-cycle Performance: Cycle Time
- Multi-cycle critical path:
  $T_c = t_{pcq} + t_{mux} + \max(t_{ALU} + t_{mux},\ t_{mem}) + t_{setup}$
[Figure: multi-cycle datapath with the critical path highlighted.]

Multi-cycle Performance Example

Element               Parameter   Delay (ps)
Register clock-to-Q   t_pcq_PC    30
Register setup        t_setup     20
Multiplexer           t_mux       25
ALU                   t_ALU       200
Memory read           t_mem       250
Register file read    t_RFread    150
Register file setup   t_RFsetup   20

$T_c = t_{pcq\_PC} + t_{mux} + \max(t_{ALU} + t_{mux},\ t_{mem}) + t_{setup} = [30 + 25 + 250 + 20]\ \text{ps} = 325\ \text{ps}$
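The cycle-time formula can be evaluated directly. The sketch below assumes the delay values 30, 20, 25, 200, 250, 150, and 20 ps for the seven parameters in the table; the function name `multicycle_tc` is illustrative. The second call shows how the `max` term switches from the memory path to the ALU path when memory gets faster.

```python
# Element delays in picoseconds (assumed values, matching the table).
DELAYS_PS = {
    "t_pcq_PC": 30,
    "t_setup": 20,
    "t_mux": 25,
    "t_ALU": 200,
    "t_mem": 250,
    "t_RFread": 150,
    "t_RFsetup": 20,
}


def multicycle_tc(d):
    """Critical path: clock-to-Q, one mux, the slower of (ALU + mux) or memory, setup."""
    return d["t_pcq_PC"] + d["t_mux"] + max(d["t_ALU"] + d["t_mux"], d["t_mem"]) + d["t_setup"]


if __name__ == "__main__":
    print("T_c =", multicycle_tc(DELAYS_PS), "ps")
    # What-if: a 200 ps memory makes the ALU path the limiter.
    faster_mem = dict(DELAYS_PS, t_mem=200)
    print("T_c with 200 ps memory =", multicycle_tc(faster_mem), "ps")
```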

Multi-cycle Performance Example
- For a program with 100 billion instructions executing on a multi-cycle MIPS processor:
  - CPI = 4.12
  - T_c = 325 ps
- $\text{Execution Time} = (\#\ \text{instructions}) \times \text{CPI} \times T_c = (100 \times 10^{9})(4.12)(325 \times 10^{-12}\ \text{s}) = 133.9\ \text{seconds}$
- This is slower than the single-cycle processor (92.5 seconds). Why?
- Did we break the stages in a balanced manner?
- Overhead of register setup/hold paid many times
- How would the results change with different assumptions on memory latency and instruction mix?

Recall: Single-Cycle Performance Example
- Example: For a program with 100 billion instructions executing on a single-cycle MIPS processor:
  $\text{Execution Time} = (\#\ \text{instructions}) \times \text{CPI} \times T_c = (100 \times 10^{9})(1)(925 \times 10^{-12}\ \text{s}) = 92.5\ \text{seconds}$
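The two execution times can be compared side by side. This is a small sketch assuming 100 billion instructions, CPI = 4.12 and T_c = 325 ps for the multi-cycle design, and CPI = 1 and T_c = 925 ps for the single-cycle design; `execution_time_s` is an illustrative helper name.

```python
def execution_time_s(instructions, cpi, tc_ps):
    """Execution time in seconds: instructions x CPI x cycle time."""
    return instructions * cpi * tc_ps * 1e-12


if __name__ == "__main__":
    n = 100e9  # 100 billion instructions (assumed workload size)
    multi = execution_time_s(n, cpi=4.12, tc_ps=325)
    single = execution_time_s(n, cpi=1.0, tc_ps=925)
    print(f"multi-cycle:  {multi:.1f} s")
    print(f"single-cycle: {single:.1f} s")
    print(f"multi-cycle / single-cycle = {multi / single:.2f}")
```

The multi-cycle clock is shorter, but not by as much as the CPI grows, which is why the product comes out larger here.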

Review: Single-Cycle MIPS Processor
[Figure: single-cycle MIPS datapath and control unit.]

Review: Multi-Cycle MIPS Processor
[Figure: multi-cycle MIPS datapath with jump support (PCJump path from Instr[25:0] and PC[31:28]).]

Review: Multi-Cycle MIPS FSM
[Figure: complete multi-cycle controller FSM with states S0: Fetch, S1: Decode, S2: MemAdr, S3: MemRead, S4: Mem Writeback, S5: MemWrite, S6: Execute, S7: ALU Writeback, S8: Branch, S9: ADDI Execute, S10: ADDI Writeback, S11: Jump.]
- What is the shortcoming of this design?
- What does this design assume about memory?

What If Memory Takes > One Cycle?
- Stay in the same "memory access" state until memory returns the data
- "Memory Ready?" bit is an input to the control logic that determines the next state
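The idea of staying in a memory access state can be sketched as a next-state function with an extra input. This is a hypothetical Python sketch: the state names and the functions `next_state` and `cycles_in_memory_state` are assumptions, and the memory is modeled as simply taking a fixed number of cycles.

```python
def next_state(state, mem_ready):
    """Next-state logic for one memory access state with a ready input."""
    if state == "MemRead":
        # Loop on the same state until memory signals that data is available.
        return "MemWriteback" if mem_ready else "MemRead"
    raise ValueError(f"unexpected state: {state}")


def cycles_in_memory_state(latency_cycles):
    """Count the clock cycles spent in MemRead for a memory of the given latency."""
    state, cycles = "MemRead", 0
    while state == "MemRead":
        cycles += 1
        mem_ready = cycles == latency_cycles  # ready is asserted in the final cycle
        state = next_state(state, mem_ready)
    return cycles


if __name__ == "__main__":
    for latency in (1, 3, 10):
        print(latency, "cycle memory ->", cycles_in_memory_state(latency), "cycles in MemRead")
```

With a latency of one cycle the loop is never taken, which is the behavior the original FSM assumes.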

Another Example: Microprogrammed Multi-Cycle Microarchitecture

How Do We Implement This?
- Maurice Wilkes, "The Best Way to Design an Automatic Calculating Machine," Manchester Univ. Computer Inaugural Conf., 1951.
- An elegant implementation:
  - The concept of microcoded/microprogrammed machines

Recall: A Basic Multi-Cycle Microarchitecture
- Instruction processing cycle divided into "states"
  - A stage in the instruction processing cycle can take multiple states
- A multi-cycle microarchitecture sequences from state to state to process an instruction
  - The behavior of the machine in a state is completely determined by control signals in that state
- The behavior of the entire processor is specified fully by a finite state machine
- In a state (clock cycle), control signals control two things:
  - How the datapath should process the data
  - How to generate the control signals for the (next) clock cycle

Microprogrammed Control Terminology
- Control signals associated with the current state
  - Microinstruction
- Act of transitioning from one state to another
  - Determining the next state and the microinstruction for the next state
  - Microsequencing
- Control store stores control signals for every possible state
  - Store for microinstructions for the entire FSM
- Microsequencer determines which set of control signals will be used in the next clock cycle (i.e., next state)
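The relationship between these terms can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the LC-3b design: the three-entry control store, the `Microinstruction` fields, and the trivial `microsequencer` that just follows a next-address field are all made up for illustration.

```python
from typing import NamedTuple


class Microinstruction(NamedTuple):
    """Control signals for one state, plus a field used for sequencing."""
    datapath_signals: frozenset  # names of the signals asserted in this state
    next_address: int            # simplest possible sequencing information


# Control store: one microinstruction per state, indexed by state address.
# The contents are hypothetical and only loosely resemble an instruction fetch.
CONTROL_STORE = {
    0: Microinstruction(frozenset({"GatePC", "LD.MAR"}), 1),
    1: Microinstruction(frozenset({"MIO.EN", "LD.MDR"}), 2),
    2: Microinstruction(frozenset({"GateMDR", "LD.IR"}), 0),
}


def microsequencer(current):
    """Pick the control store address for the next clock cycle."""
    return current.next_address


def run(cycles, start=0):
    """Return (address, asserted signals) for each simulated clock cycle."""
    trace, address = [], start
    for _ in range(cycles):
        uinstr = CONTROL_STORE[address]        # microinstruction for this state
        trace.append((address, sorted(uinstr.datapath_signals)))
        address = microsequencer(uinstr)       # microsequencing
    return trace


if __name__ == "__main__":
    for address, signals in run(4):
        print(address, signals)
```

A real microsequencer also looks at inputs from the datapath, such as the opcode or a memory-ready bit, when choosing the next address.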

Example Control Structure
[Figure: Simple Design of the Control Structure. Inputs R, IR[15:11], and BEN feed the Microsequencer, which produces a 6-bit address into the Control Store (2^6 x 35). The Control Store outputs a 35-bit Microinstruction: 26 bits go to the datapath and 9 bits (J, COND, IRD) go back to the Microsequencer.]

What Happens In A Clock Cycle?
- The control signals (microinstruction) for the current state control two things:
  - Processing in the data path
  - Generation of control signals (microinstruction) for the next cycle
  - See Supplemental Figure 1 (next-next slide)
- Datapath and microsequencer operate concurrently
- Question: why not generate control signals for the current cycle in the current cycle?
  - This could lengthen the clock cycle
  - Why could it lengthen the clock cycle?
  - See Supplemental Figure 2

Example uProgrammed Control & Datapath
Read Appendix C (on website)

A Clock Cycle
[Figure: Supplemental Figure 1.]

A Bad Clock Cycle!
[Figure: Supplemental Figure 2.]

A Simple LC-3b Control and Datapath
Read Appendix C (on website)

What Determines Next-State Control Signals?
- What is happening in the current clock cycle
  - See the 9 control signals coming from "Control" block
  - What are these for?
- The instruction that is being executed
  - IR[15:11] coming from the Data Path
- Whether the condition of a branch is met, if the instruction being processed is a branch
  - BEN bit coming from the datapath
- Whether the memory operation is completing in the current cycle, if one is in progress
  - R bit coming from memory

A Simple LC-3b Control and Datapath

The State Machine for Multi-Cycle Processing
- The behavior of the LC-3b uarch is completely determined by
  - the 35 control signals and
  - additional 7 bits that go into the control logic from the datapath
- 35 control signals completely describe the state of the control structure
- We can completely describe the behavior of the LC-3b as a state machine, i.e., a directed graph of
  - Nodes (one corresponding to each state)
  - Arcs (showing flow from each state to the next state(s))

An LC-3b State Machine
- Patt and Patel, Appendix C, Figure C.2
- Each state must be uniquely specified
  - Done by means of state variables
- 31 distinct states in this LC-3b state machine
  - Encoded with 6 state variables
- Examples
  - State 18, 19 correspond to the beginning of the instruction processing cycle
  - Fetch phase: state 18, 19 -> state 33 -> state 35
  - Decode phase: state 32

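[Figure: LC-3b state machine (Patt and Patel, Appendix C, Figure C.2). Recoverable contents:
- States 18, 19: MAR <- PC; PC <- PC + 2
- State 33: MDR <- M; repeats until R
- State 35: IR <- MDR
- State 32: BEN <- IR[11] & N + IR[10] & Z + IR[9] & P; dispatch on [IR[15:12]]
- BR (state 0): on [BEN], state 22: PC <- PC + LSHF(off9, 1)
- ADD (state 1): DR <- SR1 + OP2*, set CC
- AND (state 5): DR <- SR1 & OP2*, set CC
- XOR (state 9): DR <- SR1 XOR OP2*, set CC
- JMP (state 12): PC <- BaseR
- SHF (state 13): DR <- SHF(SR, A, D, amt4), set CC
- LEA (state 14): DR <- PC + LSHF(off9, 1), set CC
- TRAP (state 15): MAR <- LSHF(ZEXT[IR[7:0]], 1); state 28: MDR <- M[MAR], R7 <- PC (repeats until R); state 30: PC <- MDR
- JSR (state 4): branch on [IR[11]]; state 20: R7 <- PC, PC <- BaseR; state 21: R7 <- PC, PC <- PC + LSHF(off11, 1)
- LDB (state 2): MAR <- B+off6; state 29: MDR <- M[MAR[15:1]'0] (repeats until R); state 31: DR <- SEXT[BYTE.DATA], set CC
- LDW (state 6): MAR <- B + LSHF(off6, 1); state 25: MDR <- M[MAR] (repeats until R); state 27: DR <- MDR, set CC
- STW (state 7): MAR <- B + LSHF(off6, 1); state 23: MDR <- SR; state 16: M[MAR] <- MDR (repeats until R)
- STB (state 3): MAR <- B+off6; state 24: MDR <- SR[7:0]; state 17: M[MAR] <- MDR** (repeats until R)
- RTI dispatches to state 8; instruction sequences end with a transition back to state 18 or 19
- NOTES: B+off6: Base + SEXT[offset6]; PC+off9: PC + SEXT[offset9]; *OP2 may be SR2 or SEXT[imm5]; ** [15:8] or [7:0] depending on MAR[0]]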

This FSM Implements the LC-3b ISA
- P&P Appendix A (revised):
  - https://www.ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/system-security-group-dam/education/Digitaltechnik_17/lecture/pp-appendixa.pdf

LC-3b State Machine: Some Questions
- How many cycles does the fastest instruction take?
- How many cycles does the slowest instruction take?
- Why does the BR take as long as it takes in the FSM?
- What determines the clock cycle time?

LC-3b Datapath
- Patt and Patel, Appendix C, Figure C.3
- Single-bus datapath design
  - At any point only one value can be "gated" on the bus (i.e., can be driving the bus)
  - Advantage: Low hardware cost: one bus
  - Disadvantage: Reduced concurrency: if an instruction needs the bus twice for two different things, these need to happen in different states
- Control signals (26 of them) determine what happens in the datapath in one clock cycle
  - Patt and Patel, Appendix C, Table C.1

[Figure: LC-3b datapath (Patt and Patel, Appendix C, Figure C.3). A single 16-bit bus driven through GatePC, GateMDR, GateALU, GateMARMUX, and GateSHF. Components: PC with PCMUX and a +2 incrementer, REG FILE (SR1, SR2, DR, LD.REG), ALU (ALUK), SHF, IR (LD.IR), condition code logic (N, Z, P, LD.CC), address adder with ADDR1MUX, ADDR2MUX, LSHF1, and SEXT units, MARMUX, MAR (LD.MAR), MDR (LD.MDR), MEMORY with address control logic (MIO.EN, R.W, DATA.SIZE, MAR[0], WE0, WE1, R), memory-mapped I/O registers KBDR, KBSR, DDR, DSR, and INMUX.]

[Figure: additional datapath logic. (a) DRMUX selects DR from IR[11:9] or the constant 111. (b) SR1MUX selects SR1 from IR[11:9] or IR[8:6]. (c) Logic computes BEN from IR[11:9] and N, Z, P. Slide note: remember the MIPS datapath.]

LC-3b Datapath: Some Questions
- How does instruction fetch happen in this datapath according to the state machine?
- What is the difference between gating and loading?
  - Gating: Enable/disable an input to be connected to the bus
    - Combinational: during a clock cycle
  - Loading: Enable/disable an input to be written to a register
    - Sequential: e.g., at a clock edge (assume at the end of cycle)
- Is this the smallest hardware you can design?
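The difference between gating and loading can be illustrated with a tiny single-bus model. This is a hypothetical Python sketch: only two gate signals and two load signals are modeled, the register values are made up, and `clock_cycle` is an illustrative name. Gating picks the one source that drives the bus during the cycle; loading copies the bus value into a register at the end of the cycle.

```python
# Gate signals connect a source register to the bus (combinational).
GATE_SOURCES = {"GatePC": "PC", "GateMDR": "MDR"}

# Load signals latch the bus value into a register at the clock edge (sequential).
LOAD_TARGETS = {"LD.MAR": "MAR", "LD.IR": "IR"}


def clock_cycle(regs, asserted):
    """Return the register values after one clock cycle with the given signals asserted."""
    gates = [sig for sig in asserted if sig in GATE_SOURCES]
    if len(gates) > 1:
        raise ValueError("bus conflict: more than one gate signal asserted")
    bus = regs[GATE_SOURCES[gates[0]]] if gates else None

    new_regs = dict(regs)  # registers only change at the end of the cycle
    for sig in asserted:
        if sig in LOAD_TARGETS:
            if bus is None:
                raise ValueError("load asserted but nothing is driving the bus")
            new_regs[LOAD_TARGETS[sig]] = bus
    return new_regs


if __name__ == "__main__":
    regs = {"PC": 0x3000, "MDR": 0x1234, "MAR": 0x0000, "IR": 0x0000}
    regs = clock_cycle(regs, {"GatePC", "LD.MAR"})   # MAR <- PC
    regs = clock_cycle(regs, {"GateMDR", "LD.IR"})   # IR <- MDR
    print({name: hex(value) for name, value in regs.items()})
```

Because only one gate signal may be asserted per cycle, an instruction that needs the bus for two different values has to spend two states doing so.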

LC-3b Microprogrammed Control Structure
- Patt and Patel, Appendix C, Figure C.4
- Three components:
  - Microinstruction, control store, microsequencer
- Microinstruction: control signals that control the datapath (26 of them) and help determine the next state (9 of them)
- Each microinstruction is stored in a unique location in the control store (a special memory structure)
- Unique location: address of the state corresponding to the microinstruction
  - Remember each state corresponds to one microinstruction
- Microsequencer determines the address of the next microinstruction (i.e., next state)

[Figure: Simple Design of the Control Structure. Inputs R, IR[15:11], and BEN feed the Microsequencer, which produces a 6-bit address into the Control Store (2^6 x 35). The 35-bit Microinstruction splits into 26 datapath control bits and 9 sequencing bits (J, COND, IRD).]

[Figure: microsequencer logic. COND1 and COND0 select one of three conditions: Branch (BEN), Ready (R), or Addr. Mode (IR[11]). The selected condition is ORed into J[2], J[1], or J[0], respectively; J[5:3] pass through. IRD selects between the resulting 6-bit address and 0,0,IR[15:12] as the Address of Next State.]

[Figure: control store contents, one row per state (State 0 through State 63). Columns: J, IRD, COND, LD.MAR, LD.MDR, LD.IR, LD.BEN, LD.REG, LD.CC, LD.PC, GatePC, GateMDR, GateALU, GateMARMUX, GateSHF, PCMUX, DRMUX, SR1MUX, ADDR1MUX, ADDR2MUX, MARMUX, ALUK, MIO.EN, R.W, DATA.SIZE, LSHF1.]

LC-3b Microsequencer
- Patt and Patel, Appendix C, Figure C.5
- The purpose of the microsequencer is to determine the address of the next microinstruction (i.e., next state)
  - Next state could be conditional or unconditional
- Next state address depends on 9 control signals (plus 7 data signals)

[Figure: microsequencer logic (Patt and Patel, Appendix C, Figure C.5). Inputs COND1, COND0, BEN, R, IR[11], J[5:0], IR[15:12], and IRD; output is the 6-bit Address of Next State.]
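The next-state address computation can be sketched as a small function. This is a minimal Python sketch based on one reading of the microsequencer figure, so the details are assumptions: COND = 01 ORs R into J[1], COND = 10 ORs BEN into J[2], COND = 11 ORs IR[11] into J[0], and IRD = 1 selects 0,0,IR[15:12] instead. The function name and argument order are illustrative.

```python
def next_state_address(j, cond, ird, ir, ben, r):
    """Compute the 6-bit control store address of the next state.

    j    : 6-bit J field of the current microinstruction
    cond : 2-bit COND field (COND1, COND0)
    ird  : IRD bit; when set, the next state comes from the opcode
    ir   : 16-bit instruction register value
    ben  : branch enable bit from the datapath
    r    : memory ready bit
    """
    if ird:
        return (ir >> 12) & 0xF            # 0,0,IR[15:12]: 16-way dispatch on opcode
    address = j & 0x3F
    if cond == 0b10 and ben:               # Branch: OR BEN into J[2]
        address |= 0b000100
    if cond == 0b01 and r:                 # Ready: OR R into J[1]
        address |= 0b000010
    if cond == 0b11 and (ir >> 11) & 1:    # Addr. Mode: OR IR[11] into J[0]
        address |= 0b000001
    return address


if __name__ == "__main__":
    # State 33 waits on memory: J = 33, COND = 01.
    print(next_state_address(33, 0b01, 0, 0, 0, r=0))  # stays in 33
    print(next_state_address(33, 0b01, 0, 0, 0, r=1))  # moves to 35
```

This reading is consistent with the state numbers in the state machine: 33 and 35 differ only in bit 1, 18 and 22 only in bit 2, and 20 and 21 only in bit 0.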

The Microsequencer: Some Questions
- When is the IRD signal asserted?
- What happens if an illegal instruction is decoded?
- What are condition (COND) bits for?
- How is variable latency memory handled?
- How do you do the state encoding?
  - Minimize number of state variables (~ control store size)
  - Start with the 16-way branch
  - Then determine constraint tables and states dependent on COND

An Exercise in Microprogramming

Handouts
- 7 pages of Microprogrammed LC-3b design
- https://www.ethz.ch/content/dam/ethz/special-interest/infk/inst-infsec/system-security-group-dam/education/Digitaltechnik_17/lecture/lc3b-figures.pdf

A Simple LC-3b Control and Datapath

[Figure: LC-3b state machine (Patt and Patel, Appendix C, Figure C.2), repeated for the exercise.]

A Simple Datapath Can Become Very Powerful
[Figure: LC-3b datapath (Patt and Patel, Appendix C, Figure C.3), repeated for the exercise.]

State Machine for LDW
[Figure: microsequencer logic (COND1, COND0, BEN, R, IR[11], J[5:0], IR[15:12], IRD; output is the Address of Next State).]

Fill in the microinstructions for the 7 states for LDW:
- State 18 (010010)
- State 33 (100001)
- State 35 (100011)
- State 32 (100000)
- State 6 (000110)
- State 25 (011001)
- State 27 (011011)
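The sequencing part of the LDW exercise can be explored with a short simulation. The sketch below is an assumption-laden illustration, not the handout's solution: the (J, COND, IRD) values are inferred from the state numbers 18, 33, 35, 32, 6, 25, 27 rather than copied from the actual microcode, only the Ready condition is modeled, and the datapath control bits are left out entirely.

```python
# Sequencing fields (J, COND, IRD) for the seven LDW states.
# These values are assumptions inferred from the state numbering.
LDW_SEQUENCING = {
    18: (33, 0b00, 0),  # MAR <- PC, PC <- PC + 2
    33: (33, 0b01, 0),  # MDR <- M, wait for R
    35: (32, 0b00, 0),  # IR <- MDR
    32: (0, 0b00, 1),   # decode: dispatch on IR[15:12]
    6:  (25, 0b00, 0),  # MAR <- B + LSHF(off6, 1)
    25: (25, 0b01, 0),  # MDR <- M[MAR], wait for R
    27: (18, 0b00, 0),  # DR <- MDR, set CC
}

LDW_OPCODE = 0b0110  # IR[15:12] for LDW


def next_address(j, cond, ird, opcode, ready):
    """Next control store address; only the Ready condition is modeled."""
    if ird:
        return opcode
    return j | (0b000010 if cond == 0b01 and ready else 0)


def trace_ldw(mem_latency):
    """List the state occupied in each cycle of one LDW, for a given memory latency."""
    state, trace, waited = 18, [], 0
    while True:
        trace.append(state)
        j, cond, ird = LDW_SEQUENCING[state]
        ready = False
        if cond == 0b01:                  # a memory access state
            waited += 1
            ready = waited == mem_latency
            if ready:
                waited = 0
        state = next_address(j, cond, ird, LDW_OPCODE, ready)
        if state == 18:                   # back at fetch: instruction complete
            return trace


if __name__ == "__main__":
    print(trace_ldw(1))
    print(len(trace_ldw(5)), "cycles with a 5-cycle memory")
```

The two memory access states dominate the cycle count once memory latency exceeds one cycle, since each of them is repeated for the full latency.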

[Figure: additional datapath logic: (a) DRMUX, (b) SR1MUX, (c) BEN logic, repeated for the exercise.]

[Figure: Simple Design of the Control Structure (Microsequencer, Control Store 2^6 x 35, 35-bit Microinstruction), repeated for the exercise.]

[Figure: microsequencer logic, repeated for the exercise.]

[Figure: control store contents, State 0 through State 63, repeated for the exercise.]

End of the Exercise in Microprogramming
