Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

Similar documents
CPU DESIGN The Single-Cycle Implementation

Project Two RISC Processor Implementation ECE 485

Review. Combined Datapath

EC 413 Computer Organization

CSE Computer Architecture I

Processor Design & ALU Design

Implementing the Controller. Harvard-Style Datapath for DLX

L07-L09 recap: Fundamental lesson(s)!

Spiral 1 / Unit 3

61C In the News. Processor Design: 5 steps

Review: Single-Cycle Processor. Limits on cycle time

CPSC 3300 Spring 2017 Exam 2

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it...

3. (2) What is the difference between fixed and hybrid instructions?

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath

[2] Predicting the direction of a branch is not enough. What else is necessary?

TEST 1 REVIEW. Lectures 1-5

[2] Predicting the direction of a branch is not enough. What else is necessary?

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

Topics: A multiple cycle implementation. Distributed Notes

4. (3) What do we mean when we say something is an N-operand machine?

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1

Lecture 13: Sequential Circuits, FSM

Enrico Nardelli Logic Circuits and Computer Architecture


Designing Single-Cycle MIPS Processor

Simple Instruction-Pipelining (cont.) Pipelining Jumps

Arithmetic and Logic Unit First Part

Outcomes. Spiral 1 / Unit 2. Boolean Algebra BOOLEAN ALGEBRA INTRO. Basic Boolean Algebra Logic Functions Decoders Multiplexers

UNIVERSITY OF WISCONSIN MADISON

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

Lecture 13: Sequential Circuits, FSM

A Second Datapath Example YH16

CMP 334: Seventh Class

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Designing MIPS Processor

CSc 256 Midterm 2 Fall 2010

Lecture 3, Performance

CSCI-564 Advanced Computer Architecture

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Computer Architecture ELEC2401 & ELEC3441

Microprocessor Power Analysis by Labeled Simulation

COVER SHEET: Problem#: Points

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

Numbers and Arithmetic

Design at the Register Transfer Level

Numbers and Arithmetic

CprE 281: Digital Logic

Lecture 3, Performance

Figure 4.9 MARIE s Datapath

Computer Architecture

Design. Dr. A. Sahu. Indian Institute of Technology Guwahati

CMP N 301 Computer Architecture. Appendix C

Appendix B. Review of Digital Logic. Baback Izadi Division of Engineering Programs

Design of Sequential Circuits

CA Compiler Construction

ICS 233 Computer Architecture & Assembly Language

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5.

State and Finite State Machines

Formal Verification of Systems-on-Chip

Instruction register. Data. Registers. Register # Memory data register

Basic Computer Organization and Design Part 3/3

Table of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25

Department of Electrical and Computer Engineering The University of Texas at Austin

ECE 2300 Digital Logic & Computer Organization

Hakim Weatherspoon CS 3410 Computer Science Cornell University

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control

Spiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller

CPU DESIGN The Single-Cycle Implementation

LABORATORY MANUAL MICROPROCESSOR AND MICROCONTROLLER

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CS61C : Machine Structures

EXPERIMENT Bit Binary Sequential Multiplier

System Data Bus (8-bit) Data Buffer. Internal Data Bus (8-bit) 8-bit register (R) 3-bit address 16-bit register pair (P) 2-bit address

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec09 Counters Outline.

State & Finite State Machines

CS61C : Machine Structures

Computer Architecture

Clock T FF1 T CL1 T FF2 T T T FF T T FF T CL T FF T CL T FF T T FF T T FF T CL. T cyc T H. Clock T FF T T FF T CL T FF T T FF T CL.

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

Preparation of Examination Questions and Exercises: Solutions

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner

ALU A functional unit

Implementing Absolute Addressing in a Motorola Processor (DRAFT)

Solutions for Appendix C Exercises

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec 09 Counters Outline.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Chapter 9 Computer Design Basics

Practice Homework Solution for Module 4

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process

CSE Computer Architecture I

Transcription:

Computer Engineering Department CC 311- Computer Architecture Chapter 4 The Processor: Datapath and Control Single Cycle

Introduction The 5 classic components of a computer Processor Input Control Memory Datapath Output Chapter 4 discusses the processor Part-a: Datapath Part-b: Control 2

Review: Edge-triggered Methodology Input: Values written in a previous clock cycle Output: Values to be used in the following clock cycle Prevents reading the signal in the same time it is written More than one action can take place in the same clock cycle Falling edge Clock period Rising edge 3

Review: Timing Methodology Clock cycle(tick/period): Time for one clock period, usually of the processor, which runs at a constant rate Memory Access time: Time between the initiation of a read request and when the desired word arrives Memory Cycle time: Minimum time between requests to memory Should be greater than access time to keep address line stable between accesses 4

Review: Timing Methodology Typical execution cycle: Read contents of some state elements, Send values through some combinational logic Write results to one or more state elements Clock period should cover all these activities All signals must propagate from state element1 to state element2 in the time of one clock cycle If the state element is not updated on every clock, an explicit write control signal is required, in which case the state element is changed only when the control signal is asserted and the clock edge occurs 5

Review: Timing Methodology Edge triggered methodology allows a state element to be read and written in the same clock cycle The clock must be long enough to allow the stability of input value before the active edge occurs 6

Building the Datapath 7

Review: 2 x 1 MUX MUX Operation: Selects one of the (A or B) inputs to be the output, based on a control (select) input (S) We need a 2x1 MUX The MUX has 3 inputs Equivalent C-Operation If (s == 0) C:= A; else C = B; A B 0 1 (s) Selector C 8

Universal ALU Symbolic representation: (Sometimes used to represent adders as well) Operation a 32 C in to LSB Zero Flag ALU 32 Result b 32 C out from MSB Overflow Flag 9

Building the Datapath The major components required to execute each class of MIPS instructions? A memory unit to store the instructions PC to store the address of the current instruction Adder to increment PC to address of next instruction 10

Building the Datapath How are these components connected? 11

Building the Datapath The major components required to execute R-Format instructions add, sub, and, or, slt Three register operands Two read ports to read two registers from register file One write port to write into one register Data lines 32-bit Register lines ALUOp 12

Review: Register File A set of 32 registers 5-inputs 2 read-ports to supply source register numbers 1 write-port to supply destination register number 1 Data bus 1 Register write enable control signal 2-outputs Data from register 1 Data from register 2 1-Control signal RegWrite No need to read-enable 13 Register Selectors 5 5 32 5 Data Bus Read Reg1 Read Reg 2 Write Reg1 Read Data1 Register File Read Data2 Write Data Write Enable RegWrite 32 32 Data Buses 13

Review: Reading from Register File Need to submit: Read register number 1 Register 0 Register # For N-registers we need: Register 1... Register n 2 Register n 1 M u x Read data 1 n x 1 MUX Read register number 2 M u x Read data 2 14 14

Review: Writing to Register File Need to submit: Write Register # Data Write control signal Register number n-to-2n decoder 0 1. C D C Register 0 To choose a register, use Decoder To determine when to write, use the clock n 1 n D C Register 1. Note: C means control signal D C Register n 2 D means data lines Register data D Register n 1 15

Building the Datapath (lw, sw) Instruction Read register 1 Read Read data 1 register 2 Registers Read Write data 2 register Write data RegWrite ALUSrc M ux 3 ALU operation Zero ALU ALU result Address Write data MemWrite MemtoReg Data memory Read data M ux 16 Sign 32 extend MemRead 16

Building the Datapath (beq) For branch instructions, two operations are needed Compare register contents ALU Zero signal returns the result of comparison Compute branch target address A sign-extend unit is required If branch is taken Branch target address becomes the new PC contents If branch is not taken PC+4 is the new value for PC 17

Review: MIPS Instruction Format R-type 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 I-type 31 26 21 16 op rs rt immediate 6 bits 5 bits 5 bits 16 bits 0 J-type 31 26 op target address 6 bits 26 bits 0 18

Addressing Modes Register addressing Base or displacement addressing Immediate addressing PC-relative addressing Pseudo-direct addressing 19

Addressing Modes (1) Register addressing Operand is a register Value is the contents of the register 20

Building the Datapath R-type 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 21

Addressing Modes Register addressing Base or displacement addressing Immediate addressing PC-relative addressing Pseudo-direct addressing 22

Addressing Modes (2) Base or displacement addressing Operand location = register + constant (offset) in the instruction 23

Building the Datapath (sw) 24

Building the Datapath (lw) 25

Building Datapath (R-type, lw, sw) Instruction Read register 1 Read data 1 Read register 2 Registers Read Write data 2 register Write data RegWrite ALUSrc M u x 3 ALU ALU operation Zero ALU result Address Write data MemWrite MemtoReg Data memory Read data M u x 16 Sign 32 extend MemRead 26

Building Datapath (R-type, lw, sw) 27

Datapath Operation for R-Format (add) 31 26 21 16 11 6 op rs rt rd shamt funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 0 Step1: Add $t1, $t2, $t3 Fetch instruction from instruction memory Increment program counter Step2: Decode result is Add operation Read two registers $t2 & $t3 from register file Step 3: ALU operates on data, using funct field code to generate the ALU function Step4: Write result of step 3 from ALU into register $t1 28

Datapath Operation for R-Format (add) Add $t1, $t2, $t3 (Fig. 5.19, p. 322) 29

Datapath Operation for R-Format (add) Step 1: Fetch instruction & increment PC 0 4 A d d I n s t r u c t i o n [ 3 1 2 6 ] C o n t r o l R e g D s t B r a n c h M e m R e a d M e m t o R e g A L U O p M e m W r i t e A L U S r c R e g W r i t e S h i f t l e f t 2 A d d r e A s u L l U t M u x 1 P C R e a d a d d r e s s I n s t r u c t i o n m e m o r y I n s t r u c t i o n [ 3 1 0 ] I n s t r u c t i o n [ 25 21 ] I n s t r u c t i o n [ 20 16 ] I n s t r u c t i o n [ 15 1 ] I n s t r u c t i o n [ 15 0 ] 0 M u x 1 R e a d r e g i s t e r 1 R e a d d a t a 1 i s t e r 2 R e g i s t e r s R e a d d a t a 2 R e a d r e g W r i t e r e g i s t e r W r i t e d a t a 16 S i g n 32 e x t e n d 0 M u x 1 A L U c o n t r o l Z e r o A L U A L U r e s u l t A d d r e s s W r i t e d a t a D a t a m e m o r y R e a d d a t a 1 M u x 0 I n s t r u c t i o n [ 5 0 ] 30

Datapath Operation for R-Format (add) Step2: Read registers from register file 4 A d d I n s t r u c t i o n [ 3 1 2 6 ] C o n t r o l R e g D s t B r a n c h M e m R e a d M e m t o R e g A L U O p M e m W r i t e A L U S r c R e g W r i t e S h i f t l e f t 2 A d d r e A s u L l U t 0 M u x 1 P C R e a d a d d r e s s I n s t r u c t i o n m e m o r y I n s t r u c t i o n [ 3 1 0 ] I n s t r u c t i o n [ 25 21 ] I n s t r u c t i o n [ 20 16 ] I n s t r u c t i o n [ 15 11 ] I n s t r u c t i o n [ 15 0 ] 0 M u x 1 R e a d r e g i s t e r 1 R e a d R e a d d a t a 1 r e g i s t e r 2 i R e g i s t e r s R e a d W r i t e d a t a 2 r e g i s t e r i W r i t e d a t a 16 32 S i g n e x t e n d 0 M u x 1 A L U c o n t r o l Z e r o A L U A L U r e s u l t A d d r e s s W r i t e d a t a D a t a m e m o r y R e a d d a t a 1 M u x 0 I n s t r u c t i o n [ 5 0 ] 31

Datapath Operation for R-Format (add) Step 3: Perform ALU operation 0 4 A d d I n s t r u c t i o n [ 3 1 2 6 ] C o n t r o l R e g D s t B r a n c h M e m R e a d M e m t o R e g A L U O p M e m W r i t e A L U S r c R e g W r i t e S h i f t l e f t 2 A d d A L U r e s u t M u x 1 P C R e a d a d d r e s s I n s t r u c t o n m e m o r y I n s r u c t o n [ 3 1 0 ] I n s t r u c t i o n [ 25 21 ] I n s t r u c t i o n [ 20 16 ] I n s t r u c t i o n [ 15 1 ] I n s t r u c t i o n [ 15 0 ] 0 M u x 1 R e a d r e g i s t e r 1 R e a d R e a d d a t a 1 r e g i s t e r 2 R e g i s t e r s R e a d W r i e d a t a 2 r e g i s t e r W r i e d a t a 16 S i g n 32 e x t e n d 0 M u x 1 A L U c o n t r o l Z e r o A L U A L U r e s u l t A d d r e s s W r i t e d a t a R e a d d a t a D a t a m e m o r y 1 M u x 0 I n s t r u c t i o n [ 5 0 ] 32

Datapath Operation for R-Format (add) Step 4: Write result from ALU into register 4 A d d I n s t r u c t i o n [ 3 1 2 6 ] C o n t r o l R e g D s t B r a n c h M e m R e a d M e m t o R e g A L U O p M e m W r i t e A L U S r c R e g W r i t e S h i f t l e f t 2 A d d r e A s u L l U t 0 M u x 1 P C R e a d a d d r e s s I n s t r u c t i o n m e m o r y I n s t r u c t i o n [ 3 1 0 ] I n s t r u c t i o n [ 25 21 ] I n s t r u c t i o n [ 20 16 ] I n s t r u c t i o n [ 15 11 ] I n s t r u c t i o n [ 15 0 ] 0 M u x 1 R e a d r e g i s t e r 1 R e a d d a t a 1 i s t e r 2 R e g i s t e r s R e a d d a t a 2 R e a d r e g W r i t e r e g i s t e r W r i t e d a t a 16 32 S i g n e x t e n d 0 M u x 1 A L U c o n t r o l Z e r o A L U A L U r e s u l t A d d r e s s W r i t e d a t a D a t a m e m o r y R e a d d a t a 1 M u x 0 I n s t r u c t i o n [ 5 0 ] 33

Datapath Operation for I-Format (lw) 31 Step 1: lw $t1, offset($t2) Fetch instruction from instruction memory Increment program counter Step 2: 26 21 op rs rt/rd? offset 6 bits 5 bits 5 bits 16 bits 16 11 0 Decode result is lw operation Read register $t2 from register file Step 3: ALU operates on data to build an address Compute sum of $t2 & sign-extended 16-bits (offset) of instruction Step 4: Retrieve data located in the calculate address from memory Step5: Write memory contents into destination register $t1 34

Datapath Operation for I-Format (lw) lw $t1, offset($t2) - Exercise: Highlight the steps! 35

Datapath Operation for I-Format (sw) sw $t1, 33($t2) Step 1: Fetch instruction from instruction memory Increment program counter Step 2: Decode result is sw operation Read register contents of $t1 and $t2 from register file Step 3: ALU operates on data to build an address Compute sum of $t2 & sign-extended 16-bits (offset) of instruction Step 4: Write data into the computed memory address 36

Datapath Operation for I-Format (sw) sw $t1, offset($t2) 37

Addressing Modes Register addressing Base or displacement addressing Immediate addressing PC-relative addressing Pseudo-direct addressing 38

Addressing Modes (3) Immediate addressing Operand is a constant within the instruction itself 39

Addressing Modes (4) PC-relative addressing Address = PC (program counter) + constant in the instruction 40

Branch instruction bne $t4,$t5,label # Next instruction is at Label if $t4 $t5 beq $t4,$t5,label # Next instruction is at Label if $t4 = $t5 Formats: I op rs rt 16 bit address Use Instruction Address Register (PC = program counter) 41

Datapath Operation for I-Format (beq) beq $t1, $t2, offset (Fig. 5.21, p. 311) Step 1: Fetch instruction from instruction memory Increment PC Step 2: Decode instruction result in beq Read contents of $t1, $t2 registers from register file Step 3: ALU subtract the two values from registers Add PC+4 to sign-extended lower 16-bits of offset Step 4: The zero result from ALU is used to decide which adder result to store into PC 42

Branch instruction 43

Branch instruction 44

Datapath Operation for (beq) beq $t1, $t2, offset 45

Datapath Operation for (beq) beq $t1, $t2, offset 46

Datapath Operation for (beq) beq $t1, $t2, offset 47

Datapath Operation for (beq) beq $t1, $t2, offset 48

Addressing Modes Register addressing Base or displacement addressing Immediate addressing PC-relative addressing Pseudo-direct addressing 49

Addressing Modes (5) Pseudo-direct addressing Jump address = 26 bits of the instruction + upper bits of the PC 50

Jump Instruction (J-Format) Jump instruction uses high order bits of PC address boundaries of 256 MB 2 target 6 bits 26 bits 4 bits 26 bits 2 PC 51

Jump Instruction (J-Format) 52

Datapath Operation for J-Format (j) j label Step 1: Fetch instruction from instruction memory Increment PC by 4 Step 2: Decode instruction result is j Retrieve value of jump target (label) Step 3: Shift the label 26 bits left by 2 bits (to get byte count instead of word count) Concatenate the upper 4 bits of PC +4 as the high-order bits (to get the complete memory address) Step 4: Store the calculated address into PC 53

Datapath Operation for J-Format 54

Datapath Operation for J-Format 55

Processor Design Control Signals Register selection: Rs, Rt, Rd and Imed16 hardwired into datapath Operation selection: ALU data source (register or immediate address) (ALUsrc) ALU Operation selection (ALUctr) Memory Write (MemWr) MemtoReg (1 => Mem) RegDst (0 => rt ; 1 => rd ) RegWr (write dest. register) 56

Processor Design Step 4 Control Unit Analyze implementation of each instruction to determine setting of control points Inst Memory Address Instruction<31:0> <21:25> Op Fun Rt <21:25> Rs <16:20> Rd <11:15> <0:15> Imm16 Control npc_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg Equal DATA PATH 57

Processor Design Control Signals needed by each instruction: inst Register Transfer ADD R[rd] < R[rs] + R[rt]; PC < PC + 4 ALUsrc=R[rs], ALUctr= add, RegDst=rd, RegWr, npc_sel= +4 SUB R[rd] < R[rs] R[rt]; PC < PC + 4 ALUsrc=R[rs], ALUctr= sub, RegDst=rd, RegWr, npc_sel= +4 LOAD R[rt] < MEM[ R[rs] + sign_ext(imm16)]; PC < PC + 4 ALUsrc = Im, ALUctr = add, MemtoReg, RegDst = rt, RegWr, npc_sel = +4 STORE MEM[ R[rs] + sign_ext(imm16)] < R[rs]; PC < PC + 4 ALUsrc=Im, ALUctr= add, MemWr, npc_sel= +4 BEQ if(r[rs]== R[rt]) then PC< PC+sign_ext(Imm16)] 00 else PC < PC + 4 npc_sel = EQUAL, ALUctr = sub 58

Main Control Unit Effect of the control signals when they are asserted or de-asserted respectively See fig. 4.16, p. 321 Signal name If de-asserted (0) If asserted (1) RegDst Destination register <= rt field (20-16) Destination register <= rd field (15-11) RegWrite None Write data input => Write register ALUSrc Reg Data2 => 2nd ALU operand Sign-extnd 16-bits of instruction => 2nd ALU operand PCSrc PC <= PC + 4 (from adder) PC <= Branch target (from adder) MemRead None Mem[Address] => to Read data output MemWrite None Mem[Address] <= value on Write data input MemtoReg Value to Write data input comes from ALUValue to Write data input comes from data memory 59

Main Control Unit The control signals based on different Opcode 60

Main Control Unit Control lines needed by each instruction (See Fig 5.18, p. 308) R-Format (add, sub, AND, OR, & slt) Source register : rs & rt Destination register: rd Writes a register (RegWrite = 1) but neither reads nor writes data memory ALUOp for R-Type format = 10 indicating that ALU control should be generated from function field When Branch: Control signal = 0, PC<= PC +4; Otherwise it is replaced by Branch target For lw MemRead = 1, For sw MemWrite = 1 Instruction RegDest ALUSrc MemtoReg RegWrite MemRead MemWrite Banch ALUOp1 ALUOp0 R-Format 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq X 0 X 0 0 0 1 0 1

CLK Setup Setup 62

63

64

Single Cycle Implementation Exercise: Assuming we have the following instruction mix: Load 25% Store 10% ALU instructions 45% Branch 15% Jump 5% Calculate cycle time and total program (with L instructions) execution time Assuming negligible delays except for the following: memory (200 ps) (ps = Pico Second) ALU and adders (100 ps) Register file access (50 ps) Note: For single-cycle instruction, CPI = 1 Answer: (Complete answer) 65

Single Cycle Problems Single cycle instruction is not used in modern computers Disadvantages: Inefficient in both performance and HW cost Clock cycle must have the same length for every instruction Clock cycle determined by the longest possible path For complicated instruction like floating point, it will be harder Functional units must be duplicated, since each can be used only once per cycle 66