CPU DESIGN The Single-Cycle Implementation


CSE 202 Computer Organization CPU DESIGN The Single-Cycle Implementation Shakil M. Khan (adapted from Prof. H. Roumani) Dept of CS & Eng, York University

Sequential vs. Combinational Circuits
Digital circuits can be classified into two categories:
1. Combinational circuits: mux, ALU
2. Sequential circuits: flip-flops, registers, memory
CSE-202 June-28 202

Clocks
A clock is a periodic signal oscillating between low and high states with a fixed cycle time.
Clock frequency = inverse of clock cycle time
[Figure: clock waveform marking the falling edge, the rising edge, and the clock period]
The clock controls when the state of a memory element changes.
[Figure: state element 1 -> combinational logic -> state element 2, within one clock cycle]

CPU DESIGN
The Datapath
Single-Cycle Control
Performance
Focus on the subset: addi, add/sub/and/or/slt, lw/sw, beq, j

Building the Datapath

The Basic Datapath Components (1)
1. Program counter (PC): contains the address of the next instruction
2. Sign-extension unit: extends a 16-bit integer to a 32-bit integer
3. Adder: adds two 32-bit integers and outputs the Sum
4. ALU: add/subtract/and/or/compare two 32-bit integers, selected by a 3-bit ALU control input; outputs the ALU result and a Zero flag

The Basic Datapath Components (2)
5. Instruction memory: takes an instruction address and outputs the instruction
6. Data memory unit: inputs Address and Write data, output Read data, controls MemRead and MemWrite
7. Register file: three 5-bit register numbers (Read register 1, Read register 2, Write register) and Write data as inputs, Read data 1 and Read data 2 as outputs, control RegWrite

The Basic Datapath Components (3)
[Figure: block symbols used on the following slides: PC, IM, RF, ALU, BUS]

The Basic Datapath [computational R-Type]
[Figure: datapath built from PC, IM, RF, and ALU]

Recall the ML Formats:
R: opcode (6) | rs (5) | rt (5) | rd (5) | sa (5) | funcode (6)
I: opcode (6) | rs (5) | rt (5) | immediate (16)
J: opcode (6) | immediate (26)
Register rs = source, rt = target, rd = destination.
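The field widths above can be checked with a few shifts and masks. The following is a minimal sketch, not part of the original slides: the function name `decode` and the key `target` (for the 26-bit J-format immediate) are illustrative choices, and the register numbers used in the example ($t0 = 8, $s0 = 16, $a0 = 4) follow the usual MIPS convention.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit instruction word into the fields of all three formats.

    Every field is extracted regardless of format; the opcode decides
    which of them are meaningful for a given instruction.
    """
    return {
        "opcode": (instr >> 26) & 0x3F,    # bits 31-26 (6 bits)
        "rs": (instr >> 21) & 0x1F,        # bits 25-21 (5 bits)
        "rt": (instr >> 16) & 0x1F,        # bits 20-16 (5 bits)
        "rd": (instr >> 11) & 0x1F,        # bits 15-11 (5 bits)
        "sa": (instr >> 6) & 0x1F,         # bits 10-6  (5 bits)
        "funcode": instr & 0x3F,           # bits 5-0   (6 bits)
        "immediate": instr & 0xFFFF,       # bits 15-0  (I-format, 16 bits)
        "target": instr & 0x03FFFFFF,      # bits 25-0  (J-format, 26 bits)
    }


# add $t0, $s0, $a0: opcode 0, rs = 16, rt = 4, rd = 8, sa = 0, funcode = 32
word = (0 << 26) | (16 << 21) | (4 << 16) | (8 << 11) | (0 << 6) | 32
fields = decode(word)
```

The same `decode` call works for I- and J-format words; only the fields relevant to the opcode should then be read.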

The Basic Datapath [computational R-Type]
[Figure: PC, IM, RF, ALU; instruction bits 25-21, 20-16, and 15-11 run from IM to RF]

The PC Circuitry
[Figure: the same datapath with the constant 4 added at the PC]


Add support for computational I-Types
[Figure: a 0/1 mux added on the RF register-number inputs (instruction bits 20-16 and 15-11)]

Add support for computational I-Types
[Figure: a sign-extension unit (SE) and a 0/1 mux at the ALU input added]

Add support for lw
[Figure: data memory (DM) added after the ALU]

Add support for lw
[Figure: a 0/1 mux added after DM]

Add support for sw
[Figure: same components as for lw: PC, IM, RF, ALU, DM, SE, and the three 0/1 muxes]

Add Support for branch
[Figure: a 0/1 mux at the PC input and a shift-left unit (sll) added]
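The branch hardware (the extra mux at the PC, the shift-left unit, and the sign extender) can be summarised as a next-PC computation. This is a sketch under stated assumptions: the helper names `sign_extend16` and `next_pc` are hypothetical, and the mux select is taken to be "branch instruction AND ALU Zero output", which is the conventional design rather than something spelled out on this slide.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement integer."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc: int, imm16: int, branch: bool, zero: bool) -> int:
    """Select between PC + 4 and the branch target.

    The target is (PC + 4) + (sign-extended offset shifted left by 2),
    because the offset counts words, not bytes.
    """
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # Assumed mux select: take the branch only when both conditions hold.
    return target if (branch and zero) else pc_plus_4


taken = next_pc(0x1000, 40, branch=True, zero=True)
not_taken = next_pc(0x1000, 40, branch=True, zero=False)
backward = next_pc(0x1000, 0xFFFF, branch=True, zero=True)  # offset of -1 word
```

With a word offset of 40 the taken target is 0x1004 + 160, and an offset of -1 word branches back to the branch instruction itself.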

Combined Datapath (w/o Jump)
[Figure: combined datapath. Components: PC, instruction memory, registers, 16-to-32-bit sign extend, ALU, data memory, the PC + 4 adder, the shift-left-2 unit, and the branch-target adder. Muxes are selected by PCSrc, ALUSrc, and MemtoReg; other control inputs are RegWrite, MemRead, MemWrite, and the 3-bit ALU operation. The ALU outputs Zero and the ALU result.]

add/sub/or/and/slt $s1,$s2,$s3
[Figure: the combined datapath (w/o jump) as used by these R-type instructions]

lw $s1, offset($s2)
[Figure: the combined datapath (w/o jump) as used by lw]

sw $s1, offset($s2)
[Figure: the combined datapath (w/o jump) as used by sw]

beq $s1, $s2, w_offset
[Figure: the combined datapath (w/o jump) as used by beq]

[Figure: complete single-cycle datapath with jump support. Instruction [25-0] is shifted left 2 (26 -> 28 bits) and combined with PC+4 [31-28] to form the jump address [31-0], which a mux controlled by Jump can select as the next PC. The Control unit reads Instruction [31-26] and drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instruction [25-21] and [20-16] feed Read register 1 and Read register 2; a mux chooses between Instruction [20-16] and [15-11] for Write register; Instruction [15-0] is sign-extended from 16 to 32 bits; ALU control reads Instruction [5-0].]
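The jump-address wiring in this figure (26-bit field shifted left 2, upper 4 bits taken from PC+4) amounts to the following computation. This is an illustrative sketch; `jump_address` is a hypothetical name, and the example instruction words use opcode 2 for `j`.

```python
def jump_address(pc: int, instr: int) -> int:
    """Form the 32-bit jump target from PC + 4 and the 26-bit J-format field."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    low28 = (instr & 0x03FFFFFF) << 2          # Instruction [25-0] shifted left 2
    return (pc_plus_4 & 0xF0000000) | low28    # PC+4 [31-28] supplies the top 4 bits


# j with a 26-bit field of 0x0100000, fetched at PC = 0x00400000
target_a = jump_address(0x00400000, (2 << 26) | 0x0100000)
# The upper four bits of PC + 4 are preserved
target_b = jump_address(0x90000000, (2 << 26) | 0x0000001)
```

Because only 28 bits come from the instruction, a jump cannot leave the 256 MB region selected by the upper four bits of PC + 4.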

Building the Control

Control
[Figure: the datapath (PC, IM, RF, ALU, DM, SE, sll, PC + 4 adder, 0/1 muxes) with the Control unit and the clk inputs added]

Exercise: add $t0, $s0, $a0
Fill in the VALUE of each SIGNAL: ALUSrc, MemToReg, RegDst, RegWrite, MemRead, MemWrite, Branch, Jump, Operation (3-bit)

Exercise: sw $t0, 500($s0)
Fill in the VALUE of each SIGNAL: ALUSrc, MemToReg, RegDst, RegWrite, MemRead, MemWrite, Branch, Jump, Operation (3-bit)

Exercise: beq $t0, $s0, 40
Fill in the VALUE of each SIGNAL: ALUSrc, MemToReg, RegDst, RegWrite, MemRead, MemWrite, Branch, Jump, Operation (3-bit)

Generating the Control Signals
All signals depend on the instruction, i.e. on a total of 12 bits (the 6-bit op_code plus the 6-bit function code): complex.
Note that the non-ALU signals depend only on the 6-bit op_code: simpler.
Hence, split the control into a main control unit that sees only the opcode, and an auxiliary one that sees the function code. The two communicate via a new signal, ALUop.

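Splitting the Control
[Figure: instruction bits 31-26 enter the Main Control Unit, which outputs 8 control signals and the 2-bit ALUop; ALUop and instruction bits 5-0 enter the ALU Control Unit, which outputs the 3-bit Operation]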

The Operation Signal
A 3-bit signal through which the auxiliary control unit tells the ALU to:
000 = and
001 = or
010 = add
110 = sub
111 = slt
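A behavioural model makes the Operation encoding concrete. The sketch below assumes the encodings 000 = and, 001 = or, 010 = add, 110 = sub, 111 = slt, 32-bit wraparound arithmetic, and a signed comparison for slt; the names `alu` and `to_signed` are illustrative only.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(operation: int, a: int, b: int) -> Tuple[int, bool]:
    """Return (ALU result, Zero flag) for a 3-bit Operation code."""
    if operation == 0b000:
        result = a & b
    elif operation == 0b001:
        result = a | b
    elif operation == 0b010:
        result = a + b
    elif operation == 0b110:
        result = a - b
    elif operation == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unsupported Operation code: {operation:03b}")
    result &= MASK32          # keep the result to 32 bits
    return result, result == 0


difference, zero = alu(0b110, 5, 5)   # subtraction of equal values sets Zero
```

The Zero output of a subtraction is what a beq instruction uses to compare its two registers.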

The ALUop Signal
A 2-bit signal through which the main control unit tells the auxiliary to:
00 = add (no matter what the fun_code is)
01 = subtract (no matter what the fun_code is)
10 = R-Type (follow the fun_code)

The Main Control Unit
Input: instruction bits 31-26
Combinational logic
Outputs: RegDst, ALUsrc, MemToReg, RegWrite, MemRead, MemWrite, Branch, Jump, ALUop-1, ALUop-0

The Main Control Unit (1)
Inputs of Control Unit:
Instruction  Opcode (decimal)  Op5 Op4 Op3 Op2 Op1 Op0
R-format     0                 0   0   0   0   0   0
lw           35                1   0   0   0   1   1
sw           43                1   0   1   0   1   1
beq          4                 0   0   0   1   0   0

Outputs of Control Unit:
Instruction  RegDst ALUSrc MemtoReg RegWrite MemRd MemWrt Branch ALUOp1 ALUOp0
R-format     1      0      0        1        0     0      0      1      0
lw           0      1      1        1        1     0      0      0      0
sw           X      1      X        0        0     1      0      0      0
beq          X      0      X        0        0     0      1      0      1
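Since the main control unit is purely combinational, it can be modelled as a lookup from opcode to output signals. The following sketch transcribes the standard truth table for these four instruction classes; `None` stands for a don't-care (X), and the names `CONTROL` and `main_control` are hypothetical.

```python
# Opcode (decimal) -> control outputs. None marks a don't-care (X) entry.
CONTROL = {
    0: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
            MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),        # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),       # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),       # sw
    4: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
            MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),        # beq
}


def main_control(opcode: int) -> dict:
    """Return the control signals for a 6-bit opcode."""
    try:
        return CONTROL[opcode]
    except KeyError:
        raise ValueError(f"opcode {opcode} is not in the modelled subset")


sw_signals = main_control(43)
```

Only the opcode is consulted here; the function code of R-format instructions is left to the auxiliary (ALU) control unit.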

The Main Control Unit (2)
[Figure: gate-level implementation. Inputs Op5-Op0 are decoded into the lines R-format, lw, sw, and beq, which are combined to produce the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, and ALUOp0.]

ALU Control
[Figure: Auxiliary Control Unit with inputs ALUop-1, ALUop-0, and instruction bits 5-0; outputs Operation-2, Operation-1, Operation-0]

Aux Controller Implementation (1)
Instruction  ALUOp  (ALUOp1 ALUOp0)  Function field (F5-F0)  Desired ALU action  Operation (Op3-Op0)
lw (I)       00     (0 0)            X X X X X X             add                 0 0 1 0
sw (I)       00     (0 0)            X X X X X X             add                 0 0 1 0
beq (I)      01     (0 1)            X X X X X X             sub                 0 1 1 0
add (32)     10     (1 0)            X X 0 0 0 0             add                 0 0 1 0
sub (34)     1X     (1 0)            X X 0 0 1 0             sub                 0 1 1 0
and (36)     10     (1 0)            X X 0 1 0 0             and                 0 0 0 0
or (37)      10     (1 0)            X X 0 1 0 1             or                  0 0 0 1
slt (42)     1X     (1 1)            X X 1 0 1 0             slt                 0 1 1 1
The leading bit (Op3) is 0 in every row, so the low three bits match the 3-bit Operation codes.
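The auxiliary controller's behaviour can be written as a small function of ALUop and the function code. This is a sketch, not the gate-level circuit: `alu_control` is a hypothetical name, the function codes 32, 34, 36, 37, and 42 are those listed for add, sub, and, or, and slt, and the returned value is the 3-bit Operation (the 4-bit form simply has a leading 0).

```python
# Function code (decimal) -> 3-bit Operation, used only for R-type instructions.
FUNCT_TO_OPERATION = {
    32: 0b010,  # add
    34: 0b110,  # sub
    36: 0b000,  # and
    37: 0b001,  # or
    42: 0b111,  # slt
}


def alu_control(aluop: int, funct: int) -> int:
    """Map the 2-bit ALUop and the 6-bit function code to the Operation signal."""
    if aluop == 0b00:
        return 0b010              # add, whatever the function code is (lw, sw)
    if aluop == 0b01:
        return 0b110              # subtract, whatever the function code is (beq)
    # ALUop = 1X: R-type, so follow the function code.
    try:
        return FUNCT_TO_OPERATION[funct & 0x3F]
    except KeyError:
        raise ValueError(f"function code {funct} is not in the modelled subset")


slt_operation = alu_control(0b10, 42)
```

Note that for ALUop 00 and 01 the function code argument is ignored, which is exactly why the main control unit never needs to look at bits 5-0.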

Aux Controller Implementation (2)
[Figure: gate-level ALU control block. Inputs: ALUOp1, ALUOp0, and function bits F3-F0 taken from F (5-0). Outputs: Operation2, Operation1, Operation0.]

The Single-Cycle Performance

Performance Analysis
Load = 5 functional units: instruction fetch, register access, ALU, data memory access, register access
Store = 4 functional units: instruction fetch, register access, ALU, data memory access
R-type = 4 functional units: instruction fetch, register access, ALU, register access
Branch = 3 functional units: instruction fetch, register access, ALU
Jump = 1 functional unit: instruction fetch

Component Delays
RF = 50 ps, ALU = 100 ps, and MEM (both IM and DM) = 200 ps.
Compute the CPU time to execute various instructions: j, beq, add, sw, lw
Compute the max GHz for the CPU clock
Answer: 1.66 GHz (the slowest instruction, lw, takes 200 + 50 + 100 + 200 + 50 = 600 ps, and 1 / 600 ps is about 1.66 GHz)
Critique of S/Cycle
+ very simple
- caters to the slowest
- h/w redundancy
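The exercise can be worked through by summing the delays of the functional units each instruction uses. The sketch below assumes delays of RF = 50 ps, ALU = 100 ps, and IM = DM = 200 ps, and the unit sequences from the performance analysis (load: 5 units, store: 4, R-type: 4, branch: 3, jump: 1); the names `DELAY_PS`, `PATHS`, and `latency_ps` are illustrative.

```python
# Assumed component delays in picoseconds.
DELAY_PS = {"IM": 200, "RF": 50, "ALU": 100, "DM": 200}

# Functional units used by each instruction, in order.
PATHS = {
    "j": ["IM"],
    "beq": ["IM", "RF", "ALU"],
    "add": ["IM", "RF", "ALU", "RF"],
    "sw": ["IM", "RF", "ALU", "DM"],
    "lw": ["IM", "RF", "ALU", "DM", "RF"],
}


def latency_ps(instruction: str) -> int:
    """Time one instruction needs to pass through all of its units."""
    return sum(DELAY_PS[unit] for unit in PATHS[instruction])


latencies = {name: latency_ps(name) for name in PATHS}

# A single-cycle clock must accommodate the slowest instruction.
cycle_ps = max(latencies.values())

# 1 / (1 ps) = 1000 GHz, so frequency in GHz = 1000 / period in ps.
max_ghz = 1000 / cycle_ps
```

Under these assumptions every instruction, including the 200 ps jump, is charged the full cycle of the slowest one, which is the "caters to the slowest" criticism in numbers.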