CSE 202 Computer Organization CPU DESIGN The Single-Cycle Implementation Shakil M. Khan (adapted from Prof. H. Roumani) Dept of CS & Eng, York University
Sequential vs. Combinational Circuits
Digital circuits can be classified into two categories:
1. Combinational circuits (output depends only on the current inputs): mux, ALU
2. Sequential circuits (output also depends on stored state): flip-flops, registers, memory
CSE-202 June-28 202
Clocks
Periodic signal oscillating between low and high states with a fixed cycle time
Clock frequency = inverse of clock cycle time
[Figure: clock waveform with labels: falling edge, rising edge, clock period]
Clock controls when the state of a memory element changes
[Figure: State element 1 -> Combinational logic -> State element 2, within one clock cycle]
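The relation between cycle time and frequency can be checked with a few lines of Python. This is only a sketch: the function name and the 500 ps example value are illustrative assumptions, not taken from the slides.

```python
def clock_frequency_hz(cycle_time_s):
    """Return the clock frequency in Hz for a cycle time given in seconds."""
    if cycle_time_s <= 0:
        raise ValueError("cycle time must be positive")
    return 1.0 / cycle_time_s


# Illustrative value (an assumption): a 500 ps clock cycle.
freq = clock_frequency_hz(500e-12)
print(f"{freq / 1e9:.2f} GHz")
```

A shorter cycle time gives a higher frequency, which is why the slowest path through the circuit limits the clock rate.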
CPU DESIGN
The Datapath
Single-Cycle Control
Performance
Focus on the subset: addi, add/sub/and/or/slt, lw/sw, beq, j
Building the Datapath
The Basic Datapath Components (1)
1. Program counter (PC): contains the address of the next instruction
2. Sign-extension unit: extends a 16-bit integer to a 32-bit integer
3. Adder: adds two 32-bit integers (output: Sum)
4. ALU: add/subtract/and/or/compare two 32-bit integers (input: 3-bit ALU control; outputs: ALU result, Zero)
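As a rough illustration of what the sign-extension unit does, the sketch below copies bit 15 of a 16-bit value into the upper 16 bits. The function names are hypothetical, and the result is returned as a 32-bit pattern held in a Python int.

```python
def sign_extend_16_to_32(value):
    """Extend a 16-bit two's-complement value to a 32-bit bit pattern."""
    value &= 0xFFFF                  # keep only the low 16 bits
    if value & 0x8000:               # sign bit set: fill upper half with 1s
        return value | 0xFFFF0000
    return value                     # sign bit clear: upper half stays 0


def as_signed_32(pattern):
    """Interpret a 32-bit pattern as a signed integer (for display only)."""
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


print(hex(sign_extend_16_to_32(0xFFFC)), as_signed_32(sign_extend_16_to_32(0xFFFC)))
```

The numeric value is unchanged by the extension; only the width of the representation grows.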
The Basic Datapath Components (2)
5. Instruction memory: input Instruction address; output Instruction
6. Data memory unit: inputs Address, Write data; output Read data; controls MemWrite, MemRead
7. Register file: inputs Read register 1, Read register 2, Write register (5-bit register numbers), Write data; outputs Read data 1, Read data 2; control RegWrite
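A behavioural model can make the register file's ports concrete. The class below is a minimal sketch, not the hardware itself: the class and method names are assumptions, and it follows the usual MIPS convention that register 0 always reads as zero.

```python
class RegisterFile:
    """32 registers of 32 bits, with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        """Return (Read data 1, Read data 2) for two 5-bit register numbers."""
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def write(self, write_reg, write_data, reg_write):
        """Store write_data only when RegWrite is asserted."""
        index = write_reg & 0x1F
        if reg_write and index != 0:     # register 0 is hardwired to zero
            self.regs[index] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(8, 42, reg_write=1)   # written, since RegWrite = 1
rf.write(9, 7, reg_write=0)    # ignored, since RegWrite = 0
print(rf.read(8, 9))
```

Reads are unconditional, while writes are gated by RegWrite, which is why only the write side needs a control signal.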
The Basic Datapath Components (3)
[Figure: block symbols for PC, IM (instruction memory), RF (register file), ALU, and a bus]
The Basic Datapath [computational R-Type]
[Figure: PC -> IM -> RF -> ALU]
Recall the ML Formats:
Format  Fields (width in bits)
R       opcode (6) | rs (5) | rt (5) | rd (5) | sa (5) | funcode (6)
I       opcode (6) | rs (5) | rt (5) | immediate (16)
J       opcode (6) | immediate (26)
Register rs = source, rt = target, rd = destination.
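The field boundaries can be illustrated by slicing a 32-bit instruction word with shifts and masks. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, and the example word 0x02044020 is a hand-assembled encoding of add $t0, $s0, $a0 using the standard MIPS register numbers ($t0 = 8, $s0 = 16, $a0 = 4).

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its R/I/J format fields."""
    instr &= 0xFFFFFFFF
    return {
        "opcode": (instr >> 26) & 0x3F,      # bits 31-26
        "rs": (instr >> 21) & 0x1F,          # bits 25-21
        "rt": (instr >> 16) & 0x1F,          # bits 20-16
        "rd": (instr >> 11) & 0x1F,          # bits 15-11 (R format)
        "sa": (instr >> 6) & 0x1F,           # bits 10-6  (R format)
        "funcode": instr & 0x3F,             # bits 5-0   (R format)
        "immediate16": instr & 0xFFFF,       # bits 15-0  (I format)
        "immediate26": instr & 0x03FFFFFF,   # bits 25-0  (J format)
    }


fields = decode(0x02044020)  # add $t0, $s0, $a0
print(fields["opcode"], fields["rs"], fields["rt"], fields["rd"], fields["funcode"])
```

All fields are extracted regardless of format; the opcode then tells the control which of them are meaningful.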
The Basic Datapath [computational R-Type]
[Figure: PC -> IM -> RF -> ALU; instruction bits 25-21 and 20-16 select the two read registers, bits 15-11 select the write register]
The PC Circuitry
[Figure: the R-type datapath with an adder that adds 4 to the PC]
Add support for computational I-Types
[Figure, step 1: a mux (inputs 0/1) on the RF write-register input selects between instruction bits 20-16 and 15-11]
[Figure, step 2: a sign-extension unit (SE) and a second mux (inputs 0/1) on the ALU's second operand are added]
Add support for lw
[Figure, step 1: a data memory (DM) is added after the ALU]
[Figure, step 2: a mux (inputs 0/1) is added after the DM]
Add support for sw
[Figure: same datapath as for lw: PC, +4 adder, IM, RF, ALU, DM, SE, and muxes]
Add support for branch
[Figure: a shift-left-2 unit (sll) and a mux (inputs 0/1) on the PC input are added to the datapath]
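The branch circuitry chooses between PC + 4 and a branch target. The sketch below assumes the standard MIPS rule that the target is PC + 4 plus the sign-extended word offset shifted left by 2, and that the branch is taken only when both the Branch signal and the ALU's Zero output are 1. Function names and the example addresses are illustrative.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, imm16, branch, zero):
    """Select the next PC the way the PC mux does for beq."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Word offset -> byte offset: shift left by 2 (the sll unit).
    branch_target = (pc_plus_4 + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF
    pc_src = branch & zero               # mux select: Branch AND Zero
    return branch_target if pc_src else pc_plus_4


print(hex(next_pc(0x00400000, 3, branch=1, zero=1)))   # branch taken
print(hex(next_pc(0x00400000, 3, branch=1, zero=0)))   # branch not taken
```

Because the offset counts words relative to PC + 4, an offset of 0xFFFF (that is, -1) branches back to the beq instruction itself.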
Combined Datapath (w/o Jump)
[Figure: combined single-cycle datapath without jump. Components: PC, instruction memory, registers, sign extend (16 to 32), ALU, data memory, PC + 4 adder, shift left 2, branch adder, and three muxes. Control signals: PCSrc, RegWrite, ALUSrc, ALU operation (3 bits), MemRead, MemWrite, MemtoReg. ALU outputs: Zero, ALU result.]
add/sub/or/and/slt $s1, $s2, $s3
[Figure: the combined datapath, shown for an R-type instruction]
lw $s1, offset($s2)
[Figure: the combined datapath, shown for lw]
sw $s1, offset($s2)
[Figure: the combined datapath, shown for sw]
beq $s1, $s2, w_offset
[Figure: the combined datapath, shown for beq]
[Figure: complete single-cycle datapath with control, including jump. Instruction [31-26] feeds the Control unit, which outputs RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instruction [25-21] and [20-16] select the read registers; a mux chooses Instruction [20-16] or [15-11] as the write register. Instruction [15-0] is sign-extended from 16 to 32 bits; Instruction [5-0] feeds the ALU control. Jump address [31-0] is formed from PC+4 [31-28] and Instruction [25-0] shifted left 2 (26 to 28 bits).]
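The jump address formation in the figure can be sketched as bit operations: the upper 4 bits come from PC + 4 and the lower 28 bits from the 26-bit immediate shifted left by 2. The function name and example values are assumptions; the example word uses opcode 2, the standard MIPS encoding of j.

```python
def jump_target(pc, instr):
    """Form the 32-bit jump address from PC+4 [31-28] and Instruction [25-0]."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000          # PC+4 [31-28]
    lower28 = (instr & 0x03FFFFFF) << 2      # Instruction [25-0], shifted left 2
    return upper4 | lower28


# Illustrative jump word: opcode 2 (j) with a 26-bit immediate of 0x0100004.
j_instr = (2 << 26) | 0x0100004
print(hex(jump_target(0x00400000, j_instr)))
```

Since the upper 4 bits are inherited from the PC, a jump can only reach addresses within the current 256 MB region.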
Building the Control
Control
[Figure: the datapath (PC, +4 adder, sll, IM, RF, ALU, DM, SE, and muxes with inputs 0/1) with clk inputs shown]
Exercise: add $t0, $s0, $a0
SIGNAL               VALUE
ALUSrc               ____
MemToReg             ____
RegDst               ____
RegWrite             ____
MemRead              ____
MemWrite             ____
Branch               ____
Jump                 ____
Operation (3-bit)    ____

Exercise: sw $t0, 500($s0)
SIGNAL               VALUE
ALUSrc               ____
MemToReg             ____
RegDst               ____
RegWrite             ____
MemRead              ____
MemWrite             ____
Branch               ____
Jump                 ____
Operation (3-bit)    ____

Exercise: beq $t0, $s0, 40
SIGNAL               VALUE
ALUSrc               ____
MemToReg             ____
RegDst               ____
RegWrite             ____
MemRead              ____
MemWrite             ____
Branch               ____
Jump                 ____
Operation (3-bit)    ____
Generating the Control Signals
All signals depend on the instruction, i.e. on a total of 12 bits (6-bit opcode plus 6-bit function code), which makes the control complex.
Note that the non-ALU signals depend only on the 6-bit opcode, which is simpler.
Hence, split the control into a main control unit that sees only the opcode, and an auxiliary one that sees the function code.
The two communicate via a new signal, ALUop.
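Splitting the Control
[Figure: instruction bits 31-26 enter the Main Control Unit, which outputs 8 control signals plus the 2-bit ALUop. ALUop and instruction bits 5-0 enter the ALU Control Unit, which outputs the 3-bit Operation.]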
The Operation Signal
A 3-bit signal through which the auxiliary control unit tells the ALU to:
000 = and
001 = or
010 = add
110 = sub
111 = slt
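A behavioural sketch of an ALU driven by the 3-bit Operation signal is given below. It assumes the standard MIPS encoding (000 and, 001 or, 010 add, 110 sub, 111 slt) and treats slt as a signed comparison; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(operation, a, b):
    """Return (ALU result, Zero) for a 3-bit Operation code."""
    a &= MASK32
    b &= MASK32
    if operation == 0b000:
        result = a & b
    elif operation == 0b001:
        result = a | b
    elif operation == 0b010:
        result = (a + b) & MASK32
    elif operation == 0b110:
        result = (a - b) & MASK32
    elif operation == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported Operation code")
    zero = 1 if result == 0 else 0       # Zero output is 1 when the result is 0
    return result, zero


print(alu(0b010, 2, 3))   # add
print(alu(0b110, 7, 7))   # sub with equal operands sets Zero
```

The Zero output is what beq relies on: subtracting two equal registers yields 0 and therefore Zero = 1.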
The ALUop Signal
A 2-bit signal through which the main control unit tells the auxiliary to:
00 = add (no matter what the fun_code is)
01 = subtract (no matter what the fun_code is)
10 = R-Type (follow the fun_code)
The Main Control Unit
[Figure: instruction bits 31-26 enter a combinational logic block whose outputs are RegDst, ALUsrc, MemToReg, RegWrite, MemRead, MemWrite, Branch, Jump, ALUop-1, and ALUop-0]
The Main Control Unit (1)
Inputs of Control Unit:
Instruction  Opcode (decimal)  Op5 Op4 Op3 Op2 Op1 Op0
R-format     0                 0   0   0   0   0   0
lw           35                1   0   0   0   1   1
sw           43                1   0   1   0   1   1
beq          4                 0   0   0   1   0   0
Outputs of Control Unit:
Instruction  RegDst ALUSrc MemtoReg RegWrite MemRd MemWrt Branch ALUOp1 ALUOp0
R-format     1      0      0        1        0     0      0      1      0
lw           0      1      1        1        1     0      0      0      0
sw           X      1      X        0        0     1      0      0      0
beq          X      0      X        0        0     0      1      0      1
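The main control unit is purely combinational, so it can be modelled as a lookup from opcode to output signals. The sketch below covers only the four rows of the table (R-format, lw, sw, beq); the names are assumptions, and don't-care outputs (X) are represented as None.

```python
X = None  # don't care

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Opcode (decimal) -> output values, in the order given by SIGNALS.
CONTROL_TABLE = {
    0:  (1, 0, 0, 1, 0, 0, 0, 1, 0),   # R-format
    35: (0, 1, 1, 1, 1, 0, 0, 0, 0),   # lw
    43: (X, 1, X, 0, 0, 1, 0, 0, 0),   # sw
    4:  (X, 0, X, 0, 0, 0, 1, 0, 1),   # beq
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode as a dictionary."""
    if opcode not in CONTROL_TABLE:
        raise ValueError("opcode not covered by this sketch")
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


print(main_control(35))   # signals for lw
```

Only lw asserts MemRead and only sw asserts MemWrite, while RegWrite is asserted for the two instructions that produce a register result.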
The Main Control Unit (2)
[Figure: logic implementation of the main control unit. Inputs: Op5, Op4, Op3, Op2, Op1, Op0. Decoded lines: R-format, lw, sw, beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]
ALU Control
[Figure: ALUop-1, ALUop-0, and instruction bits 5-0 enter the Auxiliary Control Unit, which outputs Operation-2, Operation-1, and Operation-0]
Aux Controller Implementation (1)
Instruction (opcode)  ALUOp (ALUOp1 ALUOp0)  Function field (F5-F0)  Desired ALU action  Operation (Op3-Op0)
lw (I)                00                     XXXXXX                  add                 0010
sw (I)                00                     XXXXXX                  add                 0010
beq (I)               01                     XXXXXX                  sub                 0110
add (32)              10                     XX0000                  add                 0010
sub (34)              10                     XX0010                  sub                 0110
and (36)              10                     XX0100                  and                 0000
or (37)               10                     XX0101                  or                  0001
slt (42)              10                     XX1010                  slt                 0111
Op3 is 0 for every instruction in this subset, so the low three bits give the 3-bit Operation code.
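The auxiliary controller's behaviour can be sketched as a small function from ALUOp and the function code to the 3-bit Operation (Operation2 to Operation0). The function and table names are hypothetical; function-code bits F5 and F4 are ignored, matching the don't-care entries in the table.

```python
# Low four function-code bits (F3-F0) -> 3-bit Operation, for R-type only.
FUNCT_TO_OPERATION = {
    0b0000: 0b010,   # add (function code 32)
    0b0010: 0b110,   # sub (function code 34)
    0b0100: 0b000,   # and (function code 36)
    0b0101: 0b001,   # or  (function code 37)
    0b1010: 0b111,   # slt (function code 42)
}


def alu_control(aluop, funct):
    """Return the 3-bit Operation for a 2-bit ALUOp and 6-bit function code."""
    if aluop == 0b00:                    # lw/sw: always add
        return 0b010
    if aluop == 0b01:                    # beq: always subtract
        return 0b110
    if aluop == 0b10:                    # R-type: follow the function code
        low4 = funct & 0xF               # F5 and F4 are don't cares
        if low4 not in FUNCT_TO_OPERATION:
            raise ValueError("function code not covered by this sketch")
        return FUNCT_TO_OPERATION[low4]
    raise ValueError("ALUOp value not used in this design")


print(format(alu_control(0b10, 42), "03b"))   # slt
```

For ALUOp 00 and 01 the function code is never inspected, which is what lets lw, sw, and beq reuse the ALU without having a function field of their own.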
Aux Controller Implementation (2)
[Figure: gate-level implementation of the ALU control block. Inputs: ALUOp1, ALUOp0, and F3, F2, F1, F0 of the function field F (5-0). Outputs: Operation2, Operation1, Operation0.]
The Single-Cycle Performance
Performance Analysis
Load = 5 functional units: instruction fetch, register access, ALU, data memory access, register access
Store = 4 functional units: instruction fetch, register access, ALU, data memory access
R-type = 4 functional units: instruction fetch, register access, ALU, register access
Branch = 3 functional units: instruction fetch, register access, ALU
Jump = 1 functional unit: instruction fetch
Component Delays
RF = 50 ps, ALU = 100 ps, and MEM (both IM and DM) = 200 ps.
Compute the CPU time to execute each of these instructions: j, beq, add, sw, lw
Compute the maximum clock rate (GHz) for the CPU
Answer: 1.66 GHz (set by lw: 200 + 50 + 100 + 200 + 50 = 600 ps)
Critique of Single-Cycle
+ very simple
- clock cycle caters to the slowest instruction
- h/w redundancy
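The computation can be worked through in a few lines, assuming delays of RF = 50 ps, ALU = 100 ps, and IM = DM = 200 ps, and assuming each instruction uses the functional units listed under Performance Analysis in sequence. The dictionary and function names are illustrative.

```python
DELAY_PS = {"IM": 200, "RF": 50, "ALU": 100, "DM": 200}

# Functional units used by each instruction, in order.
UNITS = {
    "lw":  ["IM", "RF", "ALU", "DM", "RF"],
    "sw":  ["IM", "RF", "ALU", "DM"],
    "add": ["IM", "RF", "ALU", "RF"],
    "beq": ["IM", "RF", "ALU"],
    "j":   ["IM"],
}


def instruction_time_ps(name):
    """Sum the delays of the functional units an instruction passes through."""
    return sum(DELAY_PS[unit] for unit in UNITS[name])


times = {name: instruction_time_ps(name) for name in UNITS}
cycle_ps = max(times.values())           # the clock must fit the slowest one
max_ghz = 1000.0 / cycle_ps              # 1 / (cycle in ps) expressed in GHz

for name, t in times.items():
    print(f"{name}: {t} ps")
print(f"cycle = {cycle_ps} ps, max clock = {max_ghz:.3f} GHz")
```

The slowest instruction is lw at 600 ps, so the clock cannot exceed 1/600 ps, about 1.67 GHz, which agrees with the slide's 1.66 GHz up to rounding. Every other instruction then wastes part of the cycle, which is the "caters to the slowest" critique.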