ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

Similar documents
CMP N 301 Computer Architecture. Appendix C

Load. Load. Load 1 0 MUX B. MB select. Bus A. A B n H select S 2:0 C S. G select 4 V C N Z. unit (ALU) G. Zero Detect.

Lecture: Pipelining Basics

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

3. (2) What is the difference between fixed and hybrid instructions?

4. (3) What do we mean when we say something is an N-operand machine?

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5.

[2] Predicting the direction of a branch is not enough. What else is necessary?

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

Simple Instruction-Pipelining. Pipelined Harvard Datapath

COE 328 Final Exam 2008

[2] Predicting the direction of a branch is not enough. What else is necessary?

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Project Two RISC Processor Implementation ECE 485

Computer Architecture ELEC2401 & ELEC3441

CSCI-564 Advanced Computer Architecture

ECE 448 Lecture 6. Finite State Machines. State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL Code. George Mason University


Table of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25

ICS 233 Computer Architecture & Assembly Language

CPSC 3300 Spring 2017 Exam 2

Solutions to Problems Marked with a * in Logic and Computer Design Fundamentals, 4th Edition Chapter Pearson Education, Inc.

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

ENEE350 Lecture Notes-Weeks 14 and 15

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures

Microprocessor Power Analysis by Labeled Simulation

Assignment # 3 - CSI 2111(Solutions)

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

Department of Electrical and Computer Engineering The University of Texas at Austin

Processor Design & ALU Design

CPU DESIGN The Single-Cycle Implementation

Implementing the Controller. Harvard-Style Datapath for DLX

Computer Architecture

7 Multipliers and their VHDL representation

EE 660: Computer Architecture Out-of-Order Processors

L07-L09 recap: Fundamental lesson(s)!

UNIVERSITY OF WISCONSIN MADISON

Lecture 3, Performance

Luleå Tekniska Universitet Kurskod SMD098 Tentamensdatum

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Problem Set 6 Solutions

Unit 16 Problem Solutions

Lecture 3, Performance

CMP 338: Third Class

/ : Computer Architecture and Design

CMP 334: Seventh Class

Performance, Power & Energy

ALU A functional unit

Basic Computer Organization and Design Part 3/3

Digital Control of Electric Drives

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018

SISD SIMD. Flynn s Classification 8/8/2016. CS528 Parallel Architecture Classification & Single Core Architecture C P M

Figure 4.9 MARIE s Datapath

A Second Datapath Example YH16

Preparation of Examination Questions and Exercises: Solutions

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1

2

Simple Instruction-Pipelining (cont.) Pipelining Jumps

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

Goals for Performance Lecture

Unit 6: Branch Prediction

COMP303 Computer Architecture Lecture 11. An Overview of Pipelining

EC 413 Computer Organization

Lecture 13: Sequential Circuits, FSM

Introduction to the Xilinx Spartan-3E

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

PERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah

Experiment 4 Decoder Encoder Design using VHDL

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

Practice Homework Solution for Module 4

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Chapter 9. Counters and Shift Registers. Counters and Shift Registers

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Lecture 13: Sequential Circuits, FSM

ECE 5775 (Fall 17) High-Level Digital Design Automation. Scheduling: Exact Methods

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Spiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller

Acknowledgment. DLD Lab. This set of slides on VHDL are due to Brown and Vranesic.

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it...

Mark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions

CPE100: Digital Logic Design I

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

CSE140: Design of Sequential Logic

Design. Dr. A. Sahu. Indian Institute of Technology Guwahati

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

Using Global Clock Networks

Enrico Nardelli Logic Circuits and Computer Architecture

Parity Checker Example. EECS150 - Digital Design Lecture 9 - Finite State Machines 1. Formal Design Process. Formal Design Process

Transcription:

ECE 3401 Lecture 23 Pipeline Design Control State Register Combinational Control Logic New/ Modified Control Word ISA: Instruction Specifications (for reference) P C P C + 1 I N F I R M [ P C ] E X 0 PC<-PC+1 I n st ruction Speci fications for the Simple Comput er - Part 1 Instr u ctio n O pc ode Mnem on ic Form a t D escrip tion St a t u s Bits Move A 0000000 MO V A RD,RA R [DR] R[SA ] N, Z Increment 0000001 INC R D, RA R[DR] R [ SA] + 1 N, Z Add 0000010 ADD R D, RA,RB R [DR] R[SA ] + R[ SB] N, Z Subtr a ct 0000101 SUB R D, RA,RB R [DR] R[SA ] - [ SB] N, Z D e crement 0000110 DEC R D, RA R[DR] R[SA ] - R 1 N, Z AND 0001000 AND R D, RA,RB R [DR] R[SA ] R[SB ] N, Z O R 0001001 OR RD,RA,RB R[DR] R[SA ] R[SB ] N, Z Exclusive OR 0001010 XOR R D, RA,RB R [DR] R[SA ] R[SB] N, Z NO T 0001011 NO T R D, RA R[DR] R[SA ] N, Z I n st ruction Speci fications for the Simple Comput er - Part 2 Instr u ctio n O pc ode Mnem on ic Form a t D escrip tion Move B 0001100 MO VB RD,RB R [DR] R[SB] Shift Right 0001101 SHR R D, RB R[DR] sr R[SB] Shift Left 0001110 SHL R D, RB R[DR] sl R[SB] Load Imm e diate 1001100 LDI R D, O P R[DR] zf OP Add Immediate 1000010 ADI R D, RA,OP R [DR] R[SA] + zf OP Load 0010000 LD RD,RA R [DR] M[ SA ] Store 0100000 ST RA,RB M [SA] R[SB] Branch on Zero 1100000 BRZ R A,AD if (R[ S A] = 0) PC PC + s e A D Branch on Negative 1100001 BRN R A,AD if (R[ S A] < 0) PC PC + s e A D J u mp 1110000 JMP R A P C R[SA ] St a t u s Bits + R [ S B ] 1 v R [ S B ] + 1 + R [ S B ] + 1 R [ S B ] + R [ S B ] 0 0 0 0 0 0 0 1 0 0 01 0 0 0 10 1 0 0 1 0 0 01 0 0 0 01 0 1 0 01 01 0 0 01 01 O p c o d e 000 1 00 0010 000 0100 000 1001 100 1000 010 1100 000 1100 001 1 1 1 0 0 0 0 0 0 R [ D R ] [ R [ S A ] ] M R [ D R ] Z N zf OP 1 1 M PC R [ D R ] R [ S B ] [ R [ S A ] ] PC + se AD PC R [ S A ] R [ S B ] R [ D R ] R [ S A ] + zf OP To INF State Table for 2-Cycle Instructions Control Unit I n p u t s O u t p u t s N e x t S t a t e s t a t e I P M M R M M O p c o d e V C N Z L S D X A X B X B F S D W M W C o m m e n t s I N F X X X X X X EX0 1 00 X X X X X X X X X X 0 1 0 I R M [ PC ] E X 0 0000000 X X INF 0 01 0 X X 0 X X X X X 0000 0 1 X 0 M O V A R [DR] R [SA]* E X 0 0000001 X X INF 0 01 0 X X 0 X X X X X 0001 0 1 X 0 I N C R [DR] R [S A ] + 1* E X 0 0000010 X X INF 0 01 0 X X 0 X X 0 X X 0 0010 0 1 X 0 A D D R [DR] R [S A ] + R [S B ]* E X 0 0000101 X X INF 0 01 0 X X 0 X X 0 X X 0 0101 0 1 X 0 S U B R [DR} R [S A ] + R [ S B ] + 1* E X 0 0000110 X X INF 0 01 0 X X 0 X X X X X 0110 0 1 X 0 D E C R [DR] R [S A ] + (- 1) * E X 0 0001000 X X INF 0 01 0 X X 0 X X 0 X X 0 1000 0 1 X 0 A N D R [DR] R [SA] ^ R [S B ]* E X 0 0001001 X X INF 0 01 0 X X 0 X X 0 X X 0 1001 0 1 X 0 O R R [DR] R [SA] v R [S B ]* E X 0 0001010 X X INF 0 01 0 X X 0 X X 0 X X 0 1010 0 1 X 0 X O R R [DR] R [SA] + R [S B ]* E X 0 0001011 X X INF 0 01 0 X X 0 X X X X X 1011 0 1 X 0 N O T R [DR] R [ S A ] * E X 0 0001100 X X INF 0 01 0 X X X X 0 X X 0 1100 0 1 X 0 M O V B R [DR] R [S B ]* E X 0 0010000 X X INF 0 01 0 X X 0 X X X X X X X 1 1 0 0 L D R [DR] M [ R [SA]]* E X 0 0100000 X X INF 0 01 X X 0 X X 0 X X 0 X X X 0 0 1 S T M [ R [SA]] R [S B ]* E X 0 1001100 X X INF 0 01 0 X X X X X X 1 1100 0 1 0 0 LDI R [DR] z f OP * E X 0 1000010 X X INF 0 01 0 X X 0 X X X X 1 0010 0 1 0 0 ADI R [DR] R [S A ] + z f OP * E X 0 1100000 X X 1 I N F 0 10 X X 0 X X X X X 0000 X 0 0 0 B R Z PC PC + s e A D E X 0 1100000 X X 0 I N F 0 01 X X 0 X X X X X 0000 X 0 0 0 B R Z PC PC + 1 E X 0 1100001 X 1 X INF 0 10 X X 0 X X X X X 0000 X 0 0 0 B R N PC PC + s e A D E X 0 1100001 X 0 X INF 0 01 X X 0 X X X X X 0000 X 0 0 0 B R N PC PC + 1 E X 0 1110000 X X INF 0 11 X X 0 X X X X X 0000 X 0 0 0 J M P PC R [S A ] * For this state and input combinations, PC PC+1 also occurs ----------------------------------- -- controller -- ----------------------------------- library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; -- Uncomment the following lines to use the declarations that are -- provided for instantiating Xilinx primitive components. --library UNISIM; --use UNISIM.VComponents.all; entity controller is Port (clk : in std_logic; opcode : in std_logic_vector(6 downto 0); reset : in std_logic; carry : in std_logic; neg : in std_logic; zero : in std_logic; overflw : in std_logic; IL : out std_logic; PS : out std_logic_vector(1 downto 0); DX : out std_logic_vector(3 downto 0); AX : out std_logic_vector(3 downto 0); BX : out std_logic_vector(3 downto 0); FS : out std_logic_vector(3 downto 0); MB : out std_logic; MD : out std_logic; RW : out std_logic; MM: out std_logic; MW: out std_logic) end controller; architecture Behavioral of controller is type state_type is (RES, INF, EX0); attribute enum_encoding : string; attribute enum_encoding of state_type : type is 0001 0010 0100"; signal cur_state, next_state : state_type; begin state_register: process(clk, reset) begin if(reset='1') then cur_state<=res; elsif (clk'event and clk='1') then cur_state<=next_state; end if; end process; out_func: process (cur_state, opcode, carry, zero, neg, overflw ) begin (IL, PS, MB, MD, RW, MM, MW) <= std_logic_vector'( 00000000"); (DX, AX, BX)<=std_logic_vector (X 000 ); FS<= 0000"; case cur_state is when RES => next_state <= INF; when INF => next_state<=ex0; MM <= 1 ; IL <= 1 ; when EX0 => next_state<= INF; case opcode is when 0000000" => PS <= 01 ; RW <= 1 ; when 0000001 => PS <= 01 ; RW <= 1 ; FS <= 0001 ;.. when 1100000 => if (zero = 1 ) then PS <= 10 ; else PS <= 01 ; end if;.. when others=> report "Unrecognizable state" severity error; end case; end case; end process; end Behavioral; 1

Outline for Pipelined Design Abstract View of Critical Path Pipelined Design Basic 5-stage pipe Speedup of pipelined vs non-pipelined implementations Pipeline hazards Structural, data, control Parallel digital systems 7 8 Pipelined critical path Steps in Instruction Processing Critical path is longest path between stage registers 9 10 Un-pipelined (Non-overlapped) Implementation Pipelined Implementation Consider loads with DF stage 11 12 2

5-stage Pipeline Pipelining Lessons CPU stages IF: Instruction fetch DR: Instruction decode & Register read E: Execute DF: Data fetch( Memory load/store) W: Write Back Regs Another set of mnemonic names IF, ID, E, MEM, WB 13 Pipelining doesn t help latency of single task, it helps throughput of entire workload Pipeline rate limited by slowest pipeline stages Potential speedup = number of pipe stages Unbalanced lengths of pipe stages reduces speedup Time to fill pipeline and time to drain it reduces speedup 14 Computer Pipelines Execute billions of instruction, so throughput is what matters Throughput versus latency + Throughput increases - :Latency for a single instruction increases May have to wait longer for single instruction to complete Allows much faster clock cycle RISC pipeline architecture features: All instructions same length Registers located in same place in instruction format Memory operands only in loads and stores Pipelining Every clock cycle requires New instruction fetch New ALU operation New data word to/from memory Memory Requirements Faster memory (5x faster) Separate instruction and data paths Better caches 15 16 Unpipelined Datapath Pipelined Datapath MAR MDR 17 18 3

Outline Pipelined Design Basic 5-stage pipe Speedup of pipelined vs. non-pipelined implementations Pipeline hazards Structural, data, control Parallel digital systems Pipelining Hazards Hazards cause the pipe to stall because of some conflict in the pipe (prevents the next instruction in pipe from executing in its turn) Types of hazards Structural: contention for same hardware resource Data: dependency on earlier instruction for the correct sequencing of register reads and writes Control: branch/jump instructions stall the pipe until get correct target address into PC 19 20 Structural Hazards Structural Hazards Resource conflicts in the pipeline Examples Single memory port shared for instruction and data access Register file without a separate write port STALL load sub and or 21 22 Structural Hazards IF and DF compete for single memory port Ideal Machine No stalls, 1 cycle per instruction Assume 30% of instructions access data With structural hazard, 1.3 cycles per instruction Performance has gone down by 30% Solutions: Pipeline stall (insert bubble) Have 2 memory ports for shared instruction-data cache-memory (expensive) Have separate instruction cache-memory and data cache-memory Three Generic Data Hazards (I) - RAW Instr 1 followed by Instr 2 add r1, r3, r2 add r4, r5, r1 Instr 2 tries to read operand before Instr 1 writes it Can be due to true data dependency (data must be produced before it can be consumed) Or can be due to pipeline staging (data already produced, but not yet written to general register file 23 24 4

Data Hazards (II) - WAR Data Hazards (III) - WAW Instr 1 followed by Instr 2 ld r1, (r3)+ add r3, r4, r1 Instr 2 tries to write operand before Instr 1 reads it Instr 1 gets wrong operand Can t happen in the 5-stage RISC pipeline we just covered All instruction take 5 stages Reads are always in stage 2 Writes are always in stage 5 Instr 1 followed by Instr 2 mul r1, r0, r2 add r1, r5, r6 Instr 2 tries to write operand before Instr 1 writes it Leaves wrong result (Instr 1, not Instr 2 ) Can t happen in our 5-stage pipeline because All instructions take 5 stages Writes are always in stage 5 25 26 Data Hazards Overlapping instructions cause dependencies on data (RAW) e.g., R 1 R 5 MOVA R1, R5 ADD R3, R1, R2 R 2 R 1 + R 6 R 3 R 1 + R 2 1 2 3 4 5 Write R 1 Write R 2 Data Hazards Remedy - SW Software delay (compiler or machine code programming to insert s) MOVA R1, R5 ADD R3, R1, R2 27 28 Data Hazards Remedy - HW Data Forwarding (Reg. Bypassing) Hardware stalls Hazard detection MOVA R 1, R 5 IF DR IF Hardware Data Forwarding Add an extra path connecting ALU outputs to ALU inputs on the next clock 29 30 5

Pipelined Datapath with data forwarding 31 6