EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Pipeline Datapath and Control

The MIPS Pipeline
CSCI206 - Computer Organization & Programming
Pipeline Datapath and Control
zybook: 11.6
Developed and maintained by the Bucknell University Computer Science Department - 2017

Hazard Summary

data - An instruction depends on a data value produced or consumed by another instruction
-- Reorder
-- Forwarding (EX-EX, MEM-EX)

control - The execution of an instruction depends on a control decision made by an earlier instruction (e.g., branch)
-- Delay slot (nop)
-- Compute diff at the ID stage

structural - An instruction in the pipeline needs a resource being used by another instruction in the pipeline at the same moment
-- Reorder if possible
-- Delay

EXAMPLES

CYCLE 1
li v0, 100          F
add v1, v1, v2
beq v0, v1, loop

CYCLE 2
li v0, 100          F D
add v1, v1, v2        F
beq v0, v1, loop

CYCLE 3
li v0, 100          F D E
add v1, v1, v2        F D
beq v0, v1, loop        F

CYCLE 4
li v0, 100          F D E M
add v1, v1, v2        F D E
beq v0, v1, loop        F -
addi                      -

In cycle 4 the branch wants to proceed (it is resolved in Decode), but it needs the new value of v1. That value is available at the end of cycle 4, so we have to stall. This stalls everything before this stage in the pipeline, so we cannot fetch the addi.

CYCLE 5
li v0, 100          F D E M W
add v1, v1, v2        F D E M
beq v0, v1, loop        F - -
addi                      - -

In cycle 5, we have the value of v1 at the output of EX. But MIPS only has EX-EX, MEM-EX, and MEM-MEM forwarding, not EX-ID. So we have to stall again (no fetch again).

CYCLE 6
li v0, 100          F D E M W
add v1, v1, v2        F D E M W
beq v0, v1, loop        F - - D
addi                      - - F

Finally, in cycle 6 we can decode with the new value of v1 (without forwarding). Since we were able to decode, we can also fetch the next instruction in cycle 6. Since the branch is resolved in Decode, we don't have to show its E, M, W stages (they are NOPs).

CYCLE 7
li v0, 100          F D E M W
add v1, v1, v2        F D E M W
beq v0, v1, loop        F - - D
addi                      - - F D
(5th instruction)               F

No hazards for addi. Fast forward to cycle 11.

CYCLE 11
li v0, 100          F D E M W
add v1, v1, v2        F D E M W
beq v0, v1, loop        F - - D
addi                      - - F D E M W
(5th instruction)               F D E M W

Execution took 11 cycles. IPC = 5 / 11 = 0.45
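The timing of the li / add / beq example above can be checked with a few lines of Python. This is only a sketch under assumptions that are ours, not the slides': one instruction per stage, a stalled instruction waits in F, the branch reads its registers in D with no forwarding into D, and the register file can be written (W) and read (D) in the same cycle. The function name `schedule` and its input format are hypothetical.

```python
# Sketch: stage timing for a 5-stage pipeline (F D E M W) where the only
# stalls come from an instruction that must wait before it can decode.
# Assumption: the next instruction is fetched in the cycle the previous one decodes.

def schedule(decode_ready):
    """decode_ready[i] is the earliest cycle instruction i may decode
    (0 means no constraint). Returns (fetch, decode, writeback) per instruction."""
    timeline = []
    prev_decode = 0
    for ready in decode_ready:
        fetch = max(prev_decode, 1)      # F frees up when the previous instruction decodes
        decode = max(fetch + 1, ready)   # wait in F until the operands are readable
        timeline.append((fetch, decode, decode + 3))  # E, M, W follow with no further stalls
        prev_decode = decode
    return timeline

# li, add, beq, addi, fifth instruction.
# add decodes in cycle 3 and writes v1 back in cycle 6, so beq cannot decode before cycle 6.
timeline = schedule([0, 0, 6, 0, 0])
total_cycles = timeline[-1][2]
ipc = len(timeline) / total_cycles
print(timeline)
print(total_cycles, round(ipc, 2))
```

The third tuple shows the branch fetched in cycle 3 and decoded in cycle 6, and the last writeback lands in cycle 11, which matches the diagram.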

CYCLE 0
lw r1, 0(r4)
lw r2, 400(r4)
addi r3, r1, r2
sw
(5th instruction)

First two instructions are hazard free.
addi depends on both r1 and r2.
sw depends on r3 (addi).

CYCLE 3
lw r1, 0(r4)        F D E
lw r2, 400(r4)        F D
addi r3, r1, r2         F

No issues until addi goes to decode.

CYCLE 4
lw r1, 0(r4)        F D E M
lw r2, 400(r4)        F D E
addi r3, r1, r2         F -
sw                        -

Decode in cycle 4 would get both old values (r1, r2). We could forward r1 from MEM to EX in cycle 5, but r2 is not yet available, so we must stall. Since D stalls, sw cannot fetch.

CYCLE 5
lw r1, 0(r4)        F D E M W
lw r2, 400(r4)        F D E M
addi r3, r1, r2         F - D
sw                        - F

Decode in cycle 5. This reads the new value for r1 (WB in the same cycle is OK). We need to forward MEM->EX for r2 in cycle 6.

CYCLE 6
lw r1, 0(r4)        F D E M W
lw r2, 400(r4)        F D E M W
addi r3, r1, r2         F - D E
sw                        - F D
(5th instruction)             F

Forward MEM->EX for r2. Draw an arrow from the previous cycle's M to the current cycle's E.
Decode sw in cycle 6, but we get the old value for r3. That's OK: sw doesn't need the new value until the start of MEM, so we can use a forwarding path.

CYCLE 7
lw r1, 0(r4)        F D E M W
lw r2, 400(r4)        F D E M W
addi r3, r1, r2         F - D E M
sw                        - F D E
(5th instruction)             F D

No issues.

CYCLE 8
lw r1, 0(r4)        F D E M W
lw r2, 400(r4)        F D E M W
addi r3, r1, r2         F - D E M W
sw                        - F D E M
(5th instruction)             F D E

sw read the old r3 in Decode, but the new value for r3 is at the output of the MEM stage, so we need a MEM-MEM forward.

CYCLE 9
lw r1, 0(r4)        F D E M W
lw r2, 400(r4)        F D E M W
addi r3, r1, r2         F - D E M W
sw                        - F D E M W
(5th instruction)             F D E M

CYCLE 10
lw r1, 0(r4)        F D E M W
lw r2, 400(r4)        F D E M W
addi r3, r1, r2         F - D E M W
sw                        - F D E M W
(5th instruction)             F D E M W

IPC = 5 / 10 = 0.5

CYCLE 0
lw r1, 0(sp)
add r1, r1, r2
sw r1, 0(sp)

CYCLE 1
lw r1, 0(sp)        F
add r1, r1, r2
sw r1, 0(sp)

CYCLE 2
lw r1, 0(sp)        F D
add r1, r1, r2        F
sw r1, 0(sp)

CYCLE 3
lw r1, 0(sp)        F D E
add r1, r1, r2        F -
sw r1, 0(sp)            -

If we decode in cycle 3, add will need r1 to execute in cycle 4. The value isn't available until the end of cycle 4 (when lw finishes MEM). So we need to stall in cycle 3.

CYCLE 4
lw r1, 0(sp)        F D E M
add r1, r1, r2        F - D
sw r1, 0(sp)            - F

We will forward from the output of MEM to the input of EX in the next cycle (r1 for add).

CYCLE 5
lw r1, 0(sp)        F D E M W
add r1, r1, r2        F - D E
sw r1, 0(sp)            - F D

CYCLE 6
lw r1, 0(sp)        F D E M W
add r1, r1, r2        F - D E M
sw r1, 0(sp)            - F D E

sw computes the memory address 0 + sp in EX, so no forward is needed there. sw needs the new r1 at the beginning of MEM, which will be in cycle 7; we can get it from the output of MEM.

CYCLE 7
lw r1, 0(sp)        F D E M W
add r1, r1, r2        F - D E M W
sw r1, 0(sp)            - F D E M

sw writes r1 to mem[0+sp] in cycle 7. The value of r1 is at the output of MEM, so forward it to the input of MEM.

CYCLE 8
lw r1, 0(sp)        F D E M W
add r1, r1, r2        F - D E M W
sw r1, 0(sp)            - F D E M W

Done. IPC = 3 / 8 = 0.375
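The stall rules used in the two load examples can be sketched as a small scheduler. The model below is our own simplification, not something the slides define: full EX-EX, MEM-EX and MEM-MEM forwarding, a value usable in the cycle after the stage that produces it, stalled instructions waiting in F, and no branches. The function `run` and the tuple format are hypothetical. For the second program, the sw base register (r4) and an independent fifth instruction are assumptions, since the transcript does not show them.

```python
# Sketch: when does each instruction write back in a 5-stage pipeline
# with forwarding? Each instruction is (kind, dest, ex_sources, mem_sources),
# where kind is "alu", "load" or "store".

def run(program):
    avail = {}         # register -> first cycle its forwarded value can be used
    prev_decode = 0
    writeback = []
    for kind, dest, ex_src, mem_src in program:
        fetch = max(prev_decode, 1)
        # Earliest EX: after F and D, and once every EX operand can be forwarded.
        ex = max([fetch + 2] + [avail.get(r, 0) for r in ex_src])
        # Store data is needed only at the start of MEM (MEM-MEM forwarding).
        mem_ready = max([0] + [avail.get(r, 0) for r in mem_src])
        ex = max(ex, mem_ready - 1)
        if dest is not None:
            # ALU results exist after EX, loaded values after MEM.
            avail[dest] = ex + 1 if kind == "alu" else ex + 2
        prev_decode = ex - 1
        writeback.append(ex + 2)
    return writeback

# lw r1, 0(sp) / add r1, r1, r2 / sw r1, 0(sp)
example3 = [
    ("load", "r1", ["sp"], []),
    ("alu", "r1", ["r1", "r2"], []),
    ("store", None, ["sp"], ["r1"]),
]
wb3 = run(example3)

# lw / lw / addi / sw / fifth instruction.
# ASSUMPTION: sw uses r4 as its base; the fifth instruction has no dependences.
example2 = [
    ("load", "r1", ["r4"], []),
    ("load", "r2", ["r4"], []),
    ("alu", "r3", ["r1", "r2"], []),
    ("store", None, ["r4"], ["r3"]),
    ("alu", "r9", [], []),
]
wb2 = run(example2)

print(wb3, len(example3) / wb3[-1])
print(wb2, len(example2) / wb2[-1])
```

Under these assumptions the last writeback falls in cycle 8 for the three-instruction program and cycle 10 for the five-instruction one, in line with the IPC values of 3 / 8 and 5 / 10 worked out in the diagrams.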