Simple Instruction-Pipelining (cont.) Pipelining Jumps

6.823, L9--1
Simple Instruction-Pipelining (cont.) + Interrupts
Updated March 6, 2000
Laboratory for Computer Science, M.I.T.
http://www.csg.lcs.mit.edu/6.823

Pipelining Jumps

[Figure: pipelined datapath extended for jumps. Recoverable labels: PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), stall, Jump?, IRSrc_D, kill.]

Example (no delay slot):
  I1   096   ADD
  I2   100   J 200
  I3   104   ADD    (fetched behind the jump; must be killed)
  I4   304   ADD    (jump target)

To kill the fetched instruction, insert a mux before the decode-stage instruction register (IR_D).
Any interaction between stall and jump? No.

Control Equations for Muxes: Jumps Only

PCSrc1 = Case opcode_D
           J, JAL, JR, JALR  => j_D
           ...               => ~j
PCSrc2 = Case opcode_D
           J, JAL            => PCR
           JR, JALR          => RInd
IRSrc_D = Case opcode_D
           J, JAL, JR, JALR  => nop
           ...               => IM

Pipelining Conditional Branches

[Figure: pipelined datapath with a zero? test in the execute stage. Recoverable labels: PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), stall, BEQZ?, zero?, IRSrc_D.]

Example (no delay slot):
  I1   096   ADD
  I2   100   BEQZ r1, 200
  I3   104   ADD
  I4   304   ADD

The branch condition is not known until the execute stage. What action should be taken in the decode stage?

Conditional Branches: Solution 1

[Figure: datapath with the branch resolved in the execute stage. Recoverable labels: PCSrc1 (j_D / j_E / ~j), PCSrc2 (PCR / RInd), stall, BEQZ?, zero?, IRSrc_D.]

Example (no delay slot):
  I1   096   ADD
  I2   100   BEQZ r1, 200
  I3   104   ADD
  I4   304   ADD

If the branch is taken
- kill the two following instructions
- the instruction at the decode stage is not valid, so stall is not valid

New Stall Signal

stall = ( ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
        + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D )
        . !((opcode_E = BEQZ).z + (opcode_E = BNEZ).!z)
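The new stall signal can be sketched in a few lines of Python. This is only an illustration: the names ws (destination register specifier) and we (write enable) are assumed readings of the garbled slide text, and the Stage record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Destination info of the instruction in a stage (assumed names ws/we)."""
    ws: int = 0        # destination register specifier
    we: bool = False   # write enable


def raw_hazard(rf, e, m, w):
    """True if source register rf matches a pending write in E, M or W."""
    return any(rf == s.ws and s.we for s in (e, m, w))


def stall(rf1_d, re1_d, rf2_d, re2_d, e, m, w, opcode_e="ALU", z=False):
    """Stall the decode stage on a RAW hazard, unless a taken branch in the
    execute stage is about to kill the decode-stage instruction anyway."""
    branch_taken = (opcode_e == "BEQZ" and z) or (opcode_e == "BNEZ" and not z)
    hazard = (raw_hazard(rf1_d, e, m, w) and re1_d) or \
             (raw_hazard(rf2_d, e, m, w) and re2_d)
    return bool(hazard and not branch_taken)
```

The last factor is what makes the signal "new": when the branch in the execute stage is taken, the decode-stage instruction is invalid, so any hazard it appears to have is ignored.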

Control Equations for Muxes: Solution 1

Give priority to the older instruction, i.e., the execute-stage instruction over the decode-stage instruction.

PCSrc1 = Case opcode_E
           BEQZ.z, BNEZ.!z   => j_E
           ...               => Case opcode_D
                                  J, JAL, JR, JALR  => j_D
                                  ...               => ~j
PCSrc2 = Case opcode_D
           J, JAL            => PCR
           JR, JALR          => RInd
IRSrc_D = Case opcode_E
           BEQZ.z, BNEZ.!z   => nop
           ...               => (Case opcode_D
                                  J, JAL, JR, JALR  => nop
                                  ...               => IM)
IRSrc_E = Case opcode_E
           BEQZ.z, BNEZ.!z   => nop
           ...               => stall.nop + !stall.IR_D

Conditional Branches: Solution 2

Test for zero at the decode stage.

[Figure: datapath with the zero? test moved to the decode stage. Recoverable labels: PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), stall, BEQZ?, zero?, IRSrc_D.]

Example (no delay slot):
  I1   096   ADD
  I2   100   BEQZ r1, 200
  I3   104   ADD
  I4   304   ADD

Need to kill only one instruction!
Wouldn't work if DLX had general branch conditions (e.g., r1 > r2)?

Conditional Branches: Solution 3

Delayed Branches

[Figure: datapath with the zero? test in the decode stage. Recoverable labels: PCSrc1 (j / ~j), PCSrc2 (PCR / RInd), stall, BEQZ?, zero?.]

Example:
  I1   096   ADD
  I2   100   BEQZ r1, 200
  I3   104   ADD
  I4   304   ADD

Change the semantics of branches and jumps.
Need not kill any instructions!

To delay or not to delay?

Delay slot
- complicates ISA specification and programming
- may simplify interlock logic
- clock time?

So why have delay slots?
Hint: Consider instruction issue opportunities, assuming 15% of the instructions are control instructions.
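One way to put numbers on the hint is a back-of-the-envelope CPI estimate. The sketch below is not from the slides: it assumes an ideal CPI of 1, that every control instruction costs a fixed number of lost issue slots, and that a compiler fills some fraction of the delay slots with useful work. The function and parameter names are hypothetical.

```python
def cpi(control_fraction, slots_lost_per_control, filled_fraction=0.0):
    """Estimated cycles per instruction for an otherwise ideal pipeline.

    control_fraction:        share of instructions that are branches or jumps
    slots_lost_per_control:  issue slots wasted per control instruction
    filled_fraction:         share of those slots filled with useful
                             instructions (only meaningful with delay slots)
    """
    wasted = slots_lost_per_control * (1.0 - filled_fraction)
    return 1.0 + control_fraction * wasted


# 15% control instructions, one killed instruction each (as in Solution 2,
# pessimistically treating every control instruction as taken).
print(round(cpi(0.15, 1), 3))

# One delay slot, with an assumed 60% of slots filled usefully.
print(round(cpi(0.15, 1, filled_fraction=0.6), 3))
```

Under these assumptions each unfilled slot costs 0.15 cycles per instruction on average, so delay slots pay off exactly to the extent that they can be filled with useful instructions.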

Pipelining Delayed Jumps & Links

[Figure: datapath with extra PC registers holding the return address for linking, feeding the GPRs.]

Example:
  I1   096   ADD
  I2   100   JAL 200
  I3   104   ADD
  I4   304   ADD

Complete Control Logic

[Figure: complete control logic for the pipeline. Recoverable labels: PCSrc1, PCSrc2, stall, rf1, rf2, re1, re2, C_re, C_dest (one per stage E, M, W), C_stall, ExtSel, GPRs, zero?, OpSel, Cntrl, MemWrite, WBSrc, BSrc.]

Hardwired Control Equations (Ignoring Jumps and Branches)

ExtSel   = Case opcode_D
             ALUi, LW, SW, BEQZ, BNEZ  => sExt16
             ALUui                     => uExt16
             J, JAL                    => sExt26
BSrc     = Case opcode_D
             ALU                       => Reg
             ALUi, LW, SW              => Imm
OpSel    = Case opcode_E
             ALU                       => Func
             ALUi                      => Op
             LW, SW                    => +
             BEQZ, BNEZ                => 0?
MemWrite = Case opcode_M
             SW                        => on
             ...                       => off
WBSrc    = Case opcode_M
             ALU, ALUi                 => ALU
             LW                        => Mem
             JAL, JALR                 => PC
RegDst   = Case opcode_W
             ALU                       => rf3
             ALUi, LW                  => rf2
             JAL, JALR                 => R31
RegWrite = Case opcode_W
             ALU, ALUi, LW             => (ws != 0)
             JAL, JALR                 => on
             ...                       => off

The Stall Signal

C_dest:
  ws = Case opcode
         ALU                           => rf3
         ALUi, LW                      => rf2
         JAL, JALR                     => R31
  we = Case opcode
         ALU, ALUi, LW                 => (ws != 0)
         JAL, JALR                     => on
         ...                           => off
C_re:
  re1 = Case opcode
         ALU, ALUi, LW, SW, BZ, JR, JALR  => on
         J, JAL                           => off
  re2 = Case opcode
         ALU, SW                       => on
         ...                           => off
C_stall:
  stall = ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
        + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D
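The C_dest and C_re case tables translate directly into lookup code. The sketch below is illustrative only: the opcode class names (ALU, ALUi, BEQZ, ...) and the use of register 31 as the link register are assumed readings of the garbled slide text, and the function names are hypothetical.

```python
LINK_REGISTER = 31  # assumed: DLX JAL/JALR write the return address to r31


def c_dest(opcode, rf2, rf3):
    """Return (ws, we): the destination register and its write enable."""
    if opcode == "ALU":
        ws = rf3
    elif opcode in ("ALUi", "LW"):
        ws = rf2
    elif opcode in ("JAL", "JALR"):
        ws = LINK_REGISTER
    else:
        ws = 0
    if opcode in ("ALU", "ALUi", "LW"):
        we = ws != 0            # writes to r0 are suppressed
    elif opcode in ("JAL", "JALR"):
        we = True
    else:
        we = False
    return ws, we


def c_re(opcode):
    """Return (re1, re2): whether the rf1 / rf2 source fields are read."""
    re1 = opcode in ("ALU", "ALUi", "LW", "SW", "BEQZ", "BNEZ", "JR", "JALR")
    re2 = opcode in ("ALU", "SW")
    return re1, re2
```

For example, c_dest("LW", 4, 0) yields (4, True), while c_re("J") yields (False, False) because a plain jump reads no registers.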

Control Equations for Muxes: Delayed Jumps and Branches

PCSrc1 = Case opcode_D
           BEQZ.zero?, BNEZ.!zero?  => j
           J, JAL, JR, JALR         => j
           ...                      => ~j
PCSrc2 = Case opcode_D
           BEQZ.zero?, BNEZ.!zero?  => PCR
           J, JAL                   => PCR
           JR, JALR                 => RInd
           ...                      => *
IRSrc_E = stall.nop + !stall.IR_D

Bypassing

time                    t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 <- (r0) + 10    IF1  ID1  EX1  MA1  WB1
(I2) r4 <- (r1) + 17         IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                              IF3  IF3  IF3  IF3  ID3  EX3  MA3
(I4)                                                  IF4  ID4  EX4
(I5)                                                       IF5  ID5

The repeated ID2 and IF3 entries are the stalled stages.

Each stall or kill introduces a bubble in the pipeline, so CPI > 1.

A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input:

time                    t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 <- (r0) + 10    IF1  ID1  EX1  MA1  WB1
(I2) r4 <- (r1) + 17         IF2  ID2  EX2  MA2  WB2
(I3)                              IF3  ID3  EX3  MA3  WB3
(I4)                                   IF4  ID4  EX4  MA4  WB4
(I5)                                        IF5  ID5  EX5  MA5  WB5
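The two timing diagrams can be reproduced with a small model that computes the cycle in which each instruction enters the execute stage. This is a sketch under simplifying assumptions that are not in the slides: a five-stage in-order pipeline (IF, ID, EX, MA, WB), only ALU-style producers, a consumer that waits in ID until the cycle after the producer's write-back when there is no bypass, and a result usable one cycle after the producer's EX when there is one. The function name and program encoding are hypothetical.

```python
def ex_cycles(program, bypass):
    """Return the cycle index at which each instruction executes (EX stage).

    program: list of (dest_register, [source_registers]); register 0 means
             "no destination" (r0 is never a real dependency).
    bypass:  True if the ALU output can be forwarded to the ALU input.
    """
    ex = []
    for k, (_, sources) in enumerate(program):
        t = k + 2                      # IF at t_k, ID at t_(k+1), EX at t_(k+2)
        if ex:
            t = max(t, ex[-1] + 1)     # in-order: cannot pass the older instruction
        for j in range(k):
            producer_dest = program[j][0]
            if producer_dest != 0 and producer_dest in sources:
                # Without bypass: producer WB is at EX+2, the consumer reads
                # the register file one cycle later and executes at EX+4.
                ready = ex[j] + 1 if bypass else ex[j] + 4
                t = max(t, ready)
        ex.append(t)
    return ex


program = [
    (1, [0]),   # I1: r1 <- (r0) + 10
    (4, [1]),   # I2: r4 <- (r1) + 17
    (0, []),    # I3
    (0, []),    # I4
    (0, []),    # I5
]
print(ex_cycles(program, bypass=False))  # [2, 6, 7, 8, 9]
print(ex_cycles(program, bypass=True))   # [2, 3, 4, 5, 6]
```

The three-cycle gap between the two results for I2 through I5 corresponds to the three bubbles that the stalled ID2 and IF3 entries represent in the first diagram.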

Adding Bypasses

[Figure: datapath with a bypass from the ALU output back to its input. Recoverable labels: stall, ASrc, GPRs.]

...
(I1) r1 <- (r0) + 10
(I2) r4 <- (r1) + 17
...

Of course you can add many more bypasses!

The Bypass Signal: deriving it from the stall signal

C_dest:
  ws = Case opcode
         ALU                           => rf3
         ALUi, LW                      => rf2
         JAL, JALR                     => R31
  we = Case opcode
         ALU, ALUi, LW                 => (ws != 0)
         JAL, JALR                     => on
         ...                           => off
C_re:
  re1 = Case opcode
         ALU, ALUi, LW, SW, BZ, JR, JALR  => on
         J, JAL                           => off
  re2 = Case opcode
         ALU, SW                       => on
         ...                           => off
C_stall:
  stall = ((rf1_D = ws_E).we_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
        + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D
C_bypass:
  ASrc = (rf1_D = ws_E).we_E.re1_D

Is this correct?

Usefulness of a Bypass

[Figure: datapath with the ALU-output bypass. Recoverable labels: stall, ASrc, GPRs.]

Consider
...
(I1) r1 <- (r0) + 10
(I2) r4 <- (r1) + 17
...

Where can this bypass help?

...
r1 <- M[(r0) + 10]
r4 <- (r1) + 17
...

JAL 500
r4 <- (r31) + 17

Bypass and Stall Signals

we_E has to be split into two components:

we-bypass_E = ((opcode_E = ALU_E) + (opcode_E = ALUi_E)).(ws_E != 0)
we-stall_E  = (opcode_E = LW_E).(ws_E != 0) + (opcode_E = JAL_E) + (opcode_E = JALR_E)

C_bypass:
  ASrc = (rf1_D = ws_E).we-bypass_E.re1_D
C_stall:
  stall = ((rf1_D = ws_E).we-stall_E + (rf1_D = ws_M).we_M + (rf1_D = ws_W).we_W).re1_D
        + ((rf2_D = ws_E).we_E + (rf2_D = ws_M).we_M + (rf2_D = ws_W).we_W).re2_D

[Figure: control logic with the split signals. Recoverable labels: ASrc, stall, re1, re2, we-stall, we-bypass, C_dest, GPRs.]
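The split of the execute-stage write enable can be sketched as follows. The idea is that an ALU result exists at the end of EX and can be bypassed, whereas a load result or a link address is not available on the ALU output and must still cause a stall. The signal names follow an assumed reading of the garbled slide text, and the Python function names are hypothetical.

```python
def we_bypass_e(opcode_e, ws_e):
    """Execute-stage instruction produces its result on the ALU output."""
    return opcode_e in ("ALU", "ALUi") and ws_e != 0


def we_stall_e(opcode_e, ws_e):
    """Execute-stage instruction will write a register, but the value is not
    available on the ALU output (load data or link address)."""
    return (opcode_e == "LW" and ws_e != 0) or opcode_e in ("JAL", "JALR")


def asrc_bypass(rf1_d, re1_d, opcode_e, ws_e):
    """Select the bypass path for the first ALU operand."""
    return rf1_d == ws_e and we_bypass_e(opcode_e, ws_e) and re1_d
```

With this split, asrc_bypass(1, True, "ALUi", 1) selects the bypass, while the same dependency on an LW in the execute stage does not and falls through to the stall logic.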

Fully Bypassed Datapath

[Figure: datapath with bypasses from every later stage. Recoverable labels: stall, ASrc, BSrc, C_dest, GPRs.]

Is there still a need for the stall signal?

stall = (rf1_D = ws_E).(opcode_E = LW_E).(ws_E != 0).re1_D
      + (rf2_D = ws_E).(opcode_E = LW_E).(ws_E != 0).re2_D

Why an instruction may not be dispatched every cycle

- Full bypassing may be too expensive to implement
- Loads may cause bubbles, if there are no load delay slots
- Conditional branches may cause bubbles, if there are no branch delay slots
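In the fully bypassed case the only remaining stall condition is the load-use hazard. A minimal sketch of that condition, with signal names taken from an assumed reading of the garbled slide text and a hypothetical function name:

```python
def load_use_stall(opcode_e, ws_e, rf1_d, re1_d, rf2_d, re2_d):
    """Stall decode only when the execute-stage instruction is a load whose
    destination is read by the decode-stage instruction."""
    load_pending = opcode_e == "LW" and ws_e != 0
    uses_rf1 = rf1_d == ws_e and re1_d
    uses_rf2 = rf2_d == ws_e and re2_d
    return load_pending and (uses_rf1 or uses_rf2)
```

A dependent instruction immediately behind an LW stalls, whereas the same dependency on an ALU instruction does not because the bypass supplies the value.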

Interrupts alter normal flow of control

[Figure: a program (I_i, I_i+1, I_i+2) whose flow is diverted to an interrupt handler (HI_1, HI_2, ..., HI_n).]

An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from the program's point of view.

Causes of Interrupts

An interrupt is an event that requests the attention of the processor.

Asynchronous: an external event
- input/output device service-request
- timer expiration
- power disruptions, hardware failure

Synchronous: an internal event (a.k.a. exceptions)
- undefined opcode, privileged instruction
- arithmetic overflow, FPU exception
- misaligned memory access
- virtual memory exceptions: page faults, TLB misses, protection violations
- traps: system calls (i.e., jumps into kernel code)

Asynchronous Interrupts: invoking the interrupt handler

An I/O device requests attention by asserting one of the prioritized interrupt request lines.

When the processor decides to process the interrupt
- it stops the current program at instruction I_i, completing all the instructions up to I_i (precise interrupt)
- it saves the PC of instruction I_i+1 in a special register (EPC)
- it disables interrupts and transfers control to a designated interrupt handler

Interrupt Handler

To allow nested interrupts, EPC is saved before enabling interrupts
- need an instruction to move EPC into GPRs
- need a way to mask further interrupts at least until EPC can be saved

There is a status register which indicates the cause of the interrupt; it must be visible to an interrupt handler.

The return from an interrupt handler is a simple indirect jump but usually involves
- enabling interrupts
- restoring the processor to the user mode
- restoring hardware status and control state
so a special return-from-exception instruction (RFE) is provided.
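The invocation and return sequence can be illustrated with a toy model. This is a sketch only, not the DLX hardware: the class, the handler address and the 4-byte instruction size are hypothetical, and EPC is an assumed reading of the special register named on the slide.

```python
class ToyCPU:
    """Minimal model of precise asynchronous interrupt entry and return."""

    def __init__(self, handler_pc):
        self.pc = 0                      # PC of the next instruction to run
        self.epc = None                  # saved PC (assumed name: EPC)
        self.interrupts_enabled = True
        self.handler_pc = handler_pc     # hypothetical handler address

    def step(self):
        """Complete one instruction of the current program."""
        self.pc += 4

    def take_interrupt(self):
        """Enter the handler after I_i has completed; pc points at I_i+1."""
        if not self.interrupts_enabled:
            return False                 # masked: the request stays pending
        self.epc = self.pc               # save the PC of I_i+1
        self.interrupts_enabled = False  # mask further interrupts
        self.pc = self.handler_pc        # transfer control to the handler
        return True

    def rfe(self):
        """Return from the handler: indirect jump plus re-enable."""
        self.pc = self.epc
        self.interrupts_enabled = True


cpu = ToyCPU(handler_pc=0x1000)
cpu.step()
cpu.step()
cpu.take_interrupt()
print(hex(cpu.pc), cpu.epc)   # 0x1000 8
cpu.rfe()
print(cpu.pc)                 # 8
```

A nested interrupt would overwrite epc in this model, which is why the handler has to copy EPC into a general-purpose register before re-enabling interrupts.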

Asyn. Control Transfer: The start

[Figure: pipeline holding I_i, I_i-1, I_i-2, I_i-3, with the handler address IH available as the next PC; GPRs.]

Interrupt to be processed after I_i.
PC_i+1 must be saved here for the interrupt handler.

Asyn. Control Transfer: step 1

[Figure: pipeline holding I_i, I_i-1, I_i-2, with IH selected as the next PC and PC_i+1 saved; GPRs.]

The controller must insert nops before IH is fetched into IR.

Asyn. Control Transfer: step 4

[Figure: pipeline fetching IH, IH+4 and IH+8, with PC_i+1 saved; GPRs.]

The interrupt handler can begin execution.

Datapath for Asyn. Interrupts

[Figure: datapath with interrupt support; GPRs and the saved PC.]

Interrupt control is a multicycle operation, handled by a µcontroller.

Asyn. Control Transfer: delay slot

[Figure: pipeline holding I_i, I_i-1, I_i-2, I_i-3, where I_i is a jump (J), with IH available as the next PC; GPRs.]

Interrupt to be processed after I_i, where I_i is a jump.
Can we resume execution correctly by storing just PC_i+1?

DLX ISA-specific Solution

On an interrupt, if I_i+1 is in a delay slot then save PC_i instead of PC_i+1.
Execution can always be correctly resumed from the saved PC. Why?
Hint: can a jump instruction be re-executed?

                     J    JR   BZ   JAL  JALR
modify state?
OK to re-execute?
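The rule for choosing the saved PC can be written down directly. The sketch below is illustrative: it assumes 4-byte instructions and a single delay slot, and the function name and opcode list are hypothetical readings of the slide.

```python
CONTROL_OPCODES = ("J", "JR", "BEQZ", "BNEZ", "JAL", "JALR")


def pc_to_save(pc_i, opcode_i):
    """PC to save when an interrupt is taken after instruction I_i.

    If I_i is a jump or branch, I_i+1 sits in its delay slot, so resuming at
    PC_i+1 alone would lose the pending transfer of control. Saving PC_i
    re-executes the jump, which re-creates both the delay slot and the target.
    """
    if opcode_i in CONTROL_OPCODES:
        return pc_i
    return pc_i + 4


print(pc_to_save(100, "ADD"))  # 104
print(pc_to_save(100, "J"))    # 100
```

Whether re-execution is harmless for every control instruction is exactly the question the table poses, so the sketch encodes only the rule, not the answer.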

Asyn. Control Transfer: DLX Hack

[Figure: pipeline holding I_i, I_i-1, I_i-2, I_i-3, where I_i is a jump (J), with PC_i saved; GPRs.]

In case of a jump, save PC_i instead of PC_i+1.

Synchronous Interrupts

A synchronous interrupt (exception) is caused by a particular instruction.

In general, the instruction cannot be completed and needs to be restarted after the exception has been handled
- requires undoing the effect of one or more partially executed instructions

In case of a trap (system call), the instruction is considered to have been completed
- a special jump instruction involving a change to the privilege mode

Correct implementation of exceptions is quite difficult.