CMP N 301 Computer Architecture. Appendix C

Size: px
Start display at page:

Download "CMP N 301 Computer Architecture. Appendix C"

Transcription

1 CMP N 301 Computer Architecture Appendix C

2 Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc) 2

3 Pipelining: Introduction Implementation technique in which multiple instructions are overlapped in execution (instruction execution steps could run in parallel You have 4 loads of cloths to wash: Steps (stages) required: Wash Dry Fold Store clothes into drawers A B C D Each stage needs 30 minutes We can t start the next step until the previous step is finished 3

4 Pipelining Example: Laundry There are 2 approaches to do this job: Sequential (non-pipelined): Wait until the first load is put away in order to start the next load Pipelined (ASAP): As soon as the washer is empty, start putting the next load, while the first load is put into dryer 4

5 Pipelining Example: Laundry Sequential Laundry Needs 8 hours for 4 loads T i m e T a s k o r d e r A 6 P M A M B C D 5

6 Pipelining Example: Laundry Pipelined Laundry: Start work ASAP Needs only 3.5 hours for 4 loads! 6 P M A M T i m e T a s k o r d e r A B C D 6

7 CPU Pipelining Review: 5 stages of a MIPS instruction Fetch instruction from instruction memory Read registers while decoding instruction Execute operation or calculate address, depending on the instruction type Access an operand from data memory Write result into a register We can reduce the cycles to fit the stages. Do you catch an advantage of load/stor RISC architecture over CISC? Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load /Dec Exec Mem Wr 7

8 CPU Pipelining isters should be added to separate between stages and stabilize the signals passed from one stage to the next 8

9 Pipelining Example: Observations Pipelining Observations: After filling the pipeline, all stages will be operating concurrently Pipelining doesn t reduce number of stages doesn t help latency of single task helps throughput of entire workload In order to pipeline the task, we should have separate resources. (Multiple tasks operating simultaneously use different resources) 9

10 Pipelining: Speedup Speedup due to pipelining depends on the number of stages in the pipeline Ideal maximum speedup = number of stages (pipeline depth) Why ideal speedup never been achieved Stages are usually not balanced and pipeline rate is limited by slowest pipeline stage If dryer needs 45 min, time for all stages has to be 45 min to accommodate it Pipelining overheads (like inter-stage registers delay) Time to fill the pipeline and time to drain it If one load depends on another, we will have to wait (Delay/Stall for Dependencies or Hazards) 10

11 CPU Pipelining: Examples Example (1): Textbook Page C-10 Example (2): Single-Cycle, non-pipelined execution Total time for 3 instructions: 24 ns P r o g r a m e x e c u t i o n o r d e r T i m e ( i n i n s t r u c t i o n s ) l w $ 1, ( $ 0 ) l w $ 2, ( $ 0 ) l w $ 3, ( $ 0 ) I n s t r u c t i o n f e t c h R e g A L U 8 n s D a t a a c c e s s R e g I n s t r u c t i o n f e t c h R e g A L U 8 ns D a t a a c c e s s R e g I n s t r u c t i o n f e t c h... 8 n s 11

12 CPU Pipelining: Example Single-cycle, pipelined execution Improve performance by increasing instruction throughput Total time for 3 instructions = 14 ns Each instruction adds 2 ns to total execution time Stage time limited by slowest resource (2 ns) Assumptions: Write to register occurs in 1 st half of clock Read from register occurs in 2 nd half of clock P r o g r a m e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) T i m e l w $ 1, ( $ 0 ) I n s t r u c t i o n f e t c h R e g A L U D a t a a c c e s s R e g l w $ 2, ( $ 0 ) 2 n s I n s t r u c t i o n f e t c h R e g A L U D a t a a c c e s s R e g l w $ 3, ( $ 0 ) 2 n s I n s t r u c t i o n f e t c h R e g A L U D a t a a c c e s s R e g 2 n s 2 n s 2 n s 2 n s 2 n s 12

13 Pipelining Hazards Hazards: Situations that prevent an instruction from being executed in its designated clock cycle Types of Hazards 1. Structural Hazards: Conflict on resources 2. Data Hazards: An instruction depends on the results of previous instruction 3. Control Hazards: When an instruction changes PC, like branches The simplest hazards solution is to stall the pipeline (some instructions are allowed to proceed, while other are delayed) Speedup = pipeline depth/(1+average stall cycles per instruction)

14 Structural hazards No two instructions are processed by the same module at the same time. Solutions: Stalling the pipeline (inserting bubbles) or duplicating the resource Instruction fetch is conflicting with data memory access Use Harvard architecture with two separate caches ister file is accessed by the instruction in the decode (reading) and the instruction in the write-back (writing) Ensure that writing is done in the first half of the clock and reading is done at the second half of the clock

15 Solving Structural Hazards by Stalling Time (clock cycles) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 I n s t r. O r d e r Load Instr 1 Instr 2 Instr 3 Instr 4 Structural Hazard 15

16 Solving Structural Hazards by Stalling Time (clock cycles) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 I n s t r. O r d e r Load Instr 1 Instr 2 Stall Instr 3 Bubble Bubble Bubble Bubble Bubble 16

17 Data Hazards Pipelining might change the order of reading/writing operands from that of sequential execution Time (clock cycles) IF ID/RF EX MEM WB I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11

18 Types of Data Hazards Read After Write (RAW): Instruction J tries to read an operand before Instruction I writes it (True data dependency) I: add r1,r2,r3 J: sub r4,r1,r3 Write After Read (WAR): Instruction J writes an operand before Instruction I reads it (Anti dependency) I: sub r4,r1,r3 J: add r1,r2,r3 K: mul r6,r1,r7 Write After Write (WAW): Instruction J writes an operand before Instruction I writes it (output dependency) I: sub r1,r4,r3 J: add r1,r2,r3 K: mul r6,r1,r7

19 Data Hazards Solutions Writing in the first half of the cycle and reading in the second half mitigates the effect of data hazards Solution (1): Stalling the pipeline: inserting bubbles and stall the instruction till the data is written back Solution (2): Data Forwarding or bypassing : forward the data back to the requesting stage immediately when it is available Solution (3): Software-solution: Compiler arranges instructions to avoid data dependencies

20 Solving Data Hazards by Forwarding Time (clock cycles) I n s t r. add r1,r2,r3 sub r4,r1,r3 O r d e r and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11 20

21 HW Change for Forwarding NextPC isters Immediate ID/EX mux mux EX/MEM Data Memory MEM/WR mux 21

22 Data Hazard Even with Forwarding Time (clock cycles) I n s t r. lw r1, 0(r2) sub r4,r1,r6 O r d e r and r6,r1,r7 or r8,r1,r9 22

23 Resolving the Load Data Hazard Time (clock cycles) I n s t r. O r d e r lw r1, 0(r2) sub r4,r1,r6 and r6,r1,r7 Bubble Bubble or r8,r1,r9 Bubble Pipeline should be stalled till the data becomes available 23

24 Software Scheduling to Avoid Load Hazards Try producing fast code for a = b + c; d = e f; assuming a, b, c, d,e, and f in memory. Slow code: LW Rb,b LW Rc,c ADD Ra,Rb,Rc SW a,ra LW Re,e LW Rf,f SUB Rd,Re,Rf SW d,rd Fast code: LW Rb,b LW Rc,c LW Re,e ADD Ra,Rb,Rc LW Rf,f SW a,ra SUB Rd,Re,Rf SW d,rd 24

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference) ECE 3401 Lecture 23 Pipeline Design Control State Register Combinational Control Logic New/ Modified Control Word ISA: Instruction Specifications (for reference) P C P C + 1 I N F I R M [ P C ] E X 0 PC

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control The MIPS Pipeline CSCI206 - Computer Organization & Programming Pipeline Datapath and Control zybook: 11.6 Developed and maintained by the Bucknell University Computer Science Department - 2017 Hazard

More information

COMP303 Computer Architecture Lecture 11. An Overview of Pipelining

COMP303 Computer Architecture Lecture 11. An Overview of Pipelining COMP303 Compute Achitectue Lectue 11 An Oveview of Pipelining Pipelining Pipelining povides a method fo executing multiple instuctions at the same time. Laundy Example: Ann, Bian, Cathy, Dave each have

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. I fetch decode & eg-fetch execute memory Clock period

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. fetch decode & eg-fetch execute

More information

CPSC 3300 Spring 2017 Exam 2

CPSC 3300 Spring 2017 Exam 2 CPSC 3300 Spring 2017 Exam 2 Name: 1. Matching. Write the correct term from the list into each blank. (2 pts. each) structural hazard EPIC forwarding precise exception hardwired load-use data hazard VLIW

More information

CSCI-564 Advanced Computer Architecture

CSCI-564 Advanced Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 8: Handling Exceptions and Interrupts / Superscalar Bo Wu Colorado School of Mines Branch Delay Slots (expose control hazard to software) Change the ISA

More information

Computer Architecture

Computer Architecture Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines

More information

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2 Pipelining CS 365 Lecture 12 Prof. Yih Huang CS 365 1 Traditional Execution 1 2 3 4 1 2 3 4 5 1 2 3 add ld beq CS 365 2 1 Pipelined Execution 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

More information

3. (2) What is the difference between fixed and hybrid instructions?

3. (2) What is the difference between fixed and hybrid instructions? 1. (2 pts) What is a "balanced" pipeline? 2. (2 pts) What are the two main ways to define performance? 3. (2) What is the difference between fixed and hybrid instructions? 4. (2 pts) Clock rates have grown

More information

Microprocessor Power Analysis by Labeled Simulation

Microprocessor Power Analysis by Labeled Simulation Microprocessor Power Analysis by Labeled Simulation Cheng-Ta Hsieh, Kevin Chen and Massoud Pedram University of Southern California Dept. of EE-Systems Los Angeles CA 989 Outline! Introduction! Problem

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

CS 52 Computer rchitecture and Engineering Lecture 4 - Pipelining Krste sanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs52!

More information

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 2. (2 )What are the two main ways to define performance? 3. (2 )What

More information

ENEE350 Lecture Notes-Weeks 14 and 15

ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining & Amdahl s Law ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining is a method of processing in which a problem is divided into a number of sub problems and solved and the solu8ons of the sub problems

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] When we talk about the number of operands in an instruction (a 1-operand or a 2-operand instruction, for example), what do we mean? [2] What are the two main ways to define performance? [2] Predicting

More information

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018 ECE 172 Digital Systems Chapter 12 Instruction Pipelining Herbert G. Mayer, PSU Status 7/20/2018 1 Syllabus l Scheduling on Pipelined Architecture l Idealized Pipeline l Goal of Scheduling l Causes for

More information

Processor Design & ALU Design

Processor Design & ALU Design 3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of

More information

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide) Out-of-order Pipeline Buffer of instructions Issue = Select + Wakeup Select N oldest, read instructions N=, xor N=, xor and sub Note: ma have execution resource constraints: i.e., load/store/fp Fetch Decode

More information

4. (3) What do we mean when we say something is an N-operand machine?

4. (3) What do we mean when we say something is an N-operand machine? 1. (2) What are the two main ways to define performance? 2. (2) When dealing with control hazards, a prediction is not enough - what else is necessary in order to eliminate stalls? 3. (3) What is an "unbalanced"

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

Computer Architecture ELEC2401 & ELEC3441

Computer Architecture ELEC2401 & ELEC3441 Last Time Pipeline Hazard Computer Architecture ELEC2401 & ELEC3441 Lecture 8 Pipelining (3) Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering Structural Hazard Hazard Control

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] What are the two main ways to define performance? [2] Predicting the direction of a branch is not enough. What else is necessary? [2] The power consumed by a chip has increased over time, but the clock

More information

Project Two RISC Processor Implementation ECE 485

Project Two RISC Processor Implementation ECE 485 Project Two RISC Processor Implementation ECE 485 Chenqi Bao Peter Chinetti November 6, 2013 Instructor: Professor Borkar 1 Statement of Problem This project requires the design and test of a RISC processor

More information

Fall 2011 Prof. Hyesoon Kim

Fall 2011 Prof. Hyesoon Kim Fall 2011 Prof. Hyesoon Kim Add: 2 cycles FE_stage add r1, r2, r3 FE L ID L EX L MEM L WB L add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub add add mul sub sub add add mul sub sub add

More information

EE 660: Computer Architecture Out-of-Order Processors

EE 660: Computer Architecture Out-of-Order Processors EE 660: Computer Architecture Out-of-Order Processors Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David entzlaff Agenda I4 Processors I2O2

More information

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle Computer Engineering Department CC 311- Computer Architecture Chapter 4 The Processor: Datapath and Control Single Cycle Introduction The 5 classic components of a computer Processor Input Control Memory

More information

Performance, Power & Energy

Performance, Power & Energy Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?

More information

Unit 6: Branch Prediction

Unit 6: Branch Prediction CIS 501: Computer Architecture Unit 6: Branch Prediction Slides developed by Joe Devie/, Milo Mar4n & Amir Roth at Upenn with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi,

More information

Implementing the Controller. Harvard-Style Datapath for DLX

Implementing the Controller. Harvard-Style Datapath for DLX 6.823, L6--1 Implementing the Controller Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L6--2 Harvard-Style Datapath for DLX Src1 ( j / ~j ) Src2 ( R / RInd) RegWrite MemWrite

More information

L07-L09 recap: Fundamental lesson(s)!

L07-L09 recap: Fundamental lesson(s)! L7-L9 recap: Fundamental lesson(s)! Over the next 3 lectures (using the IPS ISA as context) I ll explain:! How functions are treated and processed in assembly! How system calls are enabled in assembly!

More information

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example This Unit: Scheduling (Static + Dnamic) CIS 50 Computer Architecture Unit 8: Static and Dnamic Scheduling Application OS Compiler Firmware CPU I/O Memor Digital Circuits Gates & Transistors! Previousl:!

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper

More information

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining CMU 18-447 Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining Instructor: Prof Onur Mutlu TAs: Rachata Ausavarungnirun, Kevin Chang, Albert Cho, Jeremie

More information

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch

More information

Goals for Performance Lecture

Goals for Performance Lecture Goals for Performance Lecture Understand performance, speedup, throughput, latency Relationship between cycle time, cycles/instruction (CPI), number of instructions (the performance equation) Amdahl s

More information

/ : Computer Architecture and Design

/ : Computer Architecture and Design 16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

GATE 2014 A Brief Analysis (Based on student test experiences in the stream of CS on 1 st March, Second Session)

GATE 2014 A Brief Analysis (Based on student test experiences in the stream of CS on 1 st March, Second Session) GATE 4 A Brief Analysis (Based on student test experiences in the stream of CS on st March, 4 - Second Session) Section wise analysis of the paper Mark Marks Total No of Questions Engineering Mathematics

More information

COVER SHEET: Problem#: Points

COVER SHEET: Problem#: Points EEL 4712 Midterm 3 Spring 2017 VERSION 1 Name: UFID: Sign here to give permission for your test to be returned in class, where others might see your score: IMPORTANT: Please be neat and write (or draw)

More information

Designing Sequential Logic Circuits

Designing Sequential Logic Circuits igital Integrated Circuits (83-313) Lecture 5: esigning Sequential Logic Circuits Semester B, 2016-17 Lecturer: r. Adam Teman TAs: Itamar Levi, Robert Giterman 26 April 2017 isclaimer: This course was

More information

SISD SIMD. Flynn s Classification 8/8/2016. CS528 Parallel Architecture Classification & Single Core Architecture C P M

SISD SIMD. Flynn s Classification 8/8/2016. CS528 Parallel Architecture Classification & Single Core Architecture C P M 8/8/26 S528 arallel Architecture lassification & Single ore Architecture arallel Architecture lassification A Sahu Dept of SE, IIT Guwahati A Sahu Flynn s lassification SISD Architecture ategories M SISD

More information

CPU DESIGN The Single-Cycle Implementation

CPU DESIGN The Single-Cycle Implementation CSE 202 Computer Organization CPU DESIGN The Single-Cycle Implementation Shakil M. Khan (adapted from Prof. H. Roumani) Dept of CS & Eng, York University Sequential vs. Combinational Circuits Digital circuits

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each

More information

Simple Instruction-Pipelining (cont.) Pipelining Jumps

Simple Instruction-Pipelining (cont.) Pipelining Jumps 6.823, L9--1 Simple ruction-pipelining (cont.) + Interrupts Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Src1 ( j / ~j ) Src2 ( / Ind) Pipelining Jumps

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines Reminder: midterm on Tue 2/28 will cover Chapters 1-3, App A, B if you understand all slides, assignments,

More information

Lecture 2: Metrics to Evaluate Systems

Lecture 2: Metrics to Evaluate Systems Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Fall 2008 CSE Qualifying Exam. September 13, 2008

Fall 2008 CSE Qualifying Exam. September 13, 2008 Fall 2008 CSE Qualifying Exam September 13, 2008 1 Architecture 1. (Quan, Fall 2008) Your company has just bought a new dual Pentium processor, and you have been tasked with optimizing your software for

More information

CMP 334: Seventh Class

CMP 334: Seventh Class CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative

More information

Scheduling I. Today Introduction to scheduling Classical algorithms. Next Time Advanced topics on scheduling

Scheduling I. Today Introduction to scheduling Classical algorithms. Next Time Advanced topics on scheduling Scheduling I Today Introduction to scheduling Classical algorithms Next Time Advanced topics on scheduling Scheduling out there You are the manager of a supermarket (ok, things don t always turn out the

More information

Enrico Nardelli Logic Circuits and Computer Architecture

Enrico Nardelli Logic Circuits and Computer Architecture Enrico Nardelli Logic Circuits and Computer Architecture Appendix B The design of VS0: a very simple CPU Rev. 1.4 (2009-10) by Enrico Nardelli B - 1 Instruction set Just 4 instructions LOAD M - Copy into

More information

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lectre 9: Control Hazard and Resoltion James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L09 S1, James C. Hoe, CU/ECE/CALC, 2018 Yor goal today Hosekeeping simple control flow

More information

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling Scheduling I Today! Introduction to scheduling! Classical algorithms Next Time! Advanced topics on scheduling Scheduling out there! You are the manager of a supermarket (ok, things don t always turn out

More information

Scalable Store-Load Forwarding via Store Queue Index Prediction

Scalable Store-Load Forwarding via Store Queue Index Prediction Scalable Store-Load Forwarding via Store Queue Index Prediction Tingting Sha, Milo M.K. Martin, Amir Roth University of Pennsylvania {shatingt, milom, amir}@cis.upenn.edu addr addr addr (CAM) predictor

More information

A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor

A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor Farhad Mehdipour, H. Noori, B. Javadi, H. Honda, K. Inoue, K. Murakami Faculty

More information

TDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts.

TDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts. TDDI4 Concurrent Programming, Operating Systems, and Real-time Operating Systems CPU Scheduling Overview: CPU Scheduling CPU bursts and I/O bursts Scheduling Criteria Scheduling Algorithms Multiprocessor

More information

Review: Single-Cycle Processor. Limits on cycle time

Review: Single-Cycle Processor. Limits on cycle time Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control Unit LUControl 2: Op Funct LUSrc RegDst PCJump PC 4 uction + PCPlus4 25:2 2:6 2:6 5: 5: 2 3 WriteReg 4: Src Src LU Zero LUResult Write + PC

More information

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it...

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it... // CS 2, Fall 2! CS 2, Fall 2! We built the instrument. Now we read music and play it... A simple implementa/on uc/on uct r r 2 Write r Src Src Extend 6 Mem Next: path 7-2 CS 2, Fall 2! signals CS 2, Fall

More information

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance

More information

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures Chapter Additional Design Examples Verilog HDL:Digital Design and Modeling Chapter Additional Design Examples Additional Figures Chapter Additional Design Examples 2 Page 62 a b y y 2 y 3 c d e f Figure

More information

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions 18-447 Lectre 12: Pipelined Implementations: Control Hazards and Resoltions S 09 L12-1 James C. Hoe Dept of ECE, CU arch 2, 2009 Annoncements: Spring break net week!! Project 2 de the week after spring

More information

Instruction Set Extensions for Reed-Solomon Encoding and Decoding

Instruction Set Extensions for Reed-Solomon Encoding and Decoding Instruction Set Extensions for Reed-Solomon Encoding and Decoding Suman Mamidi and Michael J Schulte Dept of ECE University of Wisconsin-Madison {mamidi, schulte}@caewiscedu http://mesaecewiscedu Daniel

More information

Last class: Today: Threads. CPU Scheduling

Last class: Today: Threads. CPU Scheduling 1 Last class: Threads Today: CPU Scheduling 2 Resource Allocation In a multiprogramming system, we need to share resources among the running processes What are the types of OS resources? Question: Which

More information

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk ECE290 Fall 2012 Lecture 22 Dr. Zbigniew Kalbarczyk Today LC-3 Micro-sequencer (the control store) LC-3 Micro-programmed control memory LC-3 Micro-instruction format LC -3 Micro-sequencer (the circuitry)

More information

Portland State University ECE 587/687. Branch Prediction

Portland State University ECE 587/687. Branch Prediction Portland State University ECE 587/687 Branch Prediction Copyright by Alaa Alameldeen and Haitham Akkary 2015 Branch Penalty Example: Comparing perfect branch prediction to 90%, 95%, 99% prediction accuracy,

More information

Lecture 3, Performance

Lecture 3, Performance Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations

More information

MICROPROCESSOR REPORT. THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE

MICROPROCESSOR REPORT.   THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE MICROPROCESSOR www.mpronline.com REPORT THE INSIDER S GUIDE TO MICROPROCESSOR HARDWARE ENERGY COROLLARIES TO AMDAHL S LAW Analyzing the Interactions Between Parallel Execution and Energy Consumption By

More information

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman. SP esign Lecture 7 Unfolding cont. & Folding r. Fredrik Edman fredrik.edman@eit.lth.se Unfolding Unfolding creates a program with more than one iteration, J=unfolding factor Unfolding is a structured way

More information

EC 413 Computer Organization

EC 413 Computer Organization EC 413 Computer Organization rithmetic Logic Unit (LU) and Register File Prof. Michel. Kinsy Computing: Computer Organization The DN of Modern Computing Computer CPU Memory System LU Register File Disks

More information

Simulation of Process Scheduling Algorithms

Simulation of Process Scheduling Algorithms Simulation of Process Scheduling Algorithms Project Report Instructor: Dr. Raimund Ege Submitted by: Sonal Sood Pramod Barthwal Index 1. Introduction 2. Proposal 3. Background 3.1 What is a Process 4.

More information

Module 5: CPU Scheduling

Module 5: CPU Scheduling Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained

More information

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1 Computer Architecture ECE 361 Lecture 5: The Design Process & Design 361 design.1 Quick Review of Last Lecture 361 design.2 MIPS ISA Design Objectives and Implications Support general OS and C- style language

More information

Chapter 6: CPU Scheduling

Chapter 6: CPU Scheduling Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 6.1 Basic Concepts Maximum CPU utilization obtained

More information

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning

Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Micro-architecture Pipelining Optimization with Throughput- Aware Floorplanning Yuchun Ma* Zhuoyuan Li* Jason Cong Xianlong Hong Glenn Reinman Sheqin Dong* Qiang Zhou *Department of Computer Science &

More information

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I. Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm

More information

CPE100: Digital Logic Design I

CPE100: Digital Logic Design I Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu CPE100: Digital Logic Design I Final Review http://www.ee.unlv.edu/~b1morris/cpe100/ 2 Logistics Tuesday Dec 12 th 13:00-15:00 (1-3pm) 2 hour

More information

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations!

Parallel Numerics. Scope: Revise standard numerical methods considering parallel computations! Parallel Numerics Scope: Revise standard numerical methods considering parallel computations! Required knowledge: Numerics Parallel Programming Graphs Literature: Dongarra, Du, Sorensen, van der Vorst:

More information

Scheduling. Uwe R. Zimmer & Alistair Rendell The Australian National University

Scheduling. Uwe R. Zimmer & Alistair Rendell The Australian National University 6 Scheduling Uwe R. Zimmer & Alistair Rendell The Australian National University References for this chapter [Bacon98] J. Bacon Concurrent Systems 1998 (2nd Edition) Addison Wesley Longman Ltd, ISBN 0-201-17767-6

More information

Introduction The Nature of High-Performance Computation

Introduction The Nature of High-Performance Computation 1 Introduction The Nature of High-Performance Computation The need for speed. Since the beginning of the era of the modern digital computer in the early 1940s, computing power has increased at an exponential

More information

Circuit Modeling for Practical Many-core Architecture Design Exploration

Circuit Modeling for Practical Many-core Architecture Design Exploration Circuit Modeling for Practical Many-core Architecture Design Exploration Redefining design abstractions Dean Truong Bevan Baas VLSI Computation Lab University of California, Davis Outline Motivation Circuit

More information

High Performance Computing

High Performance Computing Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2 nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

More information

Lecture 3, Performance

Lecture 3, Performance Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations

More information

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications

GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications GPU Acceleration of Cutoff Pair Potentials for Molecular Modeling Applications Christopher Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, Wen-Mei W. Hwu University of Illinois at Urbana-Champaign

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2004 Yale Patt, Instructor Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs Exam 1, October 6, 2004 Name:

More information

Basic Computer Organization and Design Part 3/3

Basic Computer Organization and Design Part 3/3 Basic Computer Organization and Design Part 3/3 Adapted by Dr. Adel Ammar Computer Organization Interrupt Initiated Input/Output Open communication only when some data has to be passed --> interrupt. The

More information

Origami: Folding Warps for Energy Efficient GPUs

Origami: Folding Warps for Energy Efficient GPUs Origami: Folding Warps for Energy Efficient GPUs Mohammad Abdel-Majeed*, Daniel Wong, Justin Huang and Murali Annavaram* * University of Southern alifornia University of alifornia, Riverside Stanford University

More information

Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism

Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism Cyrus: Unintrusive Application-Level Record-Replay for Replay Parallelism Nima Honarmand, Nathan Dautenhahn, Josep Torrellas and Samuel T. King (UIUC) Gilles Pokam and Cristiano Pereira (Intel) iacoma.cs.uiuc.edu

More information

CPU SCHEDULING RONG ZHENG

CPU SCHEDULING RONG ZHENG CPU SCHEDULING RONG ZHENG OVERVIEW Why scheduling? Non-preemptive vs Preemptive policies FCFS, SJF, Round robin, multilevel queues with feedback, guaranteed scheduling 2 SHORT-TERM, MID-TERM, LONG- TERM

More information

Synchronous Elastic Systems

Synchronous Elastic Systems Synchronous Elastic Systems Mike Kishinevsky and Jordi Cortadella Intel Strategic CAD Labs Hillsboro, USA Universitat Politecnica de Catalunya Barcelona, Spain 2 Contributors to SELF research Performance

More information

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Anoop Bhagyanath and Klaus Schneider Embedded Systems Chair University of Kaiserslautern

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19

More information

LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation

LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation Jingyang Zhu 1, Zhiliang Qian 2, and Chi-Ying Tsui 1 1 The Hong Kong University of Science and

More information

CSE Computer Architecture I

CSE Computer Architecture I Execution Sequence Summary CSE 30321 Computer Architecture I Lecture 17 - Multi Cycle Control Michael Niemier Department of Computer Science and Engineering Step name Instruction fetch Instruction decode/register

More information

Computer Architecture

Computer Architecture Computer Architecture QtSpim, a Mips simulator S. Coudert and R. Pacalet January 4, 2018..................... Memory Mapping 0xFFFF000C 0xFFFF0008 0xFFFF0004 0xffff0000 0x90000000 0x80000000 0x7ffff4d4

More information