Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Size: px
Start display at page:

Download "Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)"

Transcription

1 Out-of-order Pipeline Buffer of instructions Issue = Select + Wakeup Select N oldest, read instructions N=, xor N=, xor and sub Note: ma have execution resource constraints: i.e., load/store/fp Fetch Decode Rename Dispatch Issue Reg-read Execute Writeback Commit Insn Inp R Inp R Dst Age xor p p 0 Read! add n p4 In-order front end Out-of-order execution sub p5 p addi n Read! 40 Issue = Select + Wakeup Wakeup dependent instructions CAM search for Dst in inputs Set read Also update read-bit table for future instructions Insn Inp R Inp R Dst Age xor p p 0 add p4 sub p5 p addi Read bits p p p3 p4 p5 n n Select/Wakeup one ccle Dependents go back to back Next ccle: add/addi are read: Issue Insn Inp R Inp R Dst Age add p4 addi OOO execution (-wide) OOO execution (-wide) xor RDY add sub RDY addi p 7 p xor add RDY sub addi RDY xor p, p sub p5, p p 7 p

2 OOO execution (-wide) OOO execution (-wide) xor add sub addi add, p4 addi, p 7 p xor 7, 3 sub 6 3 xor add sub addi p 7 p add _, 9 addi _, OOO execution (-wide) OOO execution (-wide) p 7 p 7 xor add sub addi p xor add sub addi p Note similarit to in-order OOO execution (-wide) p 7 p Multi-ccle operations Multi-ccle ops (load, fp, multipl, etc.) Wakeup deferred a few ccles Structural hazard? Cache misses? Speculative wake-up (assume hit) Cancel exec of dependents Re-issue later Details: complicated, not important 54 55

3 Re-order Buffer (ROB) All instructions in order Two purposes Misprediction recover In-order commit Maintain appearance of in-order execution Freeing of phsical registers RENAMING REVISITED Renaming revisited Overwritten register Freed at commit Restore in map table on recover Branch mis-prediction recover Also must be read at rename Original insns xor r,r r3 addi r3, r r p r3 p3 p xor r,r r3 xor p, p addi r3, r xor r,r r3 xor p, p addi r3, r r p r3 p3 p0 r p r3 p0 60 6

4 xor r,r r3 xor p, p add, p4 addi r3, r xor r,r r3 xor p, p addi r3, r r p r3 p0 r p r3 r4 p xor r,r r3 xor p, p sub p5, p addi r3, r xor r,r r3 xor p, p sub p5, p addi r3, r r p r3 r4 p0 r p r3 r4 p xor r,r r3 xor p, p addi r3, r sub p5, p addi, [p] xor r,r r3 xor p, p addi r3, r sub p5, p addi, [p] r p r3 r4 p0 r r3 r4 p

5 ROB ROB entr holds all info for recover/commit Logical register names Phsical register names Instruction tpes Dispatch: insert at tail Full? Stall Commit: remove from head Not completed? Stall Recover Completel remove wrong path instructions Flush from IQ Remove from ROB Restore map table to before misprediction Free destination registers Recover example bnz r loop bnz p, loop xor r, r r3 xor p, p add r3, r4 r4 sub r5, r r3 addi r3, r sub p5, p addi, [p] Recover example bnz r loop bnz p, loop xor r, r r3 xor p, p add r3, r4 r4 sub r5, r r3 addi r3, r sub p5, p addi, [p] r r3 r4 p0 r p r3 r4 p Recover example bnz r loop bnz p, loop xor r, r r3 xor p, p add r3, r4 r4 sub r5, r r3 sub p5, p Recover example bnz r loop bnz p, loop xor r, r r3 xor p, p add r3, r4 r4 r p r3 r4 p0 r p r3 p0 7 73

6 bnz r loop xor r, r r3 Recover example bnz p, loop xor p, p Recover example bnz r loop bnz p, loop r p r3 p3 p0 r p r3 p3 p What about stores Stores: Write D$, not registers Can we rename memor? Recover in the cache? No (at least not easil) Cache writes unrecoverable Stores: onl when certain Commit Commit xor r, r r3 xor p, p add r3, r4 r4 sub r5, r r3 addi r3, r sub p5, p addi, [p] At commit: instruction becomes architected state In order Onl when instructions are finished Free overwritten register (wh?) xor r,r r3 addi r3, r Freeing over-written register xor p, p sub p5, p addi, [p] Commit Example xor r,r r3 xor p, p addi r3, r sub p5, p addi, [p] Before xor: r3 p3 After xor: r3 Insns older than xor reads p3 Insns ounger than xor read (until next r3-writing instruction) At commit of xor, no older instructions exist No one else needs p3 free it! r r3 r4 p

7 Commit Example xor r,r r3 addi r3, r xor p, p sub p5, p addi, [p] Commit Example addi r3, r sub p5, p addi, [p] r r r3 r4 r5 p p5 p0 p3 r r r3 r4 r5 p p5 p0 p3 p Commit Example Commit Example addi r3, r sub p5, p addi, [p] addi r3, r addi, [p] r r r3 r4 r5 p p5 p0 p3 p4 r r r3 r4 r5 p p5 p0 p3 p4 p 8 83 Standard stle: large and cumbersome Change laout slightl Columns = stages (dispatch, issue, etc.) Rows = instructions Content of boxes = ccles For our purposes: issue/exec = ccle Ignore preg read latenc, etc. Load-use, mul, div, and FP longer ld [p] p add p, p3 p4 ld [] Buffer of instructions Fetch Decode Rename Dispatch Issue Reg-read Execute Writeback Commit 84 85

8 ld [p] p add p, p3 p4 ld [] ld [p] p add p, p3 p4 ld [] -wide Infinite ROB, IQ, Pregs Loads: 3 ccles Ccle : Dispatch xor and ld ld [p] p 5 ld [p] p 5 add p, p3 p4 add p, p3 p4 ld [] ld [] 3 6 Ccle : Dispatch xor and ld st Ld issues -- also note WB ccle while ou do this (Note: don t issue if WB ports full) Ccle 3: add and xor are not read nd load is issue it ld [p] p 5 ld [p] p 5 6 add p, p3 p4 5 6 add p, p3 p ld [] 3 6 ld [] 3 6 Ccle 4: nothing Ccle 5: add can issue Ccle 6: st load can commit (oldest instruction & finished) xor can issue 90 9

9 ld [p] p 5 6 ld [p] p 5 6 add p, p3 p add p, p3 p ld [] 3 6 ld [] Ccle 7: add can commit (oldest instruction & finished) Ccle 8: xor and ld can commit (-wide: can do both at once) 9 93 ld [p] p add p, p3 p4 ld [] Buffer of instructions HANDLING MEMORY OPS Fetch Decode Rename Dispatch Issue Reg-read Execute Writeback Commit Dnamicall Scheduling Memor Ops Compilers must schedule memor ops conservativel Options for hardware: Hold loads until all prior stores execute (conservative) Execute loads as soon as possible, detect violations (aggressive) When a store executes, it checks if an later loads executed too earl (to same address). If so, flush pipeline Learn violations over time, selectivel reorder (predictive) Before Wrong(?) ld r,4(sp) ld r,4(sp) ld r3,8(sp) ld r3,8(sp) add r3,r,r //stall ld r5,0(r8) //does r8sp? st r,0(sp) add r3,r,r ld r5,0(r8) ld r6,4(r8) //does r8+4sp? ld r6,4(r8) st r,0(sp) sub r5,r6,r4 //stall sub r5,r6,r4 st r4,8(r8) st r4,8(r8) 96 Loads and Stores fdiv p,p p3 st p4 [p5] st p3 ld [] 5 3 Ccle 3: Can ld[] execute? (wh or wh not?) 97

10 Loads and Stores Loads and Stores fdiv p,p p3 5 fdiv p,p p3 5 st p4 [p5] 3 st p4 [p5] 3 st p3 st p3 ld [] ld [] Aliasing (again) p5?? Suppose p5 and!= Can ld[] execute? (wh or wh not?) Memor Forwarding Stores write cache at commit Commit is in-order, delaed b all instructions Allows stores to be undone on branch mis-predictions, etc. Loads read cache Earl execution of loads is critical Forwarding Allow store load communication before store commit Conceptuall like reg. bpassing, but different implementation Wh? Addresses unknown until execute Forwarding: Store Queue Store Queue Holds all in-flight stores CAM: searchable b address Age logic: determine oungest matching store older than load Store execution Write Store Queue Address + Data Load execution Search SQ Match? Forward Read D$ address address load position Store Queue (SQ) age Data cache data in value data out head tail 00 0

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example This Unit: Scheduling (Static + Dnamic) CIS 50 Computer Architecture Unit 8: Static and Dnamic Scheduling Application OS Compiler Firmware CPU I/O Memor Digital Circuits Gates & Transistors! Previousl:!

More information

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines

This Unit: Dynamic Scheduling. Can Hardware Overcome These Limits? Scheduling: Compiler or Hardware. The Problem With In-Order Pipelines This Uit: Damic Schedulig CSE 0 Computer Sstems Architecture Damic Schedulig Slides origiall developed b Drew Hilto (IBM) ad Milo Marti (Uiversit of Peslvaia) App App App Sstem software Mem CPU I/O Code

More information

EE 660: Computer Architecture Out-of-Order Processors

EE 660: Computer Architecture Out-of-Order Processors EE 660: Computer Architecture Out-of-Order Processors Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David entzlaff Agenda I4 Processors I2O2

More information

CSCI-564 Advanced Computer Architecture

CSCI-564 Advanced Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 8: Handling Exceptions and Interrupts / Superscalar Bo Wu Colorado School of Mines Branch Delay Slots (expose control hazard to software) Change the ISA

More information

Scalable Store-Load Forwarding via Store Queue Index Prediction

Scalable Store-Load Forwarding via Store Queue Index Prediction Scalable Store-Load Forwarding via Store Queue Index Prediction Tingting Sha, Milo M.K. Martin, Amir Roth University of Pennsylvania {shatingt, milom, amir}@cis.upenn.edu addr addr addr (CAM) predictor

More information

Unit 6: Branch Prediction

Unit 6: Branch Prediction CIS 501: Computer Architecture Unit 6: Branch Prediction Slides developed by Joe Devie/, Milo Mar4n & Amir Roth at Upenn with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi,

More information

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I. Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm

More information

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control The MIPS Pipeline CSCI206 - Computer Organization & Programming Pipeline Datapath and Control zybook: 11.6 Developed and maintained by the Bucknell University Computer Science Department - 2017 Hazard

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

Fall 2011 Prof. Hyesoon Kim

Fall 2011 Prof. Hyesoon Kim Fall 2011 Prof. Hyesoon Kim Add: 2 cycles FE_stage add r1, r2, r3 FE L ID L EX L MEM L WB L add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub add add mul sub sub add add mul sub sub add

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2 Pipelining CS 365 Lecture 12 Prof. Yih Huang CS 365 1 Traditional Execution 1 2 3 4 1 2 3 4 5 1 2 3 add ld beq CS 365 2 1 Pipelined Execution 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

More information

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch

More information

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Anoop Bhagyanath and Klaus Schneider Embedded Systems Chair University of Kaiserslautern

More information

/ : Computer Architecture and Design

/ : Computer Architecture and Design 16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)

More information

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference) ECE 3401 Lecture 23 Pipeline Design Control State Register Combinational Control Logic New/ Modified Control Word ISA: Instruction Specifications (for reference) P C P C + 1 I N F I R M [ P C ] E X 0 PC

More information

Portland State University ECE 587/687. Branch Prediction

Portland State University ECE 587/687. Branch Prediction Portland State University ECE 587/687 Branch Prediction Copyright by Alaa Alameldeen and Haitham Akkary 2015 Branch Penalty Example: Comparing perfect branch prediction to 90%, 95%, 99% prediction accuracy,

More information

CPSC 3300 Spring 2017 Exam 2

CPSC 3300 Spring 2017 Exam 2 CPSC 3300 Spring 2017 Exam 2 Name: 1. Matching. Write the correct term from the list into each blank. (2 pts. each) structural hazard EPIC forwarding precise exception hardwired load-use data hazard VLIW

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

3. (2) What is the difference between fixed and hybrid instructions?

3. (2) What is the difference between fixed and hybrid instructions? 1. (2 pts) What is a "balanced" pipeline? 2. (2 pts) What are the two main ways to define performance? 3. (2) What is the difference between fixed and hybrid instructions? 4. (2 pts) Clock rates have grown

More information

Implementing the Controller. Harvard-Style Datapath for DLX

Implementing the Controller. Harvard-Style Datapath for DLX 6.823, L6--1 Implementing the Controller Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L6--2 Harvard-Style Datapath for DLX Src1 ( j / ~j ) Src2 ( R / RInd) RegWrite MemWrite

More information

Computer Architecture ELEC2401 & ELEC3441

Computer Architecture ELEC2401 & ELEC3441 Last Time Pipeline Hazard Computer Architecture ELEC2401 & ELEC3441 Lecture 8 Pipelining (3) Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering Structural Hazard Hazard Control

More information

4. (3) What do we mean when we say something is an N-operand machine?

4. (3) What do we mean when we say something is an N-operand machine? 1. (2) What are the two main ways to define performance? 2. (2) When dealing with control hazards, a prediction is not enough - what else is necessary in order to eliminate stalls? 3. (3) What is an "unbalanced"

More information

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 2. (2 )What are the two main ways to define performance? 3. (2 )What

More information

Simple Instruction-Pipelining (cont.) Pipelining Jumps

Simple Instruction-Pipelining (cont.) Pipelining Jumps 6.823, L9--1 Simple ruction-pipelining (cont.) + Interrupts Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Src1 ( j / ~j ) Src2 ( / Ind) Pipelining Jumps

More information

Performance Prediction Models

Performance Prediction Models Performance Prediction Models Shruti Padmanabha, Andrew Lukefahr, Reetuparna Das, Scott Mahlke {shrupad,lukefahr,reetudas,mahlke}@umich.edu Gem5 workshop Micro 2012 December 2, 2012 Composite Cores Pushing

More information

COVER SHEET: Problem#: Points

COVER SHEET: Problem#: Points EEL 4712 Midterm 3 Spring 2017 VERSION 1 Name: UFID: Sign here to give permission for your test to be returned in class, where others might see your score: IMPORTANT: Please be neat and write (or draw)

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] When we talk about the number of operands in an instruction (a 1-operand or a 2-operand instruction, for example), what do we mean? [2] What are the two main ways to define performance? [2] Predicting

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. I fetch decode & eg-fetch execute memory Clock period

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] What are the two main ways to define performance? [2] Predicting the direction of a branch is not enough. What else is necessary? [2] The power consumed by a chip has increased over time, but the clock

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. fetch decode & eg-fetch execute

More information

CS 52 Computer rchitecture and Engineering Lecture 4 - Pipelining Krste sanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs52!

More information

ENEE350 Lecture Notes-Weeks 14 and 15

ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining & Amdahl s Law ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining is a method of processing in which a problem is divided into a number of sub problems and solved and the solu8ons of the sub problems

More information

Microprocessor Power Analysis by Labeled Simulation

Microprocessor Power Analysis by Labeled Simulation Microprocessor Power Analysis by Labeled Simulation Cheng-Ta Hsieh, Kevin Chen and Massoud Pedram University of Southern California Dept. of EE-Systems Los Angeles CA 989 Outline! Introduction! Problem

More information

Project Two RISC Processor Implementation ECE 485

Project Two RISC Processor Implementation ECE 485 Project Two RISC Processor Implementation ECE 485 Chenqi Bao Peter Chinetti November 6, 2013 Instructor: Professor Borkar 1 Statement of Problem This project requires the design and test of a RISC processor

More information

Performance, Power & Energy

Performance, Power & Energy Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining CMU 18-447 Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining Instructor: Prof Onur Mutlu TAs: Rachata Ausavarungnirun, Kevin Chang, Albert Cho, Jeremie

More information

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018 ECE 172 Digital Systems Chapter 12 Instruction Pipelining Herbert G. Mayer, PSU Status 7/20/2018 1 Syllabus l Scheduling on Pipelined Architecture l Idealized Pipeline l Goal of Scheduling l Causes for

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Fall 2008 CSE Qualifying Exam. September 13, 2008

Fall 2008 CSE Qualifying Exam. September 13, 2008 Fall 2008 CSE Qualifying Exam September 13, 2008 1 Architecture 1. (Quan, Fall 2008) Your company has just bought a new dual Pentium processor, and you have been tasked with optimizing your software for

More information

Compiling Techniques

Compiling Techniques Lecture 11: Introduction to 13 November 2015 Table of contents 1 Introduction Overview The Backend The Big Picture 2 Code Shape Overview Introduction Overview The Backend The Big Picture Source code FrontEnd

More information

Lecture 3, Performance

Lecture 3, Performance Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

Lecture 3, Performance

Lecture 3, Performance Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations

More information

1 st Semester 2007/2008

1 st Semester 2007/2008 Chapter 17: System Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2007/2008 Slides baseados nos slides oficiais do livro Database System c Silberschatz, Korth and Sudarshan.

More information

IITK CSE PG Admissions: Sample Test

IITK CSE PG Admissions: Sample Test IITK CSE PG dmissions: Sample Test pril 11, 2018 Page 1 of 10 1. Suppose X is a uniform random variable between 0.50 and 1.00. What is the probability that a randomly selected value of X is between 0.55

More information

Evaluating Overheads of Multi-bit Soft Error Protection Techniques at Hardware Level Sponsored by SRC and Freescale under SRC task number 2042

Evaluating Overheads of Multi-bit Soft Error Protection Techniques at Hardware Level Sponsored by SRC and Freescale under SRC task number 2042 Evaluating Overheads of Multi-bit Soft Error Protection Techniques at Hardware Level Sponsored by SR and Freescale under SR task number 2042 Lukasz G. Szafaryn, Kevin Skadron Department of omputer Science

More information

Pipeline no Prediction. Branch Delay Slots A. From before branch B. From branch target C. From fall through. Branch Prediction

Pipeline no Prediction. Branch Delay Slots A. From before branch B. From branch target C. From fall through. Branch Prediction Pipeline no Prediction Branching completes in 2 cycles We know the target address after the second stage? PC fetch Instruction Memory Decode Check the condition Calculate the branch target address PC+4

More information

Circuit Theory ES3, EE21

Circuit Theory ES3, EE21 Circuit Theory ES3, EE21 2017 ECE PhD Qualifier: Circuit Theory (a) Find the transfer function for the circuit shown below, H(jω) = V out/v in. The op amp may be considered ideal within the operating limits

More information

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle Computer Engineering Department CC 311- Computer Architecture Chapter 4 The Processor: Datapath and Control Single Cycle Introduction The 5 classic components of a computer Processor Input Control Memory

More information

Timeline of a Vulnerability

Timeline of a Vulnerability Timeline of a Vulnerability Is this all a conspiracy? Vulnerability existed for many years 1 Daniel Gruss, Moritz Lipp, Michael Schwarz www.iaik.tugraz.at Timeline of a Vulnerability Is this all a conspiracy?

More information

CIS 371 Computer Organization and Design

CIS 371 Computer Organization and Design CIS 371 Computer Organization and Design Unit 7: Caches Based on slides by Prof. Amir Roth & Prof. Milo Martin CIS 371: Comp. Org. Prof. Milo Martin Caches 1 This Unit: Caches I$ Core L2 D$ Main Memory

More information

Branch Prediction using Advanced Neural Methods

Branch Prediction using Advanced Neural Methods Branch Prediction using Advanced Neural Methods Sunghoon Kim Department of Mechanical Engineering University of California, Berkeley shkim@newton.berkeley.edu Abstract Among the hardware techniques, two-level

More information

NEW MEXICO STATE UNIVERSITY THE KLIPSCH SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING Ph.D. QUALIFYING EXAMINATION

NEW MEXICO STATE UNIVERSITY THE KLIPSCH SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING Ph.D. QUALIFYING EXAMINATION Write your four digit code here... NEW MEXICO STATE UNIVERSITY THE KLIPSCH SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING Ph.D. QUALIFYING EXAMINATION Exam Instructions: August 16, 2010 9:00 AM - 1:00 PM

More information

Chapter 7. Sequential Circuits Registers, Counters, RAM

Chapter 7. Sequential Circuits Registers, Counters, RAM Chapter 7. Sequential Circuits Registers, Counters, RAM Register - a group of binary storage elements suitable for holding binary info A group of FFs constitutes a register Commonly used as temporary storage

More information

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control Logic and Computer Design Fundamentals Chapter 8 Sequencing and Control Datapath and Control Datapath - performs data transfer and processing operations Control Unit - Determines enabling and sequencing

More information

Vector Lane Threading

Vector Lane Threading Vector Lane Threading S. Rivoire, R. Schultz, T. Okuda, C. Kozyrakis Computer Systems Laboratory Stanford University Motivation Vector processors excel at data-level parallelism (DLP) What happens to program

More information

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lectre 9: Control Hazard and Resoltion James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L09 S1, James C. Hoe, CU/ECE/CALC, 2018 Yor goal today Hosekeeping simple control flow

More information

Outcomes. Spiral 1 / Unit 2. Boolean Algebra BOOLEAN ALGEBRA INTRO. Basic Boolean Algebra Logic Functions Decoders Multiplexers

Outcomes. Spiral 1 / Unit 2. Boolean Algebra BOOLEAN ALGEBRA INTRO. Basic Boolean Algebra Logic Functions Decoders Multiplexers -2. -2.2 piral / Unit 2 Basic Boolean Algebra Logic Functions Decoders Multipleers Mark Redekopp Outcomes I know the difference between combinational and sequential logic and can name eamples of each.

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 09/10, Jan., 2018 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 43 Most Essential Assumptions for Real-Time Systems Upper

More information

Unit 8: Sequ. ential Circuits

Unit 8: Sequ. ential Circuits CPSC 121: Models of Computation Unit 8: Sequ ential Circuits Based on slides by Patrice Be lleville and Steve Wolfman Pre-Class Learning Goals By the start of class, you s hould be able to Trace the operation

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2004 Yale Patt, Instructor Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs Exam 1, October 6, 2004 Name:

More information

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions 18-447 Lectre 12: Pipelined Implementations: Control Hazards and Resoltions S 09 L12-1 James C. Hoe Dept of ECE, CU arch 2, 2009 Annoncements: Spring break net week!! Project 2 de the week after spring

More information

CODE GENERATION REGISTER ALLOCATION. Goal. Interplay between. Translate intermediate code into target code

CODE GENERATION REGISTER ALLOCATION. Goal. Interplay between. Translate intermediate code into target code CODE GENERATION Goal Translate intermediate code into target code Interplay between Register Allocation Instruction Selection Instruction Scheduling 1 REGISTER ALLOCATION 1 REGISTER ALLOCATION Motivation

More information

Menu. Binary Adder EEL3701 EEL3701. Add, Subtract, Compare, ALU

Menu. Binary Adder EEL3701 EEL3701. Add, Subtract, Compare, ALU Other MSI Circuit: Adders >Binar, Half & Full Canonical forms Binar Subtraction Full-Subtractor Magnitude Comparators >See Lam: Fig 4.8 ALU Menu Look into m... 1 Binar Adder Suppose we want to add two

More information

Last class: Today: Threads. CPU Scheduling

Last class: Today: Threads. CPU Scheduling 1 Last class: Threads Today: CPU Scheduling 2 Resource Allocation In a multiprogramming system, we need to share resources among the running processes What are the types of OS resources? Question: Which

More information

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism Raj Parihar Advisor: Prof. Michael C. Huang March 22, 2013 Raj Parihar Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

More information

A Second Datapath Example YH16

A Second Datapath Example YH16 A Second Datapath Example YH16 Lecture 09 Prof. Yih Huang S365 1 A 16-Bit Architecture: YH16 A word is 16 bit wide 32 general purpose registers, 16 bits each Like MIPS, 0 is hardwired zero. 16 bit P 16

More information

Timing analysis and predictability of architectures

Timing analysis and predictability of architectures Timing analysis and predictability of architectures Cache analysis Claire Maiza Verimag/INP 01/12/2010 Claire Maiza Synchron 2010 01/12/2010 1 / 18 Timing Analysis Frequency Analysis-guaranteed timing

More information

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures Chapter Additional Design Examples Verilog HDL:Digital Design and Modeling Chapter Additional Design Examples Additional Figures Chapter Additional Design Examples 2 Page 62 a b y y 2 y 3 c d e f Figure

More information

Part I: Definitions and Properties

Part I: Definitions and Properties Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States

More information

ECE QUALIFYING EXAM I (QE l) January 5, 2015

ECE QUALIFYING EXAM I (QE l) January 5, 2015 ECE QUALIFYING EAM I (QE l) January 5, 2015 Student Exam No. --- Do not put your name or SU ID on any of your paperwork. This is a closed-book exam. It is designed such that the math skills, as well as

More information

Lecture 5 - Assembly Programming(II), Intro to Digital Filters

Lecture 5 - Assembly Programming(II), Intro to Digital Filters GoBack Lecture 5 - Assembly Programming(II), Intro to Digital Filters James Barnes (James.Barnes@colostate.edu) Spring 2009 Colorado State University Dept of Electrical and Computer Engineering ECE423

More information

Timeline of a Vulnerability

Timeline of a Vulnerability Introduction Timeline of a Vulnerability Is this all a conspiracy? Vulnerability existed for many years 2 Michael Schwarz (@misc0110) www.iaik.tugraz.at Timeline of a Vulnerability Is this all a conspiracy?

More information

Clock-driven scheduling

Clock-driven scheduling Clock-driven scheduling Also known as static or off-line scheduling Michal Sojka Czech Technical University in Prague, Faculty of Electrical Engineering, Department of Control Engineering November 8, 2017

More information

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5.

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5. CHPTER 9 2008 Pearson Education, Inc. 9-. log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C 7 Z = F 7 + F 6 + F 5 + F 4 + F 3 + F 2 + F + F 0 N = F 7 9-3.* = S + S = S + S S S S0 C in C 0 dder

More information

ALU A functional unit

ALU A functional unit ALU A functional unit that performs arithmetic operations such as ADD, SUB, MPY logical operations such as AND, OR, XOR, NOT on given data types: 8-,16-,32-, or 64-bit values A n-1 A n-2... A 1 A 0 B n-1

More information

Figure 4.9 MARIE s Datapath

Figure 4.9 MARIE s Datapath Term Control Word Microoperation Hardwired Control Microprogrammed Control Discussion A set of signals that executes a microoperation. A register transfer or other operation that the CPU can execute in

More information

CHARACTERIZATION AND CLASSIFICATION OF MODERN MICRO-PROCESSOR BENCHMARKS KUNXIANG YAN, B.S. A thesis submitted to the Graduate School

CHARACTERIZATION AND CLASSIFICATION OF MODERN MICRO-PROCESSOR BENCHMARKS KUNXIANG YAN, B.S. A thesis submitted to the Graduate School CHARACTERIZATION AND CLASSIFICATION OF MODERN MICRO-PROCESSOR BENCHMARKS BY KUNXIANG YAN, B.S. A thesis submitted to the Graduate School in partial fulfillment of the requirements for the degree Master

More information

Equalities and Uninterpreted Functions. Chapter 3. Decision Procedures. An Algorithmic Point of View. Revision 1.0

Equalities and Uninterpreted Functions. Chapter 3. Decision Procedures. An Algorithmic Point of View. Revision 1.0 Equalities and Uninterpreted Functions Chapter 3 Decision Procedures An Algorithmic Point of View D.Kroening O.Strichman Revision 1.0 Outline Decision Procedures Equalities and Uninterpreted Functions

More information

CMP 334: Seventh Class

CMP 334: Seventh Class CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative

More information

GATE 2014 A Brief Analysis (Based on student test experiences in the stream of CS on 1 st March, Second Session)

GATE 2014 A Brief Analysis (Based on student test experiences in the stream of CS on 1 st March, Second Session) GATE 4 A Brief Analysis (Based on student test experiences in the stream of CS on st March, 4 - Second Session) Section wise analysis of the paper Mark Marks Total No of Questions Engineering Mathematics

More information

Processor Design & ALU Design

Processor Design & ALU Design 3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law Topics 2 Page The Nature of Time real (i.e. wall clock) time = User Time: time spent

More information

Measurement & Performance

Measurement & Performance Measurement & Performance Topics Timers Performance measures Time-based metrics Rate-based metrics Benchmarking Amdahl s law 2 The Nature of Time real (i.e. wall clock) time = User Time: time spent executing

More information

System Data Bus (8-bit) Data Buffer. Internal Data Bus (8-bit) 8-bit register (R) 3-bit address 16-bit register pair (P) 2-bit address

System Data Bus (8-bit) Data Buffer. Internal Data Bus (8-bit) 8-bit register (R) 3-bit address 16-bit register pair (P) 2-bit address Intel 8080 CPU block diagram 8 System Data Bus (8-bit) Data Buffer Registry Array B 8 C Internal Data Bus (8-bit) F D E H L ALU SP A PC Address Buffer 16 System Address Bus (16-bit) Internal register addressing:

More information

AFRAMEWORK FOR STATISTICAL MODELING OF SUPERSCALAR PROCESSOR PERFORMANCE

AFRAMEWORK FOR STATISTICAL MODELING OF SUPERSCALAR PROCESSOR PERFORMANCE CARNEGIE MELLON UNIVERSITY AFRAMEWORK FOR STATISTICAL MODELING OF SUPERSCALAR PROCESSOR PERFORMANCE A DISSERTATION SUBMITTED TO THE GRADUATE SCHOOL IN PARTIAL FULFILLMENT OF THE REQUIREMENTS for the degree

More information

A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor

A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor Farhad Mehdipour, H. Noori, B. Javadi, H. Honda, K. Inoue, K. Murakami Faculty

More information

Spiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller

Spiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller 2-. piral 2- Datapath Components: Counters s Design Example: Crosswalk Controller 2-.2 piral Content Mapping piral Theory Combinational Design equential Design ystem Level Design Implementation and Tools

More information

Discrete Event Simulation

Discrete Event Simulation Discrete Event Simulation Henry Z. Lo June 9, 2014 1 Motivation Suppose you were a branch manager of a bank. How many tellers should you staff to keep customers happy? With prior knowledge of how often

More information

Module 9: "Introduction to Shared Memory Multiprocessors" Lecture 17: "Introduction to Cache Coherence Protocols" Invalidation vs.

Module 9: Introduction to Shared Memory Multiprocessors Lecture 17: Introduction to Cache Coherence Protocols Invalidation vs. Invalidation vs. Update Sharing patterns Migratory hand-off States of a cache line Stores MSI protocol MSI example MESI protocol MESI example MOESI protocol Hybrid inval+update file:///e /parallel_com_arch/lecture17/17_1.htm[6/13/2012

More information

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1 Computer Architecture ECE 361 Lecture 5: The Design Process & Design 361 design.1 Quick Review of Last Lecture 361 design.2 MIPS ISA Design Objectives and Implications Support general OS and C- style language

More information

Universal Turing Machine. Lecture 20

Universal Turing Machine. Lecture 20 Universal Turing Machine Lecture 20 1 Turing Machine move the head left or right by one cell read write sequentially accessed infinite memory finite memory (state) next-action look-up table Variants don

More information

Scheduling. Uwe R. Zimmer & Alistair Rendell The Australian National University

Scheduling. Uwe R. Zimmer & Alistair Rendell The Australian National University 6 Scheduling Uwe R. Zimmer & Alistair Rendell The Australian National University References for this chapter [Bacon98] J. Bacon Concurrent Systems 1998 (2nd Edition) Addison Wesley Longman Ltd, ISBN 0-201-17767-6

More information

Basic Computer Organization and Design Part 3/3

Basic Computer Organization and Design Part 3/3 Basic Computer Organization and Design Part 3/3 Adapted by Dr. Adel Ammar Computer Organization Interrupt Initiated Input/Output Open communication only when some data has to be passed --> interrupt. The

More information

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 18: Sharing Patterns and Cache Coherence Protocols. The Lecture Contains:

Module 9: Addendum to Module 6: Shared Memory Multiprocessors Lecture 18: Sharing Patterns and Cache Coherence Protocols. The Lecture Contains: The Lecture Contains: Invalidation vs. Update Sharing Patterns Migratory Hand-off States of a Cache Line Stores MSI Protocol State Transition MSI Example MESI Protocol MESI Example MOESI Protocol MOSI

More information

Multicore Semantics and Programming

Multicore Semantics and Programming Multicore Semantics and Programming Peter Sewell Tim Harris University of Cambridge Oracle October November, 2015 p. 1 These Lectures Part 1: Multicore Semantics: the concurrency of multiprocessors and

More information

Origami: Folding Warps for Energy Efficient GPUs

Origami: Folding Warps for Energy Efficient GPUs Origami: Folding Warps for Energy Efficient GPUs Mohammad Abdel-Majeed*, Daniel Wong, Justin Huang and Murali Annavaram* * University of Southern alifornia University of alifornia, Riverside Stanford University

More information