Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Size: px
Start display at page:

Download "Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2"

Transcription

1 Pipelining CS 365 Lecture 12 Prof. Yih Huang CS Traditional Execution add ld beq CS

2 Pipelined Execution CS Basic Ideas Do not wait for an instruction to complete to start the next. Start the Cycle 0 of the next instruction when the previous one enters Cycle 1. Instruction executions are overlapped. Pipelining increases instruction throughput, as opposed to decreasing the execution time of individual instructions. CS

3 Easier Said Than Done? In every cycle, activities of all five stages take place. Many problems arises with overlapped executions. Structural hazards Control hazards Data hazards CS Pipeline Hazards I Structural Hazards The data path cannot support the combination of instructions that we want to execute in the same cycle Consider what happens when an R-type instruction is followed by a BEQ? add beq RR + RW RR CS

4 Summary of MIPS Lite Instruction Executions Step name Instruction fetch Instruction decode/register fetch Action for R-type instructions Action for memory-reference Action for instructions branches IR = Memory[PC] PC = PC + 4 A = Reg [IR[25-21]] B = Reg [IR[20-16]] ALUOut = PC + (sign-extend (IR[15-0]) << 2) Action for jumps Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] II computation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2) jump completion Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut] completion ALUOut or Store: Memory [ALUOut] = B Memory read completion Load: Reg[IR[20-16]] = MDR CS Resource Conflicts on the Multicycle Datapath Io rd M e m R e a d M e m W r i te IR W r it e R e g D s t R e g W r ite A L U S rc A P C 0 M u x 1 A d d r e s s M e m o r y M e m D a ta W rite d a t a In s tru c tio n [ ] In s tru c tio n [ ] In s tru c tio n [1 5 0 ] I n s tr u c tio n re g is te r 0 M I n s tr u c tio n u [ ] x 1 R e a d r e g i s te r 1 R e a d R e a d d a t a 1 r e g i s te r 2 R e g is te r s W r ite R e a d r e g i s te r d a ta 2 W r ite d a t a A B 4 0 M u x M u 2 x Z e ro A L U A L U r e s u l t A L U O u t In s tr u c tio n 0 3 [1 5 0 ] M u x M e m o ry 1 d a ta r e g is te r 1 6 S ig n e x t e n d 3 2 S h ift le ft 2 A L U c o n t ro l In s tr u c tio n [5 0 ] M e m to R e g CS A L U S r c B A L U O p 4

5 Pipeline Hazards II Control Hazards: When we decide to branch, other instructions are in the pipeline! beq add sub RR RR + RW RR + RW Target: ld RR + MR RW Which one is the next? CS Pipeline Hazards III Data Hazards: (data dependencies) an instruction depends on the result of a previous instruction still in the pipeline. Writing new value of r1 Add r1, r2, r3 Sub r4, r1, r10 RR + RW RR Reading new value of r1, not available yet CS

6 Lessons To achieve pipelining and avoid hazards, we need to redesign the datapath and instruction execution steps, CS Pipeline Stages R-Type LD ST BEQ J Stage 1: IR Mem[PC] PC PC + 4 Stage 2: RR Read Reg[rs] and Reg[rt] Stage 3: EX (use ALU) RS op RT Calculate RS+Immd Calculate RS+Immd Calculate PC + Immd Compare RS and RT Set PC to Immd Stage 4: DM Read memory Write memory Set PC accordingly Stage 5: RW/WB Write to Rd Write to Rt CS

7 Graphically Representing Pipelines T i m e ( i n c l o c k c y c l e s ) P r o g r a m e x e c u t io n C C 1 C C 2 C C 3 C C 4 C C 5 C C 6 o r d e r ( i n i n s t r u c t i o n s ) l w $ 1 0, 2 0 ( $ 1 ) I M R e g A L U D M R e g s u b $ 1 1, $ 2, $ 3 I M R e g A L U D M R e g CS M u x Solving Structural Hazards: Pipelined Datapath 1 I F /I D I D /E X E X / M E M M E M / W B A d d 4 A d d A d d r e s u lt S h if t l e ft 2 R e a d P C A d d r e s s I n s tr u c t io n m e m o r y I n s t r u c t i o n r e g is t e r 1 R e a d d a t a 1 R e a d r e g is t e r 2 R e g i s te r s R e a d W r it e d a t a 2 r e g is t e r W r it e d a t a 0 M u x 1 Z e r o A L U A L U r e s u lt A d d re s s D a t a m e m o r y W r i te R e a d d a t a 1 M u x 0 d a t a 1 6 S ig n e x t e n d 3 2 CS

8 Discussions Make sure you understand why two extra adders are added. Is the assumption of using separate instruction and data memory reasonable? CS Consider Control Hazard beq Target: When is the decision made? CS

9 Solution 1: Stalling The Pipeline OR Decision is in CS Solution 2: Branch Prediction Make a guess about the branch decision and start execute the guessed path before the decision is in (aka speculative execution). If guess was wrong, abandon those in the pipeline and jump to the right target. Branch Prediction Strategies Predict branch fails Predict branch succeeds Look into history CS

10 Predict Branch Fails We guess the branch will fail. That is, the next will fetch the next sequential execution. If guess is right, just proceed. If guess is wrong, abandon sequential instructions and fetch the instruction from the target address. CS beq Predict branch fails: Guess Is Right Following instructions Decision is in; Do not branch CS

11 beq Following instructions RR EX Target instructions RR Predict branch fails: Guess Is Wrong CS Decision is in: Do branch Discussion Notice that programmers are not aware of branch predictions; right or wrong guesses affect only performance. Delayed Branches: Make it official that CPU always executes the instr following a branch. The branch determines the next next instruction. Notice the programmer awareness CS

12 Example BEQ r1, r2, target add r10, r11, r12 add r20, r21, r22 sub r30, r10, r20 Delay slot, executed regardless of the branch decision Target: sub r30, r10, r20 CS Delayed Branch with Wrong Guess beq Following instructions RR EX Target instructions RR DM RW CS Decision is in: Do branch Delay slot is always finished 12

13 Discussions Compilers/programmers must be smart enough to make good use of the delay slots. The problem is not entirely solved: We still need to stall before the decision comes in. The situations are exacerbated by deeper pipelining. Branch predictions are still important. CS Smart Branch Predictions Observations Branches of if-else statements are hard to predict. Branches of loops typically repeat previous decisions. For performance, loops are more important than other control structures. Many modern processors use specialpurpose hardware to remember the targets of recent branch instructions. CS

14 Recall Data Hazards Sub $2, $1, $3 And $12, $2, $5 Or $13, $6, $2 Add $14, $2, $2 Sw $15, 100($2) $2 available CS Solution 1: Stalling Postpone subsequent instructions until data is available Simple but inefficient Sub RR EX RW And RR EX RW Or RR EX RW Add RR EX Sw RR EX RW RW CS

15 Solution 2: Internal Forwarding New value of $2 is available after the EX of Sub, but not in $2 yet Use special circuits to forward the new value to subsequent instructions Sub r2, r1, r3 Sub RR EX RW And r12, r2, r5 And RR EX RW Or r13, r6, r2 Or RR EX RW Add r14, r2, r2 Add RR EX RW Sw r15, 100(r2) Sw RR EX DM CS Types of Forwarding ALU forwarding: forward an ALU output to subsequent instructions This is the case we have seen Memory Forwarding: forward a memory output (ld result) to subsequent instructions. CS

16 Memory Forwarding Ld $1, 4($2) ld Or $10, $1, $3 Or RR EX RW Sub $20, $1, $10 Sub RR EX RW Notice that we still have to stall for one cycle CS Delayed Loads Make it official in the ISA that the result of a load will not be available to the next instruction. Have the compiler find something useful to do in load slots. (by reordering) CS

17 MIPS Solution Delayed result with internal forwarding. Ld Or RR EX RW Sub RR EX RW Nop or something useful CS Summary Three types of pipeline hazards Structural hazards Control hazards Data hazards Compilers/programmers could reorder the code to avoid hazards and eliminate bubbles. Ideal cases: no bubbles; one instr per cycle. Stalling (bubbles) is the last resort but must be supported by hardware. CS

18 Registers Exercise: Code Reordering for (i=0; i<n; i++) z[i] = x[i] + y[i]; r1 points to x[i] r2 points to y[i] r3 points to z[i] r4 holds x[i] r5 holds y[i] r6 holds z[i] r7 holds i r8 holds N CS loop: lw lw r4, 0(r1) r5, 0(r2) add r6, r4, r5 sw r6, 0(r3) addi r1,r1,4 addi r2,r2,4 addi r3,r3,4 addi r7,r7,1 bne r7,r8, loop CS

19 CS loop: lw r4, 0(r1) lw r5, 0(r2) exit: add r6, r4, r5 sw r6, 0(r3) addi r7,r7,1 beq r7,r8, exit lw r4, 4(r1) lw r5, 4(r2) add r6, r4, r5 sw r6, 4(r3) addi r1,r1,8 addi r2,r2,8 addi r3,r3,8 addi r7,r7,1 bne r7,r8, loop Loop Unrolling 19

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control The MIPS Pipeline CSCI206 - Computer Organization & Programming Pipeline Datapath and Control zybook: 11.6 Developed and maintained by the Bucknell University Computer Science Department - 2017 Hazard

More information

3. (2) What is the difference between fixed and hybrid instructions?

3. (2) What is the difference between fixed and hybrid instructions? 1. (2 pts) What is a "balanced" pipeline? 2. (2 pts) What are the two main ways to define performance? 3. (2) What is the difference between fixed and hybrid instructions? 4. (2 pts) Clock rates have grown

More information

Computer Architecture ELEC2401 & ELEC3441

Computer Architecture ELEC2401 & ELEC3441 Last Time Pipeline Hazard Computer Architecture ELEC2401 & ELEC3441 Lecture 8 Pipelining (3) Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering Structural Hazard Hazard Control

More information

CSCI-564 Advanced Computer Architecture

CSCI-564 Advanced Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 8: Handling Exceptions and Interrupts / Superscalar Bo Wu Colorado School of Mines Branch Delay Slots (expose control hazard to software) Change the ISA

More information

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 2. (2 )What are the two main ways to define performance? 3. (2 )What

More information

4. (3) What do we mean when we say something is an N-operand machine?

4. (3) What do we mean when we say something is an N-operand machine? 1. (2) What are the two main ways to define performance? 2. (2) When dealing with control hazards, a prediction is not enough - what else is necessary in order to eliminate stalls? 3. (3) What is an "unbalanced"

More information

CPU DESIGN The Single-Cycle Implementation

CPU DESIGN The Single-Cycle Implementation CSE 202 Computer Organization CPU DESIGN The Single-Cycle Implementation Shakil M. Khan (adapted from Prof. H. Roumani) Dept of CS & Eng, York University Sequential vs. Combinational Circuits Digital circuits

More information

Design. Dr. A. Sahu. Indian Institute of Technology Guwahati

Design. Dr. A. Sahu. Indian Institute of Technology Guwahati CS222: Processor Design: Multi Cycle Design Dr. A. Sahu Dept of Comp. Sc. & Engg. Indian Institute of Technology Guwahati Mid Semester Exam Multi Cycle design Outline Clock periods in single cycle and

More information

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference) ECE 3401 Lecture 23 Pipeline Design Control State Register Combinational Control Logic New/ Modified Control Word ISA: Instruction Specifications (for reference) P C P C + 1 I N F I R M [ P C ] E X 0 PC

More information

A Second Datapath Example YH16

A Second Datapath Example YH16 A Second Datapath Example YH16 Lecture 09 Prof. Yih Huang S365 1 A 16-Bit Architecture: YH16 A word is 16 bit wide 32 general purpose registers, 16 bits each Like MIPS, 0 is hardwired zero. 16 bit P 16

More information

CSE Computer Architecture I

CSE Computer Architecture I Execution Sequence Summary CSE 30321 Computer Architecture I Lecture 17 - Multi Cycle Control Michael Niemier Department of Computer Science and Engineering Step name Instruction fetch Instruction decode/register

More information

Project Two RISC Processor Implementation ECE 485

Project Two RISC Processor Implementation ECE 485 Project Two RISC Processor Implementation ECE 485 Chenqi Bao Peter Chinetti November 6, 2013 Instructor: Professor Borkar 1 Statement of Problem This project requires the design and test of a RISC processor

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle Computer Engineering Department CC 311- Computer Architecture Chapter 4 The Processor: Datapath and Control Single Cycle Introduction The 5 classic components of a computer Processor Input Control Memory

More information

Computer Architecture

Computer Architecture Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] What are the two main ways to define performance? [2] Predicting the direction of a branch is not enough. What else is necessary? [2] The power consumed by a chip has increased over time, but the clock

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] When we talk about the number of operands in an instruction (a 1-operand or a 2-operand instruction, for example), what do we mean? [2] What are the two main ways to define performance? [2] Predicting

More information

COVER SHEET: Problem#: Points

COVER SHEET: Problem#: Points EEL 4712 Midterm 3 Spring 2017 VERSION 1 Name: UFID: Sign here to give permission for your test to be returned in class, where others might see your score: IMPORTANT: Please be neat and write (or draw)

More information

Processor Design & ALU Design

Processor Design & ALU Design 3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of

More information

L07-L09 recap: Fundamental lesson(s)!

L07-L09 recap: Fundamental lesson(s)! L7-L9 recap: Fundamental lesson(s)! Over the next 3 lectures (using the IPS ISA as context) I ll explain:! How functions are treated and processed in assembly! How system calls are enabled in assembly!

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. I fetch decode & eg-fetch execute memory Clock period

More information

EC 413 Computer Organization

EC 413 Computer Organization EC 413 Computer Organization rithmetic Logic Unit (LU) and Register File Prof. Michel. Kinsy Computing: Computer Organization The DN of Modern Computing Computer CPU Memory System LU Register File Disks

More information

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1 Building a Computer I wonder where this goes? B LU MIPS Kit Quiz # on /3, open book and notes (This is the last lecture covered) Comp 4 Fall 7 /4/7 L6- Building a Computer THIS IS IT! Motivating Force

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

CPSC 3300 Spring 2017 Exam 2

CPSC 3300 Spring 2017 Exam 2 CPSC 3300 Spring 2017 Exam 2 Name: 1. Matching. Write the correct term from the list into each blank. (2 pts. each) structural hazard EPIC forwarding precise exception hardwired load-use data hazard VLIW

More information

CS 52 Computer rchitecture and Engineering Lecture 4 - Pipelining Krste sanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs52!

More information

ENEE350 Lecture Notes-Weeks 14 and 15

ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining & Amdahl s Law ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining is a method of processing in which a problem is divided into a number of sub problems and solved and the solu8ons of the sub problems

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

Lecture 3, Performance

Lecture 3, Performance Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. fetch decode & eg-fetch execute

More information

Lecture 3, Performance

Lecture 3, Performance Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations

More information

/ : Computer Architecture and Design

/ : Computer Architecture and Design 16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each

More information

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1 Computer Architecture ECE 361 Lecture 5: The Design Process & Design 361 design.1 Quick Review of Last Lecture 361 design.2 MIPS ISA Design Objectives and Implications Support general OS and C- style language

More information

Simple Instruction-Pipelining (cont.) Pipelining Jumps

Simple Instruction-Pipelining (cont.) Pipelining Jumps 6.823, L9--1 Simple ruction-pipelining (cont.) + Interrupts Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Src1 ( j / ~j ) Src2 ( / Ind) Pipelining Jumps

More information

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University Prof. Mi Lu TA: Ehsan Rohani Laboratory Exercise #4 MIPS Assembly and Simulation

More information

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide) Out-of-order Pipeline Buffer of instructions Issue = Select + Wakeup Select N oldest, read instructions N=, xor N=, xor and sub Note: ma have execution resource constraints: i.e., load/store/fp Fetch Decode

More information

Unit 6: Branch Prediction

Unit 6: Branch Prediction CIS 501: Computer Architecture Unit 6: Branch Prediction Slides developed by Joe Devie/, Milo Mar4n & Amir Roth at Upenn with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi,

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines Reminder: midterm on Tue 2/28 will cover Chapters 1-3, App A, B if you understand all slides, assignments,

More information

Instruction register. Data. Registers. Register # Memory data register

Instruction register. Data. Registers. Register # Memory data register Where we are headed Single Cycle Problems: what if we had a more complicated instrction like floating point? wastefl of area One Soltion: se a smaller cycle time have different instrctions take different

More information

Designing Single-Cycle MIPS Processor

Designing Single-Cycle MIPS Processor CSE 32: Introdction to Compter Architectre Designing Single-Cycle IPS Processor Presentation G Stdy:.-. Gojko Babić 2/9/28 Introdction We're now ready to look at an implementation of the system that incldes

More information

TEST 1 REVIEW. Lectures 1-5

TEST 1 REVIEW. Lectures 1-5 TEST 1 REVIEW Lectures 1-5 REVIEW Test 1 will cover lectures 1-5. There are 10 questions in total with the last being a bonus question. The questions take the form of short answers (where you are expected

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2004 Yale Patt, Instructor Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs Exam 1, October 6, 2004 Name:

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

Microprocessor Power Analysis by Labeled Simulation

Microprocessor Power Analysis by Labeled Simulation Microprocessor Power Analysis by Labeled Simulation Cheng-Ta Hsieh, Kevin Chen and Massoud Pedram University of Southern California Dept. of EE-Systems Los Angeles CA 989 Outline! Introduction! Problem

More information

61C In the News. Processor Design: 5 steps

61C In the News. Processor Design: 5 steps www.eetimes.com/electronics-news/23235/thailand-floods-take-toll-on--makers The Thai floods have already claimed the lives of hundreds of pele, with tens of thousands more having had to flee their homes

More information

Fall 2011 Prof. Hyesoon Kim

Fall 2011 Prof. Hyesoon Kim Fall 2011 Prof. Hyesoon Kim Add: 2 cycles FE_stage add r1, r2, r3 FE L ID L EX L MEM L WB L add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub add add mul sub sub add add mul sub sub add

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

Designing MIPS Processor

Designing MIPS Processor CSE 675.: Introdction to Compter Architectre Designing IPS Processor (lti-cycle) Presentation H Reading Assignment: 5.5,5.6 lti-cycle Design Principles Break p eection of each instrction into steps. The

More information

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lectre 9: Control Hazard and Resoltion James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L09 S1, James C. Hoe, CU/ECE/CALC, 2018 Yor goal today Hosekeeping simple control flow

More information

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining CMU 18-447 Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining Instructor: Prof Onur Mutlu TAs: Rachata Ausavarungnirun, Kevin Chang, Albert Cho, Jeremie

More information

CA Compiler Construction

CA Compiler Construction CA4003 - Compiler Construction Code Generation to MIPS David Sinclair Code Generation The generation o machine code depends heavily on: the intermediate representation;and the target processor. We are

More information

Implementing the Controller. Harvard-Style Datapath for DLX

Implementing the Controller. Harvard-Style Datapath for DLX 6.823, L6--1 Implementing the Controller Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L6--2 Harvard-Style Datapath for DLX Src1 ( j / ~j ) Src2 ( R / RInd) RegWrite MemWrite

More information

Computer Architecture

Computer Architecture Computer Architecture QtSpim, a Mips simulator S. Coudert and R. Pacalet January 4, 2018..................... Memory Mapping 0xFFFF000C 0xFFFF0008 0xFFFF0004 0xffff0000 0x90000000 0x80000000 0x7ffff4d4

More information

Basic Computer Organization and Design Part 3/3

Basic Computer Organization and Design Part 3/3 Basic Computer Organization and Design Part 3/3 Adapted by Dr. Adel Ammar Computer Organization Interrupt Initiated Input/Output Open communication only when some data has to be passed --> interrupt. The

More information

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5.

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5. CHPTER 9 2008 Pearson Education, Inc. 9-. log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C 7 Z = F 7 + F 6 + F 5 + F 4 + F 3 + F 2 + F + F 0 N = F 7 9-3.* = S + S = S + S S S S0 C in C 0 dder

More information

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018 ECE 172 Digital Systems Chapter 12 Instruction Pipelining Herbert G. Mayer, PSU Status 7/20/2018 1 Syllabus l Scheduling on Pipelined Architecture l Idealized Pipeline l Goal of Scheduling l Causes for

More information

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 4: Microprogramming Prof. Onur Mutlu ETH Zurich Spring 27 7 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch

More information

COMP303 Computer Architecture Lecture 11. An Overview of Pipelining

COMP303 Computer Architecture Lecture 11. An Overview of Pipelining COMP303 Compute Achitectue Lectue 11 An Oveview of Pipelining Pipelining Pipelining povides a method fo executing multiple instuctions at the same time. Laundy Example: Ann, Bian, Cathy, Dave each have

More information

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it...

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it... // CS 2, Fall 2! CS 2, Fall 2! We built the instrument. Now we read music and play it... A simple implementa/on uc/on uct r r 2 Write r Src Src Extend 6 Mem Next: path 7-2 CS 2, Fall 2! signals CS 2, Fall

More information

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control Logic and Computer Design Fundamentals Chapter 8 Sequencing and Control Datapath and Control Datapath - performs data transfer and processing operations Control Unit - Determines enabling and sequencing

More information

EE 660: Computer Architecture Out-of-Order Processors

EE 660: Computer Architecture Out-of-Order Processors EE 660: Computer Architecture Out-of-Order Processors Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David entzlaff Agenda I4 Processors I2O2

More information

Enrico Nardelli Logic Circuits and Computer Architecture

Enrico Nardelli Logic Circuits and Computer Architecture Enrico Nardelli Logic Circuits and Computer Architecture Appendix B The design of VS0: a very simple CPU Rev. 1.4 (2009-10) by Enrico Nardelli B - 1 Instruction set Just 4 instructions LOAD M - Copy into

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

DETERMINING THE VARIABLE QUANTUM TIME (VQT) IN ROUND ROBIN AND IT S IMPORTANCE OVER AVERAGE QUANTUM TIME METHOD

DETERMINING THE VARIABLE QUANTUM TIME (VQT) IN ROUND ROBIN AND IT S IMPORTANCE OVER AVERAGE QUANTUM TIME METHOD D DETERMINING THE VARIABLE QUANTUM TIME (VQT) IN ROUND ROBIN AND IT S IMPORTANCE OVER AVERAGE QUANTUM TIME METHOD Yashasvini Sharma 1 Abstract The process scheduling, is one of the most important tasks

More information

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions 18-447 Lectre 12: Pipelined Implementations: Control Hazards and Resoltions S 09 L12-1 James C. Hoe Dept of ECE, CU arch 2, 2009 Annoncements: Spring break net week!! Project 2 de the week after spring

More information

Combinational vs. Sequential. Summary of Combinational Logic. Combinational device/circuit: any circuit built using the basic gates Expressed as

Combinational vs. Sequential. Summary of Combinational Logic. Combinational device/circuit: any circuit built using the basic gates Expressed as Summary of Combinational Logic : Computer Architecture I Instructor: Prof. Bhagi Narahari Dept. of Computer Science Course URL: www.seas.gwu.edu/~bhagiweb/cs3/ Combinational device/circuit: any circuit

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper

More information

Topics: A multiple cycle implementation. Distributed Notes

Topics: A multiple cycle implementation. Distributed Notes COSC 22: Compter Organization Instrctor: Dr. Amir Asif Department of Compter Science York University Handot # lticycle Implementation of a IPS Processor Topics: A mltiple cycle implementation Distribted

More information

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.

Pipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S. Pipelined Datapath Lectre notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 Reading (2) Pipeline Performance Assme time for stages is v ps for register read or write

More information

Review. Combined Datapath

Review. Combined Datapath Review Topics:. A single cycle implementation 2. State Diagrams. A mltiple cycle implementation COSC 22: Compter Organization Instrctor: Dr. Amir Asif Department of Compter Science York University Handot

More information

CMP 334: Seventh Class

CMP 334: Seventh Class CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative

More information

Combinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #13

Combinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #13 Combinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #13 Ziad Matni Dept. of Computer Science, UCSB Administrative Re: Midterm Exam #2 Graded! 5/22/18

More information

Latches. October 13, 2003 Latches 1

Latches. October 13, 2003 Latches 1 Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory

More information

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner CPS 4 Computer Organization and Programming Lecture : Gates, Buses, Latches. Robert Wagner CPS4 GBL. RW Fall 2 Overview of Today s Lecture: The MIPS ALU Shifter The Tristate driver Bus Interconnections

More information

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk ECE290 Fall 2012 Lecture 22 Dr. Zbigniew Kalbarczyk Today LC-3 Micro-sequencer (the control store) LC-3 Micro-programmed control memory LC-3 Micro-instruction format LC -3 Micro-sequencer (the circuitry)

More information

2

2 Computer System AA rc hh ii tec ture( 55 ) 2 INTRODUCTION ( d i f f e r e n t r e g i s t e r s, b u s e s, m i c r o o p e r a t i o n s, m a c h i n e i n s t r u c t i o n s, e t c P i p e l i n e E

More information

Figure 4.9 MARIE s Datapath

Figure 4.9 MARIE s Datapath Term Control Word Microoperation Hardwired Control Microprogrammed Control Discussion A set of signals that executes a microoperation. A register transfer or other operation that the CPU can execute in

More information

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary EECS50 - Digital Design Lecture - Shifters & Counters February 24, 2003 John Wawrzynek Spring 2005 EECS50 - Lec-counters Page Register Summary All registers (this semester) based on Flip-flops: q 3 q 2

More information

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process Computer Architecture ESE 345 Computer Architecture Design Process 1 The Design Process "To Design Is To Represent" Design activity yields description/representation of an object -- Traditional craftsman

More information

Arithmetic and Logic Unit First Part

Arithmetic and Logic Unit First Part Arithmetic and Logic Unit First Part Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras ALU1-1 Typical

More information

Automata Theory CS S-12 Turing Machine Modifications

Automata Theory CS S-12 Turing Machine Modifications Automata Theory CS411-2015S-12 Turing Machine Modifications David Galles Department of Computer Science University of San Francisco 12-0: Extending Turing Machines When we added a stack to NFA to get a

More information

Table of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25

Table of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25 Chapter 11 Dedicated Microprocessors Page 1 of 25 Table of Content Table of Content... 1 11 Dedicated Microprocessors... 2 11.1 Manual Construction of a Dedicated Microprocessor... 3 11.2 FSM + D Model

More information

ALU A functional unit

ALU A functional unit ALU A functional unit that performs arithmetic operations such as ADD, SUB, MPY logical operations such as AND, OR, XOR, NOT on given data types: 8-,16-,32-, or 64-bit values A n-1 A n-2... A 1 A 0 B n-1

More information

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example

This Unit: Scheduling (Static + Dynamic) CIS 501 Computer Architecture. Readings. Review Example This Unit: Scheduling (Static + Dnamic) CIS 50 Computer Architecture Unit 8: Static and Dnamic Scheduling Application OS Compiler Firmware CPU I/O Memor Digital Circuits Gates & Transistors! Previousl:!

More information

Review: Single-Cycle Processor. Limits on cycle time

Review: Single-Cycle Processor. Limits on cycle time Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control Unit LUControl 2: Op Funct LUSrc RegDst PCJump PC 4 uction + PCPlus4 25:2 2:6 2:6 5: 5: 2 3 WriteReg 4: Src Src LU Zero LUResult Write + PC

More information

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1

Loop Scheduling and Software Pipelining \course\cpeg421-08s\topic-7.ppt 1 Loop Scheduling and Software Pipelining 2008-04-24 \course\cpeg421-08s\topic-7.ppt 1 Reading List Slides: Topic 7 and 7a Other papers as assigned in class or homework: 2008-04-24 \course\cpeg421-08s\topic-7.ppt

More information

On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work

On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work Lab 5 : Linking Name: Sign the following statement: On my honor, as an Aggie, I have neither given nor received unauthorized aid on this academic work 1 Objective The main objective of this lab is to experiment

More information

Outcomes. Spiral 1 / Unit 2. Boolean Algebra BOOLEAN ALGEBRA INTRO. Basic Boolean Algebra Logic Functions Decoders Multiplexers

Outcomes. Spiral 1 / Unit 2. Boolean Algebra BOOLEAN ALGEBRA INTRO. Basic Boolean Algebra Logic Functions Decoders Multiplexers -2. -2.2 piral / Unit 2 Basic Boolean Algebra Logic Functions Decoders Multipleers Mark Redekopp Outcomes I know the difference between combinational and sequential logic and can name eamples of each.

More information

Pipeline no Prediction. Branch Delay Slots A. From before branch B. From branch target C. From fall through. Branch Prediction

Pipeline no Prediction. Branch Delay Slots A. From before branch B. From branch target C. From fall through. Branch Prediction Pipeline no Prediction Branching completes in 2 cycles We know the target address after the second stage? PC fetch Instruction Memory Decode Check the condition Calculate the branch target address PC+4

More information

Compiling Techniques

Compiling Techniques Lecture 11: Introduction to 13 November 2015 Table of contents 1 Introduction Overview The Backend The Big Picture 2 Code Shape Overview Introduction Overview The Backend The Big Picture Source code FrontEnd

More information

Introduction The Nature of High-Performance Computation

Introduction The Nature of High-Performance Computation 1 Introduction The Nature of High-Performance Computation The need for speed. Since the beginning of the era of the modern digital computer in the early 1940s, computing power has increased at an exponential

More information

CS/COE0447: Computer Organization

CS/COE0447: Computer Organization CS/COE0447: Computer Organization and Assembly Language Logic Design Review Sangyeun Cho Dept. of Computer Science Logic design? Digital hardware is implemented by way of logic design Digital circuits

More information

Counters. We ll look at different kinds of counters and discuss how to build them

Counters. We ll look at different kinds of counters and discuss how to build them Counters We ll look at different kinds of counters and discuss how to build them These are not only examples of sequential analysis and design, but also real devices used in larger circuits 1 Introducing

More information

Binary addition example worked out

Binary addition example worked out Binary addition example worked out Some terms are given here Exercise: what are these numbers equivalent to in decimal? The initial carry in is implicitly 0 1 1 1 0 (Carries) 1 0 1 1 (Augend) + 1 1 1 0

More information

Ch 7. Finite State Machines. VII - Finite State Machines Contemporary Logic Design 1

Ch 7. Finite State Machines. VII - Finite State Machines Contemporary Logic Design 1 Ch 7. Finite State Machines VII - Finite State Machines Contemporary Logic esign 1 Finite State Machines Sequential circuits primitive sequential elements combinational logic Models for representing sequential

More information

CS/COE1541: Introduction to Computer Architecture. Logic Design Review. Sangyeun Cho. Computer Science Department University of Pittsburgh

CS/COE1541: Introduction to Computer Architecture. Logic Design Review. Sangyeun Cho. Computer Science Department University of Pittsburgh CS/COE54: Introduction to Computer Architecture Logic Design Review Sangyeun Cho Computer Science Department Logic design? Digital hardware is implemented by way of logic design Digital circuits process

More information

Logic Design II (17.342) Spring Lecture Outline

Logic Design II (17.342) Spring Lecture Outline Logic Design II (17.342) Spring 2012 Lecture Outline Class # 10 April 12, 2012 Dohn Bowden 1 Today s Lecture First half of the class Circuits for Arithmetic Operations Chapter 18 Should finish at least

More information