- Hugo Davis
- 5 years ago
Transcription
CS 152 Computer Architecture and Engineering
Lecture 4 - Pipelining
Krste Asanovic, Electrical Engineering and Computer Sciences, University of California at Berkeley

Last time in Lecture 3
- Microcoding became less attractive as the gap between RAM and ROM speeds reduced
- Complex instruction sets are difficult to pipeline, so it was difficult to increase performance as gate count grew
- Load-store RISC ISAs were designed for efficient pipelined implementations
  - Very similar to vertical microcode
  - Inspired by earlier Cray machines (more on these later)
- Iron Law explains the architecture design space
  - Trade instructions/program, cycles/instruction, and time/cycle
CS252 S05
An Ideal Pipeline
[Figure: stage 1 -> stage 2 -> stage 3 -> stage 4]
- All objects go through the same stages
- No sharing of resources between any two stages
- Propagation delay through all pipeline stages is equal
- The scheduling of an object entering the pipeline is not affected by the objects in other stages
These conditions generally hold for industrial assembly lines, but instructions depend on each other!

Pipelined RISC-V
To pipeline RISC-V:
- First build RISC-V without pipelining, with CPI=1
- Next, add pipeline registers to reduce cycle time while maintaining CPI=1
Lecture 3: Unpipelined Datapath for RISC-V
[Figure: unpipelined RISC-V datapath with control signals PCSel (br / rind / jabs / pc+4), RegWriteEn, MemWrite, WBSel, WASel, ImmSel, FuncSel, Op2Sel, and branch logic (Bcomp?)]

Lecture 3: Hardwired Control Table
Opcode     ImmSel    Op2Sel  FuncSel  MemWr  RFWen  WBSel  WASel  PCSel
ALU        *         Reg     Func     no     yes    ALU    rd     pc+4
ALUi       IType12   Imm     Op       no     yes    ALU    rd     pc+4
LW         IType12   Imm     +        no     yes    Mem    rd     pc+4
SW         BsType12  Imm     +        yes    no     *      *      pc+4
BEQ true   BrType12  *       *        no     no     *      *      br
BEQ false  BrType12  *       *        no     no     *      *      pc+4
J          *         *       *        no     no     *      *      jabs
JAL        *         *       *        no     yes    PC     X1     jabs
JALR       *         *       *        no     yes    PC     rd     rind
Op2Sel = Reg / Imm
WBSel = ALU / Mem / PC
WASel = rd / X1
PCSel = pc+4 / br / rind / jabs
Pipelined Datapath
[Figure: datapath divided into fetch phase, decode & register-fetch phase, execute phase, memory phase, and write-back phase]
Clock period can be reduced by dividing the execution of an instruction into multiple cycles:
$t_C > \max\{t_{IM}, t_{RF}, t_{ALU}, t_{DM}, t_{RW}\}$ ($= t_{DM}$ probably)
However, CPI will increase unless instructions are pipelined.

Iron Law of Processor Performance
$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$
- Instructions per program depends on source code, compiler technology, and ISA
- Cycles per instruction (CPI) depends on ISA and µarchitecture
- Time per cycle depends upon the µarchitecture and base technology

Lecture    Microarchitecture         CPI  cycle time
Lecture 2  Microcoded                >1   short
Lecture 3  Single-cycle unpipelined  1    long
Lecture 4  Pipelined                 1    short
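The Iron Law lends itself to a quick numerical comparison. The following is a minimal Python sketch; the instruction count, CPI values, and cycle times are hypothetical numbers chosen only to mirror the ">1 / short", "1 / long", "1 / short" rows of the table, not measurements from the lecture.

```python
def execution_time(instructions, cpi, cycle_time_ns):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi * cycle_time_ns


# Hypothetical design points (instruction count, CPI, cycle time in ns).
designs = {
    "microcoded": (1000, 7.0, 1.0),                # CPI > 1, short cycle
    "single-cycle unpipelined": (1000, 1.0, 5.0),  # CPI = 1, long cycle
    "pipelined": (1000, 1.0, 1.0),                 # CPI = 1, short cycle
}

for name, (n, cpi, t) in designs.items():
    print(f"{name}: {execution_time(n, cpi, t):.0f} ns")
```

Under these assumed numbers the pipelined design wins because it keeps the short cycle of the microcoded machine and the CPI of the single-cycle machine.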
CPI Examples
- Microcoded machine: instructions take 7 cycles, 5 cycles, and 10 cycles. 3 instructions, 22 cycles, CPI = 7.33
- Unpipelined machine: 3 instructions, 3 cycles, CPI = 1
- Pipelined machine: 3 instructions, 3 cycles, CPI = 1
5-stage pipeline CPI ≠ 5!!!

Technology Assumptions
- A small amount of very fast memory (caches) backed up by a large, slower memory
- Fast ALU (at least for integers)
- Multiported register files (slower!)
Thus, the following timing assumption is reasonable:
$t_{IM} \approx t_{RF} \approx t_{ALU} \approx t_{DM} \approx t_{RW}$
A 5-stage pipeline will be the focus of our detailed design. Some commercial designs have over 30 pipeline stages to do an integer add!
5-Stage Pipelined Execution
Stages: I-Fetch (IF); Decode, Reg. Fetch (ID); Execute (EX); Memory (MA); Write-Back (WB)
time          t0   t1   t2   t3   t4   t5   t6   t7  ...
instruction1  IF1  ID1  EX1  MA1  WB1
instruction2       IF2  ID2  EX2  MA2  WB2
instruction3            IF3  ID3  EX3  MA3  WB3
instruction4                 IF4  ID4  EX4  MA4  WB4
instruction5                      IF5  ID5  EX5  MA5  WB5

5-Stage Pipelined Execution: Resource Usage Diagram
time  t0   t1   t2   t3   t4   t5   t6   t7  ...
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   I4   I5
EX              I1   I2   I3   I4   I5
MA                   I1   I2   I3   I4   I5
WB                        I1   I2   I3   I4   I5
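Both diagrams above follow from one rule: with no stalls, instruction i occupies stage s in cycle (i - 1) + s. A minimal Python sketch of that rule follows; the function names are illustrative and not part of the lecture.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def ideal_schedule(num_instructions):
    """Map each instruction (1-based) to {cycle: stage}, assuming no stalls."""
    return {
        i: {(i - 1) + s: stage for s, stage in enumerate(STAGES)}
        for i in range(1, num_instructions + 1)
    }


def resource_usage(schedule):
    """Invert the schedule: for each stage, which instruction uses it in each cycle."""
    usage = {stage: {} for stage in STAGES}
    for instr, cycles in schedule.items():
        for cycle, stage in cycles.items():
            usage[stage][cycle] = instr
    return usage


def render(schedule):
    """Render one text row per instruction, one column per cycle."""
    last_cycle = max(max(cycles) for cycles in schedule.values())
    rows = []
    for instr, cycles in schedule.items():
        cells = [f"{cycles.get(t, ''):>3}" for t in range(last_cycle + 1)]
        rows.append(f"I{instr} " + " ".join(cells))
    return "\n".join(rows)


print(render(ideal_schedule(5)))
```

Calling `resource_usage(ideal_schedule(5))` gives the second view, where each stage holds a different instruction every cycle once the pipeline is full.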
Pipelined Execution: ALU Instructions
[Figure: pipelined datapath with pipeline registers inserted between stages]
Not quite correct! We need an Instruction Reg (IR) for each stage.

Pipelined RISC-V Datapath without jumps
[Figure: pipeline stages F, D, E, M, W with control signals RegWriteEn, FuncSel, WASel, MemWrite, WBSel, ImmSel, Op2Sel]
Control points need to be connected.
Instructions interact with each other in pipeline
- An instruction in the pipeline may need a resource being used by another instruction in the pipeline ⇒ structural hazard
- An instruction may depend on something produced by an earlier instruction
  - Dependence may be for a data value ⇒ data hazard
  - Dependence may be for the next instruction's address ⇒ control hazard (branches, exceptions)

Resolving Structural Hazards
- A structural hazard occurs when two instructions need the same hardware resource at the same time
  - Can resolve in hardware by stalling the newer instruction till the older instruction is finished with the resource
- A structural hazard can always be avoided by adding more hardware to the design
  - E.g., if two instructions both need a port to memory at the same time, could avoid the hazard by adding a second port to memory
- Our 5-stage pipeline has no structural hazards by design
  - Thanks to the RISC-V ISA, which was designed for pipelining
Data Hazards
[Figure: pipelined datapath executing x1 ← x0 + 10 followed by x4 ← x1 + 17]
x1 ← x0 + 10
x4 ← x1 + 17
x1 is stale. Oops!

Resolving Data Hazards (1)
Strategy 1: Wait for the result to be available by freezing earlier pipeline stages ⇒ interlocks
Feedback to Resolve Hazards
[Figure: stage 1 -> stage 2 -> stage 3 -> stage 4, with feedback signals FB1, FB2, FB3, FB4]
- Later stages provide dependence information to earlier stages, which can stall (or kill) instructions
- Controlling a pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i (otherwise deadlocks may occur)

Interlocks to Resolve Data Hazards
[Figure: pipelined datapath with a stall condition signal, executing x1 ← x0 + 10 followed by x4 ← x1 + 17]
Stalled Stages and Pipeline Bubbles
time                   t0   t1   t2   t3   t4   t5   t6   t7  ...
(I1) x1 ← (x0) + 10    IF1  ID1  EX1  MA1  WB1
(I2) x4 ← (x1) + 17         IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                             IF3  IF3  IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                                 IF4  ID4  EX4  MA4  WB4
(I5)                                                      IF5  ID5  EX5  MA5  WB5
The repeated ID2 and IF3 entries are the stalled stages.

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7  ...
IF    I1   I2   I3   I3   I3   I3   I4   I5
ID         I1   I2   I2   I2   I2   I3   I4   I5
EX              I1   -    -    -    I2   I3   I4   I5
MA                   I1   -    -    -    I2   I3   I4   I5
WB                        I1   -    -    -    I2   I3   I4   I5
- ⇒ pipeline bubble

Interlock Control Logic
[Figure: pipelined datapath with a C_stall block that compares the rs fields in decode and produces the stall signal]
Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions.
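The stalled-stage diagram can be reproduced with a small cycle-by-cycle model. This is a sketch under stated assumptions, not the lecture's hardware: each instruction is a hypothetical `(dest, sources)` tuple, there is no bypassing, and the instruction in ID stalls while any instruction in EX, MA, or WB writes one of its source registers.

```python
def simulate_interlock(program):
    """Return {instruction index: {cycle: stage}} for a 5-stage interlocked pipeline.

    program is a list of (dest, sources) tuples with register names as strings;
    dest may be None for instructions that write no register.
    """
    n = len(program)
    timeline = {i: {} for i in range(n)}
    pipe = {"IF": 0 if n else None, "ID": None, "EX": None, "MA": None, "WB": None}
    next_fetch = 1
    cycle = 0
    while any(instr is not None for instr in pipe.values()):
        for stage, instr in pipe.items():
            if instr is not None:
                timeline[instr][cycle] = stage
        d = pipe["ID"]
        # Stall if a later stage holds an instruction writing a source of d.
        stall = d is not None and any(
            pipe[s] is not None
            and program[pipe[s]][0] not in (None, "x0")
            and program[pipe[s]][0] in program[d][1]
            for s in ("EX", "MA", "WB")
        )
        pipe["WB"] = pipe["MA"]
        pipe["MA"] = pipe["EX"]
        if stall:
            pipe["EX"] = None  # insert a bubble; IF and ID are frozen
        else:
            pipe["EX"] = pipe["ID"]
            pipe["ID"] = pipe["IF"]
            pipe["IF"] = next_fetch if next_fetch < n else None
            next_fetch += 1
        cycle += 1
    return timeline


# I1: x1 <- x0 + 10, I2: x4 <- x1 + 17, then three independent instructions.
program = [("x1", ["x0"]), ("x4", ["x1"]), (None, []), (None, []), (None, [])]
timeline = simulate_interlock(program)
print(sorted(timeline[1].items()))
```

With this model the second instruction sits in ID during cycles t2 to t5 and enters EX at t6, matching the diagram.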
Interlock Control Logic (ignoring jumps & branches)
[Figure: pipelined datapath with C_stall, C_dest, and C_re blocks producing stall, ws, we, re1, and re2]
Should we always stall if an rs field matches some rd?
- not every instruction writes a register ⇒ we
- not every instruction reads a register ⇒ re

Source & Destination Registers
Instruction formats:
rd | rs1 | rs2 | func10 | opcode                      ALU
rd | rs1 | immediate[11:0] | func3 | opcode           ALUI/LW/JALR
imm[11:7] | rs1 | rs2 | imm[6:0] | func3 | opcode     SW/Bcond
Jump offset[24:0] | opcode

Instruction  Operation                                             source(s)  destination
ALU          rd ← (rs1) func (rs2)                                 rs1, rs2   rd
ALUI         rd ← (rs1) op imm                                     rs1        rd
LW           rd ← M[(rs1) + imm]                                   rs1        rd
SW           M[(rs1) + imm] ← (rs2)                                rs1, rs2   -
Bcond        true: PC ← (PC) + imm; false: PC ← (PC) + 4           rs1, rs2   -
J            PC ← (PC) + imm                                       -          -
JAL          x1 ← (PC), PC ← (PC) + imm                            -          x1
JALR         rd ← (PC), PC ← (rs1) + imm                           rs1        rd
Deriving the Stall Signal
C_dest:
  ws = Case opcode: JAL ⇒ X1; else ⇒ rd
  we = Case opcode: ALU, ALUi, LW, JALR ⇒ (ws ≠ 0); JAL ⇒ on; ... ⇒ off
C_re:
  re1 = Case opcode: ALU, ALUi, LW, SW, Bcond, JALR ⇒ on; J, JAL ⇒ off
  re2 = Case opcode: ALU, SW, Bcond ⇒ on; ... ⇒ off
C_stall:
$\text{stall} = ((rs1_D = ws_E) \cdot we_E + (rs1_D = ws_M) \cdot we_M + (rs1_D = ws_W) \cdot we_W) \cdot re1_D + ((rs2_D = ws_E) \cdot we_E + (rs2_D = ws_M) \cdot we_M + (rs2_D = ws_W) \cdot we_W) \cdot re2_D$

Hazards due to Loads & Stores
[Figure: pipelined datapath with the stall condition signal]
M[x1+7] ← x2
x4 ← M[x3+5]
What if x1+7 = x3+5? Is there any possible data hazard in this instruction sequence?
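The stall-signal derivation above maps directly onto a few boolean functions. The following is a minimal Python sketch: the `Instr` record and the opcode strings are illustrative stand-ins for the decoded instruction fields, and an empty stage (a bubble) is modelled as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    opcode: str  # one of "ALU", "ALUi", "LW", "SW", "Bcond", "J", "JAL", "JALR"
    rd: int = 0
    rs1: int = 0
    rs2: int = 0


def ws(i: Instr) -> int:
    """Destination register: JAL writes x1, everything else writes rd."""
    return 1 if i.opcode == "JAL" else i.rd


def we(i: Optional[Instr]) -> bool:
    """Write enable; an empty stage never writes."""
    if i is None:
        return False
    if i.opcode in ("ALU", "ALUi", "LW", "JALR"):
        return ws(i) != 0
    return i.opcode == "JAL"


def re1(i: Instr) -> bool:
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "Bcond", "JALR")


def re2(i: Instr) -> bool:
    return i.opcode in ("ALU", "SW", "Bcond")


def stall(d: Instr, e: Optional[Instr], m: Optional[Instr], w: Optional[Instr]) -> bool:
    """Stall decode if a source of d matches a pending write in E, M, or W."""
    later = (e, m, w)
    hit1 = any(we(x) and d.rs1 == ws(x) for x in later)
    hit2 = any(we(x) and d.rs2 == ws(x) for x in later)
    return (hit1 and re1(d)) or (hit2 and re2(d))


i1 = Instr("ALUi", rd=1, rs1=0)   # x1 <- x0 + 10
i2 = Instr("ALUi", rd=4, rs1=1)   # x4 <- x1 + 17
print(stall(i2, i1, None, None))  # I2 in decode while I1 is in execute
```

The `we` and `re` terms are what prevent false stalls: a store in the execute stage or a jump in decode never triggers the interlock, even if register fields happen to match.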
Load & Store Hazards
M[x1+7] ← x2
x4 ← M[x3+5]
x1+7 = x3+5 ⇒ data hazard
However, the hazard is avoided because our memory system completes writes in one cycle!
Load/store hazards are sometimes resolved in the pipeline and sometimes in the memory system itself. More on this later in the course.

CS152 Administrivia
Quiz on Feb 9 will cover PS, Lab, lectures 1-5, and associated readings.
Resolving Data Hazards (2)
Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage ⇒ bypass

Bypassing
time                 t0   t1   t2   t3   t4   t5   t6   t7  ...
(I1) x1 ← x0 + 10    IF1  ID1  EX1  MA1  WB1
(I2) x4 ← x1 + 17         IF2  ID2  ID2  ID2  ID2  EX2  MA2  WB2
(I3)                           IF3  IF3  IF3  IF3  ID3  EX3  MA3
(I4)                                               IF4  ID4  EX4
(I5)                                                    IF5  ID5
The repeated ID2 and IF3 entries are the stalled stages.
Each stall or kill introduces a bubble in the pipeline ⇒ CPI > 1
A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input:
time                 t0   t1   t2   t3   t4   t5   t6   t7  ...
(I1) x1 ← x0 + 10    IF1  ID1  EX1  MA1  WB1
(I2) x4 ← x1 + 17         IF2  ID2  EX2  MA2  WB2
(I3)                           IF3  ID3  EX3  MA3  WB3
(I4)                                IF4  ID4  EX4  MA4  WB4
(I5)                                     IF5  ID5  EX5  MA5  WB5
Adding a Bypass
[Figure: pipelined datapath with an ASrc mux feeding the ALU output back to its input, executing x1 ← ... followed by x4 ← x1 ...]
When does this bypass help?
(I1)                 (I2)            bypass helps?
x1 ← x0 + 10         x4 ← x1 + 17    yes
x1 ← M[x0 + 10]      x4 ← x1 + 17    no
JAL 500              x4 ← x1 + 17    no

The Bypass Signal: Deriving it from the Stall Signal
$\text{stall} = ((rs1_D = ws_E) \cdot we_E + (rs1_D = ws_M) \cdot we_M + (rs1_D = ws_W) \cdot we_W) \cdot re1_D + ((rs2_D = ws_E) \cdot we_E + (rs2_D = ws_M) \cdot we_M + (rs2_D = ws_W) \cdot we_W) \cdot re2_D$
ws = Case opcode: JAL ⇒ X1; else ⇒ rd
we = Case opcode: ALU, ALUi, LW, JALR ⇒ (ws ≠ 0); JAL ⇒ on; ... ⇒ off
$ASrc = (rs1_D = ws_E) \cdot we_E \cdot re1_D$
Is this correct? No, because only ALU and ALUi instructions can benefit from this bypass.
Split we_E into two components: we-bypass, we-stall
Bypass and Stall Signals
Split we_E into two components: we-bypass, we-stall
  we-bypass_E = Case opcode_E: ALU, ALUi ⇒ (ws ≠ 0); ... ⇒ off
  we-stall_E = Case opcode_E: LW, JALR ⇒ (ws ≠ 0); JAL ⇒ on; ... ⇒ off
$ASrc = (rs1_D = ws_E) \cdot \text{we-bypass}_E \cdot re1_D$
$\text{stall} = ((rs1_D = ws_E) \cdot \text{we-stall}_E + (rs1_D = ws_M) \cdot we_M + (rs1_D = ws_W) \cdot we_W) \cdot re1_D + ((rs2_D = ws_E) \cdot we_E + (rs2_D = ws_M) \cdot we_M + (rs2_D = ws_W) \cdot we_W) \cdot re2_D$

Fully Bypassed Datapath
[Figure: pipelined datapath with ASrc and BSrc bypass muxes, including a PC input for JAL]
Is there still a need for the stall signal?
$\text{stall} = (rs1_D = ws_E) \cdot (opcode_E = LW_E) \cdot (ws_E \neq 0) \cdot re1_D + (rs2_D = ws_E) \cdot (opcode_E = LW_E) \cdot (ws_E \neq 0) \cdot re2_D$
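The bypass select and the remaining load-use stall can be sketched as two predicates. As before, the `Instr` record and opcode strings are hypothetical stand-ins for decoded fields; `asrc_bypass` follows the ASrc equation with we-bypass, and `load_use_stall` follows the stall equation of the fully bypassed datapath.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    opcode: str  # e.g. "ALU", "ALUi", "LW", "SW", "Bcond", "J", "JAL", "JALR"
    rd: int = 0
    rs1: int = 0
    rs2: int = 0


def ws(i: Instr) -> int:
    return 1 if i.opcode == "JAL" else i.rd


def re1(i: Instr) -> bool:
    return i.opcode in ("ALU", "ALUi", "LW", "SW", "Bcond", "JALR")


def re2(i: Instr) -> bool:
    return i.opcode in ("ALU", "SW", "Bcond")


def we_bypass(e: Optional[Instr]) -> bool:
    """Only ALU and ALUi results are available at the ALU output."""
    return e is not None and e.opcode in ("ALU", "ALUi") and ws(e) != 0


def asrc_bypass(d: Instr, e: Optional[Instr]) -> bool:
    """Select the bypass path for the first ALU operand of the instruction in decode."""
    return we_bypass(e) and d.rs1 == ws(e) and re1(d)


def load_use_stall(d: Instr, e: Optional[Instr]) -> bool:
    """Fully bypassed datapath: stall only when d needs the result of a load in E."""
    if e is None or e.opcode != "LW" or ws(e) == 0:
        return False
    return (d.rs1 == ws(e) and re1(d)) or (d.rs2 == ws(e) and re2(d))


i2 = Instr("ALUi", rd=4, rs1=1)  # x4 <- x1 + 17
for i1 in (Instr("ALUi", rd=1), Instr("LW", rd=1), Instr("JAL")):
    print(i1.opcode, asrc_bypass(i2, i1), load_use_stall(i2, i1))
```

The loop replays the "When does this bypass help?" cases: the ALU-to-ALU bypass applies only when the producer is an ALU or ALUi instruction, and the load case is the one that still needs a stall.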
Pipeline CPI Examples
Measure from when the first instruction finishes to when the last instruction in the sequence finishes.
- No bubbles: 3 instructions finish in 3 cycles, CPI = 3/3 = 1
- One bubble: 3 instructions finish in 4 cycles, CPI = 4/3 = 1.33
- Two bubbles: 3 instructions finish in 5 cycles, CPI = 5/3 = 1.67

Resolving Data Hazards (3)
Strategy 3: Speculate on the dependence!
Two cases:
- Guessed correctly ⇒ do nothing
- Guessed incorrectly ⇒ kill and restart
We'll later see examples of this approach in more complex processors.
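The three CPI examples share one piece of arithmetic: every bubble adds one cycle to the time the sequence takes to finish. A minimal Python sketch, with an illustrative function name:

```python
def pipeline_cpi(instructions, bubbles):
    """CPI measured from first finish to last finish: one cycle per instruction plus one per bubble."""
    return (instructions + bubbles) / instructions


for bubbles in (0, 1, 2):
    print(f"{bubbles} bubble(s): CPI = {pipeline_cpi(3, bubbles):.2f}")
```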
Speculation that load value = zero
[Figure: fully bypassed datapath with an added Guess_zero input selecting 0 at the bypass mux]
$\text{Guess\_zero} = (rs1_D = ws_E) \cdot (opcode_E = LW_E) \cdot (ws_E \neq 0) \cdot re1_D$
Also need to add circuitry to remember that this was a guess and flush the pipeline if the load is not zero!
Not worth doing in practice. Why?

Control Hazards
What do we need to calculate the next PC?
- For Jumps: opcode, PC, and offset
- For Jump Register: opcode, register value, and PC
- For Conditional Branches: opcode, register (for condition), PC, and offset
- For all other instructions: opcode and PC (and have to know it's not one of the above)
PC Calculation Bubbles
time                 t0   t1   t2   t3   t4   t5   t6   t7  ...
(I1) x1 ← x0 + 10    IF1  ID1  EX1  MA1  WB1
(I2) x3 ← x2 + 17         IF2  IF2  ID2  EX2  MA2  WB2
(I3)                                IF3  IF3  ID3  EX3  MA3  WB3
(I4)                                          IF4  IF4  ID4  EX4  MA4  WB4

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7  ...
IF    I1   -    I2   -    I3   -    I4
ID         I1   -    I2   -    I3   -    I4
EX              I1   -    I2   -    I3   -    I4
MA                   I1   -    I2   -    I3   -    I4
WB                        I1   -    I2   -    I3   -    I4
- ⇒ pipeline bubble

Speculate next address is PC+4
[Figure: fetch stage with PCSrc (pc+4 / jabs / rind / br), stall, and a Jump? check that kills the following instruction]
I1  096  ADD
I2  100  J 304
I3  104  ADD   (killed)
I4  304  ADD
A jump instruction kills (not stalls) the following instruction. How?
Pipelining Jumps
[Figure: fetch stage with PCSrc (pc+4 / jabs / rind / br), stall, Jump? check, and an IRSrc_D mux]
I1  096  ADD
I2  100  J 304
I3  104  ADD   (killed)
I4  304  ADD
To kill a fetched instruction, insert a mux before IR.
Any interaction between stall and jump?
IRSrc_D = Case opcode_D: J, JAL ⇒ bubble; ... ⇒ IM

Jump Pipeline Diagrams
time             t0   t1   t2   t3   t4   t5   t6   t7  ...
(I1) 096: ADD    IF1  ID1  EX1  MA1  WB1
(I2) 100: J 304       IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD              IF3  -    -    -    -
(I4) 304: ADD                   IF4  ID4  EX4  MA4  WB4

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7  ...
IF    I1   I2   I3   I4   I5
ID         I1   I2   -    I4   I5
EX              I1   I2   -    I4   I5
MA                   I1   I2   -    I4   I5
WB                        I1   I2   -    I4   I5
- ⇒ pipeline bubble
Pipelining Conditional Branches
[Figure: fetch stage with PCSrc (pc+4 / jabs / rind / br), stall, IRSrc_D, and BEQ? / Taken? checks]
I1  096  ADD
I2  100  BEQ x1, x2 +200
I3  104  ADD
I4  304  ADD
Branch condition is not known until the execute stage. What action should be taken in the decode stage?

Pipelining Conditional Branches
[Figure: same datapath with the Bcond? / Taken? check in the execute stage]
If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid ⇒ stall signal is not valid
Pipelining Conditional Branches
[Figure: same datapath with PCSrc (pc+4 / jabs / rind / br), IRSrc_D, IRSrc_E, Jump? in decode, and Bcond? / Taken? in execute]
I1: 096  ADD
I2: 100  BEQ x1, x2 +200
I3: 104  ADD
I4: 304  ADD
If the branch is taken:
- kill the two following instructions
- the instruction at the decode stage is not valid ⇒ stall signal is not valid

Branch Pipeline Diagrams (resolved in execute stage)
time                t0   t1   t2   t3   t4   t5   t6   t7  ...
(I1) 096: ADD       IF1  ID1  EX1  MA1  WB1
(I2) 100: BEQ +200       IF2  ID2  EX2  MA2  WB2
(I3) 104: ADD                 IF3  ID3  -    -    -
(I4) 108:                          IF4  -    -    -    -
(I5) 304: ADD                           IF5  ID5  EX5  MA5  WB5

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7  ...
IF    I1   I2   I3   I4   I5
ID         I1   I2   I3   -    I5
EX              I1   I2   -    -    I5
MA                   I1   I2   -    -    I5
WB                        I1   I2   -    -    I5
- ⇒ pipeline bubble
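The jump and branch diagrams imply a simple cost model: a jump kills the one instruction fetched after it, and a taken branch resolved in execute kills two. A minimal Python sketch of the resulting CPI follows; the instruction-mix fractions are hypothetical, and data-hazard stalls and jump-register instructions are ignored.

```python
JUMP_PENALTY = 1          # one killed instruction per jump (detected in decode)
TAKEN_BRANCH_PENALTY = 2  # two killed instructions per taken branch (resolved in execute)


def cpi_with_control_hazards(jump_fraction, taken_branch_fraction):
    """Base CPI of 1 plus the average number of bubbles per instruction."""
    return (
        1.0
        + JUMP_PENALTY * jump_fraction
        + TAKEN_BRANCH_PENALTY * taken_branch_fraction
    )


# Hypothetical mix: 5% jumps, 10% taken branches.
print(cpi_with_control_hazards(0.05, 0.10))
```

Not-taken branches cost nothing in this model because the pipeline already speculates that the next address is PC+4.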
Acknowledgements
These slides contain material developed and copyright by:
- Arvind (MIT)
- Krste Asanovic (MIT/UCB)
- Joel Emer (Intel/MIT)
- James Hoe (CMU)
- John Kubiatowicz (UCB)
- David Patterson (UCB)
MIT material derived from course [number not legible]
UCB material derived from course CS252
More informationState and Finite State Machines
State and Finite State Machines See P&H Appendix C.7. C.8, C.10, C.11 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register
More informationUnit 8: Sequ. ential Circuits
CPSC 121: Models of Computation Unit 8: Sequ ential Circuits Based on slides by Patrice Be lleville and Steve Wolfman Pre-Class Learning Goals By the start of class, you s hould be able to Trace the operation
More informationState & Finite State Machines
State & Finite State Machines Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Appendix C.7. C.8, C.10, C.11 Big Picture: Building a Processor memory inst register file
More informationCSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT
CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch
More informationWorst-Case Execution Time Analysis. LS 12, TU Dortmund
Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper
More informationProfessor Lee, Yong Surk. References. Topics Microprocessor & microcontroller. High Performance Microprocessor Architecture Overview
This lecture was mae by a generous contribution of C & S Technology corporation. (http://( http://www.cnstec.com) There is no copyright on this lecture. Processor Laboratory homepage, (http://( http://mpu.yonsei.ac.kr)
More informationLogic and Computer Design Fundamentals. Chapter 8 Sequencing and Control
Logic and Computer Design Fundamentals Chapter 8 Sequencing and Control Datapath and Control Datapath - performs data transfer and processing operations Control Unit - Determines enabling and sequencing
More informationLECTURE 28. Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State
Today LECTURE 28 Analyzing digital computation at a very low level! The Latch Pipelined Datapath Control Signals Concept of State Time permitting, RC circuits (where we intentionally put in resistance
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH
More informationTable of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25
Chapter 11 Dedicated Microprocessors Page 1 of 25 Table of Content Table of Content... 1 11 Dedicated Microprocessors... 2 11.1 Manual Construction of a Dedicated Microprocessor... 3 11.2 FSM + D Model
More informationEnrico Nardelli Logic Circuits and Computer Architecture
Enrico Nardelli Logic Circuits and Computer Architecture Appendix B The design of VS0: a very simple CPU Rev. 1.4 (2009-10) by Enrico Nardelli B - 1 Instruction set Just 4 instructions LOAD M - Copy into
More informationDepartment of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.
Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm
More informationPipelined Datapath. Reading. Sections Practice Problems: 1, 3, 8, 12 (2) Lecture notes from MKP, H. H. Lee and S.
Pipelined Datapath Lectre notes from KP, H. H. Lee and S. Yalamanchili Sections 4.5 4. Practice Problems:, 3, 8, 2 Reading (2) Pipeline Performance Assme time for stages is v ps for register read or write
More informationINF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)
INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder
More informationCS61C : Machine Structures
CS 61C L15 Blocks (1) inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #15: Combinational Logic Blocks Outline CL Blocks Latches & Flip Flops A Closer Look 2005-07-14 Andy Carle CS
More informationProfessor Fearing EECS150/Problem Set Solution Fall 2013 Due at 10 am, Thu. Oct. 3 (homework box under stairs)
Professor Fearing EECS50/Problem Set Solution Fall 203 Due at 0 am, Thu. Oct. 3 (homework box under stairs). (25 pts) List Processor Timing. The list processor as discussed in lecture is described in RT
More informationDepartment of Electrical and Computer Engineering The University of Texas at Austin
Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2004 Yale Patt, Instructor Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs Exam 1, October 6, 2004 Name:
More informationCS61C : Machine Structures
inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #15: Combinational Logic Blocks 2005-07-14 CS 61C L15 Blocks (1) Andy Carle Outline CL Blocks Latches & Flip Flops A Closer Look CS
More informationEECS Components and Design Techniques for Digital Systems. FSMs 9/11/2007
EECS 150 - Components and Design Techniques for Digital Systems FSMs 9/11/2007 Sarah Bird Electrical Engineering and Computer Sciences University of California, Berkeley Slides borrowed from David Culler
More informationCMP 334: Seventh Class
CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative
More informationOutcomes. Spiral 1 / Unit 3. The Problem SYNTHESIZING LOGIC FUNCTIONS
-3. -3.2 Outcomes Spiral / Unit 3 Minterm and Materms Canonical Sums and Products 2 and 3 Variable oolean lgebra Theorems emorgan's Theorem unction Snthesis use Canonical Sums/Products Mark Redekopp I
More informationECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University
ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University Prof. Mi Lu TA: Ehsan Rohani Laboratory Exercise #4 MIPS Assembly and Simulation
More informationECE 448 Lecture 6. Finite State Machines. State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL Code. George Mason University
ECE 448 Lecture 6 Finite State Machines State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL Code George Mason University Required reading P. Chu, FPGA Prototyping by VHDL Examples
More informationCprE 281: Digital Logic
CprE 28: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Simple Processor CprE 28: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev Digital
More informationECE 2300 Digital Logic & Computer Organization
ECE 2300 Digital Logic & Computer Organization pring 201 More inary rithmetic LU 1 nnouncements Lab 4 prelab () due tomorrow Lab 5 to be released tonight 2 Example: Fixed ize 2 C ddition White stone =
More informationCS 700: Quantitative Methods & Experimental Design in Computer Science
CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,
More informationPerformance, Power & Energy
Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?
More informationPortland State University ECE 587/687. Branch Prediction
Portland State University ECE 587/687 Branch Prediction Copyright by Alaa Alameldeen and Haitham Akkary 2015 Branch Penalty Example: Comparing perfect branch prediction to 90%, 95%, 99% prediction accuracy,
More information