Computer Architecture Lecture 8: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/4/2013

Size: px
Start display at page:

Download "Computer Architecture Lecture 8: Pipelining. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/4/2013"

Transcription

1 Computer Architecture Lecture 8: Pipelining Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 2/4/2013

2 Reminder: Homework 2 Homework 2 out Due February 11 (next Monday) LC-3b microcode ISA concepts, ISA vs. microarchitecture, microcoded machines 2

3 Reminder: Lab Assignment 2 Lab Assignment 1.5 Verilog practice Not to be turned in Lab Assignment 2 Due Feb 15 Single-cycle MIPS implementation in Verilog All labs are individual assignments No collaboration; please respect the honor code 3

4 Lookahead: Extra Credit for Lab Assignment 2 Complete your normal (single-cycle) implementation first, and get it checked off in lab. Then, implement the MIPS core using a microcoded approach similar to what we will discuss in class. We are not specifying any particular details of the microcode format or the microarchitecture; you can be creative. For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline. You will get maximum 4% of course grade Document what you have done and demonstrate well 4

5 Readings for Today Pipelining P&H Chapter Optional: Hamacher et al. book, Chapter 6, Pipelining Pipelined LC-3b Microarchitecture a=18447-lc3b-pipelining.pdf 5

6 Today s Agenda Finish microprogrammed microarchitectures Start pipelining 6

7 Review: Last Lecture An exercise in microprogramming 7

8 Review: An Exercise in Microprogramming 8

9 Handouts 7 pages of Microprogrammed LC-3b design als edia=lc3b-figures.pdf 9

10 A Simple LC-3b Control and Datapath 10

11 MAR <! PC PC <! PC , 19 MDR <! M 33 R R IR <! MDR 35 To 8 RTI ADD BEN<! IR[11] & N + IR[10] & Z + IR[9] & P [IR[15:12]] BR To 11 To 10 To 18 DR<! SR1+OP2* set CC DR<! SR1&OP2* set CC 1 5 AND XOR TRAP SHF LEA LDB LDW STW STB JSR JMP 0 [BEN] PC<! PC+LSHF(off9,1) To 18 9 DR<! SR1 XOR OP2* set CC 12 PC<! BaseR To 18 To 18 MAR<! LSHF(ZEXT[IR[7:0]],1) 15 4 [IR[11]] To 18 R MDR<! M[MAR] R7<! PC R PC<! MDR R7<! PC PC<! BaseR 21 R7<! PC To 18 PC<! PC+LSHF(off11,1) To DR<! SHF(SR,A,D,amt4) set CC To 18 To DR<! PC+LSHF(off9, 1) set CC 2 MAR<! B+off6 6 MAR<! B+LSHF(off6,1) 7 MAR<! B+LSHF(off6,1) 3 MAR<! B+off6 To NOTES B+off6 : Base + SEXT[offset6] PC+off9 : PC + SEXT[offset9] *OP2 may be SR2 or SEXT[imm5] ** [15:8] or [7:0] depending on MAR[0] MDR<! M[MAR[15:1] 0] R R 31 DR<! SEXT[BYTE.DATA] set CC MDR<! M[MAR] 27 R DR<! MDR set CC R MDR<! SR 16 M[MAR]<! MDR R R MDR<! SR[7:0] 17 M[MAR]<! MDR** R R To 18 To 18 To 18 To 19

12

13 State Machine for LDW Microsequencer 10APPENDIX C. THE MICROARCHITECTURE OF THE LC-3B, BASIC MACHINE COND1 COND0 BEN R IR[11] Branch Ready Addr. Mode J[5] J[4] J[3] J[2] J[1] J[0] 0,0,IR[15:12] 6 IRD 6 Address of Next State Figure C.5: The microsequencer of the LC-3b base machine State 18 (010010) State 33 (100001) State 35 (100011) State 32 (100000) State 6 (000110) State 25 (011001) State 27 (011011) unused opcodes, the microarchitecture would execute a sequence of microinstructions, starting at state 10 or state 11, dependingon which illegal opcodewas being decoded. In both cases, the sequence of microinstructions would respond to the fact that an instruction with an illegal opcode had been fetched. Several signals necessary to control the data path and the microsequencer are not among those listed in Tables C.1 and C.2. They are DR, SR1, BEN, and R. Figure C.6 shows the additional logic needed to generate DR, SR1, and BEN. The remaining signal, R, is a signal generated by the memory in order to allow the

14 C.4. THE CONTROL STRUCTURE 11 IR[11:9] IR[11:9] DR SR1 111 IR[8:6] DRMUX SR1MUX (a) (b) IR[11:9] N Z P Logic BEN (c) Figure C.6: Additional logic required to provide control signals

15

16 R IR[15:11] BEN Microsequencer 6 Control Store 2 6 x Microinstruction 9 26 (J, COND, IRD)

17 10APPENDIX C. THE MICROARCHITECTURE OF THE LC-3B, BASIC MACHINE COND1 COND0 BEN R IR[11] Branch Ready Addr. Mode J[5] J[4] J[3] J[2] J[1] J[0] 0,0,IR[15:12] 6 IRD 6 Address of Next State Figure C.5: The microsequencer of the LC-3b base machine

18 J IRD Cond LD.MDR LD.IR LD.BEN LD.REG LD.CC LD.MAR GatePC GateMDR GateALU LD.PC GateMARMUX GateSHF PCMUX DRMUX SR1MUX ADDR1MUX ADDR2MUX MARMUX ALUK MIO.EN R.W DATA.SIZE LSHF (State 0) (State 1) (State 2) (State 3) (State 4) (State 5) (State 6) (State 7) (State 8) (State 9) (State 10) (State 11) (State 12) (State 13) (State 14) (State 15) (State 16) (State 17) (State 18) (State 19) (State 20) (State 21) (State 22) (State 23) (State 24) (State 25) (State 26) (State 27) (State 28) (State 29) (State 30) (State 31) (State 32) (State 33) (State 34) (State 35) (State 36) (State 37) (State 38) (State 39) (State 40) (State 41) (State 42) (State 43) (State 44) (State 45) (State 46) (State 47) (State 48) (State 49) (State 50) (State 51) (State 52) (State 53) (State 54) (State 55) (State 56) (State 57) (State 58) (State 59) (State 60) (State 61) (State 62) (State 63)

19 10APPENDIX C. THE MICROARCHITECTURE OF THE LC-3B, BASIC MACHINE COND1 COND0 BEN R IR[11] Branch Ready Addr. Mode J[5] J[4] J[3] J[2] J[1] J[0] 0,0,IR[15:12] 6 IRD 6 Address of Next State Figure C.5: The microsequencer of the LC-3b base machine

20 Review: End of the Exercise in Microprogramming 20

21 The Microsequencer: Some Questions When is the IRD signal asserted? What happens if an illegal instruction is decoded? What are condition (COND) bits for? How is variable latency memory handled? How do you do the state encoding? Minimize number of state variables Start with the 16-way branch Then determine constraint tables and states dependent on COND 21

22 The Control Store: Some Questions What control signals can be stored in the control store? vs. What control signals have to be generated in hardwired logic? i.e., what signal cannot be available without processing in the datapath? 22

23 Variable-Latency Memory The ready signal (R) enables memory read/write to execute correctly Example: transition from state 33 to state 35 is controlled by the R bit asserted by memory when memory data is available Could we have done this in a single-cycle microarchitecture? 23

24 The Microsequencer: Advanced Questions What happens if the machine is interrupted? What if an instruction generates an exception? How can you implement a complex instruction using this control structure? Think REP MOVS 24

25 The Power of Abstraction The concept of a control store of microinstructions enables the hardware designer with a new abstraction: microprogramming The designer can translate any desired operation to a sequence microinstructions All the designer needs to provide is The sequence of microinstructions needed to implement the desired operation The ability for the control logic to correctly sequence through the microinstructions Any additional datapath control signals needed (no need if the operation can be translated into existing control signals) 25

26 Let s Do Some More Microprogramming Implement REP MOVS in the LC-3b microarchitecture What changes, if any, do you make to the state machine? datapath? control store? microsequencer? Show all changes and microinstructions Coming up in Homework 3 26

27 Aside: Alignment Correction in Memory Remember unaligned accesses LC-3b has byte load and byte store instructions that move data not aligned at the word-address boundary Convenience to the programmer/compiler How does the hardware ensure this works correctly? Take a look at state 29 for LDB States 24 and 17 for STB Additional logic to handle unaligned accesses 27

28 Aside: Memory Mapped I/O Address control logic determines whether the specified address of LDx and STx are to memory or I/O devices Correspondingly enables memory or I/O devices and sets up muxes Another instance where the final control signals (e.g., MEM.EN or INMUX/2) cannot be stored in the control store Dependent on address 28

29 Advantages of Microprogrammed Control Allows a very simple datapath to do powerful computation by controlling the datapath (using a sequencer) High-level ISA translated into microcode (sequence of microinstructions) Microcode enables a minimal datapath to emulate an ISA Microinstructions can be thought of a user-invisible ISA Enables easy extensibility of the ISA Can support a new instruction by changing the ucode Can support complex instructions as a sequence of simple microinstructions If I can sequence an arbitrary instruction then I can sequence an arbitrary program as a microprogram sequence will need some new state (e.g. loop counters) in the microcode for sequencing more elaborate programs 29

30 Update of Machine Behavior The ability to update/patch microcode in the field (after a processor is shipped) enables Ability to add new instructions without changing the processor! Ability to fix buggy hardware implementations Examples IBM 370 Model 145: microcode stored in main memory, can be updated after a reboot IBM System z: Similar to 370/145. Heller and Farrell, Millicode in an IBM zseries processor, IBM JR&D, May/Jul B1700 microcode can be updated while the processor is running User-microprogrammable machine! 30

31 An Example Microcoded Multi-Cycle MIPS Design P&H, Appendix D Any ISA can be implemented this way We will not cover this in class However, you can do the extra credit assignment for Lab 2 Partial credit even if your full design does not work Maximum 4% of your grade in the course 31

32 A Microprogrammed MIPS Design: Lab 2 Extra Credit 32

33 Microcoded Multi-Cycle MIPS Design [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 33

34 Control Logic for MIPS FSM [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 34

35 Microprogrammed Control for MIPS FSM [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 35

36 End: A Microprogrammed MIPS Design: Lab 2 Extra Credit 36

37 k-bit control output Horizontal Microcode M ic ro c o d e sto ra g e O u tp u ts n-bit mpc input In p u t ALUSrcA IorD D a ta p a th IRWrite PCWrite PCWriteCond. c o n tro l o u tp u ts 1 A d d e r M ic ro p ro g ra m c o u n te r S e q u e n cin g co n tro l A d d r e s s se le ct lo g ic In p u ts fro m in s tru ctio n re g is te r o p c o d e fie ld [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Control Store: 2 n k bit (not including sequencing) 37

38 Vertical Microcode 1 M ic ro c o d e sto ra g e O u tp u ts n-bit mpc input In p u t PC PC+4 PC ALUOut D PC a ta p a th PC[ 31:28 ],IR[ 25:0 ],2 b00 c o n tro l IR MEM[ PC ] o u tp u ts A RF[ IR[ 25:21 ] ] B RF[ IR[ 20:16 ] ]. 1-bit signal means do this RT (or combination of RTs). m-bit input A d d e r M ic ro p ro g ra m c o u n te r S e q u e n cin g co n tro l ROM A d d r e s s se le ct lo g ic k-bit output [Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] In p u ts fro m in s tru ctio n re g is te r o p c o d e fie ld ALUSrcA IorD IRWrite PCWrite PCWriteCond. If done right (i.e., m<<n, and m<<k), two ROMs together (2 n m+2 m k bit) should be smaller than horizontal microcode ROM (2 n k bit) 38

39 Nanocode and Millicode Nanocode: a level below traditional mcode mprogrammed control for sub-systems (e.g., a complicated floatingpoint module) that acts as a slave in a mcontrolled datapath Millicode: a level above traditional mcode ISA-level subroutines that can be called by the mcontroller to handle complicated operations and system functions E.g., Heller and Farrell, Millicode in an IBM zseries processor, IBM JR&D, May/Jul In both cases, we avoid complicating the main mcontroller You can think of these as microcode at different levels of abstraction 39

40 Nanocode Concept Illustrated a mcoded processor implementation ROM mpc processor datapath We refer to this as nanocode when a mcoded subsystem is embedded in a mcoded system a mcoded FPU implementation ROM mpc arithmetic datapath 40

41 Multi-Cycle vs. Single-Cycle uarch Advantages Disadvantages You should be very familiar with this right now 41

42 Microprogrammed vs. Hardwired Control Advantages Disadvantages You should be very familiar with this right now 42

43 Can We Do Better? What limitations do you see with the multi-cycle design? Limited concurrency Some hardware resources are idle during different phases of instruction processing cycle Fetch logic is idle when an instruction is being decoded or executed Most of the datapath is idle when a memory access is happening 43

44 Can We Use the Idle Hardware to Improve Concurrency? Goal: Concurrency throughput (more work completed in one cycle) Idea: When an instruction is using some resources in its processing phase, process other instructions on idle resources not needed by that instruction E.g., when an instruction is being decoded, fetch the next instruction E.g., when an instruction is being executed, decode another instruction E.g., when an instruction is accessing data memory (ld/st), execute the next instruction E.g., when an instruction is writing its result into the register file, access data memory for the next instruction 44

45 Pipelining: Basic Idea More systematically: Idea: Pipeline the execution of multiple instructions Analogy: Assembly line processing of instructions Divide the instruction processing cycle into distinct stages of processing Ensure there are enough hardware resources to process one instruction in each stage Process a different instruction in each stage Instructions consecutive in program order are processed in consecutive stages Benefit: Increases instruction processing throughput (1/CPI) Downside: Start thinking about this 45

46 Example: Execution of Four Independent ADDs Multi-cycle: 4 cycles per instruction F D E W F D E W F D E W F D E W Pipelined: 4 cycles per 4 instructions (steady state) Time F D E W F D E W F D E W F D E W Time 46

47 The Laundry Analogy T im e T a sk o rd e r A 6 P M A M B C D place one dirty load of clothes in the washer when the washer is finished, place the wet load in the dryer when the dryer is 7 finished, take out the dry load and fold 6 P M A M T im e when folding is finished, ask your roommate (??) to put the clothes T a sk away o rd e r A B C - steps to do a load are sequentially dependent - no dependence between different loads - different steps do not share resources D Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 47

48 Pipelining Multiple Loads of Laundry 7 T im e T a sk o rd e r T a sk B o rd e r AC 6 P M A M 6 P M A M A T im e DB C D T im e 6 P M A M T a sk o rd e r T im e A T a sk o rd e r B A C B D C 6 P M A M - 4 loads of laundry in parallel - no additional resources - throughput increased by 4 - latency per load is the same D Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 48

49 6 P M A M Pipelining T im e Multiple Loads of Laundry: In Practice T a sk o rd e r A 6 P M A M T im e B T a sk o rd e r C A D B C D T im e 6 P M A M T a sk o rd e r 6 P M A M T im ea T a sk B o rd e r C A D B C D Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] the slowest step decides throughput 49

50 Pipelining 6 PMultiple M 7 8 Loads of 1 1Laundry: In 2 A M Practice T im e T a sk o rd e r A 6 P M A M T im e T a sk B o rd e r C A D B C D T im e 6 P M A M T a sk o rd e r 6 P M A M T im e A T a sk B o rd e r AC DB C D Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] A B A B Throughput restored (2 loads per hour) using 2 dryers 50

51 An Ideal Pipeline Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing) Repetition of identical operations The same operation is repeated on a large number of different inputs Repetition of independent operations No dependencies between repeated operations Uniformly partitionable suboperations Processing can be evenly divided into uniform-latency suboperations (that do not share resources) Fitting examples: automobile assembly line, doing laundry What about the instruction processing cycle? 51

52 Ideal Pipelining combinational logic (F,D,E,M,W) T psec BW=~(1/T) T/2 ps (F,D,E) T/2 ps (M,W) BW=~(2/T) T/3 ps (F,D) T/3 ps (E,M) T/3 ps (M,W) BW=~(3/T) 52

53 More Realistic Pipeline: Throughput Nonpipelined version with delay T BW = 1/(T+S) where S = latch delay T ps k-stage pipelined version BW k-stage = 1 / (T/k +S ) BW max = 1 / (1 gate delay + S ) T/k ps T/k ps 53

54 More Realistic Pipeline: Cost Nonpipelined version with combinational cost G Cost = G+L where L = latch cost G gates k-stage pipelined version Cost k-stage = G + Lk G/k G/k 54

55 Pipelining Instruction Processing 55

56 Remember: The Instruction Processing Cycle Fetch 1. Instruction fetch (IF) 2. Instruction Decode decode and register Evaluate operand Address fetch (ID/RF) 3. Execute/Evaluate Fetch Operands memory address (EX/AG) 4. Memory operand fetch (MEM) 5. Store/writeback Execute result (WB) Store Result 56

57 Remember the Single-Cycle Uarch Instruction [25 0 ] S h ift Jum p address [31 0] left PCSrc 1 =Jump 1 PC +4 [31 28] A d d A LU re su lt M u x 1 0 M u x 4 A d d R e g D st Ju m p B ra n ch S h ift le ft 2 Instruction [31 26] C o ntro l MemR ead MemtoR eg A L U O p PCSrc 2 =Br Taken M e m W rite ALU Src RegW rite P C R e a d ad d re ss In s truc tio n mem ory In struc tio n [3 1 0] Instruction [25 21] Instruction [20 16] Instruction [15 11] 0 M u x 1 R ea d register 1 R ea d W rite da ta R e a d da ta 1 register 2 R eg iste rs R e a d W rite da ta 2 re giste r 0 M u x 1 A L U Z e ro A L U re su lt bcond A d d re ss W rite d ata D ata m e m o ry R e a d d ata 1 M u x 0 Instruction [15 0 ] S ig n e xte n d A L U co n tro l In stru ction [5 0 ] ALU operation Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] T BW=~(1/T) 57

58 Dividing Into Stages 200ps IF : In s tru c tio n fe tch ID : In s tru c tio n d e co d e / 0 M u x 1 100ps 200ps 200ps 100ps r e g is te r file re a d E X : E x e cu te / a d d re ss c a lcu la tio n M E M : M e m o ry a cc e ss W B : W rite b a c k ignore for now A d d 4 A dd A dd res ult S hift left 2 P C A dd re ss Ins tru ctio n In str uction m e m o ry R ea d register 1 R ea d register 2 R eg iste rs W rite re gis ter W rite d ata R e ad d ata 1 R e ad d ata 2 0 M u x 1 A LU Ze ro A LU res ult A d dre ss W rite data D a ta m e m o ry R ead da ta 1 M u x 0 RF write 16 S ign e xte nd 32 Is this the correct partitioning? Why not 4 or 6 stages? Why not different boundaries? Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 58

59 Instruction Pipeline Throughput P ro g ra m e x e c u tio n T im e o rd e r (in in str u c tio n s ) lw $ 1, ($ 0 ) In s tru ctio n D a ta R e g A L U R e g fe tch a c c e ss lw $ 2, ($ 0 ) 800ps 8 n s In s tru ctio n R e g fe tch A L U D a ta a c c e ss R e g lw $ 3, ($ 0 ) P ro g ra m e x e c u tio n T im e o rd e r ( in in stru c tio n s) lw $ 1, ($ 0 ) 8 n s In s tru c tio n D a ta R e g A L U R e g fe tc h a cc e s s 800ps In s tru ctio n fe tch ps 8 n s lw $ 2, ($ 0 ) 2 n s 200ps In s tru c tio n fe tc h R e g A L U D a ta a cc e s s R e g lw $ 3, ($ 0 ) 200ps 2 n s In s tru c tio n fe tc h R e g A L U D a ta a cc e s s R e g 2 n s 2 n s 2 n s 2 n s 2 n s 200ps 200ps 5-stage speedup is 4, not 5 as predicated by the ideal model. Why? 200ps 200ps 200ps 59

60 Ins truc tio n PC D +4 PC E +4 Enabling Pipelined Processing: Pipeline Registers IF : In s tru c tio n fe tch ID : In s tru c tio n d e co d e / E X : E x e cu te / M E M : M e m o ry a cc e ss W B : W rite b a c k r e g is te r file re a d a d d ss c a lcu la tio n No resource is used by more than 1 stage! 0 0 M M u u x x 1 1 IF /ID ID /E X E X /M E M M E M /W B 4 4 A da dd d A dd A dd A dd A dd res res ult ult npc M S hift S hift left left 2 2 P CP C PC F Add A dd ress ss In str uct ion Ins tru ctio n m e m o ry Instruction m e m o ry IR D R ea R ea d d register register 1 1 R er ad e ad d ata d ata 1 1 R ea R ea d d register register 2 2 R eg R eg iste iste rs rs R er ad e ad W Write rite d ata d ata 2 2 re re gis gis ter ter W Write rite d ata d ata S ign S ign exte e xte nd nd A E B E Imm E 0 0 M M u u x x 1 1 Ze Ze ro ro A LU A LU A LU A LU res res ult ult Aout M B M R ead R ead A da dre d dre ss ss da da ta ta D ad ata mme me mo ry o ry WWrite rite data data MDR W Aout W 1 1 MM u u x x 0 0 Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] T/k ps T T/k ps 60

61 Ins I Instructio ction n n Pipelined Operation 32 Example S ig n e xte nd S ig n exte nd 32 W rite data W rite d ata 0 lw In stru ctio n fe tc h 0 M u x 1 All instruction classes must follow the same path and timing through the pipeline stages. Any performance impact? lw lw In stru c tio n d e c o d e E xe cu tio n lw lw M e m o ry W rite b a c k I IF/ID D /ME / M ID /E X E X /M E M M E M /W B dd A d d 4 hift S hift ft left 2 dd dd d d A A dd r resu ult ult lt P C Add ress A re s Instruction In struction memo mory m e m o ry y R ead e d d register register 1 R e ea ad data d ead e d ata 1 R d register register 2 R eg egiste rs rs R ead W rite data d ata 2 register W rite da ata d ata 16 ign 32 S ign exte e nd nd 0 M u x 1 Z Ze Ze eer ro ro o A LU LU L U A LU res ult res ult A d ddr dre ess ss D a ta ta D a ata m eem m o ry ry mme em mmory o y W rite rite ddata data R eead ead data d ta ta ta 1 M u u u x 0 lw 0 M u x 0 M u x 1 In stru c tio n d e c o d e Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] lw W rite b a c k 61

62 Instructio uction n dre ata gister 1M u W rite x ddat ataa 1 16 S ign e xte nd Sign eex xte ten ndd Pipelined Operation Example 32 C lo ck 1 C lo ck C lo5 ck 3 W rite data Wrrite ite data ta D a ta mem e mor o ry y 0 M u x 0 slw u b $ 10 1,, 2$ 02 ($, $ 13) su lw b $ $ 110 1,, 2$ 02 ($, 1 $ ) 3 lw $ 1 0, 2 0 ($ 1 ) In stru ctio n fe tc h In s tru ctio n d e co d e E xe c u tio n 0 M u u x 1 su b $ 1 1, $ 2, $ 3 E xe c u tio n su lw b $ $ 10 1,, 2$ 02 (, $ $ 13) M e m o ry su lw b $ $ 110 1,, 2$ 02 ( $, 1 $ ) 3 W rite b a c k IF IF /ID D ID ID D /E /E X E X /M E M M E M /W B Ad dd d 4 S hift h left 2 A dd d d A dd res sult ult P C Add d dre dre ss ss Instructio struction memo m ory ry R ea d register gister 1 R e ead ad d data ta 1 R ea d register gister 2 R Re eg gisters isters R e ead ad W rite e d data ta 2 reg gister r W rite e data a 0 M u u x 1 Ze Ze ero ro A LU A LU L U res sult ult A Addre res res ss s D ata a ta m em e em m ory o y Writer data ta R ead d da ta ta 1 M u u x S ign e ex xte ten nd d C ck lo ck 21 3 Clo lock C lo 56 ck 4 s u b $ 1 1, $ 2, $ 3 lw $ 1 0, 2 0 ($ 1 ) Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] su b $ 1 1, $ 2, $ 3 lw $ 1 0, 2 0 ( $ 1 ) 62 su b $ 1 1, $ 2, $ 3

63 Illustrating Pipeline Operation: Operation View t 0 t 1 t 2 t 3 t 4 t 5 Inst 0 Inst 1 Inst 2 Inst 3 Inst 4 IF ID IF EX ID IF MEM EX ID IF WB MEM EX ID IF WB MEM EX ID IF WB MEM EX ID IF WB MEM EX ID IF 63

64 Illustrating Pipeline Operation: Resource View t 0 t 1 t 2 t 3 t 4 t 5 t 6 t 7 t 8 t 9 t 10 IF I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 I 10 ID I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 I 9 EX I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 I 8 MEM I 0 I 1 I 2 I 3 I 4 I 5 I 6 I 7 WB I 0 I 1 I 2 I 3 I 4 I 5 I 6 64

65 In s tru c tio n Control Points in a Pipeline P C S rc 0 M u x 1 IF /ID ID /E X E X /M E M M E M /W B A d d 4 R e gw rite S hift le ft 2 A dd A d d res ult B ra n c h P C A d d re s s In s tru c tio n m e m o ry R e a d register 1 R e a d W rite d a ta R e a d d a ta 1 register 2 R e g is te rs R e a d W rite d a ta 2 re g is te r A LU Src 0 M u x 1 A L U Z ez ro e ro A L U re su lt A d d re s s W rite M e m W rite D a ta m e m o ry R e a d d a ta M e m to R e g 1 M u x 0 Instruction [1 5 0 ] 16 S ig n 32 e x te n d 6 A L U c on tro l d a ta M e m R e a d Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] Instruction [ ] Instruction [ ] 0 M u x 1 R e gd s t A LU O p Identical set of control points as the single-cycle datapath!! 65

66 Control Signals in a Pipeline For a given instruction same control signals as single-cycle, but control signals required at different cycles, depending on stage decode once using the same logic as single-cycle and buffer control signals until consumed W B In stru c tio n C o n tro l M W B E X M W B IF /ID ID /E X E X /M E M M E M /W B or carry relevant instruction word/field down the pipeline and decode locally within each stage (still same logic) Which one is better? 66

67 In stru ctio n M e m to R e g M e m W rite R e gw rite Pipelined Control Signals P C S rc 0 M u x 1 C o n trol ID /E X W B M E X /M E M W B M E M /W B IF /ID E X M W B A d d 4 A dd A d d re su lt S h ift le ft 2 ALU Src B ran ch P C A d d re ss In stru c tio n m e m ory R e ad register 1 R e ad W rite d a ta R ea d d a ta 1 register 2 R e g iste rs R ea d W rite d a ta 2 re g is ter 0 M u x 1 A L U Z e ro A LU re su lt A dd re ss W rite d ata D a ta m e m o ry R e a d d ata 1 M u x 0 In stru ction [1 5 0 ] S ig n ex ten d 6 A LU co n trol M e m R e ad In stru ctio n [ ] In stru ctio n [ ] 0 M u x 1 A L U O p RegD st Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.] 67

68 An Ideal Pipeline Goal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing) Repetition of identical operations The same operation is repeated on a large number of different inputs Repetition of independent operations No dependencies between repeated operations Uniformly partitionable suboperations Processing an be evenly divided into uniform-latency suboperations (that do not share resources) Fitting examples: automobile assembly line, doing laundry What about the instruction processing cycle? 68

69 Instruction Pipeline: Not An Ideal Pipeline Identical operations... NOT! different instructions do not need all stages - Forcing different instructions to go through the same multi-function pipe external fragmentation (some pipe stages idle for some instructions) Uniform suboperations... NOT! difficult to balance the different pipeline stages - Not all pipeline stages do the same amount of work internal fragmentation (some pipe stages are too-fast but take the same clock cycle time) Independent operations... NOT! instructions are not independent of each other - Need to detect and resolve inter-instruction dependencies to ensure the pipeline operates correctly Pipeline is not always moving (it stalls) 69

70 Issues in Pipeline Design Balancing work in pipeline stages How many stages and what is done in each stage Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow Handling dependences Data Control Handling resource contention Handling long-latency (multi-cycle) operations Handling exceptions, interrupts Advanced: Improving pipeline throughput Minimizing stalls 70

71 Causes of Pipeline Stalls Resource contention Dependences (between instructions) Data Control Long-latency (multi-cycle) operations 71

72 Dependences and Their Types Also called dependency or less desirably hazard Dependencies dictate ordering requirements between instructions Two types Data dependence Control dependence Resource contention is sometimes called resource dependence However, this is not fundamental to (dictated by) program semantics, so we will treat it separately 72

73 Handling Resource Contention Happens when instructions in two pipeline stages need the same resource Solution 1: Eliminate the cause of contention Duplicate the resource or increase its throughput E.g., use separate instruction and data memories (caches) E.g., use multiple ports for memory structures Solution 2: Detect the resource contention and stall one of the contending stages Which stage do you stall? Example: What if you had a single read and write port for the register file? 73

74 Data Dependences Types of data dependences Flow dependence (true data dependence read after write) Output dependence (write after write) Anti dependence (write after read) Which ones cause stalls in a pipelined machine? For all of them, we need to ensure semantics of the program are correct Flow dependences always need to be obeyed because they constitute true dependence on a value Anti and output dependences exist due to limited number of architectural registers They are dependence on a name, not a value We will later see what we can do about them 74

75 Data Dependence Types Flow dependence r 3 r 1 op r 2 Read-after-Write r 5 r 3 op r 4 (RAW) Anti dependence r 3 r 1 op r 2 Write-after-Read r 1 r 4 op r 5 (WAR) Output-dependence r 3 r 1 op r 2 Write-after-Write r 5 r 3 op r 4 (WAW) r 3 r 6 op r 7 75

76 How to Handle Data Dependences Anti and output dependences are easier to handle write to the destination in one stage and in program order Flow dependences are more interesting Five fundamental ways of handling flow dependences 76

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017

Design of Digital Circuits Lecture 14: Microprogramming. Prof. Onur Mutlu ETH Zurich Spring April 2017 Design of Digital Circuits Lecture 4: Microprogramming Prof. Onur Mutlu ETH Zurich Spring 27 7 April 27 Agenda for Today & Next Few Lectures! Single-cycle Microarchitectures! Multi-cycle and Microprogrammed

More information

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining

CMU Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining CMU 18-447 Introduction to Computer Architecture, Spring 2015 HW 2: ISA Tradeoffs, Microprogramming and Pipelining Instructor: Prof Onur Mutlu TAs: Rachata Ausavarungnirun, Kevin Chang, Albert Cho, Jeremie

More information

Department of Electrical and Computer Engineering The University of Texas at Austin

Department of Electrical and Computer Engineering The University of Texas at Austin Department of Electrical and Computer Engineering The University of Texas at Austin EE 360N, Fall 2004 Yale Patt, Instructor Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs Exam 1, October 6, 2004 Name:

More information

CENG3420 Lab 3-1: LC-3b Datapath

CENG3420 Lab 3-1: LC-3b Datapath CENG3420 Lab 3-1: LC-3b Datapath Bei Yu Department of Computer Science and Engineering The Chinese University of Hong Kong byu@cse.cuhk.edu.hk Spring 2018 1 / 22 Overview Introduction Lab3-1 Assignment

More information

CSE Computer Architecture I

CSE Computer Architecture I Execution Sequence Summary CSE 30321 Computer Architecture I Lecture 17 - Multi Cycle Control Michael Niemier Department of Computer Science and Engineering Step name Instruction fetch Instruction decode/register

More information

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk

ECE290 Fall 2012 Lecture 22. Dr. Zbigniew Kalbarczyk ECE290 Fall 2012 Lecture 22 Dr. Zbigniew Kalbarczyk Today LC-3 Micro-sequencer (the control store) LC-3 Micro-programmed control memory LC-3 Micro-instruction format LC -3 Micro-sequencer (the circuitry)

More information

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2

Pipelining. Traditional Execution. CS 365 Lecture 12 Prof. Yih Huang. add ld beq CS CS 365 2 Pipelining CS 365 Lecture 12 Prof. Yih Huang CS 365 1 Traditional Execution 1 2 3 4 1 2 3 4 5 1 2 3 add ld beq CS 365 2 1 Pipelined Execution 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)

ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference) ECE 3401 Lecture 23 Pipeline Design Control State Register Combinational Control Logic New/ Modified Control Word ISA: Instruction Specifications (for reference) P C P C + 1 I N F I R M [ P C ] E X 0 PC

More information

CSCI-564 Advanced Computer Architecture

CSCI-564 Advanced Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 8: Handling Exceptions and Interrupts / Superscalar Bo Wu Colorado School of Mines Branch Delay Slots (expose control hazard to software) Change the ISA

More information

3. (2) What is the difference between fixed and hybrid instructions?

3. (2) What is the difference between fixed and hybrid instructions? 1. (2 pts) What is a "balanced" pipeline? 2. (2 pts) What are the two main ways to define performance? 3. (2) What is the difference between fixed and hybrid instructions? 4. (2 pts) Clock rates have grown

More information

Combinational vs. Sequential. Summary of Combinational Logic. Combinational device/circuit: any circuit built using the basic gates Expressed as

Combinational vs. Sequential. Summary of Combinational Logic. Combinational device/circuit: any circuit built using the basic gates Expressed as Summary of Combinational Logic : Computer Architecture I Instructor: Prof. Bhagi Narahari Dept. of Computer Science Course URL: www.seas.gwu.edu/~bhagiweb/cs3/ Combinational device/circuit: any circuit

More information

CPSC 3300 Spring 2017 Exam 2

CPSC 3300 Spring 2017 Exam 2 CPSC 3300 Spring 2017 Exam 2 Name: 1. Matching. Write the correct term from the list into each blank. (2 pts. each) structural hazard EPIC forwarding precise exception hardwired load-use data hazard VLIW

More information

4. (3) What do we mean when we say something is an N-operand machine?

4. (3) What do we mean when we say something is an N-operand machine? 1. (2) What are the two main ways to define performance? 2. (2) When dealing with control hazards, a prediction is not enough - what else is necessary in order to eliminate stalls? 3. (3) What is an "unbalanced"

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. fetch decode & eg-fetch execute

More information

Simple Instruction-Pipelining. Pipelined Harvard Datapath

Simple Instruction-Pipelining. Pipelined Harvard Datapath 6.823, L8--1 Simple ruction-pipelining Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Pipelined Harvard path 6.823, L8--2. I fetch decode & eg-fetch execute memory Clock period

More information

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?

1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 2. (2 )What are the two main ways to define performance? 3. (2 )What

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] What are the two main ways to define performance? [2] Predicting the direction of a branch is not enough. What else is necessary? [2] The power consumed by a chip has increased over time, but the clock

More information

Figure 4.9 MARIE s Datapath

Figure 4.9 MARIE s Datapath Term Control Word Microoperation Hardwired Control Microprogrammed Control Discussion A set of signals that executes a microoperation. A register transfer or other operation that the CPU can execute in

More information

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle

Computer Engineering Department. CC 311- Computer Architecture. Chapter 4. The Processor: Datapath and Control. Single Cycle Computer Engineering Department CC 311- Computer Architecture Chapter 4 The Processor: Datapath and Control Single Cycle Introduction The 5 classic components of a computer Processor Input Control Memory

More information

[2] Predicting the direction of a branch is not enough. What else is necessary?

[2] Predicting the direction of a branch is not enough. What else is necessary? [2] When we talk about the number of operands in an instruction (a 1-operand or a 2-operand instruction, for example), what do we mean? [2] What are the two main ways to define performance? [2] Predicting

More information

Simple Instruction-Pipelining (cont.) Pipelining Jumps

Simple Instruction-Pipelining (cont.) Pipelining Jumps 6.823, L9--1 Simple ruction-pipelining (cont.) + Interrupts Updated March 6, 2000 Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Src1 ( j / ~j ) Src2 ( / Ind) Pipelining Jumps

More information

Project Two RISC Processor Implementation ECE 485

Project Two RISC Processor Implementation ECE 485 Project Two RISC Processor Implementation ECE 485 Chenqi Bao Peter Chinetti November 6, 2013 Instructor: Professor Borkar 1 Statement of Problem This project requires the design and test of a RISC processor

More information

L07-L09 recap: Fundamental lesson(s)!

L07-L09 recap: Fundamental lesson(s)! L7-L9 recap: Fundamental lesson(s)! Over the next 3 lectures (using the IPS ISA as context) I ll explain:! How functions are treated and processed in assembly! How system calls are enabled in assembly!

More information

A Second Datapath Example YH16

A Second Datapath Example YH16 A Second Datapath Example YH16 Lecture 09 Prof. Yih Huang S365 1 A 16-Bit Architecture: YH16 A word is 16 bit wide 32 general purpose registers, 16 bits each Like MIPS, 0 is hardwired zero. 16 bit P 16

More information

Computer Architecture ELEC2401 & ELEC3441

Computer Architecture ELEC2401 & ELEC3441 Last Time Pipeline Hazard Computer Architecture ELEC2401 & ELEC3441 Lecture 8 Pipelining (3) Dr. Hayden Kwok-Hay So Department of Electrical and Electronic Engineering Structural Hazard Hazard Control

More information

CS 52 Computer rchitecture and Engineering Lecture 4 - Pipelining Krste sanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste! http://inst.eecs.berkeley.edu/~cs52!

More information

Processor Design & ALU Design

Processor Design & ALU Design 3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

Instruction register. Data. Registers. Register # Memory data register

Instruction register. Data. Registers. Register # Memory data register Where we are headed Single Cycle Problems: what if we had a more complicated instrction like floating point? wastefl of area One Soltion: se a smaller cycle time have different instrctions take different

More information

Implementing the Controller. Harvard-Style Datapath for DLX

Implementing the Controller. Harvard-Style Datapath for DLX 6.823, L6--1 Implementing the Controller Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 6.823, L6--2 Harvard-Style Datapath for DLX Src1 ( j / ~j ) Src2 ( R / RInd) RegWrite MemWrite

More information

Microprocessor Power Analysis by Labeled Simulation

Microprocessor Power Analysis by Labeled Simulation Microprocessor Power Analysis by Labeled Simulation Cheng-Ta Hsieh, Kevin Chen and Massoud Pedram University of Southern California Dept. of EE-Systems Los Angeles CA 989 Outline! Introduction! Problem

More information

UNIVERSITY OF WISCONSIN MADISON

UNIVERSITY OF WISCONSIN MADISON CS/ECE 252: INTRODUCTION TO COMPUTER ENGINEERING UNIVERSITY OF WISCONSIN MADISON Prof. Gurindar Sohi TAs: Minsub Shin, Lisa Ossian, Sujith Surendran Midterm Examination 2 In Class (50 minutes) Friday,

More information

Enrico Nardelli Logic Circuits and Computer Architecture

Enrico Nardelli Logic Circuits and Computer Architecture Enrico Nardelli Logic Circuits and Computer Architecture Appendix B The design of VS0: a very simple CPU Rev. 1.4 (2009-10) by Enrico Nardelli B - 1 Instruction set Just 4 instructions LOAD M - Copy into

More information

2

2 Computer System AA rc hh ii tec ture( 55 ) 2 INTRODUCTION ( d i f f e r e n t r e g i s t e r s, b u s e s, m i c r o o p e r a t i o n s, m a c h i n e i n s t r u c t i o n s, e t c P i p e l i n e E

More information

Computer Architecture Lecture 5: ISA Wrap-Up and Single-Cycle Microarchitectures

Computer Architecture Lecture 5: ISA Wrap-Up and Single-Cycle Microarchitectures 8-447 Compter Architectre Lectre 5: ISA Wrap-Up and Single-Cycle icroarchitectres Prof. Onr tl Carnegie ellon University Spring 22, /25/22 Homework Was de Wednesday! 34 received 2 Reminder: Homeworks for

More information

CPU DESIGN The Single-Cycle Implementation

CPU DESIGN The Single-Cycle Implementation CSE 202 Computer Organization CPU DESIGN The Single-Cycle Implementation Shakil M. Khan (adapted from Prof. H. Roumani) Dept of CS & Eng, York University Sequential vs. Combinational Circuits Digital circuits

More information

ALU A functional unit

ALU A functional unit ALU A functional unit that performs arithmetic operations such as ADD, SUB, MPY logical operations such as AND, OR, XOR, NOT on given data types: 8-,16-,32-, or 64-bit values A n-1 A n-2... A 1 A 0 B n-1

More information

Designing MIPS Processor

Designing MIPS Processor CSE 675.: Introdction to Compter Architectre Designing IPS Processor (lti-cycle) Presentation H Reading Assignment: 5.5,5.6 lti-cycle Design Principles Break p eection of each instrction into steps. The

More information

EC 413 Computer Organization

EC 413 Computer Organization EC 413 Computer Organization rithmetic Logic Unit (LU) and Register File Prof. Michel. Kinsy Computing: Computer Organization The DN of Modern Computing Computer CPU Memory System LU Register File Disks

More information

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)

INF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2) INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder

More information

State and Finite State Machines

State and Finite State Machines State and Finite State Machines See P&H Appendix C.7. C.8, C.10, C.11 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register

More information

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5.

CHAPTER log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C * 9-4.* (Errata: Delete 1 after problem number) 9-5. CHPTER 9 2008 Pearson Education, Inc. 9-. log 2 64 = 6 lines/mux or decoder 9-2.* C = C 8 V = C 8 C 7 Z = F 7 + F 6 + F 5 + F 4 + F 3 + F 2 + F + F 0 N = F 7 9-3.* = S + S = S + S S S S0 C in C 0 dder

More information

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1

Building a Computer. Quiz #2 on 10/31, open book and notes. (This is the last lecture covered) I wonder where this goes? L16- Building a Computer 1 Building a Computer I wonder where this goes? B LU MIPS Kit Quiz # on /3, open book and notes (This is the last lecture covered) Comp 4 Fall 7 /4/7 L6- Building a Computer THIS IS IT! Motivating Force

More information

CprE 281: Digital Logic

CprE 281: Digital Logic CprE 28: Digital Logic Instructor: Alexander Stoytchev http://www.ece.iastate.edu/~alexs/classes/ Simple Processor CprE 28: Digital Logic Iowa State University, Ames, IA Copyright Alexander Stoytchev Digital

More information

CMP 338: Third Class

CMP 338: Third Class CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does

More information

Computer Architecture

Computer Architecture Lecture 2: Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture CPU Evolution What is? 2 Outline Measurements and metrics : Performance, Cost, Dependability, Power Guidelines

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines 1 Clocks A microprocessor is composed of many different circuits that are operating simultaneously if each

More information

State & Finite State Machines

State & Finite State Machines State & Finite State Machines Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University See P&H Appendix C.7. C.8, C.10, C.11 Big Picture: Building a Processor memory inst register file

More information

EE 660: Computer Architecture Out-of-Order Processors

EE 660: Computer Architecture Out-of-Order Processors EE 660: Computer Architecture Out-of-Order Processors Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David entzlaff Agenda I4 Processors I2O2

More information

Design. Dr. A. Sahu. Indian Institute of Technology Guwahati

Design. Dr. A. Sahu. Indian Institute of Technology Guwahati CS222: Processor Design: Multi Cycle Design Dr. A. Sahu Dept of Comp. Sc. & Engg. Indian Institute of Technology Guwahati Mid Semester Exam Multi Cycle design Outline Clock periods in single cycle and

More information

COVER SHEET: Problem#: Points

COVER SHEET: Problem#: Points EEL 4712 Midterm 3 Spring 2017 VERSION 1 Name: UFID: Sign here to give permission for your test to be returned in class, where others might see your score: IMPORTANT: Please be neat and write (or draw)

More information

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1

Computer Architecture. ECE 361 Lecture 5: The Design Process & ALU Design. 361 design.1 Computer Architecture ECE 361 Lecture 5: The Design Process & Design 361 design.1 Quick Review of Last Lecture 361 design.2 MIPS ISA Design Objectives and Implications Support general OS and C- style language

More information

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary

EECS150 - Digital Design Lecture 11 - Shifters & Counters. Register Summary EECS50 - Digital Design Lecture - Shifters & Counters February 24, 2003 John Wawrzynek Spring 2005 EECS50 - Lec-counters Page Register Summary All registers (this semester) based on Flip-flops: q 3 q 2

More information

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control

EXAMPLES 4/12/2018. The MIPS Pipeline. Hazard Summary. Show the pipeline diagram. Show the pipeline diagram. Pipeline Datapath and Control The MIPS Pipeline CSCI206 - Computer Organization & Programming Pipeline Datapath and Control zybook: 11.6 Developed and maintained by the Bucknell University Computer Science Department - 2017 Hazard

More information

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University Prof. Mi Lu TA: Ehsan Rohani Laboratory Exercise #4 MIPS Assembly and Simulation

More information

A L A BA M A L A W R E V IE W

A L A BA M A L A W R E V IE W A L A BA M A L A W R E V IE W Volume 52 Fall 2000 Number 1 B E F O R E D I S A B I L I T Y C I V I L R I G HT S : C I V I L W A R P E N S I O N S A N D TH E P O L I T I C S O F D I S A B I L I T Y I N

More information

A crash course in Digital Logic

A crash course in Digital Logic crash course in Digital Logic Computer rchitecture 1DT016 distance Fall 2017 http://xyx.se/1dt016/index.php Per Foyer Mail: per.foyer@it.uu.se Per.Foyer@it.uu.se 2017 1 We start from here Gates Flip-flops

More information

Topics: A multiple cycle implementation. Distributed Notes

Topics: A multiple cycle implementation. Distributed Notes COSC 22: Compter Organization Instrctor: Dr. Amir Asif Department of Compter Science York University Handot # lticycle Implementation of a IPS Processor Topics: A mltiple cycle implementation Distribted

More information

ENEE350 Lecture Notes-Weeks 14 and 15

ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining & Amdahl s Law ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining is a method of processing in which a problem is divided into a number of sub problems and solved and the solu8ons of the sub problems

More information

Review: Single-Cycle Processor. Limits on cycle time

Review: Single-Cycle Processor. Limits on cycle time Review: Single-Cycle Processor Jump 3:26 5: MemtoReg Control Unit LUControl 2: Op Funct LUSrc RegDst PCJump PC 4 uction + PCPlus4 25:2 2:6 2:6 5: 5: 2 3 WriteReg 4: Src Src LU Zero LUResult Write + PC

More information

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 9: Control Hazard and Resolution. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lectre 9: Control Hazard and Resoltion James C. Hoe Department of ECE Carnegie ellon University 18 447 S18 L09 S1, James C. Hoe, CU/ECE/CALC, 2018 Yor goal today Hosekeeping simple control flow

More information

Fall 2011 Prof. Hyesoon Kim

Fall 2011 Prof. Hyesoon Kim Fall 2011 Prof. Hyesoon Kim Add: 2 cycles FE_stage add r1, r2, r3 FE L ID L EX L MEM L WB L add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub add add mul sub sub add add mul sub sub add

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19

More information

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions

Lecture 12: Pipelined Implementations: Control Hazards and Resolutions 18-447 Lectre 12: Pipelined Implementations: Control Hazards and Resoltions S 09 L12-1 James C. Hoe Dept of ECE, CU arch 2, 2009 Annoncements: Spring break net week!! Project 2 de the week after spring

More information

Spiral 1 / Unit 3

Spiral 1 / Unit 3 -3. Spiral / Unit 3 Minterm and Maxterms Canonical Sums and Products 2- and 3-Variable Boolean Algebra Theorems DeMorgan's Theorem Function Synthesis use Canonical Sums/Products -3.2 Outcomes I know the

More information

Arithmetic and Logic Unit First Part

Arithmetic and Logic Unit First Part Arithmetic and Logic Unit First Part Arquitectura de Computadoras Arturo Díaz D PérezP Centro de Investigación n y de Estudios Avanzados del IPN adiaz@cinvestav.mx Arquitectura de Computadoras ALU1-1 Typical

More information

Lecture 13: Sequential Circuits, FSM

Lecture 13: Sequential Circuits, FSM Lecture 13: Sequential Circuits, FSM Today s topics: Sequential circuits Finite state machines Reminder: midterm on Tue 2/28 will cover Chapters 1-3, App A, B if you understand all slides, assignments,

More information

Numbers and Arithmetic

Numbers and Arithmetic Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu

More information

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Performance, Power & Energy ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So Recall: Goal of this class Performance Reconfiguration Power/ Energy H. So, Sp10 Lecture 3 - ELEC8106/6102 2 PERFORMANCE EVALUATION

More information

CPE100: Digital Logic Design I

CPE100: Digital Logic Design I Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu CPE100: Digital Logic Design I Final Review http://www.ee.unlv.edu/~b1morris/cpe100/ 2 Logistics Tuesday Dec 12 th 13:00-15:00 (1-3pm) 2 hour

More information

Designing Sequential Logic Circuits

Designing Sequential Logic Circuits igital Integrated Circuits (83-313) Lecture 5: esigning Sequential Logic Circuits Semester B, 2016-17 Lecturer: r. Adam Teman TAs: Itamar Levi, Robert Giterman 26 April 2017 isclaimer: This course was

More information

ECE 2300 Digital Logic & Computer Organization

ECE 2300 Digital Logic & Computer Organization ECE 2300 Digital Logic & Computer Organization pring 201 More inary rithmetic LU 1 nnouncements Lab 4 prelab () due tomorrow Lab 5 to be released tonight 2 Example: Fixed ize 2 C ddition White stone =

More information

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control

Logic and Computer Design Fundamentals. Chapter 8 Sequencing and Control Logic and Computer Design Fundamentals Chapter 8 Sequencing and Control Datapath and Control Datapath - performs data transfer and processing operations Control Unit - Determines enabling and sequencing

More information

Chapter 3 Digital Logic Structures

Chapter 3 Digital Logic Structures Chapter 3 Digital Logic Structures Original slides from Gregory Byrd, North Carolina State University Modified by C. Wilcox, M. Strout, Y. Malaiya Colorado State University Computing Layers Problems Algorithms

More information

Lecture 3, Performance

Lecture 3, Performance Lecture 3, Performance Repeating some definitions: CPI Clocks Per Instruction MHz megahertz, millions of cycles per second MIPS Millions of Instructions Per Second = MHz / CPI MOPS Millions of Operations

More information

Unit 6: Branch Prediction

Unit 6: Branch Prediction CIS 501: Computer Architecture Unit 6: Branch Prediction Slides developed by Joe Devie/, Milo Mar4n & Amir Roth at Upenn with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi,

More information

Review. Combined Datapath

Review. Combined Datapath Review Topics:. A single cycle implementation 2. State Diagrams. A mltiple cycle implementation COSC 22: Compter Organization Instrctor: Dr. Amir Asif Department of Compter Science York University Handot

More information

Chapter 7: Digital Components. Oregon State University School of Electrical Engineering and Computer Science. Review basic digital design concepts:

Chapter 7: Digital Components. Oregon State University School of Electrical Engineering and Computer Science. Review basic digital design concepts: hapter 7: igital omponents Prof. en Lee Oregon tate University chool of Electrical Engineering and omputer cience hapter Goals Review basic digital design concepts: esigning basic digital components using

More information

From Sequential Circuits to Real Computers

From Sequential Circuits to Real Computers 1 / 36 From Sequential Circuits to Real Computers Lecturer: Guillaume Beslon Original Author: Lionel Morel Computer Science and Information Technologies - INSA Lyon Fall 2017 2 / 36 Introduction What we

More information

Lecture: Pipelining Basics

Lecture: Pipelining Basics Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4:

More information

Designing Single-Cycle MIPS Processor

Designing Single-Cycle MIPS Processor CSE 32: Introdction to Compter Architectre Designing Single-Cycle IPS Processor Presentation G Stdy:.-. Gojko Babić 2/9/28 Introdction We're now ready to look at an implementation of the system that incldes

More information

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec09 Counters Outline.

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec09 Counters Outline. Review: Designing with FSM EECS 150 - Components and Design Techniques for Digital Systems Lec09 Counters 9-28-04 David Culler Electrical Engineering and Computer Sciences University of California, Berkeley

More information

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018

ECE 172 Digital Systems. Chapter 12 Instruction Pipelining. Herbert G. Mayer, PSU Status 7/20/2018 ECE 172 Digital Systems Chapter 12 Instruction Pipelining Herbert G. Mayer, PSU Status 7/20/2018 1 Syllabus l Scheduling on Pipelined Architecture l Idealized Pipeline l Goal of Scheduling l Causes for

More information

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner

CPS 104 Computer Organization and Programming Lecture 11: Gates, Buses, Latches. Robert Wagner CPS 4 Computer Organization and Programming Lecture : Gates, Buses, Latches. Robert Wagner CPS4 GBL. RW Fall 2 Overview of Today s Lecture: The MIPS ALU Shifter The Tristate driver Bus Interconnections

More information

TEST 1 REVIEW. Lectures 1-5

TEST 1 REVIEW. Lectures 1-5 TEST 1 REVIEW Lectures 1-5 REVIEW Test 1 will cover lectures 1-5. There are 10 questions in total with the last being a bonus question. The questions take the form of short answers (where you are expected

More information

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures

Verilog HDL:Digital Design and Modeling. Chapter 11. Additional Design Examples. Additional Figures Chapter Additional Design Examples Verilog HDL:Digital Design and Modeling Chapter Additional Design Examples Additional Figures Chapter Additional Design Examples 2 Page 62 a b y y 2 y 3 c d e f Figure

More information

Lecture 3, Performance

Lecture 3, Performance Repeating some definitions: Lecture 3, Performance CPI MHz MIPS MOPS Clocks Per Instruction megahertz, millions of cycles per second Millions of Instructions Per Second = MHz / CPI Millions of Operations

More information

EECS 270 Midterm 2 Exam Answer Key Winter 2017

EECS 270 Midterm 2 Exam Answer Key Winter 2017 EES 270 Midterm 2 Exam nswer Key Winter 2017 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. NOTES: 1. This part of the exam

More information

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide) Out-of-order Pipeline Buffer of instructions Issue = Select + Wakeup Select N oldest, read instructions N=, xor N=, xor and sub Note: ma have execution resource constraints: i.e., load/store/fp Fetch Decode

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Numbers and Arithmetic

Numbers and Arithmetic Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu

More information

/ : Computer Architecture and Design

/ : Computer Architecture and Design 16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)

More information

Spiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller

Spiral 2-1. Datapath Components: Counters Adders Design Example: Crosswalk Controller 2-. piral 2- Datapath Components: Counters s Design Example: Crosswalk Controller 2-.2 piral Content Mapping piral Theory Combinational Design equential Design ystem Level Design Implementation and Tools

More information

Introduction to CMOS VLSI Design Lecture 1: Introduction

Introduction to CMOS VLSI Design Lecture 1: Introduction Introduction to CMOS VLSI Design Lecture 1: Introduction David Harris, Harvey Mudd College Kartik Mohanram and Steven Levitan University of Pittsburgh Introduction Integrated circuits: many transistors

More information

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application

Administrivia. Course Objectives. Overview. Lecture Notes Week markem/cs333/ 2. Staff. 3. Prerequisites. 4. Grading. 1. Theory and application Administrivia 1. markem/cs333/ 2. Staff 3. Prerequisites 4. Grading Course Objectives 1. Theory and application 2. Benefits 3. Labs TAs Overview 1. What is a computer system? CPU PC ALU System bus Memory

More information

Chapter 8: Atmel s AVR 8-bit Microcontroller Part 3 Microarchitecture. Oregon State University School of Electrical Engineering and Computer Science

Chapter 8: Atmel s AVR 8-bit Microcontroller Part 3 Microarchitecture. Oregon State University School of Electrical Engineering and Computer Science Chapter : tmel s VR -bit Microcontroller Part 3 Microarchitecture Prof. en Lee Oregon State University School of Electrical Engineering and Computer Science Chapter Goals Understand what basic components

More information

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I. Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm

More information

Chapter 7. Sequential Circuits Registers, Counters, RAM

Chapter 7. Sequential Circuits Registers, Counters, RAM Chapter 7. Sequential Circuits Registers, Counters, RAM Register - a group of binary storage elements suitable for holding binary info A group of FFs constitutes a register Commonly used as temporary storage

More information

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it...

Control. Control. the ALU. ALU control signals 11/4/14. Next: control. We built the instrument. Now we read music and play it... // CS 2, Fall 2! CS 2, Fall 2! We built the instrument. Now we read music and play it... A simple implementa/on uc/on uct r r 2 Write r Src Src Extend 6 Mem Next: path 7-2 CS 2, Fall 2! signals CS 2, Fall

More information