Pipelining. CS 365 Lecture 12. Prof. Yih Huang.

Traditional Execution
[Figure: traditional execution of add, ld, and beq.]

Pipelined Execution
[Figure: pipelined execution diagram.]

Basic Ideas
- Do not wait for an instruction to complete before starting the next.
- Start Cycle 0 of the next instruction when the previous one enters Cycle 1.
- Instruction executions are overlapped.
- Pipelining increases instruction throughput; it does not decrease the execution time of individual instructions.
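The throughput claim can be checked with a little arithmetic. The sketch below is only an illustration under simplifying assumptions that are not in the slides: every instruction passes through all five stages, each stage takes one cycle, and there are no hazards. The function names are made up for this example.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """The first instruction needs n_stages cycles; every later instruction
    finishes one cycle after its predecessor."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 3, 1000):
        print(n, unpipelined_cycles(n), pipelined_cycles(n))
```

A single instruction still takes five cycles either way; only the total for a long run of instructions shrinks, approaching one instruction per cycle.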

Easier Said Than Done?
In every cycle, activities of all five stages take place. Many problems arise with overlapped executions:
- Structural hazards
- Control hazards
- Data hazards

Pipeline Hazards I: Structural Hazards
The datapath cannot support the combination of instructions that we want to execute in the same cycle. Consider what happens when an R-type instruction is followed by a BEQ.
[Figure: overlapped execution of add and beq; stage labels RR, +, RW.]

Summary of MIPS Lite Instruction Executions

Step 1, Instruction fetch (all instructions):
    IR = Memory[PC]
    PC = PC + 4
Step 2, Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3, Execution, address computation, branch/jump completion:
    R-type:            ALUOut = A op B
    Memory-reference:  ALUOut = A + sign-extend(IR[15-0])
    Branches:          if (A == B) then PC = ALUOut
    Jumps:             PC = PC[31-28] || (IR[25-0] << 2)
Step 4, Memory access or R-type completion:
    R-type:            Reg[IR[15-11]] = ALUOut
    Load:              MDR = Memory[ALUOut]
    Store:             Memory[ALUOut] = B
Step 5, Memory read completion:
    Load:              Reg[IR[20-16]] = MDR

Resource Conflicts on the Multicycle Datapath
[Figure: multicycle datapath with PC, memory, instruction register, memory data register, register file, A and B registers, sign extend, shift left 2, ALU, ALUOut, and multiplexers; control signals IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, MemtoReg.]
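Reading the step summary above as a list of resources used per step makes the conflict easy to find mechanically. The sketch below is an assumption-laden illustration: the resource names ("memory", "alu", "regs") and the `conflicts` helper are invented for this example, and the register file is treated as a single resource for simplicity.

```python
# Resources used in each step of the multicycle datapath, read off the step summary.
# "alu" in step 1 is PC + 4; "alu" in step 2 is the branch-target computation.
STEPS = {
    "r-type": [{"memory", "alu"}, {"regs", "alu"}, {"alu"}, {"regs"}],
    "beq":    [{"memory", "alu"}, {"regs", "alu"}, {"alu"}],
    "load":   [{"memory", "alu"}, {"regs", "alu"}, {"alu"}, {"memory"}, {"regs"}],
}


def conflicts(first, second, offset=1):
    """Return (cycle, shared resources) pairs when `second` starts
    `offset` cycles after `first` on the same datapath."""
    found = []
    a, b = STEPS[first], STEPS[second]
    for cycle, used_by_first in enumerate(a):
        j = cycle - offset          # step that `second` is in during this cycle
        if 0 <= j < len(b):
            shared = used_by_first & b[j]
            if shared:
                found.append((cycle, sorted(shared)))
    return found


print(conflicts("r-type", "beq"))
```

For an R-type instruction followed one cycle later by a beq, the single ALU is requested by both instructions in two consecutive cycles, which is the kind of structural hazard the slide asks about.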

Pipeline Hazards II: Control Hazards
When we decide to branch, other instructions are already in the pipeline. Which one is the next?
[Figure: beq followed by add and sub in the pipeline, with ld at Target.]

Pipeline Hazards III: Data Hazards (data dependencies)
An instruction depends on the result of a previous instruction that is still in the pipeline.
    add r1, r2, r3      writes the new value of r1 in RW
    sub r4, r1, r10     reads r1 in RR, when the new value is not available yet
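A data dependency of this kind can be found by scanning for a register that is written and then read shortly afterwards. The following is a minimal sketch; the tuple encoding of instructions and the `window` parameter (how many following instructions are still close enough to be affected) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (mnemonic, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def raw_hazards(program: List[Instr], window: int = 2):
    """List (producer index, consumer index, register) for reads that
    follow a write within `window` instructions."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
            if program[j][1] == dest:
                break  # a later write supersedes this producer
    return hazards


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r10")),
]
print(raw_hazards(program))  # [(0, 1, 'r1')]
```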

Lessons
To achieve pipelining and avoid hazards, we need to redesign the datapath and the instruction execution steps.

Pipeline Stages
Stage 1 (all instructions): IR <- Mem[PC]; PC <- PC + 4
Stage 2, RR (all instructions): read Reg[rs] and Reg[rt]
Stage 3, EX (use ALU):
    R-type: RS op RT
    LD:     calculate RS + Immd
    ST:     calculate RS + Immd
    BEQ:    calculate PC + Immd; compare RS and RT
    J:      set PC to Immd
Stage 4, DM:
    LD:     read memory
    ST:     write memory
    BEQ:    set PC accordingly
Stage 5, RW/WB:
    R-type: write to Rd
    LD:     write to Rt

Graphically Representing Pipelines
[Figure: multi-cycle pipeline diagram. Horizontal axis: time in clock cycles (CC1 to CC6). Vertical axis: program execution order in instructions.]
    lw  $10, 20($1)     IM   Reg  ALU  DM   Reg
    sub $11, $2, $3          IM   Reg  ALU  DM   Reg

Solving Structural Hazards: Pipelined Datapath
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; components include the PC, instruction memory, register file, sign extend, shift left 2, two adders, the ALU, data memory, and multiplexers.]
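The staircase notation used in the pipeline diagram above is easy to generate for any short instruction sequence. This is a small sketch, assuming one instruction enters the pipeline per cycle with no stalls; the function name and the column width are arbitrary choices for this example.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]


def pipeline_diagram(instructions):
    """Return one text row per instruction (plus a header row); instruction i
    enters the pipeline in clock cycle i + 1. Expects a non-empty list."""
    n_cycles = len(instructions) + len(STAGES) - 1
    label_width = max(len(text) for text in instructions)
    header = " " * label_width + "".join(
        f"{'CC' + str(c + 1):>5}" for c in range(n_cycles)
    )
    rows = [header]
    for i, text in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage      # stage s of instruction i runs in cycle i + s
        rows.append(f"{text:<{label_width}}" + "".join(f"{cell:>5}" for cell in cells))
    return rows


for row in pipeline_diagram(["lw $10, 20($1)", "sub $11, $2, $3"]):
    print(row)
```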

Discussions
- Make sure you understand why two extra adders are added.
- Is the assumption of using separate instruction and data memory reasonable?

Consider Control Hazard
When is the decision made?
[Figure: beq in the pipeline, with the instructions at Target.]

Solution 1: Stalling the Pipeline
[Figure: pipeline diagram in which later instructions wait until the branch decision is in.]

Solution 2: Branch Prediction
Make a guess about the branch decision and start executing the guessed path before the decision is in (also known as speculative execution). If the guess was wrong, abandon the instructions in the pipeline and jump to the right target.

Branch Prediction Strategies
- Predict branch fails
- Predict branch succeeds
- Look into history

Predict Branch Fails
We guess that the branch will fail, so the next fetch takes the next sequential instruction.
- If the guess is right, just proceed.
- If the guess is wrong, abandon the sequential instructions and fetch the instruction at the target address.

Predict Branch Fails: Guess Is Right
[Figure: beq and the following instructions proceed through the pipeline. Decision is in: do not branch.]
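The benefit of guessing over always stalling can be estimated with a simple cycle count. The sketch below is illustrative only: the `penalty` (cycles lost while waiting for, or recovering from, a branch decision) is a parameter, and the value 3 used in the example is an assumption rather than a figure from the lecture.

```python
def cycles_always_stall(n_instructions, n_branches, penalty, n_stages=5):
    """Every branch holds the pipeline for `penalty` cycles until its decision is in."""
    return n_stages + (n_instructions - 1) + n_branches * penalty


def cycles_predict_fails(n_instructions, outcomes, penalty, n_stages=5):
    """`outcomes` has one boolean per executed branch: True if the branch was taken.
    Only taken branches (wrong guesses) pay the penalty."""
    wrong_guesses = sum(1 for taken in outcomes if taken)
    return n_stages + (n_instructions - 1) + wrong_guesses * penalty


if __name__ == "__main__":
    outcomes = [True] * 8 + [False] * 12   # hypothetical run: 20 branches, 8 taken
    print(cycles_always_stall(100, len(outcomes), penalty=3))
    print(cycles_predict_fails(100, outcomes, penalty=3))
```

With these made-up numbers, predicting that branches fail pays the penalty only 8 times instead of 20.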

Predict Branch Fails: Guess Is Wrong
[Figure: beq, the following instructions (abandoned after RR and EX), and the target instructions. Decision is in: do branch.]

Discussion
Notice that programmers are not aware of branch predictions; right or wrong guesses affect only performance.

Delayed Branches
Make it official that the CPU always executes the instruction following a branch. The branch determines the instruction after that one. Notice that the programmer must be aware of this.

Example
    BEQ r1, r2, target
    add r10, r11, r12     (delay slot, executed regardless of the branch decision)
    add r20, r21, r22
    sub r30, r10, r20
Target:
    sub r30, r10, r20

Delayed Branch with Wrong Guess
[Figure: beq, following instructions, and target instructions. Decision is in: do branch. The delay slot is always finished.]
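The effect of a delay slot on program behavior can be seen in a toy interpreter. This is a sketch under stated assumptions, not a model of real MIPS: it supports only add, sub, and beq, ignores timing, and does not handle a branch placed inside a delay slot. The tuple encoding and the `run` function are invented for this example.

```python
def run(program, labels, regs, delayed=True, max_steps=100):
    """Execute a list of (op, a, b, c) tuples; return the indices of executed instructions."""
    pc, trace = 0, []
    pending_target = None  # branch target waiting for the delay slot to finish
    while pc < len(program) and len(trace) < max_steps:
        op, a, b, c = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:      # this instruction is the delay slot
            next_pc, pending_target = pending_target, None
        if op == "add":
            regs[a] = regs[b] + regs[c]
        elif op == "sub":
            regs[a] = regs[b] - regs[c]
        elif op == "beq" and regs[a] == regs[b]:
            if delayed:
                pending_target = labels[c]  # take effect after the next instruction
            else:
                next_pc = labels[c]
        pc = next_pc
    return trace


program = [
    ("beq", "r1", "r2", "target"),
    ("add", "r10", "r11", "r12"),   # delay slot
    ("add", "r20", "r21", "r22"),
    ("sub", "r30", "r10", "r20"),
    ("sub", "r30", "r10", "r20"),   # target
]
labels = {"target": 4}
regs = {f"r{i}": 0 for i in range(32)}
regs.update(r1=7, r2=7, r11=2, r12=3)
print(run(program, labels, regs))   # [0, 1, 4]
```

With the branch taken, the instruction in the delay slot still runs (so r10 is updated), while the second add is skipped.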

Discussions
- Compilers/programmers must be smart enough to make good use of the delay slots.
- The problem is not entirely solved: we still need to stall before the decision comes in.
- The situation is exacerbated by deeper pipelining.
- Branch predictions are still important.

Smart Branch Predictions
Observations:
- Branches of if-else statements are hard to predict.
- Branches of loops typically repeat previous decisions.
- For performance, loops are more important than other control structures.
Many modern processors use special-purpose hardware to remember the targets of recent branch instructions.
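The observation that loop branches repeat their previous decisions suggests the simplest history-based scheme: predict whatever the branch did last time. The sketch below is one possible illustration of "look into history", not the hardware the slides refer to; the class name, the table keyed by branch address, and the address 0x40 are all hypothetical.

```python
class LastOutcomePredictor:
    """Predicts that each branch repeats whatever it did last time."""

    def __init__(self, default_taken=False):
        self.history = {}                  # branch address -> last outcome (True = taken)
        self.default_taken = default_taken

    def predict(self, branch_pc):
        return self.history.get(branch_pc, self.default_taken)

    def update(self, branch_pc, taken):
        self.history[branch_pc] = taken


def count_correct(predictor, branch_pc, outcomes):
    """Run the predictor over a sequence of actual outcomes for one branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_pc) == taken:
            correct += 1
        predictor.update(branch_pc, taken)
    return correct


loop_branch = [True] * 9 + [False]         # loop-closing branch: taken 9 times, then exits
print(count_correct(LastOutcomePredictor(), 0x40, loop_branch))   # 8
alternating = [True, False] * 5            # a branch that flips every time
print(count_correct(LastOutcomePredictor(), 0x40, alternating))   # 0
```

The loop branch is mispredicted only on its first and last executions, while a branch that alternates defeats this scheme completely.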

Recall Data Hazards
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
[Figure: pipeline diagram marking the point where $2 becomes available.]

Solution 1: Stalling
Postpone subsequent instructions until the data is available. Simple but inefficient.
[Figure: pipeline diagram of sub, and, or, add, and sw with the dependent instructions postponed.]
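How many cycles stalling costs depends on when a written register becomes readable. The sketch below schedules the five-instruction sequence above using stalls only. It assumes stages of fetch, RR, EX, DM, and RW, one cycle each, and exposes as a parameter (`same_cycle_ok`, an invented name) whether a register can be read in the same cycle in which it is written; the slides do not say which applies.

```python
def schedule_with_stalls(program, same_cycle_ok=False):
    """Return the fetch cycle of each instruction when hazards are resolved by stalling only.

    program: list of (dest or None, sources). An instruction fetched in cycle f
    reads registers (RR) in cycle f + 1 and writes its result (RW) in cycle f + 4.
    """
    ready = {}            # register -> first cycle in which its new value can be read
    fetch_cycles = []
    next_fetch = 0
    for dest, sources in program:
        fetch = next_fetch
        for reg in sources:
            # The RR stage (fetch + 1) must not come before the value is readable.
            fetch = max(fetch, ready.get(reg, 0) - 1)
        fetch_cycles.append(fetch)
        if dest is not None:
            write_cycle = fetch + 4
            ready[dest] = write_cycle if same_cycle_ok else write_cycle + 1
        next_fetch = fetch + 1    # in-order pipeline: later instructions wait too
    return fetch_cycles


program = [
    ("$2", ("$1", "$3")),      # sub $2, $1, $3
    ("$12", ("$2", "$5")),     # and $12, $2, $5
    ("$13", ("$6", "$2")),     # or  $13, $6, $2
    ("$14", ("$2", "$2")),     # add $14, $2, $2
    (None, ("$15", "$2")),     # sw  $15, 100($2)
]
print(schedule_with_stalls(program))                      # [0, 4, 5, 6, 7]
print(schedule_with_stalls(program, same_cycle_ok=True))  # [0, 3, 4, 5, 6]
```

Under these assumptions the and instruction waits three cycles (or two if a same-cycle write and read is allowed), and everything behind it is pushed back by the same amount.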

Solution 2: Internal Forwarding
The new value of $2 is available after the EX stage of sub, but it is not in $2 yet. Use special circuits to forward the new value to subsequent instructions.
    sub r2, r1, r3
    and r12, r2, r5
    or  r13, r6, r2
    add r14, r2, r2
    sw  r15, 100(r2)
[Figure: pipeline diagram of the five instructions with the result of sub forwarded to the instructions that use it.]

Types of Forwarding
- ALU forwarding: forward an ALU output to subsequent instructions. This is the case we have seen.
- Memory forwarding: forward a memory output (ld result) to subsequent instructions.

Memory Forwarding
    ld  $1, 4($2)
    or  $10, $1, $3
    sub $20, $1, $10
Notice that we still have to stall for one cycle.

Delayed Loads
- Make it official in the ISA that the result of a load will not be available to the next instruction.
- Have the compiler find something useful to do in load slots (by reordering).
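The one-cycle stall noted under Memory Forwarding follows from where each result is produced. The sketch below expresses that as a small formula, assuming the five stages are numbered 1 (fetch) through 5 (RW), that a value can be forwarded in the cycle after the stage that produces it, and that the consumer needs it at the start of its EX stage. The names `PRODUCED_IN`, `NEEDED_IN`, and `forwarding_stall` are made up for this example.

```python
# Stage in which each kind of result first exists (1 = fetch, 2 = RR, 3 = EX, 4 = DM, 5 = RW).
PRODUCED_IN = {"alu": 3, "load": 4}
NEEDED_IN = 3  # a forwarded operand is consumed at the start of the consumer's EX stage


def forwarding_stall(producer_kind, distance):
    """Stall cycles for a consumer issued `distance` instructions after the producer
    (1 = the very next instruction)."""
    # The consumer's EX runs `distance` cycles after the producer's EX; the value
    # is usable one cycle after the stage that produces it.
    gap = PRODUCED_IN[producer_kind] + 1 - (NEEDED_IN + distance)
    return max(0, gap)


print(forwarding_stall("alu", 1))    # 0: ALU forwarding removes the stall
print(forwarding_stall("load", 1))   # 1: ld followed immediately by a use
print(forwarding_stall("load", 2))   # 0: one unrelated instruction fills the load slot
```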

MIPS Solution
Delayed result with internal forwarding.
[Figure: pipeline diagram of ld, or, and sub, with a nop or something useful placed in the load slot.]

Summary
- Three types of pipeline hazards: structural hazards, control hazards, data hazards.
- Compilers/programmers could reorder the code to avoid hazards and eliminate bubbles.
- Ideal case: no bubbles; one instruction per cycle.
- Stalling (bubbles) is the last resort but must be supported by hardware.

Exercise: Code Reordering
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];

Registers
    r1 points to x[i]
    r2 points to y[i]
    r3 points to z[i]
    r4 holds x[i]
    r5 holds y[i]
    r6 holds z[i]
    r7 holds i
    r8 holds N

    loop: lw   r4, 0(r1)
          lw   r5, 0(r2)
          add  r6, r4, r5
          sw   r6, 0(r3)
          addi r1, r1, 4
          addi r2, r2, 4
          addi r3, r3, 4
          addi r7, r7, 1
          bne  r7, r8, loop
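One way to approach the exercise is to count load-use pairs before and after a candidate reordering. The sketch below does that for the loop body above. It assumes forwarding is available, so the only stall counted is a load immediately followed by a use of its result; branch delay slots are ignored. The reordered sequence is just one hypothetical answer, not necessarily the intended one.

```python
def load_use_stalls(body):
    """Count instructions that use a loaded register immediately after the load.

    body: list of (op, dest or None, sources).
    """
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, sources) in zip(body, body[1:]):
        if prev_op == "lw" and prev_dest in sources:
            stalls += 1
    return stalls


original = [
    ("lw", "r4", ("r1",)),
    ("lw", "r5", ("r2",)),
    ("add", "r6", ("r4", "r5")),     # uses r5 right after its load
    ("sw", None, ("r6", "r3")),
    ("addi", "r1", ("r1",)),
    ("addi", "r2", ("r2",)),
    ("addi", "r3", ("r3",)),
    ("addi", "r7", ("r7",)),
    ("bne", None, ("r7", "r8")),
]

# Hypothetical reordering: move "addi r1" into the slot after the second load.
reordered = original[:2] + [original[4], original[2], original[3]] + original[5:]

print(load_use_stalls(original))    # 1
print(load_use_stalls(reordered))   # 0
```

Moving the addi on r1 is safe here because both loads that read r1 and r2 have already issued and the add does not use r1.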

Loop Unrolling
    loop: lw   r4, 0(r1)
          lw   r5, 0(r2)
          add  r6, r4, r5
          sw   r6, 0(r3)
          addi r7, r7, 1
          beq  r7, r8, exit
          lw   r4, 4(r1)
          lw   r5, 4(r2)
          add  r6, r4, r5
          sw   r6, 4(r3)
          addi r1, r1, 8
          addi r2, r2, 8
          addi r3, r3, 8
          addi r7, r7, 1
          bne  r7, r8, loop
    exit:
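The control flow of the unrolled loop, including the early exit for an odd element count, can be checked against the original loop in a high-level language. The following sketch mirrors both assembly versions in Python; like the assembly, it tests the loop condition only after the body, so it assumes at least one element. The function names are chosen for this example.

```python
def add_arrays(x, y):
    """Original loop: one element per iteration (do-while form, so n >= 1)."""
    n, z, i = len(x), [0] * len(x), 0
    while True:
        z[i] = x[i] + y[i]
        i += 1
        if i == n:           # bne r7, r8, loop falls through
            break
    return z


def add_arrays_unrolled(x, y):
    """Unrolled by two: two elements per iteration, with an early exit for odd n."""
    n, z, i = len(x), [0] * len(x), 0
    base = 0                 # plays the role of r1/r2/r3, advanced by two elements
    while True:
        z[base] = x[base] + y[base]
        i += 1
        if i == n:           # beq r7, r8, exit
            break
        z[base + 1] = x[base + 1] + y[base + 1]
        base += 2            # addi r1/r2/r3 by 8 bytes = two words
        i += 1
        if i == n:           # bne r7, r8, loop falls through
            break
    return z


x, y = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
print(add_arrays(x, y) == add_arrays_unrolled(x, y))   # True
```

The unrolled version executes the pointer updates once per two elements, which is where the savings come from.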
