ECE 3401 Lecture 23. Pipeline Design. State Table for 2-Cycle Instructions. Control Unit. ISA: Instruction Specifications (for reference)


Kathleen Fields


Control Unit

[Figure: control unit block diagram (control state register, combinational control logic, new/modified control word) and state diagram with states INF (IR ← M[PC]) and EX0 (PC ← PC + 1)]

ISA: Instruction Specifications for the Simple Computer, Part 1 (for reference)

(The Opcode column of the original table did not survive in this copy.)

Instruction     Mnemonic  Format      Description               Status bits
Move A          MOVA      RD,RA       R[DR] ← R[SA]             N, Z
Increment       INC       RD,RA       R[DR] ← R[SA] + 1         N, Z
Add             ADD       RD,RA,RB    R[DR] ← R[SA] + R[SB]     N, Z
Subtract        SUB       RD,RA,RB    R[DR] ← R[SA] − R[SB]     N, Z
Decrement       DEC       RD,RA       R[DR] ← R[SA] − 1         N, Z
AND             AND       RD,RA,RB    R[DR] ← R[SA] ∧ R[SB]     N, Z
OR              OR        RD,RA,RB    R[DR] ← R[SA] ∨ R[SB]     N, Z
Exclusive OR    XOR       RD,RA,RB    R[DR] ← R[SA] ⊕ R[SB]     N, Z
NOT             NOT       RD,RA       R[DR] ← ¬R[SA]            N, Z

ISA: Instruction Specifications for the Simple Computer, Part 2

Instruction         Mnemonic  Format      Description
Move B              MOVB      RD,RB       R[DR] ← R[SB]
Shift Right         SHR       RD,RB       R[DR] ← sr R[SB]
Shift Left          SHL       RD,RB       R[DR] ← sl R[SB]
Load Immediate      LDI       RD,OP       R[DR] ← zf OP
Add Immediate       ADI       RD,RA,OP    R[DR] ← R[SA] + zf OP
Load                LD        RD,RA       R[DR] ← M[SA]
Store               ST        RA,RB       M[SA] ← R[SB]
Branch on Zero      BRZ       RA,AD       if (R[SA] = 0) PC ← PC + se AD
Branch on Negative  BRN       RA,AD       if (R[SA] < 0) PC ← PC + se AD
Jump                JMP       RA          PC ← R[SA]

[Figure: control flowchart for the EX0 state, branching on Opcode, Z, and N, with each path returning to INF]

State Table for 2-Cycle Instructions

Inputs: state, opcode, VCNZ. Outputs: next state, IL, PS, DX, AX, BX, MB, FS, MD, RW, MM, MW.
Most control-word entries are illegible in this copy; the values that survive are shown with their rows.

State  Instruction  VCNZ  Next state  Comments
INF    (any)        XXXX  EX0         IR ← M[PC]
EX0    MOVA         XXXX  INF         R[DR] ← R[SA] *
EX0    INC          XXXX  INF         R[DR] ← R[SA] + 1 *
EX0    ADD          XXXX  INF         R[DR] ← R[SA] + R[SB] *
EX0    SUB          XXXX  INF         R[DR] ← R[SA] + ¬R[SB] + 1 *
EX0    DEC          XXXX  INF         R[DR] ← R[SA] + (−1) *
EX0    AND          XXXX  INF         R[DR] ← R[SA] ∧ R[SB] *
EX0    OR           XXXX  INF         R[DR] ← R[SA] ∨ R[SB] *
EX0    XOR          XXXX  INF         R[DR] ← R[SA] ⊕ R[SB] *
EX0    NOT          XXXX  INF         R[DR] ← ¬R[SA] *
EX0    MOVB         XXXX  INF         R[DR] ← R[SB] *
EX0    LD           XXXX  INF         R[DR] ← M[R[SA]] *
EX0    ST           XXXX  INF         M[R[SA]] ← R[SB] *
EX0    LDI          XXXX  INF         R[DR] ← zf OP *
EX0    ADI          XXXX  INF         R[DR] ← R[SA] + zf OP *
EX0    BRZ          XXX1  INF         PC ← PC + se AD   (IL = 0, PS = 10, FS = 0000)
EX0    BRZ          XXX0  INF         PC ← PC + 1       (IL = 0, PS = 01, FS = 0000)
EX0    BRN          XX1X  INF         PC ← PC + se AD   (IL = 0, PS = 10, FS = 0000)
EX0    BRN          XX0X  INF         PC ← PC + 1       (IL = 0, PS = 01, FS = 0000)
EX0    JMP          XXXX  INF         PC ← R[SA]        (IL = 0, PS = 11, FS = 0000)

* For these state and input combinations, PC ← PC + 1 also occurs.

controller (VHDL)

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.STD_LOGIC_ARITH.ALL;
    use IEEE.STD_LOGIC_UNSIGNED.ALL;

    -- Uncomment the following lines to use the declarations that are
    -- provided for instantiating Xilinx primitive components.
    --library UNISIM;
    --use UNISIM.VComponents.all;

    entity controller is
      Port (
        clk     : in  std_logic;
        opcode  : in  std_logic_vector(6 downto 0);
        reset   : in  std_logic;
        carry   : in  std_logic;
        neg     : in  std_logic;
        zero    : in  std_logic;
        overflw : in  std_logic;
        IL      : out std_logic;
        PS      : out std_logic_vector(1 downto 0);
        DX      : out std_logic_vector(3 downto 0);
        AX      : out std_logic_vector(3 downto 0);
        BX      : out std_logic_vector(3 downto 0);
        FS      : out std_logic_vector(3 downto 0);
        MB      : out std_logic;
        MD      : out std_logic;
        RW      : out std_logic;
        MM      : out std_logic;
        MW      : out std_logic
      );
    end controller;

    architecture Behavioral of controller is
      type state_type is (RES, INF, EX0);
      attribute enum_encoding : string;
      -- The encoding string is missing in this copy.
      attribute enum_encoding of state_type : type is "...";
      signal cur_state, next_state : state_type;
    begin

      -- State register with asynchronous reset.
      state_register: process (clk, reset)
      begin
        if (reset = '1') then
          cur_state <= RES;
        elsif (clk'event and clk = '1') then
          cur_state <= next_state;
        end if;
      end process;

      -- Next-state and output logic.
      out_func: process (cur_state, opcode, carry, zero, neg, overflw)
      begin
        -- Default control word; the 8-bit literal is missing in this copy.
        (IL, PS, MB, MD, RW, MM, MW) <= std_logic_vector'("........");
        (DX, AX, BX) <= std_logic_vector'(X"000");
        FS <= "0000";

        case cur_state is
          when RES =>
            next_state <= INF;
          when INF =>
            next_state <= EX0;
            MM <= '1';
            IL <= '1';
          when EX0 =>
            next_state <= INF;
            -- The opcode literals are missing in this copy.
            case opcode is
              when "......." =>
                PS <= "01"; RW <= '1';
              when "......." =>
                PS <= "01"; RW <= '1'; FS <= "0001";
              -- ..
              when "......." =>
                if (zero = '1') then
                  PS <= "10";
                else
                  PS <= "01";
                end if;
              -- ..
              when others =>
                report "Unrecognizable state" severity error;
            end case;
        end case;
      end process;

    end Behavioral;
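The two-state behaviour in the state table above can be mimicked in a few lines of Python. This is only a sketch: the function name `control`, the use of mnemonics in place of the 7-bit opcodes, and the dictionary of outputs are assumptions made for illustration. It models only the entries that are legible in the table (next state, IL, MM in the fetch state, and PS in the execute state).

```python
def control(state, mnemonic="", n=0, z=0):
    """Return (next_state, outputs) for the two-state control unit.

    `mnemonic` stands in for the opcode input (an illustrative shortcut),
    and n / z are the Negative and Zero status bits.
    """
    if state == "INF":
        # Instruction fetch: IR <- M[PC]; the controller asserts MM and IL.
        return "EX0", {"IL": 1, "MM": 1}
    if state == "EX0":
        if mnemonic == "BRZ":
            ps = "10" if z else "01"   # taken: PC + se AD, else PC + 1
        elif mnemonic == "BRN":
            ps = "10" if n else "01"
        elif mnemonic == "JMP":
            ps = "11"                  # PC <- R[SA]
        else:
            ps = "01"                  # starred rows: PC <- PC + 1
        return "INF", {"IL": 0, "PS": ps}
    raise ValueError(f"unknown state: {state}")


if __name__ == "__main__":
    state = "INF"
    for _ in range(4):                 # fetch/execute two ADD instructions
        state, outputs = control(state, "ADD")
        print(state, outputs)
```

Every instruction therefore costs two clock cycles, one in INF and one in EX0, which is the baseline the pipelined design improves on.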

Outline for Pipelined Design
- Pipelined design
  - Basic 5-stage pipe
  - Speedup of pipelined vs. non-pipelined implementations
- Pipeline hazards
  - Structural, data, control
- Parallel digital systems

Abstract View of Critical Path

Pipelined critical path
- The critical path is the longest path between stage registers.

Steps in Instruction Processing

Un-pipelined (Non-overlapped) Implementation

Pipelined Implementation
- Consider loads with the DF stage.


5-stage Pipeline
CPU stages:
- IF: Instruction fetch
- DR: Instruction decode and register read
- E: Execute
- DF: Data fetch (memory load/store)
- W: Write back to registers
Another set of mnemonic names: IF, ID, E, MEM, WB

Pipelining Lessons
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
- The pipeline rate is limited by the slowest pipeline stage.
- Potential speedup = number of pipe stages.
- Unbalanced lengths of pipe stages reduce speedup.
- The time to fill the pipeline and the time to drain it reduce speedup.

Computer Pipelines
- Computers execute billions of instructions, so throughput is what matters.
- Throughput versus latency:
  + Throughput increases.
  − Latency for a single instruction increases; we may have to wait longer for a single instruction to complete.
- Pipelining allows a much faster clock cycle.
- RISC pipeline architecture features:
  - All instructions are the same length.
  - Registers are located in the same place in the instruction format.
  - Memory operands appear only in loads and stores.

Pipelining
Every clock cycle requires:
- A new instruction fetch
- A new ALU operation
- A new data word to/from memory
Memory requirements:
- Faster memory (5x faster)
- Separate instruction and data paths
- Better caches

Unpipelined Datapath
[Figure: unpipelined datapath with MAR and MDR]

Pipelined Datapath
[Figure: pipelined datapath]
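To put numbers on the pipelining lessons above, the following Python sketch compares unpipelined and pipelined execution time. It rests on stated assumptions rather than anything in the slides: every instruction passes through all k stages, there are no stalls, the pipelined clock period equals the slowest stage delay, and the unpipelined machine takes the sum of the stage delays per instruction. The name `pipeline_speedup` and its parameters are illustrative.

```python
def pipeline_speedup(stage_delays, n_instructions):
    """Speedup of a k-stage pipeline over a non-overlapped implementation.

    stage_delays: delay of each stage (any consistent time unit).
    n_instructions: number of instructions executed.
    """
    k = len(stage_delays)
    # Non-overlapped: each instruction runs through every stage in turn.
    unpipelined_time = n_instructions * sum(stage_delays)
    # Pipelined: the clock is set by the slowest stage; k cycles to fill
    # the pipe, then one instruction completes per cycle.
    clock = max(stage_delays)
    pipelined_time = (k + n_instructions - 1) * clock
    return unpipelined_time / pipelined_time


if __name__ == "__main__":
    balanced = [1, 1, 1, 1, 1]
    unbalanced = [1, 1, 2, 1, 1]
    for n in (1, 10, 1000):
        print(n, pipeline_speedup(balanced, n), pipeline_speedup(unbalanced, n))
```

With a single instruction the speedup is 1 (latency is not improved); as the instruction count grows, the balanced case approaches the stage count of 5 but never reaches it because of fill time, and the unbalanced case stays well below it.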

Outline
- Pipelined design
  - Basic 5-stage pipe
  - Speedup of pipelined vs. non-pipelined implementations
- Pipeline hazards
  - Structural, data, control
- Parallel digital systems

Pipelining Hazards
Hazards cause the pipe to stall because of some conflict in the pipe; the conflict prevents the next instruction from executing in its turn.
Types of hazards:
- Structural: contention for the same hardware resource.
- Data: dependency on an earlier instruction for the correct sequencing of register reads and writes.
- Control: branch/jump instructions stall the pipe until the correct target address is loaded into the PC.

Structural Hazards
Resource conflicts in the pipeline. Examples:
- A single memory port shared for instruction and data access
- A register file without a separate write port

[Figure: pipeline timing diagram for the sequence load, sub, and, or, showing a STALL]

Structural Hazards
IF and DF compete for a single memory port.
- Ideal machine: no stalls, 1 cycle per instruction.
- Assume 30% of instructions access data.
- With the structural hazard, each data access costs one stall cycle: 1 + 0.3 = 1.3 cycles per instruction.
- Execution time rises by 30%; throughput falls to 1/1.3, about 77% of the ideal machine.
Solutions:
- Pipeline stall (insert a bubble)
- Two memory ports for a shared instruction-data cache memory (expensive)
- Separate instruction cache memory and data cache memory

Three Generic Data Hazards (I) - RAW
Instr 1 followed by Instr 2:
    add r1, r3, r2
    add r4, r5, r1
- Instr 2 tries to read an operand before Instr 1 writes it.
- This can be due to a true data dependency (data must be produced before it can be consumed).
- Or it can be due to pipeline staging (data already produced, but not yet written to the general register file).
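The structural-hazard arithmetic from the single-memory-port example (30% of instructions access data, giving 1.3 cycles per instruction) can be reproduced with a small Python sketch. The function name `effective_cpi` and the default of one stall cycle per conflicting instruction are assumptions made for illustration.

```python
def effective_cpi(ideal_cpi, stall_fraction, stall_cycles=1):
    """Average cycles per instruction when a fraction of instructions stall.

    stall_fraction: share of instructions that hit the hazard (0.0 to 1.0).
    stall_cycles: bubbles inserted per hazard (assumed to be 1 here).
    """
    return ideal_cpi + stall_fraction * stall_cycles


if __name__ == "__main__":
    cpi = effective_cpi(ideal_cpi=1.0, stall_fraction=0.30)
    print("CPI with hazard:", cpi)
    # Execution time scales with CPI, so throughput scales with 1 / CPI.
    print("Throughput relative to ideal:", 1.0 / cpi)
```

A CPI of 1.3 means the same program takes 30% longer, which corresponds to roughly 77% of the ideal throughput.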

Data Hazards (II) - WAR
Instr 1 followed by Instr 2:
    ld  r1, (r3)+
    add r3, r4, r1
- Instr 2 tries to write an operand before Instr 1 reads it.
- Instr 1 gets the wrong operand.
- This can't happen in the 5-stage RISC pipeline we just covered:
  - All instructions take 5 stages.
  - Reads are always in stage 2.
  - Writes are always in stage 5.

Data Hazards (III) - WAW
Instr 1 followed by Instr 2:
    mul r1, r0, r2
    add r1, r5, r6
- Instr 2 tries to write an operand before Instr 1 writes it.
- This leaves the wrong result in the register (Instr 1's, not Instr 2's).
- This can't happen in our 5-stage pipeline because:
  - All instructions take 5 stages.
  - Writes are always in stage 5.

Data Hazards
Overlapping instructions cause dependencies on data (RAW), e.g.:
    MOVA R1, R5        R1 ← R5
    ADD  R2, R1, R6    R2 ← R1 + R6
    ADD  R3, R1, R2    R3 ← R1 + R2
[Figure: pipeline timing diagram marking "Write R1" and "Write R2"]

Data Hazards Remedy - SW
Software delay: the compiler (or the machine-code programmer) inserts NOPs between dependent instructions, for example between MOVA R1, R5 and the ADD instructions that read R1.

Data Hazards Remedy - HW
- Data forwarding (register bypassing)
- Hardware stalls
- Hazard detection
[Figure: pipeline timing diagram starting with MOVA R1, R5]

Hardware Data Forwarding
Add an extra path connecting the ALU outputs to the ALU inputs on the next clock.
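The software remedy for RAW hazards, padding the instruction stream with NOPs, can be sketched in Python. The tuple representation of instructions, the name `insert_nops`, and the `min_distance` parameter are all hypothetical. The default of 3 assumes that a result written in stage 5 can be read in stage 2 during the same clock cycle; a pipeline without that property would need a larger value.

```python
NOP = ("NOP", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program with NOPs so that a reader of a register is at least
    `min_distance` instruction slots after the instruction that writes it.

    program: list of (mnemonic, dest_register, source_registers) tuples.
    """
    scheduled = []
    for mnemonic, dest, sources in program:
        needed = 0
        # Check the most recent slots that are still "too close".
        for back in range(1, min_distance):
            if back > len(scheduled):
                break
            prev_dest = scheduled[-back][1]
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, min_distance - back)
        scheduled.extend([NOP] * needed)
        scheduled.append((mnemonic, dest, sources))
    return scheduled


if __name__ == "__main__":
    program = [
        ("MOVA", "R1", ("R5",)),        # R1 <- R5
        ("ADD", "R2", ("R1", "R6")),    # R2 <- R1 + R6 (reads R1)
        ("ADD", "R3", ("R1", "R2")),    # R3 <- R1 + R2 (reads R1, R2)
    ]
    for instr in insert_nops(program):
        print(instr)
```

Under these assumptions the three-instruction example grows to seven slots, which is the throughput cost that hardware forwarding is meant to avoid.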

Pipelined Datapath with data forwarding
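The forwarding path, an extra connection from the ALU output back to the ALU inputs on the next clock, amounts to a multiplexer choice per operand. The Python sketch below illustrates that choice only; the function `forward_operand`, the dictionary register file, and the single-stage bypass are assumptions for illustration and do not describe the datapath in the figure in detail.

```python
def forward_operand(source, regfile, prev_dest, prev_alu_out):
    """Select an ALU operand, bypassing the register file when needed.

    source: register the current instruction wants to read.
    regfile: register values as currently written back.
    prev_dest: destination of the instruction one slot ahead (or None).
    prev_alu_out: that instruction's ALU result, not yet written back.
    """
    if prev_dest is not None and source == prev_dest:
        return prev_alu_out          # take the forwarded ALU output
    return regfile[source]           # no dependency: normal register read


if __name__ == "__main__":
    regfile = {"R1": 0, "R5": 7, "R6": 3}
    # MOVA R1, R5 has produced 7 in the ALU but has not written R1 yet.
    prev_dest, prev_alu_out = "R1", regfile["R5"]
    # ADD R2, R1, R6 executes on the next clock.
    a = forward_operand("R1", regfile, prev_dest, prev_alu_out)
    b = forward_operand("R6", regfile, prev_dest, prev_alu_out)
    print("with forwarding:", a + b)
    print("without forwarding:", regfile["R1"] + regfile["R6"])
```

With forwarding the ADD sees the new value of R1 without waiting for write-back, so no NOPs or stalls are needed for this back-to-back dependency.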
