Topics: A multiple cycle implementation. Distributed Notes

1 COSC 2021: Computer Organization
Instructor: Dr. Amir Asif, Department of Computer Science, York University
Handout # Multicycle Implementation of a MIPS Processor
Topics: A multiple cycle implementation
Distributed Notes

2 Why Multicycle?
Example: Assume that the operation times for the major functional units in a microprocessor are:
    Memory unit ~ 2 ns, ALU and adders ~ 2 ns, Register file ~ 1 ns
Compare the performance of the following instruction mix
    Loads: 24%; Stores: 12%; ALU instructions: 44%; Branches: 18%; Jumps: 2%
on the two implementations:
    Implementation 1: Each instruction operates in 1 clock cycle
    Implementation 2: Each instruction is as long as it needs to be
Instruction class   Functional units used (steps involved)                                     Time
ALU type            Instruction fetch, Register access, ALU, Register access                   6 ns
Load word           Instruction fetch, Register access, ALU, Memory access, Register access    8 ns
Store word          Instruction fetch, Register access, ALU, Memory access                     7 ns
Branch              Instruction fetch, Register access, ALU                                    5 ns
Jump                Instruction fetch                                                          2 ns
Average time per instruction:
    Implementation 1: 8 ns (the single clock cycle must fit the slowest instruction, load word)
    Implementation 2: 0.24(8) + 0.12(7) + 0.44(6) + 0.18(5) + 0.02(2) = 6.34 ns
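The comparison in this example can be checked with a short calculation. The sketch below assumes the mix of 24% loads, 12% stores, 44% ALU instructions, 18% branches, and 2% jumps, and the per-class times of 8, 7, 6, 5, and 2 ns; the dictionary and function names are illustrative only.

```python
# Instruction mix (fractions) and per-class latency in ns, as in the example.
MIX = {"load": 0.24, "store": 0.12, "alu": 0.44, "branch": 0.18, "jump": 0.02}
LATENCY_NS = {"load": 8, "store": 7, "alu": 6, "branch": 5, "jump": 2}


def fixed_clock_time(latency_ns):
    """Implementation 1: every instruction takes as long as the slowest one."""
    return max(latency_ns.values())


def variable_time(mix, latency_ns):
    """Implementation 2: weighted average of the per-class latencies."""
    return sum(mix[name] * latency_ns[name] for name in mix)


fixed_ns = fixed_clock_time(LATENCY_NS)
variable_ns = variable_time(MIX, LATENCY_NS)
print(f"Implementation 1: {fixed_ns} ns, Implementation 2: {variable_ns:.2f} ns")
```

The ratio `fixed_ns / variable_ns` gives the speedup that motivates letting short instructions finish early.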

3 Multicycle Implementation
Instruction:
    Execution of each instruction is broken into different steps
    Each step requires 1 clock cycle
    Each instruction takes multiple clock cycles
Functional unit:
    Can be used more than once in an instruction (but still only once in a clock cycle)
Advantages:
    Functional units can be shared
    ALU and adders are combined
    A single memory is used for instructions and data

4 Multicycle Implementation: Abstract Diagram
[Figure: abstract multicycle datapath. Labels: PC, Address, Memory, Data, Register #, Registers, A, B, ALU, ALUOut]
One ALU is used for incrementing the PC and for arithmetic operations
Data memory and instruction memory are combined
5 additional registers are added:
1. An instruction register (IR) to hold instructions before distributing data to the register file or ALU
2. A memory data register (MDR) to hold data before distributing to the register file or ALU
3. Registers A and B that hold data before the ALU
4. Register ALUOut that holds data computed by the ALU

5 Multicycle Implementation: Multiplexers added
[Figure: multicycle datapath with multiplexers. Labels: PC, Address, Memory, MemData, Registers, Sign extend, Shift left 2, A, B, Zero, ALU, ALU result, ALUOut]
Because functional units are shared, multiplexers are added to select data between different devices:
1. MUX before memory selects either the PC output (instruction fetch) or the ALU output (data address for lw/sw)
2. MUX before write register selects the write-register number (instruction[15-11] or instruction[20-16])
3. MUX before write data selects data from ALUOut (R-type instruction) or MemData (lw instruction)
4. Upper MUX before ALU selects PC output (increment PC) or Read data 1 (R-type instruction)
5. Lower MUX before ALU selects Read data 2, or sign-extended instruction[15-0], or sign-extended instruction[15-0] shifted left by 2, or 4

6 Multicycle Implementation: Controls added
[Figure: multicycle datapath with control lines added. Control signals shown: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, MemtoReg, ALUSrcB, ALUOp; the ALU is driven by an ALU control block]

7 Multicycle Implementation: Control Unit added
[Figure: complete multicycle datapath with control unit. The control unit takes the opcode Op[5-0] (instruction[31-26]) as input. Outputs: PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. The jump address [31-0] is formed from instruction[25-0] shifted left by 2 and PC[31-28]]

8 Action of 1-bit Control Signals
Control input   Value            Effect
IorD            deasserted (0)   PC supplies address to memory (instruction fetch)
                asserted (1)     ALUOut supplies address to memory (lw/sw)
MemRead         deasserted (0)   None
                asserted (1)     Memory content specified by address is placed on the MemData output (lw/any instruction)
MemWrite        deasserted (0)   None
                asserted (1)     Write data input is stored at the specified address (sw)
IRWrite         deasserted (0)   None
                asserted (1)     MemData output is written into IR (instruction fetch)
RegDst          deasserted (0)   Write register is specified by Instruction[20-16] (lw)
                asserted (1)     Write register is specified by Instruction[15-11] (R-type)
RegWrite        deasserted (0)   None
                asserted (1)     Data from the Write data input is written into the register specified by the Write register number
ALUSrcA         deasserted (0)   PC is the first operand of the ALU (increment PC)
                asserted (1)     Register A is the first operand of the ALU
MemtoReg        deasserted (0)   Write data of the register file comes from ALUOut
                asserted (1)     Write data of the register file comes from MDR
PCWrite         deasserted (0)   Operation at PC depends on PCWriteCond and the Zero output of the ALU
                asserted (1)     PC is written; source is determined by PCSource
PCWriteCond     deasserted (0)   Operation at PC depends on PCWrite
                asserted (1)     PC is written if the Zero output of the ALU = 1; source is determined by PCSource
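The 1-bit signals act either as two-input multiplexer selects or as write enables. As a minimal sketch (the helper names `mux2` and `pc_write_enable` and the sample addresses are illustrative, not part of the handout), the IorD selection and the combined PC write condition could be modelled as:

```python
def mux2(select, when_0, when_1):
    """Two-input multiplexer: a 1-bit select picks one of two inputs."""
    return when_1 if select else when_0


def pc_write_enable(pc_write, pc_write_cond, zero):
    """PC is written if PCWrite = 1, or if PCWriteCond = 1 and Zero = 1."""
    return pc_write | (pc_write_cond & zero)


# IorD chooses the memory address: 0 selects PC, 1 selects ALUOut.
# The two addresses are arbitrary sample values.
pc, alu_out = 0x00400000, 0x10010004
fetch_address = mux2(0, pc, alu_out)   # instruction fetch
data_address = mux2(1, pc, alu_out)    # lw/sw data access
```

RegDst, ALUSrcA, and MemtoReg follow the same `mux2` pattern with different inputs.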

9 Action of 2-bit Control Signals
Control input   Value   Effect
ALUOp           00      ALU performs an add operation
                01      ALU performs a subtract operation
                10      The function field of the instruction defines the operation of the ALU
ALUSrcB         00      The second operand of the ALU comes from Register B
                01      The second operand of the ALU = 4
                10      The second operand of the ALU is sign-extended Instruction[15-0]
                11      The second operand of the ALU is sign-extended Instruction[15-0], shifted left by 2 bits
PCSource        00      Output of the ALU (PC + 4) is sent to PC
                01      Contents of ALUOut (branch target address = PC + offset) are sent to PC
                10      Contents of Instruction[25-0], shifted left by 2 and concatenated with the 4 most significant bits of PC, are sent to PC (jump instruction)
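The two multiplexers driven by 2-bit selects can be sketched as lookup tables. This is an illustrative model only, assuming 32-bit words and the encodings 00/01/10/11 for ALUSrcB and 00/01/10 for PCSource; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operand_b(alu_src_b, reg_b, instruction):
    """Lower ALU multiplexer: pick the second ALU operand from ALUSrcB."""
    imm = sign_extend16(instruction)
    options = {0b00: reg_b, 0b01: 4, 0b10: imm, 0b11: imm << 2}
    return options[alu_src_b] & MASK32


def pc_input(pc_source, alu_result, alu_out, pc, instruction):
    """PC multiplexer: pick the value offered to the PC from PCSource."""
    jump_target = (pc & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)
    options = {0b00: alu_result, 0b01: alu_out, 0b10: jump_target}
    return options[pc_source] & MASK32
```

For example, `alu_operand_b(0b11, 0, 0x0000FFFF)` yields the 32-bit encoding of -4, which is the word offset -1 converted to a byte offset.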

10 Breaking the Instruction Execution into Clock Cycles
Execution of each instruction is broken into a series of steps
Each step is balanced to do an almost equal amount of work
Each step takes one clock cycle
Each step contains at most 1 ALU operation, or 1 register file access, or 1 memory access
Operations listed in 1 step occur in parallel in 1 clock cycle
Different steps occur in different clock cycles
Different steps are:
1. Instruction fetch step
2. Instruction decode and register fetch step
3. Execution, memory address computation, or branch completion step
4. Memory access or R-type instruction completion step
5. Memory read completion step

11 Multicycle Implementation: Control Unit added
[Figure repeated: complete multicycle datapath with control unit]

12 Step 1: Instruction Fetch
Fetch the instruction from memory and compute the address of the next sequential instruction
    IR = Memory[PC];
    PC = PC + 4;
Operation:
1. Send PC to the memory as address (IorD = 0)
2. Read the memory cell defined by PC (MemRead = 1)
3. Copy the output of memory (MemData) into IR (IRWrite = 1)
4. Increment PC by 4 (ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00)
5. Store (PC + 4) into PC (PCSource = 00, PCWrite = 1)
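A minimal sketch of the fetch step, assuming the processor state is held in a dictionary and memory is a dictionary from byte address to 32-bit word (both are modelling choices made here, not part of the handout):

```python
def instruction_fetch(state, memory):
    """Step 1: IR = Memory[PC]; PC = PC + 4, both within one clock cycle."""
    pc = state["PC"]
    # IorD = 0 routes PC to the memory address; MemRead = 1; IRWrite = 1.
    state["IR"] = memory[pc]
    # ALUSrcA = 0 (PC), ALUSrcB = 01 (constant 4), ALUOp = 00 (add);
    # PCSource = 00 and PCWrite = 1 store the sum back into PC.
    state["PC"] = (pc + 4) & 0xFFFFFFFF
    return state


# Sample program: a single word at address 0 (encoding of lw $9, 4($8)).
memory = {0: 0x8D090004}
state = instruction_fetch({"PC": 0, "IR": 0}, memory)
```

The old PC value is read before either register is updated, mirroring the fact that the memory read and the ALU addition happen in parallel.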

13 Step 2: Instruction Decode and Register Fetch
Read register rs in the register file and store the content of rs in register A
Read register rt in the register file and store the content of rt in register B
Compute the branch target address
    A = Reg[IR[25-21]];
    B = Reg[IR[20-16]];
    ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Operation:
1. Access the register file to read rs into A.
2. Access the register file to read rt into B.
3. Compute the branch target address and store it in ALUOut (ALUSrcA = 0; ALUSrcB = 11). Remember that the ALU must add (ALUOp = 00)
After this step, one of four actions is possible: Memory reference (lw/sw), R-type, Branch, or Jump
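The field extraction in this step can be sketched with shifts and masks. The function name and the register-file representation (a list of 32 integers) are assumptions made for illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_and_register_fetch(ir, pc, regs):
    """Step 2: read rs and rt, and compute the branch target speculatively."""
    rs = (ir >> 21) & 0x1F            # IR[25-21]
    rt = (ir >> 16) & 0x1F            # IR[20-16]
    a = regs[rs]                      # A = Reg[rs]
    b = regs[rt]                      # B = Reg[rt]
    # ALUSrcA = 0 (PC), ALUSrcB = 11 (shifted immediate), ALUOp = 00 (add).
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out


regs = [0] * 32
regs[1], regs[2] = 7, 9
# 0x1022FFFF encodes beq $1, $2, -1; pc = 8 is the already incremented PC.
a, b, alu_out = decode_and_register_fetch(0x1022FFFF, 8, regs)
```

The branch target is computed for every instruction because the ALU is otherwise idle in this cycle; it is simply ignored unless the instruction turns out to be a taken branch.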

14 Step 3: Execution, Memory Address Computation, or Branch Completion
Memory reference (sw/lw):
    ALUOut = A + sign-extend(IR[15-0]);
    ALU adds the content of A and sign-extend(IR[15-0]) (ALUSrcA = 1, ALUSrcB = 10), (ALUOp = 00)
R-type (add/sub/or/and):
    ALUOut = A op B;
    ALU performs the specified operation on A and B (ALUSrcA = 1, ALUSrcB = 00). The operation of the ALU is determined by the function field code (ALUOp = 10)
Branch (beq):
    if (A == B) PC = ALUOut;
    ALU does the equality comparison on A and B (ALUSrcA = 1, ALUSrcB = 00). The ALU must subtract (ALUOp = 01). Update PC with ALUOut if A == B (PCWriteCond = 1, PCSource = 01). Complete.
Jump (j):
    PC = PC[31-28] || (IR[25-0] << 2);
    PC gets overwritten by the output of the jump address MUX (PCSource = 10, PCWrite = 1). Complete.
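The four cases of step 3 can be sketched as one function that dispatches on the opcode. This is an illustrative model only: the opcode and function-field constants are the standard MIPS encodings for lw, sw, beq, j, add, sub, and, and or, while the function name and the tuple return value are choices made here.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B
ALU_FUNCT = {
    0x20: lambda a, b: a + b,   # add
    0x22: lambda a, b: a - b,   # sub
    0x24: lambda a, b: a & b,   # and
    0x25: lambda a, b: a | b,   # or
}


def sign_extend16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_step(ir, a, b, pc, alu_out):
    """Step 3: return the (ALUOut, PC) pair after the third clock cycle."""
    opcode = (ir >> 26) & 0x3F
    if opcode in (OP_LW, OP_SW):          # ALUSrcA=1, ALUSrcB=10, ALUOp=00
        alu_out = (a + sign_extend16(ir)) & MASK32
    elif opcode == OP_RTYPE:              # ALUSrcA=1, ALUSrcB=00, ALUOp=10
        alu_out = ALU_FUNCT[ir & 0x3F](a, b) & MASK32
    elif opcode == OP_BEQ:                # ALUOp=01, PCWriteCond=1, PCSource=01
        if ((a - b) & MASK32) == 0:       # Zero output of the ALU
            pc = alu_out
    elif opcode == OP_J:                  # PCWrite=1, PCSource=10
        pc = (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    else:
        raise ValueError(f"unsupported opcode {opcode:#x}")
    return alu_out, pc
```

For instance, `execute_step(0x00221820, 5, 7, 4, 0)` models `add $3, $1, $2` with A = 5 and B = 7 and leaves 12 in ALUOut.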

15 Step 4: Memory Access or R-type Instruction Completion
Memory reference (sw/lw):
    MDR = Memory[ALUOut]; (for lw) or Memory[ALUOut] = B; (for sw)
    1. Address from ALUOut is applied at the address input of memory (IorD = 1)
    2. For sw, MemWrite = 1. For lw, MemRead = 1. sw is complete.
R-type instruction (add/sub/or/and):
    Reg[IR[15-11]] = ALUOut;
    ALUOut is stored into the register specified by IR[15-11] (RegDst = 1, MemtoReg = 0, RegWrite = 1). Complete.

16 Step 5: Memory Read Completion
Load (lw):
    Reg[IR[20-16]] = MDR;
    MDR is stored into the register specified by IR[20-16] (MemtoReg = 1, RegWrite = 1, RegDst = 0)
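Steps 4 and 5 can be sketched together. The model below assumes a register file held as a list of 32 integers and memory as a dictionary from byte address to word; the function names are hypothetical, and the hard-wired zero register of MIPS is not modelled.

```python
OP_RTYPE, OP_LW, OP_SW = 0x00, 0x23, 0x2B


def memory_access_step(ir, alu_out, b, regs, memory):
    """Step 4: memory access (lw/sw) or R-type completion. Returns MDR."""
    opcode = (ir >> 26) & 0x3F
    mdr = None
    if opcode == OP_LW:                    # IorD = 1, MemRead = 1
        mdr = memory.get(alu_out, 0)
    elif opcode == OP_SW:                  # IorD = 1, MemWrite = 1
        memory[alu_out] = b
    elif opcode == OP_RTYPE:               # RegDst = 1, MemtoReg = 0, RegWrite = 1
        regs[(ir >> 11) & 0x1F] = alu_out  # destination is IR[15-11]
    return mdr


def memory_read_completion_step(ir, mdr, regs):
    """Step 5 (lw only): RegDst = 0, MemtoReg = 1, RegWrite = 1."""
    regs[(ir >> 16) & 0x1F] = mdr          # destination is IR[20-16]


regs = [0] * 32
memory = {104: 0xDEADBEEF}
mdr = memory_access_step(0x8D090004, 104, 0, regs, memory)   # lw $9, 4($8)
memory_read_completion_step(0x8D090004, mdr, regs)
```

Only lw needs the fifth step, because the loaded word must first pass through the MDR before the register file can be written.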

17 Summary of Steps used in different Instructions
Step name: Instruction fetch
    All instructions:   IR = Memory[PC]; PC = PC + 4;
Step name: Instruction decode / Register fetch
    All instructions:   A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC + (sign-extend(IR[15-0]) << 2);
Step name: R-type execution / Address computation / Branch / Jump completion
    R-type:             ALUOut = A op B
    Memory reference:   ALUOut = A + sign-extend(IR[15-0])
    Branch:             if (A == B) then PC = ALUOut;
    Jump:               PC = PC[31-28] || (IR[25-0] << 2);
Step name: Memory access / R-type completion
    R-type:             Reg[IR[15-11]] = ALUOut;
    Memory reference:   lw: MDR = Memory[ALUOut] or sw: Memory[ALUOut] = B
Step name: Memory read completion
    Memory reference:   lw: Reg[IR[20-16]] = MDR;
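Because each step takes one clock cycle, the summary also gives the cycle count per instruction class: 5 for lw, 4 for sw and R-type, 3 for branch and jump. As an illustration only, the sketch below combines these counts with the instruction mix from the earlier example (24% loads, 12% stores, 44% ALU instructions, 18% branches, 2% jumps) to estimate an average CPI; the handout itself does not carry out this calculation.

```python
# Clock cycles per instruction class, read off the step summary.
CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}
# Instruction mix reused from the earlier performance example (an assumption
# that the same workload applies here).
MIX = {"load": 0.24, "store": 0.12, "alu": 0.44, "branch": 0.18, "jump": 0.02}


def average_cpi(mix, cycles):
    """Weighted average number of clock cycles per instruction."""
    return sum(mix[name] * cycles[name] for name in mix)


cpi = average_cpi(MIX, CYCLES)
print(f"Average CPI: {cpi:.2f}")
```

Multiplying the CPI by the clock period of the multicycle design would give the average time per instruction, which is the quantity compared in the motivating example.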

18 Multicycle Datapath Implementation: Control
Recall that the design of the single cycle datapath was based on a combinational circuit
Design of the multicycle datapath is more complicated:
1. Instructions are executed in a series of steps
2. Each step must occur in a sequence
3. Control of the multicycle datapath must specify both the control signals and the next step
The control of a multicycle datapath is based on a sequential circuit referred to as a finite state machine
[Figure: A finite state diagram for a 2-bit counter, with states State 0, State 1, State 2, State 3]
Each state specifies a set of outputs
By default, unspecified outputs are assumed disabled
The numbers on the arrows identify inputs
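A finite state machine for a 2-bit counter can be written as a next-state table. The sketch assumes the simplest variant, in which the counter advances by one on every clock tick and wraps from 3 back to 0; the transition inputs of the original diagram are not recoverable, so none are modelled.

```python
# Next-state table for a free-running 2-bit counter (assumed behaviour).
NEXT_STATE = {0: 1, 1: 2, 2: 3, 3: 0}


def run_counter(start, ticks):
    """Return the sequence of states visited, starting from `start`."""
    state = start
    trace = [state]
    for _ in range(ticks):
        state = NEXT_STATE[state]   # one transition per clock cycle
        trace.append(state)
    return trace


trace = run_counter(0, 5)
```

The multicycle control unit has the same shape: a state register plus a table that maps the current state (and, for the control unit, the opcode) to the next state.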

19 Finite State Machine Control of Multicycle Datapath (1)
[Figure: High-Level View. Start -> Instruction fetch/decode and register fetch (Figure 5.37) -> one of: Memory access instructions (Figure 5.38), R-type instructions (Figure 5.39), Branch instruction (Figure 5.40), Jump instruction (Figure 5.41)]

20 Finite State Machine Control of Multicycle Datapath (2)
Fig. 5.37: Steps 1 and 2: Instruction Fetch and Decode Instructions
State 0 (Instruction fetch), entered from Start:
    MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 1 (Instruction decode / Register fetch):
    ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
From state 1, control continues with the Memory reference FSM (Figure 5.38), the R-type FSM (Figure 5.39), the Branch FSM (Figure 5.40), or, when (Op = 'JMP'), the Jump FSM (Figure 5.41)

21 Finite State Machine Control of Multicycle Datapath (3)
Fig. 5.38: Finite State Machine for Memory Reference Instructions
State 2 (Memory address computation), from state 1 when (Op = 'LW') or (Op = 'SW'):
    ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
State 3 (Memory access), from state 2 when (Op = 'LW'):
    MemRead, IorD = 1
State 5 (Memory access), from state 2 when (Op = 'SW'):
    MemWrite, IorD = 1
    To state 0 (Figure 5.37)
State 4 (Write-back step), from state 3:
    RegWrite, MemtoReg = 1, RegDst = 0
    To state 0 (Figure 5.37)

22 Finite State Machine Control of Multicycle Datapath (4)
Fig. 5.39: Finite State Machine for R-type Instructions
State 6 (Execution), from state 1 when (Op = R-type):
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
State 7 (R-type completion), from state 6:
    RegDst = 1, RegWrite, MemtoReg = 0
    To state 0 (Figure 5.37)

23 Finite State Machine Control of Multicycle Datapath (5)
Fig. 5.40: Finite State Machine for Branch Instruction
State 8 (Branch completion), from state 1 when (Op = 'BEQ'):
    ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
    To state 0 (Figure 5.37)
Fig. 5.41: Finite State Machine for Jump Instruction
State 9 (Jump completion), from state 1 when (Op = 'J'):
    PCWrite, PCSource = 10
    To state 0 (Figure 5.37)

24 [Figure: complete finite state machine control, combining state 0 (Instruction fetch), state 1 (Instruction decode / Register fetch), state 2 (Memory address computation), states 3 and 5 (Memory access), state 4 (Write-back step), state 6 (Execution), state 7 (R-type completion), state 8 (Branch completion), and state 9 (Jump completion)]
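The transitions of the complete state machine can be sketched as a next-state function over the ten states 0 to 9. The opcode constants are the standard MIPS encodings; the function names are illustrative.

```python
OP_RTYPE, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B


def next_state(state, opcode):
    """Next-state function of the multicycle control FSM (states 0 to 9)."""
    if state == 0:                       # fetch -> decode
        return 1
    if state == 1:                       # decode: branch on the opcode
        if opcode in (OP_LW, OP_SW):
            return 2
        if opcode == OP_RTYPE:
            return 6
        if opcode == OP_BEQ:
            return 8
        if opcode == OP_J:
            return 9
        raise ValueError(f"unsupported opcode {opcode:#x}")
    if state == 2:                       # address computed: lw reads, sw writes
        if opcode == OP_LW:
            return 3
        if opcode == OP_SW:
            return 5
        raise ValueError("state 2 is only reachable for lw/sw")
    if state == 3:                       # lw: memory read -> write-back
        return 4
    if state == 6:                       # R-type: execution -> completion
        return 7
    if state in (4, 5, 7, 8, 9):         # instruction complete -> fetch
        return 0
    raise ValueError(f"unknown state {state}")


def states_visited(opcode):
    """States traversed by one instruction, starting at fetch (state 0)."""
    visited = [0]
    state = next_state(0, opcode)
    while state != 0:
        visited.append(state)
        state = next_state(state, opcode)
    return visited
```

The length of `states_visited(opcode)` equals the number of clock cycles the instruction takes, for example five for lw and three for beq.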

25 Multicycle Implementation: Control Unit added
[Figure repeated: complete multicycle datapath with control unit]

26 Finite State Machine Control of Multicycle Datapath (5)
[Figure: control unit built from a block of control logic and a state register. Inputs: the instruction register opcode field Op5-Op0 and the current state S3-S0. Outputs: the datapath control signals PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next state NS3-NS0]
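Since the outputs depend only on the current state, the output half of the control logic can be sketched as a table indexed by state. The table below follows the signal settings of states 0 to 9; treating every unspecified signal as 0 is a modelling assumption (for multiplexer selects the value is really a don't-care), and the helper names are hypothetical.

```python
ALL_SIGNALS = (
    "PCWrite", "PCWriteCond", "IorD", "MemRead", "MemWrite", "IRWrite",
    "MemtoReg", "PCSource", "ALUOp", "ALUSrcB", "ALUSrcA", "RegWrite", "RegDst",
)

# Signals explicitly set in each state; everything else defaults to 0.
STATE_OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}


def control_outputs(state):
    """Full set of control signal values for a state, defaulting to 0."""
    outputs = dict.fromkeys(ALL_SIGNALS, 0)
    outputs.update(STATE_OUTPUTS[state])
    return outputs


def state_bits(state):
    """Encode a state number as the bit tuple (S3, S2, S1, S0)."""
    return tuple((state >> i) & 1 for i in (3, 2, 1, 0))
```

For example, `state_bits(9)` gives `(1, 0, 0, 1)`, the pattern the state register would hold during jump completion.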
