VLSI Design Chapter 7 VLSI System Components Jin-Fu Li
Chapter 7 VLSI System Components: Introduction; Datapath Operators; Memory Elements; Control Structures
System-Level Hierarchy: system (top), complex units (cores), simple components, logic circuits, silicon (bottom)
Categories of Components
Types of digital components: datapath operators, memory elements, control structures, I/O cells.
Selection tradeoffs: speed, density, programmability, ease of design, etc.
Datapath: Adder
Adder truth table (G = A·B, P = A + B):
C A B | G P | SUM CARRY
0 0 0 | 0 0 |  0    0
0 0 1 | 0 1 |  1    0
0 1 0 | 0 1 |  1    0
0 1 1 | 1 1 |  0    1
1 0 0 | 0 0 |  1    0
1 0 1 | 0 1 |  0    1
1 1 0 | 0 1 |  0    1
1 1 1 | 1 1 |  1    1
Generate signal G (A·B): a carry output (CARRY) is generated internally within the adder.
Propagate signal P (A + B): when P is true, the carry-in signal C is passed to the carry output, so CARRY is true whenever C is true.
Datapath: Adder
$SUM = A \oplus B \oplus C$, $CARRY = AB + AC + BC$
[Figure: single-bit transistor schematic of SUM]
Datapath: Adder
[Figure: single-bit transistor schematic of CARRY]
Datapath: Adder
Optimized combinational adder:
$C_{i+1} = A_iB_i + A_iC_i + B_iC_i$
$S_i = (A_i + B_i + C_i)\,\overline{C_{i+1}} + A_iB_iC_i$
[Figure: optimized combinational adder schematic between Vdd and Vss, with inputs A_i, B_i, C_i and outputs C_{i+1}, S_i]
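A quick way to check the single-bit equations is to enumerate all eight input combinations. The Python sketch below is a behavioral model only (no transistor-level detail), and the function names `full_adder` and `optimized_sum` are illustrative. It confirms that CARRY = G + P·C holds in every row and that the optimized form of $S_i$ agrees with $A \oplus B \oplus C$.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (SUM, CARRY)."""
    s = a ^ b ^ c                          # SUM = A xor B xor C
    carry = (a & b) | (a & c) | (b & c)    # CARRY = AB + AC + BC
    return s, carry


def optimized_sum(a: int, b: int, c: int) -> int:
    """S = (A + B + C) * not(C_out) + ABC, reusing the carry output."""
    _, c_out = full_adder(a, b, c)
    return ((a | b | c) & (1 - c_out)) | (a & b & c)


print("C A B G P SUM CARRY")
for c in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            s, carry = full_adder(a, b, c)
            g, p = a & b, a | b            # generate and propagate signals
            assert carry == g | (p & c)    # CARRY = G + P.C in every row
            assert optimized_sum(a, b, c) == s
            print(c, a, b, g, p, s, carry)
```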
Datapath: Adder
[Figure: symmetrical optimized combinational adder schematic with inputs A, B, C_i and outputs C_out, S]
Datapath: Bit-Parallel Adder
[Figure: parallel adder implementations: a chain of full adders for bits 0 to n, each taking A<i>, B<i> and producing S<i>, from carry-in Cin to carry-out C<n+1>]
Datapath: Bit-Parallel Adder
[Figure: 4-bit adder/subtractor with inputs A<3:0>, B<3:0> and a Subtract control, producing S<3:0> = A + B or A - B]
if (Subtract == 0) { S = A + B; } else { S = A - B; }
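A behavioral Python sketch of the bit-parallel (ripple) adder and the adder/subtractor. It assumes the usual two's-complement construction, which the figure suggests but does not spell out: each B<i> is XORed with Subtract, and Subtract also drives the carry-in, so A - B = A + (not B) + 1. The names and the default width n = 4 are illustrative.

```python
def ripple_add(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """n-bit parallel adder with the carry rippling from bit 0 upward."""
    s, carry = 0, cin
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i                       # S<i>
        carry = (ai & bi) | (ai & carry) | (bi & carry)   # carry into bit i+1
    return s, carry                                       # (sum, carry-out)


def add_sub(a: int, b: int, subtract: int, n: int = 4) -> int:
    """if (Subtract == 0) S = A + B; else S = A - B  (result modulo 2**n)."""
    mask = (1 << n) - 1
    b_in = b ^ (mask if subtract else 0)     # invert B when subtracting (assumed XOR gates)
    s, _ = ripple_add(a, b_in, subtract, n)  # Subtract doubles as the carry-in
    return s


print(add_sub(9, 3, 0), add_sub(9, 3, 1), add_sub(3, 9, 1))  # 12 6 10
```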
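Datapath: Bit-Serial Adder
[Figure: bit-serial adder: A (addend) and B (augend) enter one bit at a time; the carry-out Cout is returned as the carry-in Cin for the next bit; the sum bits form Result]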
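A cycle-by-cycle Python sketch of the bit-serial adder: one full adder is reused on every clock, least significant bit first. The variable `carry` stands for whatever element holds Cout until it is used as Cin on the next cycle; how that storage is built is an assumption, because the slide only shows the feedback. The helper names are illustrative.

```python
def bit_serial_add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Bit-serial addition, LSB first; one result bit per clock cycle."""
    carry = 0                      # carry storage, cleared before the first bit
    result = []
    for a, b in zip(a_bits, b_bits):
        result.append(a ^ b ^ carry)                  # sum bit of this cycle
        carry = (a & b) | (a & carry) | (b & carry)   # Cout, reused as next Cin
    result.append(carry)           # final carry-out becomes the extra MSB
    return result


def to_bits(x: int, n: int) -> list[int]:
    """LSB-first bit list of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Integer value of an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(bit_serial_add(to_bits(11, 4), to_bits(6, 4))))  # 17
```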
Datapath: Carry-Save Adder (CSA)
[Figure: carry-save adder cells with inputs A<i>, B<i>, SIN<i>, CIN<i> and outputs S<i>, COUT (top COUT not connected, nc), followed by a clocked carry-propagate adder (CPA)]
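The carry-save idea in a few lines of Python: a row of independent full adders reduces three operands to a sum word and a carry word without propagating any carry, and only the final carry-propagate adder (CPA) resolves them. This is a word-level behavioral sketch with illustrative names; the CPA is modeled with ordinary integer addition.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One CSA row: returns (sum_word, carry_word) with x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z                                 # per-bit sum, no carry ripple
    carry_word = ((x & y) | (x & z) | (y & z)) << 1      # per-bit carries, moved one place left
    return sum_word, carry_word


def add_many(values: list[int]) -> int:
    """Accumulate operands in carry-save form, then do a single CPA at the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c        # final carry-propagate addition


print(add_many([3, 5, 7, 9]))   # 24
```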
Datapath: Carry Look-Ahead Adder (CLA)
Objective: to avoid the linear growth of the carry delay, a carry look-ahead adder (CLA) generates the carries in parallel.
Feature: the carry of each bit is generated from the propagate and generate signals together with the input carry. The propagate and generate signals are derived from the operands A_i and B_i by
$G_i = A_iB_i$, $P_i = A_i + B_i$
Datapath: Carry Look-Ahead Adder
$C_{i+1} = A_iB_i + (A_i + B_i)C_i = G_i + P_iC_i$
$C_1 = G_0 + P_0C_0$
$C_2 = G_1 + P_1G_0 + P_1P_0C_0$
$C_3 = G_2 + P_2G_1 + P_2P_1G_0 + P_2P_1P_0C_0$
$C_4 = G_3 + P_3G_2 + P_3P_2G_1 + P_3P_2P_1G_0 + P_3P_2P_1P_0C_0$
[Figure: 4-bit CLA: P/G signals feed carry look-ahead generators CLG1 to CLG4, which produce C_1 to C_4; sum generators SG1 to SG4 produce S_0 to S_3]
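The flattened carry equations can be checked in Python by building each $C_{i+1}$ directly from its sum-of-products form, $G_i + P_iG_{i-1} + \dots + P_i \cdots P_1G_0 + P_i \cdots P_0C_0$, without reusing the previous carry, and comparing with ordinary addition. This is a logic-level sketch with an illustrative function name; it says nothing about gate delay.

```python
def cla_carries(a: int, b: int, c0: int, n: int = 4) -> list[int]:
    """Carries C_1..C_n, each evaluated from its own flattened SOP expression."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # G_i = A_i B_i
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]   # P_i = A_i + B_i
    carries = []
    for i in range(n):                     # build C_{i+1}
        c_next = 0
        for j in range(i, -2, -1):         # G_j terms for j = i..0, then j = -1 for C_0
            term = g[j] if j >= 0 else c0
            for k in range(j + 1, i + 1):  # times P_{j+1} ... P_i
                term &= p[k]
            c_next |= term
        carries.append(c_next)
    return carries


print(cla_carries(0b1011, 0b0110, 0))   # [0, 1, 1, 1]
```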
Datapath: Carry Look-Ahead Adder
[Figure: CLG1 schematic computing $C_1 = G_0 + P_0C_0$]
Datapath: Carry Look-Ahead Adder
[Figure: CLG4 schematic with inputs G_0 to G_3, P_0 to P_3, C_0 and output C_4]
$C_4 = G_3 + P_3G_2 + P_3P_2G_1 + P_3P_2P_1G_0 + P_3P_2P_1P_0C_0$
Datapath: Carry Look-Ahead Adder
Manchester carry chain: $C_{i+1} = G_i + P_iC_i$, with $G_i = A_iB_i$ and $P_i = A_i + B_i$.
Introduce the carry-kill bit $K_i = \overline{A_i}\,\overline{B_i}$. The term gets its name from the fact that if $K_i = 1$, then $P_i = 0$ and $G_i = 0$, so $C_{i+1} = 0$; $K_i = 1$ thus kills the carry-out bit.
[Figure: Manchester stage: A_i and B_i form P_i, G_i and K_i, which determine C_{i+1} from C_i]
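A behavioral Python sketch of what each Manchester stage does with $G_i$ and $K_i$: generate forces the carry to 1, kill forces it to 0, and otherwise the incoming carry passes through unchanged. It models logic only, not the precharged dynamic chain or its timing; the function name is illustrative.

```python
def manchester_chain(a: int, b: int, c0: int, n: int = 4) -> list[int]:
    """Carries C_1..C_n of a Manchester carry chain (behavioral model)."""
    carries, c = [], c0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi                 # G_i: generate
        k = (1 - ai) & (1 - bi)     # K_i = not A_i and not B_i: kill
        if g:
            c = 1
        elif k:
            c = 0
        # otherwise exactly one of A_i, B_i is 1 and C_i propagates unchanged
        carries.append(c)
    return carries


print(manchester_chain(0b1011, 0b0110, 0))   # [0, 1, 1, 1]
```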
Datapath: Carry Look-Ahead Adder
Manchester circuit styles
[Figure: Manchester carry stage (inputs G_i, P_i, C_i; output C_{i+1}) as a static circuit and as a dynamic circuit clocked by Clk; dynamic 4-bit Manchester chain with inputs C_0, P_0 to P_3, G_0 to G_3 and outputs C_1 to C_4]
Datapath: Carry Look-Ahead Adder
Extension to wide adders: with a brute-force approach for an 8-bit design, the carry-out bit $C_8$ would contain a term of the form $P_7P_6P_5P_4P_3P_2P_1P_0C_0$. Multilevel CLA networks alleviate this problem.
[Figure: n-bit adder, bit[n-1] to bit[0], divided into 4-bit groups [i+3:i], each served by a 4-bit CLG]
Datapath: Carry Look-Ahead Adder
[Figure: 4-bit carry look-ahead generator with inputs P_i to P_{i+3}, G_i to G_{i+3}; outputs carries C_{i+1}, C_{i+2}, C_{i+3}, block propagate P_{[i,i+3]} and block generate G_{[i,i+3]}]
$G_{[i,i+3]} = G_{i+3} + P_{i+3}G_{i+2} + P_{i+3}P_{i+2}G_{i+1} + P_{i+3}P_{i+2}P_{i+1}G_i$
$P_{[i,i+3]} = P_{i+3}P_{i+2}P_{i+1}P_i$
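A Python sketch of a two-level look-ahead for a 16-bit adder built on the block signals $G_{[i,i+3]}$ and $P_{[i,i+3]}$. The group-carry relation $C_{i+4} = G_{[i,i+3]} + P_{[i,i+3]}C_i$ is the standard way these signals are used, but it is not written out on the slide, so treat it as an assumption here. The loop evaluates the groups one after another for readability; a second-level CLG would form the same values in parallel. Function names are illustrative.

```python
def block_gp(g: list[int], p: list[int], i: int) -> tuple[int, int]:
    """Block generate and propagate over bits i..i+3."""
    G = (g[i + 3] | (p[i + 3] & g[i + 2]) | (p[i + 3] & p[i + 2] & g[i + 1])
         | (p[i + 3] & p[i + 2] & p[i + 1] & g[i]))
    P = p[i + 3] & p[i + 2] & p[i + 1] & p[i]
    return G, P


def block_carries(a: int, b: int, c0: int, n: int = 16) -> list[int]:
    """Group carries [C_0, C_4, C_8, ..., C_n] using C_{i+4} = G + P*C_i (assumed)."""
    g = [((a >> k) & 1) & ((b >> k) & 1) for k in range(n)]
    p = [((a >> k) & 1) | ((b >> k) & 1) for k in range(n)]
    c = [c0]
    for i in range(0, n, 4):
        G, P = block_gp(g, p, i)
        c.append(G | (P & c[-1]))
    return c


print(block_carries(0xFFFF, 0x0001, 0))   # [0, 1, 1, 1, 1]
```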
Datapath: Carry-Skip Adder
A carry-skip adder speeds up a wide adder by letting a carry bit bypass (skip) a portion of the adder.
[Figure: 4-bit adder for bits [i+3:i] with carry-in c_i and carry-skip logic; generalization to a k-bit adder producing c_{i+k}]
Carry-skip logic: the carry passed on is $c_{i+4} + c_i\,P_{[i,i+3]}$, where $c_{i+4}$ is the carry-out of the 4-bit adder itself.
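A behavioral Python sketch of a 16-bit carry-skip adder built from 4-bit ripple blocks, where each block passes on the ripple carry-out OR $c_i P_{[i,i+3]}$ as on the slide. The skip term never changes the logical result, so this model can only confirm correctness; the benefit (a short carry path when every bit of a block propagates) is a timing effect it does not capture. Parameter names are illustrative, and n is assumed to be a multiple of k.

```python
def carry_skip_add(a: int, b: int, c0: int, n: int = 16, k: int = 4) -> tuple[int, int]:
    """n-bit carry-skip adder from k-bit ripple blocks; returns (sum, carry-out)."""
    s, c = 0, c0
    for i in range(0, n, k):
        block_p, carry = 1, c
        for j in range(i, i + k):               # ripple through the block
            aj, bj = (a >> j) & 1, (b >> j) & 1
            s |= (aj ^ bj ^ carry) << j
            carry = (aj & bj) | (aj & carry) | (bj & carry)
            block_p &= aj | bj                  # P_[i,i+k-1] = AND of P_j = A_j + B_j
        c = carry | (c & block_p)               # skip path around the block
    return s, c


print(carry_skip_add(0xFFFF, 0x0000, 1))   # (0, 1)
```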
Datapath: Carry-Select Adder
[Figure: 8-bit carry-select adder: the lower 4-bit adder L adds a_3..a_0 and b_3..b_0 with carry-in c_0 and produces c_4; two upper 4-bit adders U add a_7..a_4 and b_7..b_4, one with carry-in c = 0 and one with c = 1; multiplexers controlled by c_4 select s_7..s_4 and c_8]
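A behavioral Python sketch of the 8-bit carry-select arrangement: both versions of the upper half are computed before $c_4$ is known, and $c_4$ only drives the final multiplexers. The 4-bit adders are modeled with integer addition, and the names are illustrative.

```python
def adder4(a: int, b: int, cin: int) -> tuple[int, int]:
    """Behavioral 4-bit adder: returns (4-bit sum, carry-out)."""
    total = (a & 0xF) + (b & 0xF) + cin
    return total & 0xF, total >> 4


def carry_select_add8(a: int, b: int, c0: int) -> tuple[int, int]:
    """8-bit carry-select adder; returns (8-bit sum, c_8)."""
    low_sum, c4 = adder4(a, b, c0)                    # adder L, bits 3..0
    upper_if_0 = adder4(a >> 4, b >> 4, 0)            # adder U, precomputed with c = 0
    upper_if_1 = adder4(a >> 4, b >> 4, 1)            # adder U, precomputed with c = 1
    high_sum, c8 = upper_if_1 if c4 else upper_if_0   # MUXes controlled by c_4
    return (high_sum << 4) | low_sum, c8


print(carry_select_add8(0x9C, 0x47, 0))   # (227, 0)
```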
Datapath: Conditional-Sum Adder
[Figure: four conditional cells for (A_0, B_0) to (A_3, B_3); each cell produces sum and carry for both possible carry-ins (S^0, S^1, C^0, C^1); starting from C_0 = C_in the proper values are selected to give S_0 to S_3 and C_4]
Datapath: Multipliers
Bit-level multiplier: for single bits a and b, a × b is simply the AND of a and b.
Multiplication of two 4-bit words:
                        a3    a2    a1    a0
                  ×     b3    b2    b1    b0
                        a3b0  a2b0  a1b0  a0b0
                  a3b1  a2b1  a1b1  a0b1
            a3b2  a2b2  a1b2  a0b2
      a3b3  a2b3  a1b3  a0b3
p7    p6    p5    p4    p3    p2    p1    p0
Datapath: Multipliers
The product a × b is given by the 8-bit result $p = p_7p_6p_5p_4p_3p_2p_1p_0$. The ith product bit can be expressed as
$p_i = \sum_{j+k=i} a_jb_k + c_i$
where $c_i$ is the carry into column i (the column sum is taken modulo 2, and its carries pass to higher columns).
Alternate view of the multiplication process: with $a = a_3a_2a_1a_0$, the product is a sum of shifted rows,
$p = (a \times b_0)2^0 + (a \times b_1)2^1 + (a \times b_2)2^2 + (a \times b_3)2^3$
Datapath: Multipliers
Using a product register for multiplication
[Figure: 8-bit product register (bits 7 to 0) accumulating $(a \times b_0)2^0$, $(a \times b_1)2^1$, $(a \times b_2)2^2$ and $(a \times b_3)2^3$]
Datapath: Multipliers
Shift-right multiplication sequence: add $(a \times b_0)$, shift right; add $(a \times b_1)$, shift right; add $(a \times b_2)$, shift right; add $(a \times b_3)$, shift right. Each partial product is added into the upper half of the register, and after the last shift the register holds $p_7 \dots p_0$.
[Figure: register contents after each add and shift step, including the adder carries c_x and c_y]
$p^{(i+1)} = \left(p^{(i)} + a\,b_i\,2^{n}\right)2^{-1}$, with $p^{(0)} = 0$, so that $p^{(n)} = a \times b$
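The shift-right sequence, $p^{(i+1)} = (p^{(i)} + a\,b_i\,2^{n})\,2^{-1}$, can be traced in Python. In this unsigned sketch a Python integer stands in for the 2n-bit product register plus the adder's carry bit; each right shift only ever drops a 0 bit, so integer shifting is exact. The function name and the trace format are illustrative.

```python
def shift_right_multiply(a: int, b: int, n: int = 4, trace: bool = False) -> int:
    """Unsigned n x n shift-and-add multiplication, shift-right form."""
    p = 0                                   # product register, cleared
    for i in range(n):
        b_i = (b >> i) & 1
        p = (p + ((a * b_i) << n)) >> 1     # add a*b_i into the upper half, then shift right
        if trace:
            print(f"step {i}: b_{i} = {b_i}, register = {p:0{2 * n}b}")
    return p


print(shift_right_multiply(13, 11, trace=True))   # last line printed: 143
```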
Datapath: Register-Based Multiplier
[Figure: register-based multiplier: 2n-bit product register with clk and shift-right (shr) control, n-bit multiplicand and multiplier, an n-bit MUX and an n-bit adder]
Datapath: Array Multipliers
Consider two unsigned binary integers X and Y:
$X = \sum_{i=0}^{n-1} X_i 2^i$, $Y = \sum_{j=0}^{n-1} Y_j 2^j$
$P = X \times Y = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} X_iY_j 2^{i+j} = \sum_{k=0}^{2n-1} P_k 2^k$
Datapath: Array Multipliers
[Figure: 4 × 4 array multiplier with inputs X_0 to X_3 and Y_0 to Y_3 and outputs P_0 to P_7]
Datapath: Array Multipliers
Partial-product arrangement:
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1
            X3Y2  X2Y2  X1Y2  X0Y2
      X3Y3  X2Y3  X1Y3  X0Y3
P7    P6    P5    P4    P3    P2    P1    P0
Datapath: Booth Multiplier
Booth's algorithm takes advantage of the fact that an adder-subtractor is nearly as fast and small as a simple adder.
Consider the two's complement representation of the multiplier y:
$y = -2^{n}y_n + 2^{n-1}y_{n-1} + 2^{n-2}y_{n-2} + \cdots$
Using $2^{k} = 2^{k+1} - 2^{k}$, the representation can be rewritten as
$y = 2^{n}(y_{n-1} - y_n) + 2^{n-1}(y_{n-2} - y_{n-1}) + 2^{n-2}(y_{n-3} - y_{n-2}) + \cdots$
Extract the first two terms:
$2^{n}(y_{n-1} - y_n) + 2^{n-1}(y_{n-2} - y_{n-1})$
The right-hand term can be used to add x to the partial product; the left-hand term adds 2x.
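Datapath: Booth Multiplier
Actions during Booth multiplication:
y_{i+1} y_i y_{i-1} | Operation
0 0 0 | Add 0
0 0 1 | Add x
0 1 0 | Add x
0 1 1 | Add 2x
1 0 0 | Sub 2x
1 0 1 | Sub x
1 1 0 | Sub x
1 1 1 | Add 0
For example, x = 011001 (25 decimal), y = 101110 (-18 decimal):
1. $y_1y_0y_{-1} = 100$, so $P_1 = P_0 - 2x \cdot 1$.
2. $y_3y_2y_1 = 111$, so $P_2 = P_1 + 0 \cdot 4$.
3. $y_5y_4y_3 = 101$, so $P_3 = P_2 - x \cdot 16$.
Starting from $P_0 = 0$, $P_3 = -18x = -450$.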
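A Python sketch of the recoding table applied to the worked example (this is radix-4, or modified, Booth recoding). It assumes y is an n-bit two's-complement number with n even and $y_{-1} = 0$; each overlapping triplet $y_{i+1}y_iy_{i-1}$ selects 0, ±x or ±2x, weighted by $2^{i}$ (that is, 1, 4, 16, ... as in the steps of the example). The function name and the dictionary encoding of the table are illustrative.

```python
# Multiple of x selected by each triplet (y_{i+1}, y_i, y_{i-1}), from the table.
BOOTH_ACTIONS = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                 0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_radix4_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit two's-complement multiplier y (n even)."""
    y_ext = (y & ((1 << n) - 1)) << 1        # bit k of y_ext is y_{k-1}, with y_{-1} = 0
    p = 0
    for i in range(0, n, 2):
        triplet = (y_ext >> i) & 0b111        # y_{i+1} y_i y_{i-1}
        p += BOOTH_ACTIONS[triplet] * x * (1 << i)   # add/sub 0, x or 2x at weight 2**i
    return p


print(booth_radix4_multiply(25, -18, 6))   # -450
```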
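Datapath: Booth Multiplier
Structure of a Booth multiplier
[Figure: two stages of a Booth multiplier. Stage j: code logic on y_{i+2}, y_{i+1}, y_i drives the MUX select (x or 2x) and an adder/subtractor that turns P_j into P_{j+1}. Stage j+1: code logic on y_{i+4}, y_{i+3}, y_{i+2}, producing P_{j+2}. Partial products between stages are shifted left by 2]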
Datapath: Serial Multiplication
Serial multiplier
[Figure: serial multiplier with reset, serial register, and operands X and Y]
It requires MN clock cycles to produce a product for an N-bit multiplier and an M-bit multiplicand.
Datapath: Serial Multiplication
Serial/parallel multiplier
[Figure: serial/parallel multiplier: Y_0 to Y_3 applied in parallel, X applied serially, adder cells S separated by D flip-flops]
1. It requires M+N clock cycles to produce a product for an N-bit multiplier and an M-bit multiplicand.
2. The critical path consists of the adders.
Memory Elements: Memory Architecture
Memory elements may be divided into the following categories: random access memory, serial access memory and content addressable memory.
[Figure: memory architecture: an array of $2^{n-k}$ words of $2^{m+k}$ bits; the n-bit address drives the row decoders and a k-bit column decoder; column mux, sense amplifiers and write buffers connect the array to the m-bit data I/Os]
Memory Elements: RAM
Generic RAM circuit
[Figure: generic RAM circuit: bit-line conditioning, RAM cells, decoders driven by address fields n-1:k and k-1:0, sense amplifiers, column mux and write buffers, with clocks, write, write data and read data]
Memory Elements: RAM Cells
[Figure: 6-T SRAM cell and 4-T SRAM cell, each with a word line and complementary bit lines bit and -bit]
Memory Elements: RAM Cells
[Figure: 4-T dynamic RAM (DRAM) cell with word line and bit/-bit lines; 3-T DRAM cell with Read, Write, Write data and Read data lines]
Memory Elements: RAM Cells
[Figure: 1-T DRAM cell: an access transistor gated by the word line connects the bit line to a storage capacitor whose plate is tied to Vdd or Vdd/2; layout of the 1-T DRAM cell (right)]
Memory Elements: DRAM Retention Time
Write and hold operations in a DRAM cell
[Figure: write operation (WL = 1, access transistor on, input Vdd charges C_s to V_s) and hold operation (WL = 0, transistor off)]
$V_{max} = V_{DD} - V_{tn}$
$Q_s = C_sV_{max} = C_s(V_{DD} - V_{tn})$
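Memory Elements: DRAM Retention Time
Charge leakage in a DRAM cell
[Figure: with WL = 0 the leakage current I_L discharges C_s, so V_s(t) decays from V_max; the hold time t_h is reached when V_s(t) falls to the minimum logic voltage]
$I_L = -\frac{dQ_s}{dt} = -C_s\frac{dV_s}{dt} \approx -C_s\frac{\Delta V_s}{\Delta t}$
$t_h = \frac{C_s}{I_L}\,\Delta V_s$, where $\Delta V_s$ is the allowed voltage drop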
Memory Elements: DRAM Refresh Operation
As an example, with a leakage current I_L in the nanoampere range, a storage capacitance C_s in the femtofarad range and an allowed voltage drop ΔV_s on the order of a volt, the hold time $t_h = C_s\,\Delta V_s / I_L$ falls in the microsecond range.
Memory units must be able to hold data as long as power is applied. To overcome the charge leakage problem, DRAM arrays employ a refresh operation in which the data is periodically read from every cell, amplified, and rewritten. The refresh cycle must be performed on every cell in the array with a minimum refresh frequency of about
$f_{refresh} \approx \frac{1}{2t_h}$
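The hold-time relation $t_h = C_s\,\Delta V_s / I_L$ and the refresh rule $f_{refresh} \approx 1/(2t_h)$ can be evaluated in Python. The numbers below are hypothetical round values chosen only to illustrate the arithmetic (I_L = 1 nA, C_s = 50 fF, ΔV_s = 1 V); they are not the slide's example values.

```python
def hold_time(c_s: float, delta_v: float, i_leak: float) -> float:
    """t_h = C_s * dV_s / I_L, in seconds."""
    return c_s * delta_v / i_leak


def min_refresh_frequency(t_h: float) -> float:
    """f_refresh ~ 1 / (2 t_h), in hertz."""
    return 1.0 / (2.0 * t_h)


# Hypothetical values for illustration only.
t_h = hold_time(c_s=50e-15, delta_v=1.0, i_leak=1e-9)
print(f"t_h = {t_h * 1e6:.1f} us, f_refresh = {min_refresh_frequency(t_h) / 1e3:.1f} kHz")
# t_h = 50.0 us, f_refresh = 10.0 kHz
```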
Memory Elements: DRAM Read Operation
[Figure: read operation with WL = 1: the access transistor connects C_s (at V_s) to the bit-line capacitance C_bit, and both settle to a common voltage V_f]
$Q_s = C_sV_s = (C_s + C_{bit})V_f$
$V_f = \frac{C_s}{C_s + C_{bit}}\,V_s$
This shows that V_f < V_s for a stored logic 1. In practice, V_f is usually reduced to a few tenths of a volt, so the design of the sense amplifier becomes a critical factor.
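The charge-sharing result $V_f = C_sV_s/(C_s + C_{bit})$ in Python, assuming (as the charge balance $Q_s = (C_s + C_{bit})V_f$ does) that the bit line starts with no charge. The values are hypothetical, chosen with $C_{bit} = 10\,C_s$ to show how strongly the bit line attenuates the stored signal.

```python
def charge_share(v_s: float, c_s: float, c_bit: float) -> float:
    """Final bit-line voltage V_f = C_s * V_s / (C_s + C_bit), by charge conservation."""
    return c_s * v_s / (c_s + c_bit)


# Hypothetical values: V_s = 2.5 V, C_s = 30 fF, C_bit = 300 fF.
v_f = charge_share(2.5, 30e-15, 300e-15)
print(f"V_f = {v_f:.3f} V")   # V_f = 0.227 V
```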
Memory Elements: RAM Read Operation
[Figure: read operation with bit and -bit precharged to Vdd: waveforms of precharge, word line, bit, -bit and data]
Memory Elements: RAM Read Operation
[Figure: read operation with bit and -bit precharged to Vdd-Vtn: waveforms of precharge, word line, bit, -bit and data]
Memory Elements: RAM Read Operation
[Figure: sense-amplifier read circuit with bit, -bit, word, data, load, sense+, sense-, pass, sense common and pulldown]
Memory Elements: Differential Amplifier
[Figure: differential amplifier with pMOS loads Mp1, Mp2 and nMOS input pair Mn1, Mn2 carrying drain currents I_d1 and I_d2 from a tail current I_SS]
$I_{SS} = I_{d1} + I_{d2}$
Memory Elements: Write Operation
[Figure: write circuit with transistors N1 to N6 controlled by word, write and write data, driving bit and -bit into cell and -cell]
Memory Elements: Write Operation
[Figure: write operation: bit, -bit, cell and -cell between 5 V and 0 V; write driver devices P, N, P_D and N_D controlled by write, -write and write-data]
Memory Elements: Row Decoder
[Figure: 2-to-4 row decoder producing word<0> to word<3> from address bits a<0> and a<1>]
Memory Elements: Row Decoder
[Figure: word-line decoder gate for address bits a0 to a3, as a complementary AND gate and as a pseudo-nMOS gate]
Memory Elements: Row Decoder
[Figure: symbolic layout of row decoder with Vdd, Vss and output]
Memory Elements: Row Decoder
[Figure: symbolic layout of row decoder with Vdd, Vss, output and address lines a3, -a3, a2, -a2, a1, -a1, a0, -a0]
Memory Elements: Row Decoder
[Figure: predecode circuit generating word<0> to word<7> from a0, a1 and a2]
Memory Elements: Row Decoder
[Figure: actual implementation: clocked word-line decoder with inputs a0 to a4, -a0 and clk; pseudo-nMOS example with inputs a0 to a2 and enable en]
Memory Elements: Column Decoder
[Figure: column decoder: bit-line pairs bit<0>/-bit<0> to bit<7>/-bit<7> are selected by a0/-a0, a1/-a1 and a2/-a2 onto selected-data and -selected-data, which go to the sense amplifiers and write circuits]
Memory Elements: Multiported RAM
[Figure: multiported RAM cell with a write port and two read ports; signals write, read, rbit/-rbit and wr_data/-wr_data]
Memory Elements: Expandable Register File Cell
[Figure: expandable register-file cell with a write address and read addresses rd-a addr<3:0> and rd-b addr<3:0>; write-data and two read-data ports; write-enable (row), write-enable (column) and read enables]
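Memory Elements: FIFO
A FIFO is built around a two-port RAM with Write-Data, Write-Address, Write-Clock, Read-Data, Read-Address and Read-Clock ports, plus Full and Empty flags.
FIFO read address control design
[Figure: read address register with rst and clk, advanced by an incrementer on Read; write pointer WP]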
Memory Elements: FIFO
FIFO address control design
[Figure: FIFO address control with incrementer and decrementer, controlled by Read, Write, rst and clk, generating the Empty and Full flags]
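A behavioral Python sketch of the FIFO control: separate read and write pointers address the two-port RAM, and an up/down occupancy counter (incremented on Write, decremented on Read) produces Empty and Full. Tracking occupancy with a counter is one common choice and an assumption here, as are the class and method names; both pointers wrap around modulo the RAM depth.

```python
class Fifo:
    """FIFO over a two-port RAM with read/write pointers and an occupancy counter."""

    def __init__(self, depth: int) -> None:
        self.ram = [0] * depth
        self.depth = depth
        self.wp = 0        # write pointer (Write-Address)
        self.rp = 0        # read pointer (Read-Address)
        self.count = 0     # up/down counter

    @property
    def empty(self) -> bool:
        return self.count == 0

    @property
    def full(self) -> bool:
        return self.count == self.depth

    def write(self, data: int) -> None:
        if self.full:
            raise OverflowError("FIFO full")
        self.ram[self.wp] = data
        self.wp = (self.wp + 1) % self.depth   # incrementer with wrap-around
        self.count += 1

    def read(self) -> int:
        if self.empty:
            raise IndexError("FIFO empty")
        data = self.ram[self.rp]
        self.rp = (self.rp + 1) % self.depth
        self.count -= 1
        return data


fifo = Fifo(depth=4)
for word in (1, 2, 3):
    fifo.write(word)
print(fifo.read(), fifo.read())   # 1 2
```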
Memory Elements: LIFO
A LIFO (stack) requires a single-port RAM, one address counter and an empty/full detector.
Address algorithm:
Write: write the current address, then Address = Address + 1.
Read: Address = Address - 1, then read the current address.
Empty: Address = 0.
Full: Address = FFF.
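The LIFO address algorithm maps directly onto a small Python class. The `depth` parameter is illustrative: the slide's Full condition compares the address with FFF, whereas this sketch treats the stack as full when the counter reaches `depth` (one past the last RAM word), so that every word can be used.

```python
class Lifo:
    """Stack on a single-port RAM with one address counter."""

    def __init__(self, depth: int) -> None:
        self.ram = [0] * depth
        self.depth = depth
        self.address = 0

    @property
    def empty(self) -> bool:
        return self.address == 0

    @property
    def full(self) -> bool:
        return self.address == self.depth   # assumed full test (see lead-in)

    def write(self, data: int) -> None:
        if self.full:
            raise OverflowError("stack full")
        self.ram[self.address] = data       # write the current address
        self.address += 1                   # then Address = Address + 1

    def read(self) -> int:
        if self.empty:
            raise IndexError("stack empty")
        self.address -= 1                   # Address = Address - 1
        return self.ram[self.address]       # then read the current address


stack = Lifo(depth=4)
for word in (1, 2, 3):
    stack.write(word)
print(stack.read(), stack.read())   # 3 2
```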
Memory Elements: SIPO
SIPO cell design
[Figure: serial-in, parallel-out (SIPO) register built from cells with Sh-In, Sh-Out, Clk, -Clk and Read; Read gates the Parallel-data outputs]
Memory Elements: Tapped Delay Line
[Figure: tapped delay line from din to dout built from 32-, 16-, 8-, 4-, 2- and 1-stage shift registers (SR), selected by delay<5> to delay<0> and clocked by Clk]
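A behavioral Python sketch of the programmable delay. It assumes each shift-register section (2^k stages) is switched into the path when delay<k> = 1 and bypassed otherwise, so the total delay in clock cycles equals the binary value of delay<5:0>; the slide shows the sections and select bits but not the bypass details. Class and method names are illustrative.

```python
from collections import deque


class TappedDelayLine:
    """Delay din by delay<5:0> clock cycles using binary-weighted sections (assumed bypass)."""

    def __init__(self, delay: int) -> None:
        self.length = delay & 0x3F               # sections 32+16+8+4+2+1, so 0..63 stages
        self.stages = deque([0] * self.length)   # all register stages start cleared

    def clock(self, din: int) -> int:
        """Shift din in on a clock edge and return the bit appearing at dout."""
        if self.length == 0:
            return din                           # every section bypassed
        self.stages.append(din)
        return self.stages.popleft()


line = TappedDelayLine(delay=0b000101)           # 4-stage and 1-stage sections selected
print([line.clock(bit) for bit in [1, 0, 1, 1, 0, 0, 0, 0, 0]])
# [0, 0, 0, 0, 0, 1, 0, 1, 1]
```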
Memory Elements: ROM
[Figure: 4x4 NOR-type ROM with word lines WL0 to WL3, bit lines BL0 to BL3, Vdd and GND]
Memory Elements: ROM
[Figure: 4x4 NAND-type ROM with word lines WL0 to WL3, bit lines BL0 to BL3 and Vdd]
Memory Elements: CAM
[Figure: CAM architecture: Data is compared against a CAM memory array of N M-bit words, producing Match outputs. Cache architecture: the CAM array's Match0 to Match5 lines select Word0 to Word5 of a RAM, which connects to the data I/Os]
Memory Elements: CAM
[Figure: CAM cell with bit/-bit, word line WL, storage nodes d/-d and transistors M1 to M3 connected to the Match line]
Memory Elements: CAM
[Figure: CAM circuit: Data In, normal RAM read/write circuitry with Read Data (test), precharged match lines Match0 to Match3 combined into Hit]
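At the behavioral level, a CAM compares the input data with every stored word at once and raises the match line of each equal word; in the cache arrangement, the match line selects the corresponding RAM word. The Python sketch below uses hypothetical tag and data values and illustrative names; the precharge and pull-down behavior of the circuit appears only as a comment.

```python
from typing import List, Optional, Tuple


def cam_search(words: List[int], key: int) -> Tuple[List[int], bool]:
    """Compare the key with every stored word; return (match lines, Hit).

    In the circuit each match line is precharged and pulled down if any
    stored bit differs from the data-in bit, so it stays high only on an
    exact match. Hit is the OR of all match lines.
    """
    match = [1 if word == key else 0 for word in words]
    return match, any(match)


def cache_read(tags: List[int], ram: List[int], key: int) -> Optional[int]:
    """The matching CAM line selects the same word of the RAM."""
    match, hit = cam_search(tags, key)
    return ram[match.index(1)] if hit else None


tags = [0x12, 0x3F, 0x07, 0x2A]     # hypothetical stored words
ram = [100, 200, 300, 400]          # hypothetical RAM data words
print(cam_search(tags, 0x07))       # ([0, 0, 1, 0], True)
print(cache_read(tags, ram, 0x2A))  # 400
```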
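Control: FSM
[Figure: Moore and Mealy machines, each with input, output and clk; Moore outputs depend only on the state, Mealy outputs also depend on the inputs]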
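Control: FSM
FSM design procedure:
1. Draw the state-transition diagram.
2. Check the state diagram.
3. Write the state equations (or HDL).
An example of a state-transition diagram (toll booth), with A = car-in, C = change-ok, R = rst:
[Figure: states IDLE, WAIT and EXIT, each encoded by a two-bit code (S1, S0). R returns to IDLE; IDLE stays on -A and moves to WAIT on A; WAIT stays on -C and moves to EXIT on C; EXIT stays on A and returns to IDLE on -A]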
Control: FSM
Checking the state-transition diagram:
1. Ensure all states are represented, including the IDLE state.
2. Check that the OR of all transitions leaving a state is TRUE. This is a simple way of confirming that there is a way out of a state once it is entered.
3. Verify that the pairwise XOR of all exit transitions is TRUE. This ensures there are no conflicting conditions that would make more than one exit transition active at the same time.
4. Insert loops into any state that is not otherwise guaranteed to change on each cycle.
Formal FSM verification method: perform conformance checking.
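Control: Verilog Coding Style for FSMs
module toll_booth(clk, rst, car_in, change_ok, green);
  input clk, rst, car_in, change_ok;
  output green;
  reg green;
  reg [1:0] state_reg, next_state;

  // State encoding (assumed values; any three distinct 2-bit codes work)
  parameter IDLE = 2'b00;
  parameter WAIT = 2'b01;
  parameter EXIT = 2'b10;

  // State register with asynchronous active-low reset
  always @(posedge clk or negedge rst)
  begin
    if (rst == 1'b0) state_reg <= IDLE;
    else state_reg <= next_state;
  end

  // Next-state and output logic; green is assigned on every path so no latch is inferred.
  // The green levels are assumed: lit only while the car is cleared to exit.
  always @(state_reg or car_in or change_ok)
  begin
    case (state_reg)
      IDLE: if (car_in == 1'b1) begin next_state = WAIT; green = 1'b0; end
            else begin next_state = IDLE; green = 1'b0; end
      WAIT: if (change_ok == 1'b1) begin next_state = EXIT; green = 1'b1; end
            else begin next_state = WAIT; green = 1'b0; end
      EXIT: if (car_in == 1'b1) begin next_state = EXIT; green = 1'b1; end
            else begin next_state = IDLE; green = 1'b0; end
      default: begin next_state = IDLE; green = 1'b0; end
    endcase
  end
endmodule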
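Control: PLA
Structure of a PLA
[Figure: inputs a, b, c, d feed an AND array that forms the minterms; an OR array combines them into outputs f0 to f3]
A PLA implements expressions in sum-of-products (SOP) form:
$f_i = \sum m_i(a, b, c, d)$
For example, one output may be the OR of three four-literal product terms in a, b, c and d.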
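A PLA's personality can be described by two tables, one for the AND array and one for the OR array. The Python sketch below evaluates such a description; the pattern notation ('1' true literal, '0' complemented literal, '-' unused input) and the example personality are hypothetical, not the functions on the slide.

```python
from typing import List


def pla_eval(inputs: str, and_plane: List[str], or_plane: List[List[int]]) -> List[int]:
    """Evaluate a PLA in sum-of-products form.

    inputs    : input bits in the order (a, b, c, d), e.g. "1010"
    and_plane : one pattern per product term: '1' uses the true literal,
                '0' the complemented literal, '-' leaves the input unused
    or_plane  : for each output f_i, the indices of the product terms it ORs
    """
    # AND array: a product term is 1 when every literal it uses is satisfied.
    terms = [all(p == '-' or p == x for p, x in zip(pattern, inputs))
             for pattern in and_plane]
    # OR array: each output is the OR of its selected product terms.
    return [int(any(terms[t] for t in row)) for row in or_plane]


# Hypothetical personality: f0 = ab + cd, f1 = a'b'c'd'
and_plane = ["11--", "--11", "0000"]
or_plane = [[0, 1], [2]]
print(pla_eval("1100", and_plane, or_plane))   # [1, 0]
print(pla_eval("0000", and_plane, or_plane))   # [0, 1]
```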
Control: PLA
[Figure: fuse-programmable PLA with inputs a, b, c, d and outputs f0 to f3]
Control: PLA
[Figure: logic-gate diagram of a PLA with inputs a, b, c, d and outputs f0 to f3]
Control: PLA
[Figure: pseudo-nMOS PLA with inputs (In) and outputs (Out)]
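Control: FSM Implementation with PLA
[Figure: PLA for the toll-booth example with inputs C (change-ok), A (car-in), R (rst) and state bits S1, S0 fed back through registers clocked by clk]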
Control: FSM Implementation with ROM
ROM for the toll-booth example
[Figure: ROM-based controller: ROM address and data; data fields jump-address, next-address, condition-code-select, condition-code-polarity and condition-code-enable; inputs rst, car_in and change_ok]
Symbolic microcode for the toll-booth example: addresses 0 to 5 with labels IDLE, WAIT and EXIT; instructions nop, !car_in, change_ok and car_in; jump targets IDLE, WAIT and EXIT; output green.