Digital Integrated Circuits Lecture 5: Logical Effort Chih-Wei Liu VLSI Signal Processing LAB National Chiao Tung University cwliu@twins.ee.nctu.edu.tw DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 1
Outline RC Delay Estimation Logical Effort Delay in a Logic Gate Multistage Logic Networks Choosing the Best Number of Stages Example Summary DIC-Lec5 cwliu@twins.ee.nctu.edu.tw
RC Delay Model Use equivalent circuits for MOS transistors Ideal switch + capacitance and ON resistance Unit nmos has resistance R, capacitance C Unit pmos has resistance R (mobility 較小 ), capacitance C Capacitance proportional to width Resistance inversely proportional to width DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 3
RC Values Capacitance C = C g = C s = C d = ff/μm of gate width Values similar across many processes Resistance R 6 KΩ*μm in 0.6um process Improves with shorter channel lengths Unit transistors May refer to minimum contacted device (4/ λ) 假設 nmos/pmos 寄生電容與閘極電容相同 Or maybe 1 μm wide device Doesn t matter as long as you are consistent DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 4
Inverter Delay Estimate Estimate the delay of a fanout-of-1 inverter by using the basic RC delay model of nmos and pmos RC model A 1 Y 1 s kc g kc R/k kc d DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 5
Inverter Delay Estimate Estimate the delay of a fanout-of-1 inverter R C A 1 Y 1 R C C Y C C 假設 A 電位提昇 C DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 6
Inverter Delay Estimate Estimate the delay of a fanout-of-1 inverter R C A 1 Y 1 R C C Y C C R C C C C A=1 C output DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 7
Inverter Delay Estimate Estimate the delay of a fanout-of-1 inverter R C A 1 Y 1 R C C Y C C R C C C C C t pd = 6RC 若沒有寄生電容的 ideal Inverter 則為 t pd = 3RC DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 8
Example: 3-input NAND Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R). DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 9
Example: 3-input NAND Sketch a 3-input NAND with transistor widths chosen to achieve effective rise and fall resistances equal to a unit inverter (R). Step 1 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 10
3-input NAND Caps We still assume C=C g =C diff Annotate the 3-input NAND gate with gate and diffusion capacitance. 3 3 3 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 11
3-input NAND Caps We assume C=C g =C diff Annotate the 3-input NAND gate with gate and diffusion capacitance. C C C C C C C C C Step 3C 3C 3C 3 3 3 3C 3C 3C 3C DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 1
3-input NAND Caps Annotate the 3-input NAND gate with gate and diffusion capacitance. Final result 5C 5C 5C 3 3 3 9C 3C 3C DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 13
Elmore Delay Model ON transistors look like resistors Pullup or pulldown network modeled as RC ladder Elmore delay of RC ladder t R C pd i to source i nodes i ( )... (... ) = RC + R + R C + + R + R + + R C 1 1 1 1 R 1 R R 3 R N C 1 C C 3 C N N N RC ladder DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 14
Example: -input NAND Estimate rising and falling propagation delays of a - input NAND driving h identical gates. A B x Y h copies -input NAND DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 15
Example: -input NAND Estimate rising and falling propagation delays of a - input NAND driving h identical gates. A B x 6C C Y 4hC h copies DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 16
Example: -input NAND Estimate worst-case rising and falling propagation delays of a -input NAND driving h identical gates. A B x 6C C Y 4hC h copies Worst case rising: 只有一個 pmos 開啟 R Y ( ) (6+4h)C t pdr = 6+ 4h RC DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 17
Example: -input NAND Estimate worst-case rising and falling propagation delays of a - input NAND driving h identical gates. A B x 6C C Y 4hC h copies R/ x R/ C Y (6+4h)C Worst case falling:a 已經為 1 ;B 的電位開始上升 t pdf R R = ( )C + ( = (7 + 4h) RC + R )[(6 + 4h) C] DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 18
Delay Components Delay has two parts Parasitic delay 6 or 7 RC Independent of load Effort delay 4h RC Proportional to load capacitance DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 19
Contamination Delay Best-case (contamination) delay can be substantially less than worst-case propagation delay. Ex: If both inputs fall simultaneously A B x 6C C Y 4hC R R Y (6+4h)C t cdr R = ( )[(6 + 4h) C] = (3 + h) RC DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 0
Contamination Delay Best-case (contamination) falling delay can be substantially less than worst-case propagation delay. Ex: B 已經為 1 ;A 的電位開始上升 A B x 6C C Y 4hC t cdf = = R R ( + )[(6 + (6 + 4h) RC 4h) C] DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 1
Delay Components Again, Delay has two parts Parasitic delay 3 or 6 RC Independent of load Effort delay 4h or h RC Proportional to load capacitance DIC-Lec5 cwliu@twins.ee.nctu.edu.tw
Diffusion Capacitance We assumed contacted diffusion on every s / d. Good layout minimizes diffusion area Ex: NAND3 layout shares one diffusion contact Reduces output capacitance by C Merged un-contacted diffusion might help too 3 9C 5C 3 3C 5C 3 3C 5C Recall 3-input NAND Shared Contacted Diffusion Merged Uncontacted Diffusion C C Gate capacitance: 由電晶體的寬度決定 Diffusion capacitance: 與佈局有關 Isolated Contacted Diffusion 3 3 7C 3C 3C 3C 3C 3 3C DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 3
Layout Comparison Which layout is better? V DD A B V DD A B Y Y GND GND DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 4
Introduction Chip designers face a bewildering array of choices What is the best circuit topology for a function? How many stages of logic give least delay? How wide should the transistors be???? Logical effort is a method to make these decisions Uses a simple model of delay Allows back-of-the-envelope calculations Helps make rapid comparisons between alternatives Emphasizes remarkable symmetries DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 5
Example Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file. Decoder specifications: A[3:0] A[3:0] 3 bits 16 word register file Each word is 3 bits wide Each bit presents load of 3 unit-sized transistors 4:16 Decoder 16 Register File 16 words True and complementary address inputs A[3:0] Each input may drive 10 unit-sized transistors Ben needs to decide: How many stages to use? How large should each gate be? How fast can decoder operate? DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 6
Delay in a Logic Gate Express delays in process-independent unit d = d abs τ = 3RC 沒有寄生電容的理想反向器 1 ps in 180 nm process τ Normalized delay representation 40 ps in 0.6 μm process DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 7
Delay in a Logic Gate Express delays in process-independent unit d = d abs τ Delay has two components d = f + p 無關於製程型態的方式來表示 Delay, 可使電路能依照佈局來做 Delay 的比較, 而不必考慮不同製程對電路速度的影響 Linear delay model DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 8
Delay in a Logic Gate Express delays in process-independent unit Delay has two components Effort delay: f = gh (a.k.a. stage effort) d = d d abs τ = f + p Again has two components DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 9
Delay in a Logic Gate Express delays in process-independent unit Delay has two components Effort delay f = gh (a.k.a. stage effort) Again has two components g: logical effort d = d abs τ d = f + p Measures relative ability of gate to deliver current g 1 for inverter Gate Complexity, g i.e. 會耗費較多的時間去驅動相同 fanout 數的電晶體 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 30
Delay in a Logic Gate Express delays in process-independent unit Delay has two components Effort delay f = gh (a.k.a. stage effort) Again has two components h: electrical effort = C out / C in d = d abs τ d = f + p Ratio of output to input capacitance Sometimes called fanout Fan-out, h DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 31
Delay in a Logic Gate Express delays in process-independent unit Delay has two components Parasitic delay p d = d d abs τ = f + p Represents delay of gate driving no load Set by internal parasitic capacitance DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 3
Delay in NAND Effort delay: 4hC ( 與外部電容有關 ) Delay in -input NAND 6C d = f + p 4hC = + 3C 4 = ( ) h + = gh + 3 p A B Normalized delay representation How to calculate the logical effort g? x C 註 :Ideal Inverter ( 無寄生電容 Single Fanout) τ = 3RC;g = 1 Y 4hC DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 33
Delay Plots d = f + p = gh + p Normalized Delay: d 6 5 4 3 -input NAND g = p = d = Inverter g = p = d = 1 0 0 1 3 4 5 Electrical Effort: h = C out / C in DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 34
Delay Plots d = f + p = gh + p What about NOR? Normalized Delay: d 6 5 4 3 -input NAND Inverter g = 4/3 p = d = (4/3)h + g = 1 p = 1 d = h + 1 Effort Delay: f 1 0 Parasitic Delay: p 0 1 3 4 5 Electrical Effort: h = C out / C in DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 35
Computing Logical Effort DEF: Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current. Measure from delay vs. fanout plots Or estimate by counting transistor widths A 1 Y A B Y A B 4 4 1 1 Y C in = 3 g = 3/3 C in = 4 g = 4/3 C in = 5 g = 5/3 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 36
Catalog of Gates Logical effort of common gates Gate type Number of inputs 1 3 4 n Inverter 1 NAND 4/3 5/3 6/3 (n+)/3 NOR 5/3 7/3 9/3 (n+1)/3 Tristate / mux XOR, XNOR 4, 4 6, 1, 6 8, 16, 16, 8 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 37
Recall: 4:1 Multiplexer 4:1 mux chooses one of 4 inputs using two selects S1S0 S1S0 S1S0 S1S0 D0 D1 Logical Effort =; 與輸入的數目無關 D D3 大型多工器的速度與小型多工器的速度相當? DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 38
Computing Parasitic Delay DEF: Represents delay of gate driving no load Set by internal parasitic capacitance Measure from delay vs. fanout plots Or estimate by counting total diffusion capacitances at output A A Y 1 B Y A 4 B 4 Y 1 1 C= 3 p= 3/3= 1 C = 6 p= 6/3= C= 6 p= 6/3= 增加電晶體大小會降低電阻, 但卻會增加寄生電容, 但較寬的電晶體可以摺疊, 使內部的寄生電容可能變小 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 39
Catalog of Gates Parasitic delay of common gates In multiples of p inv ( 1) Gate type Number of inputs 1 3 4 n Inverter 1 NAND 3 4 n NOR 3 4 n Tristate / mux 4 6 8 n XOR, XNOR 4 6 8 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 40
Example: FO4 Inverter Estimate the delay of a fanout-of-4 (FO4) inverter d Logical Effort: g = Electrical Effort: h = Parasitic Delay: p = Stage Delay: d = DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 41
Example1: FO4 Inverter Estimate the delay of a fanout-of-4 (FO4) inverter d Logical Effort: g = 1 Electrical Effort: h = 4 Parasitic Delay: p = 1 Stage Delay: d = gh + p = 5 經驗法則 The FO4 delay is about 00 ps in 0.6 μm process 60 ps in a 180 nm process f/3 ns in an f μm process DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 4
Example: Ring Oscillator Estimate the frequency of an N-stage ring oscillator Odd-number stages Logical Effort: g = 1 Electrical Effort: h = 1 31 stage ring oscillator in 0.6 μm process has Parasitic Delay: p = 1 frequency of ~ 00 MHz Stage Delay: d = Frequency: f osc = 1/(*N*d) = 1/4N 如果 g 1, 那如何計算多級邏輯網路的延遲呢? DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 43
Multistage Logic Networks Logical effort generalizes to multistage networks Path Logical Effort Path Electrical Effort Path Effort G = g i C H = C out-path in-path F = f = gh 與電晶體大小無關 與電晶體大小有關 i i i 10 g 1 = 1 h 1 = x/10 x g = 5/3 h = y/x y g 3 = 4/3 h 3 = z/y z g 4 = 1 h 4 = 0/z 0 F=GH=(5/3)(4/3)(x/10)(y/x)(z/y)(0/z)=G(0/10) DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 44
Multistage Logic Networks Logical effort generalizes to multistage networks Path Logical Effort Path Electrical Effort Path Effort G = g i C H = C out-path in-path F = fi = gh i i Can we always write F = GH? DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 45
Branch Paths No! Consider paths that branch: G = H = 5 15 90 GH = h 1 = 15 90 h = F = GH? DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 46
Paths that Branch No! Consider paths that branch: G = 1 H = 90 / 5 = 18 5 15 90 GH = 18 h 1 = (15 +15) / 5 = 6 15 90 h = 90 / 15 = 6 F = g 1 g h 1 h = 36 = GH DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 47
Branching Effort Branching effort Accounts for branching between stages in path Now we compute the path effort b = C F = GBH on path B= b i C + C on path off path Path branching effort Note: h i = BH DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 48
Multistage Delays Path Effort Delay D F = f i Path Parasitic Delay Path Delay P= p i = i = F + D d D P Recall that F = GBH 各級 path effort 的乘積為 F;path delay 則為各級 path effort delay 的總和 最小值發生在各級的 path effort delay 相同, 亦即 f i = F 1/N DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 49
Designing Fast Circuits Delay is smallest when each stage bears same effort Thus minimum delay of N stage path is This is a key result of logical effort = i = F + D d D P fˆ = gh = F i i 1 N D= NF + P 1 N Find fastest possible delay Doesn t require calculating gate sizes 這種方式優於模擬 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 50
Choose Gate Sizes to Meet the Least Delay Design How wide should the gates be for least delay? fˆ = gh= g C = in i C C out in gc i fˆ out i Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives. Check work by verifying input cap spec is met. DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 51
Example: 3-stage path Select gate sizes x and y for least delay from A to B x x y 45 A 8 x y B 45 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 5
Example: 3-stage path x A 8 x x y y 45 B 45 Logical Effort G = Electrical Effort H = Branching Effort B = Path Effort F = Best Stage Effort ˆf = Parasitic Delay P = Delay D = DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 53
Example: 3-stage path x A 8 x x y y 45 B 45 Logical Effort G = (4/3)*(5/3)*(5/3) = 100/7 Electrical Effort H = 45/8 Branching Effort B = 3 * = 6 Path Effort F = GBH = 15 Best Stage Effort ˆ 3 F 5 Parasitic Delay P = + 3 + = 7 f = = Delay D = 3*5 + 7 = = 4.4 FO4 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 54
Example: 3-stage path Work backward for sizes y = x = x x y 45 A 8 x y B 45 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 55
Example: 3-stage path Work backward for sizes y = 45 * (5/3) / 5 = 15 x = (15+15) * (5/3) / 5 = 10 fˆ = gh= g C = in i C C out in gic fˆ out i 45 A P: 4 N: 4 P: 4 N: 6 P: 1 N: 3 B 45 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 56
Example: 3-stage path Check the result: 1. the size of the NAND gate: (10+10+10)*(4/3) / 5 = 8. NAND gate delay: d 1 =g 1 h 1 +p 1 = (4/3)*(10+10+10) / 8 + = 7 3. NAND3 gate delay: d =g h +p = (5/3)*(15+15) / 10 + 3 = 8 4. NOR gate delay: d 3 =g 3 h 3 +p 3 = (5/3)*(45/15) + = 7 7+8+7 = 45 A P: 4 N: 4 P: 4 N: 6 P: 1 N: 3 B 45 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 57
Best Number of Stages How many stages should a path use? Minimizing number of stages is not always fastest Example: drive 64-bit datapath with unit inverter Initial Driver 1 1 1 1 D = Datapath Load 64 64 64 64 N: f: D: 1 3 4 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 58
Best Number of Stages Example: drive 64-bit datapath with unit inverter Initial Driver 1 1 1 1 D = NF 1/N + P = N(64) 1/N + N 8 4.8 16 8 3 Datapath Load 64 64 64 64 N: f: D: 1 64 65 8 18 3 4 15 Fastest 4.8 15.3 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 59
Derivation for Best Number of Stages Consider adding inverters to end of path How many give least delay? = n 1 1 N + i + i= 1 ( ) D NF p N n p D N 1 1 1 N N N 1 inv = F ln F + F + p = 0 inv Logic Block: n 1 Stages Path Effort F N - n 1 Extra Inverters Define best stage effort 1 ρ = F N p + ρ 1 lnρ = 0 inv ( ) DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 60
Best Number of Stages p + ρ 1 lnρ = 0 ( ) has no closed-form solution inv Neglecting parasitics (p inv = 0), we find ρ =.718 (e) For p inv = 1, solve numerically for ρ = 3.59 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 61
Sensitivity Analysis How sensitive is delay to using exactly the best number 1.6 of stages? 1.51 Nˆ = log ρ F D(N) /D(N) 1.4 1.6 1. 1.15 1.0 (ρ=6) (ρ =.4) If p inv = 1 0.0 0.5 0.7 1.0 1.4.0.4 < ρ < 6 gives delay within 15% of optimal We can be sloppy! I like ρ = 4 N / N DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 6
Example, Revisited Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file. Decoder specifications: A[3:0] A[3:0] 3 bits 16 word register file Each word is 3 bits wide Each bit presents load of 3 unit-sized transistors 4:16 Decoder 16 Register File 16 words True and complementary address inputs A[3:0] Each input may drive 10 unit-sized transistors Ben needs to decide: How many stages to use? How large should each gate be? How fast can decoder operate? DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 63
Possible 4:16 Decoder Each word is 3 bits wide Each bit presents load of 3 unit-sized transistors Each input may drive 10 unit-sized transistors A[3] A[3] A[] A[] A[1] A[1] A[0] A[0] 10 10 10 10 10 10 10 10 y z word[0] 96 units of wordline capacitance y z word[15] DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 64
Number of Stages Decoder effort is mainly electrical and branching Electrical Effort: H = (3*3) / 10 = 9.6 Branching Effort: B = 8 If we neglect logical effort (assume G = 1) Path Effort: F = GBH = 76.8 Number of Stages: N = log 4 F = 3.1 Try a 3-stage design DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 65
Gate Sizes & Delay Logical Effort: G = 1 * 6/3 * 1 = Path Effort: F = GBH = 154 Stage Effort: Path Delay: Gate sizes: z = 96*1/5.36 = 18 y = 18*/5.36 = 6.7 A[3] A[3] A[] A[] A[1] A[1] A[0] A[0] 10 10 10 10 10 10 10 10 ˆ 1/3 f = F = 5.36 D= 3 fˆ + 1+ 4 + 1 =.1 落在.4 與 6 之間 y z word[0] 96 units of wordline capacitance y z word[15] DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 66
Comparison Compare many alternatives with a spreadsheet Design N G P D NAND4-INV 5 9.8 NAND-NOR 0/9 4 30.1 INV-NAND4-INV 3 6.1 NAND4-INV-INV-INV 4 7 1.1 NAND-NOR-INV-INV 4 0/9 6 0.5 NAND-INV-NAND-INV 4 16/9 6 19.7 INV-NAND-INV-NAND-INV 5 16/9 7 0.4 NAND-INV-NAND-INV-INV-INV 6 16/9 8 1.6 DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 67
Review of Definitions Term number of stages logical effort electrical effort branching effort effort effort delay parasitic delay delay Stage 1 g h = b = f f p = C C C out in on-path gh + C C on-path d = f + p off-path Path N G = g i H = C C out-path in-path B = b i F = GBH D F = f i P = p i D = d = D + P i F DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 68
Method of Logical Effort 1) Compute path effort ) Estimate best number of stages 3) Sketch path with N stages 4) Estimate least delay 5) Determine best stage effort F = GBH N = log 4 F 1 N D= NF + P fˆ = F 1 N 6) Find gate sizes C in i = gic fˆ out i DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 69
Limits of Logical Effort Chicken and egg problem Need path to compute G But don t know number of stages without G Simplistic delay model Neglects input rise time effects Interconnect Iteration required in designs with wire Maximum speed only Not minimum area/power for constrained delay DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 70
Summary Logical effort is useful for thinking of delay in circuits Numeric logical effort characterizes gates NANDs are faster than NORs in CMOS Paths are fastest when effort delays are ~4 Path delay is weakly sensitive to stages, sizes But using fewer stages doesn t mean faster paths Delay of path is about log 4 F FO4 inverter delays Inverters and NAND best for driving large caps Provides language for discussing fast circuits But requires practice to master DIC-Lec5 cwliu@twins.ee.nctu.edu.tw 71