EE M216A.:. Fall Lecture 5. Logical Effort. Prof. Dejan Marković

EE M26A.:. Fall 200 Lecture 5 Logical Effort Prof. Dejan Marković ee26a@gmail.com Logical Effort Recap Normalized delay d = g h + p g is the logical effort of the gate g = C IN /C INV Inverter is sized such that R INV equals the gate s drive strength h is the electrical effort h = C OUT /C IN The combination of g h is essentially our fan out definition p is the parasitic effort p = C SELF/ /C INV Typically ~ for an inverter May have different g, p for each type of gate structure The g, p may differ per input, and for pull up/down EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 2 2

A Side Note: Total Gate Effort (g TOT ) One way of evaluating a gate is to look at the total logical effort of the gate g TOT = C IN_TOT /C INV A sum of the total input capacitance Example: 3 input NAND Equivalent inverter is 4:2 g INPUT = 0/6=5/3 g TOT = 30/6 = 5 (or 3 g INPUT ) W P :W N = 4:6 The total gate effort is not very useful to calculate delay But it is an indication of the cost of the gate Can be very useful in gate mapping of logic synthesis To find which gate is best to use to map a given Boolean expression EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 3 3 A Note on Asymmetry The gates used in examples so far have been symmetric All inputs are essentially identical Some are bundled P:N ratio is approximately equal to the mobility ratio Same as the reference inverter The truth is that inputs of most gates are not symmetric Different inputs may see different capacitances Even series stacked gates may not be the same size Pull up and pull down is rarely equal resistance Call this skewed gates P:N sizing for optimal delay is β = root(µ) Inverter in a domino gate often favors the PMOS to improve speed EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 4 4

Asymmetric Examples PMOS NOR: different sizing Equivalent inv is still 6:2 C ina = 0, C inb = 26 g A (0/8=5/4) not equal to g B (26/8=3/4) B input is much worse (for a reason) For more complex gates: Equivalent inv is now 9:3 g EDC =2/2=7/4, g A B =30/2=5/2 e 9 d 9 c 9 a b 8 8 B A 24 8 2 2 NOR Output Same β = µ = 3 e d c a 2 2 2 2 2 b The larger g is BAD! (More effort to do logic) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 5 5 What is the Reference Inverter? The assumption of logical effort is that the reference inverter has equal rise and fall delays β =µ The implication for different pull up and pull down resistance is that the rising and falling delays are not equal Similar to the parasitic delay calculation earlier d UP = g UP h + p UP d DN = g DN h + p DN Since static CMOS gates are inverting, the transitions through subsequent gates must be alternating Use an average logical effort (and parasitic effort) a good idea g AVG = (g UP + g DN )/2 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 6 6

Skewed P:N Ratio Gates Assume: β =.7, µ = 3 Example: Inverter Pull up: reference inverter is sized P:N of W P :W P /µ g UP = W P (+/.7)/W/ P (+/µ) / = 9.9 Pull down: reference inverter is sized P:N of µw N :W N g DN = W N (+.7)/W N (+ µ) = 0.675 g AVG = 0.93 (instead of ) Example: 2 input NAND Gate g UP = W P (+2/.7)/W P (+/µ) =.63 g DN = W N (2+.7)/W N (+ µ) ) = 0.925 Average =.28 (instead of 5/4) Skewing rise/fall transition in gates may improve delay (because reference gate wasn t optimized for min t p ) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 7 7 Velocity Saturation Velocity saturation complicates a little Need to account for the change in resistance Assume reference inverter is W P =2W N Example: a 2 input NAND gate with sizing W P =2, W N =2 Assuming R N_nostack =(4/3)R N_stack, R P_nostack =(6/5)R P_stack The equivalent inverter with same drive resistance Pull up: equiv inv = 2:, g UP =4/3 (same as before) Pull down: equiv inv = 8/3:4/3, g DN =4/4= This makes sense because velocity saturation allows the transition through the series stacking be faster (more current) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 8 8

Choosing a Reference Inverter The reference inverter can actually be any β Logical effort can be used for optimally sizing a logic chain Part of optimal sizing is to add inverters to increase N (the number of logic stages) It is inconvenient to add inverters that do not have a LE of A common practice is to choose the reference inverter β to be the one that is used. Use a normalization to adjust the g. Calculate g INV(β =µ) assuming the reference inverter g AVG = Example: g REF =withβ β =7(µ.7 =3) g AVG_INV(β=µ) =.6 for β = 3 (µ = 3) A 2 input NAND β =.7 has g =.37 A 2 input NAND β = 3 has g =.45 =.25.6 g of 2 input NAND with reference inverter of β = µ is.25 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 9 9 Generalization for Multistage Networks Same concept applies at the path level Stage effort: f i = g i h i Path electrical effort: H path = C out /C in Path logical effort: G path = g g 2 g N Branching effort: B path = b b 2 b N Ignore this Path effort = G path H path B path for now Path delay D = Σd i = Σp i + Σg i h i EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 0 0

Total Effort A path can have a total effort G = Path Logical Effort G PATH = g i H = Path Electrical Effort F = Path Effort Consider path effort as fanout. Total fanout should be a product of all fanouts. Treat the path as a single gate H F PATH PATH = N i = C = C N i= OUT IN f i EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide Example: Total Effort Calculations G = (4/3) 2 (5/3) = 3 H = 60/4 = 5 F = 45; F?= GH (yes for this case) G H F PATH PATH PATH = N i= C = C = N i= g i OUT IN f i W P :W N =2:2 g=4/3 p =2 g=4/3 p=2 2 W P :W N =6:6 W P :W N =24:6 3 C out =60 g=5/3 p =2 f =4 f 2 = 3.33 f 3 = 3.33 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 2 2

Example: Random Multi stage Networks Delay of a multi stage network = sum of stage delays Path Effort Delay D F = f i Path Parasitic Delay P i = i p i Total Path Delay W P :W N =2:2 g = 4/3 p = 2 g = 4/3 p = 2 2 W P :W N = 6:6 D = di = DF + P i W P :W N = 24:6 3 g = 5/3 p = 2 C out = 60 f = 4 f 2 = 333 3.33 f 3 = 3.33 D F = 0.66, P = 6 D = 6.66 (τ delays) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 3 3 Add Branching Effort Ratio of total to on path capacitance How much more current is needed to supply on path (given that some current flows off path) b = C on-path + C off-path C on-path c load EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 4 4

Branching Example # 5 5 5 90 90 g = h = F = f = f 2 = F = 90/5 = 8 8 (wrong!) (5+5)/5 = 6 90/5 = 6 36, not 8! Introduce new kind of effort to account for branching: Branching Effort: Path Branching Effort: b = B = C on path + C off path Π b i C on path Now we can compute path effort: H = g h b EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 5 5 Multistage Networks with Branching General logical effort formulation Stage effort: f i = g i h i Path electrical effort: H path = C out /C in Path logical effort: G path = g g 2 g N Branching effort: B path = b b 2 b N Path effort = G path H path B path Branching Path delay D = Σd i = Σp i + Σg i b i h i EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 6 6

Branching Example #2 C in = 4 W P :W N = 6:6 W P :W N = 24:6 W P :W N = 2:2 g = 4/3 W P :W N = 24:6 p = 2 g = 4/3 p = 2 2 W P :W N = 6:6 3 g = 5/3 p = 2 C out = 60 branching w/o bran. f = 8 f 2 = 6.66 f = 4 f 2 = 3.33 f 3 = 3.33 f 3 = 3.33 For circuits with branching: G PATH is the same = 3 H PATH is the same = 5 F PATH differs h = 24/4= 6, h 2 = 60/2=5, h 3 =2 F PATH = 4/3 6 4/3 5 5/3 2 = 77.8 F is no longer GH New F PATH with branching: F PATH = G B H EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 7 7 Interconnects Capacitance from wires are the most difficult to deal with Fixed load so intuitively, we would increase fanout Short wires small parasitic capacitance Treat them as increasing p for each gate Not exact but accounts for the effect Long wires large capacitance (dominates gate loading) Size of driving gate is as if driving a large C (inverter chain) Size of receiving gate does not impact branching It is typically big and limited by area, power, and function Medium wires most difficult. Cap similar to gate loading Brute force is to write delay as function of gate and wire capacitances N th order polynomial and differentiate More realistic method is to iterate EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 8 8

Handling Wires & Fixed Loads i C L C w EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 9 9 Summary Delay and/or power of a logic network depend significantly on the relative sizes of logic gates (not transistors within a gate) Inverter buffering is a simple example of the analysis The analysis leads to ~FO4 as being optimal fanout for driving larger capacitive loads To generalize analysis of delay, we introduce logical effort Delay normalized by inverter delay, d = g h + p g and p are characteristics of a logic gate that depends on its structure and does not depend on gate size. May have different g s and p s for different inputs and PU / PD Simplify by using g AVG and ignoring C s of intermediate nodes Once a table of g s and p s are created for the catalog of gates, delay can be calculated quickly and easily Next we will look at how to size a network (instead of just analyzing it) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 20 20

Optimum Effort per Stage When each stage bears the same effort: Fanout of each stage: Complex gates should drive smaller load!!! Minimum path delay EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 2 2 Gate Sizing Example (/2) From: David Harris First, compute path effort: H path The optimal stage effort is: EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 22 22

Gate Sizing Example (2/2) We can now size the gates: 20 5 y z = = 3.8 x = = 4. 5.45 3.45 4 z x y = = 2.7 C in = = 0 3.45.45 The total normalized delay is (assuming p inv = ): EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 23 23 Sizing Example with Branching Size the gates for optimum delay Once the path effort is determined, it is quite easy to determine the appropriate gate sizes Start from the output of a path Work backwards to the input C out _ ig C in _ i = f Check your work if the input is the same as the specification Assuming each unit W has capacitance of unit C opt i B = b b 2 =4 C in = 4 g = 4/3 p = 2 Gate 2 (dup) Gate 3 (dup) g = 4/3 p = 2 2 3 g = 5/3 p = 2 C out = 60 F = 77.8, f opt = 5.62 C in3 = 60*5/3*/5.6 = 7.9 C in2 = b 2 *7.9*4/3*/5.6 = 8.5 C in = b *8.5*4/3*/5.6 = 4 W P3 = 4/5*7.9 = 4.3 W P2 = ½*8.5 = 4.25 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 24 24

Branches: Same # Stages, Different Loads Example: 2 paths with the same # of stages but different loads Optimal system has all paths with equal delay Branching for path, b = + α A ti i th t ~ Assumption is that p ~ p 2 C in Path C in2 C out C out2 D = D 2 N N F + p = N N F BGC out F = = F2 Cin Cin 2 B2G2C = α = C B G C in 2 out + p 2 B2G2C = C out 2 in 2 out 2 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 25 25 Path Effort Estimate With equal stage branches, the Path Effort can be estimated without knowing each stage s, b H tot = (C out +C out2 )/C in (with B = ) Simple example for Path : G = g NAND H = C out /C in B = (+C out2 /C out ) C in Since F = F 2 and G = G 2, B = (+C out2 /C out ) C in C in2 Path C out C out2 F = G H B = g NAND (C out +C out2 )/C in Same as F = G H tot The errors occur with different g and p for the gates in the two different paths EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 26 26

Unequal Length Branches (Not Straightforward) If both branches are long, then N =N 2 is an o.k. assumption Otherwise, f opt differs between two paths Solving precisely, example: C out = H CC in 2d inv p = C out /C C (d inv p) 2 in = C C out2 = C out2 /C 2 2 d nand 2 d d nand = g nand (C + C 2 )/C inv d inv in D = d nand + 2d inv = g nand (C out /(2d inv p) + C out2 /(d inv p) 2 ) + 2d inv Take partial derivative of D/ (d inv ) Not easy so ignore p to simplify If g NAND = 4/3, H = H 2 = 3: d nand = 2.35, d inv =.80 The per stage f = g h (no p) is different per stage Most branches are relatively long or not critical path C 2d inv = H 2 C in EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 27 27 Re convergent Paths Very similar to unequal branching, another logic structure that adds complexity to Logical Effort is recombined branches. Typically constrains the sizing problem Example: ignoring parasitics. Let x = C in = f a = (y+z)/x f b = (3z/y) f c = g NAND C out /z 2 variables C in x y f a f b f c 2x NANDz z C out Constrain with f a = f b = f c, 2 vars equations (directly solve) Any constraints are possible Increase variables by introducing more buffering (parallel to y) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 28 28

Logic Opt. Example: 2 Stage; 8 Input AND Assume symmetric 8 input AND (β = µ =2, 3W o is unit C) g NAND =0/3, p NAND =8 for an 8 input NAND g INV =, p INV = C out =00, C in = Logical effort G = 0/3, B =, H = 00 F = 333.3 For 2 stages, f opt =8.3 h 2 = 8.3, C in2 = 5.5, W P = W o h = 5.5, C in =, W P = 0.6W o Delay = 36.66 + 9 = 45.6 x8 G G 2 Buffering For 3 stages, f opt = 7 ( extra inverter), Delay = 3 For 4 stages, f opt = 4.3 (2 extra inverters), Delay = 28.2 For 5 stages, f opt = 3.9 (closer to optimal of 3.6), Delay=28 For 6 stages, f opt = 2.63 (below optimal of ~3.6), Delay=28.8 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 29 29 C out Logic Opt. Example: Multi 2 Input Gate 8 AND Many ways to implement this same function Use a tree of fewer input AND gates (A 0 A )(A 2 A 3 ) If multiple ANDs (as in a memory decoder), then partial results can be shared EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 30 30

2 input Implementation C in = G 2 G 3 C out = 00 Assuming that 3W o has capacitance of unit C F = B G H = 4/3 3 00 = 235 f opt = 2.48 (too small) D = 6 2.48 + 3 2 + 3 = 4.88 + 9 = 23.88 Still better than 8 input NAND Optimal sizing G 7 : C in7 = 00g inv /f = 40, W P =80W 0 G 6 : C in6 = C in7 g nand /f = 2.5, W P =32W 0 G 5 : C in5 = C in6 g inv /f = 8.7, W P = 7.4W 0 G 4 : C in4 = C in5 g nand /f = 4.7, W P = 7W 0 G 3 : C in3 = C in4 g inv /f =.88, W P = 3.8W 0 Double check G 2 : C in = C in3 g nand /f =, W P =.5W 0 G 4 G 5 G 6 G 7 out EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 3 3 4 input Implementation out C in = G 2 G 3 G 4 G 5 C out = 00 Assuming that unit for C = 3W o F = B G H = 4/3 6/3 00 = 266 f opt = 4.04 (a tad high) D = 4 4.04 + 4 + 2 + 2 = 6.:. 6 + 8 = 24.6 Slightly worse than 2 input! Due to self loading Optimal sizing G 5 : C in5 = 00g inv /f = 24.75, W P =50W 0 G 4 : C in4 = C in5 g nand /f = 8.6, W P =2.5W 0 G 3 : C in3 = C in4 g inv /f = 2.02, W P =4W 0 Double check G 2 : C in =C in3 g nand4 /f =, W P =.0W 0 EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 32 32

Example: 2 NOR Implementation C in = G 2 C out = 00 Assuming that unit for C = 3W o F= BGH = 4/3 2 *5/3*00 = 296 f opt = 4.4 (a tad high) D = 4 4.4 + 3 2 + = 6.56 + 7 = 23.6 (Best one!) Optimal sizing G 5: C in5 = 00g inv /f = 24., W P =48W 0 G 4 : C in4 = C in5 g nand /f = 7.76, W P =.6W 0 G 3 : C in3 = C in4 g nor /f = 3.25, W P =7.5W 0 Double check G 2 : C in = C in3 g nand /f =, W P =.5W 0 Lesson: use close to optimal f, use gates with small p G 3 G 4 G 5 out EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 33 33 Logical Effort Design Flow Compute the path effort: Path Effort = F = g h b Find the best number of stages: N* ~ log 4 (PathEffort) Compute the stage effort: se* = (PathEffort) /N Working from either end, determine gate sizes: Reference: Sutherland, Sproull, Harris, Logical Effort, (Morgan-Kaufmann 999) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 34 34

Summary of LE Design Methodology. Draw network 2. Buffer non critical paths with minimum sized gates Minimize loading on critical path Simplifies sizing of non critical path 3. Estimate total effort along each path (without branching) 4. Verify that the number of stages is appropriate Add inverters if f opt > 5 5. Assign branch ratio of each branch Estimate based on the ratio of the Effort of the paths Ignore paths that have little effect (i.e. min sized) Include wire capacitances 6. Compute delays for the design (include parasitic delay) Adjust branching ratios (especially with wire capacitance) Repeat if necessary until delay meets specification 7. Re optimize logic network if f opt is small (Return to step 3) EEM26A.:. Fall 200 Lecture 5: D. Logical Markovic Effort / Slide 35 35