Variation-Resistant Dynamic Power Optimization for VLSI Circuits

Process-Variation-Resistant Dynamic Power Optimization for VLSI Circuits. Fei Hu, Department of ECE, Auburn University, AL 36849. Ph.D. Dissertation Committee: Dr. Vishwani D. Agrawal, Dr. Foster Dai, Dr. Darrel Hankerson. November 16, 2005.

Outline. Introduction. Background: dynamic power dissipation; glitch reduction; previous LP model. Process-variation-resistant LP model: process variation; delay model; LP model based on worst-case timing; LP model based on statistical timing. Input-specific optimization: without process variation; with process variation. Experimental results. Conclusion. Fei Hu, PhD Dissertation

Introduction. Power components of CMOS circuits: $P_{avg} = P_{static} + P_{dynamic}$, with $P_{dynamic} \approx \frac{1}{2} k C_L V_{dd}^2 f_{clk}$. The power dissipation problem: for constant die size, total capacitance increases by 40% when transistor size is reduced by 70%; clock frequency is scaled up faster than the minimum feature size (MFS); leakage power increases dramatically as the MFS shrinks into the submicron region; the architectural trend toward programmability and reusability leads to higher power demand.

VLSI Chip Power Density. [Figure: power density (W/cm^2, logarithmic scale from 1 to 10000) versus year, 1970 to 2010, for Intel processors (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P6), with reference levels for a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Intel.]

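Background: dynamic power dissipation. $P_{dyn} = P_{switching} + P_{short\text{-}circuit}$. Switching power dissipation: $P_{switching} = \frac{1}{2} k C_L V_{dd}^2 f_{clk}$. [Figure: CMOS inverter between $V_{dd}$ and Gnd, showing which transistor is on or off for each input transition and the current $i_c$ into the load capacitance $C_L$.]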

Background: short-circuit power dissipation. Short-circuit current flows while both the PMOS and NMOS transistors are on. It depends strongly on the rise and fall times of the input signals and is significant when the input rise/fall time is much longer than the output rise/fall time. It can be kept to an insignificant portion of $P_{dyn}$.

Background: glitch reduction. An important dynamic power reduction technique. [Figure: static glitch.] Glitch power consumes 30~70% of $P_{dyn}$ for typical circuits. Related techniques: balanced delay; hazard filtering; transistor/gate sizing; linear programming approach.

Glitch reduction. [Figure: original circuit, with gate delays annotated for each technique.] Balanced path (path balancing): equalize the delays of all paths incident on a gate; balancing requires insertion of delay buffers. Hazard (glitch) filtering: use the glitch-filtering effect of the gate itself; inserting buffers is not necessary.

Glitch reduction. Transistor/gate sizing: find the transistor sizes that realize the required delays; no buffers need to be inserted; suffers from the nonlinearity of the delay model (large solution space; numeric convergence and global optimization not guaranteed). Linear programming approach: adopts both path balancing and hazard filtering; finds the optimal delay assignments of gates; uses technology mapping to translate the gate delay assignments into transistor/gate dimensions; guarantees an optimal solution and is a convenient way to solve a large-scale optimization problem.

Previous LP approach. [Figure: example circuit with numbered gates and primary outputs.] Each gate output has a timing window $(t, T)$ bounding the earliest and latest signal arrival times. Gate constraints for gate 7 with delay $d_7$ and fan-ins 5 and 6: $T_7 \ge T_5 + d_7$; $T_7 \ge T_6 + d_7$; $t_7 \le t_5 + d_7$; $t_7 \le t_6 + d_7$; $d_7 > T_7 - t_7$. Circuit delay constraints: $T_{11} \le maxdelay$; $T_{12} \le maxdelay$. Objective: minimize the sum of buffer delays.
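The timing-window constraints above can be checked on a fixed delay assignment with a few lines of Python. This is a minimal sketch, not the LP itself: the two-gate netlist, the gate names, and the helpers `timing_windows` and `filters_glitches` are hypothetical, and primary inputs are assumed to switch at time 0.

```python
def timing_windows(netlist, delay):
    """Propagate (t, T) = (earliest, latest) arrival times through a netlist.

    netlist: dict gate -> list of fan-in names, given in topological order.
    delay:   dict gate -> gate delay d.
    Names that are not gates are primary inputs with window (0, 0) (assumption).
    """
    window = {}
    for gate, fanins in netlist.items():
        ins = [window.get(f, (0.0, 0.0)) for f in fanins]
        t_in = min(t for t, _ in ins)  # earliest event at any input
        T_in = max(T for _, T in ins)  # latest event at any input
        window[gate] = (t_in + delay[gate], T_in + delay[gate])
    return window


def filters_glitches(window, delay, gate):
    """Glitch-filtering condition of the LP: d > T - t at the gate output."""
    t, T = window[gate]
    return delay[gate] > T - t


# Hypothetical example: g2 sees one input through g1 and one directly.
netlist = {"g1": ["a", "b"], "g2": ["g1", "c"]}
delay = {"g1": 1.0, "g2": 1.0}
window = timing_windows(netlist, delay)
for g in netlist:
    print(g, window[g], filters_glitches(window, delay, g))
```

In the LP the delays are variables rather than fixed numbers, so a solver would choose gate and buffer delays that make the condition hold at every gate.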

Process-variation-resistant optimization. Motivation: previous models assume fixed gate delays, but gate delays vary in real circuits because of environmental factors (temperature, $V_{dd}$) and physical factors (process variations). Effect of delay variation: glitch-filtering conditions are violated and power dissipation rises above the optimized value; leakage may also vary, which requires separate investigation. Our proposal: consider delay variations in dynamic power optimization, restricted to process variations (the major source of delay variation).

Process and delay variations. Process variations: variations due to the semiconductor process in $V_T$, $t_{ox}$, $L_{eff}$, $W_{wire}$, $TH_{wire}$, etc. Inter-die variation: constant within a die, varying from one die to another within a wafer or wafer lot. Intra-die variation: variation within a die, due to equipment limitations or statistical effects in the fabrication process (e.g., variation in doping concentration); includes spatial correlations and deterministic variation due to CMP and the optical proximity effect.

Process and delay variations. Delay variation: first-order gate delay model $T_d = \frac{C_L V_{dd}}{I} = \frac{C_L V_{dd}}{\mu C_{ox} (W/L) (V_{dd} - V_t)^2}$, so gate delay is sensitive to process variations. Related previous work: static timing analysis (worst-case timing analysis, statistical timing analysis); power optimization under process variations (voltage scaling and multi-$V_{dd}$/$V_{th}$ considering critical delay variations; gate sizing using a statistical delay model). No previous work on glitch power optimization.

Delay model and implications. Random gate delay model: $D_{total,i} = D_{nom,i} + \Delta D_{inter,i} + \Delta D_{intra,i}$, with truncated normal distributions, independence assumed, and variation expressed as the ratio $\sigma / d_{nom,i}$. Effect of inter-die variations: depends on their effect on switching activities. Definition of the glitch-filtering probability: $P_{glt} = P\{t_2 - t_1 < d\}$, where $t_1$, $t_2$ are signal arrival times and $d$ is the gate inertial delay. Theorem 1 states the change of $P_{glt}$ due to inter-die variation: $\Delta P_{glt} = \frac{1}{2}\left[\mathrm{erf}\left(\frac{k}{\sqrt{2}}\right) - \mathrm{erf}\left(\frac{k}{\sqrt{2 + 2(rk)^2}}\right)\right]$, where $\mathrm{erf}()$ is the error function, $k$ is a path- and gate-dependent constant, and $r$ is the $\sigma / d_{nom,i}$ ratio for inter-die variations.

Delay model and implications. Effect of inter-die variations: for a large inter-die variation, $r = 0.15$, $\Delta P_{glt} < 5.3 \times 10^{-3}$. The effect on switching activity is negligible.
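The bound quoted for $r = 0.15$ can be checked numerically. The sketch below assumes that Theorem 1 has the form $\Delta P_{glt} = \frac{1}{2}[\mathrm{erf}(k/\sqrt{2}) - \mathrm{erf}(k/\sqrt{2 + 2(rk)^2})]$; this form is a reconstruction from the slide text, and the function names and the scan range for $k$ are illustrative choices.

```python
import math


def delta_p_glt(k, r):
    """Change in glitch-filtering probability (assumed form of Theorem 1)."""
    return 0.5 * (math.erf(k / math.sqrt(2.0))
                  - math.erf(k / math.sqrt(2.0 + 2.0 * (r * k) ** 2)))


def worst_case_delta(r, k_max=10.0, steps=10000):
    """Largest change over a grid of k values in [0, k_max] (grid is an assumption)."""
    return max(delta_p_glt(i * k_max / steps, r) for i in range(steps + 1))


print(worst_case_delta(0.15))  # stays below the 5.3e-3 bound on the slide
```

With no inter-die variation ($r = 0$) the two error-function terms coincide and the change is exactly zero.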

Delay model and implications. Process-variation-resistant design: can be achieved by path balancing and glitch filtering, but the critical delay may increase. Theorem 2 states that a solution is guaranteed only if the circuit delay is allowed to increase; proved by example, assuming 10% variation. [Figure: example circuit with annotated delays.]

LP model based on worst-case timing. Timing model. [Figure: timing model of a gate.]

LP model based on worst-case timing: constraints. Gate constraints for gate $i$ with fan-ins $1, \ldots, j, \ldots, k$: $Tb_i \ge Ta_1$; $Tb_i \ge Ta_j$; $Tb_i \ge Ta_k$; $tb_i \le ta_1$; $tb_i \le ta_j$; $tb_i \le ta_k$; $Ta_i = Tb_i + d_i (1 + 3r)$; $ta_i = tb_i + d_i (1 - 3r)$. Glitch-filtering constraints: $Tb_i - tb_i < d_i (1 - 3r)\,\alpha$, where $r < 0.33$ (33%). Delay constraints for primary outputs (POs): $Ta_i \le D_{max}$. Parameters: $r$, the $\sigma / d_{nom,i}$ ratio; $D_{max}$, the circuit delay parameter; $\alpha$, the optimism factor in $[1, \infty]$, where $\alpha = 1$ means all glitches are filtered and $\alpha = \infty$ means no glitch is filtered. Objective: minimize the number of buffers inserted, expressed as the sum of buffer delays.
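For a single gate, the worst-case window equations and the glitch-filtering test can be evaluated directly. This is a minimal sketch under stated assumptions: the function names and the numeric inputs are hypothetical, and the delays are fixed numbers here, whereas in the LP they are the unknowns.

```python
def worst_case_window(input_windows, d, r):
    """Worst-case timing window of one gate.

    input_windows: list of (ta, Ta) output windows of the fan-in gates.
    d: nominal gate delay; r: sigma/d_nom ratio, so the delay spans d*(1 -/+ 3r).
    Returns (tb, Tb, ta, Ta): input window, then output window.
    """
    tb = min(ta for ta, _ in input_windows)  # earliest input arrival
    Tb = max(Ta for _, Ta in input_windows)  # latest input arrival
    ta_out = tb + d * (1 - 3 * r)            # earliest output: fastest gate
    Ta_out = Tb + d * (1 + 3 * r)            # latest output: slowest gate
    return tb, Tb, ta_out, Ta_out


def glitch_filtered(tb, Tb, d, r, alpha=1.0):
    """Constraint Tb - tb < d * (1 - 3r) * alpha."""
    return Tb - tb < d * (1 - 3 * r) * alpha


# Hypothetical fan-in windows and delay.
tb, Tb, ta, Ta = worst_case_window([(1.0, 1.0), (1.5, 2.0)], d=2.0, r=0.1)
print(tb, Tb, ta, Ta)
print(glitch_filtered(tb, Tb, d=2.0, r=0.1))  # input window vs. fastest delay
print(glitch_filtered(tb, Tb, d=2.0, r=0.2))  # larger variation tightens the test
```

The output window widens by $6 r d_i$ at every gate, which is why the worst-case model tends to demand more buffers and a larger $D_{max}$.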

LP model based on statistical timing. Worst-case timing tends to be too pessimistic, so a statistical timing model with random variables is used. [Figure: gate $i$ with delay $d_i$, driven by gates 1, $j$, and $k$ with output windows $(ta_1, Ta_1)$, $(ta_j, Ta_j)$, $(ta_k, Ta_k)$; gate $i$ has input window $(tb_i, Tb_i)$ and output window $(ta_i, Ta_i)$.]

LP model based on statistical timing. Minimum-maximum statistics are needed for $tb_i$ and $Tb_i$: $tb_i = \mathrm{Min}(ta_1, ta_j, ta_k)$; $Tb_i = \mathrm{Max}(Ta_1, Ta_j, Ta_k)$. Previous works: the Min or Max of two normal random variables is not necessarily normally distributed; it can be approximated with a normal distribution, but this requires complex operations (integration, exponentiation, etc.). Challenge for the LP approach: a simple approximation without nonlinear operations is required. Our approximation for $C = \mathrm{Max}(A, B)$, where $A$, $B$, and $C$ are Gaussian RVs: $\mu_C = \mathrm{Max}(\mu_A, \mu_B)$; $\mu_C + 3\sigma_C = \mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B)$.

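LP model based on statistical timing. Min-Max statistics approximation error: negligible when $\mu_A - \mu_B > 3(\sigma_A + \sigma_B)$; largest when $\mu_A = \mu_B$. The approximation is $\mu_C = \mathrm{Max}(\mu_A, \mu_B)$ and $\sigma_C = \frac{1}{3}\left[\mathrm{Max}(\mu_A + 3\sigma_A, \mu_B + 3\sigma_B) - \mu_C\right]$. [Figure: probability $P$ versus $x$, showing the CDFs of $A$ and $B$, the actual CDF of $\mathrm{Max}(A, B)$, and the approximated CDF of $\mathrm{Max}(A, B)$.]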
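The Max approximation uses only comparisons, additions, and a division by 3, which is what makes it expressible as linear constraints. A minimal Python sketch follows; the function names are hypothetical, and `approx_min` is the mirrored rule on the $\mu - 3\sigma$ side, added here by symmetry as an assumption.

```python
def approx_max(a, b):
    """Approximate C = Max(A, B) for Gaussians given as (mu, sigma) pairs."""
    (mu_a, sigma_a), (mu_b, sigma_b) = a, b
    mu_c = max(mu_a, mu_b)
    upper = max(mu_a + 3 * sigma_a, mu_b + 3 * sigma_b)  # mu_C + 3*sigma_C
    return mu_c, (upper - mu_c) / 3


def approx_min(a, b):
    """Mirrored rule for C = Min(A, B) (assumed by symmetry)."""
    (mu_a, sigma_a), (mu_b, sigma_b) = a, b
    mu_c = min(mu_a, mu_b)
    lower = min(mu_a - 3 * sigma_a, mu_b - 3 * sigma_b)  # mu_C - 3*sigma_C
    return mu_c, (mu_c - lower) / 3


print(approx_max((10.0, 1.0), (4.0, 1.0)))  # well separated: A dominates
print(approx_max((5.0, 1.0), (5.0, 2.0)))   # equal means: the wider input sets sigma
```

When the means are far apart the result is simply the dominant input, which matches the case where the slide calls the error negligible.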

LP model based on statistical timing: variables and gate constraints. Variables: timing and delay variables, each with mean $\mu$ and standard deviation $\sigma$; auxiliary variables $T$, $t$, and $W = Tb - tb$ (with its own $\mu$ and $\sigma$).
Timing window at the inputs of a two-input gate $i$ with fan-ins 1 and 2:
$\mu_{Tb_i} \ge \mu_{Ta_1}$; $T_{Tb_i} \ge \mu_{Ta_1} + 3\sigma_{Ta_1}$;
$\mu_{Tb_i} \ge \mu_{Ta_2}$; $T_{Tb_i} \ge \mu_{Ta_2} + 3\sigma_{Ta_2}$;
$\sigma_{Tb_i} = (T_{Tb_i} - \mu_{Tb_i}) / 3$;
$\mu_{tb_i} \le \mu_{ta_1}$; $t_{tb_i} \le \mu_{ta_1} - 3\sigma_{ta_1}$;
$\mu_{tb_i} \le \mu_{ta_2}$; $t_{tb_i} \le \mu_{ta_2} - 3\sigma_{ta_2}$;
$\sigma_{tb_i} = (\mu_{tb_i} - t_{tb_i}) / 3$.
Timing window at the outputs:
$\mu_{Ta_i} = \mu_{Tb_i} + \mu_{d_i}$; $\sigma_{Ta_i} = k(\sigma_{Tb_i} + r \mu_{d_i})$;
$\mu_{ta_i} = \mu_{tb_i} + \mu_{d_i}$; $\sigma_{ta_i} = k(\sigma_{tb_i} + r \mu_{d_i})$.

LP model based on statistical timing: constraints. Gate constraint, linear approximation: $\sigma_{Ta_i} = \sqrt{\sigma_{Tb_i}^2 + (r \mu_{d_i})^2}$ is replaced by $\sigma_{Ta_i} = k(\sigma_{Tb_i} + r \mu_{d_i})$ with $k \in [0.707, 1]$, since $\frac{A + B}{\sqrt{2}} \le \sqrt{A^2 + B^2} \le A + B$; choose $k = 0.85$. Glitch-filtering constraints: $\mu_{W_i} = \mu_{Tb_i} - \mu_{tb_i}$; $\sigma_{W_i} = k(\sigma_{Tb_i} + \sigma_{tb_i})$; $\mu_{d_i} - \mu_{W_i} > 3k(\sigma_{W_i} + r \mu_{d_i})$. [Figure: distribution of $d_i - W_i$ with its $3\sigma$ margin.] Circuit delay constraint: $\mu_{Ta_i}(1 + 3r) \le D_{max}$.
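The range of $k$ follows from comparing the root-sum-square with the plain sum. The short Python sketch below scans that ratio and compares the exact and linearized standard deviations; the function names and the sample values are illustrative assumptions.

```python
import math


def exact_sigma(sigma_tb, r, mu_d):
    """Root-sum-square combination: sqrt(sigma_Tb^2 + (r * mu_d)^2)."""
    return math.hypot(sigma_tb, r * mu_d)


def linear_sigma(sigma_tb, r, mu_d, k=0.85):
    """Linear approximation used in the LP: k * (sigma_Tb + r * mu_d)."""
    return k * (sigma_tb + r * mu_d)


def rss_to_sum_ratio(a, b):
    """sqrt(a^2 + b^2) / (a + b); this is the ideal k for a given pair."""
    return math.hypot(a, b) / (a + b)


# Scan pairs (a, 1) from very unequal to equal: the ideal k moves from about 1 to 0.707.
ratios = [rss_to_sum_ratio(a / 100.0, 1.0) for a in range(0, 101)]
print(min(ratios), max(ratios))

# Hypothetical values: sigma_Tb = 0.3, r = 0.15, mu_d = 2.0.
print(exact_sigma(0.3, 0.15, 2.0), linear_sigma(0.3, 0.15, 2.0))
```

A single constant $k$ cannot be exact for every pair, so a mid-range value such as 0.85 trades a small overestimate for equal terms against a small underestimate for very unequal ones.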

LP model based on statistical timing: parameters and objective. Parameters: $r$, the $\sigma / d_{nom,i}$ ratio; $D_{max}$, the circuit delay parameter; $\alpha$, the optimism factor, applied as $\mu_{d_i} - \mu_{W_i} > 3k(\sigma_{W_i} + r \mu_{d_i})\,\alpha$. With $\alpha = 1$ there is no relaxation; $\alpha < 1$ is optimistic about the actual glitch width; $\alpha = 0$ reduces to the previous model. Objective: minimize the number of buffers inserted, expressed as the sum of buffer delays.

Input-specific optimization. Motivation: the previous LP models guarantee glitch filtering for any input vector sequence ($T_i - t_i < d_i$ for all gates). This is redundant optimization: more buffers are inserted, which increases the power and area overhead. In reality, a circuit operates in an embedded environment, so it can be optimized only for the input vector sequences that can actually occur, e.g., functional vectors. The result is the same reduction in power dissipation with less overhead.

Input-specific optimization. Glitch generation pattern: an input vector pair that can potentially generate a glitch. [Figure: AND gate example with input transition pairs that can produce an output glitch.] Glitch generation probability $P_g[i]$: the probability that a glitch-generation pattern occurs at the inputs of gate $i$, i.e., that the steady-state signal values match the pattern.

Input-specific optimization: application to the previous model without process variation. Static optimization: only static glitches/hazards are considered. Relaxation of constraints: relax the glitch-filtering constraints where glitches are unlikely to happen, $T_i - t_i < d_i \Rightarrow (T_i - t_i)\,\beta_i < d_i$. Selective relaxation: $\beta_i = 0$ if $P_g[i] = 0$, and $\beta_i = 1$ if $P_g[i] > 0$. Generalized relaxation: $\beta_i = 1 - e^{-P_g[i]/\tau}$.
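The two relaxation rules are easy to compare side by side. The Python sketch below is illustrative only: the function names and the sample values of $P_g[i]$ and $\tau$ are assumptions, and the generalized rule is taken as $\beta_i = 1 - e^{-P_g[i]/\tau}$.

```python
import math


def beta_selective(p_g):
    """Selective relaxation: drop the constraint only when no glitch can occur."""
    return 0.0 if p_g == 0 else 1.0


def beta_generalized(p_g, tau):
    """Generalized relaxation: beta = 1 - exp(-P_g / tau), tau > 0."""
    return 1.0 - math.exp(-p_g / tau)


def relaxed_constraint_holds(T, t, d, beta):
    """Relaxed glitch-filtering constraint (T - t) * beta < d."""
    return (T - t) * beta < d


# Hypothetical gate: window width 4, delay 2, so the unrelaxed constraint fails.
for p_g in (0.0, 0.01, 0.2):
    b_sel = beta_selective(p_g)
    b_gen = beta_generalized(p_g, tau=0.05)
    print(p_g, b_sel, round(b_gen, 3),
          relaxed_constraint_holds(5.0, 1.0, 2.0, b_sel),
          relaxed_constraint_holds(5.0, 1.0, 2.0, b_gen))
```

A small $\tau$ makes the generalized rule behave like the selective one, while a larger $\tau$ relaxes gates with low glitch-generation probability more aggressively.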

Input-specific optimization: application to the process-variation-resistant LP model based on statistical timing. Static optimization. Relaxation of constraints: $\mu_{d_i} > [\mu_{W_i} + 3k(\sigma_{W_i} + r \mu_{d_i})\,\alpha]\,\beta_i$, with selective or generalized relaxation. Tuning factor ($TF$): the original objective is to minimize $\sum_j d_j$ ($j \in$ buffers); the current objective is to minimize $\sum_j d_j + TF \cdot \frac{1}{N} \sum_i d_i$ ($j \in$ buffers, $i \in$ other gates).

Input-specific optimization: why a tuning factor is needed. A dominating path affects the critical delay distribution. [Figure: example circuit with a dominating path of delay 41 and a delay annotated "can be [1, 41]".]

Experimental results: experimental procedure. [Figure: flow chart with the blocks circuit; parameters $D_{max}$, $r$; data extraction; constraint set data; LP models; AMPL; gate delays; circuit generation; optimized circuit; logic simulations; results.] Power estimation: event-driven logic simulation; fanout-weighted sum of switching activities; variations of $C_L$ and $V_{dd}$ ignored; Monte Carlo simulation with 1,000 samples of delays under process variation. Results analysis: Un-Opt, unit-delay circuit; Opt, previous optimization; Opt1, process-variation-resistant optimization with worst-case timing; Opt2, process-variation-resistant optimization with statistical timing.
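The Monte Carlo step draws a set of gate delays per sample from the random delay model $D_{total,i} = D_{nom,i} + \Delta D_{inter,i} + \Delta D_{intra,i}$. The Python sketch below shows one way such samples could be generated; it is not the author's tool. The truncation at $3\sigma$, the shared relative inter-die shift per die, the gate names, and the function names are all assumptions.

```python
import random


def truncated_gauss(rng, sigma, bound=3.0):
    """Zero-mean normal sample, redrawn until it lies within +/- bound*sigma."""
    while True:
        x = rng.gauss(0.0, sigma)
        if abs(x) <= bound * sigma:
            return x


def sample_die(nominal, r_inter, r_intra, rng):
    """One delay sample per gate for a single die.

    nominal: dict gate -> nominal delay.
    r_inter, r_intra: sigma/d_nom ratios for inter-die and intra-die variation.
    The inter-die shift is drawn once and shared by all gates (assumption);
    the intra-die shift is drawn independently for each gate.
    """
    inter = truncated_gauss(rng, r_inter)
    return {g: d * (1.0 + inter + truncated_gauss(rng, r_intra))
            for g, d in nominal.items()}


rng = random.Random(2005)                      # fixed seed for repeatability
nominal = {"g1": 1.0, "g2": 2.5, "buf1": 0.5}  # hypothetical delays
samples = [sample_die(nominal, 0.05, 0.15, rng) for _ in range(1000)]
mean_g2 = sum(s["g2"] for s in samples) / len(samples)
print(len(samples), mean_g2)
```

Each sampled delay set would then be fed to the event-driven logic simulator to obtain one point of the power distribution.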

Experimental results, small variation: power dissipation under no process variation.

Circuit  UnOpt  | Opt (w/o proc. var.)   | Opt1 (worst-case proc.) | Opt2 (statistical proc.)
         Pwr.   | Pwr.   Buf.  maxdelay  | Pwr.   Buf.  Dmax       | Pwr.   Buf.  Dmax
c432     1.0    | 0.74     95    17      | 0.74     96    20       | 0.74     99    20
c432     1.0    | 0.74     66    34      | 0.74     91    40       | 0.74     91    40
c499     1.0    | 0.94     80    11      | 0.94     88    13       | 0.94     97    13
c499     1.0    | 0.94     48    22      | 0.94     88    26       | 0.94    129    26
c880     1.0    | 0.54     63    24      | 0.54     45    28       | 0.54     76    28
c880     1.0    | 0.54     29    72      | 0.54     37    83       | 0.54     37    83
c1355    1.0    | 0.93    224    24      | 0.93    296    28       | 0.93    305    28
c1355    1.0    | 0.93    160    72      | 0.93    296    83       | 0.93    273    83
c1908    1.0    | 0.53     84    40      | 0.53     68    46       | 0.52    136    46
c1908    1.0    | 0.55     54   120      | 0.53     92   138       | 0.52    198   138
c2670    1.0    | 0.74    157    32      | 0.79    244    37       | 0.73    313    37
c2670    1.0    | 0.74     26    96      | 0.75     80   111       | 0.73    168   111
c3540    1.0    | 0.60    219    47      | 0.59    228    55       | 0.59    306    55
c3540    1.0    | 0.59    103   141      | 0.61    152   163       | 0.59    303   163
c5315    1.0    | 0.56    281    49      | 0.62    228    57       | 0.55    401    57
c5315    1.0    | 0.56    113   147      | 0.58    130   170       | 0.55    460   170
c6288    1.0    | 0.13    881   124      | 0.15    801   143       | 0.14   1685   143
c6288    1.0    | 0.13    864   372      | 0.14    922   428       | 0.13   1213   428
c7552    1.0    | 0.52    369    43      | 0.64    180    50       | 0.52    464    50
c7552    1.0    | 0.52     62   129      | 0.56    162   149       | 0.52    879   149

Experimental results, small variation: power distribution under 5% inter-die, 5% intra-die variation. Each entry gives mean power and maximum deviation (%).

Circuit  Maxdelay | Un-Opt        | Opt (w/o proc. var.) | Opt1 (worst-case proc.) | Opt2 (statistical proc.)
                  | Mean   Dev.   | Mean   Dev.          | Mean   Dev.             | Mean   Dev.
c432       17     | 1.08   17.5   | 0.78    12.8         | 0.75     7.0            | 0.75    4.5
c432       34     | 1.08   17.5   | 0.76     8.2         | 0.74     0.1            | 0.74    0.1
c499       11     | 1.06   12.9   | 1.00    12.6         | 0.95     0.7            | 0.95    0.7
c499       22     | 1.06   12.9   | 0.99    12.6         | 0.94     0.0            | 0.94    0.1
c880       24     | 1.03    7.1   | 0.62    23.1         | 0.58    13.9            | 0.55    7.5
c880       72     | 1.03    7.1   | 0.57    12.8         | 0.55     1.1            | 0.54    1.0
c1355      24     | 1.10   18.1   | 0.99    10.6         | 0.96     5.5            | 0.95    4.2
c1355      72     | 1.10   18.1   | 0.98     8.8         | 0.93     0.3            | 0.93    0.1
c1908      40     | 1.15   21.0   | 0.64    28.6         | 0.62    22.8            | 0.58   21.6
c1908     120     | 1.15   21.0   | 0.64    21.5         | 0.54     5.9            | 0.54    6.5
c2670      32     | 1.17   21.8   | 0.80    11.6         | 0.81     5.5            | 0.75    4.8
c2670      96     | 1.17   21.8   | 0.77     6.1         | 0.78     5.2            | 0.74    1.8
c3540      47     | 1.15   18.9   | 0.66    15.2         | 0.65    12.9            | 0.63    9.7
c3540     141     | 1.15   18.9   | 0.62     7.2         | 0.63     5.1            | 0.59    1.3
c5315      49     | 1.12   14.9   | 0.62    13.8         | 0.67     9.9            | 0.59    9.1
c5315     147     | 1.12   14.9   | 0.60    10.3         | 0.61     6.8            | 0.56    3.7
c6288     124     | 1.46   49.9   | 0.27   131.6         | 0.28   105.9            | 0.24   93.6
c6288     372     | 1.46   49.9   | 0.26   128.3         | 0.23    76.8            | 0.18   56.0
c7552      43     | 1.17   19.6   | 0.57    12.4         | 0.72    13.3            | 0.57   11.8
c7552     129     | 1.17   19.6   | 0.56     9.3         | 0.58     5.1            | 0.53    3.5

Experimental results, small variation: power timing analysis. [Figure: example for c432 with maxdelay = 17 and maxdelay = 26.] Complete suppression of power variation.

Experimental results, small variation: critical delay distribution. [Figure: nominal delay and maximum deviation for each circuit.] Similar nominal delay; variation reduced by Opt2 for c880, c2670, c5315, and c7552.

Experimental results, large variation: power dissipation under no process variation.

Circuit  Un-opt | Opt (w/o proc. var.)   | Opt1 (worst-case proc.) | Opt2 (statistical proc.)
         Pwr.   | Pwr.   Buf.  maxdelay  | Pwr.   Buf.  Dmax       | Pwr.   Buf.  Dmax
c432     1.00   | 0.74     66    34      | 0.75     87    50       | 0.74     88    50
c432     1.00   | 0.74     58    68      | 0.74     81    99       | 0.74    106    99
c499     1.00   | 0.94     48    22      | 0.97     88    32       | 0.94     88    32
c499     1.00   | 0.94      0    33      | 0.97      0    48       | 0.94    129    48
c880     1.00   | 0.54     35    48      | 0.58     36    70       | 0.54     57    70
c880     1.00   | 0.54     30   120      | 0.59     29   174       | 0.54     62   174
c1355    1.00   | 0.93    192    48      | 0.95    264    70       | 0.93    305    70
c1355    1.00   | 0.93    128   120      | 0.96    264   174       | 0.93    305   174
c1908    1.00   | 0.53     62    80      | 0.55     41   116       | 0.52    135   116
c1908    1.00   | 0.54     34   200      | 0.56     12   290       | 0.52    190   290
c2670    1.00   | 0.74     34    64      | 0.80     39    93       | 0.74    249    93
c2670    1.00   | 0.74      9   160      | 0.78     95   232       | 0.73    211   232
c3540    1.00   | 0.59    139    94      | 0.62    149   137       | 0.59    281   137
c3540    1.00   | 0.59     78   235      | 0.65     52   341       | 0.59    311   341
c5315    1.00   | 0.56    167    98      | 0.66     93   143       | 0.55    399   143
c5315    1.00   | 0.56     53   245      | 0.60    144   356       | 0.55    418   356
c6288    1.00   | 0.13    870   228      | 0.14   1303   331       | 0.13   1121   331
c6288    1.00   | 0.13    857   620      | 0.13    939   899       | 0.13   1473   899
c7552    1.00   | 0.52     91    86      | 0.69     64   125       | 0.52    481   125
c7552    1.00   | 0.52     44   215      | 0.60    622   312       | 0.52    645   312

Experimental results, large variation: power distribution under 15% intra-die and 5% inter-die variation. Each entry gives mean power and maximum deviation (%).

Circuit  Maxdelay | Un-opt        | Opt (w/o proc. var.) | Opt1 (worst-case proc.) | Opt2 (statistical proc.)
                  | Mean   Dev.   | Mean   Dev.          | Mean   Dev.             | Mean   Dev.
c432       34     | 1.09   19.8   | 0.78    12.6         | 0.78    12.1            | 0.76    11.1
c432       68     | 1.09   19.8   | 0.77    10.3         | 0.75     6.1            | 0.74     3.7
c499       22     | 1.07   14.0   | 1.02    15.3         | 0.98     1.7            | 0.95     2.0
c499       33     | 1.07   14.0   | 0.99    10.2         | 0.97     1.4            | 0.95     1.0
c880       48     | 1.04    8.0   | 0.62    26.5         | 0.63    15.7            | 0.59    18.2
c880      120     | 1.04    8.0   | 0.60    22.7         | 0.60     5.6            | 0.55     8.6
c1355      48     | 1.13   21.8   | 1.06    19.7         | 0.98     7.3            | 0.98    10.2
c1355     120     | 1.13   21.8   | 1.05    18.8         | 0.97     1.7            | 0.94     3.0
c1908      80     | 1.16   23.1   | 0.72    49.6         | 0.66    30.1            | 0.64    35.8
c1908     200     | 1.16   23.1   | 0.66    32.3         | 0.62    18.8            | 0.58    21.4
c2670      64     | 1.19   25.4   | 0.81    13.6         | 0.90    16.0            | 0.80    13.6
c2670     160     | 1.19   25.4   | 0.80    11.2         | 0.82     8.6            | 0.76     6.2
c3540      94     | 1.16   20.7   | 0.67    19.5         | 0.69    16.9            | 0.66    17.8
c3540     235     | 1.16   20.7   | 0.66    16.1         | 0.71    11.7            | 0.62    10.1
c5315      98     | 1.13   16.5   | 0.67    24.6         | 0.74    16.3            | 0.63    20.8
c5315     245     | 1.13   16.5   | 0.64    19.0         | 0.66    13.9            | 0.60    13.4
c6288     228     | 1.45   52.2   | 0.43   274.3         | 0.36   193.4            | 0.38   223.8
c6288     620     | 1.45   52.2   | 0.41   264.0         | 0.31   161.5            | 0.26   125.3
c7552      86     | 1.17   21.9   | 0.64    25.8         | 0.78    16.0            | 0.59    18.7
c7552     215     | 1.17   21.9   | 0.60    20.2         | 0.65    11.2            | 0.56    11.8

Experimental results, large variation: critical delay distribution. [Figure: nominal delay and maximum deviation (%) for each circuit.] Similar nominal delay; delay variation reduced by Opt2.

Experimental results, input-specific optimization: application to Opt under no process variation (IS-Opt).

Circuit  maxdelay  Un-Opt | Opt (w/o proc. var.)   | IS-Opt (input-specific, w/o proc. var.)
                   Pwr.   | Pwr.   Delay  Buffers  | Pwr.   Delay  Buffers
c432       34      1.0    | 0.74     34      66    | 0.74     35      66
c432       68      1.0    | 0.74     68      58    | 0.74     69      41
c499       22      1.0    | 0.94     22      48    | 0.94     22      33
c499       33      1.0    | 0.94     33       0    | 0.95     33       0
c880       48      1.0    | 0.54     51      35    | 0.54     49      32
c880      120      1.0    | 0.54    121      30    | 0.54    122      24
c1355      48      1.0    | 0.93     48     192    | 0.93     48     113
c1355     120      1.0    | 0.93    121     128    | 0.93    120      25
c1908      80      1.0    | 0.53     82      62    | 0.54     86      52
c1908     200      1.0    | 0.54    203      34    | 0.53    204       3
c2670      64      1.0    | 0.74     65      34    | 0.74     66      30
c2670     160      1.0    | 0.74    163       9    | 0.74    162       1
c3540      94      1.0    | 0.59     95     139    | 0.59    101     122
c3540     235      1.0    | 0.59    239      78    | 0.59    239      73
c5315      98      1.0    | 0.56    100     167    | 0.56    104     170
c5315     245      1.0    | 0.56    249      53    | 0.56    250      52
c6288     228      1.0    | 0.13    226     870    | 0.13    228     870
c6288     620      1.0    | 0.13    620     857    | 0.13    620     853
c7552      86      1.0    | 0.52     89      91    | 0.52     88      84
c7552     215      1.0    | 0.52    220      44    | 0.52    221      38

Experimental results, input-specific optimization: application to Opt2 under process variation (IS-Opt2), with 15% intra-die and 5% inter-die variation.

Cir.    Dmax  Un-opt.   | Opt2 (statistical proc.)                 | IS-Opt2 (input-specific statistical proc.)
              Nom. Pwr. | Nom. Pwr.  Mean Pwr.  Max Dev. (%)  Buf.  | Nom. Pwr.  Mean Pwr.  Max Dev. (%)  Buf.
c432     50   1.0       | 0.74       0.76        11.1          88   | 0.74       0.76         9.3          81
c432     99   1.0       | 0.74       0.74         3.7         106   | 0.74       0.74         3.3          76
c499     32   1.0       | 0.94       0.95         2.0          88   | 0.94       0.95         1.9          88
c499     48   1.0       | 0.94       0.95         1.0         129   | 0.94       0.95         1.8          58
c880     70   1.0       | 0.54       0.59        18.2          57   | 0.54       0.59        20.4          38
c880    174   1.0       | 0.54       0.55         8.6          62   | 0.54       0.56         9.0          38
c1355    70   1.0       | 0.93       0.98        10.2         305   | 0.93       1.01        13.1         253
c1355   174   1.0       | 0.93       0.94         3.0         305   | 0.93       0.95         4.7         160
c1908   116   1.0       | 0.52       0.64        35.8         135   | 0.52       0.64        34.7         107
c1908   290   1.0       | 0.52       0.58        21.4         190   | 0.52       0.57        18.4         104
c2670    93   1.0       | 0.74       0.80        13.6         249   | 0.73       0.79        11.3         186
c2670   232   1.0       | 0.73       0.76         6.2         211   | 0.73       0.75         4.3          79
c3540   137   1.0       | 0.59       0.66        17.8         281   | 0.59       0.65        15.6         247
c3540   341   1.0       | 0.59       0.62        10.1         311   | 0.59       0.61         7.4         188
c5315   143   1.0       | 0.55       0.63        20.8         399   | 0.55       0.63        21.0         389
c5315   356   1.0       | 0.55       0.60        13.4         418   | 0.55       0.60        13.2         413
c6288   331   1.0       | 0.13       0.38       223.8        1121   | 0.13       0.38       225.2        1115
c6288   899   1.0       | 0.13       0.26       125.3        1473   | 0.13       0.26       125.5        1243
c7552   125   1.0       | 0.52       0.59        18.7         481   | 0.52       0.58        18.1         389
c7552   312   1.0       | 0.52       0.56        11.8         645   | 0.52       0.55        10.9         520

Experimental results, input-specific optimization: trade-off by generalized relaxation. [Figure: c432 circuit with varying $\tau$ value.] The number of buffers is reduced at the cost of a degraded power distribution.

Experimental results, input-specific optimization: critical delay. [Figure: nominal delay and maximum deviation for each circuit.] Similar performance for Opt2 and IS-Opt2.

Conclusions. Proposed a dynamic power optimization technique that is resistant to process variation. Process variation is considered in terms of delay variations, both inter-die and intra-die. Proved that inter-die variation has a negligible effect on switching activity and power. Constructed two new LP models, based on worst-case timing analysis and on statistical timing analysis. Input-specific optimization reduces the number of buffers by optimizing the circuit for a certain input vector sequence. Experimental results: complete suppression of power variation for small circuits and small variations; significant reduction of power and delay variations for larger circuits and variations (53% reduction in power deviation and 40% reduction in delay deviation under 15% intra-die and 5% inter-die variation); input-specific optimization reduces the trade-off (buffers) significantly with equivalent power and delay performance (IS-Opt2 vs. Opt2: up to 63% reduction in buffers).

Questions. For further questions, contact me at hufei01@auburn.edu