CMPEN 411 VLSI Digital Circuits, Spring 2012
Lecture 14: Designing for Low Power
[Adapted from Rabaey's Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp12 CMPEN 411 L14 S.1

Reminders
- Next lecture: Dynamic logic
  - Reading assignment: Rabaey, et al., 6.3

Review: CMOS Power Equations
$P = C_L V_{DD}^2 f + t_{sc} V_{DD} I_{peak} f + V_{DD} I_{leak}$
- $C_L V_{DD}^2 f$: dynamic power
- $t_{sc} V_{DD} I_{peak} f$: short-circuit power
- $V_{DD} I_{leak}$: leakage power
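
As a rough illustration of the three terms in the power equation, the sketch below evaluates each one separately. The function name `cmos_power` and all numeric values are hypothetical and chosen only to show the relative magnitudes; they do not come from the lecture.

```python
def cmos_power(c_load, v_dd, freq, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage, total) power in watts."""
    dynamic = c_load * v_dd ** 2 * freq          # C_L * V_DD^2 * f
    short_circuit = t_sc * v_dd * i_peak * freq  # t_sc * V_DD * I_peak * f
    leakage = v_dd * i_leak                      # V_DD * I_leak
    return dynamic, short_circuit, leakage, dynamic + short_circuit + leakage


# Hypothetical operating point, for illustration only.
dyn, sc, leak, total = cmos_power(
    c_load=10e-15,   # 10 fF load
    v_dd=1.2,        # 1.2 V supply
    freq=1e9,        # 1 GHz switching frequency
    t_sc=20e-12,     # 20 ps short-circuit interval
    i_peak=50e-6,    # 50 uA peak short-circuit current
    i_leak=10e-9,    # 10 nA leakage current
)
print(f"dynamic={dyn:.3e} W, short-circuit={sc:.3e} W, leakage={leak:.3e} W")
```

With these assumed numbers the dynamic term dominates, which is why most of the techniques that follow target it first.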

Power and Energy Design Space

Energy            | Constant Throughput/Latency: Design Time | Constant Throughput/Latency: Non-active Modules | Variable Throughput/Latency: Run Time
Active (Dynamic)  | Logic design, Reduced $V_{dd}$, Transistor sizing, Multi-$V_{dd}$ | Clock gating | DFS, DVS (Dynamic Frequency, Voltage Scaling)
Leakage (Standby) | Multi-$V_T$, Stack effect, Pin ordering | Sleep transistors, Multi-$V_{dd}$, Variable $V_T$, Input control | Variable $V_T$

Transistor Sizing for Minimum Energy
- Device sizing COMBINED with supply voltage reduction is a very effective way to reduce the energy consumption of a logic network
- Device sizing affects dynamic energy consumption
  - The gain is largest for networks with large overall effective fan-outs ($F = C_L / C_{g,1}$)

Dynamic Power Consumption is Data Dependent
- Switching activity, $P_{0 \to 1}$, has two components
  - A static component: a function of the logic topology
  - A dynamic component: a function of the timing behavior (glitching)

2-input NOR gate:
A | B | Out
0 | 0 | 1
0 | 1 | 0
1 | 0 | 0
1 | 1 | 0

- Static transition probability: $P_{0 \to 1} = P_{out=0} \times P_{out=1} = P_0 \times (1 - P_0)$
- With input signal probabilities $P_{A=1} = 1/2$ and $P_{B=1} = 1/2$, the NOR static transition probability is $3/4 \times 1/4 = 3/16$
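
The NOR result can be checked by enumerating the truth table. The following is a minimal sketch, assuming independent inputs that are each 1 with probability 1/2; the helper names `static_transition_probability` and `nor2` are illustrative only.

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, n_inputs=2):
    """P(0->1) = P(out=0) * P(out=1) for independent inputs with P(1) = 1/2."""
    rows = list(product((0, 1), repeat=n_inputs))
    ones = sum(gate(*row) for row in rows)   # rows where the output is 1
    p1 = Fraction(ones, len(rows))
    return (1 - p1) * p1


def nor2(a, b):
    return int(not (a or b))


print(static_transition_probability(nor2))  # 3/16
```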
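
NOR Gate Transition Probabilities
- Switching activity is a strong function of the input signal statistics
  - $P_A$ and $P_B$ are the probabilities that inputs A and B are one
[Figure: 2-input NOR gate with inputs A and B driving load $C_L$, and a plot of the transition probability as a function of $P_A$ and $P_B$, each ranging from 0 to 1]
$P_{0 \to 1} = P_0 \times P_1 = (1 - (1 - P_A)(1 - P_B)) \, (1 - P_A)(1 - P_B)$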

Transition Probabilities for Some Basic Gates
$P_{0 \to 1} = P_{out=0} \times P_{out=1}$

Gate | $P_{0 \to 1}$
NOR  | $(1 - (1 - P_A)(1 - P_B)) \times (1 - P_A)(1 - P_B)$
OR   | $(1 - P_A)(1 - P_B) \times (1 - (1 - P_A)(1 - P_B))$
NAND | $P_A P_B \times (1 - P_A P_B)$
AND  | $(1 - P_A P_B) \times P_A P_B$
XOR  | $(1 - (P_A + P_B - 2 P_A P_B)) \times (P_A + P_B - 2 P_A P_B)$

[Figure: two-stage network with inputs A and B (signal probability 0.5 each), internal node X, and output Z]
- For X: $P_{0 \to 1} = P_0 \times P_1 = (1 - P_A) P_A = 0.5 \times 0.5 = 0.25$
- For Z: $P_{0 \to 1} = P_0 \times P_1 = (1 - P_X P_B) P_X P_B = (1 - (0.5 \times 0.5)) \times (0.5 \times 0.5) = 3/16$
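
The closed-form expressions for the basic gates can be collected into one small helper. This is a sketch under the same independence assumption used on the slide; the dictionary `P_OUT_ONE` and the function `p01` are illustrative names, not part of the lecture.

```python
from fractions import Fraction

# Probability that the gate output is 1, given independent input
# probabilities pa = P(A=1) and pb = P(B=1).
P_OUT_ONE = {
    "NOR":  lambda pa, pb: (1 - pa) * (1 - pb),
    "OR":   lambda pa, pb: 1 - (1 - pa) * (1 - pb),
    "NAND": lambda pa, pb: 1 - pa * pb,
    "AND":  lambda pa, pb: pa * pb,
    "XOR":  lambda pa, pb: pa + pb - 2 * pa * pb,
}


def p01(gate, pa, pb):
    """Static 0->1 transition probability: P(out=0) * P(out=1)."""
    p1 = P_OUT_ONE[gate](pa, pb)
    return (1 - p1) * p1


half = Fraction(1, 2)
for name in P_OUT_ONE:
    print(name, p01(name, half, half))
```

For equiprobable inputs, every gate in the table gives 3/16 except XOR, whose output is 1 half of the time and therefore gives 1/4.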
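
Another Example
[Figure: inputs A and B (signal probability 0.5 each) drive an OR gate with output X; X and a second input with signal probability 0.5 drive an AND gate with output Z]
- For X: $P_{0 \to 1} = (1 - 0.5)(1 - 0.5) \times (1 - (1 - 0.5)(1 - 0.5)) = 3/16$, with signal probability $P_X = 1 - (1 - 0.5)(1 - 0.5) = 3/4$
- For Z: $P_{0 \to 1} = (1 - 3/4 \times 0.5) \times (3/4 \times 0.5) = 15/64 \approx 0.234$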
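
Inter-signal Correlations
- Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
  - Example: reconvergent fan-out
[Figure: A and B (signal probability 0.5 each) drive an OR gate with output X; X and B reconverge at an AND gate with output Z]
- Treating X and B as independent gives, for X, $(1 - 0.5)(1 - 0.5) \times (1 - (1 - 0.5)(1 - 0.5)) = 3/16$ and, for Z, $(1 - 3/4 \times 0.5) \times (3/4 \times 0.5) = 15/64 \approx 0.234$
- Have to use conditional probabilities: $P(Z=1) = P(B=1) \cdot P(X=1 \mid B=1)$
- Notice that Z = (A or B) and B = AB or B = B, so $P_{0 \to 1}$ should be (and is) $1/2 \times 1/2 = 1/4$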
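
The effect of reconvergent fan-out can be seen by comparing the independence-based estimate with an exact enumeration. This is a minimal sketch, assuming X = A OR B and Z = X AND B with equiprobable, independent A and B; the estimate uses the signal probability P(X=1) = 3/4 of the OR output. The helper `exact_p1` is an illustrative name.

```python
from fractions import Fraction
from itertools import product


def exact_p1(fn, n_inputs):
    """Exact P(output = 1) by enumerating equiprobable input combinations."""
    rows = list(product((0, 1), repeat=n_inputs))
    return Fraction(sum(fn(*row) for row in rows), len(rows))


def z(a, b):
    x = a | b      # X = A OR B
    return x & b   # Z = X AND B (B reconverges here)


half = Fraction(1, 2)

# Estimate that wrongly treats X and B as independent.
p_x = 1 - (1 - half) * (1 - half)          # P(X=1) = 3/4
p_z_indep = p_x * half                     # 3/8
p01_indep = (1 - p_z_indep) * p_z_indep    # 15/64

# Exact value: Z simplifies to B, so P(Z=1) = 1/2.
p_z_exact = exact_p1(z, 2)
p01_exact = (1 - p_z_exact) * p_z_exact    # 1/4

print(p01_indep, p01_exact)  # 15/64 1/4
```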
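
Logic Restructuring
- Logic restructuring: changing the topology of a logic network to reduce transitions
- AND: $P_{0 \to 1} = P_0 \times P_1 = (1 - P_A P_B) \times P_A P_B$
[Figure: chain and tree implementations of a 4-input AND of A, B, C, D, each input with signal probability 0.5]
- Chain implementation: W = A·B with $P_{0 \to 1} = (1 - 0.25) \times 0.25 = 3/16$; X = W·C with $7/64$; F = X·D with $15/256$
- Tree implementation: Y = A·B with $3/16$; Z = C·D with $3/16$; F = Y·Z with $15/256$
- Chain implementation has a lower overall switching activity than the tree implementation for random inputs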
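
The chain-versus-tree comparison can be reproduced by summing the transition probabilities of every node. This is a sketch assuming independent inputs with signal probability 1/2 and 2-input AND gates throughout; `and_p1` and `p01` are illustrative helper names.

```python
from fractions import Fraction


def and_p1(pa, pb):
    """Signal probability of a 2-input AND output (independent inputs)."""
    return pa * pb


def p01(p1):
    """Static 0->1 transition probability for a node with P(1) = p1."""
    return (1 - p1) * p1


half = Fraction(1, 2)

# Chain: W = A.B, X = W.C, F = X.D
w = and_p1(half, half)
x = and_p1(w, half)
f_chain = and_p1(x, half)
chain_total = p01(w) + p01(x) + p01(f_chain)

# Tree: Y = A.B, Z = C.D, F = Y.Z
y = and_p1(half, half)
z = and_p1(half, half)
f_tree = and_p1(y, z)
tree_total = p01(y) + p01(z) + p01(f_tree)

print(chain_total, tree_total)  # 91/256 111/256
```

The output node F has the same activity in both topologies; the difference comes from the intermediate nodes.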
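
Input Ordering
[Figure: two orderings of a two-stage AND network computing F from A, B, and C, with signal probabilities $P_A = 0.5$, $P_B = 0.2$, $P_C = 0.1$]
- A and B enter first (X = A·B), then C: for X, $(1 - 0.5 \times 0.2) \times (0.5 \times 0.2) = 0.09$
- B and C enter first (X = B·C), then A: for X, $(1 - 0.2 \times 0.1) \times (0.2 \times 0.1) = 0.0196$
- Which is better wrt transition probabilities?
- Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)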
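
The two orderings can be compared numerically. The sketch below assumes independent inputs and 2-input AND gates, with the signal probabilities from the slide (A = 0.5, B = 0.2, C = 0.1); `and_p01` is an illustrative helper name.

```python
from fractions import Fraction


def and_p01(pa, pb):
    """0->1 transition probability at the output of a 2-input AND gate."""
    p1 = pa * pb
    return (1 - p1) * p1


p_a, p_b, p_c = Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)

# Ordering 1: X = A.B, then F = X.C
x_first = and_p01(p_a, p_b)
# Ordering 2: X = B.C, then F = X.A
x_second = and_p01(p_b, p_c)

# F = A.B.C in both cases, so its activity does not depend on the ordering.
f_activity = and_p01(p_a * p_b, p_c)

print(float(x_first), float(x_second), float(f_activity))  # 0.09 0.0196 0.0099
```

Only the internal node X changes between the two orderings, so introducing the 0.5-probability input A last lowers the total activity.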

Glitching in Static CMOS Networks
- Gates have a nonzero propagation delay, resulting in spurious transitions or glitches (dynamic hazards)
  - Glitch: a node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: logic network with inputs A, B, C, internal node X, and output Z; unit-delay waveforms of X and Z for the input transition ABC = 101 to 000]
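
A unit-delay simulation shows how such a glitch arises. The gate types are not recoverable from the slide text, so the sketch below ASSUMES a two-NOR network, X = NOR(A, B) and Z = NOR(X, C); the function `simulate_unit_delay` is an illustrative name.

```python
def nor(a, b):
    return int(not (a or b))


def simulate_unit_delay(old_inputs, new_inputs, steps=4):
    """Assumed network: X = NOR(A, B), Z = NOR(X, C); each gate has delay 1."""
    a, b, c = old_inputs
    x = nor(a, b)
    z = nor(x, c)              # steady state before the inputs change
    a, b, c = new_inputs       # inputs switch at t = 0
    trace = [(0, x, z)]
    for t in range(1, steps + 1):
        # Each output at time t is computed from values at time t - 1;
        # the tuple assignment evaluates both gates before updating.
        x, z = nor(a, b), nor(x, c)
        trace.append((t, x, z))
    return trace


trace = simulate_unit_delay((1, 0, 1), (0, 0, 0))
print(trace)  # [(0, 0, 0), (1, 1, 1), (2, 1, 0), (3, 1, 0), (4, 1, 0)]
```

Under this assumption Z starts and ends at 0 but pulses to 1 for one gate delay, because it responds to C falling before the updated value of X arrives.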

Glitching in an RCA
[Figure: sum output voltage (V) versus time (ps) for a ripple-carry adder, showing Cin and sum bits S0 through S15]

Balanced Delay Paths to Reduce Glitching
- Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
[Figure: two arrangements of gates F1, F2, and F3, annotated with signal arrival times]
- So equalize the lengths of timing paths through logic

Dynamic Power as a Function of $V_{DD}$
- Decreasing $V_{DD}$ decreases dynamic energy consumption (quadratically)
- But it increases gate delay (decreases performance)
[Figure: plot against $V_{DD}$ (V) over the range 0.8 V to 2.4 V]
- Determine the critical path(s) at design time and use high $V_{DD}$ for the transistors on those paths for speed. Use a lower $V_{DD}$ on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
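
The quadratic dependence can be made concrete with a short calculation. This sketch uses the per-transition dynamic energy $C_L V_{DD}^2$ from the power equation; the function names and the supply and load values are hypothetical.

```python
def dynamic_energy(c_load, v_dd):
    """Dynamic energy per switching cycle: C_L * V_DD^2 (joules)."""
    return c_load * v_dd ** 2


def energy_ratio(v_low, v_high):
    """Dynamic energy at v_low relative to v_high for the same load."""
    return (v_low / v_high) ** 2


# Hypothetical supplies: halving V_DD cuts dynamic energy to one quarter.
print(energy_ratio(1.2, 2.4))

# The absolute saving scales with the load, which is why gates driving
# large capacitances benefit most from the lower supply.
for c_load in (5e-15, 50e-15):  # hypothetical 5 fF and 50 fF loads
    saved = dynamic_energy(c_load, 2.4) - dynamic_energy(c_load, 1.2)
    print(f"C_L={c_load:.0e} F: saves {saved:.3e} J per cycle")
```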

Multiple $V_{DD}$ Considerations
- How many $V_{DD}$? Two is becoming common
  - Many chips already have two supplies (one for core and one for I/O)
- When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
  - If a gate supplied with $V_{DDL}$ drives a gate at $V_{DDH}$, the PMOS never turns off
    - The cross-coupled PMOS transistors do the level conversion
    - The NMOS transistors operate on a reduced supply
  - Level converters are not needed for a step-down change in voltage
  - Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flip-flop (see Figure 11.47)
[Figure: level converter circuit with input $V_{in}$, supplies $V_{DDH}$ and $V_{DDL}$, and output $V_{out}$]
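
The step-up rule can be expressed as a simple check over driver/load pairs. The sketch below is illustrative only: the netlist, the gate names, the supply values, and the function `level_converter_edges` are all hypothetical.

```python
def level_converter_edges(supply, edges):
    """Return the (driver, load) pairs that need a level converter.

    A converter is needed only for a step-up: a driver on a lower supply
    feeding a load on a higher supply. Step-down needs none.
    """
    return [(drv, load) for drv, load in edges if supply[drv] < supply[load]]


# Hypothetical path: one high-supply gate, two low-supply gates, and a
# flip-flop on the high supply at the end of the path.
supply = {"g1": 1.8, "g2": 1.2, "g3": 1.2, "ff": 1.8}
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "ff")]

print(level_converter_edges(supply, edges))  # [('g3', 'ff')]
```

In this example the only step-up falls at the flip-flop input, which is the case where the conversion can be embedded in the register.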

Dual-Supply Inside a Logic Block
- Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
- Clustered voltage-scaling
  - Each path starts with $V_{DDH}$ and switches to $V_{DDL}$ (gray logic gates) when delay slack is available
  - Level conversion is done in the flip-flops at the end of the paths

Stack Effect
- Subthreshold leakage is a function of the circuit topology and the value of the inputs
$V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$
  where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$; $V_{SB}$ is the source-bulk (substrate) voltage; $\gamma$ is the body-effect coefficient
[Figure: 2-input gate with inputs A and B, output Out, and internal stack node $V_X$]
- Leakage is least when A = B = 0
- Leakage reduction due to stacked transistors is called the stack effect
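
The body-effect equation can be evaluated directly to see how a nonzero source-bulk voltage raises the threshold. This is a sketch only: the default parameter values ($V_{T0}$ = 0.43 V, $\gamma$ = 0.4 V^0.5, $\phi_F$ = -0.3 V) are assumed for illustration and are not taken from the slide.

```python
import math


def threshold_voltage(v_sb, v_t0=0.43, gamma=0.4, phi_f=-0.3):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|)).

    Default parameters are assumed, illustrative values.
    """
    return v_t0 + gamma * (
        math.sqrt(abs(-2 * phi_f + v_sb)) - math.sqrt(abs(-2 * phi_f))
    )


# In a stack with both inputs low, the internal node V_X sits above ground,
# so the upper transistor sees V_SB > 0 and a higher threshold.
for v_sb in (0.0, 0.05, 0.1, 0.2):  # hypothetical V_X values in volts
    print(f"V_SB={v_sb:.2f} V -> V_T={threshold_voltage(v_sb):.4f} V")
```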

Leakage as a Function of Design Time $V_T$
- Reducing the $V_T$ increases the subthreshold leakage current (exponentially)
  - A 90 mV reduction in $V_T$ increases leakage by an order of magnitude
- But reducing $V_T$ decreases gate delay (increases performance)
[Figure: $I_D$ (A) versus $V_{GS}$ (V) from 0 to 1 V, for $V_T$ = 0.4 V and $V_T$ = 0.1 V]
- Determine the critical path(s) at design time and use low $V_T$ devices on the transistors on those paths for speed. Use a high $V_T$ on the other logic for leakage control.
  - A careful assignment of $V_T$s can reduce the leakage by as much as 80%
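
The "order of magnitude per 90 mV" rule translates into a simple exponential scaling. The sketch below assumes that rule holds uniformly over the whole range, which is a simplification; `leakage_ratio` is an illustrative name.

```python
def leakage_ratio(delta_vt_mv, mv_per_decade=90.0):
    """Factor by which subthreshold leakage grows when V_T drops by delta_vt_mv.

    Assumes one decade of leakage per mv_per_decade of V_T reduction.
    """
    return 10 ** (delta_vt_mv / mv_per_decade)


print(leakage_ratio(90))    # one decade, i.e. about 10x
print(leakage_ratio(300))   # V_T lowered from 0.4 V to 0.1 V
```

Under this assumption, lowering $V_T$ by 300 mV raises leakage by a little over three decades, which is why low-$V_T$ devices are reserved for the critical paths.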

Dual-Thresholds Inside a Logic Block
- Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
- Use lower threshold on timing-critical paths
  - Assignment can be done on a per-gate or per-transistor basis; no clustering of the logic is needed
  - No level converters are needed

IBM Cu11/Cu08 Blue Logic Library
- ASIC Cu11 (130 nm) library: dual-Vt library
  - 2690 total cells in standard cell library
  - Nominal Vt level (~300 mV)
  - Low Vt level (~210 mV)
  - Low-Vt version has same physical footprint
  - ~15% improvement in gate delay
  - ~10x increase in leakage power
- ASIC Cu08 (90 nm) library: multi-Vt library
  - 2118 total cells in standard cell library
  - Intermediate-Vt (AVT) and low-Vt (LVT) version of each cell
  - Two more Vt levels being planned (very low Vt and high Vt)

An example to summarize all design-time techniques
[Figure: example circuit with the critical path marked]

Design Time Low Power Techniques
[Figure: example circuit annotated with Lower Vdd and Higher Vdd regions and a Level Converter]

Design Time Low Power Techniques
[Figure: example circuit annotated with Higher Vth and Lower Vth regions]

Design Time Low Power Techniques
- Stack forcing
[Figure: gate between In and Out with transistors of width W, shown next to a version in which each is replaced by two stacked transistors of width 1/2 W]

Low Power Techniques: Interaction with Each Other
[Figure: example circuit annotated with Higher Vth and Lower Vth regions]
- Apply high Vth and size up to recover speed

Next Lecture and Reminders
- Next lecture: Dynamic logic
  - Reading assignment: Rabaey, et al., 6.3