Power in Digital CMOS Circuits
Mark Horowitz
Computer Systems Laboratory
Stanford University
horowitz@stanford.edu
Copyright 2004 by Mark Horowitz

Fruits of Scaling
[Figure: SpecInt 2000 (log scale, 1 to 1000) versus year, 1985 to 2001. Processors plotted: Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; Mips; HP PA; PowerPC; AMD K6, K7.]

The Darker Side of Scaling - Power
[Figure: power (log scale, 1 to 100) versus year, 1985 to 2001. Processors plotted: Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; Mips; HP PA; PowerPC; AMD K6, K7.]
- At least it is scaling slower than performance

Power is Important
Three reasons we care about power:
1. Need to get the power into the chip
   - 60 W at 1.2 V is many amps
2. Need to get the power out of the chip
   - How low a thermal resistance is possible?
   - Plastic packages without forced air have high thermal resistances
3. Energy is heavy
   - Need to carry the energy
   - 20 Wh/lb
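
The three concerns above reduce to simple arithmetic. The sketch below is illustrative only: the 60 W at 1.2 V and 20 Wh/lb figures come from the slide, while the thermal resistance value and the laptop example are hypothetical numbers chosen for the demonstration.

```python
def supply_current_amps(power_w, vdd_v):
    """Current that must enter the chip: I = P / V."""
    return power_w / vdd_v


def temperature_rise_c(power_w, theta_c_per_w):
    """Temperature rise across a package with thermal resistance theta: dT = P * theta."""
    return power_w * theta_c_per_w


def battery_weight_lb(power_w, hours, wh_per_lb=20.0):
    """Battery weight needed to deliver power_w for the given number of hours."""
    return power_w * hours / wh_per_lb


if __name__ == "__main__":
    # Slide example: 60 W at 1.2 V.
    print("supply current (A):", supply_current_amps(60.0, 1.2))
    # 0.5 C/W is a hypothetical package thermal resistance.
    print("temperature rise (C):", temperature_rise_c(60.0, 0.5))
    # Hypothetical 30 W laptop running for 4 hours at 20 Wh/lb.
    print("battery weight (lb):", battery_weight_lb(30.0, 4.0))
```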

Important Questions
- How did we end up in this situation?
  - Power was not an issue in the mid 80s
  - Scaling theory said power would be constant
  - Energy efficiency would improve
- Is there any hope for the future?
  - More issues that need to be addressed?
  - Silver bullets to solve the power problem?
  - Techniques that will help

Power in CMOS Circuits
- Dynamic power
  - Proportional to C * Vswing * Vdd * F
  - Dominates most circuits
- Static power
  - Idc * Vdd (Idc is usually leakage current now)
  - Has been very small, and is still small
  - Issue when circuit is idle and dynamic power is zero
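
As a rough illustration of the two power components, the sketch below evaluates both expressions. The slide only says dynamic power is proportional to C * Vswing * Vdd * F, so the proportionality constant (for example an activity factor) is taken as 1 here, and all numeric values are hypothetical.

```python
def dynamic_power_w(c_farads, v_swing, vdd, freq_hz):
    """Dynamic power, taking the proportionality constant as 1 (an assumption)."""
    return c_farads * v_swing * vdd * freq_hz


def static_power_w(i_dc_amps, vdd):
    """Static power: DC (mostly leakage) current times the supply voltage."""
    return i_dc_amps * vdd


if __name__ == "__main__":
    # Hypothetical chip: 1 nF switched per cycle, full-swing 1.2 V logic, 1 GHz.
    p_dyn = dynamic_power_w(1e-9, 1.2, 1.2, 1e9)
    # Hypothetical 10 mA of total leakage at the same supply.
    p_stat = static_power_w(10e-3, 1.2)
    print("dynamic power (W):", p_dyn)
    print("static power (W):", p_stat)
```

When the clock stops, the first term drops to zero and only the second remains, which is the idle-circuit case the slide calls out.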

Historical Power Scaling
- In current technology shrinks, X, Y, V all scale
  - Implies that C also scales
- If you scale a chip to a new technology and operate at F
  - C and V both scale, so power DECREASES by α^3
  - Power decreases by 3x for each technology generation
- If you scale a chip to a new technology and operate at F/α
  - C and V both scale down, F scales up
  - So power DECREASES by α^2
  - 1.4 times faster chip, for 1/2 the power
  - Every time you move to a new technology

Power Scaling - Static Power
- X, Y, V all scale
  - Gate oxide is scaling faster than α
  - Now have leakage current through the gate
  - GIDL is another issue
- Scaling Vth
  - Performance depends on Vdd/Vth ratio
  - Transistor leakage tied to Vth
  - Leakage current increases exponentially
- If Vth does not scale
  - Leakage power scales as α^2

$I_{leak} = I_s \, e^{-V_{th} / (\alpha v_T)}$
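
The scaling numbers above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the shrink factor α = 0.7 is inferred from the "1.4 times faster, 1/2 the power" figures rather than stated on the slide, and the 39 mV value for the product α v_T in the leakage exponent (a slope factor of about 1.5 times a 26 mV thermal voltage) is a hypothetical, typical-looking number.

```python
import math


def dynamic_power_ratio(alpha, freq_ratio=1.0):
    """Power ratio after one shrink when C and V both scale by alpha (P ~ C V^2 F)."""
    c_ratio = alpha
    v_ratio = alpha
    return c_ratio * v_ratio ** 2 * freq_ratio


def leakage_growth(delta_vth_v, slope_times_vt_v=0.039):
    """Factor by which I_leak = I_s * exp(-Vth / (alpha * vT)) grows
    when Vth is lowered by delta_vth_v volts."""
    return math.exp(delta_vth_v / slope_times_vt_v)


if __name__ == "__main__":
    alpha = 0.7  # assumed linear shrink per generation
    print("same F:", dynamic_power_ratio(alpha))                # about alpha^3
    print("F / alpha:", dynamic_power_ratio(alpha, 1 / alpha))  # about alpha^2
    print("leakage growth for 100 mV lower Vth:", leakage_growth(0.1))
```

With these assumptions the same-frequency case lands near one third of the original power and the faster case near one half, matching the slide, while a 100 mV reduction in Vth raises leakage by roughly an order of magnitude.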

New Problem - Leakage
- Scaling Vth
  - From the 0.5µ generation, Vdd = 10 V/µ * L
  - Seems like this scaling is still on track
  - Vth was Vth = 1.4 V/µ * L
  - But that would mean that 0.18µ has a 250 mV Vth
    - Leakage issues
- Vdd/Vth of gates is starting to fall
  - Performance of gates will drop
    - Haven't seen this yet, since technologies are pushed
  - Variability of delay will increase
    - Subtracting two large numbers to get overdrive
- Additional leakage paths: gate tunnel current

Dilemma - What Vdd and Vth
- Lower Vth
  - Need less Vdd
  - Less dynamic power
  - More leakage current
- Correct choice depends on operating condition
- Also, at low Vdd to Vth ratios the variation in delay will be larger
[Figure: gate delay versus Vdd (0 to 2), with curves for Vth = .4 and Vth = .2.]
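
The linear rules quoted above, and the "subtracting two large numbers" remark, can be made concrete with a short sketch. The two rule constants come from the slide; the list of channel lengths, the 50 mV threshold shift, and the supply and threshold values in the sensitivity example are hypothetical.

```python
def vdd_rule(l_um):
    """Supply voltage from the slide's rule: Vdd = 10 V/um * L."""
    return 10.0 * l_um


def vth_rule(l_um):
    """Threshold voltage if it had kept scaling: Vth = 1.4 V/um * L."""
    return 1.4 * l_um


def overdrive_error(vdd, vth, dvth):
    """Fractional change in overdrive (Vdd - Vth) caused by a Vth shift of dvth."""
    return dvth / (vdd - vth)


if __name__ == "__main__":
    for l_um in (0.5, 0.35, 0.25, 0.18):  # hypothetical list of generations
        print(l_um, "um: Vdd =", vdd_rule(l_um), "V, Vth =", vth_rule(l_um), "V")
    # Same 50 mV Vth variation, with Vth held at 0.45 V while Vdd falls.
    print("overdrive error at Vdd = 1.8 V:", overdrive_error(1.8, 0.45, 0.05))
    print("overdrive error at Vdd = 1.0 V:", overdrive_error(1.0, 0.45, 0.05))
```

The second pair of prints shows why delay variability grows: the same threshold shift is a larger fraction of the overdrive once Vdd approaches Vth.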

Energy-Delay Trade-offs
- Placed lines of constant delay on top of a contour plot of energy
- Much lower energy/op if you operate at low Vdd, Vth
  - If, of course, the circuit is active
[Figure: contour plot of energy overlaid with lines of constant delay.]

Do We Have a Power Problem?
That depends on your point of view:
- Cost per operation is MUCH cheaper at same F
  - Look at what we do in cell phones / laptop computers
  - This cost will continue to fall
- Cost per operation is cheaper even at higher F
  - Get this reduced cost even when you run the part faster
- But we are greedy
  - Want machine to run faster than technology scaling
  - Build the most complex machines possible
  - This combination means power has been growing
- Power and performance are related
  - Faster often means much more power
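
One way to reproduce the flavor of the energy-delay contour plot is a brute-force sweep over (Vdd, Vth). The sketch below is not the model behind the slide: it assumes an alpha-power-law delay, an exponential leakage term, and normalized constants, all of which are hypothetical choices made only to show the qualitative trend.

```python
import math

SAT_EXP = 1.3   # alpha-power-law exponent (assumed)
N_VT = 0.039    # slope factor times thermal voltage, in volts (assumed)
C_EFF = 1.0     # normalized switched capacitance (assumed)
I0 = 1.0        # normalized leakage prefactor (assumed)


def gate_delay(vdd, vth):
    """Normalized gate delay, alpha-power law: Vdd / (Vdd - Vth)^a."""
    return vdd / (vdd - vth) ** SAT_EXP


def energy_per_op(vdd, vth):
    """Dynamic energy plus leakage energy burned during one gate delay."""
    dynamic = C_EFF * vdd ** 2
    leakage = I0 * math.exp(-vth / N_VT) * vdd * gate_delay(vdd, vth)
    return dynamic + leakage


def best_point(max_delay):
    """Lowest-energy (energy, vdd, vth) on a 10 mV grid meeting the delay limit."""
    best = None
    for i in range(30, 201):          # Vdd from 0.30 V to 2.00 V
        vdd = i / 100
        for j in range(5, 61):        # Vth from 0.05 V to 0.60 V
            vth = j / 100
            if vdd - vth < 0.05:      # keep some overdrive so the model is valid
                continue
            if gate_delay(vdd, vth) > max_delay:
                continue
            energy = energy_per_op(vdd, vth)
            if best is None or energy < best[0]:
                best = (energy, vdd, vth)
    return best


if __name__ == "__main__":
    print("tight delay target:", best_point(1.0))
    print("relaxed delay target:", best_point(3.0))
```

Under these assumptions, relaxing the delay target lets the search settle at a much lower Vdd and a lower energy per operation, which is the trade-off the contour plot illustrates.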

Future Scaling Will Be Much Worse
- If Vdd does not scale
  - Energy/gate scales only as α
  - This is because C scales
- This means that:
  - Power of gate will be constant if F increases

Architecture
- Convert transistors to performance
- Use transistors to
  - Exploit parallelism
  - Or create it (speculate)
- Processor generations
  - Simple machine
  - Pipelined
  - Super-scalar
  - Out-of-order
  - Speculation
- Each design has more logic to accomplish same task
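
To see how quickly the fixed-Vdd case diverges from classic scaling, the sketch below compounds the per-gate ratios over several generations. It assumes a shrink factor α = 0.7 per generation and a clock that speeds up by 1/α per generation; both are illustrative assumptions rather than values stated on this slide.

```python
def per_gate_ratios(alpha, generations, vdd_scales):
    """Return (energy_ratio, power_ratio) per gate after n generations.

    Energy per gate ~ C * V^2 with C scaling by alpha each generation;
    power per gate is energy times a clock assumed to speed up by 1/alpha.
    """
    c_ratio = alpha ** generations
    v_ratio = alpha ** generations if vdd_scales else 1.0
    f_ratio = (1.0 / alpha) ** generations
    energy = c_ratio * v_ratio ** 2
    return energy, energy * f_ratio


if __name__ == "__main__":
    for n in range(1, 4):
        print(n, "generations, Vdd scales:", per_gate_ratios(0.7, n, True))
        print(n, "generations, Vdd fixed: ", per_gate_ratios(0.7, n, False))
```

With Vdd fixed, the power per gate stays at 1.0 generation after generation, so packing more gates into the same area raises total power directly.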

Architecture Scaling
- Plot of IPC
  - Real IPC
  - Compiler
- Hardware
  - Grows rapidly
  - More FU
  - Wires don't shrink
[Figure: values from 0.00 to 0.05 for the 80386, 80486, Pentium, Pentium II, Pentium III, and Pentium 4, plotted against date from Jan-85 to Jan-00.]

[Figure: SpecInt/MHz (log scale, 0.01 to 1.00) versus year, 1985 to 2001.]

Clock Frequency Scaling
[Figure: clock frequency (log scale, 10 to 10000) versus year, 1985 to 2001.]

Clock Cycle in FO4
[Figure: clock cycle in FO4 (log scale, 10 to 100) versus year, 1985 to 2005, with Alpha labeled.]

Power Scaling
- Complexity of chip is scaling as 1/α^2
- Freq is scaling as 1/α^2
  - Means we have more flops/gate
  - Higher average power per gate
- Power should scale faster than 1/α
  - For the biggest / fastest chips
- This scaling has changed the rules of design
  - It is not sustainable
  - Processors have a limited power budget
  - Fastest processor for a given power budget

Now What?
- The power issue is more complex
  - Power efficiency of digital hardware is improving
  - Lots of hardware now can run on small batteries
- But demands are growing faster
  - Power on top-end processors is growing
- Technology scaling is becoming more interesting
  - Scaling now has its own set of trade-offs
  - Interesting techniques to deal with these issues
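
The 1/α result for total chip power follows from multiplying three factors. The sketch below is one interpretation, not the author's derivation: it assumes a shrink factor α < 1, gate count growing as 1/α^2, switching energy per gate falling as α^3 (C and V both scaling), and clock frequency rising as 1/α^2. The `activity_ratio` parameter is a hypothetical knob standing in for the "more flops/gate" effect.

```python
def chip_power_ratio(alpha, activity_ratio=1.0):
    """Total power ratio per generation: gates * energy per switch * frequency."""
    gates = 1.0 / alpha ** 2          # more transistors on the chip
    energy_per_switch = alpha ** 3    # C and V both scale by alpha
    freq = 1.0 / alpha ** 2           # clock rising faster than gate speed
    return gates * energy_per_switch * freq * activity_ratio


if __name__ == "__main__":
    alpha = 0.7  # assumed shrink per generation
    print("power growth per generation:", chip_power_ratio(alpha))
    # Hypothetical 10% rise in average activity per gate.
    print("with higher activity:", chip_power_ratio(alpha, activity_ratio=1.1))
```

Any activity ratio above 1 pushes the growth past 1/α, which is the "faster than 1/α" case for the biggest and fastest chips.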

Evaluating Power Efficiency
- Want to select the most energy efficient solution
- Can't use power, P = C V^2 F
  - Lowering the operating frequency lowers the power
- Many people use Mips/W
  - This is really an energy metric
  - Mips/W = reciprocal of Joules/million instructions
  - Energy metrics are k C V^2
  - Lower voltage will increase this metric
  - Look at the XScale processor
    - Highest Mips/Watt for lowest supply voltage
- Need to plot performance and power

Energy Performance Graphs
- Use two axes
  - Power or Energy/Op
  - Performance
- Normalize for technology
  - Energy scales as α^3
- Makes tradeoff clearer
  - At a performance level, choose lowest energy
  - At a power level, choose highest performance
[Figure: Watt/Spec (0.00 to 1.60) versus Spec95 (0.00 to 25.00).]
- Most processors are roughly on a line of 25 Spec^2/W in '98
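
A short sketch shows why Mips/W behaves as a pure energy metric and how the 25 Spec^2/W line translates into watts. It assumes P = C V^2 F and performance proportional to F; the capacitance, voltages, frequencies, and IPC of 1 are hypothetical values picked for illustration.

```python
def power_w(c_eff, vdd, f_hz):
    """Dynamic power from the slide: P = C * V^2 * F."""
    return c_eff * vdd ** 2 * f_hz


def mips_per_watt(c_eff, vdd, f_hz, ipc=1.0):
    """Mips/W, assuming performance is ipc * F (an assumption)."""
    mips = ipc * f_hz / 1e6
    return mips / power_w(c_eff, vdd, f_hz)


def power_on_spec2_line(spec, spec2_per_watt=25.0):
    """Power of a processor sitting on a line of constant Spec^2/W."""
    return spec ** 2 / spec2_per_watt


if __name__ == "__main__":
    # Frequency cancels out: the metric is identical at 100 MHz and 1 GHz.
    print(mips_per_watt(1e-9, 1.2, 1e8), mips_per_watt(1e-9, 1.2, 1e9))
    # Lowering the supply alone improves the metric.
    print(mips_per_watt(1e-9, 0.8, 1e9))
    # A 20 Spec95 processor on the 25 Spec^2/W line.
    print("watts:", power_on_spec2_line(20.0))
```

Because frequency cancels, a slow part at low voltage always scores best on Mips/W, which is why the slide asks for performance and power to be plotted together.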

Energy/Instruction vs. Performance
[Figure: Watts/Spec (0 to 0.4) versus Spec2000 (0 to 800). Processors plotted: Intel 386, 486, Pentium, Pentium 2, Pentium 3, Pentium 4, Itanium; Alpha 21064, 21164, 21264; Sparc, SuperSparc, Sparc64; Mips; HP PA; PowerPC; AMD K6, K7.]

Energy/Instruction vs. Performance
[Figure: Watts/Spec (log scale, 0.01 to 1) versus Spec2000 (log scale, 1 to 10000).]

Truths About Power
- Power is superlinear in required performance
  - Lower performance means lower Energy/Op
  - Means you can trade excess performance for power
- Many low power solutions are really high-performance algorithms run slowly
  - Key enabler of low power signal processing
  - Create parallel solution, and then voltage scale
- Create new algorithms that use less computing
  - Best way to save power is to do fewer ops
  - System level power management

Low Power Design
- Define your problem at the correct level
- Architectural changes make the most difference
  - Turning the RF section off most of the time is easier than building a low power RF design
  - Doing fewer OPs is easier than building low power OPs
  - Doing OPs in parallel is lower power than doing them sequentially
- Technology also makes a huge difference
  - Power scales by about 3 to 4x each generation
  - Mobile parts almost always use the best process
- Circuits is mostly about not wasting power
  - Use the right supply, Vth, etc. for the job
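
The "create a parallel solution, then voltage scale" idea can be sketched numerically. The model below is an illustration under assumptions not taken from the slides: an alpha-power-law delay with exponent 1.3, a 1.8 V reference supply, a 0.4 V threshold, and an optional hypothetical overhead for the extra routing and multiplexing that parallel hardware needs.

```python
def delay(vdd, vth=0.4, sat_exp=1.3):
    """Normalized gate delay, alpha-power law (assumed model)."""
    return vdd / (vdd - vth) ** sat_exp


def vdd_for_slowdown(slowdown, vdd_ref=1.8, vth=0.4):
    """Lowest Vdd whose delay is at most slowdown times the reference delay."""
    target = slowdown * delay(vdd_ref, vth)
    lo, hi = vth + 1e-3, vdd_ref
    for _ in range(60):                 # bisection; delay falls as Vdd rises
        mid = (lo + hi) / 2
        if delay(mid, vth) > target:
            lo = mid                    # too slow, need more voltage
        else:
            hi = mid
    return hi


def parallel_power_ratio(n_units, overhead=0.0, vdd_ref=1.8, vth=0.4):
    """Power of n parallel units, each at F/n and reduced Vdd, relative to
    one unit at full speed. Total C * F is unchanged, so only V^2 and the
    overhead factor remain."""
    vdd = vdd_for_slowdown(n_units, vdd_ref, vth)
    return (1.0 + overhead) * (vdd / vdd_ref) ** 2


if __name__ == "__main__":
    print("2 units:", parallel_power_ratio(2))
    print("2 units, 15% overhead:", parallel_power_ratio(2, overhead=0.15))
    print("4 units:", parallel_power_ratio(4))
```

Under these assumptions, two units at half speed deliver the same throughput for well under half the power, even after a modest overhead.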

Solution - Multiple Flavors of Transistors
- New technologies have many different transistors
  - Vth, gate oxide
- Get to select which technology to use
  - Highest performance, low leakage current, etc.
- Often have a couple of transistor types
  - Dual gate oxides, multiple Vths for the transistors
- Designer
  - Needs to choose the transistor type
  - Can choose the supply voltage too

Solution - Multiple Levels for Vdd
- Basic concept
  - Gates off the critical path run at VDDL (reduced voltage)
  - Gates on the critical path run at VDDH (higher voltage)
  - Minimize # of level converters
[Figure: CVS structure with a VDDH cluster, a VDDL cluster, level converters, and the critical path marked [Usami98a].]
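
A first-order estimate of what a dual-supply design saves is a weighted sum of V^2 terms. The sketch below ignores level-converter power and any leakage difference, which is a simplification, and the supply values and the fraction of capacitance moved to VDDL are hypothetical.

```python
def dual_vdd_power_ratio(frac_low, vddh, vddl):
    """Dynamic power of a dual-supply design relative to running everything
    at VDDH, when frac_low of the switched capacitance runs at VDDL.
    Level-converter overhead is ignored (an assumption)."""
    high_part = (1.0 - frac_low) * vddh ** 2
    low_part = frac_low * vddl ** 2
    return (high_part + low_part) / vddh ** 2


if __name__ == "__main__":
    # Hypothetical: 60% of the capacitance is off the critical path,
    # with VDDH = 1.8 V and VDDL = 1.26 V (70% of VDDH).
    print(dual_vdd_power_ratio(0.6, 1.8, 1.26))
```

The saving is bounded by how much of the design is off the critical path, which is why clustering the VDDL gates and limiting level converters matters.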

Problem With Static Approach
- Two issues with setting Vdd and Vth at design time
  - Optimal point changes with operating task
  - Variability of devices changes what you create
- Might want to have periods of high performance
  - Adaptive supplies
- Might have periods of low activity
  - Adaptive threshold
- But how do you do this when you have variability?
  - This is still an open question

Adaptive Power-Supply Regulator
[Figure: block diagram with labels fref, Digital System, Reference Circuit, Controller, Vdd, and Buck Converter (L, C).]
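
The regulator in the block diagram can be mimicked with a small behavioral loop: compare the reference circuit's frequency with fref and nudge Vdd until they match. This is only a sketch. The buck converter is not modeled, the reference circuit follows an assumed alpha-power-law speed model, and the threshold, gain, and step count are hypothetical.

```python
VTH = 0.4       # threshold voltage of the reference circuit (assumed)
SAT_EXP = 1.3   # alpha-power-law exponent (assumed)


def ref_freq(vdd):
    """Normalized frequency of the reference circuit at supply vdd."""
    return (vdd - VTH) ** SAT_EXP / vdd


def regulate(f_ref, vdd=1.8, gain=0.5, steps=200):
    """Integral controller: raise Vdd when the circuit is slower than f_ref,
    lower it when faster. Returns the settled supply voltage."""
    for _ in range(steps):
        error = f_ref - ref_freq(vdd)
        # Clamp keeps Vdd above Vth so the speed model stays valid.
        vdd = max(VTH + 0.05, vdd + gain * error)
    return vdd


if __name__ == "__main__":
    v = regulate(0.5)
    print("settled Vdd:", v, "frequency:", ref_freq(v))
```

Because the reference circuit sits on the same die, a loop like this settles at whatever Vdd this particular chip needs, rather than a worst-case design-time value.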

Adaptive Threshold Control
- People have used substrate voltage
  - Both forward and reverse bias
  - Impedance control is the big issue
    - Have strap transistors to tie to Vdd, Gnd
    - Only drive the lines when in standby
- Other groups have tried to control Vth more directly
  - Adjust Ion to Ioff ratios to optimize performance
- Problem is how to measure leakage
  - If devices don't match, your control device might not be correct, and the optimal point depends on activity anyhow.

Crazy Idea - Measure Power Directly
- Assume you can change Vdd and Vth
- Why not use a simple adaptive algorithm?
  - Build a system that has a tracking Vdd control loop
    - Vdd adapts to the value needed to run at F
  - Add a small modulation to Vthn, Vthp
  - Measure power of the system as it is running
  - Measure the change in power w.r.t. Vthn, Vthp using a synchronous detector.
    - This will tell whether lowering Vthn will increase power or decrease power for this chip.
  - Adapt Vthn, Vthp in a slow loop
- Naturally will track activity ratio, temp, etc.
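
The measure-power-directly loop can be sketched as a dither-and-correlate search. Everything in the model below is a stand-in: a single Vth replaces the separate Vthn and Vthp, the tracking Vdd loop is approximated by a fixed overdrive above Vth, power is noise-free, and all constants are hypothetical. Only the structure (small modulation, synchronous detection, slow adaptation) follows the slide.

```python
import math

N_VT = 0.039   # slope factor times thermal voltage, volts (assumed)
V_OD = 0.6     # overdrive the Vdd loop is assumed to maintain, volts
CF = 1.0       # normalized C * F (assumed)
I0 = 20.0      # normalized leakage prefactor (assumed)


def system_power(vth):
    """Power of the toy system once the Vdd loop has tracked this Vth."""
    vdd = vth + V_OD   # stand-in for the tracking Vdd control loop
    return CF * vdd ** 2 + I0 * math.exp(-vth / N_VT) * vdd


def detect_slope(vth, amp=0.005, cycles=8):
    """Synchronous detector: square-wave modulate Vth by +/- amp, multiply the
    measured power by the modulation sign, and average to estimate dP/dVth."""
    acc = 0.0
    for k in range(2 * cycles):
        sign = 1.0 if k % 2 == 0 else -1.0
        acc += sign * system_power(vth + amp * sign)
    return acc / (2 * cycles * amp)


def adapt_vth(vth=0.35, step=0.002, iters=200):
    """Slow loop: move Vth a small fixed step against the detected slope."""
    for _ in range(iters):
        vth += -step if detect_slope(vth) > 0 else step
    return vth


if __name__ == "__main__":
    vth = adapt_vth()
    print("settled Vth:", vth, "power:", system_power(vth))
```

The fixed-step update uses only the sign of the detected slope, so the loop needs no calibrated leakage measurement. In a real system the averaging inside the detector would also have to reject measurement noise and activity changes.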

Summary
- Power efficiency of CMOS logic scales
  - Power per function is scaling about 3-4x per gen
  - This scaling might slow down in a couple of gens
    - Trouble with scaling Vdd if Vth does not scale
    - Use adaptation to become more efficient with voltages
- Power is a large problem for current IC designers
  - We now have the capability to build very hot chips
    - Since we can build very complex circuits in small spaces
  - Need to balance performance and power
  - Best way to lower power is to find a solution that needs less stuff done, or can use parallelism