Runtime Mechanisms for Leakage Current Reduction in CMOS VLSI Circuits
Afshin Abdollahi (University of Southern California), Farzan Fallah (Fujitsu Laboratories of America), Massoud Pedram (University of Southern California)
Outline: Introduction; Leakage Reduction Techniques; Input Vector Control; Adding Control Points; Modifying Gates; Results
Introduction
Decrease in transistor size; increase in density; decrease in breakdown voltage; increase in operation frequency; increase in dynamic power; low-power design while maintaining performance; low supply voltage; low threshold voltage; high leakage current.
Subthreshold current:
\[ I_{sub} = K \left(1 - e^{-qV_{DS}/kT}\right) e^{q(V_{GS} - V_T + \eta V_{DS})/nkT} \]
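The link between a low threshold voltage and high leakage can be checked numerically. The sketch below assumes the slide's expression is the standard subthreshold model I_sub = K (1 - exp(-q V_DS / kT)) exp(q (V_GS - V_T + eta V_DS) / (n k T)). The function name and the values chosen for K, n, eta, and temperature are illustrative assumptions, not figures from the presentation.

```python
import math

Q = 1.602176634e-19   # elementary charge (C)
K_B = 1.380649e-23    # Boltzmann constant (J/K)

def subthreshold_current(v_gs, v_ds, v_t, temp=300.0, k=1.0, n=1.5, eta=0.05):
    """Evaluate the subthreshold model; k, n, eta and temp are assumed values."""
    thermal = K_B * temp / Q  # thermal voltage kT/q in volts
    drain_term = 1.0 - math.exp(-v_ds / thermal)
    gate_term = math.exp((v_gs - v_t + eta * v_ds) / (n * thermal))
    return k * drain_term * gate_term

# Off-state leakage (V_GS = 0) at V_DS = 1.5 V for two threshold voltages.
ratio = subthreshold_current(0.0, 1.5, 0.2) / subthreshold_current(0.0, 1.5, 0.4)
print(f"leakage ratio for V_T = 0.2 V vs 0.4 V: {ratio:.1f}")
```

Under these assumed parameters, lowering V_T by 0.2 V raises the off-state current by more than a factor of one hundred, because V_T sits inside the exponent.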
Leakage Reduction Techniques: Power Supply Gating
+ Huge reduction in leakage
- Modification in CMOS technology process
- Reduced voltage swing
- Reduced performance
- Reduced DC noise margin
- Less effective as technology scales down
[Figure: power-gated circuit. Labels: low-threshold and high-threshold devices, virtual Vdd, virtual ground, SLEEP, P, N, Gate1, Gate2, Gate3, IN, OUT.]
Leakage Reduction Techniques: Dual and Variable Threshold Voltages
Dual Threshold CMOS
+ High-threshold devices on non-critical paths
+ Low-threshold devices on critical paths
- Requires technology process modification
Variable Threshold CMOS (VTCMOS)
+ Dynamically change the substrate voltage to control the leakage and speed
+ Substrate voltage higher than Vdd (for P transistors)
+ Substrate voltage lower than ground (for N transistors)
- Requires triple-well technology and an additional power supply
- Performance penalty (delay of restoring the substrate voltage)
- Less effective as technology scales down
Input Vector Control: Input Dependence of the Leakage Current
Simulation environment: 0.18 micron technology, supply voltage = 1.5 V, threshold voltage = 0.2 V
X0 X1   Leakage
0  0    23.60 nA
0  1    47.15 nA
1  0    51.42 nA
1  1    82.94 nA
[Figure: gate with inputs X0 and X1.]
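Input Vector Control: Minimum Leakage Vector Identification
[Figure: the primary inputs drive the original circuit, which produces the primary outputs. The primary inputs and internal signals feed a leakage computing circuit. Its output, the circuit leakage, is compared (<) against a leakage level, and the comparison result is constrained to 1.]
Search for the minimum leakage level for which the above Boolean network is satisfiable.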
Input Vector Control: Boolean Satisfiability Formulation
Gate: \( s_0 = a_0 \oplus b_0 \)
a0 b0 s0   clause
0  0  0    \( cl_1 = a_0 + b_0 + \bar{s}_0 \)
0  1  1    \( cl_2 = a_0 + \bar{b}_0 + s_0 \)
1  0  1    \( cl_3 = \bar{a}_0 + b_0 + s_0 \)
1  1  0    \( cl_4 = \bar{a}_0 + \bar{b}_0 + \bar{s}_0 \)
\[ l = \mathrm{AND}(cl_1, cl_2, cl_3, cl_4) \]
\[ (s_0 = a_0 \oplus b_0) \Leftrightarrow (l = \mathrm{true}) \]
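The clause encoding can be verified by enumeration. The sketch below assumes the gate on this slide is an XOR (s0 = a0 XOR b0) and that each truth-table row contributes one clause ruling out the wrong output value for that row. The clause representation and function names are hypothetical.

```python
from itertools import product

# Each clause is a list of (variable, polarity) literals; polarity False means negated.
XOR_CLAUSES = [
    [("a", True), ("b", True), ("s", False)],    # rules out a=0, b=0, s=1
    [("a", True), ("b", False), ("s", True)],    # rules out a=0, b=1, s=0
    [("a", False), ("b", True), ("s", True)],    # rules out a=1, b=0, s=0
    [("a", False), ("b", False), ("s", False)],  # rules out a=1, b=1, s=1
]

def satisfies(assignment, clauses):
    """True if every clause contains at least one literal that holds."""
    return all(any(assignment[var] == pol for var, pol in clause)
               for clause in clauses)

def cnf_matches_xor():
    """Check l = AND(cl1..cl4) against s = a XOR b on all 8 assignments."""
    for a, b, s in product([False, True], repeat=3):
        l = satisfies({"a": a, "b": b, "s": s}, XOR_CLAUSES)
        if l != (s == (a != b)):
            return False
    return True

print(cnf_matches_xor())
```

Applying the same row-by-row construction to every gate yields a CNF formula whose satisfying assignments are exactly the consistent signal values of the circuit.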
Input Vector Control: Leakage Computing
Simulation environment: 0.18 micron, V_DD = 1.5 V, V_T = 0.2 V
Input   Leakage     Symbol
00      23.60 nA    L_00
01      51.42 nA    L_01
10      47.15 nA    L_10
11      82.94 nA    L_11
- Quantize the leakage values to k levels: 0, 1, ..., k-1
\[ D_{00} = \bar{X}_1 \bar{X}_0, \quad D_{01} = \bar{X}_1 X_0, \quad D_{10} = X_1 \bar{X}_0, \quad D_{11} = X_1 X_0 \]
\[ \mathrm{Leakage}(X) = D_{00} L_{00} + D_{01} L_{01} + D_{10} L_{10} + D_{11} L_{11} \]
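A small sketch of the decoder-based leakage lookup follows. It assumes the table on this slide is indexed by (X1, X0), matching the order in which the decoder terms are written, and it assumes a simple proportional quantization (scale by the largest value and round). The slide does not specify the quantization rule, so that choice and all function names are illustrative.

```python
# Leakage in nA from the slide, keyed by (X1, X0) under the assumption above.
LEAKAGE_NA = {(0, 0): 23.60, (0, 1): 51.42, (1, 0): 47.15, (1, 1): 82.94}

def quantize(values, k):
    """Map each leakage value to an integer level in 0..k-1 (assumed scheme)."""
    largest = max(values.values())
    return {key: round(v / largest * (k - 1)) for key, v in values.items()}

def decode(x1, x0):
    """2-to-4 decoder: exactly one D_ij is 1, the one matching (x1, x0)."""
    return {(i, j): int((x1, x0) == (i, j)) for i in (0, 1) for j in (0, 1)}

def gate_leakage(x1, x0, levels):
    """Leakage(X) = D00*L00 + D01*L01 + D10*L10 + D11*L11."""
    d = decode(x1, x0)
    return sum(d[key] * levels[key] for key in levels)

levels = quantize(LEAKAGE_NA, k=16)
print(levels, gate_leakage(1, 1, levels))
```

Because only one decoder output is active at a time, the sum of products simply selects the quantized level for the applied input pattern, which is what lets the lookup be expressed as a Boolean network.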
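Input Vector Control: Decreasing the Number of Additions
Contribution of all NAND gates to the total leakage:
\[ L_{NAND} = \sum_{i=1}^{n} \mathrm{Leakage}_{NAND}(X^i) \]
Direct form (each \( D^i \) is one bit, each \( L \) is m bits):
\[ L_{NAND} = \sum_{i=1}^{n} \left( D_{00}^{i} L_{00} \right) + \sum_{i=1}^{n} \left( D_{01}^{i} L_{01} \right) + \sum_{i=1}^{n} \left( D_{10}^{i} L_{10} \right) + \sum_{i=1}^{n} \left( D_{11}^{i} L_{11} \right) \]
(n-1)m single-bit additions
Regrouped form (each count is log n bits, each \( L \) is m bits):
\[ L_{NAND} = \left( \sum_{i=1}^{n} D_{00}^{i} \right) L_{00} + \left( \sum_{i=1}^{n} D_{01}^{i} \right) L_{01} + \left( \sum_{i=1}^{n} D_{10}^{i} \right) L_{10} + \left( \sum_{i=1}^{n} D_{11}^{i} \right) L_{11} \]
(n - 1 + m log n) single-bit additions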
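The saving from regrouping can be illustrated as follows. The sketch evaluates the two addition counts quoted on the slide, (n-1)m and (n - 1 + m log n), and checks that counting pattern occurrences first gives the same total as summing gate by gate. Rounding log n up to an integer and the function names are assumptions made for illustration.

```python
import math

def direct_additions(n, m):
    """Single-bit additions when n m-bit leakage values are summed directly."""
    return (n - 1) * m

def regrouped_additions(n, m):
    """Single-bit additions when pattern counts are formed first (log2 rounded up)."""
    return n - 1 + m * math.ceil(math.log2(n))

def total_leakage_direct(patterns, levels):
    """Sum the quantized leakage of every gate one by one."""
    return sum(levels[p] for p in patterns)

def total_leakage_regrouped(patterns, levels):
    """Count how often each input pattern occurs, then weight by its level."""
    counts = {key: 0 for key in levels}
    for p in patterns:
        counts[p] += 1
    return sum(counts[key] * levels[key] for key in levels)

# Hypothetical quantized levels and input patterns for five NAND gates.
levels = {(0, 0): 4, (0, 1): 9, (1, 0): 9, (1, 1): 15}
patterns = [(0, 0), (1, 1), (0, 1), (1, 1), (1, 0)]
print(total_leakage_direct(patterns, levels),
      total_leakage_regrouped(patterns, levels))
print(direct_additions(1024, 6), regrouped_additions(1024, 6))
```

For large n the regrouped count grows roughly as n rather than n times m, which keeps the leakage computing circuit, and therefore the satisfiability problem, smaller.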
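Input Vector Control: Leakage Computing Circuit
[Figure: for each NAND gate i = 1..n, a 2-to-4 decoder on the gate inputs (X_i0, X_i1) produces D_00^i through D_11^i. The D_00 outputs of all gates are summed to give the number of L_00 appearances in the total leakage, which is weighted by L_00 to form L_NAND(00); the same is done through L_NAND(11). These terms are added to give L_NAND, which is combined with the totals of other gate types (such as L_OR) and compared: Total_Leakage < Leakage Level.]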
Input Vector Control: Linear Search Algorithm for Minimum Leakage
1. Start: C = TrivialUpperBound, MLV = {}
2. C = C - 1
3. Generate the Boolean clauses corresponding to total_leakage < C (note: if total_leakage < C - 1 then total_leakage < C)
4. Solve the resulting satisfiability problem
5. Satisfiable? Yes: MLV = satisfying vector, go to step 2. No: stop.
Result: Minimum Leakage = C+1, Min-Leakage Vector = MLV
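The descending search can be sketched in a few lines. This is not the authors' implementation: a brute-force enumeration stands in for the satisfiability solver, leakage is assumed to be an integer number of quantized levels, and the toy two-gate circuit, its levels, and all function names are hypothetical.

```python
from itertools import product

def find_vector_below(total_leakage, num_inputs, bound):
    """Stand-in for the SAT call: a vector with total_leakage < bound, or None."""
    for vec in product((0, 1), repeat=num_inputs):
        if total_leakage(vec) < bound:
            return vec
    return None

def minimum_leakage_vector(total_leakage, num_inputs, upper_bound):
    """Lower the bound one level at a time until no vector satisfies it."""
    bound = upper_bound
    best = None
    while True:
        vec = find_vector_below(total_leakage, num_inputs, bound)
        if vec is None:
            break              # unsatisfiable: the last satisfying vector is optimal
        best = vec
        bound -= 1             # tighten the leakage constraint by one level
    return best, (total_leakage(best) if best is not None else None)

# Hypothetical quantized NAND levels, keyed by the gate's two input values.
LEVELS = {(0, 0): 4, (0, 1): 9, (1, 0): 9, (1, 1): 15}

def toy_total_leakage(vec):
    """Two NAND gates: g1 = NAND(x0, x1), g2 = NAND(g1, x2)."""
    x0, x1, x2 = vec
    g1 = 1 - (x0 & x1)
    return LEVELS[(x0, x1)] + LEVELS[(g1, x2)]

result = minimum_leakage_vector(toy_total_leakage, 3, upper_bound=31)
print(result)
```

The upper bound passed in must exceed the leakage of at least one input vector, otherwise the first query already fails and no vector is returned.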
Input Vector Control: Comparing Maximum and Minimum Leakage Values
[Figure: histogram of Number of Instances versus Max/Min Leakage Ratio (axis range 1.5 to 6).]
Input Vector Control: Applying Minimum Leakage Vector
[Figure: the min-leakage vector is applied to the primary inputs of the circuit under control of a sleep signal. Each primary input is combined with sleep so that, in sleep mode, an input whose min-leakage value is 0 is forced to 0 and an input whose min-leakage value is 1 is forced to 1.]
Input Vector Control: Leakage Power Reduction by Applying Min-Leakage Vector
[Figure: histogram of Number of Instances versus Percentage of Energy Saving (axis range 10% to 55%).]
Comparing minimum leakage with average leakage in the standby mode
Adding Control Points: Controllability of Internal Gates
[Figure: circuit with the min-leakage input applied and an internal signal forced to 1 or 0 under control of sleep.]
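Adding Control Points
Parameters:
X = 0: no multiplexing
X = 1: multiplex with the optimum value
Y = 0: optimum value = 0
Y = 1: optimum value = 1
L_MUX is selected by XY (00, 01, 10, 11): 0 when X = 0, Leakage(MUX) for Y = 0 when XY = 10, Leakage(MUX) for Y = 1 when XY = 11.
L_total and L_MUX are added to give L'_total, which is compared against the leakage level (L'_total < leakage level).
[Figure: the three configurations of a signal: X=0 (unchanged), X=1 Y=0 and X=1 Y=1 (multiplexed under control of sleep).]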
Adding Control Points: Power Reduction by Adding Control Points
[Figure: histogram of Number of Instances versus Percentage of Energy Saving (axis range 25% to 70%).]
Adding Control Points: Standby Time Constraint for Effectiveness of the Method
[Figure: histogram of Number of Instances versus Minimum Cycles in Idle Mode (axis range 50 to 500).]
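Modifying Gates
Three implementations of each gate g:
A) unmodified gate (P network, N network; in, out), selected by X=0
B) gate modified with a sleep-controlled transistor, selected by X=1, Y=0
C) gate modified with a sleep-controlled transistor (second variant), selected by X=1, Y=1
[Figure: transistor-level and gate-level views of A, B, and C.]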
Modifying Gates: Delay Calculation
Per-gate leakage: selected by XY (00, 01, 10, 11) from Leakage_A, Leakage_B, Leakage_C
Per-gate delay: selected by XY (00, 01, 10, 11) from Delay_A, Delay_B, Delay_C
arrival_time(out) = MAX over i = 1..m of (arrival_time(in_i) + delay(in_i, out))
circuit_delay = MAX(arrival_time(po_1), ..., arrival_time(po_n))
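The arrival-time computation is a longest-path propagation, which the sketch below illustrates. It assumes that each gate's arrival time is the maximum over its inputs of input arrival time plus pin-to-output delay, that primary inputs arrive at time 0, and that gates are listed in topological order. The data layout, signal names, and delay values are hypothetical.

```python
def circuit_delay(gates, primary_inputs, primary_outputs):
    """Propagate arrival times and return the latest primary-output arrival.

    gates maps each gate output to a list of (input_signal, delay) pairs and
    must be ordered so that every input is computed before it is used.
    """
    arrival = {pi: 0.0 for pi in primary_inputs}  # assumed: inputs arrive at t = 0
    for out, fanin in gates.items():
        # arrival_time(out) = MAX_i (arrival_time(in_i) + delay(in_i, out))
        arrival[out] = max(arrival[sig] + delay for sig, delay in fanin)
    # circuit_delay = MAX over primary outputs
    return max(arrival[po] for po in primary_outputs)

# Hypothetical two-gate circuit; delays would come from the chosen gate variant.
gates = {
    "g1": [("a", 1.0), ("b", 1.2)],
    "g2": [("g1", 0.8), ("c", 1.5)],
}
delay = circuit_delay(gates, primary_inputs=["a", "b", "c"], primary_outputs=["g2"])
print(delay)
```

Swapping a gate for a slower variant only changes its entries in the delay lists, so the same propagation can be rerun to check a candidate assignment against the delay constraint.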
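Modifying Gates: Equivalent Boolean Network
[Figure: the primary inputs and the XY variables drive the original circuit. Its internal signals feed a delay computing circuit, whose output is compared against the delay constraint, and a leakage computing circuit, whose output is compared against the leakage level.]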
Modifying Gates: Energy Saving for Different Speed Degradations
[Figure: histogram of Number of Instances versus Percentage of Energy Saving (axis range 10% to 70%), with series for 0%, 5%, 10%, and 15% speed degradation.]
Modifying Gates: Runtime of the Algorithm
[Figure: histogram of Number of Instances versus Runtime (x1000 seconds, axis range 1 to 10), with series for 32 and 64 quantized levels.]