VLSI Design hapter 8 Low-Power VLSI Design Methodology Jin-Fu Li
hapter 8 Low-Power VLSI Design Methodology Introduction Low-Power Gate-Level Design Low-Power Architecture-Level Design Algorithmic-Level Power Reduction RTL Techniques for Optimizing Power Adiabatic Logic ircuits 2
Introduction Most SO design teams now regard power as one of their top design concerns Why low-power design? Battery lifetime (especially for portable devices) Reliability Power consumption Peak power Average power 3
Overview of Power onsumption Average power consumption Dynamic power consumption Short-circuit power consumption Leakage power consumption Static power consumption Dynamic power dissipation during switching input drain interconnect input 4
Overview of Power onsumption Generic representation of a MOS logic gate for switching power calculation V A V B pmos network V A V B nmos network drain V out + int erconnect + input P avg = 1 T / 2 T T [ dv dt out V ( ) + out load dt ( VDD 0 T / 2 V out )( load dv dt out ) dt] 5
Overview of Power onsumption The average power consumption can be expressed as P avg The node transition rate can be slower than the clock rate. To better represent this behavior, a node transition factor ( ) should be introduced P avg = 1 T = α load T V 2 DD load α T V The switching power expressed above are derived by taking into account the output node load capacitance = 2 DD f load LK V 2 DD f LK 6
Overview of Power onsumption V A V internal V A V B V B internal V internal V out V A V B load V out The generalized expression for the average power dissipation can be rewritten as P avg # ofnodes = α i = 1 Ti V i i V DD f LK 7
Gate-Level Design Technology Mapping The objective of logic minimization is to reduce the boolean function. For low-power design, the signal switching activity is minimized by restructuring a logic circuit The power minimization is constrained by the delay, however, the area may increase. During this phase of logic minimization, the function to be minimized is i P (1 P ) i i i 8
Gate-Level Design Technology Mapping The first step in technology mapping is to decompose each logic function into two-input gates The objective of this decomposition is to minimizing the total power dissipation by reducing the total switching activity A 0.2 B 0.2 α =0.0384 α = 0.0196 α = 0.0099 A B D 0.5 D 0.5 A 0.2 B 0.2 α = 0.0384 0.5 D 0.5 α = 0.1875 α = 0.0099 9
Gate-Level Design Phase Assignment High activity node High activity node A A B B 10
Gate-Level Design Pin Swapping a b c d a b c d d a Switching activity c b a Switching activity b c d d c b a b a d c 11
Gate-Level Design Glitching Power Glitches spurious transitions due to imbalanced path delays A design has more balanced delay paths has fewer glitches, and thus has less power dissipation Note that there will be no glitches in a dynamic MOS logic A B D E A B D E 12
Gate-Level Design Glitching Power A chain structure has more glitches A tree structure has fewer glitches A B hain structure D A B D Tree structure 13
Gate-Level Design Precomputation REG R1 ombinational Logic REG R2 REG R1 ombinational Logic REG R2 Precomputation Logic 14
Gate-Level Design Precomputation A<n-1> B<n-1> REG R1 1-bit omparator (MSB) A<n-2:0> REG R2 B<n-2:0> Precomputation logic Enable REG R3 (n-1)-bit omparator REG R4 F 15
Gate-Level Design Gating lock clk D Q D Q D Q D Q Fail DFT rule checking T clk D Q D Q D Q D Q Add control pin to solve DFT violation problem 16
Gate-Level Design Input Gating f1 clk + select f2 17
Architecture-Level Design Parallelism A 16 R A 16 R f ref B 16 R 16x16 multiplier 32 16 32 16x16 f ref/2 multiplier R M UX 32 f ref f ref/2 Assume that With the same 16x16 multiplier, the power supply can be reduced from V ref to V ref /1.83. Vref f 2 ref Pparallel = 2.2 ref ( ) = 0. 33P 1.83 2 ref B 16 16 R f ref/2 R 16x16 multiplier 32 f ref 18 f ref/2
Architecture-Level Design Pipelining The hardware between the pipeline stages is reduced then the reference voltage V ref can be reduced to V new to maintain the same worst case delay. For example, let a 50MHz multiplier is broken into two equal parts as shown below. The delay between the pipeline stages can be remained at 50MHz when the voltage V new is equal to V ref /1.83 (A,B) 32 REG Half multiplier REG Half multiplier 32 f ref Vref 2 Ppipeline = 1.2 ref ( ) f ref = 0. 36 1.83 P ref 19
Architecture-Level Design Retiming Retiming is a transformation technique used to change the locations of delay elements in a circuit without affecting the input/output characteristics of the circuit. Two versions of an IIR filter. (1) (1) x(n) y(n) x(n) y(n) D D a 2D w(n) (1) (2) b retiming D a (1) D (2) w 1 (n) D b 2D (2) w 2 (n) (2) 20
Architecture-Level Design Retiming Retiming for pipeline design REG 1 (6ns) 2 (2ns) REG 3 (4ns) f ref REG 1 (6ns) REG 2 (2ns) 3 (4ns) f ref 21
Architecture-Level Design Retiming lock cycle is 4 gate delays lock cycle is 2 gate delays 22
Architecture-Level Design Power Management 2 1 1 _FREEZE 2 _FREEZE 2 1 1 _FREEZE 2 _FREEZE 23
Architecture-Level Design Bus Segmentation Avoid the sharing of resources Reduce the switched capacitance For example: a global system bus A single shared bus is connected to all modules, this structure results in a large bus capacitance due to The large number of drivers and receivers sharing the same bus The parasitic capacitance of the long bus line A segmented bus structure Switched capacitance during each bus access is significantly reduced Overall routing area may be increased 24
Architecture-Level Design Bus Segmentation bus bus1 Bus Interface bus1 25
Algorithmic-Level Design f activity Reduction Minimization the switching activity, at high level, is one way to reduce the power dissipation of digital processors. One method to minimize the switching signals, at the algorithmic level, is to use an appropriate coding for the signals rather than straight binary code. The table shown below shows a comparison of 3-bit representation of the binary and Gray codes. Binary ode Gray ode Decimal Equivalent 000 001 010 011 100 101 110 111 000 001 011 010 110 111 101 100 0 1 2 3 4 5 6 7 26
RTL-Level Level Design Signal Gating Simple Decoder Decoder with enable module decoder (a, sel); input [1:0[ a; ouput [3:0] sel; reg [3:0] sel; always @(a) begin case (a) 2 b00: sel=4 b0001; 2 b01: sel=4 b0010; 2 b10: sel=4 b0100; 2 b11: sel=4 b1000; endcase end endmodule module decoder (en,a, sel); input en; input [1:0[ a; ouput [3:0] sel; reg [3:0] sel; always @({en,a}) begin case ({en,a}) 3 b100: sel=4 b0001; 3 b101: sel=4 b0010; 3 b110: sel=4 b0100; 3 b111: sel=4 b1000; default: sel=4 b0000; endcase end endmodule 27
RTL-Level Level Design Datapath Reordering Initial Reordered A<B glitchy Mux stable Mux stable Mux A<B glitchy Mux 28
RTL-Level Level Design Memory Partition din addr write 128x32 dout 32 pre_addr 8 d q addr[7:0] addr[7:1] noe M UX 32 dout clk noe write addr din dout 32 addr0 128x32 29
RTL-Level Level Design Memory Partition Application-driven memory partition 64K bytes Reads ARM ore Data Addr R/W 28K 4K 32K 64K Addr Range 30
RTL-Level Level Design Memory Partition A power-optimal partitioned memory organization Decoder ARM ore Data Addr R/W S Data Addr R/W S Data Addr R/W S 31
Adiabatic Logic ircuits Energy drawn from the power supply during 0-to-V transition is calculated as follows Q E = = V QV = V 2 The amount of stored energy from in the output node is expressed as follows V E = V o 0 vodv = 1 2 V 2 32
Adiabatic Logic ircuits To reduce the dissipation, the designer can minimize the switching events, decrease the capacitance, reduce the voltage swing, or apply a combination of these methods The energy drawn from the power supply is used only once To increase the energy efficiency of logic circuits, other methods can be introduced for recycling the energy drawn from the power supply A novel class of logic circuits called adiabatic logic offers the possibility of further reducing the energy dissipated during switching events, and the possibility of recycling 33
Adiabatic Logic ircuits Adiabatic Switching The equivalent circuit used to model the charge-up event in conventional MOS circuits t=0 R V I source V I 1 ( t) = I V = source source ( t) t t 34
Adiabatic Logic ircuits Adiabatic Switching The amount of energy dissipated in the resistor R from t=0 to t=t can be expressed as E E diss diss = R = T 0 R T I 2 source V 2 A number of simple observations can be obtained The energy is smaller than the conventional case if the charging time T is larger than 2R The dissipated energy is proportional to the resistance R an a portion of the energy stored in the capacitance be reclaimed by reversing the current direction? The possibility is unique to the adiabatic operation dt ( T ) = RI 2 source T 35
Adiabatic Logic ircuits Adiabatic Switching It is possible to reduce the dissipation to an arbitrary degree by increasing the switching time to ever-larger values This is referred as the principle of adiabatic charging The term adiabatic is used to indicate that all charge transfer is to occur without generating heat Switching circuits that charge and discharge their load capacitance adiabatically are said to use adiabatic switching The circuits rely on special power supplies that provide accurate pulsed-power delivery The circuits are useful only if the supplies can deliver power efficiently and recycle the power fed back to them 36
Adiabatic Logic ircuits Adiabatic Logic Gates The adiabatic amplifier circuit V A X X Y Y X X i Y X X Y X X V A Y Y 37
Adiabatic Logic ircuits Adiabatic Logic Gates F charge-up path charge-down path F charge-up path V out inputs V out inputs load2 V DD F load charge-down path V DD F V out load1 38
Adiabatic Logic ircuits Adiabatic Logic Gates A A B B V out V A A A B V out B 39
Adiabatic Logic ircuits Adiabatic Logic Gates low V A constant current V out V A R i c V out load V DD V DD /n V A V out t 40
Adiabatic Logic ircuits Adiabatic Logic Gates V N V out V 2 V 1 V out t 41