An Improved Logical Effort Model and Framework Applied to Optimal Sizing of Circuits Operating in Multiple Supply Voltage Regimes

n Improved Loical Effort Model and Framework pplied to Optimal Sizin of Circuits Operatin in Multiple Supply Voltae Reimes Xue Lin, Yanzhi Wan, Shahin Nazarian, Massoud Pedram Department of Electrical Enineerin, University of Southern California, C US E-mail: {xuelin, yanzhiwa, snazaria, pedram}@usc.edu bstract Diital near-threshold loic circuits have recently been proposed for applications in the ultra-low power end of the desin spectrum, where the performance is of secondary importance. However, the characteristics of MOS transistors operatin in the near-threshold reion are very different from those in the stron-inversion reion. This paper first derives the loical effort and parasitic delay values for loic ates in multiple voltae (sub/near/super-threshold reimes based on the transreional model. The transreional model shows hiher accuracy for both sub- and near-threshold reions compared with the subthreshold model. Furthermore, the derived near-threshold loical effort method is subsequently used for delay optimization of circuits operatin in both near- and super-threshold reimes. In order to achieve this oal, a joint optimization of transistor sizin and adaptive body biasin is proposed and optimally solved usin eometric prorammin. Experimental results show that our improved loical effort-based optimization framework provides a performance improvement of up to 40.1% over the conventional loical effort method. Keywords Sub/near-threshold, loical effort, delay optimization 1. Introduction ressive voltae scalin from the conventional superthreshold reion to the sub/near-threshold reion has shown effectiveness in reducin power consumption in diital circuits [1][2][3]. It is especially beneficial for applications such as wireless sensor processin and RFID tas where performance is not the primary concern. The operatin frequency of sub/near-threshold loic is much lower than that of reular stron-inversion loic ( due to the smaller transistor current, which consists mostly of leakae current. uthors of [4][5] derived analytical expressions of the optimum to minimize enery consumption, i.e., the minimum enery point (MEP, and showed that the MEP for CMOS circuits typically occurs in the near-threshold reion. Loical effort based delay calculation relies on the computation of the loical effort and parasitic delay of loic ates [6]. The loical effort and parasitic delay of a loic ate are independent of the transistor sizes in the loic ate. The first step is the derivation of the sizes of NMOS and PMOS transistors for a minimum-size inverter that achieves equal rise and fall times and then usin this template inverter to derive the values of the loical effort and parasitic delays of more complex loic ates (e.., NND and NOR ates. However, the extension of the loical effort method to the sub/near-threshold reion requires takin into account the different characteristics of MOS transistors in the sub/nearthreshold reion compared with the stron-inversion reion. The authors of [7] extended the loical effort method to the subthreshold reion by (i usin circuit simulation to determine the optimal width ratio between NMOS and PMOS transistors for the template inverter and (ii providin an analytical sizin method of stacked transistors based on the case when the drive currents in all the stacked transistors are equal, which is not the actual worst case. The conventional MOSFET models are expressed in a piecewise fashion with a breakpoint at or near the threshold voltae, separatin the super-threshold reion where the -power law model [8] is applied and the subthreshold reion where the exponential dependency model [5] is applied. Researchers provided the EKV model [9] that is continuous and continuously differentiable for the nearthreshold reion. However, this model is difficult to provide back-of-the-envelope insihts and is hard to work with analytically. Therefore, we propose a simple empirical model that is an enhancement over [10] for the transistors operatin in the sub/near-threshold reime, takin into account the DIL (drain induced barrier lowerin effect that is phenomenal in short-channel transistors. This model results in an averae of 3.9% and 2.1% inaccuracy for NMOS and PMOS transistors, respectively, compared with HSpice simulation results. ased on the accurate transreional model, we extend the loical effort calculation and optimization framework to the near-threshold reime. Different from [7], we provide an analytical sizin method for the NMOS and PMOS transistors in the template inverter with equal rise and fall times, and validate the sizin results throuh HSpice simulation. Furthermore, we propose an analytical sizin method for the stacked transistors based on the worst-case rise and fall times, showin hiher accuracy than the stack sizin method proposed in [7]. We calculate the loical effort and parasitic delay values of different CMOS loic ates operatin in the near-threshold reime. Many burst-mode applications require hih performance for brief time periods between extended sections of low performance operation [11]. Diital circuits supportin such burst-mode applications should work in both near-threshold reion and super-threshold reion (for brief time periods. It would be difficult for a loic ate to achieve equal rise and fall times simultaneously in both reions even if certain techniques such as adaptive body biasin [12] are exploited. Therefore, we turn to loic ates with asymmetric rise and fall times. We derive the loical effort and parasitic delay values of any arbitrarily sized ates (possibly with asymmetric rise and fall times with body biasin in both near-threshold and super-threshold reions. Usin the

extension of the loical effort-based optimization framework, we perform a joint optimization of ate sizes and body biasin voltaes for circuits so that they can operate robustly with the minimum weihted delay. The optimization problem is formulated as a eometric prorammin problem and therefore can be solved optimally in polynomial time complexity [13]. This delay optimization framework can possibly be applied to size the critical paths in lare scale circuits. Experimental results of HSpice simulation usin 32nm Predictive Technoloy Model (PTM [14] show that the proposed improved loical effort-based optimization framework provides a performance improvement of up to 40.1% over the conventional loical effort method under different load capacitance requirements. To our best knowlede, this is the first work that derives loical effort and parasitic delay values of near-threshold loic ates with and without body biasin. The rest of this paper is oranized as follows. Section 2 presents the proposed enhanced transreional model for the transistors operatin in the sub/near-threshold reime. Section 3 provides the proposed template inverter sizin, stack sizin and loical effort calculation methods. Section 4 presents our joint near- and super-threshold delay optimization methodoloy. Experimental results and conclusion are presented in Section 5 and Section 6, respectively. 2. Transistor model in the sub- and nearthreshold reimes The drain current of NMOS transistors operatin in the subthreshold reime obeys an exponential dependency on the ate drive voltae and drain-to-source voltae, iven by: 1 1, (1 where is the mobility, is the oxide capacitance, is the subthreshold slope factor, is the DIL coefficient, and is the thermal voltae. Given a specific technoloy node (e.., the 32 nm PTM, we can rewrite the subthreshold model in Eqn. (1 as: 1, (2 where is a technoloy-dependent parameter. Fiure 1 (reen dots plots the simulated curve of v.s. (we set on a semi-loarithmic scale for an NMOS transistor with a threshold voltae of approximately 0.35 V. One can observe that the curve is nearly straiht for, correspondin to the exponential I-V relationship in the subthreshold reion. The curve rolls off when. We extend the transreional model over [10] that accounts for the DIL effect and unifies the transistor current model for both sub- and nearthreshold reions. The transreional model fits the drain current as (3 1, where is an empirical fittin parameter. We fit the values of parameters,, and from HSpice simulation results. I ds (/um 10m 1m 100u 10u 1u 100n 60.07 µ/µm 1.53 V 2.28 Simulation Result Subthreshold model: Eqn. (2 Transreional model: Eqn. (3 10n 0 0.2 0.4 0.6 0.8 V DD (V Fiure 1. v.s. of an NMOS transistor from simulation, subthreshold model and transreional model. Over the sub- and near-threshold operation rane of 0.10 V to 0.45 V ( 0.35 V for NMOS transistors, the transreional model (the red curve in Fiure 1 results in an averae error of 3.9% and a maximum error of 8.0%. However, the subthreshold model (the blue curve in Fiure 1 is only valid in the subthreshold reion. Over the subthreshold operation rane of 0.10 V to 0.3 V, it results in an averae error of 8.1% and a worst-case error of 19.0%. We can observe from Fiure 1 that the transreional model is even more accurate compared with the subthreshold model in the subthreshold reion. PMOS transistor fits the same empirical model Eqn. (3 with averae and worst case errors of 2.1% and 5.8%, respectively. 3. Loical effort of loic ates in the nearthreshold reime Loical effort based delay calculation [6] is a simple and effective way to both estimate and optimize the delay of CMOS circuits. It relies on the computation of the loical effort and parasitic delay values of loic ates. The loical effort and parasitic delay values are independent of the transistor sizes in a loic ate. The loical effort and parasitic delays of loic ates (e.., NND, NOR are derived based on a reference template. The reference template is typically an inverter with minimum size NMOS and PMOS transistors such that the rise and fall times are equalized. More specifically, the ate delay is modeled as, where is the loical effort, is the electrical effort, is the branchin factor that accounts for off-path capacitance, and is the parasitic delay. Loical effort is defined as the ratio of the input capacitance of a ate to that of an inverter deliverin the same amount of output current (related to its resistance. The electrical effort represents the ratio of output capacitance to input capacitance; the product is called the stae effort; and the parasitic delay is defined as the delay of a ate drivin no load. This final value is set by the parasitic junction capacitance.

3.1 Template inverter sizin In super-threshold reion, the desirable ratio of PMOS width ( to NMOS width ( for achievin equivalent drivin currents is approximately 2.7:1 (obtained throuh HSpice simulations usin the 32nm PTM, due to a joint contribution of the mobility difference between chare carriers in PMOS and NMOS devices and the velocity saturation effect [8]. However, the characteristics of MOS transistors in the super-threshold reion and in the sub/nearthreshold reion are sinificantly different. In the stroninversion reime, the drain current is a first or second-order function of the MOS terminal voltaes, whereas it is iven by Eqn. (3 in the sub/near-threshold reime. Hence, the : ratio for an inverter to achieve equal rise and fall times becomes different in the sub/near-threshold reion. Let, and then we have 0. From Eqn. (3, we derive the followin desirable : ratio in the sub- and near-threshold reions:,,,.,,, Usin the 32 nm PTM [14], the derived : ratios under different values for the subthreshold reion, the nearthreshold reion and the super-threshold reion are listed in Table 1. We observe from Table 1 that the : ratio is 0.75 for an inverter operatin in the near-threshold reion (0.3V, which is in the middle between 0.49 for the subthreshold reion and 2.67 for the super-threshold reion. We also provide the HSpice simulation results of the correspondin : ratios that achieve symmetric rise and fall times. We observe from Table 1 that the analytical results match the simulation results very well, with a maximum deviation of 4.2%. Table 1. : ratios under different values. (V 0.2 0.3 1.0 : ratios 0.49 0.75 (calculated : ratios 0.47 0.77 2.67 (simulated 3.2 Optimal stack sizin In order to develop the near-threshold loical effort framework, we need to find the stack sizin factor in the near-threshold reime. We use the 2-input NND ate as an example to derive the stackin factor for NMOS transistors. Please note that a stack of more than two transistors may not be favored in near-threshold circuits because of performance deradation. s shown in Fiure 2, the stack sizin factor for NMOS transistors (i.e., is defined as the ratio of the width of NMOS transistors in NND ate to the width of NMOS transistor in the template inverter, such that the pull down network in NND ate has the same current drivin strenth as the NMOS transistor in the template inverter. (4 The worst case fall delay of the NND ate can be obtained when input is always hih and input makes a transition from low to hih. The discharin process can be separated into two phases. Durin the first phase, drops to a relative low value,, while remains near to. t the beinnin of phase one, is much larer than, due to (1, 0, (2, 0, and (3 almost no DIL effect for NMOS. s is decreasin, increases exponentially and decreases slowly. Therefore, before matches, the node X loses more chare than node OUT. In addition, the capacitance at node X is typically smaller than that at node OUT. s a result, drops to a relative low value, rapidly, while remains near to durin phase one. The second phase starts when matches and starts to drop rapidly. V X V OUT I Fiure 2. Stack sizin of NMOS transistors. Then the stackin factor can be calculated from 1,, 1, where the left hand side of the equation is the current of the NMOS transistor in the template inverter, and the riht hand side of the equation is the current of the pull down network in the NND ate when matches. We need to derive, before we can calculate ρ usin Eqn. (5. ccordin to the near-threshold current model, we have,,, 1,,,,,, 1,,, where,,, (8, (9,,,. Note that is the DIL coefficient and is the body effect coefficient. fter the approximations in Eqns. (6 and (7 are made, implies,,. Then, is obtained as ρ ρ I (5 (6 (7

, /1 2. (10 With the calculated, value, we use Eqn. (5 to calculate the stackin factor. Table 2 summarizes the stackin factor for NMOS transistors in the 2-input NND ate usin our proposed method, HSpice simulation, and the method in [7] under different near-threshold voltaes. It shows that our proposed method is more accurate than the method in [7], because we use a more accurate current model and the DIL effect is more sinificant in short channel devices. The stackin factor for PMOS transistors in a 2-input NOR ate can also be characterized with the same method. Table 2. Stack sizin factor calculation. The Reference (V HSpice proposed [7] 0.3 2.28 2.16 2.42 0.4 1.89 2.00 2.85 Usin 2.2, we simulate the delay of a 2-input NND ate by HSpice. The results are shown in Table 3. The rise delay is almost equal to the fall delay for both cases in Table 3. Therefore, the NND ate can be approximated as a symmetric ate (i.e., the ate with and. lso, we derive the loical effort and parasitic delay for the worst case scenario, which means a larer parasitic delay value. For example, in the NND ate if we do not consider the worst case, the parasitic delay is 0.82 2.2/1.8. However, the worst case parasitic delay should be calculated usin Elmore delay method as: 0.5 0.5 6.0/1.8, 1.8 (11 where is the capacitance at node X in Fiure 2, is the resistance of NMOS transistor in the inverter in Fiure 2. Table 3. Delay of a 2-input NND ate. Worst case =1, chanes =1, chanes Rise delay 42.7622 ps 37.1325 ps Fall delay 42.7125 ps 36.8844 ps 3.3 Loical effort calculation ased on the results from the previous sections, we derive the loical effort and parasitic delay values for different types of ates operatin in the near-threshold, super-threshold and subthreshold reions, respectively. We size the template inverters in different operation reions for equal rise and fall times. Fiure 3 compares the loical effort and parasitic delay values of standard loic ates operatin in the near-threshold reion ( 0.3 V, in the stroninversion reion ( 1.0 V and in the subthreshold reion ( 0.2 V. nd we derive the loical effort and parasitic delays for the worst case, similar to (11. The NMOS and PMOS sizes in the near-threshold reion for equal rise and fall times are more balanced than those in the subthreshold reion or the super-threshold reion under the 32nm PTM. However, this conclusion is technoloydependent in eneral. 3.4 Simulation results We build a FO4 inverter chain (chain 1 usin the inverters sized for near-threshold operation (Fiure 3 (a and build a FO4 inverter chain (chain 2 usin the inverters sized for super-threshold operation (Fiure 3 (b. Then we simulate the delay of the two inverter chains under nearthreshold operation ( 0.3 V usin HSpice. Table 4 summarizes the normalized delays, showin that delay of the FO4 inverter chain can be reduced by 3.8%~34.0% by usin the new loical effort calculation for near-threshold operation. = 1 p = 1 = 1 p = 1 = 1 p = 1 0.8 1 2.7 1 0.5 1 0.8 = 3.0/1.8 p = 6.0/1.8 2.7 = 4.6/ p = 9.2/ 0.8 2.7 (a Near-threshold (0.3 V 0.5 0.5 (b Super-threshold (1.0 V = 2.6/1.5 p = 5.2/1.5 1 1 = 2.6/1.8 p = 5.2/1.8 1 1 = 4.8/ p = 9.6/ 1 1 = 1.9/1.5 p = 3.8/1.5 (c Subthreshold (0.2 V Fiure 3. Sizin of standard loic ates. n -input ND function can be constructed usin the NND-NOR structure, where 2-input NND/NOR ates are used to implement the ND function. We build an input ND function (ND 1 usin NND and NOR ates sized for near-threshold operation and build an -input ND function (ND 2 usin the NND and NOR ates sized for super-threshold operation. ND 1 and ND 2 have the same input capacitance and load capacitance, respectively. Then we simulate the delay of them under near-threshold operation usin HSpice. Table 5 summarizes the normalized delays, showin that the delay of ND function can be reduced by 18.6%~20.2% usin the new loical effort calculation for near-threshold operation. Table 4. Normalized delay of inverter chains. Stae Num. 2 3 4 6 8 Chain 1 0.660 0.663 0.715 0.962 0.705 Chain 2 1 1 1 1 1 2.2 2.2 1.9 1.9 2.1 2.1 1.6 1.6 3.8 3.8 0.9 0.9

Table 5. Normalized delay of -input ND function. 16-input ND 64-input ND ND 1 0.798 0.814 ND 2 1 1 4. Loical effort of asymmetric ates and its application Many burst-mode applications require hih performance for brief periods between extended sections of low performance operation [11]. Diital circuits supportin such burst-mode applications should work properly (with acceptable delay on both near-threshold reion and superthreshold reion. ate can hardly achieve equal rise and fall times simultaneously in both reions even if certain techniques such as adaptive body biasin are used. circuit optimally desined for super-threshold operation may not have optimal delay in the near-threshold reion, and vice versa. Therefore, we turn to ates with asymmetric rise and fall times. In this section, we will first derive the loical effort and parasitic delay values of arbitrarily sized (possibly asymmetric loic ates with body biasin for both nearthreshold and super-threshold reions. Next, we will introduce a joint optimization of ate sizin and body biasin on a super buffer so that it can have the minimum weihted delay durin its operation (both near-threshold and super-threshold. This framework can be applied to critical path sizin for lare scale circuits iven the input and load capacitance requirements. 4.1 Loical effort of asymmetric ates C 2.7 ( R st 0.8 ( R 1.8 C nt 1 ( R st Super-threshold template (1.0 V (1+s C 2.7 s ( R st s 1 ( yr st 1 ( R nt Near-threshold template (0.3 V (1+s C 0.8 s ( R nt s 1 ( zr nt y(1 + s z(1 + s f, st= f, nt= 1.8 y(1 + s z(1 + s p f, st= p f, nt= 1.8 2.7(1 + s 0.8(1 + s rst, = rnt, = s 1.8s 2.7(1 + s 0.8(1 + s prst, = prst, = s 1.8s Fiure 4. Loical effort and parasitic delay values of an inverter in the super- and near-threshold reions. Consider an inverter with an NMOS transistor with the size of one and a PMOS transistor with the size of. We assume that forward or reverse body biasin is only applied to the NMOS transistor in order to reduce the area required for routin extra sinals, and of course, the body biasin voltae is the same for all the ates in the same circuit block. Let and denote the ate capacitance and diffusion capacitance of the unit-sized NMOS transistor, respectively. We have for the 32 nm PTM. Let and denote the effective resistances of the unit-sized NMOS transistor (i.e., size = 1 in the super-threshold and near-threshold reimes, respectively, when body biasin voltae is not applied. Let, and, denote the applied body biasin voltaes on the NMOS transistors operatin in the super-threshold and near-threshold reions, respectively. The body biasin voltae values can be positive (forward body biasin or neative (reverse body biasin. Let and denote the effective resistances of the unit-sized NMOS transistors in the superthreshold and near-threshold reimes, respectively, when the effect of body biasin is considered. The values of and are functions of body biasin voltaes, and,, respectively. For such an inverter with specific values of, and, we derive its loical effort and parasitic delay values for both super-threshold and near-threshold reions. s shown in Fiure 4, we use different template inverters for superthreshold and near-threshold delay calculation. Let, denote the fallin loical effort of the inverter in the superthreshold reion (with respect to the super-threshold template inverter. Let, denote the fallin loical effort of the inverter in the near-threshold reion (with respect to the near-threshold template inverter. We calculate, usin:, 1 1 (12 nd we have,,. Similarly, we can define and calculate the other loical effort and parasitic delay values, as summarized in Fiure 4. 4.2 Joint optimization of a super buffer We use the super buffer as an example to demonstrate our joint optimization framework that can achieve the minimum weihted delay for circuits workin on both nearthreshold and super-threshold reions iven the input and load capacitance requirements. Let denote the portion of cycles when the super buffer operates in the near-threshold reion, and thereby, 1 is the portion of cycles when the super buffer operates in the super-threshold reion. The weihted delay of the super buffer is defined by 1 (13,, where is the delay of the super buffer in nearthreshold operation, and is the delay of the super buffer in the super-threshold operation. In Eqn. (13, we normalize by,, which is the delay of a super buffer specially desined for the optimal delay in near-threshold operation (without body biasin. Similarly, is normalized by,, which is the delay of a super buffer specially desined for the optimal delay in super-threshold operation (without body biasin. If we define the weihted delay as 1 instead of Eqn. (13, minimizin the weihted delay of the super buffer is almost equivalent to minimizin the

delay of the super buffer for near-threshold operation only, since is orders-of-manitude larer than. Fiure 5. n -stae super buffer. Fiure 5 shows a super buffer with staes and each inverter has a : ratio of : 1 (the inverter discussed in Section 4.1. The input capacitance and output capacitance are iven by and, respectively. Let 1 denote the input capacitance of the i-th inverter in the super buffer. We have and. Let, and, denote the rise and fall delays, respectively, of the super buffer in super-threshold operation with respect to the super-threshold template inverter. Let, and, denote the rise and fall delays, respectively, of the super buffer in near-threshold operation with respect to the near-threshold template inverter. These values can be calculated as follows based on the loical effort and parasitic delays derived in Section 4.1.,,,, 1 2.71 1 1.8 1 0.81 1.8 2.71 / 1 1 1 / 1 1 0.81 1.8 / 1 1 1 1.8 / 1 (14 (15 (16 (17 Then, the joint optimization for the minimum weihted delay is formulated as follows. Given:, input capacitance and load capacitance. Find: The optimal values of, 's,,, and. Minimize: The weihted delay 1 (18 Subject to the followin constraints:, (19, (20, (21, (22 The body biasin voltae constraints: (23 (24 The balancin constraints for each inverter:, 2.7, (25, 0.8, (26 Note that is the delay of the super buffer in nearthreshold operation with respect to the near-threshold template inverter, and is the delay of the super buffer in super-threshold operation with respect to the super-threshold template inverter. Therefore, minimizin Eqn. (18 is equivalent to minimizin Eqn. (13. The joint optimization problem is a eometric prorammin problem if is iven, which can be transformed into a standard convex optimization problem and therefore can be solved optimally with polynomial time complexity [13]. We use an outer loop to determine the optimal value in the optimization. This joint optimization framework can also be applied to other circuit structures with little modification. We will show the effectiveness of this framework on both a super buffer structure and an -input ND function in next Section. 5. Experimental results We test our joint optimization framework with the super buffer structure. We perform simulations usin the 32nm PTM. The supply voltae in super-threshold operation is 1.0 V, and the supply voltae in near-threshold operation is 0.3 V. For different iven, input capacitance and load capacitance values, we find the optimal values of, 's,,, and usin the convex optimization toolbox. Then we simulate the optimal super buffer usin HSpice with the optimal values of, 's,,, and. 's and correspond to the transistor sizes in the super buffer, and and are translated into body biasin voltaes in super-threshold and near-threshold reions, respectively. Note that we use one body biasin voltae for all the NMOS transistors in the super buffer for super-threshold operation and another body biasin voltae for all the NMOS transistors in the super buffer for near-threshold operation. aseline 1 is a super buffer (without body biasin desined for the minimum delay in super-threshold operation, with the same iven and. aseline 2 is a super buffer (without body biasin desined for the minimum delay in near-threshold operation, with the same iven and. Then we compare the weihted delay of the super buffer optimized with the proposed method, and the weihted delays of aseline 1 and aseline 2. The results are summarized in Table 6. The weihted delays of aseline 1 and aseline 2 are normalized

by the weihted delay of the proposed optimal super buffer with adaptive body biasin. The proposed optimal super buffer achieves a maximum of 40.1% ( 1 1/1.67 reduction in weihted delay compared to the baselines. Table 6. The normalized weihted delay values of the optimal super buffer, aseline 1 and aseline 2. Simulation Setup Weihted Delay Optimal ase 1 ase 2 3 10000 0.8 1 1.22 1.38 3 8000 0.8 1 1.29 1.36 3 6000 0.7 1 1.29 1.39 3 4000 0.7 1 1.40 1.32 3 1000 0.7 1 1.58 1.26 3 100 0.7 1 1.67 1.23 3 100 0.3 1 1.38 1.20 We further test our joint optimization framework with the -input ND function, which has a NND-NOR structure, and 2-input NND/NOR ates are used to implement the ND function. For iven, and values, we find the optimal values of 's,,, and usin the convex optimization toolbox for the -input ND function. Note that the number of staes in the ND function is determined by the input number, and therefore is no loner an optimization variable here. aseline 1 is an -input ND function desined for the minimum delay in super-threshold operation (without body biasin. aseline 2 is an -input ND function desined for the minimum delay in near-threshold operation (without body biasin. Then we compare the weihted delay of the ND function optimized with the proposed method with the weihted delays of the baselines. The results are summarized in Table 7. The weihted delays of aseline 1 and aseline 2 are normalized by the weihted delay of the proposed optimal ND function with adaptive body biasin. The proposed optimal ND function achieves a maximum of 31.0% ( 1 1/1.45 reduction in weihted delay compared to the baselines. Table 7. The normalized weihted delay values of the optimal ND function, aseline 1 and aseline 2. Optimal aseline 1 aseline 2 64-input 1 1.45 1.31 ND 256-input 1 1.24 1.21 ND 6. Conclusion This paper presented a new loical effort calculation and optimization framework for diital circuits operatin in both near- and super-threshold reions. Modification of the traditional loical effort method is required because the characteristics of MOS transistors operatin in the nearthreshold reion are very different from those in the stroninversion reion. This paper thus introduces: (i an enhanced analytical transreional transistor model with hih accuracy for the sub/near-threshold reion, and the application of such model for sizin the template inverter with equal rise and fall times; (ii an accurate sizin of the transistors in a stack considerin the worst case; (iii loical effort and parasitic delay calculation. We subsequently utilize the derived near-threshold loical effort method for delay optimization for circuits operatin in both near- and superthreshold reimes. We perform a joint optimization of transistor sizin and adaptive body biasin for a super buffer and an -input ND function, utilizin the eometric prorammin method. Experimental results on 32nm Predictive Technoloy Model shows that our improved loical effort-based optimization framework provides a performance improvement of up to 40.1% over the conventional loical effort method. 7. cknowledements This research is sponsored in part by rants from the PERFECT proram of the Defense dvanced Research Projects ency and the Software and Hardware Foundations of the National Science Foundation. 8. References [1] R. Dreslinski, M. Wiekowski, D. laauw, D. Sylvester, and T. Mude, "Near-threshold computin: reclaimin Moore's law throuh enery efficient interated circuits," Proc. of IEEE, vol. 98, no. 2, pp. 253-256, Feb. 2010. [2] D. Markovic, C. Wan, L. larcon, T. Liu, and J. Rabaey, Ultralow-power desin in near-threshold reion, Proc. of IEEE, vol. 98, no. 2, pp. 237 252, Feb. 2010. [3]. Wan and. Chandrakasan, 180 mv FFT processor usin subthreshold circuit techniques, IEEE International Solid-State Circuits Conference (ISSCC, pp. 292 293, Feb. 2004. [4]. Zhai, D. laauw, D. Sylvester, and K. Flautner, The limit of dynamic voltae scalin and insomniac dynamic voltae scalin, IEEE Trans. on VLSI, vol. 13, no. 11, pp. 1239 1252, Nov. 2005. [5]. H. Calhoun,. Wan, and. Chandrakasan, Modelin and sizin for minimum enery operation in subthreshold circuits, IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1778 1786, Sept. 2005. [6] I. Sutherland,. Sproull, and D. Harris, Loical Effort: Desinin Fast CMOS Circuits. San Francisco, C: Moran Kaufmann, Jan. 1999. [7] J. Keane, H. Eom, T.-H. Kim, S. Sapatnekar, and C. Kim, Subthreshold loical effort: a systematic framework for optimal subthreshold device sizin, in Desin utomation Conference (DC, 2006. [8] T. Sakurai and R. Newton, "lpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, pril 1990. [9] C. Enz, F. Krummenacher, and E. Vittoz, n analytical MOS transistor model valid in all reions of operation and dedicated to low-voltae and lowcurrent applications, nalo Interated Circuits and Sinal Processin, vol. 8, no. 1, pp. 83 114, July 1995.

[10] D. M. Harris,. Keller, J. Karl, and S. Keller, transreional model for near-threshold circuits with application to minimum-enery operation, in International Conference on Microelectronics (ICM, 2010. [11]. H. Calhoun,. Wan, N. Verma, and. Chandrakasan, Sub-threshold desin: the challenes of minimizin circuit enery, ISLPED, 2006. [12] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D.. ntoniadis,. P. Chandrakasan, and V. De, daptive body bias for reducin impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakae, IEEE J. Solid-State Circuits, Nov. 2002. [13] S. oyd and L. Vandenberhe, Convex Optimization, Cambride University Press, 2004. [14] W. Zhao and Y. Cao, New eneration of Predictive Technoloy Model for sub-45nm early desin exploration, IEEE Transactions on Electronic Devices, vol. 53, no. 11, pp. 2816 2823, Nov. 2006.