A Novel Low Power 1-bit Full Adder with CMOS Transmission-gate Architecture for Portable Applications

M. C. Parameshwara 1, K. S. Shashidhara 2 and H. C. Srinivasaiah 3

1 Department of Electronics and Communication Engineering, Vemana Institute of Technology, Bangalore, India.
2 Department of Electronics and Communication Engineering, Nitte Meenakshi Institute of Technology, Bangalore, India.
3 Department of Telecommunication Engineering, Dayananda Sagar College of Engineering, Bangalore, India.
e-mail: 1 mcp 05@rediffmail.com; 3 hcsrinivas@dayanandasagar.edu

Abstract. This paper discusses a novel 1-bit, 30-transistor full adder with rail-to-rail swing, based on a CMOS transmission-gate architecture. The worst-case power, delay and power delay product of this 1-bit full adder are compared with those of two other high performance 1-bit full adder architectures reported to date, at the 90 nm technology node. The proposed 1-bit adder has up to 13.9% improvement in power and up to 14.55% improvement in power delay product over the reported architectures. The delay performance of the proposed 1-bit adder and that of the reported architectures are comparable. This analysis has been done at supply voltage $V_{DD}$ = 1.2 V and load capacitance $C_L$ = 2 fF, at a maximum input data rate $f_{MAX}$ = 200 MHz.

Keywords: Transmission-gate full-adder, Power delay product, Carry dependent sum adder, Transition matrix, Double pass logic, Complementary pass logic.

1. Introduction

Modern embedded devices are characterized by the integration of a variety of functionalities, enabled by advances in architecture, design and manufacturing technologies. Leading-edge functionality includes digital signal processing (DSP). DSP functions are an integral part of the real-time multimedia processors, high speed digital transceivers, etc., of modern Internet technology. The most fundamental of all digital operations is addition.
The design of an efficient 1-bit full adder (FA) is the most basic need for high speed real-time DSP. The focus of this paper is to develop an efficient 1-bit FA circuit for integration in real-time DSP. In the design of efficient digital circuits, the most important design metrics (DMs) of concern are power, speed, size, and cost [1–3]. The DMs compete with each other during optimization; for example, reduced delay results in increased power dissipation. Simultaneous optimization of the DMs needs proper knowledge of how they relate to each other. Finding this relationship is a very complex problem with its roots in process-voltage-temperature (PVT) space. To reduce this complexity, simple heuristics are followed, such as designing and optimizing sub-circuits with a small number of inputs, then integrating these sub-circuits at the next level, and so on. We take the product of competing DMs and optimize this product, which optimizes the DMs in the product jointly, e.g. the power delay product (PDP). In this paper we have achieved minimum PDP for a 1-bit FA through architectural innovation; accordingly we designed a novel, 30-transistor (30T) carry dependent sum 1-bit FA circuit using CMOS transmission-gates (CTGs).

This paper is organized as follows: Section-2 reviews fundamentals of the existing 1-bit FA architectures, alternative logic 1 (AL-1) and alternative logic 2 (AL-2). Section-3 presents the proposed alternative logic 3 (AL-3) 1-bit FA circuit. Section-4 discusses the methodology, analysis and novel aspects of the results for the AL-3 circuit. Finally we conclude in section-5.

2. Classification of 1-bit Full-Adder Architectures

The FA architectures have been broadly classified into three main categories, viz. XOR-XOR based, XNOR-XNOR based, and XOR-XNOR based, depending upon the logic expression of the two outputs: sum $S_i$ and carry $C_{i+1}$ [1,2].

Corresponding author Elsevier Publications 2013.
Figure 1. Three 1-bit adder architectures implemented using MLS: (a), (c) and (e) are block diagram representations of AL-1, AL-2 and AL-3, respectively; (b), (d) and (f) are circuit representations of AL-1, AL-2 and AL-3, respectively.

Table 1. Truth table for the implementation of 1-bit FA AL-3 circuit.

A_i   B_i   C_i   S_i   C_{i+1}
0     0     0     0     0
0     0     1     1     0
0     1     0     1     0
0     1     1     0     1
1     0     0     1     0
1     0     1     0     1
1     1     0     0     1
1     1     1     1     1

Two more architectures, AL-1 and AL-2 reported in [4,5], together with the AL-3 architecture proposed in this paper, can be classified as another (4th) category, which follows the mixed logic style (MLS). The AL-1 and AL-2 architectures have been realized based on double pass logic (DPL) and CTG logic, resulting in an MLS implementation [4,5]. The performances of all the 3 architectures are compared with respect to the delay, power and PDP performance metrics, as discussed later in section-3 and section-4. Figures 1(a), 1(c) and 1(e) show block diagrams of the AL-1, AL-2 and proposed AL-3 1-bit FA architectures; figures 1(b), 1(d) and 1(f) are their respective circuit implementations. In AL-1, the XOR, XNOR, AND and OR gates are implemented independently. In AL-2 the XOR and XNOR are implemented as complementary pass logic (CPL) whose outputs are multiplexed with $C_i$ as the select signal, whereas the AND and OR gates are implemented independently. In figures 1(a)–(d), the sum is $S_i = H_i \oplus C_i = H_i \overline{C_i} + \overline{H_i} C_i$, where $H_i = A_i \oplus B_i$, and the carry is $C_{i+1} = (A_i + B_i) C_i + (A_i B_i) \overline{C_i}$.

3. Proposed 1-bit Full-Adder AL-3 Architecture and its Implementation using Transmission-Gates

The proposed AL-3 1-bit adder architecture is based on the truth table shown in table 1. Examining the truth table, it can be observed that the carry out $C_{i+1}$ is equal to $A_i B_i$ when the carry in $C_i$ is equal to 0, and to $(A_i + B_i)$ when $C_i$ is equal to 1, formulated as $C_{i+1} = (A_i + B_i) C_i + (A_i B_i) \overline{C_i}$ [4,5].
Similarly, in evaluating the sum $S_i$, $A_i B_i C_i$ is selected when $C_{i+1} = 1$, and $A_i + B_i + C_i$ is selected when $C_{i+1} = 0$, expressed as $S_i = (A_i B_i C_i) C_{i+1} + (A_i + B_i + C_i) \overline{C_{i+1}}$. This proposed approach leads to the carry dependent sum AL-3 1-bit FA circuit of figures 1(e) and 1(f). Figure 1(e) is the block diagram and figure 1(f) is the circuit schematic of the AL-3 1-bit FA circuit, derived from its expressions for $S_i$ and $C_{i+1}$. The circuit of figure 1(f) uses 30T and proves to be better in terms of the power, delay and PDP parameters than both AL-1 (28T) and AL-2 (26T), as demonstrated in a later section. The inputs $A_i$, $B_i$ and $C_i$ of the 1-bit FA circuit under test (CUT) are driven by standard test signals with minimum sized buffers. A minimum load capacitance $C_L$ = 2 fF for 90 nm technology, equal to a fan-out-four (FO4) inverter load, is used at the outputs $S_i$ and $C_{i+1}$ for power and delay measurement.
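The selection-style expressions for $C_{i+1}$ and $S_i$ in section 3 can be checked exhaustively against table 1. The following is a minimal Python sketch, not the authors' circuit: the function names `carry_mux`, `sum_carry_dependent` and `full_adder` are hypothetical, and plain Boolean operators stand in for the transmission-gate multiplexers.

```python
from itertools import product

def carry_mux(a, b, c):
    """Carry out: select (a OR b) when c = 1, (a AND b) when c = 0."""
    return (a | b) if c else (a & b)

def sum_carry_dependent(a, b, c, c_out):
    """Sum: select (a AND b AND c) when c_out = 1, (a OR b OR c) when c_out = 0."""
    return (a & b & c) if c_out else (a | b | c)

def full_adder(a, b, c):
    """Return (S, C_out), with the sum derived from the carry (hypothetical helper)."""
    c_out = carry_mux(a, b, c)
    return sum_carry_dependent(a, b, c, c_out), c_out

# Compare against arithmetic addition for all 8 input vectors of table 1.
for a, b, c in product((0, 1), repeat=3):
    total = a + b + c
    s, c_out = full_adder(a, b, c)
    assert (s, c_out) == (total % 2, total // 2)
    print(a, b, c, s, c_out)
```

The loop prints the eight rows in the same order as table 1, so the output can be compared with it line by line.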
Figure 2. Transition matrix (TM) corresponding to 56 input vector transitions on inputs $A_i$, $B_i$ and $C_i$ for the AL-3 1-bit FA circuit.

4. Simulation Methodology and Discussion on Results

All 3 1-bit FAs, AL-1, AL-2 and AL-3, are simulated using the Cadence Spectre tool and a generic 90 nm process design kit (PDK) to determine their worst-case delay, average power and PDP under identical PVT conditions. The study is performed in 2 steps. In the first step, the DMs of the 3 adder circuits are compared as a function of supply voltage $V_{DD}$ from 0.6 V to 1.8 V, at $C_L$ = 2 fF and $f_{MAX}$ = 200 MHz (reciprocal of pulse width = 5 ns) [4,5]. In the second step the design metrics are again studied as a function of load capacitance $C_L$, varied from 0 fF to 100 fF, at $V_{DD}$ = 1.2 V and $f_{MAX}$ = 200 MHz.

The propagation delay $t_{pd}$ is calculated as the time from a 50% change in the input signals to the corresponding 50% change in the output signals. There are $2^k (2^k - 1) = 56$ input transitions for the $k = 3$ inputs $A_i$, $B_i$ and $C_i$ with which to evaluate the outputs $S_i$ and $C_{i+1}$. One of the 56 input transitions is critical, with maximum delay in output $S_i$ or $C_{i+1}$, for each of the 3 1-bit FA circuits. Figure 2 shows the 8 × 8 TM for the proposed AL-3 1-bit FA circuit. The TM consists of 64 cells; the 8 cells along the diagonal are insignificant, since they involve no transition. Each of the 64 − 8 = 56 remaining cells in the TM is partitioned into sub-cells, with the 1st sub-cell containing the delay in $S_i$ and the 2nd sub-cell containing the delay in $C_{i+1}$. About 24 input transitions will have no change in the outputs $S_i$ and $C_{i+1}$; these are labelled NA (Not Applicable) in figure 2. Similar TMs are generated for the AL-1 and AL-2 1-bit FA circuits. The worst-case critical delay over the 56 − 24 = 32 input transitions on $A_i$, $B_i$ and $C_i$ is MAX(delay in $S_i$, delay in $C_{i+1}$) over all cells, as highlighted in figure 2.
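The transition counts quoted above can be reproduced by enumerating every ordered pair of distinct input vectors. This is a small Python sketch under the assumption that an NA sub-cell corresponds to a transition that leaves that particular output unchanged; the helper `outputs` is hypothetical and models only the logic function, not the circuit delays.

```python
from itertools import product

def outputs(vec):
    """Return (S, C_out) of a 1-bit full adder for vec = (a, b, c)."""
    total = sum(vec)
    return total % 2, total // 2

vectors = list(product((0, 1), repeat=3))   # 2^k = 8 input vectors for k = 3
transitions = [(u, v) for u in vectors for v in vectors if u != v]

# Transitions that leave a given output unchanged, so no delay can be measured.
no_change_s = [t for t in transitions if outputs(t[0])[0] == outputs(t[1])[0]]
no_change_c = [t for t in transitions if outputs(t[0])[1] == outputs(t[1])[1]]
no_change_both = [t for t in no_change_s if t in no_change_c]

print(len(transitions))      # 56 off-diagonal cells of the 8 x 8 TM
print(len(no_change_s))      # 24 transitions with no change in S
print(len(no_change_c))      # 24 transitions with no change in C_out
print(len(no_change_both))   # 12 transitions with no change in either output
```

Counted per output, 24 of the 56 transitions produce no change, which matches the 56 − 24 = 32 measurable transitions stated in the text; only 12 transitions leave both outputs unchanged at the same time.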
The worst-case delay equals 194.3 ps for the AL-1 and 196.5 ps for the AL-2 circuit, occurring in the critical $C_{i+1}$ path for the input transition from 010 to 101, whereas the worst-case delay for AL-3 is 195 ps in the $S_i$ output for the 100 to 011 transition (highlighted in figure 2).

The average power dissipated, $P_{avg}$, is the sum of 3 components: dynamic, static, and short circuit power. The worst-case power at a given $V_{DD}$ and $C_L$ is determined as the average power dissipated over 9 combinations of 3 frequencies each, applied at inputs $A_i$, $B_i$ and $C_i$ and yielding valid logic levels at outputs $S_i$ and $C_{i+1}$ [6,7], as shown in table 2; the first 6 frequency combinations are equivalent to applying the 56 different input transitions of figure 2. The 3 frequencies in table 2 are $f_{MAX} = f_H$ = 200 MHz, $f_M$ = 100 MHz, and $f_L$ = 50 MHz; in the last 3 rows, $f_{MD}$ assigned to $B_i$ is $f_M$ delayed by 50% of its pulse width, which accounts for glitch power. To determine the worst-case PDP, we take the product of the worst-case power and the worst-case delay (figure 2). The PDP provides a means of trading off between power and delay. Minimum PDP implies prolonged battery life, a desirable feature for portable applications.

Figure 3 shows the dependence of worst-case delay, power and PDP on supply voltage $V_{DD}$ and load capacitance $C_L$ for the AL-1, AL-2 and AL-3 1-bit FA circuits. Figure 3(a) depicts the worst-case delay characteristics of the AL-1, AL-2 and AL-3 circuits as a function of $V_{DD}$, varied from 0.6 V to 1.8 V. AL-3 has the minimum delay for $V_{DD}$ > 1.0 V compared to the AL-1 and AL-2 circuits. Figure 3(b) indicates the worst-case delay as a function of $C_L$, varied from 0 fF to 100 fF. The delay of the critical path output of AL-3 is comparable to that of the AL-1 and AL-2 circuits when $C_L$ < 10 fF. For $C_L$ beyond 10 fF the delay of AL-3 is marginally higher than that of AL-1 and AL-2. This implies a relatively low fan-out for AL-3 in comparison with the other 2 circuits.
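The nine frequency combinations used for the worst-case power estimate (table 2) follow a simple pattern that can be written out programmatically. The Python sketch below is illustrative only: the string labels and variable names are hypothetical, and the pulse width follows the text's convention that the data rate is the reciprocal of the pulse width.

```python
from itertools import permutations

F_H, F_M, F_L = 200e6, 100e6, 50e6   # Hz, values stated in the text

# Rows 1-6 of table 2: every ordering of (f_H, f_M, f_L) on (A_i, B_i, C_i).
rows_1_to_6 = set(permutations(("f_H", "f_M", "f_L")))

# Rows 7-9: A_i at f_M, B_i at the delayed copy f_MD, C_i swept over f_H, f_M, f_L.
rows_7_to_9 = [("f_M", "f_MD", c) for c in ("f_H", "f_M", "f_L")]

# f_MD lags f_M by 50% of the f_M pulse width (pulse width = 1 / data rate).
pulse_width_m = 1.0 / F_M
delay_md = 0.5 * pulse_width_m

print(len(rows_1_to_6) + len(rows_7_to_9))   # number of combinations
print(delay_md)                              # delay of f_MD in seconds
```

With $f_M$ = 100 MHz the pulse width is 10 ns, so $f_{MD}$ is delayed by 5 ns relative to $f_M$.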
In order to estimate the average worst-case power (as discussed earlier), the frequency combinations on the inputs $A_i$, $B_i$ and $C_i$ are applied as shown in table 2. The power analysis is done at $f_{MAX}$ = 200 MHz with $V_{DD}$ and/or $C_L$ being varied. The average of the worst-case power is determined over the 9 frequency combinations applied at the inputs $A_i$, $B_i$ and $C_i$.

Table 2. Nine combinations of 3 frequencies each, applied at the 3 inputs $A_i$, $B_i$ and $C_i$ of the AL-1, AL-2 and AL-3 1-bit full adders.

          Frequency combinations at inputs
Sl. no.   A_i    B_i     C_i
1         f_H    f_M     f_L
2         f_H    f_L     f_M
3         f_M    f_L     f_H
4         f_M    f_H     f_L
5         f_L    f_H     f_M
6         f_L    f_M     f_H
7         f_M    f_MD    f_H
8         f_M    f_MD    f_M
9         f_M    f_MD    f_L

Figure 3. Plots of delay, power and PDP as a function of supply voltage $V_{DD}$ and load capacitance $C_L$ for the AL-1, AL-2 and AL-3 circuits: (a) Delay v/s $V_{DD}$, (b) Delay v/s $C_L$, (c) Power v/s $V_{DD}$, (d) Power v/s $C_L$, (e) PDP v/s $V_{DD}$ and (f) PDP v/s $C_L$.

Figure 3(c) shows the average power dissipation as a function of $V_{DD}$. Compared to AL-1 and AL-2, the power dissipated in AL-3 is small for 0.6 V ≤ $V_{DD}$ ≤ 1.8 V and $C_L$ < 10 fF; for $C_L$ > 10 fF, the difference between the power of AL-3 and that of AL-1 and AL-2 is not significant. Figure 3(d) shows the average power dissipation as a function of $C_L$ for the AL-1, AL-2 and AL-3 circuits; $C_L$ is varied from 0 fF to 100 fF, at $V_{DD}$ = 1.2 V and $f_{MAX}$ = 200 MHz. The power dissipation in AL-1 and AL-3 is comparable for $C_L$ < 20 fF, whereas AL-2 has slightly more power than AL-1 and AL-3 for 0 fF ≤ $C_L$ ≤ 100 fF.

Figure 3(e) shows the worst-case PDP as a function of $V_{DD}$ for the AL-1, AL-2 and AL-3 circuits. The AL-3 circuit has the minimum worst-case PDP among all the 3 circuits for 0.6 V ≤ $V_{DD}$ ≤ 1.8 V at $C_L$ = 2 fF. Figure 3(f) shows the variation of worst-case PDP as a function of $C_L$, for 0 fF ≤ $C_L$ ≤ 100 fF. For $C_L$ < 20 fF, the worst-case PDP dependence on $C_L$ is comparable for all the 3 circuits; for $C_L$ > 20 fF, the AL-3 circuit has a marginally higher PDP than AL-2 and a higher PDP than AL-1. This implies that AL-3 is a suitable candidate for portable applications.
Table 3 shows the percentage change (improvement) in power, delay and PDP for AL-3 with respect to the AL-1 and AL-2 architectures. The minus sign in Delay indicates an increase in delay, but this increase is not significant compared to the improvement in the power and PDP performance metrics (PM). The change in an AL-3 metric with respect to AL-1 or AL-2 is

$$\text{Change} = \frac{(\text{PM of AL-1 or AL-2}) - (\text{PM of AL-3})}{(\text{PM of AL-1 or AL-2})} \times 100\%,$$

where PM = power, delay or PDP. In table 3, we notice improvement in power and PDP for AL-3 over the AL-1 and AL-2 circuits.
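As a worked instance of this percentage-change definition, the worst-case delays reported in section 4 (194.3 ps for AL-1, 196.5 ps for AL-2, 195 ps for AL-3) can be plugged in directly. The Python sketch below uses a hypothetical helper name, `percent_improvement`; only the three delay values come from the paper.

```python
def percent_improvement(reference, proposed):
    """Percent reduction of `proposed` relative to `reference` (negative = worse)."""
    return (reference - proposed) / reference * 100.0

# Worst-case delays in ps, as reported in section 4.
DELAY_AL1, DELAY_AL2, DELAY_AL3 = 194.3, 196.5, 195.0

delay_vs_al1 = percent_improvement(DELAY_AL1, DELAY_AL3)
delay_vs_al2 = percent_improvement(DELAY_AL2, DELAY_AL3)

print(round(delay_vs_al1, 2))   # negative: AL-3 is slightly slower than AL-1
print(round(delay_vs_al2, 3))   # positive: AL-3 is slightly faster than AL-2
```

The two results agree in magnitude with the Delay (%) entries of table 3, and the sign of the first one corresponds to the small delay increase of AL-3 relative to AL-1 noted in the text.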
Table 3. Percent improvement in worst-case power, delay and PDP for AL-3 with respect to the AL-1 and AL-2 circuits at $V_{DD}$ = 1.2 V, $C_L$ = 2 fF, and $f_{MAX}$ = 200 MHz.

Improvement in performance metric of   Power (%)   Delay (%)   PDP (%)
AL-3 with respect to AL-1              9.9         −0.36       9.63
AL-3 with respect to AL-2              13.9        0.763       14.55

5. Conclusions

In this paper, a novel 1-bit FA circuit designated as alternative logic-3 (AL-3) is proposed and analyzed to determine its worst-case delay, power and PDP performance metrics in comparison with two high performance 1-bit FAs designated as AL-1 and AL-2. There is a 9.9% and 9.63% improvement in worst-case power and PDP, respectively, for the proposed AL-3 over the AL-1 circuit; and AL-3 has a 13.9% and 14.55% improvement in worst-case power and PDP, respectively, over the AL-2 circuit, at $V_{DD}$ = 1.2 V, $C_L$ = 2 fF, and $f_{MAX}$ = 200 MHz. To determine the dependence of worst-case delay, power and PDP on $V_{DD}$ and $C_L$, the study is done in 2 steps. In the first step, the worst-case delay, power and PDP are studied as a function of supply voltage $V_{DD}$, varied from 0.6 V to 1.8 V at $C_L$ = 2 fF and $f_{MAX}$ = 200 MHz; for 0.6 V ≤ $V_{DD}$ ≤ 1.8 V, the AL-3 circuit has minimum power and PDP, and comparable delay, relative to the AL-1 and AL-2 circuits. In the second step, the worst-case delay, power and PDP are studied as a function of load capacitance $C_L$, varied from 0 fF to 100 fF, at $V_{DD}$ = 1.2 V and $f_{MAX}$ = 200 MHz. For 0 fF ≤ $C_L$ ≤ 100 fF, the AL-3 circuit has delay comparable with the AL-1 and AL-2 circuits, and it is superior in terms of power and PDP to the AL-1 and AL-2 circuits. The results of this paper show the AL-3 FA circuit to be a suitable alternative for portable applications.
Acknowledgement

The authors acknowledge the management of Dayananda Sagar Group of Institutions (DSI), Bangalore, for all its support in pursuing this research at the Research Center, Department of Telecommunication Engineering, Dayananda Sagar College of Engineering, Bangalore. Our special thanks are due to Dr. Premachandra Sagar, Vice-chairman, DSI, for all his encouragement for research.

References

[1] S. Goel, A. Kumar and M. A. Bayoumi, Design of robust, energy-efficient full adders for deep-submicrometer design using hybrid-CMOS logic style. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Dec. 2006, 14(12), p. 1309–1321.
[2] S. Goel, S. Gollamudi, A. Kumar and M. A. Bayoumi, On the design of low energy hybrid CMOS 1-bit full adder cells. In: Proceedings of 47th IEEE International Midwest Symposium on Circuits and Systems, 2004, p. 209–212.
[3] D. Sengupta and R. Saleh, Generalized power delay metric in deep submicron CMOS design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Jan. 2007, 26(1), p. 183–189.
[4] M. Aguirre-Hernandez and M. Linares-Aranda, CMOS full adders for energy-efficient arithmetic applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, April 2011, 19(4), p. 718–721.
[5] M. Aguirre and M. Linares, An alternative logic approach to implement high-speed low power full adder cells. In: SBCCI 05, September 4–7, 2005, Florianopolis, Brazil, p. 166–171.
[6] M. Shams and M. Bayoumi, Performance evaluation of 1 bit CMOS adder cells. In: IEEE Symposium on Circuits and Systems, Orlando, FL, 1999, p. 27–30.
[7] A. M. Shams, T. K. Darwish and M. A. Bayoumi, Performance analysis of low-power 1-bit CMOS full adder cells. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Feb. 2002, 10(1), p. 20–29.