Progress in Reversible Processor Design: A Novel Methodology for Reversible Carry Look-ahead Adder

Himanshu Thapliyal #, Jayashree H.V *, Nagamani A. N *, Hamid R. Arabnia +
# Department of Computer Science and Engineering, University of South Florida, USA
* Department of E&C, PES Institute of Technology, India
+ Department of Computer Science, The University of Georgia, USA
{hthapliy@cse.usf.edu, jayashreehv@pes.edu, nagamani@pes.edu, hra@cs.uga.edu}

Abstract. Reversible logic plays a significant role in quantum computing because quantum operations are unitary, and therefore reversible. A quantum computer performs computation at the atomic level, thereby carrying out high-performance computations beyond the limits of conventional computing systems. Reversible arithmetic units such as adders, subtractors and multipliers are essential components of a quantum computing system. Among adder designs, the carry look-ahead adder is widely used in high-performance computing because of its O(log n) depth. In this work, we present improved designs of both the in-place and the out-of-place reversible carry look-ahead adders proposed in [1]. The proposed designs use the properties of the reversible Peres gate and the TR gate to optimize the logic depth, quantum cost and gate count compared to the designs in [1]. Both improved designs assume no input carry ($C_0 = 0$). The first approach uses ancilla bits to store the sum outputs, while the second stores the sum outputs in one of the input locations.

Keywords: Reversible logic, Peres gate, Toffoli gate, TR gate, Quantum arithmetic.

1 Introduction
According to [2], $kT \ln 2$ joules of heat energy are dissipated for every bit of information lost during irreversible computation, where $k$ is the Boltzmann constant and $T$ is the absolute temperature. This energy dissipation would not occur during reversible computation, as no information is lost [3].
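For a sense of scale, evaluating the Landauer bound $kT \ln 2$ at an illustrative room temperature of $T = 300$ K (our choice of value, not one taken from [2]) gives roughly $3 \times 10^{-21}$ J per lost bit:

$$E = kT \ln 2 \approx (1.380649 \times 10^{-23}\ \mathrm{J/K})(300\ \mathrm{K})(0.6931) \approx 2.87 \times 10^{-21}\ \mathrm{J}$$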
Reversible circuits maintain a one-to-one mapping between inputs and outputs and are built from reversible logic gates. Outputs of a reversible circuit that are neither used in later stages of the computation nor restore an original input are called garbage outputs and are regarded as design overhead. Inputs regenerated at the outputs are not counted as garbage outputs [7]. Constant inputs (0 or 1), called ancilla bits, are used in reversible quantum circuits to store intermediate values during computation. Reversible circuits are designed using reversible logic gates such as the Toffoli, Fredkin, Feynman (CNOT), Peres and TR gates. Reversible logic has a wide variety of applications in emerging technologies such as quantum computing, optical computing, cellular automata and ultra-low-power VLSI [7, 8].

One of the major applications of reversible logic lies in quantum computing. A quantum computer performs computation at the atomic level, thereby carrying out certain computations beyond the limits of conventional computing systems. Quantum algorithms offer large speedups for some problems that are believed to be intractable for classical computers, so once practical quantum computers are realized, problems that are currently out of reach may become solvable in critical fields such as biotechnology, nano-medicine and secure computing. For example, Shor's quantum factoring algorithm finds the prime factors of an integer N in polynomial time [4, 5]. A quantum computer can be viewed as a quantum network (or a family of quantum networks) composed of quantum logic gates, each performing an elementary unitary operation on one, two or more two-state quantum systems called qubits. Each qubit represents an elementary unit of information corresponding to the classical bit values 0 and 1. Since every unitary operation is reversible, quantum networks must be built from reversible logic components. Several metrics are important in the design of reversible circuits. Quantum computers with many qubits are extremely difficult to realize, so the number of qubits in a quantum circuit should be minimized. Furthermore, the more ancilla inputs and garbage outputs a circuit has, the more I/O pins it requires. A major objective is therefore to minimize the number of ancilla inputs and garbage outputs in reversible quantum circuits. Quantum cost and delay are further parameters that need to be optimized. Each reversible gate has an associated cost, called its quantum cost: the number of 1x1 and 2x2 reversible (quantum) logic gates required to implement it. The quantum cost of every 1x1 and 2x2 reversible gate is taken as unity.
For delay calculation, the delay of a 3x3 reversible gate is computed as its logical depth when it is built from smaller 1x1 and 2x2 reversible gates. Reversible arithmetic units such as adders, subtractors and multipliers are essential components of a quantum computing system. Among adder designs, the carry look-ahead adder is widely used in high-performance computing because of its O(log n) depth. In this work, we present improved designs of both the in-place and the out-of-place reversible carry look-ahead adders proposed in [1]. The proposed designs use the properties of the reversible Peres gate and the TR gate, and also reduce the number of computation steps, to optimize the logic depth, quantum cost and gate count compared to the designs in [1]. Both improved designs assume no input carry ($C_0 = 0$). The first approach uses ancilla bits to store the sum outputs, while the second stores the sum outputs in one of the input locations. The rest of the paper is organized as follows. Section 2 discusses the basic reversible logic gates. Section 3 presents the classical carry look-ahead adder and prior work. Sections 4 and 5 present the design methodologies of the proposed out-of-place and in-place reversible carry look-ahead adders, respectively. Section 6 presents simulation results that verify the proposed design methodologies. Finally, Section 7 concludes the paper.

2 Basic Reversible Gates
The reversible gates used in this work are the Peres gate [11], the Toffoli gate [12], the TR gate [13], the Controlled NOT (Feynman) gate and the NOT gate. The Toffoli, Peres and TR gates are 3x3 reversible gates that can be constructed from primitive gates such as the Controlled NOT gate, the Controlled V gate and its conjugate, the Controlled V+ gate. The complexity of a 3x3 reversible gate is measured by its quantum cost, that is, its implementation cost in terms of these primitive gates, each of which is considered to have a quantum cost of 1 [14-16].

Fig. 1. (a) Feynman gate; (b) Graphical representation.
Fig. 2. (a) Peres gate; (b) Graphical representation; (c) Quantum implementation.

2.1 Feynman/CNOT Gate
The Feynman gate (FG), or Controlled NOT gate (CNOT), is a 2x2 reversible gate that maps inputs (A, B) to outputs $P = A$, $Q = A \oplus B$. It has a quantum cost of 1. Figures 1.a and 1.b show the block diagram and the graphical representation of the Feynman gate. The Feynman gate is widely used either to copy the signal A when B = 0, thus avoiding the fan-out problem in reversible logic, or to generate the complement of A when B = 1 [14, 15, 16, 36].

2.2 Peres Gate (PG)
Figures 2.a, 2.b and 2.c show the Peres gate, its graphical representation and its quantum implementation [11]. It is a 3x3 reversible gate with inputs (A, B, C) and outputs $P = A$, $Q = A \oplus B$, $R = AB \oplus C$. The quantum cost of the Peres gate is 4 [15], since its implementation requires 2 Controlled V+ gates, 1 Controlled V gate and 1 CNOT gate. Among the 3x3 reversible gates in the existing literature, the Peres gate has the minimum quantum cost, shared with the TR gate, whose quantum cost is also 4, as discussed below.

2.3 TR Gate (TR)
Figures 3.a, 3.b and 3.c show the TR gate, its graphical representation and its quantum implementation [13]. It is a 3x3 gate with inputs (A, B, C) and outputs $P = A$, $Q = A \oplus B$, $R = A\bar{B} \oplus C$. The quantum cost of the TR gate is 4, since its quantum implementation uses 2 Controlled V gates, 1 Controlled V+ gate and 1 CNOT gate.

Fig. 3. (a) TR gate; (b) Graphical representation; (c) Quantum implementation.

2.4 Toffoli Gate (TG)
Figures 4.a, 4.b and 4.c show the Toffoli gate, its graphical representation and its quantum implementation [12]. It is a 3x3 gate with inputs (A, B, C) and outputs $P = A$, $Q = B$, $R = AB \oplus C$. The Toffoli gate is one of the most widely used reversible gates. Its quantum cost is 5, as its implementation needs 2 Controlled V gates, 1 Controlled V+ gate and 2 CNOT gates.
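The gate definitions above can be checked with a small classical truth-table model. The sketch below is illustrative only: it treats each gate as a function on bits (no superposition is modelled), and the Python function names are ours. It confirms that each gate permutes its input patterns, i.e., is reversible, and that a Toffoli gate followed by a Feynman gate on lines A and B behaves exactly like a Peres gate.

```python
from itertools import product


def feynman(a, b):
    """Feynman/CNOT gate: (A, B) -> (A, A xor B)."""
    return a, a ^ b


def peres(a, b, c):
    """Peres gate: (A, B, C) -> (A, A xor B, AB xor C)."""
    return a, a ^ b, (a & b) ^ c


def tr(a, b, c):
    """TR gate: (A, B, C) -> (A, A xor B, A(not B) xor C)."""
    return a, a ^ b, (a & (1 - b)) ^ c


def toffoli(a, b, c):
    """Toffoli gate: (A, B, C) -> (A, B, AB xor C)."""
    return a, b, (a & b) ^ c


def is_reversible(gate, width):
    """A gate is reversible iff it maps the 2**width input patterns one-to-one."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width


def peres_via_toffoli_and_cnot(a, b, c):
    """Toffoli gate, then a Feynman gate with A as control and B as target."""
    a, b, c = toffoli(a, b, c)
    a, b = feynman(a, b)
    return a, b, c


if __name__ == "__main__":
    for name, gate, width in [("Feynman", feynman, 2), ("Peres", peres, 3),
                              ("TR", tr, 3), ("Toffoli", toffoli, 3)]:
        print(name, "reversible:", is_reversible(gate, width))
    # B = 0 copies A (fan-out); B = 1 complements A
    print(feynman(1, 0), feynman(1, 1))   # (1, 1) (1, 0)
```

The last check is the identity behind replacing a Toffoli-plus-Feynman pair (quantum cost 5 + 1 = 6) with a single Peres gate (quantum cost 4).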

Fig. 4. (a) Toffoli gate; (b) Graphical representation; (c) Quantum implementation.

3 Classical Carry Look-ahead Adder and Prior Works
The carry look-ahead adder is used in many arithmetic circuits to reduce the propagation delay that dominates the basic ripple carry adder. In a ripple carry adder, the carry of a given stage is available only after the carry of the previous stage has been generated: a carry generated at the least significant bit must propagate through all intermediate stages before it reaches the most significant stage, so the propagation delay grows with the number of bits. The carry equation shows that, apart from the two input bits, the sum and carry out of each stage depend on the carry out of the previous stage. A carry look-ahead adder instead computes each carry directly from the input bits, using the generate and propagate signals defined below, so no stage has to wait for the result of the previous one; this eliminates the ripple effect and reduces the propagation delay. The carry equation of the ripple carry adder is

$C_{i+1} = a_i b_i + b_i C_i + a_i C_i = a_i b_i + C_i (a_i + b_i)$   (1)

where $a_i$ and $b_i$ are the i-th bits of the two n-bit binary numbers, $0 \le i \le n-1$, and $C_i$ is the carry into stage i. The term $a_i b_i$ is the carry produced at stage i itself and is called the carry generate function, $g_i = a_i b_i$. The term $a_i + b_i$ determines whether the carry $C_i$ arriving from the previous stage is passed on, and is called the carry propagate function, $p_i = a_i + b_i$. Both $g_i$ and $p_i$ depend only on the parallel inputs $a_i$ and $b_i$. The carry of the carry look-ahead adder can therefore be written as

$C_{i+1} = g_i + p_i C_i$   (2)
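As a quick sanity check of (1) and (2), the sketch below (function names are ours) compares the two carry expressions over all eight input combinations, including the variant that uses the exclusive-OR propagate $p_i = a_i \oplus b_i$ in place of $a_i + b_i$; the two forms differ only when $a_i = b_i = 1$, where $g_i = 1$ anyway. It then chains (2) into a ripple carry adder, which makes explicit that each carry has to wait for the previous one.

```python
from itertools import product


def carry_eq1(a, b, c):
    """Equation (1): C_{i+1} = a_i b_i + b_i C_i + a_i C_i."""
    return (a & b) | (b & c) | (a & c)


def carry_eq2(a, b, c, xor_propagate=False):
    """Equation (2): C_{i+1} = g_i + p_i C_i with g_i = a_i b_i."""
    g = a & b
    p = (a ^ b) if xor_propagate else (a | b)
    return g | (p & c)


def ripple_add(x, y, n, c0=0):
    """Add two n-bit integers stage by stage; returns the (n+1)-bit result."""
    carry, total = c0, 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ carry) << i      # S_i = a_i xor b_i xor C_i
        carry = carry_eq2(a, b, carry)     # C_{i+1} needs C_i: the ripple
    return total | (carry << n)


if __name__ == "__main__":
    same = all(carry_eq1(*bits) == carry_eq2(*bits) == carry_eq2(*bits, xor_propagate=True)
               for bits in product((0, 1), repeat=3))
    print("(1) and (2) agree:", same)
    print(ripple_add(0b1011, 0b0110, 4))   # 11 + 6, prints 17
```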

Using (2), the carries of a 4-bit carry look-ahead adder are:

$C_1 = g_0 + p_0 C_0$
$C_2 = g_1 + p_1 C_1 = g_1 + p_1 g_0 + p_1 p_0 C_0$
$C_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_0$
$C_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 C_0$

The carry computation can also be expressed with group generate and propagate signals. Following the usual convention in which the input carry occupies bit position 0 and the data bits are numbered from 1, the carry $C_i$ is the group generate signal $g_{i:0}$, computed as $g_{i:0} = g_i + p_i\, g_{i-1:0}$. A group generates a carry if its upper (more significant) portion generates a carry, or if its lower portion generates a carry and the upper portion propagates it. A group propagates a carry if both its upper and lower portions propagate it. These signals can be defined recursively, for $i \ge k > j$, as $g_{i:j} = g_{i:k} + p_{i:k}\, g_{k-1:j}$ and $p_{i:j} = p_{i:k}\, p_{k-1:j}$. The generate and propagate signals of bit position 0 are $g_{0:0} = C_{in}$ and $p_{0:0} = 0$. The sum bit of stage i is $S_i = a_i \oplus b_i \oplus C_i = p_i \oplus C_i$, where $C_i$ is the carry into stage i; this uses the exclusive-OR form $p_i = a_i \oplus b_i$, which may replace $a_i + b_i$ in (2) because the two differ only when $a_i = b_i = 1$, in which case $g_i = 1$ already.

3.1 Prior Works
Reversible logic could be of high interest in supercomputing: it is argued in [21] that supercomputing beyond 32 petaflops, up to 25 exaflops, could be based on reversible logic. Shor showed that prime factorization and discrete logarithms can be computed in polynomial time on a quantum computer [4, 5]. A technique based on the quantum Fourier transform is introduced in [17] to reduce the number of qubits used for addition. A measurement-based carry look-ahead adder (MBQCLA) and a graph-based carry look-ahead adder (GSQCLA) are presented in [18]. A modified carry look-ahead BCD adder with CMOS and reversible logic implementations, using novel powerless AND and groundless OR gates, is presented in [19] to meet the needs of the IEEE 754r format. A carry look-ahead BCD subtractor based on the carry look-ahead BCD adder is proposed in [20], in which the number of reversible gates and garbage outputs is optimized.
In other work, the authors of [22] proposed a circuit that uses NAND gates to replace the AND and NOT gates of a CLA, which reduces the cost and increases the speed of the CLA. Addition with one ancilla bit is presented in [23]. In [37], the researchers designed a quantum ripple carry adder with no input carry and one ancilla input bit, while [9] and [38] present quantum ripple carry adders with no ancilla input bit and improved delay. In [13], the authors presented reversible binary and BCD adder circuits that are primarily optimized for the number of ancilla inputs and garbage outputs, with the best possible values for quantum cost and delay. A logarithmic-depth quantum carry look-ahead adder is presented in [1], which demonstrates two versions of addition, namely in-place and out-of-place. This quantum carry look-ahead adder (QCLA) adds two n-bit numbers in O(log n) depth using O(n) ancillary qubits. The design in [24] modifies the work of [1] and asymptotically reduces the number of ancilla bits by introducing a novel carry gate. In this work, we present improved designs of both the in-place and the out-of-place reversible carry look-ahead adders proposed in [1]. The proposed designs use the properties of the reversible Peres gate and the TR gate to optimize the logic depth, quantum cost and gate count compared to the designs in [1]. Both improved designs assume no input carry ($C_0 = 0$). The first approach uses ancilla bits to store the sum outputs, while the second stores the sum outputs in one of the input locations. The proposed designs incur no overhead in the number of ancilla inputs and garbage outputs compared to the design presented in [1].

3.2 Existing Design [1]: Logarithmic-Depth Quantum Carry Look-ahead Adder
The authors of [1] present an efficient carry look-ahead adder quantum circuit that adds two n-bit numbers with O(log n) depth and O(n) ancillary qubits. They describe two methodologies, termed in-place and out-of-place. In the out-of-place method, each sum bit is stored in a memory location reserved for an ancillary bit; in the in-place method, each sum bit is stored in a memory location reserved for an input bit. The authors impose three constraints on the circuit design: Reversibility (no information about the input bits may be lost), Erasure (ancilla bits must be reset so that the computation can benefit from quantum interference) and Bounded fan-out (fan-out must be performed explicitly, using additional bits). In our discussion, logic depth is defined as the number of reversible logic gates on the longest path of the circuit. Let $a_i$ and $b_i$ be the bits of the two n-bit binary numbers, with $0 \le i \le n-1$, let $S_i$ denote the sum bits, and let $C_i$ be the carry bits, with $0 \le i \le n$. The authors define $p[i,j] = 1$ if a carry is propagated from bit i to bit j, for $1 \le i < j \le n$, and $g[i,j] = 1$ if a carry is generated from bit i to bit j, for $0 \le i < j \le n$. The equations for the propagate, generate, sum and carry signals were discussed in detail in Section 3 of this paper.
The circuit in [1] computes the sum of $a_i$ and $b_i$ as follows. All ancilla locations are assumed to be initialized to 0. The p and g bits are computed in rounds of roughly log n steps each, namely the G-round, P-round, C-round and $P^{-1}$-round. In these rounds, $g[i,j]$, $p[i,j]$ and $C_j$ are computed, respectively, and assigned to the memory locations dictated by the in-place or out-of-place methodology. The $P^{-1}$-round plays a major role: it repeats the P-round computations in reverse order to erase the values stored during the P-round. Each computation is carried out using a Toffoli gate. The general steps for computing the sum in [1] are summarized below.
Step 1: a) Compute $p[i,i+1]$ and $p[i,j]$ as
$p[i,i+1] = a_i \oplus b_i$, for $1 \le i \le n-1$
$p[i,j] = p[i,k] \wedge p[k,j]$, for $1 \le i$, $i+1 < j \le n$ and $i < k < j$
b) Similarly, compute $g[i,j]$ as
$g[i,i+1] = a_i \wedge b_i$, for $0 \le i \le n-1$
$g[i,j] = (g[i,k] \wedge p[k,j]) \oplus g[k,j]$, for $0 \le i$, $i+1 < j \le n$ and $i < k < j$
Step 2: Compute $g[0,i]$ for $1 \le i \le n$, doubling the size of the interval at each computation.

Step 3: Compute each sum bit $S_i$, $0 \le i \le n$, according to
$S_0 = p[0,1]$
$S_i = p[i,i+1] \oplus g[0,i]$, for $1 \le i \le n-1$
$S_n = g[0,n]$
The logic depth, quantum cost and gate count of the reversible carry look-ahead adder of [1] are given and compared with those of our method in the following sections.

4 Design Methodology of Proposed Reversible Out-of-Place Carry Look-ahead Adder
We present a reversible carry look-ahead adder with no input carry ($C_0 = 0$) that stores its sum in ancilla inputs. The proposed method improves the logic depth, quantum cost and gate count of the carry look-ahead adder compared to the design presented in [1]. The logic depth is improved by computing g and p in parallel wherever possible, and the gate count and quantum cost are reduced by replacing a combination of a Toffoli gate and a Feynman gate with a single Peres gate. Consider the addition of two n-bit numbers $a_i$ and $b_i$ stored at memory locations A(i) and B(i), respectively, where $0 \le i \le n-1$. The memory locations Zg0, Zg(i) and Zp(i) are initialized to 0. The ancilla location Zg0 corresponds to bit 0; at the end of the computation it holds $S_0$. Zg(i) is the ancilla location corresponding to bit i, $0 \le i \le n-1$; at the end of the computation it holds $S_{i+1}$, while A(i) and B(i) hold the inputs $a_i$ and $b_i$, respectively. The additional ancilla locations Zp(i) store propagate values for intermediate bits; they are reset to 0 to satisfy the erasure constraint. The sum bit $S_i$ is defined as

$S_i = a_i \oplus b_i \oplus C_i$ for $0 \le i \le n-1$, and $S_n = C_n$,

where the carry bit $C_i$ is defined as

$C_0 = 0$, and $C_i = a_{i-1} b_{i-1} \oplus b_{i-1} C_{i-1} \oplus C_{i-1} a_{i-1}$ for $1 \le i \le n$.
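The definitions of $S_i$ and $C_i$ above, and the interval formulation of [1] from Section 3.2 ($p[i,j]$, $g[i,j]$ and $S_i = p[i,i+1] \oplus g[0,i]$), can both be cross-checked against ordinary integer addition. In the sketch below, the recursion splits each interval at its midpoint, which is one valid choice of $i < k < j$ (an assumption; [1] fixes its own schedule of rounds), and memoization stands in for the ancilla locations that hold intermediate results in the circuit. The function names are ours.

```python
from functools import lru_cache


def add_by_definition(x, y, n):
    """S_i = a_i xor b_i xor C_i, C_i = a b xor b C xor C a of bit i-1, C_0 = 0."""
    c, total = 0, 0
    for i in range(n):
        a, b = (x >> i) & 1, (y >> i) & 1
        total |= (a ^ b ^ c) << i
        c = (a & b) ^ (b & c) ^ (c & a)
    return total | (c << n)                   # S_n = C_n


def add_by_intervals(x, y, n):
    """Sum bits from the p[i,j] / g[i,j] interval signals of [1]."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]

    @lru_cache(maxsize=None)
    def p(i, j):                              # carry entering bit i leaves bit j-1
        if j == i + 1:
            return a[i] ^ b[i]
        k = (i + j) // 2
        return p(i, k) & p(k, j)

    @lru_cache(maxsize=None)
    def g(i, j):                              # bits i..j-1 generate a carry into bit j
        if j == i + 1:
            return a[i] & b[i]
        k = (i + j) // 2
        return (g(i, k) & p(k, j)) ^ g(k, j)

    s = [p(0, 1)] + [p(i, i + 1) ^ g(0, i) for i in range(1, n)] + [g(0, n)]
    return sum(bit << i for i, bit in enumerate(s))
```

Here g(0, i) plays the role of the carry $C_i$, which is why the two functions agree.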

Since the locations A(i) and B(i) are restored to $a_i$ and $b_i$ after the sum bits are generated, the number of garbage outputs is minimized. The proposed methodology for generating the carry look-ahead adder circuit has two variants: (i) in the first, the sum is stored in the ancillary space, and (ii) in the second, the sum is stored in the data input space. The methodology is generic and can generate a carry look-ahead adder circuit with no input carry of any size. The steps are explained for the addition of two n-bit numbers $a_i$ and $b_i$, $0 \le i \le n-1$. Figure 5 shows an illustrative example that generates a reversible carry look-ahead adder circuit for two 8-bit numbers $a = a_0 \ldots a_7$ and $b = b_0 \ldots b_7$. We follow the notation of [1].

1. Step 1: For i = 0, apply a Toffoli gate at locations A(0), B(0), Zg(0) such that A(0) and B(0) remain the same and Zg(0) is transformed to $zg(0) \oplus a(0)b(0)$. For i = 1 to n-1, apply a Peres gate at locations A(i), B(i), Zg(i) such that A(i) remains the same, B(i) is transformed to $b(i) \oplus a(i)$ and Zg(i) is transformed to $zg(i) \oplus a(i)b(i)$. In this step, p[i,i+1] and g[i,i+1] are computed and stored in memory locations B(i) and Zg(i), respectively, where $0 \le i \le n-1$ for the g[i,i+1] computation and $1 \le i \le n-1$ for the p[i,i+1] computation. Step 1 is illustrated in Figure 5.

2. Step 2: For i = 2 to n-2, where i is even, apply a Toffoli gate at locations B(i), B(i+1), Zp(i) such that B(i) and B(i+1) remain the same and Zp(i) is transformed to $zp(i) \oplus b(i)b(i+1)$. For i = 0 to n-2, where i is even, apply a Toffoli gate at locations Zg(i), B(i+1), Zg(i+1) such that Zg(i) and B(i+1) remain the same, whereas Zg(i+1) is transformed to $zg(i+1) \oplus zg(i)b(i+1)$. In this step, p[i,i+2] and g[i,i+2] are computed and stored in memory locations Zp(i) and Zg(i+1), respectively, where i takes all even values with $2 \le i \le n-2$ for the p[i,i+2] computation and $0 \le i \le n-2$ for the g[i,i+2] computation.
Step 2 is shown in Figure 5.

3. Step 3: For i = n/2 and i = 0, apply a Toffoli gate at locations Zg(i+1), Zp(i+2), Zg(i+3) such that Zg(i+1) and Zp(i+2) remain the same and Zg(i+3) is transformed to $zg(i+3) \oplus zg(i+1)zp(i+2)$. For i = n/2, also apply a Toffoli gate at locations Zp(i), Zp(i+2), Zp(i+1) such that Zp(i) and Zp(i+2) remain the same and Zp(i+1) is transformed to $zp(i+1) \oplus zp(i)zp(i+2)$. In this step, p[i,i+4] and g[i,i+4] are computed and stored in memory locations Zp(i+1) and Zg(i+3), respectively, where i = n/2 for the p[i,i+4] computation and i = n/2 and 0 for the g[i,i+4] computation. Step 3 is shown in Figure 5.

4. Step 4: For i = n-2, apply a Toffoli gate at locations Zg(i-3), Zp(i-2), Zg(i-1) such that Zg(i-3) and Zp(i-2) remain the same and Zg(i-1) is transformed to $zg(i-1) \oplus zg(i-3)zp(i-2)$. For i = n, apply a Toffoli gate at locations Zg(i-5), Zp(i-3), Zg(i-1) such that Zg(i-5) and Zp(i-3) remain the same and Zg(i-1) is transformed to $zg(i-1) \oplus zg(i-5)zp(i-3)$. In this step, g[0,i] is computed and stored in memory location Zg(i-1), where i takes the values n and n-2. The computation of g[0,n] is required when n is a power of 2. Step 4 is shown in Figure 5.

5. Step 5: For i = 2 to n-2, where i is even, apply a Toffoli gate at locations Zg(i-1), B(i), Zg(i) such that Zg(i-1) and B(i) remain the same and Zg(i) is transformed to $zg(i) \oplus zg(i-1)b(i)$. For i = n/2, apply a Toffoli gate at locations Zp(i), Zp(i+2), Zp(i+1) such that Zp(i) and Zp(i+2) remain the same and Zp(i+1) is transformed to $zp(i+1) \oplus zp(i)zp(i+2)$. In this step, g[0,i+1] is computed and stored in memory location Zg(i), where i is even with $2 \le i \le n-2$, and p[i,i+4] is recomputed in Zp(i+1) for i = n/2. Recomputing the value of p[i,i+4] produced in Step 3 resets Zp(i+1) to 0, satisfying the erasure property of reversible logic. Step 5 is shown in Figure 5.

6. Step 6: For i = 2 to n-2, where i is even, apply a Toffoli gate at locations B(i), B(i+1), Zp(i) such that B(i) and B(i+1) remain the same and Zp(i) is transformed to $zp(i) \oplus b(i)b(i+1)$. Apply a Feynman gate at locations B(0) and Zg0 such that B(0) remains the same while Zg0 is transformed to $zg0 \oplus b(0)$. For i = 0 to n-2, apply a Feynman gate at locations B(i+1) and Zg(i) such that B(i+1) remains the same and Zg(i) is transformed to $zg(i) \oplus b(i+1)$. After this step, Zg(i) holds $S_{i+1}$ for $0 \le i < n$ (Zg(n-1) already holds $S_n = C_n$). p[i,i+2] is computed a second time in memory location Zp(i), for all even i with $2 \le i \le n-2$, which resets these ancillary bits and satisfies the erasure property. Step 6 is shown in Figure 5.

7. Step 7: Apply a Feynman gate at locations A(0) and Zg0 such that A(0) remains the same while Zg0 is transformed to $zg0 \oplus a(0)$, so that Zg0 holds $S_0$, the sum of a(0) and b(0). For i = 1 to n-1, apply a Feynman gate at locations A(i) and B(i) such that A(i) remains the same while B(i) is transformed to $b(i) \oplus a(i)$. This recomputation restores the inputs b(i) in B(i), satisfying the data-retention property. Step 7 is shown in Figure 5. Figure 5 also shows the regenerated inputs a(i) and b(i) and the restored ancilla inputs zp(i).

Fig. 5. Steps of design methodology of proposed reversible out-of-place carry look-ahead adder
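Because every gate used is a permutation of classical basis states, the 8-bit example can be simulated classically. The sketch below follows Steps 1-7 as we read them and makes two assumptions explicit: the i = 0 Toffoli gate of Step 1 is taken to target Zg(0) (the location where Step 1 says g[0,1] is stored), and the final Feynman sweep of Step 6 runs over 0 ≤ i ≤ n-2, since there is no B(n). The register layout (a Python dict keyed by location names) and all helper names are ours. The function returns the integer read from Zg0 (bit 0) and Zg(0)..Zg(7) (bits 1 to 8), the final register contents, and a tally of gates by type.

```python
def out_of_place_cla8(x, y):
    """Classical simulation of the out-of-place Steps 1-7 for n = 8 (our reading)."""
    n, h = 8, 4                                   # h = n/2
    r = {}                                        # register: location -> bit
    for i in range(n):
        r["A", i], r["B", i] = (x >> i) & 1, (y >> i) & 1
    count = {"TG": 0, "PG": 0, "CNOT": 0}

    def bit(q):
        return r.get(q, 0)                        # ancillae Zg0, Zg(i), Zp(i) start at 0

    def toffoli(c1, c2, t):
        r[t] = bit(t) ^ (bit(c1) & bit(c2))
        count["TG"] += 1

    def peres(a, b, t):
        r[t] = bit(t) ^ (bit(a) & bit(b))         # AB xor C, using the old B
        r[b] = bit(b) ^ bit(a)                    # A xor B
        count["PG"] += 1

    def cnot(c, t):
        r[t] = bit(t) ^ bit(c)
        count["CNOT"] += 1

    # Step 1: g[i,i+1] into Zg(i); p[i,i+1] into B(i) for i >= 1
    toffoli(("A", 0), ("B", 0), ("Zg", 0))        # assumed target Zg(0)
    for i in range(1, n):
        peres(("A", i), ("B", i), ("Zg", i))
    # Step 2: p[i,i+2] into Zp(i); g[i,i+2] into Zg(i+1)
    for i in range(2, n - 1, 2):
        toffoli(("B", i), ("B", i + 1), ("Zp", i))
    for i in range(0, n - 1, 2):
        toffoli(("Zg", i), ("B", i + 1), ("Zg", i + 1))
    # Step 3: g[i,i+4] for i = 0 and n/2; p[n/2, n/2+4] into Zp(n/2+1)
    for i in (0, h):
        toffoli(("Zg", i + 1), ("Zp", i + 2), ("Zg", i + 3))
    toffoli(("Zp", h), ("Zp", h + 2), ("Zp", h + 1))
    # Step 4: g[0,n-2] into Zg(n-3), g[0,n] into Zg(n-1)
    toffoli(("Zg", n - 5), ("Zp", n - 4), ("Zg", n - 3))
    toffoli(("Zg", n - 5), ("Zp", n - 3), ("Zg", n - 1))
    # Step 5: g[0,i+1] for even i; erase p[n/2, n/2+4]
    for i in range(2, n - 1, 2):
        toffoli(("Zg", i - 1), ("B", i), ("Zg", i))
    toffoli(("Zp", h), ("Zp", h + 2), ("Zp", h + 1))
    # Step 6: erase p[i,i+2]; Zg(i) = C_{i+1} xor p_{i+1} = S_{i+1}
    for i in range(2, n - 1, 2):
        toffoli(("B", i), ("B", i + 1), ("Zp", i))
    cnot(("B", 0), "Zg0")
    for i in range(n - 1):                        # assumed range 0 <= i <= n-2
        cnot(("B", i + 1), ("Zg", i))
    # Step 7: S_0 = a_0 xor b_0 in Zg0; restore b_i in B(i)
    cnot(("A", 0), "Zg0")
    for i in range(1, n):
        cnot(("A", i), ("B", i))

    total = bit("Zg0") | sum(bit(("Zg", i)) << (i + 1) for i in range(n))
    return total, r, count
```

Under these assumptions, looping over input pairs should show that Zg0 and Zg(0)..Zg(7) hold $S_0 \ldots S_8$, that every B(i) is restored to $b_i$, and that every Zp(i) is back at 0. The gate tally for n = 8 is 20 Toffoli, 7 Peres and 16 Feynman gates, the same as the 8-bit row of Table 1.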

4.1 Delay
Delay is another important parameter that indicates the efficiency of a reversible circuit. Here, delay means the critical-path delay of the circuit, and we use the logical depth as the measure of delay [13, 36]. The delay of every 1x1 and 2x2 reversible gate is taken as a unit delay, denoted δ. Any 3x3 reversible gate can be built from 1x1 and 2x2 reversible gates such as the CNOT, Controlled-V and Controlled-V+ gates, so its delay can be computed as its logical depth in such an implementation, with each 2x2 gate on the critical path contributing 1δ. Figure 4c shows the logic depth of the quantum implementation of the Toffoli gate, from which the Toffoli gate has a delay of 5δ. Similarly, the Peres and TR gates shown in Figures 2.c and 3.c, respectively, have a logic depth of 4, which gives them a delay of 4δ.

Delay and Quantum Cost Calculation of the Steps of the Out-of-Place Method
1. Step 1 needs (n-1) Peres gates and one Toffoli gate, all working concurrently. Since the quantum costs of the Peres and Toffoli gates are 4 and 5, respectively, this step has a quantum cost of 4(n-1)+5. Its delay is 4δ for bits 1 to n-1 and 5δ for bit 0, so the delay of this step is 5δ.
2. Step 2 needs (n-1) Toffoli gates working concurrently. Its quantum cost is 5(n-1) and its maximum delay is 5δ.
3. Step 3 needs (n-4) Toffoli gates. Its quantum cost is 5(n-4) and its maximum delay is 5δ.
4. Step 4 needs (n/2)-2 Toffoli gates. Its quantum cost is 5((n/2)-2) and its maximum delay is 5δ.
5. Step 5 needs n/2 Toffoli gates. Its quantum cost is 5(n/2) and its maximum delay is 5δ.
6. Step 6 needs (n+3) Toffoli gates. Its quantum cost is 5(n+3) and its maximum delay is 5δ.
7. Step 7 needs n CNOT gates. Its quantum cost is n and its maximum delay is 1δ.
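The per-gate delays above can be reproduced mechanically: schedule each primitive gate as early as the lines it touches allow, and count the resulting layers. The decompositions in the sketch are assumed standard forms consistent with the primitive counts stated in Section 2 (Figures 2c-4c are not reproduced in this text), each written as a list of (kind, control, target) triples. The Peres circuit shown uses two Controlled-V gates and one Controlled-V+ gate; swapping every V for V+ gives a mirrored variant with the counts quoted in Section 2.2. Because only CNOT gates touch lines A and B, those lines stay classical, so the V-type gates on line C can be tracked as a power of V (with V^2 = NOT) instead of simulating full quantum states.

```python
from itertools import product

# Assumed decompositions: (kind, control line, target line).
PERES = [("V", "B", "C"), ("V", "A", "C"), ("CNOT", "A", "B"), ("V+", "B", "C")]
TR = [("V+", "B", "C"), ("V", "A", "C"), ("CNOT", "A", "B"), ("V", "B", "C")]
TOFFOLI = [("V", "B", "C"), ("CNOT", "A", "B"), ("V+", "B", "C"),
           ("CNOT", "A", "B"), ("V", "A", "C")]


def quantum_cost(circuit):
    """Every primitive (CNOT, controlled-V, controlled-V+) costs 1."""
    return len(circuit)


def logic_depth(circuit):
    """ASAP scheduling: a gate starts one layer after the last gate on its lines."""
    busy_until = {}
    for _, ctl, tgt in circuit:
        layer = max(busy_until.get(ctl, 0), busy_until.get(tgt, 0)) + 1
        busy_until[ctl] = busy_until[tgt] = layer
    return max(busy_until.values())


def simulate(circuit, a, b, c):
    """Apply the circuit to basis state |a b c>, tracking the power of V on line C."""
    lines, power = {"A": a, "B": b}, 0
    for kind, ctl, tgt in circuit:
        if kind == "CNOT":
            lines[tgt] ^= lines[ctl]
        elif lines[ctl]:
            power += 1 if kind == "V" else -1       # V+ is the inverse of V
    if power % 2:
        raise ValueError("line C would be left in superposition")
    return lines["A"], lines["B"], c ^ ((power // 2) % 2)   # V^2 = NOT, V^4 = I


if __name__ == "__main__":
    for name, circuit in (("Peres", PERES), ("TR", TR), ("Toffoli", TOFFOLI)):
        print(name, "cost", quantum_cost(circuit), "depth", logic_depth(circuit))
```

Running it reports cost 4 and depth 4 for the Peres and TR circuits and cost 5 and depth 5 for the Toffoli circuit, in line with the 4δ and 5δ figures used above.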
4.2 Gate Count
The complete circuit of the proposed reversible out-of-place carry look-ahead adder contains $4n - 3w(n) - 3\log n$ Toffoli gates, (n-1) Peres gates and 2n CNOT gates, while the design in [1] contains $5n - 3w(n) - 3\log n - 1$ Toffoli gates and (3n-1) CNOT gates. Here, w(n) denotes the number of ones in the binary expansion of n [1]. Table 1 compares the gate counts.

Table 1. Comparison of gate count

No. of bits | [1] PG | [1] TG | [1] C-NOT | [1] NOT | [1] TGC | Proposed PG | Proposed TG | Proposed C-NOT | Proposed NOT | Proposed TGC | % RGC
4 | Nil | 10 | 11 | Nil | 21 | 3 | 7 | 8 | Nil | 18 | 14.3
6 | Nil | 15 | 17 | Nil | 32 | 5 | 10 | 12 | Nil | 27 | 15.6
8 | Nil | 27 | 23 | Nil | 50 | 7 | 20 | 16 | Nil | 43 | 14
10 | Nil | 33 | 29 | Nil | 62 | 9 | 24 | 20 | Nil | 53 | 14.5
16 | Nil | 64 | 47 | Nil | 111 | 15 | 49 | 32 | Nil | 96 | 13.5
32 | Nil | 141 | 95 | Nil | 236 | 31 | 110 | 64 | Nil | 205 | 13.1
64 | Nil | 310 | 191 | Nil | 501 | 63 | 247 | 128 | Nil | 438 | 12.6
128 | Nil | 615 | 383 | Nil | 998 | 127 | 488 | 256 | Nil | 871 | 12.7
256 | Nil | 1252 | 767 | Nil | 2019 | 255 | 997 | 512 | Nil | 1764 | 12.6
512 | Nil | 2529 | 1535 | Nil | 4064 | 511 | 2018 | 1024 | Nil | 3553 | 12.6

Abbreviations: TG = Toffoli gate, PG = Peres gate, C-NOT = Controlled NOT gate, TGC = total gate count, RGC = reduction in gate count.

4.3 Quantum Cost
The quantum cost of the design in [1] is $28n - 15w(n) - 15\log n - 6$, derived from $5(5n - 3w(n) - 3\log n - 1) + (3n - 1)$, while the quantum cost of the proposed design is $26n - 15w(n) - 15\log n - 4$, derived from $5(4n - 3w(n) - 3\log n) + 4(n-1) + 2n$. Table 2 compares the quantum costs.

Table 2. Comparison of quantum cost

No. of bits | Existing work [1] | Proposed work | % Improvement
4 | 61 | 55 | 10
6 | 93 | 83 | 11
8 | 158 | 144 | 9
10 | 194 | 176 | 9
16 | 367 | 337 | 8
32 | 800 | 738 | 8
64 | 1741 | 1615 | 7
128 | 3458 | 3204 | 7
256 | 7027 | 6517 | 7
512 | 14180 | 13158 | 7
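The closed-form expressions of Sections 4.2 and 4.3 are straightforward to evaluate for any n. The sketch below assumes that log denotes $\log_2$ and that non-integer results are rounded to the nearest integer, which is consistent with, for example, the n = 6 and n = 10 rows of Tables 1 and 2; the function names are ours.

```python
import math


def w(n):
    """Number of ones in the binary expansion of n."""
    return bin(n).count("1")


def gate_counts(n):
    """Gate counts of both out-of-place adders (Section 4.2), before rounding."""
    lg = math.log2(n)
    existing = {"TG": 5 * n - 3 * w(n) - 3 * lg - 1, "CNOT": 3 * n - 1}
    proposed = {"PG": n - 1, "TG": 4 * n - 3 * w(n) - 3 * lg, "CNOT": 2 * n}
    return existing, proposed


def quantum_costs(n):
    """Quantum cost with TG = 5, PG = 4 and CNOT = 1 (Section 4.3), rounded."""
    existing, proposed = gate_counts(n)
    qc_existing = 5 * existing["TG"] + existing["CNOT"]
    qc_proposed = 5 * proposed["TG"] + 4 * proposed["PG"] + proposed["CNOT"]
    return round(qc_existing), round(qc_proposed)


def gate_count_reduction(n):
    """Percentage reduction in total gate count, rounded to one decimal."""
    existing, proposed = gate_counts(n)
    total_ex = round(sum(existing.values()))
    total_pr = round(sum(proposed.values()))
    return round(100 * (total_ex - total_pr) / total_ex, 1)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, quantum_costs(n), gate_count_reduction(n))
```

For n = 4, 8 and 16 this gives quantum costs of (61, 55), (158, 144) and (367, 337) and gate-count reductions of 14.3%, 14.0% and 13.5%, matching the corresponding rows of Tables 1 and 2.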

4.4 Logic Depth
Table 3 compares the logic depth of the design in [1] with that of the proposed method. The logic depth of the proposed method is $\log n + \log(n/3) + 2$, while that of the design in [1] is $\log n + \log(n/3) + 7$.

Table 3. Comparison of logic depth

No. of bits | Existing work [1] | Proposed work | % Improvement
4 | 9.4 | 4.4 | 53
6 | 10.58 | 5.58 | 47
8 | 11.4 | 6.45 | 43
10 | 12.05 | 7.05 | 41
16 | 13.4 | 8.41 | 37
32 | 15.4 | 10.4 | 32
64 | 17.4 | 12.4 | 29
128 | 19.4 | 14.4 | 26
256 | 21.4 | 16.4 | 23
512 | 23.4 | 18.4 | 21

As summarized in Tables 1, 2 and 3, the proposed out-of-place reversible carry look-ahead adder with no input carry achieves maximum improvements of 15.6% in gate count, 11% in quantum cost and 53% in logic depth compared to the design presented in [1].

5 Design Methodology of Proposed Reversible In-Place Carry Look-ahead Adder
In this methodology, the sum is stored in the data input space: at the end of the computation, memory location B(i) holds $S_i$, while A(i) keeps the value $a_i$. This helps minimize the garbage outputs. The methodology is explained with an illustrative example that generates a reversible carry look-ahead adder circuit for two 8-bit numbers $a = a_0 \ldots a_7$ and $b = b_0 \ldots b_7$.

1. Step 1: For $0 \le i \le n-1$, apply a Peres gate at locations A(i), B(i), Zg(i) such that A(i) remains the same, B(i) is transformed to $b(i) \oplus a(i)$ and Zg(i) is transformed to $zg(i) \oplus a(i)b(i)$. In this step, p[i,i+1] and g[i,i+1] are computed and stored in memory locations B(i) and Zg(i), respectively, where $0 \le i \le n-1$. Step 1 is shown in Figure 6.a.

2. Step 2: For i = 2 to n-2, where i is even, apply a Toffoli gate at locations B(i), B(i+1), Zp(i) such that B(i) and B(i+1) remain the same and Zp(i) is transformed to $zp(i) \oplus b(i)b(i+1)$. For i = 0 to n-2, where i is even, apply a Toffoli gate at locations Zg(i), B(i+1), Zg(i+1) such that Zg(i) and B(i+1) remain the same, whereas Zg(i+1) is transformed to $zg(i+1) \oplus zg(i)b(i+1)$. In this step, p[i,i+2] and g[i,i+2] are computed and stored in memory locations Zp(i) and Zg(i+1), respectively, where i takes all even values with $2 \le i \le n-2$ for the p[i,i+2] computation and $0 \le i \le n-2$ for the g[i,i+2] computation. Step 2 is shown in Figure 6.a.

3. Step 3: For i = n/2 and i = 0, apply a Toffoli gate at locations Zg(i+1), Zp(i+2), Zg(i+3) such that Zg(i+1) and Zp(i+2) remain the same, while Zg(i+3) is transformed to $zg(i+3) \oplus zg(i+1)zp(i+2)$. For i = n/2, apply a Toffoli gate at locations Zp(i), Zp(i+2), Zp(i+1) such that Zp(i) and Zp(i+2) remain the same, while Zp(i+1) is transformed to $zp(i+1) \oplus zp(i)zp(i+2)$. In this step, p[i,i+4] and g[i,i+4] are computed and stored in memory locations Zp(i+1) and Zg(i+3), respectively, where i = n/2 for the p[i,i+4] computation and i = n/2 and 0 for the g[i,i+4] computation. Step 3 is shown in Figure 6.a.

4. Step 4: For i = n-2, apply a Toffoli gate at locations Zg(i-3), Zp(i-2), Zg(i-1) such that Zg(i-3) and Zp(i-2) remain the same, while Zg(i-1) is transformed to $zg(i-1) \oplus zg(i-3)zp(i-2)$. For i = n, apply a Toffoli gate at locations Zg(i-5), Zp(i-3), Zg(i-1) such that Zg(i-5) and Zp(i-3) remain the same, and Zg(i-1) is transformed to $zg(i-1) \oplus zg(i-5)zp(i-3)$. In this step, g[0,i] is computed and stored in memory location Zg(i-1), where i takes the values n and n-2. The computation of g[0,n] is required when n is a power of 2. Step 4 is shown in Figure 6.a.

5. Step 5: For i = 2 to n-2, where i is even, apply a Toffoli gate at locations Zg(i-1), B(i), Zg(i) such that Zg(i-1) and B(i) remain the same, while Zg(i) is transformed to $zg(i) \oplus zg(i-1)b(i)$. For i = n/2, apply a Toffoli gate at locations Zp(i), Zp(i+2), Zp(i+1) such that Zp(i) and Zp(i+2) remain the same, while Zp(i+1) is transformed to $zp(i+1) \oplus zp(i)zp(i+2)$.
In this step, g[0,i+1] is computed and stored in memory location Zg(i), where i is even with $2 \le i \le n-2$, and p[i,i+4] is recomputed in Zp(i+1) for i = n/2. Recomputing the value of p[i,i+4] produced in Step 3 resets Zp(i+1) to 0, satisfying the erasure property of reversible logic. Step 5 is shown in Figure 6.a.

6. Step 6: For i = 2 to n-2, where i is even, apply a Toffoli gate at locations B(i), B(i+1), Zp(i) such that B(i) and B(i+1) remain the same, while Zp(i) is transformed to $zp(i) \oplus b(i)b(i+1)$. In this step, p[i,i+2] is computed a second time in memory location Zp(i), for all even i with $2 \le i \le n-2$, which resets Zp(i) to 0. Step 6 is shown in Figure 6.a.

7. Step 7: For i = 1 to n-1, apply a Feynman gate at locations Zg(i-1) and B(i) such that Zg(i-1) remains the same, while B(i) is transformed to $b(i) \oplus zg(i-1)$. In this step, $g[0,i] \oplus p[i,i+1]$, that is, the sum bit $S_i$, is computed and stored in memory location B(i) for $1 \le i \le n-1$. Step 7 is shown in Figure 6.a.

Fig. 6.a. Steps 1 to 7 of the design methodology of the proposed reversible in-place carry look-ahead adder
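As a rough illustration of Steps 1 and 2, the following Python sketch models each memory location as a single bit. It assumes the standard gate definitions (a Peres gate maps (A, B, C) to (A, A ⊕ B, AB ⊕ C); a Toffoli gate leaves its two controls unchanged and maps its target t to t ⊕ xy), assumes the ancillas start at 0, and takes Step 1 to be one Peres gate per bit position acting on A(i), B(i), Zg(i). The helper names are hypothetical. For a 2-bit slice, it checks exhaustively that the Step 2 Toffoli gate on Zg(i), B(i+1), Zg(i+1) leaves the block generate signal, which equals the carry out of those two bit positions, in Zg(i+1).

```python
from itertools import product


def peres(a, b, c):
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def toffoli(x, y, t):
    """Toffoli gate: controls x, y unchanged; target t -> t xor (x and y)."""
    return x, y, t ^ (x & y)


def block_generate(a0, a1, b0, b1):
    """Return Zg(1) after the sketched Steps 1-2 on a hypothetical 2-bit slice."""
    a, b = [a0, a1], [b0, b1]
    zg = [0, 0]  # ancilla locations Zg(0), Zg(1), assumed to start at 0
    # Step 1 (assumed form): B(i) <- a(i) xor b(i) (propagate), Zg(i) <- a(i) b(i) (generate)
    for i in range(2):
        a[i], b[i], zg[i] = peres(a[i], b[i], zg[i])
    # Step 2: Toffoli with controls Zg(0), B(1) and target Zg(1)
    zg[0], b[1], zg[1] = toffoli(zg[0], b[1], zg[1])
    return zg[1]


# Exhaustive check against ordinary integer addition with carry-in 0.
for a0, a1, b0, b1 in product((0, 1), repeat=4):
    carry_out = ((a0 + 2 * a1) + (b0 + 2 * b1)) >> 2
    assert block_generate(a0, a1, b0, b1) == carry_out
print("Zg(1) equals the 2-bit carry out for all 16 inputs")
```

The XOR in the Toffoli target behaves like an OR here because a bit position can never both generate and propagate a carry, so g(i+1) and p(i+1)·g(i) are never 1 at the same time.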

8. Step 8: Apply a NOT gate at each location B(i), 1 ≤ i ≤ n, so that b(i) transforms to ~b(i). In this step, ~(g[0,i] ⊕ p[i,i+2]) is computed and stored in the memory locations B(i), where 1 ≤ i ≤ n. Step 8 is shown in Figure 6.b.
9. Step 9: At the end of Step 8, B(i) contains ~(g[0,i] ⊕ p[i,i+2]). For i = 1 to n-1, apply a Feynman gate at locations A(i) and B(i) such that A(i) remains the same, while B(i) transforms to b(i) ⊕ a(i); B(i) then contains a(i) ⊕ ~(g[0,i] ⊕ p[i,i+2]). Step 9 is shown in Figure 6.b.
10. Step 10: In this step, Zp(i) is computed for even i, where 1 ≤ i ≤ n. For even i from 2 to n-2, apply a Toffoli gate at locations B(i), B(i+1), Zp(i) such that B(i) and B(i+1) remain the same, while Zp(i) transforms to zp(i) ⊕ b(i)·b(i+1). Step 10 is shown in Figure 6.b.
11. Step 11: Step 5 is repeated to ensure the erasure condition on memory locations Zg(i) and Zp(i+1), where i is even with 2 ≤ i ≤ n-2 for the g[0,i+1] computation and i = n/2 for the p[i,i+4] computation. For even i from 2 to n-2, apply a Toffoli gate at locations Zg(i-1), B(i), Zg(i) such that Zg(i-1) and B(i) remain the same, while Zg(i) transforms to zg(i) ⊕ zg(i-1)·b(i). For i = n/2, apply a Toffoli gate at locations Zp(i), Zp(i+2), Zp(i+1) such that Zp(i) and Zp(i+2) remain the same, while Zp(i+1) transforms to zp(i+1) ⊕ zp(i)·zp(i+2). Step 11 is shown in Figure 6.b.
12. Step 12: Step 4 is repeated so that the values in Zg(i-1) are recomputed, ensuring the erasure condition on memory locations Zg(i-1), where i takes the values n and n-2. For i = n-2, apply a Toffoli gate at locations Zg(i-3), Zp(i-2), Zg(i-1) such that Zg(i-3) and Zp(i-2) remain the same, while Zg(i-1) transforms to zg(i-1) ⊕ zg(i-3)·zp(i-2). For i = n, apply a Toffoli gate at locations Zg(i-5), Zp(i-3), Zg(i-1) such that Zg(i-5) and Zp(i-3) remain the same, while Zg(i-1) transforms to zg(i-1) ⊕ zg(i-5)·zp(i-3). Step 12 is shown in Figure 6.b.
13. Step 13: This step ensures the erasure condition on memory locations Zg(i+3), where i = 0 and n/2. For i = 0 and i = n/2, apply a Toffoli gate at locations Zg(i+1), Zp(i+2), Zg(i+3) such that Zg(i+1) and Zp(i+2) remain the same, while Zg(i+3) transforms to zg(i+3) ⊕ zg(i+1)·zp(i+2). Step 13 is shown in Figure 6.b.
14. Step 14: This step ensures the erasure condition on memory locations Zg(i+1), where i takes all even values in the given range. For even i from 0 to n-2, apply a Toffoli gate at locations Zg(i), B(i+1), Zg(i+1) such that Zg(i) and B(i+1) remain the same, while Zg(i+1) transforms to zg(i+1) ⊕ zg(i)·b(i+1). For i = n/2, apply a Toffoli gate at locations Zp(i), Zp(i+2), Zp(i+1) such that Zp(i) and Zp(i+2) remain the same, while Zp(i+1) transforms to zp(i+1) ⊕ zp(i)·zp(i+2). Step 14 is shown in Figure 6.b.

15. Step 15: The Zp(i) locations are recomputed to ensure the erasure condition on memory locations Zp(i), where i takes all even values in the given range. For even i from 2 to n-2, apply a Toffoli gate at locations B(i), B(i+1), Zp(i) such that B(i) and B(i+1) remain the same, while Zp(i) transforms to zp(i) ⊕ b(i)·b(i+1). Step 15 is shown in Figure 6.b.
16. Step 16: The Zg(i) and B(i) locations are recomputed to ensure the erasure constraint on memory locations Zg(i) and to retrieve the sum in locations B(i), where 0 ≤ i ≤ n-1. This step uses the TR gate proposed by the authors in [13], which performs the operation of a Feynman gate and a Toffoli gate in a single gate. Using the TR gate instead of the Feynman/Toffoli combination of [1] reduces the quantum cost of this step and reduces its logic depth by one. For i = 0, apply a Toffoli gate at locations A(i), B(i), Zg(i) such that A(i) and B(i) remain the same, while Zg(i) transforms to zg(i) ⊕ a(i)·b(i). For 1 ≤ i ≤ n-1, apply a TR gate at locations A(i), B(i), Zg(i) such that A(i) remains the same, B(i) transforms to a(i) ⊕ b(i), and Zg(i) transforms to zg(i) ⊕ a(i)·~b(i). Step 16 is shown in Figure 6.b.
17. Step 17: This final step stores the sum bits in all B(i) locations. For i = 1 to n-1, apply a NOT gate at location B(i), so that b(i) transforms to ~b(i). At the end of this step, B(i) holds the sum bit s(i). Step 17 is shown in Figure 6.b. The regenerated inputs a(i) and the ancilla inputs zg(i) and zp(i) are also shown in Figure 6.b.

Fig. 6.b. Steps 8 to 17 of the design methodology of the proposed reversible in-place carry look-ahead adder
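The gate identities that Steps 11 to 16 rely on can be checked exhaustively on single bits. The sketch below assumes that the TR gate of [13] maps (A, B, C) to (A, A ⊕ B, A·~B ⊕ C); that definition should be checked against [13]. It confirms that (i) a Feynman gate on (A, B) followed by a Toffoli gate on (A, B, C) gives the same output as one TR gate, which is the kind of substitution made in Step 16; (ii) a TR gate undoes a Peres gate; and (iii) applying the same Toffoli gate twice restores its target, which is why repeating a step with unchanged controls erases the values it computed. All function names are hypothetical.

```python
from itertools import product


def feynman(x, t):
    """Feynman (CNOT) gate: control x unchanged; target t -> t xor x."""
    return x, t ^ x


def toffoli(x, y, t):
    """Toffoli gate: controls x, y unchanged; target t -> t xor (x and y)."""
    return x, y, t ^ (x & y)


def peres(a, b, c):
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def tr(a, b, c):
    """TR gate (assumed definition): (A, B, C) -> (A, A xor B, (A and not B) xor C)."""
    return a, a ^ b, (a & (1 - b)) ^ c


for a, b, c in product((0, 1), repeat=3):
    # (i) Feynman on (A, B), then Toffoli on (A, B, C), equals one TR gate.
    a1, b1 = feynman(a, b)
    assert toffoli(a1, b1, c) == tr(a, b, c)
    # (ii) TR is the inverse of Peres.
    assert tr(*peres(a, b, c)) == (a, b, c)
    # (iii) A Toffoli gate is its own inverse.
    assert toffoli(*toffoli(a, b, c)) == (a, b, c)
print("all gate identities hold on the 8 basis inputs")
```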

5.1 Delay and Quantum Cost Calculation of Steps of In-place Methodology
1. Step 1 needs n Peres gates. Since the quantum cost of a Peres gate is 4, the quantum cost of this step is 4n; the maximum delay is 4δ.
2. Step 2 needs n-1 Toffoli gates; quantum cost 5(n-1), maximum delay 5δ.
3. Step 3 needs n/2-1 Toffoli gates; quantum cost 5(n/2-1), maximum delay 5δ.
4. Step 4 needs n/4 Toffoli gates; quantum cost 5(n/4), maximum delay 5δ.
5. Step 5 needs n/2 Toffoli gates; quantum cost 5(n/2), maximum delay 5δ.
6. Step 6 needs n/2-1 Toffoli gates; quantum cost 5(n/2-1), maximum delay 5δ.
7. Step 7 needs n-1 CNOT gates; quantum cost n-1, maximum delay δ.
8. Step 8 needs n NOT gates; quantum cost n, maximum delay δ.
9. Step 9 needs n-1 CNOT gates; quantum cost n-1, maximum delay δ.
10. Step 10 needs n/2-1 Toffoli gates; quantum cost 5(n/2-1), maximum delay 5δ.
11. Step 11 needs n/2 Toffoli gates; quantum cost 5(n/2), maximum delay 5δ.
12. Step 12 needs n/4 Toffoli gates; quantum cost 5(n/4), maximum delay 5δ.
13. Step 13 needs n/4 Toffoli gates; quantum cost 5(n/4), maximum delay 5δ.
14. Step 14 needs n/2+1 Toffoli gates; quantum cost 5(n/2+1), maximum delay 5δ.
15. Step 15 needs n/2-1 Toffoli gates; quantum cost 5(n/2-1), maximum delay 5δ.
16. Step 16 needs n-1 TR gates and 1 Toffoli gate; quantum cost 4n+5, maximum delay 5δ, since the Toffoli and TR gates operate concurrently.
17. Step 17 needs n NOT gates; quantum cost n, maximum delay δ.
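The per-step figures above follow a simple rule that can be written down directly. This sketch assumes the unit quantum costs used in this section (1 for NOT and CNOT, 4 for Peres, 5 for Toffoli) plus a cost of 4 for the TR gate, which is our assumption following [13]. It also assumes that the gates within one step act on disjoint locations and run in parallel, so a step's delay is that of its slowest gate, with each gate's delay in units of δ equal to its quantum cost. The names `QUANTUM_COST` and `step_cost` are hypothetical, and the example covers only a few of the steps listed above, for n = 8.

```python
# Unit quantum costs; the TR entry (4) is an assumption following [13].
QUANTUM_COST = {"NOT": 1, "CNOT": 1, "Peres": 4, "TR": 4, "Toffoli": 5}


def step_cost(gates):
    """Return (quantum cost, maximum delay in units of delta) for one step.

    `gates` maps a gate name to how many copies the step uses. The gates of a
    step are assumed to run in parallel, so the delay is that of the slowest gate.
    """
    cost = sum(QUANTUM_COST[name] * count for name, count in gates.items())
    delay = max(QUANTUM_COST[name] for name, count in gates.items() if count > 0)
    return cost, delay


n = 8
examples = {
    "Step 1": {"Peres": n},        # 4n, 4 delta
    "Step 2": {"Toffoli": n - 1},  # 5(n-1), 5 delta
    "Step 7": {"CNOT": n - 1},     # n-1, 1 delta
    "Step 8": {"NOT": n},          # n, 1 delta
}
for label, gates in examples.items():
    print(label, step_cost(gates))
```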
5.2 Gate Count
The proposed circuit contains 4n - 3w(n) - 3 log n Toffoli gates, n-1 Peres gates, n-1 CNOT gates and 2n-2 NOT gates, whereas the existing design in [1] contains 10n - 3w(n) - 3w(n-1) - 3 log n - 3 log(n-1) - 7 Toffoli gates, 4n-5 CNOT gates and 2n-2 NOT gates. Table 4 compares the gate counts.
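The gate-count expressions can be evaluated numerically, and the quantum-cost expression for [1] given in Section 5.3 can be checked against them. This sketch assumes that w(n) denotes the Hamming weight of n (the number of 1 bits in its binary expansion), as in [1], that log is base 2, and that the logarithms are not rounded, since the text writes them without floors. It charges 5 per Toffoli gate and 1 per CNOT or NOT gate; the function names are hypothetical.

```python
from math import log2


def w(n):
    """Hamming weight of n (assumed meaning of w(n))."""
    return bin(n).count("1")


def existing_toffoli(n):
    """Toffoli count of the in-place design in [1], as quoted in Section 5.2."""
    return 10 * n - 3 * w(n) - 3 * w(n - 1) - 3 * log2(n) - 3 * log2(n - 1) - 7


def existing_qc_from_counts(n):
    """5 per Toffoli, 1 per CNOT (4n - 5 of them) and 1 per NOT (2n - 2 of them)."""
    return 5 * existing_toffoli(n) + (4 * n - 5) + (2 * n - 2)


def existing_qc_closed_form(n):
    """Closed-form quantum cost of [1], as quoted in Section 5.3."""
    return (56 * n - 15 * w(n) - 15 * w(n - 1)
            - 15 * log2(n) - 15 * log2(n - 1) - 42)


for n in (4, 8, 16, 64, 512):
    derived, closed = existing_qc_from_counts(n), existing_qc_closed_form(n)
    assert abs(derived - closed) < 1e-6
    print(n, round(existing_toffoli(n), 2), round(closed, 2))
```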

Table 4. Comparison of gate count

| No. of bits | [1] PG | [1] TG | [1] C-NOT | [1] NOT | [1] TGC | Proposed PG | Proposed TG | Proposed C-NOT | Proposed NOT | Proposed TR | Proposed TGC | % RGC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | nil | 14 | 9 | 6 | 29 | 3 | 8 | 5 | 6 | 3 | 25 | 13 |
| 6 | nil | 27 | 19 | 10 | 56 | 5 | 19 | 9 | 10 | 5 | 48 | 14 |
| 8 | nil | 44 | 27 | 14 | 85 | 7 | 30 | 13 | 14 | 7 | 71 | 16 |
| 10 | nil | 62 | 35 | 18 | 115 | 9 | 44 | 17 | 18 | 9 | 97 | 16 |
| 16 | nil | 114 | 59 | 30 | 203 | 15 | 80 | 29 | 30 | 15 | 169 | 16 |
| 32 | nil | 266 | 123 | 62 | 451 | 31 | 200 | 61 | 62 | 31 | 385 | 15 |
| 64 | nil | 576 | 251 | 126 | 893 | 63 | 384 | 125 | 126 | 63 | 761 | 15 |
| 128 | nil | 1208 | 507 | 254 | 1969 | 127 | 950 | 253 | 254 | 127 | 1711 | 13 |
| 256 | nil | 2479 | 1019 | 510 | 4008 | 255 | 1965 | 509 | 510 | 255 | 3494 | 13 |
| 512 | nil | 5029 | 2043 | 1022 | 8094 | 512 | 4007 | 1021 | 1022 | 512 | 7072 | 13 |

Abbreviations used in the column headers: [1], existing work [1]; PG, Peres gate; TG, Toffoli gate; C-NOT, controlled-NOT (Feynman) gate; TR, TR gate; TGC, total gate count; RGC, reduction in gate count.

5.3 Quantum Cost
The quantum cost of the existing design in [1] is 56n - 15w(n) - 15w(n-1) - 15 log n - 15 log(n-1) - 42. It is derived from the gate counts in Section 5.2 by charging 5 per Toffoli gate and 1 per CNOT or NOT gate: 5(10n - 3w(n) - 3w(n-1) - 3 log n - 3 log(n-1) - 7) + (4n - 5) + (2n - 2). The quantum cost of the proposed design is 52n - 15w(n) - 15w(n-1) - 15 log n - 15 log(n-1). Table 5 compares the quantum costs.

Table 5. Comparison of quantum cost

| No. of bits | Existing work [1] | Proposed work | % improvement |
|---|---|---|---|
| 4 | 85 | 72 | 15 |
| 6 | 164 | 140 | 14 |
| 8 | 261 | 231 | 12 |
| 10 | 363 | 327 | 9 |
| 16 | 609 | 600 | 1.4 |
| 32 | 1515 | 1463 | 2.7 |
| 64 | 3258 | 3009 | 7.6 |
| 128 | 6801 | 6290 | 7.5 |
| 256 | 13924 | 12750 | 8.4 |
| 512 | 28210 | 26168 | 7.2 |

5.4 Logic Depth
The proposed design has logic depth logarithmic in the number of bits n. Its logic depth is log n + log(n-1) + log(n/3) + log((n-1)/3) + 8, while the logic depth of the design in [1] is log n + log(n-1) + log(n/3) + log((n-1)/3) + 14. Table 6 compares the logic depths of the proposed and existing designs.
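The two logic-depth expressions can be tabulated with a few lines of Python. This sketch reads "log n/3" as log2(n/3) and "log (n-1)/3" as log2((n-1)/3), which is our interpretation; with that reading the computed values are close to (within about 0.15 of) the entries of Table 6. The function names are hypothetical, and n is assumed to be at least 4 so that every logarithm is non-negative.

```python
from math import log2


def depth(n, constant):
    """log n + log(n-1) + log(n/3) + log((n-1)/3) + constant, base-2 logs (assumed)."""
    return log2(n) + log2(n - 1) + log2(n / 3) + log2((n - 1) / 3) + constant


def proposed_depth(n):
    return depth(n, 8)


def existing_depth(n):
    return depth(n, 14)


def improvement_percent(n):
    old, new = existing_depth(n), proposed_depth(n)
    return 100 * (old - new) / old


for n in (4, 8, 16, 64, 512):
    print(n, round(existing_depth(n), 2), round(proposed_depth(n), 2),
          round(improvement_percent(n), 1))
```

Because the two expressions differ only in their additive constant, the proposed design saves exactly six levels for every n; the relative improvement, 600 divided by the existing depth, therefore shrinks as n grows, which is the trend seen in Table 6.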

Table 6. Logic depth comparison

| No. of bits | Existing work [1] | Proposed work | % improvement |
|---|---|---|---|
| 4 | 17.9 | 11.94 | 33 |
| 6 | 20.6 | 14.6 | 29 |
| 8 | 22.41 | 16.41 | 27 |
| 10 | 23.76 | 17.76 | 25 |
| 16 | 26.62 | 20.6 | 23 |
| 32 | 30.67 | 24.6 | 20 |
| 64 | 34.7 | 28.7 | 18 |
| 128 | 38.7 | 32.7 | 16 |
| 256 | 42.7 | 36.7 | 14 |
| 512 | 46.7 | 40.7 | 13 |

As summarized in Tables 4, 5 and 6, the proposed design achieves maximum improvements of 16% in gate count, 15% in quantum cost and 33% in logic depth over the existing design presented in [1].

6 Simulation and Verification
The proposed reversible carry look-ahead adder designs are functionally verified through simulation. A library of reversible gates (Toffoli, Peres, NOT, Feynman and TR) is created in VHDL and used to code the proposed reversible designs. A test bench is created for every reversible circuit proposed in this work, and exhaustive simulations are run in the ModelSim simulator to verify correctness. Figure 7.a shows the simulation results of carry look-ahead addition using the out-of-place methodology for n = 8; in this methodology, the sum outputs are stored in the ancilla locations Zg(i), 0 ≤ i ≤ n-1. Figure 7.b shows addition using the in-place methodology for n = 8; in this methodology, the sum outputs are stored in the locations B(i), 0 ≤ i ≤ n-1.

7 Conclusions
In this work, we have presented efficient designs of a reversible carry look-ahead binary adder that primarily optimize logic depth, quantum cost and gate count. The proposed designs are shown to be better than the existing design in terms of logic depth, quantum cost and gate count without increasing the number of ancilla inputs or garbage outputs, which remain the same as in the existing design in [1].
The out-of-place design achieves maximum improvements of 53%, 11% and 15% over the existing design in [1] in logic depth, quantum cost and gate count, respectively. Similarly, the proposed in-place design achieves improvements of 33%, 15% and 16% in logic depth, quantum cost and gate count, respectively. The reversible designs are functionally verified using the VHDL hardware description language and the ModelSim HDL simulator. We conclude that choosing reversible gates suited to a particular combinational function can be highly beneficial in minimizing logic depth, quantum cost and the number of gates. The carry look-ahead adder designs of this work can find applications in quantum and reversible computing.

Fig. 7. (a) Simulation results of the out-of-place methodology for n = 8; (b) simulation results of the in-place methodology for n = 8.

References
1. Draper, T. G., Kutin, S. A., Rains, E. M., and Svore, K. M.: A logarithmic-depth quantum carry look-ahead adder. Quantum Information and Computation, vol. 6, no. 4&5, pp. 351-369 (2006)
2. Landauer, R.: Irreversibility and Heat Generation in the Computing Process. IBM Journal of Research and Development, vol. 3, pp. 183-191 (1961)
3. Bennett, C. H.: Logical Reversibility of Computation. IBM Journal of Research and Development, pp. 525-532 (1973)
4. Shor, P. W.: Algorithms for Quantum Computation: Discrete Logarithms and Factoring. In: Proceedings of the 35th Annual Symposium on Foundations of Computer Science, IEEE Computer Society Press, pp. 124-134, November (1994)
5. Shor, P. W.: Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. quant-ph/9508027, vol. 2 (1997)
6. Vedral, V., Barenco, A., and Ekert, A.: Quantum networks for elementary arithmetic operations. Physical Review A, vol. 54, pp. 147-153 (1996)
7. Fredkin, E., and Toffoli, T.: Conservative logic. International Journal of Theoretical Physics, vol. 21, pp. 219-253 (1982)
8. Nielsen, M. A., and Chuang, I. L.: Quantum Computation and Quantum Information. Cambridge University Press, New York (2000)
9. Takahashi, Y., and Kunihiro, N.: A linear-size quantum circuit for addition with no ancillary qubits. Quantum Information and Computation, vol. 5, no. 6, pp. 440-448 (2005)
10. Takahashi, Y.: Quantum arithmetic circuits: a survey. IEICE Transactions, vol. E92-A, no. 5, pp. 1276-1283 (2009)
11. Peres, A.: Reversible logic and quantum computers. Physical Review A, vol. 32, no. 6, pp. 3266-3276 (1985)
12. Toffoli, T.: Reversible computing. Technical Report MIT/LCS/TM-151, MIT Laboratory for Computer Science (1980)
13. Thapliyal, H., and Ranganathan, N.: Design of Efficient Reversible Logic Based Binary and BCD Adder Circuits. To appear in: ACM Journal on Emerging Technologies in Computing Systems, September (2012)
14. Smolin, J. A., and DiVincenzo, D. P.: Five two-bit quantum gates are sufficient to implement the quantum Fredkin gate. Physical Review A, vol. 53, no. 4, pp. 2855-2856 (1996)
15. Hung, W. N., Song, X., Yang, G., Yang, J., and Perkowski, M.: Optimal synthesis of multiple output Boolean functions using a set of quantum gates by symbolic reachability analysis. IEEE Transactions on Computer-Aided Design, vol. 25, no. 9, pp. 1652-1663 (2006)
16. Maslov, D., and Miller, D. M.: Comparison of the cost metrics through investigation of the relation between optimal NCV and optimal NCT 3-qubit reversible circuits. IET Computers & Digital Techniques, vol. 1, no. 2, pp. 98-104, March (2007)
17. Draper, T. G.: Addition on a Quantum Computer. quant-ph/0008033, vol. 7 (2000)
18. Trisetyarso, A., and Van Meter, R.: Circuit design for a measurement-based quantum carry look-ahead adder. International Journal of Quantum Information, vol. 8, no. 5, pp. 843-867 (2010)
19. Thapliyal, H., and Arabnia, H. R.: Modified Carry Look-ahead BCD Adder with CMOS and Reversible Logic Implementation. In: Proceedings of CDES, pp. 64-69 (2006)
20. Thapliyal, H., and Gupta, S. K.: Design of Novel Reversible Carry Look-ahead BCD Subtractor. In: 9th International Conference on Information Technology (ICIT'06), IEEE (2006)
21. DeBenedictis, E.: Reversible logic for supercomputing. In: Proceedings of the 2nd Conference on Computing Frontiers, pp. 391-402 (2005)
22. Pai, Y., and Chen, Y.: The Fastest Carry Look-ahead Adder. In: Proceedings of the Second IEEE International Workshop on Electronic Design, Test and Applications (DELTA'04), pp. 434-436 (2004)
23. Kaye, P.: Reversible addition circuit using one ancillary bit with application to quantum computing. quant-ph/0408173, vol. 2 (2004)
24. Takahashi, Y., and Kunihiro, N.: A fast quantum circuit for addition with few qubits. Quantum Information and Computation, vol. 8, no. 6-7, pp. 636-649 (2008)
25. Mohammadi, M., Haghparast, M., Eshghi, M., and Navi, K.: Minimization and optimization of reversible BCD-full adder/subtractor using genetic algorithm and don't care concept. International Journal of Quantum Information, vol. 7, no. 5, pp. 969-989 (2009)
26. Thapliyal, H., Arabnia, H. R., and Srinivas, M. B.: Efficient Reversible Logic Design of BCD Subtractors. Transactions on Computational Sciences Journal, Springer-Verlag, vol. 3, LNCS 5300, pp. 99-121 (2009)
27. Thapliyal, H., Arabnia, H. R., and Srinivas, M. B.: Reduced Area Low Power High Throughput BCD Adders. In: Proceedings of the 11th International CSI Computer Conference, vol. 2, pp. 59-64 (2006)
28. Thapliyal, H., Arabnia, H. R., Bajpai, R., and Sharma, K. K.: Partial Reversible Gates (PRG) for Reversible BCD Arithmetic. In: Proceedings of the 2007 International Conference on Computer Design (CDES'07), USA, ISBN 1-60132-036-1, pp. 97-98 (2007)
29. Thapliyal, H., Arabnia, H. R., Bajpai, R., and Sharma, K. K.: Combined Integer and Variable Precision (CIVP) Floating Point Multiplication Architecture for FPGAs. In: Proceedings of the International Conference on Parallel & Distributed Processing Techniques & Applications (PDPTA'07), USA, ISBN 1-60132-022-1, pp. 449-450 (2007)
30. Thapliyal, H., Vinod, A. P., and Arabnia, H. R.: Combined Integer and Floating Point Multiplication Architecture (CIFM) for FPGAs and its Reversible Logic Implementation. In: 49th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS'06), San Juan, Puerto Rico, pp. 148-154 (2006)
31. Thapliyal, H., Verma, V., and Arabnia, H. R.: A Double Precision Floating Point Multiplier Suitably Designed for FPGAs and ASICs. In: Proceedings of the 2006 International Conference on Computer Design and Conference on Computing in Nanotechnology (CDES'06), Las Vegas, USA, ISBN 1-60132-009-4, pp. 36-38, June (2006)
32. Thapliyal, H., and Arabnia, H. R.: Reversible Programmable Logic Array (RPLA) Using Fredkin and Feynman Gates for Industrial Electronics and Applications. In: Proceedings of the 2006 International Conference on Computer Design and Conference on Computing in Nanotechnology (CDES'06), Las Vegas, USA, ISBN 1-60132-009-4, pp. 70-74, June (2006)
33. Thapliyal, H., and Arabnia, H. R.: Modified Carry Look-ahead BCD Adder with CMOS and Reversible Logic Implementation. In: Proceedings of the 2006 International Conference on Computer Design and Conference on Computing in Nanotechnology (CDES'06), Las Vegas, USA, ISBN 1-60132-009-4, pp. 64-69, June (2006)
34. Thapliyal, H., Rameshwar, A., Bajpai, R., and Arabnia, H. R.: Novel NAND and AND Gate Using DNA Ligation and Two Transistors Implementations. In: Proceedings of the 2006 International Conference on Computer Design and Conference on Computing in Nanotechnology (CDES'06), Las Vegas, USA, ISBN 1-60132-009-4, pp. 130-132, June (2006)
35. Gopineedi, P., Thapliyal, H., Srinivas, M. B., and Arabnia, H. R.: Novel and Efficient 4:2 and 5:2 Compressors with Minimum Number of Transistors Designed for Low-Power Operations. In: Proceedings of the 2006 International Conference on Embedded Systems and Applications (ESA'06), Las Vegas, USA, ISBN 1-60132-017-5, pp. 160-166, June (2006)