GALOP: A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate Adders

Dhananjay S. Phatak
Electrical Engineering Department, State University of New York, Binghamton, NY

Israel Koren
Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA

ABSTRACT

In this paper, we present a novel algorithm and its VLSI implementation for performing fixed point two's complement addition. The proposed adder has a hybrid architecture where the generation of the sum bits is separated from the generation and propagation of the carry bits. The sum bits for every group of (consecutive) bit positions are obtained through a carry-select scheme. The incoming carries into these groups are generated by a binary carry-look-ahead tree. We achieve a substantial reduction in the delay of the carry-look-ahead tree and in the number of transistors required for its implementation by introducing a new pair of variables, namely, carry-originate and carry-propagate variables. The new carry-originate variable includes the conventional carry-generate variable as a special case. The theoretical framework developed here leads to the definition of a fundamental carry-selection operator which can be implemented using a fast and compact multiplexor-based circuit. By employing this multiplexor-based selection circuit the delay of each level of the look-ahead tree is reduced to that of a single gate. The block (group) size for the generation of sum bits through the carry-select scheme is determined in such a way that the two alternative subsets of sum bits are ready prior to the generation of the block's incoming carry. As a result, the delay of the proposed adder is equal to $4 + \log_2 n$ gate delays when adding two $n$-bit operands.
A comparison with other recently proposed designs such as an adder utilizing intermediate signed digits and the adder in Digital's ALPHA processor demonstrates that our design is significantly faster for all wordlengths and, equally important, requires fewer transistors for wordlengths equal to and larger than 32. The architecture presented in this paper is being patented.

I Introduction

Despite the fact that VLSI technology is now mature and complex arithmetic operations can be directly implemented in hardware, something as basic as the operation of addition still continues to fascinate researchers. This is demonstrated by the continual efforts to synthesize ever faster and/or more compact VLSI adders [1], [2], [3], [7], [8], [12], [16], [17], [18], [19]. Undoubtedly, these efforts are well directed and worthwhile because addition is the most fundamental arithmetic operation. Many other arithmetic operations are implemented with additions and shifts as basic steps.

Conventional high-speed adders include carry-look-ahead [19] and its variants [1, 5, 13]; conditional-sum [17]; carry-select; and carry-skip [12] along with its derivatives [2, 3, 7]. To obtain further speed enhancement, it is possible to combine two or more of these techniques (such as carry-look-ahead and carry-select) as illustrated by the designs in [4, 8, 18]. Most of these techniques are independent of the technology used to implement the actual hardware and represent modifications at the algorithm level. Given an algorithm, elaborate optimizations are often employed at the implementation level in order to extract the fastest circuit at the lowest possible hardware cost. Various implementations of Manchester carry chains, odd-even ripple cells [6], etc., are examples of enhancements achieved via circuit optimizations.

If the underlying algorithm can be modified so as to best match a given technology, the potential speed enhancement and hardware savings can exceed those that can be obtained by either algorithm modification alone or by circuit optimizations alone. For example, Ling [13] re-writes the basic look-ahead equations in order to reduce the hardware cost (fan-in). The design in [18], on the other hand, employs an intermediate signed-digit representation and then utilizes a tree of multiplexors in order to achieve speed enhancement.
It is well known that a multiplexor (MUX) is cheap to implement in CMOS technology (in both static as well as dynamic CMOS circuits). In fact, the (dynamic CMOS) adder in Digital Equipment Corporation's ALPHA AXP microprocessor utilizes the conditional-sum technique because their clever implementation of dual MUXes makes it possible to select both possible outcomes with minimal delay and hardware cost [4, 9, 10]. The multiplier design in [14] also exploits the fact that MUXes are faster and require fewer transistors in order to realize fast addition. Like the design in [18], it also utilizes an intermediate signed-digit representation.

In this paper we present a CMOS architecture that is extremely fast and very economical in terms of the number of transistors. We employ a combination of carry-look-ahead and carry-select techniques. In particular, we express the basic carry-look-ahead equations in a different form so that they can be implemented by MUXes. We demonstrate that the conventional and the MUX-based implementations are interchangeable, thereby allowing the designer to freely mix and match and select the most suitable implementation anywhere in the look-ahead tree that employs conventional generate and propagate variables. Furthermore, we introduce a generalized form of the fundamental carry operator as well as a new pair consisting of carry originate and propagate variables. The new bit-wise originate variable introduced in

this paper includes the conventional generate variable as a special case, and leads to a significant saving in the number of transistors required, without affecting the critical path delay.

The rest of the paper is organized as follows. The next section briefly summarizes background material on carry-look-ahead techniques and the recursion unrolling therein. Section III builds the theoretical framework by defining new originate and propagate variables and establishing the recursion relations those variables satisfy. Section IV describes the delay model and presents the implementation of the basic building blocks, the structure of our look-ahead tree and the resultant adder architecture. Section V compares and contrasts our architecture with some others published in the literature. The comparison clearly demonstrates that our architecture is significantly faster and requires fewer transistors than those in [4] and [18]. The last section presents discussion and conclusions.

II Background

Carry-look-ahead schemes for addition are well known [9]. Let $A = (a_{n-1} a_{n-2} \cdots a_0)$ and $B = (b_{n-1} b_{n-2} \cdots b_0)$ be the two operands to be added, with $a_i$ and $b_i$ being the operand bits in the $i$th bit position. Here, $a_{n-1}$ and $b_{n-1}$ denote the most significant bits (MSBs), while $a_0$ and $b_0$ denote the least significant bits (LSBs). Let $c_i$ and $c_{i+1}$ denote the carries into and out of the $i$th bit position, respectively. The Boolean carry propagation equation is

$$c_{i+1} = a_i b_i + c_i (a_i + b_i) = G_i + P_i c_i \tag{1}$$

where

$$G_i = a_i b_i \tag{2}$$

and

$$P_i = a_i + b_i \quad \text{or} \quad P_i = a_i \oplus b_i \tag{3}$$

Here, $G_i$ and $P_i$ are the well-known generate and propagate terms. In the above equations (and in all other Boolean equations in the rest of the manuscript) a $\cdot$ or a product term without any symbol between the literals represents a logical AND operation and a $+$ indicates the logical OR.
Following the literature, the symbol $\oplus$ is used to represent the Exclusive-OR (XOR) operation and a bar over a literal is used to indicate its complement. Henceforth, all the equations in the manuscript are Boolean, unless stated otherwise. Also, the words variable, term and signal are used synonymously, since a Boolean variable or term can be directly associated with a physical (binary) signal.

The propagate and generate signals for a group of bit positions $i, i-1, \ldots, j$ (where $i \ge j$) are denoted by $P_{i:j}$ and $G_{i:j}$, respectively, and can be expressed in terms of the bit-wise signals as follows:

$$P_{i:j} = P_i P_{i-1} \cdots P_j \tag{4}$$
$$G_{i:j} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + \cdots + P_i P_{i-1} \cdots P_{j+1} G_j \tag{5}$$

Any fast addition scheme must unroll the recursion inherent in the basic carry propagation equation (1) above. This is done with the help of the following properties satisfied by the generate and propagate

variables:

$$P_{i:j} = P_{i:m} P_{m-1:j} \tag{6}$$
$$G_{i:j} = G_{i:m} + P_{i:m} G_{m-1:j} \quad \text{where } i \ge m \ge j+1 \tag{7}$$
$$c_{i+1} = G_{i:j} + P_{i:j} c_j \tag{8}$$

Instead of dealing with each of the group carry functions separately, the pair $(G_{i:j}, P_{i:j})$ can be used along with a Boolean operator often called the fundamental carry operator [1, 10], henceforth denoted by $\circ$, and defined by

$$(P, G) \circ (\tilde{P}, \tilde{G}) = (P \tilde{P},\; G + P \tilde{G}) \tag{9}$$

In terms of this operator, equations (6) to (8) can be generalized as follows [10]:

$$(P_{i:j}, G_{i:j}) = (P_{i:m}, G_{i:m}) \circ (P_{v:j}, G_{v:j}) \quad \text{where } i \ge m \ge j+1 \text{ and } i \ge v \ge m-1 \tag{10}$$
$$(P_{i:j}, c_{i+1}) = (P_{i:j}, G_{i:j}) \circ (1, c_j) \tag{11}$$

Unrolling this type of affine recursion leads to a tree structure as illustrated by the implementation in [1]. At each level of the tree, the generate and propagate signals for smaller groups are combined into larger groups, as per equations (6) and (7). As is commonly done, we assume that the delay of an inverter, a two input NAND/NOR gate and a transmission gate is approximately the same and designate it to be 1 unit of delay (the delay model is explained in detail in Section IV). Under this assumption, the delay required for the fundamental carry operation is two units when implemented directly in the form shown in equations (6) and (7). Note that in CMOS technology, an AND must be implemented as a NAND followed by an inverter. Hence the generation of the group propagate signal from its subgroups also requires two units even though it is a simple AND operation. Similarly, the synthesis of the group generate signal requires more than 1 unit of delay (it can be implemented with 2 units of delay). Thus, the delay associated with the look-ahead tree is $O(2 \log_2 n)$ where $n$ is the word length of the input operands [1].

III Theoretical Framework

In this section we demonstrate that expressing the fundamental carry operation in a different form makes it possible to implement it with a MUX, which requires a single unit of delay.
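As a point of reference before the MUX form is developed, the conventional operator of equation (9) can be exercised in software. The following is a minimal sketch, not the authors' implementation; the function names (`carry_op`, `to_bits`, `group_pg`, `carry_out`) are hypothetical, and the bit-wise propagate is assumed to be the XOR form of equation (3). It folds the bit-wise $(P, G)$ pairs from the MSB downwards and then applies equation (8).

```python
from functools import reduce


def carry_op(hi, lo):
    """Conventional fundamental carry operator of equation (9).

    hi is the (P, G) pair of the more significant group,
    lo is the (P, G) pair of the less significant group.
    """
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


def to_bits(x, n):
    """Little-endian bit list: index 0 is the LSB."""
    return [(x >> k) & 1 for k in range(n)]


def group_pg(a, b, n):
    """Group (P, G) for bit positions n-1..0, folded from the MSB downwards."""
    pairs = [(ai ^ bi, ai & bi) for ai, bi in zip(to_bits(a, n), to_bits(b, n))]
    return reduce(carry_op, reversed(pairs))


def carry_out(a, b, c_in, n):
    """Equation (8): c_n = G_{n-1:0} + P_{n-1:0} c_0."""
    p, g = group_pg(a, b, n)
    return g | (p & c_in)
```

Because the operator is associative, the same pairs could equally be combined as a balanced tree; the linear fold is used here only to keep the sketch short.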
We define the originate and propagate signals for such a scheme and show their relationship with their conventional counterparts. We show that the originate and propagate signals for a group can be synthesized from those of two subgroups with a single unit of delay. Hence the delay of our look-ahead tree is $O(\log_2 n)$. As

illustrated in Sections IV and V, this translates into significantly smaller critical path delay for all word lengths.

We begin with the carry propagation equation (1), which can be expressed in the following form:

$$c_{i+1} = \overline{(a_i \oplus b_i)}\, Q_i + (a_i \oplus b_i)\, c_i = \bar{P}_i Q_i + P_i c_i \tag{12}$$

where

$$P_i = a_i \oplus b_i \tag{13}$$

and

$$Q_i = a_i b_i \quad \text{or} \quad a_i \quad \text{or} \quad b_i \tag{14}$$

Equation (12) can be implemented as a MUX, where the signal $P_i$ selects between the incoming carry $c_i$ and the originate signal $Q_i$. Figure 1 illustrates a transmission gate (TG) based static CMOS MUX implementation of the above equation. Figure 2 illustrates a direct implementation of $\bar{c}_{i+1}$. It is a straightforward fully complementary static CMOS circuit that actively restores the output signal to the proper logic level. This is different from the circuit in Figure 1, since the transmission gate is a passive device. Assuming that the select signals $P_i$ and $\bar{P}_i$ are available simultaneously with or before the inputs $Q_i$ and $c_i$, the delay of the TG-based implementation shown in Figure 1 is one unit. The inversion of the output caused by the circuit in Figure 2 does not cost any extra delay, as shown later in Section IV: it is possible to have inversion at every level and operate on complementary signals. In other words, it is not necessary to retrieve $c_{i+1}$ explicitly in an uncomplemented form by adding an inverter to the circuit of Figure 2. Hence, the delay of the circuit in Figure 2 is equivalent to the delay for charging or discharging through at most 2 transistors in series, or that of a two input NAND/NOR gate, i.e., 1 unit. However, the TG-based implementation needs only 4 transistors, while the actively restoring circuit in Figure 2 needs 8 transistors, which is a substantial increase in view of the fact that MUXes are used at every bit position and at all levels of the look-ahead tree. Note that $P_i$ as defined by equation (13) is the digit sum (modulo 2).
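Equation (12) can be checked exhaustively against equation (1) for each of the three choices of $Q_i$ in equation (14). The sketch below is only an illustration of that check; the helper names (`mux`, `carry_reference`, `carry_mux`) and the string labels for the three choices are assumptions introduced here.

```python
from itertools import product


def mux(select, when_zero, when_one):
    """2:1 multiplexor: returns when_one if select is 1, else when_zero."""
    return when_one if select else when_zero


def carry_reference(a, b, c):
    """Equation (1): c_{i+1} = a b + c (a + b)."""
    return (a & b) | (c & (a | b))


def carry_mux(a, b, c, q_choice):
    """Equation (12), with Q_i picked from the three options of equation (14)."""
    p = a ^ b
    q = {"and": a & b, "a": a, "b": b}[q_choice]
    return mux(p, q, c)


# Collect every input combination for which the two forms disagree.
mismatches = [
    (a, b, c, choice)
    for a, b, c in product((0, 1), repeat=3)
    for choice in ("and", "a", "b")
    if carry_mux(a, b, c, choice) != carry_reference(a, b, c)
]
print("mismatches:", mismatches)  # prints "mismatches: []"
```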
If the digit sum is 1 then the carry-out equals the carry-in, or in other words, the incoming carry propagates. If the digit sum is 0 then the operands $a_i$ and $b_i$ are both 0 or both 1. A carry is generated if both are one, which implies that $Q_i$ should be set to 1. The interesting point is that one need not restrict $Q_i$ to the bit-wise AND. The originate signal $Q_i$ is selected only when $P_i$ is zero. In this case, $a_i = b_i$ and hence $a_i b_i = a_i = b_i$. Thus, one can use any of the bits $a_i$ or $b_i$ by itself in place of $(a_i b_i)$. This saves the NAND gate and inverter required at every bit position in order to generate the bit-wise AND. In fact, the pair $(P_i, Q_i)$ contains all the information that the pair $(a_i, b_i)$ has, when $Q_i = a_i$ (or $b_i$). $P_i$ indicates whether or not the bits are identical. This together with the value of one bit is sufficient to retrieve the value of the other bit. Note that in order to allow $Q_i = a_i$ (or $b_i$), we must restrict $P_i$ to $a_i \oplus b_i$, while the conventional carry-look-ahead scheme allows $P_i$ to equal $a_i + b_i$ as well (see equation (3)). Throughout the rest of the manuscript, $P_i$ is

therefore restricted to the XOR operation defined by equation (13).

We now derive the originate and propagate signals for a group of two bits. To this end, first observe that

$$\bar{P}_i Q_i = (\bar{a}_i \bar{b}_i + a_i b_i)\, Q_i = a_i b_i = G_i \tag{15}$$

when $Q_i$ assumes any of the three values as defined in equation (14). For bit position 0 (i.e., $i = 0$), we obtain

$$c_1 = \bar{P}_0 Q_0 + P_0 c_0 \tag{16}$$

and for bit position 1,

$$c_2 = \bar{P}_1 Q_1 + P_1 c_1 = \bar{P}_1 Q_1 + P_1 (\bar{P}_0 Q_0 + P_0 c_0) = \bar{P}_1 Q_1 + P_1 \bar{P}_0 Q_0 + P_1 P_0 c_0 \tag{17}$$

Equation (17) can be re-written as

$$c_2 = \overline{(P_1 P_0)}\, (\bar{P}_1 Q_1 + P_1 Q_0) + (P_1 P_0)\, c_0 \tag{18}$$

or, in terms of the group propagate and originate signals, as

$$c_2 = \bar{P}_{1:0} Q_{1:0} + P_{1:0} c_0 \tag{19}$$

where $P_{1:0} = P_1 P_0$ is the propagate term for the group of two bit positions and $Q_{1:0} = \bar{P}_1 Q_1 + P_1 Q_0$ is the originate term for the group. Equation (19) has the same form as the right hand side of equation (12). Similarly, $c_3$ can be derived as

$$c_3 = \bar{P}_2 Q_2 + P_2 \bar{P}_1 Q_1 + P_2 P_1 \bar{P}_0 Q_0 + P_2 P_1 P_0 c_0 \tag{20}$$
$$\phantom{c_3} = \overline{(P_2 P_1 P_0)}\, (\bar{P}_2 Q_2 + P_2 \bar{P}_1 Q_1 + P_2 P_1 Q_0) + (P_2 P_1 P_0)\, c_0 = \bar{P}_{2:0} Q_{2:0} + P_{2:0} c_0$$

where

$$P_{2:0} = P_2 P_1 P_0 \tag{21}$$

and

$$Q_{2:0} = \bar{P}_2 Q_2 + P_2 \bar{P}_1 Q_1 + P_2 P_1 Q_0 \tag{22}$$
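The three-bit derivation can be confirmed by brute force. The sketch below is an illustration only, with hypothetical helper names; it treats the six bit-wise signals and $c_0$ as free Boolean inputs, which is a slightly stronger check than the derivation needs, and compares a chain of three bit-level MUXes with the single group-level MUX of equations (20) to (22).

```python
from itertools import product


def mux(select, when_zero, when_one):
    """2:1 multiplexor: returns when_one if select is 1, else when_zero."""
    return when_one if select else when_zero


def ripple_c3(p, q, c0):
    """Apply c_{k+1} = ~P_k Q_k + P_k c_k for k = 0, 1, 2 (index = bit position)."""
    c = c0
    for k in range(3):
        c = mux(p[k], q[k], c)
    return c


def group_c3(p, q, c0):
    """Equations (20)-(22): one MUX steered by the group propagate P_{2:0}."""
    p20 = p[2] & p[1] & p[0]
    q20 = ((1 - p[2]) & q[2]) | (p[2] & (1 - p[1]) & q[1]) | (p[2] & p[1] & q[0])
    return mux(p20, q20, c0)


# v = (P0, P1, P2, Q0, Q1, Q2, c0) ranges over all 128 Boolean assignments.
all_equal = all(
    ripple_c3(v[0:3], v[3:6], v[6]) == group_c3(v[0:3], v[3:6], v[6])
    for v in product((0, 1), repeat=7)
)
```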

From the above definitions, one can derive the following general expressions:

$$P_{i:j} = \begin{cases} P_i P_{i-1} \cdots P_j & \text{when } i > j \\ P_i & \text{when } i = j \end{cases} \tag{23}$$

and

$$Q_{i:j} = \begin{cases} \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + P_i P_{i-1} \bar{P}_{i-2} Q_{i-2} + \cdots + P_i P_{i-1} \cdots \bar{P}_{j+1} Q_{j+1} + P_i P_{i-1} \cdots P_{j+1} Q_j & \text{when } i > j \\ Q_i & \text{when } i = j \end{cases} \tag{24}$$

Using the relation $G_k = \bar{P}_k Q_k$, equation (5) for the conventional group generate term $G_{i:j}$ can be re-written as

$$G_{i:j} = \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + \cdots + P_i P_{i-1} \cdots \bar{P}_{j+1} Q_{j+1} + P_i \cdots P_{j+1} \bar{P}_j Q_j \tag{25}$$

Note that the only difference between $Q_{i:j}$ and $G_{i:j}$ as defined by equations (24) and (25), respectively, is that the last term in the latter contains the literal $\bar{P}_j$, which is not present in the last term in (24). The group originate and propagate signals defined above satisfy the following property:

Lemma 1 : $\bar{P}_{i:j} Q_{i:j} = G_{i:j}$ for $i \ge j$.

Proof : Equation (15) states that $\bar{P}_{i:i} Q_{i:i} = G_{i:i}$ and takes care of the case where $i = j$. Next, consider the case when $i > j$. In this case,

$$\bar{P}_{i:j} Q_{i:j} = (\bar{P}_i + \cdots + \bar{P}_j)\,(\bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + \cdots + P_i \cdots \bar{P}_{j+1} Q_{j+1} + P_i \cdots P_{j+1} Q_j) \tag{26}$$

We have

$$\bar{P}_i Q_{i:j} = \bar{P}_i Q_i + 0 = \bar{P}_i Q_i \tag{27}$$
$$\bar{P}_{i-1} Q_{i:j} = \bar{P}_i Q_i \bar{P}_{i-1} + P_i \bar{P}_{i-1} Q_{i-1} \tag{28}$$

From equations (27) and (28) above, we obtain

$$(\bar{P}_i + \bar{P}_{i-1})\, Q_{i:j} = \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} \tag{29}$$

In an identical manner, it follows that

$$(\bar{P}_i + \bar{P}_{i-1} + \bar{P}_{i-2})\, Q_{i:j} = \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + P_i P_{i-1} \bar{P}_{i-2} Q_{i-2}$$
$$\vdots$$
$$(\bar{P}_i + \bar{P}_{i-1} + \cdots + \bar{P}_{j+1})\, Q_{i:j} = \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + \cdots + P_i P_{i-1} \cdots \bar{P}_{j+1} Q_{j+1}$$
$$(\bar{P}_i + \bar{P}_{i-1} + \cdots + \bar{P}_j)\, Q_{i:j} = \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + \cdots + P_i P_{i-1} \cdots \bar{P}_{j+1} Q_{j+1} + P_i P_{i-1} \cdots \bar{P}_j Q_j = G_{i:j} \tag{30}$$

which proves the desired result.

From equation (8) and Lemma 1, the group carry-out can be expressed as

$$c_{i+1} = G_{i:j} + P_{i:j} c_j = \bar{P}_{i:j} Q_{i:j} + P_{i:j} c_j \tag{31}$$

which can be implemented by a MUX. Thus far, we have expressed the group originate and propagate signals $Q_{i:j}$ and $P_{i:j}$ in terms of the bit-wise originate and propagate signals $Q_m$ and $P_m$ where $i \ge m \ge j$. While this is necessary, it still does not tell us how to combine the originate and propagate signals of two smaller groups into those of a larger group. This operation is essential for a carry-look-ahead tree where smaller groups are combined into successively larger groups at each level of the tree. The following theorem establishes the required relations.

Theorem 1 : The group propagate and originate variables satisfy

$$P_{i:j} = P_{i:m} P_{v:j} \tag{32}$$

and

$$Q_{i:j} = \bar{P}_{i:m} Q_{i:m} + P_{i:m} Q_{v:j} \tag{33}$$

where $i \ge m \ge j$, $i \ge v \ge j$ and $v \ge m-1$.

Proof : The proof of equation (32) is straightforward and is omitted for the sake of brevity. To prove equation (33), first re-write it using Lemma 1 as follows:

$$Q_{i:j} = G_{i:m} + P_{i:m} Q_{v:j} \tag{34}$$

Expanding the right hand side of the above equation using relations (25), (23) and (24) that define $G_{i:j}$, $P_{i:j}$ and $Q_{i:j}$, respectively, we get

$$\begin{aligned} G_{i:m} + P_{i:m} Q_{v:j} = {} & \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + \cdots + P_i P_{i-1} \cdots \bar{P}_v Q_v + \cdots + P_i P_{i-1} \cdots \bar{P}_m Q_m \\ & + P_i \cdots P_v \cdots P_m\, (\bar{P}_v Q_v + P_v \bar{P}_{v-1} Q_{v-1} + \cdots + P_v P_{v-1} \cdots \bar{P}_m Q_m) \\ & + P_i \cdots P_v \cdots P_m\, (P_v \cdots \bar{P}_{m-1} Q_{m-1} + \cdots + P_v \cdots \bar{P}_{j+1} Q_{j+1} + P_v \cdots P_{j+1} Q_j) \end{aligned} \tag{35}$$

Note that the expression on the second line of the above equation reduces to zero. Using this fact and re-writing the expression on the third line of the above equation, we obtain

$$\begin{aligned} G_{i:m} + P_{i:m} Q_{v:j} = {} & \bar{P}_i Q_i + P_i \bar{P}_{i-1} Q_{i-1} + \cdots + P_i P_{i-1} \cdots \bar{P}_v Q_v + \cdots + P_i P_{i-1} \cdots \bar{P}_m Q_m \\ & + P_i P_{i-1} \cdots P_m \bar{P}_{m-1} Q_{m-1} + \cdots + P_i \cdots \bar{P}_{j+1} Q_{j+1} + P_i \cdots P_{j+1} Q_j \\ = {} & Q_{i:j} \end{aligned} \tag{36}$$

which completes the proof.

This theorem enables us to define a fundamental carry-select operator, henceforth denoted by the symbol $\diamond$, analogous to the conventional fundamental carry operator defined above by equation (9), as follows:

$$(P, Q) \diamond (\tilde{P}, \tilde{Q}) = (P \tilde{P},\; \bar{P} Q + P \tilde{Q}) \tag{37}$$

The relations stated in Theorem 1 that are satisfied by the group variables $P$ and $Q$ can be rewritten using the fundamental carry-select operator in the following manner:

$$(P_{i:j}, Q_{i:j}) = (P_{i:m}, Q_{i:m}) \diamond (P_{v:j}, Q_{v:j}) \quad \text{where } i \ge m \ge j \text{ and } i \ge v \ge m-1 \tag{38}$$
$$(P_{i:j}, c_{i+1}) = (P_{i:j}, Q_{i:j}) \diamond (1, c_j) \tag{39}$$

The above equations indicate that one can employ multiplexors to implement the look-ahead tree.

Next, we elaborate on the relation between the conventional $(P, G)$ variables and the $(P, Q)$ variables introduced here. Note that the variable $Q_i$ includes $G_i$ as a special case. Let

$$Q^s_i = a_i b_i = G_i \tag{40}$$

denote $Q$ in the special case when it is selected to be the bit-wise AND (the superscript $s$ indicates that this is a special case). The variable $G_{i:j}$ (which is the same as $Q^s_{i:j}$) satisfies the following property:

Lemma 2 : $\bar{P}_{i:j} Q^s_{i:j} = \bar{P}_{i:j} G_{i:j} = G_{i:j} = Q^s_{i:j}$ for $i \ge j$.

The proof is identical to that of Lemma 1 and is therefore omitted for the sake of brevity. In view of the above lemma, it is clear that the fundamental carry-select operator can also serve in place of the fundamental carry operator, i.e., the $P$ and $G$ variables also satisfy

$$(P_{i:j}, G_{i:j}) = (P_{i:m}, G_{i:m}) \diamond (P_{v:j}, G_{v:j}) \quad \text{where } i \ge m \ge j \text{ and } i \ge v \ge m-1 \tag{41}$$
$$(P_{i:j}, c_{i+1}) = (P_{i:j}, G_{i:j}) \diamond (1, c_j) \tag{42}$$
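The carry-select operator of equation (37) and the relations (38) and (39) lend themselves to a short software model. The sketch below is a functional illustration under stated assumptions, not the circuit itself: the names `carry_select_op`, `tree_combine`, `group_pq` and `carry_out` are hypothetical, the tree is a plain fan-in-two reduction that passes an unpaired top group through unchanged, and the `q_source` argument merely selects one of the three bit-wise originate options of equation (14).

```python
def carry_select_op(hi, lo):
    """Equation (37): (P, Q) <> (P~, Q~) = (P P~, ~P Q + P Q~).

    The second component is a 2:1 MUX steered by the P of the upper group.
    """
    p_hi, q_hi = hi
    p_lo, q_lo = lo
    return (p_hi & p_lo, q_lo if p_hi else q_hi)


def tree_combine(pairs):
    """Reduce bit-wise (P, Q) pairs (index 0 = LSB) with a fan-in-two tree."""
    level = list(pairs)
    while len(level) > 1:
        merged = [carry_select_op(level[k + 1], level[k])
                  for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd count: pass the top group through unchanged
            merged.append(level[-1])
        level = merged
    return level[0]


def group_pq(a, b, n, q_source="a"):
    """Group (P, Q) for bit positions n-1..0 of the operands a and b."""
    pairs = []
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        qk = {"a": ak, "b": bk, "and": ak & bk}[q_source]
        pairs.append((ak ^ bk, qk))
    return tree_combine(pairs)


def carry_out(a, b, c_in, n, q_source="a"):
    """Equation (39): the group P selects between Q_{n-1:0} and c_0."""
    p, q = group_pq(a, b, n, q_source)
    return c_in if p else q
```

With `q_source="and"` the same code computes the conventional group generate signal, which is the interchangeability stated by Lemma 2 and equations (41) and (42).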

The conventional fundamental carry operator can therefore be generalized to include both the operators, viz., $\circ$ and $\diamond$ defined above. This flexibility of the generalized fundamental carry operator has an important implication with regard to the actual implementation, viz., that one can freely mix-and-match the conventional implementation (i.e., the one that directly implements an equation of the form of (9) with AND and OR gates) and the MUX-based implementation anywhere in the look-ahead tree. Thus, the $(P, G)$ pair is more flexible and better suited than the $(P, Q)$ pair. However, $Q_i$ can be set to one of the operand bits $a_i$ or $b_i$, thereby saving a significant number of transistors. Furthermore, one can switch from $Q$ variables to $G$ variables with a simple AND operation. We would like to point out that using $Q_i \ne G_i$ has significance only at the first level of the look-ahead tree. From the second level onwards, the $(P, G)$ pair is more desirable since it lends itself to both the conventional as well as the MUX implementation. However, the conversion from $Q$ to $G$ leads to extra delay in the total addition time. To avoid this delay, we deal only with the $Q$ variables in our design.

IV Implementation

(A) Delay Model : While a time delay measured from an actual chip or a layout is desirable, raw nanoseconds do not always give an idea of the complexity of the underlying circuit. Also, the absolute delays change with technology, materials and processes. On the other hand, if the delay is specified in terms of equivalent gate delays, then it gives an idea of how many levels of logic the signal might have to traverse or what the complexity of the underlying circuit is. If the gate delay estimation is done realistically and correctly, it becomes somewhat independent of technology. Everything is relative to the delay of a basic unit such as a two input NAND/NOR gate, which can be estimated fairly accurately for a given technology. Thus the gate delay model can be used to estimate delays when comparing different algorithms and has been used very extensively in the literature. We have therefore adopted it in this manuscript. The majority of the references cited here have also used the gate delay model, allowing us to make a meaningful comparison.

As is commonly done, we assume that the delay of an inverter, a 2 input NAND/NOR gate, and a transmission gate is approximately the same and designate it to be 1 unit delay. If the number of transmission gates cascaded in series is not too large, then the delay per transmission gate can be assumed to be 1 unit. A chain of 4 transmission gates has a delay of 4 units under this assumption. If more gates are cascaded in series, the delay could increase in a nonlinear (superlinear) manner. For instance, the number of series devices is limited to 8 in the adder in Digital's ALPHA-AXP processor. In any case, a TG-based multiplexor can be replaced with the inverting MUX structure shown in Figure 2, which actively restores the logic levels.
This leads to an inversion of the output, but it turns out that the critical path delay is not affected by this inversion since it is possible to use the true and complemented form of the relevant signals at alternate levels of the look-ahead tree. Note that the inverted outputs generated by the MUX shown in Figure 2 can be directly used at the next level. This is briefly explained as follows. Suppose that a normal MUX selects between signals $A$ and $B$:

$$\text{normal MUX output} = \bar{S} A + S B \tag{43}$$

If only the complemented signals $\bar{A}$ and $\bar{B}$ are available and one uses an inverting MUX, the resulting output is

$$\text{inverting MUX output} = \overline{(\bar{S} \bar{A} + S \bar{B})} = \bar{S} A + S B + A B = \bar{S} A + S B = \text{normal MUX output} \tag{44}$$

This obviates the need for extra inverters, thereby reducing the critical path delay. In the circuit shown in Figure 2, the critical path delay is associated with a pull-up or pull-down through at most two series transistors and is therefore equivalent to the delay of a two input NAND/NOR gate or

1 unit. If the delay of a cascaded chain of transmission gates becomes superlinear, one can always use the inverting implementation, whose delay grows in proportion to the length of the series chain. Thus, there is no delay penalty if one uses the inverting MUX instead of a transmission gate MUX, but the number of transistors required goes up from 4 to 8.

Our design utilizes 2 input XOR/XNOR gates that can be implemented in fully complementary logic, which requires 12 transistors and 2 units of delay, or with a transmission structure [20]. The latter implementation needs 6 transistors when both the inputs are available only in the true (uncomplemented) form and 4 transistors when at least one of the inputs is available in both true and complemented form. A TG-based implementation of XOR/XNOR [20] has an inverter followed by a transmission structure, which makes the total delay closer to 2 units than 1. Hence, the delay associated with an XOR/XNOR gate is assumed to be 2 units. The extra inverter in the TG XOR circuit causes the delay associated with the two inputs to be unequal: the delay of the input signal that has to propagate through the inverter and the transmission structure is 2 while the delay of the signal that has to propagate only through the transmission structure is (approximately) 1 unit.

Finally, we assume that the delay associated with complex CMOS gates with up to 4 independent inputs is 1.5 units. This abstraction is commonly used in the literature [18, 11, 15]. The AND-OR-INVERT (A-O-I) and OR-AND-INVERT (O-A-I) gates and majority gates are examples of complex gates whose delay is roughly 1.5 units.

(B) Architecture : We use a look-ahead tree to generate the carries with the shortest possible delay. The blocking factor or fan-in of our tree is two at each level. In other words, we always combine the $P$ and $Q$ signals of two smaller groups to generate the $P$ and $Q$ signals of the larger group at each node of the look-ahead tree.
This differs from a conventional implementation where the group $P$ and $G$ signals for four smaller groups are combined into a larger group at each node of the look-ahead tree. It turns out that for the static CMOS technology, such a binary tree is the fastest possible way of generating the carries for the word lengths of practical significance (less than 1024).

Generation of a carry for each individual bit position can be avoided by using the carry-select technique. Here, the input bits are grouped into blocks and a ripple-carry addition is performed for each block assuming both possibilities, viz., the incoming carry is 0 and 1. When the actual carry input to a block becomes available (it is generated by the look-ahead tree), it is used to drive a block multiplexor that selects the correct output. This way, the need to generate a carry input for each individual bit position is obviated, thereby reducing the total critical path delay. The larger the block size, the fewer the number of carries to be generated; but the time delay required for the ripple-carry addition through a block increases with its size. The optimal size of the block should therefore be selected in such a way that the delay of the ripple-carry addition through the most significant block and the delay associated with the generation of the carry input to that block are equal or as balanced as possible, so that all the inputs to the block multiplexor arrive at approximately the same time. Later in this section, we tabulate the optimal block size as a function of operand word length. For the time being, consider a block size $b = 4$, purely for the purpose of illustration. The ripple-carry adder for a group of 4

bit positions is illustrated in Figure 3. In this figure, the XOR/XNOR gates that generate the final sum output bits are implemented with fully complementary logic (as opposed to transmission gate logic) and require a delay of 2 units, leading to a total delay of 5 units for the ripple-carry addition. We would like to point out that the selection of the fan-in factor of our look-ahead tree is totally independent of the choice of the block size $b$ for the ripple-carry addition. For instance, the optimal block size changes along with the word length. The fan-in of the look-ahead tree, however, remains two for all these word lengths.

As mentioned above, the generation of the group $P$ and $Q$ signals is accomplished with a binary look-ahead tree. Figure 4 depicts the generation of the group $P$ and $Q$ signals for a group of 4 bits. The generation of the group $Q$ signal is achieved by a two level binary tree of MUXes. At each node of the tree, a MUX is used to select between the incoming carry and the generated carry, based on whether or not the digit sums, i.e., the $P$'s in that group, are all ones. Since both $P$ and its complement $\bar{P}$ are required to drive the MUXes, both these signals are generated at all levels of the tree:

$$P_{i:j} = \overline{(\bar{P}_{i:m} + \bar{P}_{v:j})} \tag{45}$$

and

$$\bar{P}_{i:j} = \overline{(P_{i:m} P_{v:j})} \quad \text{where } i \ge m \ge j \text{ and } i \ge v \ge m-1 \tag{46}$$

Note that equation (45) can be implemented by a single NOR gate while equation (46) can be implemented by a single NAND gate. If both the $P$'s and their complements are not available at each level of the tree, then one cannot synthesize the required signals in a single unit of delay since a direct implementation of an AND requires a NAND followed by an inverter. This way, the group $P$, $\bar{P}$ and $Q$ signals are generated simultaneously from those of two subgroups within one unit delay, thereby allowing us to double the group size at the cost of a single unit of delay.

The overall architecture of our 32-bit adder is shown in Figure 5.
For this wordlength, the optimum block size turns out to be 4. The boxes labeled CSC (Carry Select Circuit) at the top employ the circuit shown in Figure 4 to implement the first two levels of the carry look-ahead tree. The outputs of the CSC blocks are the $P$, $\bar{P}$ and $Q$ signals for groups of 4 bits. Note that the group of the 4 least significant bits employs a different circuit labeled CG (Carry Generator) in place of the CSC circuit. The need for this asymmetry is explained a bit later in this section. The multiplexors labeled M are of the fully complementary inverting type shown in Figure 2. The modules near the bottom that are labeled BM are the block multiplexors that select one of the two sum outputs, depending on the incoming carry signal (the block carry and its complement drive the MUXes to select the proper sum output). The block MUXes BM can be of the transmission gate type, since all their inputs are actively restored (i.e., there are no transmission gates in series with the block MUXes). The delays associated with various signals shown in the figure are indicated with numbers enclosed in curly braces {}. For instance, the signal $\bar{Q}_{23:16}\{3\}$ indicates that it is the complement of the originate variable of the group of bits 16 to 23, and the postfix {3} indicates that this signal is generated 3 units

after the bit-wise $P_i$ and $Q_i$ signals become available to the CSC modules. Similarly, $\bar{c}_{32}\{5\}$ indicates that the complement of the carry out of the MSB is available 5 units after the bit-wise $P_i$ and $Q_i$ have been generated. The outputs of the block multiplexors are the final sum bits.

Note that the circuit in Figure 4 (which implements the first two levels of the look-ahead tree) has two transmission gates in series. Since the $P$ inputs to this circuit are themselves generated using transmission structures (i.e., TG-based XOR/XNOR gates), the multiplexors at the third level of the tree (that select signals out of two CSC blocks) should not be implemented using transmission gates; otherwise, too many of them get cascaded in series. Hence, we use the fully complementary inverting multiplexor shown in Figure 2. From the third level onwards, the fan-out load on the MUX outputs also increases. Hence, we choose all the MUXes in the rest of the look-ahead tree to be of the inverting type since it actively restores the signals and also makes it possible to drive larger loads by properly sizing the transistors.

The critical paths in the carry generation tree traverse via the inputs to the CSC circuits and the multiplexors that select the block carry-outs for all blocks in the most significant half. It is seen that $c_{16}$ is available after 4 units (since $\log_2 16 = 4$ and we require unit delay per level of the look-ahead tree). While the computation of $c_{16}$ is being performed in the least significant half, the group propagate and originate signals for all the bit groups in the most significant half, viz., $P_{19:16}, Q_{19:16}$; $P_{23:16}, Q_{23:16}$; $P_{27:16}, Q_{27:16}$ and $P_{31:16}, Q_{31:16}$ are also computed in parallel (this is clearly illustrated in Figure 5). Consequently, all that remains is the computation of the carries $c_{20}$, $c_{24}$, $c_{28}$ and $c_{32}$ from the corresponding group $P$, $\bar{P}$ and $Q$ signals and the carry $c_{16}$, which can also be done in parallel.
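The data flow described above (carry-select blocks of 4 bits whose incoming carries are derived from group $(P, Q)$ pairs) can be mimicked in software. The following is a functional sketch only: it says nothing about timing, it chains the block carries one after another instead of forming them in parallel with the look-ahead tree of Figure 5, and all names (`WIDTH`, `BLOCK`, `select_op`, `ripple_block`, `add32`) are hypothetical. The bit-wise originate signal is assumed to be $Q_k = a_k$.

```python
WIDTH, BLOCK = 32, 4


def select_op(hi, lo):
    """Carry-select operator: (P, Q) <> (P~, Q~) = (P P~, ~P Q + P Q~)."""
    p_hi, q_hi = hi
    p_lo, q_lo = lo
    return (p_hi & p_lo, q_lo if p_hi else q_hi)


def ripple_block(a_bits, b_bits, c_in):
    """Ripple-carry sum bits of one block for an assumed carry-in."""
    sums, c = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return sums


def add32(x, y, c0=0):
    """Return (sum modulo 2**32, carry out of bit 31)."""
    a = [(x >> k) & 1 for k in range(WIDTH)]
    b = [(y >> k) & 1 for k in range(WIDTH)]
    carry, total = c0, 0
    for lo in range(0, WIDTH, BLOCK):
        blk_a, blk_b = a[lo:lo + BLOCK], b[lo:lo + BLOCK]
        # Both candidate sums are formed without knowing the block carry-in.
        sum_if_0 = ripple_block(blk_a, blk_b, 0)
        sum_if_1 = ripple_block(blk_a, blk_b, 1)
        chosen = sum_if_1 if carry else sum_if_0  # block multiplexor
        for k, s in enumerate(chosen):
            total |= s << (lo + k)
        # Group (P, Q) of this block with Q_k = a_k, then a MUX on the carry.
        pq = (blk_a[0] ^ blk_b[0], blk_a[0])
        for ak, bk in zip(blk_a[1:], blk_b[1:]):
            pq = select_op((ak ^ bk, ak), pq)
        carry = carry if pq[0] else pq[1]
    return total, carry
```

A call such as `add32(0xFFFFFFFF, 1)` exercises the longest propagation chain, since every block propagates the carry produced in bit position 0.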
Thus there is only one extra level of MUXes after the signal $c_{16}$, leading to a total of 5 levels in the look-ahead circuit. Generation of the complements of these carry signals requires one extra unit of delay (that of an inverter). Finally, the block multiplexors take one more unit of delay, giving a total critical path delay of 7 units (from the time the inputs to the CSC blocks on top become available to the generation of the sum outputs). Note that the delay associated with a 4 bit ripple-carry adder is 5 units, while the carry signals (and their complements) are available after 6 gate delays. So the ripple-carry adders are not on the critical paths.

The least significant CG module (corresponding to bit positions 0 through 3) is slightly different from all others. This is done in order to shorten the critical path, as explained next. The carry input to the least significant bit, viz., $c_0$, is typically 0 for a normal two operand addition. This is a fixed constant known beforehand and hence there is no need to select between this constant and the originate signal $Q_{3:0}$ for the group. Such a selection would put one more MUX in the critical path and increase the delay by one unit. In particular, note that

$$c_4 = G_{3:0} + P_{3:0}\, c_0 = G_{3:0} \quad \text{when } c_0 = 0 \qquad (47)$$

Hence, instead of generating $Q_{3:0}$ it suffices to generate $G_{3:0}$. This can be accomplished by setting the originate signal $Q_0$ to the logical AND of the inputs $a_0$ and $b_0$, i.e., $Q_0 = a_0 b_0$. From equation (??), and

Lemma 1, note that

$$G_{i:j} = \bar{P}_{i:m}\, Q_{i:m} + P_{i:m}\, G_{v:j} \qquad (48)$$

In other words, if one replaces the lower order originate variable $Q_{v:j}$ in equation (??) by the generate variable $G_{v:j}$, then the output is $G_{i:j}$ instead of $Q_{i:j}$. We can exploit this fact to generate $G_{3:0}$. Note that setting $Q_0 = a_0 b_0 = G_0$ in Figure 4 causes multiplexor M1 in the CSC block to generate $G_{1:0}$ instead of $Q_{1:0}$, as per the above equation. As a result, multiplexor M3 in Figure 4 generates $G_{3:0}$ instead of $Q_{3:0}$, because the lower order generate term $G_{1:0}$ is used instead of $Q_{1:0}$. This scheme reduces the critical path delay at the expense of flexibility, since the carry-in $c_0$ is not allowed to be an independent external input.

As shown next, the block CG can be further modified to allow a variable external carry-in $c_0$ without affecting the critical path delay. A carry-in of 1 might be required, for instance, when performing a subtraction or a multi-operand addition. Subtraction $A - B$ is achieved by taking the bit-wise complement of $B$ and adding a 1 to the least significant bit (LSB) position. The addition of 1 in the LSB position can be accomplished by setting the carry $c_0$ to 1. To see that a variable carry-in can be handled without affecting the total delay, note that in Figure 4, if the signal $Q_0$ is replaced by the actual carry $c_1$, then multiplexor M1 generates

$$\bar{P}_1 Q_1 + P_1 c_1 = c_2 \qquad (49)$$

i.e., the true carry $c_2$. This, in turn, causes multiplexor M3 to generate

$$\bar{P}_{3:2}\, Q_{3:2} + P_{3:2}\, c_2 = c_4 \qquad (50)$$

i.e., the true carry $c_4$, as required. Thus, the problem now reduces to whether the carry $c_1$ (or its complement) can be synthesized without increasing the critical path delay, so that it can be used in place of $Q_0$. Note that the bit-wise propagate signals $P_i$ and $\bar{P}_i$ are the XOR/XNOR functions whose realization requires 2 units of delay, as explained earlier. The propagation of signals in the look-ahead tree begins after the $P$ and $\bar{P}$ signals become available.
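Equations (47) to (50) can be verified exhaustively for the least significant 4-bit group. The sketch below is a behavioural model under stated assumptions: $Q_i = a_i$, $P_i = a_i \oplus b_i$, and a two-level selection structure that mirrors the equations rather than the transistor-level circuit of Figure 4. The function names and the `seed` parameter (the signal substituted for $Q_0$) are hypothetical.

```python
def bit(x, i):
    """Return bit i of the non-negative integer x."""
    return (x >> i) & 1


def mux(p, q, lower):
    """Carry-selection operator: output q when p == 0, otherwise pass `lower`."""
    return lower if p else q


def lsb_group_output(a, b, seed):
    """Two-level selection for bits 0..3, with `seed` substituted for Q_0.

    seed = a_0 AND b_0 should give G_{3:0}; seed = c_1 should give the carry c_4.
    """
    p = [bit(a, i) ^ bit(b, i) for i in range(4)]
    q = [bit(a, i) for i in range(4)]    # assumption: Q_i = a_i
    low = mux(p[1], q[1], seed)          # role of M1: G_{1:0} or c_2, depending on seed
    high = mux(p[3], q[3], q[2])         # Q_{3:2}
    return mux(p[3] & p[2], high, low)   # role of M3: G_{3:0} or c_4


def carry_out4(a, b, c0):
    """Reference carry out of a 4-bit addition with carry-in c0."""
    return ((a & 0xF) + (b & 0xF) + c0) >> 4
```

With `seed = a_0 & b_0` the output agrees with `carry_out4(a, b, 0)`, and with `seed` set to the true $c_1$ it agrees with `carry_out4(a, b, c0)` for either value of the carry-in.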
Therefore, if the carry $c_1$ (or its complement) can be generated within a delay smaller than two units, the total critical path delay remains unaffected. Fortunately, it turns out that $\bar{c}_1$ can be implemented with a delay smaller than 2 units by employing the inverting majority gate (a complex gate), as illustrated in Figure 6. This gate realizes the complement of the majority function $a_0 b_0 + b_0 c_0 + c_0 a_0$. Note that the worst case delay for this circuit is equivalent to that of a pull-up or pull-down through no more than 3 series transistors, or 1.5 units. Retrieving the uncomplemented $c_1$ by adding an inverter would cause additional delay, which would defeat the purpose. Hence we employ the complement $\bar{Q}_1$ along with an inverting MUX M to obtain the carry $c_2$ in the true (uncomplemented) form. The resultant architecture of the CG block is shown in Figure 7, and this is the circuit that is used in the overall block diagram shown in Figure 5.
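The polarity bookkeeping in this paragraph can be illustrated with a short logical model. It is a sketch only: it assumes $Q_1 = a_1$, models the inverting majority gate and the inverting MUX as pure Boolean functions (timing is not modelled), and all function names are hypothetical. The last helper shows the two's complement subtraction that motivates a carry-in of 1.

```python
def inv(x):
    """Logical complement of a single bit."""
    return x ^ 1


def inverting_majority(a0, b0, c0):
    """Complement of the majority function a0*b0 + b0*c0 + c0*a0, i.e. the complement of c_1."""
    return inv((a0 & b0) | (b0 & c0) | (c0 & a0))


def inverting_mux(select, in0, in1):
    """Output the complement of in1 when select == 1, else the complement of in0."""
    return inv(in1 if select else in0)


def c2_true(a, b, c0):
    """Carry c_2 in true form, built from complemented inputs and one inverting MUX."""
    a0, b0 = a & 1, b & 1
    a1, b1 = (a >> 1) & 1, (b >> 1) & 1
    p1 = a1 ^ b1
    q1_bar = inv(a1)                        # assumption: Q_1 = a_1
    c1_bar = inverting_majority(a0, b0, c0)
    return inverting_mux(p1, q1_bar, c1_bar)


def subtract(a, b, width):
    """A - B modulo 2**width as A + (bit-wise complement of B) + carry-in of 1."""
    mask = (1 << width) - 1
    return (a + (~b & mask) + 1) & mask
```

Because both MUX inputs arrive complemented, the inversion inside the MUX restores the true polarity of $c_2$ without a separate inverter.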

Next, we describe the selection of the optimal value of the block size $b$ for a given word length $W$. Recall that the carry look-ahead structure is a binary tree with a total critical path delay of $(1 + \lceil \log_2 W \rceil)$ units, where $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$. The constant 1 arises because the block multiplexors need both the incoming group carry signal and its complement to select the final sum bits. Some of the inverters that generate these complements (those driving the block MUXes in the most significant half) fall on the critical path, leading to one additional unit of delay. The block size is chosen in such a way that the ripple addition through a block completes before the incoming carry becomes available (ideally these inputs to the block MUXes should arrive at the same time, but that is not always feasible). The optimal values of $b$ for different wordlengths are summarized in Table 1 below.

[Table 1 columns: word length $W$; ripple-carry block size $b$; delay of ripple-carry addition through $b$ bits; delay of carry look-ahead tree.]

Table 1: Optimal ripple-carry block size $b$ as a function of word length of the input operands. The delays refer to the time required to generate the inputs to the block multiplexors that select the final sum bits, from the time the digit-wise $P$ and $Q$ signals become available.

Our investigation revealed that going to unequal block sizes cannot shorten the overall critical path delay beyond the values indicated in Table 1 (under the delay model discussed earlier). For instance, increasing $b$ to 5, 6 or 7 in the most significant half for $W = 32$ or $W = 64$ does not lead to any further speedup, because such unequal block sizes increase the delay through the look-ahead tree by 1 unit (as compared to that when the block size is uniform and equals 4). The extra delay is incurred due to the asymmetry introduced in the bit positions for which a group carry-in signal is to be generated.
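The block-size rule can be sketched numerically. The code below is an illustration under two assumptions that go beyond the text: the ripple-carry delay through $b$ bits is taken as $b + 1$ units (consistent with the 5 units quoted for a 4 bit block), and candidate block sizes are restricted to powers of two. The function names are hypothetical.

```python
def ceil_log2(w):
    """Smallest integer k with 2**k >= w, for w >= 1 (exact integer arithmetic)."""
    return (w - 1).bit_length()


def tree_delay(w):
    """Look-ahead tree delay in units, including the inverter: 1 + ceil(log2 W)."""
    return 1 + ceil_log2(w)


def ripple_delay(b):
    """Assumed ripple-carry delay through b bits: b + 1 units."""
    return b + 1


def optimal_block_size(w):
    """Largest power-of-two block whose ripple sum is ready no later than its carry-in."""
    limit = tree_delay(w)
    b = 1
    while 2 * b <= w and ripple_delay(2 * b) <= limit:
        b *= 2
    return b
```

Under these assumptions the rule gives $b = 4$ for $W = 32$, 64 and 128, and $b = 8$ only from $W = 256$ upward.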
Only when the delay of the look-ahead tree becomes long enough to be comparable to that of a ripple-carry addition through 8 bits does the block size $b = 8$ become optimal. The delay of ripple-carry addition through 8 bits is 9 units. Hence, a block size of 8 becomes optimal only for word lengths of 256 bits and beyond, which are of limited practical significance.

We would like to point out that the delay model ignores fan-out loads. If fan-out loading is considered, then transistor sizing for higher drive capability and/or extra buffering might be necessary, thereby increasing the delay of our 64 bit adder tree by an extra unit or two. In that case, a larger block size in the most significant half might turn out to be better. However, this kind of optimization depends on the actual circuit parameters such as resistance and capacitance, which in turn depend on the layout, the fabrication technology, etc. Analysis at this level of detail is impossible without a full layout and is beyond the scope of this manuscript. The main point is that circuit level and layout level optimizations can be carried out on top of the algorithmic improvements presented in this paper. Thus, under the delay model discussed above, it is better to have equal sized blocks with the $b$ values

shown in Table 1.

V Comparison

The addition times for different designs are listed in Table 2 as a function of the wordlength of the input operands ($W$).

[Table 2 columns: (1) word length $W$; (2) our block size $b$; (3) our proposed design; (4) our alternate design (carry select + binary look-ahead tree without MUXes); (5) adder in [18]; (6) adder in Digital's DEC ALPHA-AXP processor [4] (conditional sum). Columns 3 to 6 give addition time in gate delays.]

Table 2: Comparison of addition times (gate delays) for different designs as a function of operand wordlength $W$.

The delay for the adder in the ALPHA-AXP processor is estimated based on the block and circuit diagrams presented in [4]. It was assumed that a ripple propagation through 8 bits in that design takes 8 units of delay, and that the cascode dual MUX switches have a delay of 1 unit. All buffering and latching delays have been ignored. It is seen that the use of MUXes enables doubling of the word length for 1 unit of additional delay. One of the main reasons that design is slower is that the ripple-carry addition through 8 bits takes a long time.

Column 5 in Table 2 lists delays for the adder in [18]. As pointed out at the beginning of this section, it is more realistic to assume the delay of an XOR/XNOR gate to be 2 units. This assumption causes the delay values listed in the table above to be 2 units higher than the corresponding delay values originally presented in [18]. It is seen that our adder is faster than that of [18] for all wordlengths. For 32 and 64 bit operands, the speed improvements are 33% and 31%, respectively. The adder in [18] is slower than ours because

(i) The ripple through 8 bit positions is slow.

(ii) The design in [18] inserts inverters in the tree of MUXes to retrieve the signals in true (uncomplemented) form. This is not required and can be avoided, as explained in Section IV. Thus, our look-ahead tree architecture is faster than that of [18] for all word lengths.

(iii) The generation of the signals that feed the look-ahead tree in [18] takes longer (4 units as opposed to 2).

Column 4 of Table 2 lists the delay of an alternate architecture which is the same as the proposed (optimal) one in all respects (i.e., the block size for ripple-carry addition and the overall architecture of the look-ahead tree) except that it implements equations (??) and (??) directly, without using MUXes. In the alternate design, the complement of $G_{i:j}$ is synthesized in 1.5 units of delay by using an AND-OR-INVERT (AOI) complex gate with 3 inputs, viz., $G_{i:m}$, $P_{i:m}$ and $G_{m-1:j}$. The inversion inherent in CMOS causes no extra delay, and nodes at all levels of the tree can be inverting, thereby yielding a delay of 1.5 units per level. We decided to present this alternate architecture for comparison, instead of the binary look-ahead design presented in [1], for two reasons:

(1) It clearly pinpoints the advantage gained by the use of MUXes even though everything else remains the same: only 1 unit of delay is required per level of the look-ahead tree, resulting in a saving of 0.5 unit per level (as compared with a design that uses NAND/NOR gates instead of MUXes to implement equations (??) and (??)).

(2) The original design in [1] does not employ carry selection at all. It calculates carries for each individual bit position. This causes the number of levels in the look-ahead circuit to grow considerably. For a $2^w$-bit adder, the number of levels in the carry generation circuit in their design is $(2w - 1)$, because the carries of successively larger groups are computed in $w$ levels and then an inverted tree is used to calculate the carries for all the intermediate individual bit positions. This makes the design in [1] very slow, and hence it was decided that a direct comparison with this design is not particularly interesting. An inverted look-ahead tree is unnecessary when the carry-selection technique is employed.
Consequently, our architecture of a $2^w$-bit adder has only $w$ levels in the look-ahead circuit. While the carry out of the least significant half, i.e., $c_{2^{w-1}}$, is being computed, the $P$ and $Q$ signals for all the bit blocks in the most significant half are also computed in parallel. In fact, the generation of $c_{2^{w-1}}$ and of all the $P$ and $Q$ signals in the most significant half finishes at exactly the same time, i.e., after $w - 1$ levels of the tree. Consequently, once $c_{2^{w-1}}$ is available, the carry-in signals for all the bit blocks in the most significant half are calculated in parallel at the last level of the look-ahead circuit. A special case of this general property of our architecture is illustrated by the 32-bit adder shown in Figure 5. For this case, $w = 5$ and $c_{16}$ is available after $w - 1 = 4$ time units. All the carries in the most significant half are then generated in the last level of the look-ahead circuit, leading to a delay of $w = 5$ units (for signals to propagate through the tree).

The alternate design thus demonstrates that one need not be restricted to $P$ and $Q$ signals only: it is possible to have the same number ($w$) of levels in the look-ahead tree and use the conventional $P$ and $G$ signals along with the fully complementary, actively restoring inverting implementation of equation (??).
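The level and delay counts quoted above can be tabulated with a few lines of arithmetic. This is a sketch of the stated delay model only, with hypothetical function names; the total for the proposed design adds the 2 units for the bit-wise XOR/XNOR signals, 1 unit for the carry inverter and 1 unit for the block MUX, which matches the $4 + \log_2 n$ figure given in the abstract.

```python
def lookahead_levels(w):
    """Levels in the look-ahead circuit of the carry-select based 2**w-bit adder."""
    return w


def inverted_tree_levels(w):
    """Levels quoted for the design in [1], which computes a carry for every bit."""
    return 2 * w - 1


def tree_delay(w, units_per_level):
    """Tree delay: 1 unit per level with MUXes, 1.5 units per level with AOI gates."""
    return w * units_per_level


def proposed_total_delay(w):
    """P/Q generation (2) + MUX tree (w) + carry inverter (1) + block MUX (1)."""
    return 2 + tree_delay(w, 1) + 1 + 1


if __name__ == "__main__":
    for w in (4, 5, 6):
        print(2 ** w, lookahead_levels(w), inverted_tree_levels(w),
              tree_delay(w, 1), tree_delay(w, 1.5), proposed_total_delay(w))
```

For the 32-bit case ($w = 5$) this gives 5 levels against 9 for the inverted-tree organization, and a tree delay of 5 units with MUXes against 7.5 units with AOI gates.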

It is seen that our alternate design with the conventional $P$ and $G$ signals is faster than the design in [18]. This happens despite the fact that the conventional design needs 1.5 units of delay per level of the look-ahead tree, while the design in [18] employs MUXes requiring only 1 unit of delay per level of the tree. The main reason is that the ripple-carry addition through 8 bits in that design is very slow and offsets the speed advantage gained by using MUXes. Thus, the use of MUXes alone does not fully optimize the delay; the architecture of the look-ahead tree must also be optimized in order to achieve the best possible result.

Table 3 compares the transistor count estimates for different designs. The transistor counts for the designs in [18] are reproduced from Table III therein, for the sake of comparison.

[Table 3 columns: word length $W$; our adder (proposed design), allowing up to 4 series transmission gates; our adder (proposed design), allowing up to 6 series transmission gates; adder in [18]; our adder, alternate design.]

Table 3: Transistor counts for different adders for different wordlengths.

Table 3 shows that our adder (proposed design) needs fewer transistors than the adder in [18] for wordlengths of 32 bits and larger. It should be pointed out that the design in [18] has up to 11 transmission gates cascaded in series. When transmission gates are replaced by their fully complementary actively restoring circuits, the transistor count goes up significantly. For example, the XOR gates that generate the sum output bits in the ripple circuit in Figure 3 can be implemented very economically with merely 4 transistors each, if one utilizes transmission structures. The delay is likely to be small (1 unit) as well, since both $P$ and $\bar{P}$ signals are already available and hence can be put on the slow input of the TG XOR circuit, leaving the late arriving ripple carry on the fast input.
A fully complementary implementation, on the other hand, needs 10 transistors and has a delay of 2 units. Even when we limit the number of series transmission gates to 4, our 32 and 64 bit adders require 7% and 4% fewer transistors than the corresponding designs in [18]. If we limit the number of series transmission gates to 6 (which is reasonable, especially as the feature size continues to shrink), the savings in the number of transistors are substantial: 28% and 25% for 32 and 64 bit adders, respectively. The savings come about mainly because the design in [18] needs too much processing (transistors) at the top to generate the inputs required for the look-ahead tree. The cells that provide the inputs to the look-ahead tree are essentially at the leaf nodes of the tree. This, along with the fact that the number of leaf nodes in a full binary tree is (1 + the number of internal nodes), implies that any saving in transistors at the leaf nodes will lead to a significant saving in the total number of transistors.

The last column of Table 3 shows that our alternate architecture (combining carry-select and binary look-ahead with conventional $P$ and $G$ signals) requires a larger number of transistors than our MUX-based (optimal and preferred) design. Actually, the transistor counts for the adders excluding the cells that generate the bit-wise $P$, $Q$ and $P$, $G$ signals (i.e., the transistor counts of all the blocks shown in Figure 5) turn out to be about the same for both the proposed and the alternate design. The main difference is that the proposed design utilizes the $Q_i$ variables, which require no processing whatsoever, since $Q_i$ can be set to either $a_i$ or $b_i$ by itself. The alternate design, on the other hand, must use $G_i = a_i b_i$, i.e., a bit-wise AND (in reality, a NAND is sufficient). This causes the transistor count of the $P_i$, $G_i$ generator cell in the alternate design to be 14, instead of the 10 transistors that are required for the $P_i$, $Q_i$ generation in the proposed design. This leads to a sizable saving in the total number of transistors and clearly brings out the advantage of using the $Q$ variables instead of the $G$ variables.

The conditional-sum architecture is expensive in terms of the number of transistors [9]. Since the adder in the ALPHA chip is basically a conditional sum design, its transistor count is likely to be larger than all those presented in Table 3. Comparing its transistor count does not lead to any additional insight, and it was therefore excluded from Table 3.

Tables 2 and 3 clearly demonstrate that our design (i) is significantly faster than those in [18] and [4], and (ii) requires fewer transistors. A substantial speedup ratio of 31% or higher is achieved despite the fact that our design needs less hardware.

Finally, we compare our design with a recently published dynamic CMOS design [8] which adds two 56 bit operands (56 bits is the length of the significand prescribed in the IEEE standard for double-precision floating-point numbers). It employs variable length Manchester chains and several other circuit optimizations to achieve a propagation delay of 1.85 ns through the look-ahead tree (up to the generation of sum bits).
However, it is a dynamic CMOS design, and the 1.85 ns excludes the delay required to generate the bit-wise $G$ and $P$ signals, as well as the delay required to latch the final outputs. In [8], the delay of a basic two input gate was measured to be about 0.25 ns for a 1 micron technology. Assuming a delay of 2 units to generate the $G$ and $P$ signals and 1 unit to latch the final output, the total delay for that design would be $1.85 + 3 \times 0.25 = 2.6$ ns. The delay of our 64 bit design, on the other hand, would be $10 \times 0.25 = 2.5$ ns, which is comparable to that of [8]. Note, however, that our design can handle 64 bits. The design in [8], on the other hand, was fine tuned and optimized for a 56 bit addition. Consequently, it is not clear whether that design can be extended easily to a 64 bit addition. Furthermore, if it is required to add 128 bits (perhaps in the final stage of a double precision multiply operation), our design scales up easily, while the design in [8] might have to be completely redone.

We would like to stress that a complete comparison is not possible until a detailed layout is available. However, we expect our design to perform better because the speedup and transistor savings come about as a result of improvements at the algorithm and architecture levels. All the circuit optimizations employed in other contemporary designs can also be utilized along with our algorithm to yield even further performance enhancements.

VI Conclusion


More information

Problem Set 6 Solutions

Problem Set 6 Solutions CS/EE 260 Digital Computers: Organization and Logical Design Problem Set 6 Solutions Jon Turner Quiz on 2/21/02 1. The logic diagram at left below shows a 5 bit ripple-carry decrement circuit. Draw a logic

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Sciences Analysis and Design of Digital Integrated Circuits (6.374) - Fall 2003 Quiz #2 Prof. Anantha Chandrakasan

More information

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor Proposal to Improve Data Format Conversions for a Hybrid Number System Processor LUCIAN JURCA, DANIEL-IOAN CURIAC, AUREL GONTEAN, FLORIN ALEXA Department of Applied Electronics, Department of Automation

More information

XOR - XNOR Gates. The graphic symbol and truth table of XOR gate is shown in the figure.

XOR - XNOR Gates. The graphic symbol and truth table of XOR gate is shown in the figure. XOR - XNOR Gates Lesson Objectives: In addition to AND, OR, NOT, NAND and NOR gates, exclusive-or (XOR) and exclusive-nor (XNOR) gates are also used in the design of digital circuits. These have special

More information

Latches. October 13, 2003 Latches 1

Latches. October 13, 2003 Latches 1 Latches The second part of CS231 focuses on sequential circuits, where we add memory to the hardware that we ve already seen. Our schedule will be very similar to before: We first show how primitive memory

More information

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK SUBJECT CODE: EC 1354 SUB.NAME : VLSI DESIGN YEAR / SEMESTER: III / VI UNIT I MOS TRANSISTOR THEORY AND

More information

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 9. Datapath Design Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017 October 2, 2017 ECE Department, University of Texas at Austin

More information

The equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers

The equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers The equivalence of twos-complement addition and the conversion of redundant-binary to twos-complement numbers Gerard MBlair The Department of Electrical Engineering The University of Edinburgh The King

More information

DESIGN OF PARITY PRESERVING LOGIC BASED FAULT TOLERANT REVERSIBLE ARITHMETIC LOGIC UNIT

DESIGN OF PARITY PRESERVING LOGIC BASED FAULT TOLERANT REVERSIBLE ARITHMETIC LOGIC UNIT International Journal of VLSI design & Communication Systems (VLSICS) Vol.4, No.3, June 2013 DESIGN OF PARITY PRESERVING LOGIC BASED FAULT TOLERANT REVERSIBLE ARITHMETIC LOGIC UNIT Rakshith Saligram 1

More information

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Hakim Weatherspoon CS 3410 Computer Science Cornell University Hakim Weatherspoon CS 3410 Computer Science Cornell University The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. memory inst 32 register

More information

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

Proposal to Improve Data Format Conversions for a Hybrid Number System Processor Proceedings of the 11th WSEAS International Conference on COMPUTERS, Agios Nikolaos, Crete Island, Greece, July 6-8, 007 653 Proposal to Improve Data Format Conversions for a Hybrid Number System Processor

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19

More information

Computer organization

Computer organization Computer organization Levels of abstraction Assembler Simulator Applications C C++ Java High-level language SOFTWARE add lw ori Assembly language Goal 0000 0001 0000 1001 0101 Machine instructions/data

More information

EECS150. Arithmetic Circuits

EECS150. Arithmetic Circuits EE5 ection 8 Arithmetic ircuits Fall 2 Arithmetic ircuits Excellent Examples of ombinational Logic Design Time vs. pace Trade-offs Doing things fast may require more logic and thus more space Example:

More information

Number representation

Number representation Number representation A number can be represented in binary in many ways. The most common number types to be represented are: Integers, positive integers one-complement, two-complement, sign-magnitude

More information

Fast Ripple-Carry Adders in Standard-Cell CMOS VLSI

Fast Ripple-Carry Adders in Standard-Cell CMOS VLSI 2011 20th IEEE Symposium on Computer Arithmetic Fast Ripple-Carry Adders in Standard-Cell CMOS VLSI Neil Burgess ARM Inc. Austin, TX, USA Abstract This paper presents a number of new high-radix ripple-carry

More information

Combinational Logic Design Arithmetic Functions and Circuits

Combinational Logic Design Arithmetic Functions and Circuits Combinational Logic Design Arithmetic Functions and Circuits Overview Binary Addition Half Adder Full Adder Ripple Carry Adder Carry Look-ahead Adder Binary Subtraction Binary Subtractor Binary Adder-Subtractor

More information

Advanced VLSI Design Prof. A. N. Chandorkar Department of Electrical Engineering Indian Institute of Technology- Bombay

Advanced VLSI Design Prof. A. N. Chandorkar Department of Electrical Engineering Indian Institute of Technology- Bombay Advanced VLSI Design Prof. A. N. Chandorkar Department of Electrical Engineering Indian Institute of Technology- Bombay Lecture - 04 Logical Effort-A Way of Designing Fast CMOS Circuits (Contd.) Last time,

More information

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002 CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING 18-322 DIGITAL INTEGRATED CIRCUITS FALL 2002 Final Examination, Monday Dec. 16, 2002 NAME: SECTION: Time: 180 minutes Closed

More information

Chapter 5 Arithmetic Circuits

Chapter 5 Arithmetic Circuits Chapter 5 Arithmetic Circuits SKEE2263 Digital Systems Mun im/ismahani/izam {munim@utm.my,e-izam@utm.my,ismahani@fke.utm.my} February 11, 2016 Table of Contents 1 Iterative Designs 2 Adders 3 High-Speed

More information

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Digital Logic. Lecture 5 - Chapter 2. Outline. Other Logic Gates and their uses. Other Logic Operations. CS 2420 Husain Gholoom - lecturer Page 1

Digital Logic. Lecture 5 - Chapter 2. Outline. Other Logic Gates and their uses. Other Logic Operations. CS 2420 Husain Gholoom - lecturer Page 1 Lecture 5 - Chapter 2 Outline Other Logic Gates and their uses Other Logic Operations CS 2420 Husain Gholoom - lecturer Page 1 Digital logic gates CS 2420 Husain Gholoom - lecturer Page 2 Buffer A buffer

More information

COMP 103. Lecture 16. Dynamic Logic

COMP 103. Lecture 16. Dynamic Logic COMP 03 Lecture 6 Dynamic Logic Reading: 6.3, 6.4 [ll lecture notes are adapted from Mary Jane Irwin, Penn State, which were adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] COMP03

More information

School of Computer Science and Electrical Engineering 28/05/01. Digital Circuits. Lecture 14. ENG1030 Electrical Physics and Electronics

School of Computer Science and Electrical Engineering 28/05/01. Digital Circuits. Lecture 14. ENG1030 Electrical Physics and Electronics Digital Circuits 1 Why are we studying digital So that one day you can design something which is better than the... circuits? 2 Why are we studying digital or something better than the... circuits? 3 Why

More information

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers*

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers* Quantum Algorithm Processor For Finding Exact Divisors Professor J R Burger Summary Wiring diagrams are given for a quantum algorithm processor in CMOS to compute, in parallel, all divisors of an n-bit

More information

Week-I. Combinational Logic & Circuits

Week-I. Combinational Logic & Circuits Week-I Combinational Logic & Circuits Overview Binary logic operations and gates Switching algebra Algebraic Minimization Standard forms Karnaugh Map Minimization Other logic operators IC families and

More information

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm EE241 - Spring 2010 Advanced Digital Integrated Circuits Lecture 25: Digital Arithmetic Adders Announcements Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due

More information

Chapter 2 Combinational Logic Circuits

Chapter 2 Combinational Logic Circuits Logic and Computer Design Fundamentals Chapter 2 Combinational Logic Circuits Part 3 Additional Gates and Circuits Charles Kime & Thomas Kaminski 2008 Pearson Education, Inc. (Hyperlinks are active in

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 19: March 29, 2018 Memory Overview, Memory Core Cells Today! Charge Leakage/Charge Sharing " Domino Logic Design Considerations! Logic Comparisons!

More information

Low Power, High Speed Parallel Architecture For Cyclic Convolution Based On Fermat Number Transform (FNT)

Low Power, High Speed Parallel Architecture For Cyclic Convolution Based On Fermat Number Transform (FNT) RESEARCH ARTICLE OPEN ACCESS Low Power, High Speed Parallel Architecture For Cyclic Convolution Based On Fermat Number Transform (FNT) T.Jyothsna 1 M.Tech, M.Pradeep 2 M.Tech 1 E.C.E department, shri Vishnu

More information

Digital System Design Combinational Logic. Assoc. Prof. Pradondet Nilagupta

Digital System Design Combinational Logic. Assoc. Prof. Pradondet Nilagupta Digital System Design Combinational Logic Assoc. Prof. Pradondet Nilagupta pom@ku.ac.th Acknowledgement This lecture note is modified from Engin112: Digital Design by Prof. Maciej Ciesielski, Prof. Tilman

More information

Dynamic Combinational Circuits. Dynamic Logic

Dynamic Combinational Circuits. Dynamic Logic Dynamic Combinational Circuits Dynamic circuits Charge sharing, charge redistribution Domino logic np-cmos (zipper CMOS) Krish Chakrabarty 1 Dynamic Logic Dynamic gates use a clocked pmos pullup Two modes:

More information

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo

Digital Integrated Circuits Designing Combinational Logic Circuits. Fuyuzhuo Digital Integrated Circuits Designing Combinational Logic Circuits Fuyuzhuo Introduction Digital IC Dynamic Logic Introduction Digital IC EE141 2 Dynamic logic outline Dynamic logic principle Dynamic logic

More information

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process

Computer Architecture. ESE 345 Computer Architecture. Design Process. CA: Design process Computer Architecture ESE 345 Computer Architecture Design Process 1 The Design Process "To Design Is To Represent" Design activity yields description/representation of an object -- Traditional craftsman

More information

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors

We are here. Assembly Language. Processors Arithmetic Logic Units. Finite State Machines. Circuits Gates. Transistors CSC258 Week 3 1 Logistics If you cannot login to MarkUs, email me your UTORID and name. Check lab marks on MarkUs, if it s recorded wrong, contact Larry within a week after the lab. Quiz 1 average: 86%

More information

Numbers and Arithmetic

Numbers and Arithmetic Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu

More information

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder M.S.Navya Deepthi M.Tech (VLSI), Department of ECE, BVC College of Engineering, Rajahmundry. Abstract: Quantum cellular automata (QCA) is

More information

Design of A Efficient Hybrid Adder Using Qca

Design of A Efficient Hybrid Adder Using Qca International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 PP30-34 Design of A Efficient Hybrid Adder Using Qca 1, Ravi chander, 2, PMurali Krishna 1, PG Scholar,

More information

Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology

Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology Uppoju Shiva Jyothi M.Tech (ES & VLSI Design), Malla Reddy Engineering College For Women, Secunderabad. Abstract: Quantum cellular automata

More information

CS 140 Lecture 14 Standard Combinational Modules

CS 140 Lecture 14 Standard Combinational Modules CS 14 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris 1 Part III. Standard Modules A. Interconnect B. Operators. Adders Multiplier

More information

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits Logic and Computer Design Fundamentals Chapter 5 Arithmetic Functions and Circuits Arithmetic functions Operate on binary vectors Use the same subfunction in each bit position Can design functional block

More information

Looking at a two binary digit sum shows what we need to extend addition to multiple binary digits.

Looking at a two binary digit sum shows what we need to extend addition to multiple binary digits. A Full Adder The half-adder is extremely useful until you want to add more that one binary digit quantities. The slow way to develop a two binary digit adders would be to make a truth table and reduce

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective Digital Integrated Circuits Design Perspective Jan M. Rabaey nantha Chandrakasan orivoje Nikolić Designing Combinational Logic Circuits November 2002. 1 Combinational vs. Sequential Logic In Combinational

More information

Numbers and Arithmetic

Numbers and Arithmetic Numbers and Arithmetic See: P&H Chapter 2.4 2.6, 3.2, C.5 C.6 Hakim Weatherspoon CS 3410, Spring 2013 Computer Science Cornell University Big Picture: Building a Processor memory inst register file alu

More information

The Design Procedure. Output Equation Determination - Derive output equations from the state table

The Design Procedure. Output Equation Determination - Derive output equations from the state table The Design Procedure Specification Formulation - Obtain a state diagram or state table State Assignment - Assign binary codes to the states Flip-Flop Input Equation Determination - Select flipflop types

More information

Every time has a value associated with it, not just some times. A variable can take on any value within a range

Every time has a value associated with it, not just some times. A variable can take on any value within a range Digital Logic Circuits Binary Logic and Gates Logic Simulation Boolean Algebra NAND/NOR and XOR gates Decoder fundamentals Half Adder, Full Adder, Ripple Carry Adder Analog vs Digital Analog Continuous»

More information

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

! Charge Leakage/Charge Sharing.  Domino Logic Design Considerations. ! Logic Comparisons. ! Memory.  Classification.  ROM Memories. ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 9: March 9, 8 Memory Overview, Memory Core Cells Today! Charge Leakage/ " Domino Logic Design Considerations! Logic Comparisons! Memory " Classification

More information

Lecture 22 Chapters 3 Logic Circuits Part 1

Lecture 22 Chapters 3 Logic Circuits Part 1 Lecture 22 Chapters 3 Logic Circuits Part 1 LC-3 Data Path Revisited How are the components Seen here implemented? 5-2 Computing Layers Problems Algorithms Language Instruction Set Architecture Microarchitecture

More information

CMPE12 - Notes chapter 2. Digital Logic. (Textbook Chapters and 2.1)"

CMPE12 - Notes chapter 2. Digital Logic. (Textbook Chapters and 2.1) CMPE12 - Notes chapter 2 Digital Logic (Textbook Chapters 3.1-3.5 and 2.1)" Truth table" The most basic representation of a logic function." Brute force representation listing the output for all possible

More information

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation EEC 116 Lecture #5: CMOS Logic Rajeevan mirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation nnouncements Quiz 1 today! Lab 2 reports due this week Lab 3 this week HW

More information

Combinational Logic. Mantıksal Tasarım BBM231. section instructor: Ufuk Çelikcan

Combinational Logic. Mantıksal Tasarım BBM231. section instructor: Ufuk Çelikcan Combinational Logic Mantıksal Tasarım BBM23 section instructor: Ufuk Çelikcan Classification. Combinational no memory outputs depends on only the present inputs expressed by Boolean functions 2. Sequential

More information

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Sciences EECS5 J. Wawrzynek Spring 22 2/22/2. [2 pts] Short Answers. Midterm Exam I a) [2 pts]

More information

CHAPTER 15 CMOS DIGITAL LOGIC CIRCUITS

CHAPTER 15 CMOS DIGITAL LOGIC CIRCUITS CHAPTER 5 CMOS DIGITAL LOGIC CIRCUITS Chapter Outline 5. CMOS Logic Gate Circuits 5. Digital Logic Inverters 5.3 The CMOS Inverter 5.4 Dynamic Operation of the CMOS Inverter 5.5 Transistor Sizing 5.6 Power

More information

Class Website:

Class Website: ECE 20B, Winter 2003 Introduction to Electrical Engineering, II LECTURE NOTES #5 Instructor: Andrew B. Kahng (lecture) Email: abk@ece.ucsd.edu Telephone: 858-822-4884 office, 858-353-0550 cell Office:

More information

Implementation of Boolean Logic by Digital Circuits

Implementation of Boolean Logic by Digital Circuits Implementation of Boolean Logic by Digital Circuits We now consider the use of electronic circuits to implement Boolean functions and arithmetic functions that can be derived from these Boolean functions.

More information

What s the Deal? MULTIPLICATION. Time to multiply

What s the Deal? MULTIPLICATION. Time to multiply What s the Deal? MULTIPLICATION Time to multiply Multiplying two numbers requires a multiply Luckily, in binary that s just an AND gate! 0*0=0, 0*1=0, 1*0=0, 1*1=1 Generate a bunch of partial products

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH

More information

Unit II Chapter 4:- Digital Logic Contents 4.1 Introduction... 4

Unit II Chapter 4:- Digital Logic Contents 4.1 Introduction... 4 Unit II Chapter 4:- Digital Logic Contents 4.1 Introduction... 4 4.1.1 Signal... 4 4.1.2 Comparison of Analog and Digital Signal... 7 4.2 Number Systems... 7 4.2.1 Decimal Number System... 7 4.2.2 Binary

More information

BOOLEAN ALGEBRA INTRODUCTION SUBSETS

BOOLEAN ALGEBRA INTRODUCTION SUBSETS BOOLEAN ALGEBRA M. Ragheb 1/294/2018 INTRODUCTION Modern algebra is centered around the concept of an algebraic system: A, consisting of a set of elements: ai, i=1, 2,, which are combined by a set of operations

More information