Forward and Reverse Converters and Moduli Set Selection in Signed-Digit Residue Number Systems


J Sign Process Syst DOI /s

Andreas Persson, Lars Bengtsson

Received: 8 March 2007 / Revised: 21 May 2008 / Accepted: 12 June
Springer Science + Business Media, LLC. Manufactured in The United States

Abstract This paper presents an investigation into using a combination of two alternative digital number representations, the residue number system (RNS) and the signed-digit (SD) number representation, in digital arithmetic circuits. The combined number system is called RNS/SD for short. Since the performance of RNS/SD arithmetic circuits depends on the choice of the moduli set (a set of pairwise prime numbers), the purpose of this work is to compare RNS/SD number systems based on different sets. Five specific moduli sets of different lengths are selected. Moduli-set-specific forward and reverse RNS/SD converters are introduced for each of these sets. A generic conversion technique for moduli sets consisting of any number of elements is also presented. Finite impulse response (FIR) filters are used as reference designs in order to evaluate the performance of RNS/SD processing. The designs are evaluated with respect to delay and circuit area in a commercial 0.13 μm CMOS process. For the case of FIR filters it is shown that generic moduli sets with five or six moduli result in designs with the best area-delay products.

A. Persson
Centre for Research on Embedded Systems (CERES), Halmstad University, Sweden
andreas.persson@hh.se

L. Bengtsson
Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden
labe@chalmers.se

Keywords Residue number system, Signed-digit, Moduli-selection, Converters, FIR filters

1 Introduction

This paper presents an investigation into a combination of two number representations: the residue number system (RNS) and the signed-digit (SD) number system.
In RNS, an integer is decomposed into a set of residues with shorter binary representations, which can be processed in parallel. Carry propagation within RNS arithmetic circuits can be eliminated by using the SD number system to represent the residues. The SD number system provides a redundant number representation that facilitates carry-free addition. The use of SD numbers also implies efficient modulo arithmetic, which helps to simplify crucial RNS operations.

The basis of a residue number system is a set of pairwise prime integers, called the moduli set. The performance of RNS processing depends on the choice of this set and on the implementation of forward and reverse RNS conversion. Many moduli sets and conversion techniques have been suggested for RNS systems with residues in 2's complement form. This work investigates moduli sets and converters for use with signed-digit residue number systems (RNS/SD). The aim is to investigate how the choice of the moduli set affects the performance of RNS/SD arithmetic operations. A number of moduli sets were selected for evaluation, and forward and reverse RNS/SD converters were implemented for each of these sets. In order to compare the performance of RNS/SD processing, RNS/SD finite impulse response (FIR) filters were implemented using Synopsys Design Compiler and a 0.13 μm CMOS cell library from UMC. The synthesized designs are compared with respect to delay (speed) and circuit area.

The paper is organized as follows. Section 2 gives an introduction to the signed-digit residue number system and outlines the principles of forward and reverse RNS/SD conversion. The moduli sets selected for evaluation in this work are presented in Section 3 together with guidelines for selecting efficient moduli sets. Section 4 presents a technique for RNS/SD encoding and Section 5 presents moduli-set-specific decoding techniques for each of the sets introduced in Section 3. A reverse conversion technique for RNS/SD number systems using general moduli sets is presented in Section 6. The technique detailed in Section 6 is applicable to all coprime sets with one element of the form $2^n$. The RNS/SD finite impulse response (FIR) filters which have been used as reference designs are described in Section 7. ASIC synthesis results are presented in Section 8, where performance evaluations and comparisons are made with respect to delay and circuit area. Section 9 gives the conclusions.

2 Background

2.1 Residue Number System

The residue number system (RNS) is an integer system capable of supporting high-speed concurrent arithmetic. In RNS, an integer is decomposed into a set of smaller integers (i.e. with shorter binary representations), which can be processed independently and in parallel. The basis of an RNS is a set of pairwise prime integers $S = \{m_1, m_2, \ldots, m_L\}$, where $\gcd(m_i, m_j) = 1$ for $i \neq j$. The set $S$ is called the moduli set and the dynamic range of the number system is $[0, M)$, where $M$ is the product of all moduli $m_i$ in $S$. Any integer $X$ within the dynamic range has a unique RNS representation given by an ordered set of residues

$$X \rightarrow \{x_1, x_2, \ldots, x_L\}, \qquad x_i = |X|_{m_i},$$

where $|X|_{m_i}$ denotes $X \bmod m_i$.
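The definition above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the helper names `is_pairwise_coprime` and `to_rns` are hypothetical, and the moduli set {7, 8, 9} (the form $\{2^n - 1, 2^n, 2^n + 1\}$ with n = 3) is chosen here purely as a small example.

```python
from itertools import combinations
from math import gcd, prod


def is_pairwise_coprime(moduli):
    """True if gcd(m_i, m_j) = 1 for every pair i != j."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def to_rns(x, moduli):
    """Forward conversion: the ordered residues x_i = |X|_{m_i}."""
    return tuple(x % m for m in moduli)


# Illustrative moduli set (an assumption of this sketch, n = 3).
moduli = (7, 8, 9)
M = prod(moduli)  # dynamic range is [0, M)

# Every integer in [0, M) maps to a distinct residue tuple.
all_residues = {to_rns(x, moduli) for x in range(M)}
```

Collecting the residue tuples of all integers in the dynamic range into a set is a brute-force way of observing the uniqueness property stated above.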
The most important characteristic of the RNS representation is that it is a non-weighted number system, which facilitates parallel computing. If integers $A$ and $B$ have RNS representations $\{a_1, a_2, \ldots, a_L\}$ and $\{b_1, b_2, \ldots, b_L\}$ respectively, then the RNS representation of $C = A \circ B$ is

$$C \rightarrow \{c_1, c_2, \ldots, c_L\}, \qquad c_i = |a_i \circ b_i|_{m_i},$$

where $\circ$ denotes addition, subtraction, multiplication or any combination of the three. The computation of $c_i$ depends upon $a_i$, $b_i$ and $m_i$ only. Hence, each $c_i$ can be computed using a separate arithmetic unit, often called a channel.

The reconstruction of $X$ from $\{x_1, x_2, \ldots, x_L\}$ is based on the Chinese Remainder Theorem (CRT)

$$X = \left| \sum_{i=1}^{L} \hat{M}_i \left| \hat{M}_i^{-1} x_i \right|_{m_i} \right|_M, \tag{1}$$

where

$$M = \prod_{i=1}^{L} m_i, \qquad \hat{M}_i = \frac{M}{m_i},$$

and $|\hat{M}_i^{-1}|_{m_i}$ is the multiplicative inverse of $\hat{M}_i$ modulo $m_i$, such that $|\hat{M}_i \hat{M}_i^{-1}|_{m_i} = 1$.

2.2 Signed-Digit Number System

The radix-2 signed-digit (SD) number system has the digit set $\{\bar{1}, 0, 1\}$, where $\bar{1}$ denotes $-1$. An $N$-digit SD number $Y = [y_{N-1} \ldots y_0]_{SD}$, $y_i \in \{\bar{1}, 0, 1\}$, has the value

$$Y = \sum_{i=0}^{N-1} 2^i y_i, \tag{2}$$

which is the same as for an unsigned binary number except that $y_i$ can be $-1$. This yields a redundant number representation. For example, 6 can be represented as $[0110]_{SD}$, $[10\bar{1}0]_{SD}$ or $[1\bar{1}10]_{SD}$. Zero, however, has a unique representation.

To represent an SD digit $y$, two bits, $y^-$ and $y^+$, are required. That is, $y = [y^- y^+]$. Using this digit encoding, the value of an $N$-digit SD number $Y = [y_{N-1} \ldots y_0]_{SD}$ is given by Eq. 3.

$$Y = \sum_{i=0}^{N-1} 2^i y_i^+ - \sum_{i=0}^{N-1} 2^i y_i^- \tag{3}$$

Note that, unlike the 2's complement representation, it is possible to represent any integer and its negation with an equal number of digits. The negation of an integer is a very simple operation in the SD number system. The negation of $y = [y^- y^+]$ is $(-y) = [y^+ y^-]$, as can be seen by negating Eq. 3. No logic gates are required for this operation.

By exploiting the redundancy of the signed-digit number representation, carry propagation is limited to one bit position when adding SD numbers.
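As a rough illustration of Eqs. 2 and 3 and of gate-free negation, the following Python sketch models SD vectors as tuples of digits from {-1, 0, 1}, most significant digit first. The function names are hypothetical, and encoding the digit 0 as the bit pair (0, 0) is an assumption of this sketch; the text only states that two bits per digit are used.

```python
from itertools import product


def sd_value(digits):
    """Value of an SD vector, most significant digit first (Eq. 2)."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def sd_encode(digits):
    """Encode each digit as a bit pair (y_minus, y_plus); 0 -> (0, 0) is assumed."""
    return [(1 if d == -1 else 0, 1 if d == 1 else 0) for d in digits]


def sd_value_from_bits(pairs):
    """Value from the bit-pair encoding: plus-weights minus minus-weights (Eq. 3)."""
    plus = sum(p << i for i, (_, p) in enumerate(reversed(pairs)))
    minus = sum(m << i for i, (m, _) in enumerate(reversed(pairs)))
    return plus - minus


def sd_negate(pairs):
    """Negation swaps the two bits of every digit; no arithmetic is involved."""
    return [(p, m) for (m, p) in pairs]


# All 4-digit SD vectors with value 6 (redundancy) and with value 0 (uniqueness).
reps_of_6 = [d for d in product((-1, 0, 1), repeat=4) if sd_value(d) == 6]
reps_of_0 = [d for d in product((-1, 0, 1), repeat=4) if sd_value(d) == 0]
```

Enumerating all four-digit vectors shows the three representations of 6 and the single representation of zero mentioned in the text.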
Addition of two numbers $X$ and $Y$ is performed according to the set of rules presented in Table 1, in which $c_i$ denotes the (non-propagating) carry and $u_i$ the interim sum. These rules avoid any carry propagation when the final sum $s_i$ is computed according to Eq. 4. Consequently, addition is performed in constant time, regardless of operand widths.

Table 1 Rules for adding SD numbers (inputs: $x_i$, $y_i$ and whether neither or at least one of $x_{i-1}$, $y_{i-1}$ is $\bar{1}$; outputs: $c_i$, $u_i$).

$$s_i = u_i + c_i, \qquad s_i \in \{\bar{1}, 0, 1\} \tag{4}$$

2.3 Combining RNS and SD

The use of the SD number system has been suggested as a way to eliminate carry propagation within RNS arithmetic circuits. The carry-free properties of the SD number system provide constant-time addition operations. The parallel processing capabilities of the residue number system result in faster and more area-efficient multiplication operations. The use of signed-digit representation also implies efficient modulo arithmetic, which helps to simplify crucial RNS operations.

An important consideration when designing RNS systems is the choice of the moduli set. Sets with elements of the forms $2^n$, $2^n - 1$ and $2^n + 1$ are of special interest. Such low-cost moduli facilitate the use of simplified arithmetic units. The properties of the SD number system help to further simplify modulo arithmetic for low-cost moduli. Addition modulo $2^n - 1$ and $2^n + 1$ is performed using SD adders with end-around-carry logic [1]. Due to the limited carry propagation in SD adders, there is no delay penalty for the end-around-carry operations. Furthermore, unlike in the 2's complement number system, the result of modulo $2^n + 1$ addition is represented using $n$ SD digits, since representations for sums greater than $2^n - 1$ are taken from the negative range.

Modulo multiplication by powers of two with respect to the low-cost moduli relies on simple shift operations, according to the rules in Eq. 5, where $x = [x_{n-1} \ldots x_0]_{SD}$ is an $n$-digit SD number. These operations are accomplished by wiring connections appropriately.

$$\begin{aligned}
|2^a x|_{2^n - 1} &= [x_{n-a-1} \ldots x_0 \; x_{n-1} \ldots x_{n-a}]_{SD} \\
|2^a x|_{2^n} &= [x_{n-a-1} \ldots x_0 \; 0 \ldots 0]_{SD} \\
|2^a x|_{2^n + 1} &= [x_{n-a-1} \ldots x_0 \; \bar{x}_{n-1} \ldots \bar{x}_{n-a}]_{SD}
\end{aligned} \tag{5}$$

Here $\bar{x}_j$ denotes the negated digit $-x_j$, and the second rule fills the $a$ least significant positions with zeros.

2.4 Previous Work

The RNS and SD number systems are well known and have been thoroughly studied in the literature, for example in [2-6]. The possibility of combining RNS and SD arithmetic has also, to a lesser extent, been studied, most notably by Wei and Shimizu [7-9]. In [1], Lindström et al. present efficient forward and reverse converters for the combined RNS/SD number system using the popular moduli set $\{2^n - 1, 2^n, 2^n + 1\}$. In [10], Lindahl and Bengtsson present direct-form finite impulse response (FIR) filter implementations using RNS/SD. Their work shows that the use of the RNS/SD number representation can reduce circuit area and power dissipation while the clock period is retained.

3 Moduli Selection in RNS/SD

One of the most important considerations when designing RNS systems is the choice of the moduli set. The choice of moduli affects the complexity of forward and reverse converters as well as RNS arithmetic circuits. In [11], Abdallah and Skavantzos state that the moduli set, $S = \{m_1, \ldots, m_L\}$, should be chosen such that the moduli $m_i$ satisfy the following criteria:

1. They should be pairwise prime. That is, $\gcd(m_i, m_j) = 1$ for all $i \neq j$.
2. Each modulus $m_i$ should be as small as possible so that operations modulo $m_i$ require minimum computational time.
3. The moduli $m_i$ should imply simple binary-to-RNS and RNS-to-binary conversions as well as simple RNS arithmetic.
4. The moduli product should be large enough to implement the desired dynamic range.
5. The moduli should provide a well-balanced decomposition of the dynamic range. This means that the difference in word length between the moduli should be as small as possible.

Sets with all elements being of the forms $2^n$, $2^n - 1$ and $2^n + 1$ satisfy the requirement of simple conversions and efficient modulo arithmetic. Since the SD number system is used to represent residues, addition and subtraction are performed in constant time, regardless of operand widths.
Consequently, criteria 2 and 5 are less important for adder-based RNS/SD applications. However, for multiplication-intensive applications, moduli sets with small and balanced moduli result in faster and more area-efficient implementations.

3.1 Parameterized Moduli Sets

Several types of moduli sets have been considered by RNS researchers. A large number of different parameterized moduli sets have been suggested in the literature. The parameterized sets consist of a small number of low-cost moduli of a fixed form, where each modulus is expressed as a function of a parameter, say $n$. The dynamic range of such sets can easily be scaled by adjusting $n$. The use of parameterized moduli implies efficient RNS conversions, since it is possible to take advantage of moduli-set-specific properties, such as attractive closed-form expressions for the moduli product and for the multiplicative inverses required for reverse conversion. However, as the number of moduli is increased, such attractive properties are rare and come at the cost of balance in residue word lengths. Five parameterized moduli sets, $S_1, \ldots, S_5$, have been selected for evaluation in this work.

$$\begin{aligned}
S_1 &= \{2^n - 1,\; 2^n + 1\} \\
S_2 &= \{2^n - 1,\; 2^n,\; 2^n + 1\} \\
S_3 &= \{2^n - 1,\; 2^n,\; 2^n + 1,\; 2^{2n} + 1\} \\
S_4 &= \{2^{n-1},\; 2^{n-1} - 1,\; 2^n - 1,\; 2^n + 1\} \\
S_5 &= \{2^n,\; 2^n - 1,\; 2^{n-1} - 1,\; 2^{n-1} + 1\}
\end{aligned}$$

Moduli-set-specific forward and reverse converters have been implemented for RNS/SD number systems based on each of the five sets. The forward conversion technique is outlined in Section 4. Reverse converters for the parameterized moduli sets are presented in Section 5.

3.2 General Moduli Sets

If large dynamic ranges are required, a general moduli set consisting of a larger number of moduli might result in better performance. If low-cost moduli are used, RNS/SD forward conversions for such sets are as efficient as for the parameterized moduli sets. Reverse conversion, on the other hand, is a significantly more difficult task.
A generic RNS/SD decoder must handle the adverse properties of the Chinese Remainder Theorem, that is, modulo $M$ operations for a large-valued $M$ and multiplications by constant factors which do not necessarily have attractive forms. A decoder for general moduli sets has been designed and the implementation is outlined in Section 6.

4 RNS/SD Encoders

For each moduli set presented in Section 3, an RNS/SD encoder has been developed. The encoder for a moduli set $S = \{m_1, m_2, \ldots, m_L\}$ converts an integer in binary form into $L$ SD residues. Modulo reduction for low-cost moduli is straightforward when using the SD number system. To construct a residue $x_i$ from an integer $X$, $X$ is partitioned into vectors of the same word length as the corresponding modulus $m_i$. The last vector is padded with constant zeros if necessary.

$$x_i = |X|_{m_i} = \left| k_{i,0} + 2^{n_i} k_{i,1} + 2^{2 n_i} k_{i,2} + \cdots + 2^{l n_i} k_{i,l} \right|_{m_i}, \tag{6}$$

where

$$\begin{aligned}
k_{i,0} &= [x_{n_i - 1}\, x_{n_i - 2} \ldots x_0], \\
k_{i,1} &= [x_{2 n_i - 1}\, x_{2 n_i - 2} \ldots x_{n_i}], \\
&\;\;\vdots \\
k_{i,l} &= [0 \ldots 0\; x_{W-1} \ldots x_{l n_i}].
\end{aligned}$$

Here $n_i$ is the word length of modulus $m_i$ and $W$ is the word length of $X$. Since $m_i$ is either $2^{n_i}$, $2^{n_i} - 1$ or $2^{n_i} + 1$, the rules of Eq. 5 apply for multiplication by powers of two and Eq. 6 simplifies to

$$x_i = \begin{cases}
\left| k_{i,0} + k_{i,1} + k_{i,2} + k_{i,3} + \cdots \right|_{2^{n_i} - 1} & \text{if } m_i = 2^{n_i} - 1, \\
k_{i,0} & \text{if } m_i = 2^{n_i}, \\
\left| k_{i,0} - k_{i,1} + k_{i,2} - k_{i,3} + \cdots \right|_{2^{n_i} + 1} & \text{if } m_i = 2^{n_i} + 1.
\end{cases}$$

The RNS/SD encoders consist of a number of multi-operand SD modulo adders, one for each $m_i \neq 2^{n_i}$. Since the input $X$ is in binary form, it is possible to reduce the complexity of the encoder by using simplified SD adder cells on the first levels of each adder tree. Figure 1 shows the encoder for the RNS/SD number system with moduli set {128, 129, 127, 65, 17}.

5 RNS/SD Decoders for Parameterized Moduli Sets

5.1 RNS/SD Decoder for Moduli Set $S_1$

The proposed architecture for decoding SD residues with respect to the $S_1 = \{m_1, m_2\} = \{2^n - 1, 2^n + 1\}$ moduli set is based on the Chinese Remainder Theorem, as presented in Section 2.1.
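Returning briefly to the encoder of Section 4, the folding rules that follow from Eq. 6 can be sketched in Python as shown below. This is a behavioural sketch only: the names `chunks` and `encode_low_cost` are hypothetical, ordinary integers stand in for SD vectors, and the final `%` reduces to the canonical non-negative residue, whereas the SD hardware may leave the result in the negative range.

```python
def chunks(x, n):
    """Split a non-negative integer into n-bit fields k_0, k_1, ... (LSB field first)."""
    fields = []
    while x:
        fields.append(x & ((1 << n) - 1))
        x >>= n
    return fields or [0]


def encode_low_cost(x, n, kind):
    """Residue of x for a low-cost modulus.

    kind = -1 selects 2^n - 1, kind = 0 selects 2^n, kind = +1 selects 2^n + 1.
    """
    k = chunks(x, n)
    if kind == 0:
        return k[0]                         # the low n bits are the residue
    if kind == -1:
        return sum(k) % ((1 << n) - 1)      # 2^n == 1, so fields add up
    # 2^n == -1 modulo 2^n + 1, so fields alternate in sign.
    alternating = sum(v if j % 2 == 0 else -v for j, v in enumerate(k))
    return alternating % ((1 << n) + 1)


# The moduli set of Figure 1, written as (n, kind) pairs.
FIG1_SET = {128: (7, 0), 129: (7, 1), 127: (7, -1), 65: (6, 1), 17: (4, 1)}


def encode_fig1(x):
    """Residues of x with respect to {128, 129, 127, 65, 17}."""
    return {m: encode_low_cost(x, n, kind) for m, (n, kind) in FIG1_SET.items()}
```

For example, `encode_fig1(12345)` can be compared against `12345 % m` for each modulus to confirm that the folding agrees with direct reduction.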
For an RNS with two moduli, the CRT procedure in Eq. 1 is reduced to

$$X = \left| \hat{M}_1 \left| \hat{M}_1^{-1} x_1 \right|_{m_1} + \hat{M}_2 \left| \hat{M}_2^{-1} x_2 \right|_{m_2} \right|_M. \tag{7}$$

Figure 1 RNS/SD encoder for moduli set {128, 129, 127, 65, 17}.

For the particular set $S_1$, we have

$$M = 2^{2n} - 1, \qquad \hat{M}_1 = \frac{M}{m_1} = 2^n + 1, \qquad \hat{M}_2 = \frac{M}{m_2} = 2^n - 1.$$

It is easy to see that the two multiplicative inverses needed for computation of Eq. 7 are both powers of two.

Claim: $|\hat{M}_1^{-1}|_{m_1} = 2^{n-1}$

Proof:
$$\left| \hat{M}_1^{-1} \hat{M}_1 \right|_{m_1} = \left| 2^{n-1} (2^n + 1) \right|_{2^n - 1} = \left| 2^{n-1} \cdot 2^n + 2^{n-1} \right|_{2^n - 1} = \left| 2^{n-1} + 2^{n-1} \right|_{2^n - 1} = \left| 2^n \right|_{2^n - 1} = 1$$

Claim: $|\hat{M}_2^{-1}|_{m_2} = 2^{n-1}$

Proof:
$$\left| \hat{M}_2^{-1} \hat{M}_2 \right|_{m_2} = \left| 2^{n-1} (2^n - 1) \right|_{2^n + 1} = \left| -2^{n-1} - 2^{n-1} \right|_{2^n + 1} = \left| 2^{n-1} (-2) \right|_{2^n + 1} = \left| -2^n \right|_{2^n + 1} = 1$$

Inserting the derived expressions for $\hat{M}_1$, $\hat{M}_2$, $|\hat{M}_1^{-1}|_{m_1}$, $|\hat{M}_2^{-1}|_{m_2}$ and $M$ into Eq. 7 yields

$$\begin{aligned}
X &= \left| (2^n + 1)\, 2^{n-1} x_1 + (2^n - 1)\, 2^{n-1} x_2 \right|_{2^{2n} - 1} \\
  &= \left| (2^{2n-1} + 2^{n-1})\, x_1 + (2^{2n-1} - 2^{n-1})\, x_2 \right|_{2^{2n} - 1} \\
  &= \left| A x_1 + B x_2 \right|_{2^{2n} - 1}
\end{aligned} \tag{8}$$

Using the rules for multiplication by powers of two from Eq. 5, together with the fact that $x_1$ and $x_2$ both have digit-length $n$ in the SD number system, $X$ of Eq. 8 can be computed as the sum of two $2n$-digit SD vectors, formed by concatenation, rotation and negation.

$$\begin{aligned}
A x_1 &= \left| 2^{2n-1} x_1 + 2^{n-1} x_1 \right|_{2^{2n} - 1} = [x_{1,0} \; x_{1,n-1} \ldots x_{1,0} \; x_{1,n-1} \ldots x_{1,1}]_{SD} \\
B x_2 &= \left| 2^{2n-1} x_2 - 2^{n-1} x_2 \right|_{2^{2n} - 1} = [x_{2,0} \; \bar{x}_{2,n-1} \ldots \bar{x}_{2,0} \; x_{2,n-1} \ldots x_{2,1}]_{SD}
\end{aligned}$$

Here $x_{i,j}$ denotes digit $j$ of residue $x_i$ and a bar denotes a negated digit. No logic gates are required to form $A x_1$ and $B x_2$. One modulo $2^{2n} - 1$ SD adder is sufficient to generate $X$. The result will be in the range $(-M, M)$, due to the fact that SD modulo adders use the negative range as well as the positive. If the output is required to be in the range $[0, M)$, the correct result is obtained by adding $M = [1^{2n}]_{SD}$ (a vector of $2n$ ones) to $X$ when $X$ is negative. Adding constant ones to an SD integer is a simple operation, as shown in [1]. Carry-look-ahead (CLA) adders are used to obtain the binary representation of $X$, according to Eq. 3. In order to minimize the extra delay introduced by this range correction, both $X$ and $X + M$ are decoded to binary form, using two CLAs operating in parallel. The correct value is selected by examining the carry-out bit of the adder for $X$.
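The closed form of Eq. 8 can be checked numerically against the general CRT of Eq. 1. The Python sketch below does this on plain integers; the function names `crt` and `decode_s1` are hypothetical, and the SD wiring (concatenation, rotation, negation) and the range correction are not modelled.

```python
from math import prod


def crt(residues, moduli):
    """Reference reconstruction following Eq. 1."""
    M = prod(moduli)
    total = 0
    for x, m in zip(residues, moduli):
        M_hat = M // m
        inv = pow(M_hat, -1, m)            # multiplicative inverse of M_hat mod m
        total += M_hat * ((inv * x) % m)
    return total % M


def decode_s1(x1, x2, n):
    """Eq. 8 for S1 = {2^n - 1, 2^n + 1}: X = |A*x1 + B*x2| mod (2^(2n) - 1)."""
    A = (1 << (2 * n - 1)) + (1 << (n - 1))
    B = (1 << (2 * n - 1)) - (1 << (n - 1))
    return (A * x1 + B * x2) % ((1 << (2 * n)) - 1)


# Example with n = 4, i.e. moduli {15, 17} and M = 255.
n = 4
m1, m2 = (1 << n) - 1, (1 << n) + 1
inverse_1 = pow(m2, -1, m1)   # inverse of M_hat_1 = 2^n + 1 modulo m1
inverse_2 = pow(m1, -1, m2)   # inverse of M_hat_2 = 2^n - 1 modulo m2
```

Both inverses evaluate to $2^{n-1}$, in line with the two claims above.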
The hardware architecture of the decoder is depicted in Fig. 2.

Figure 2 RNS/SD decoder for moduli set $S_1$.

5.2 RNS/SD Decoder for Moduli Set $S_2$

The set $S_2 = \{2^n - 1, 2^n, 2^n + 1\}$ is probably the most widely used moduli set for RNS. It is also the moduli set that has been most intensively studied in the literature. The decoding of binary-residue number systems based on set $S_2$ is studied, for example, in [12-14]. An efficient decoder for RNS systems with SD residues is presented in [1]. This is also the decoder that has been used in this work, with some minor modifications. The conversion technique is outlined again in this section since the new decoder for moduli set $S_3$, presented in Section 5.3, is based on a similar approach.

The decoder is based upon a modified formulation of the Chinese Remainder Theorem, the New CRT-I, as presented in [15]. According to the New CRT-I, the binary representation $X$ of a residue number $\{x_1, x_2, \ldots, x_L\}$ can be computed as

$$\begin{aligned}
X &= x_1 + m_1 X', \\
X' &= \left| k_1 (x_2 - x_1) + k_2 m_2 (x_3 - x_2) + \cdots + k_{L-1} m_2 m_3 \cdots m_{L-1} (x_L - x_{L-1}) \right|_{m_2 m_3 \cdots m_L},
\end{aligned} \tag{9}$$

where $\{m_1, m_2, \ldots, m_L\}$ is the moduli set and $k_1, k_2, \ldots, k_{L-1}$ are multiplicative inverses, given by

$$\left| k_1 m_1 \right|_{m_2 m_3 \cdots m_L} = 1, \qquad \left| k_2 m_1 m_2 \right|_{m_3 m_4 \cdots m_L} = 1, \qquad \ldots, \qquad \left| k_{L-1} m_1 m_2 \cdots m_{L-1} \right|_{m_L} = 1.$$

If the elements of $S_2$ are rearranged such that $m_1 = 2^n$, $m_2 = 2^n + 1$ and $m_3 = 2^n - 1$, then the two multiplicative inverses $k_1$ and $k_2$ are both powers of two.

Claim: $k_1 = 2^n$

Proof:
$$\left| k_1 m_1 \right|_{m_2 m_3} = \left| 2^n \cdot 2^n \right|_{2^{2n} - 1} = \left| 2^{2n} \right|_{2^{2n} - 1} = 1$$

Claim: $k_2 = 2^{n-1}$

Proof:
$$\left| k_2 m_1 m_2 \right|_{m_3} = \left| 2^{n-1} \cdot 2^n (2^n + 1) \right|_{2^n - 1} = \left| 2^{n-1} \left| 2^n \right|_{2^n - 1} \left| 2^n + 1 \right|_{2^n - 1} \right|_{2^n - 1} = \left| 2^{n-1} \cdot 2 \right|_{2^n - 1} = \left| 2^n \right|_{2^n - 1} = 1$$

Using the expressions for $m_1$, $m_2$ and $m_3$, together with the derived expressions for $k_1$ and $k_2$, Eq. 9 simplifies to

$$\begin{aligned}
X &= x_1 + 2^n X', \\
X' &= \left| 2^n (x_2 - x_1) + 2^{n-1} (2^n + 1)(x_3 - x_2) \right|_{2^{2n} - 1}.
\end{aligned} \tag{10}$$

By expanding the terms of Eq. 10 and grouping the coefficients of $x_1$, $x_2$ and $x_3$, the expression for $X'$ can be rewritten as the sum of three terms

$$X' = \left| A x_1 + B x_2 + C x_3 \right|_{2^{2n} - 1}, \tag{11}$$

where

$$A = -2^n, \qquad B = -2^{2n-1} + 2^{n-1}, \qquad C = 2^{2n-1} + 2^{n-1}.$$

No logic gates are required to form SD representations of $A x_1$, $B x_2$ and $C x_3$. Using the rules for multiplication by powers of two from Eq. 5, we have

$$\begin{aligned}
A x_1 &= [\bar{x}_{1,n-1} \ldots \bar{x}_{1,0} \; 0^n]_{SD}, \\
B x_2 &= [\bar{x}_{2,0} \; x_{2,n-1} \ldots x_{2,0} \; \bar{x}_{2,n-1} \ldots \bar{x}_{2,1}]_{SD}, \\
C x_3 &= [x_{3,0} \; x_{3,n-1} \ldots x_{3,0} \; x_{3,n-1} \ldots x_{3,1}]_{SD},
\end{aligned}$$

where $x_{i,j}$ denotes digit $j$ of residue $x_i$, a bar denotes a negated digit and $0^n$ denotes $n$ zeros. The result $X$ is the concatenation of $X'$ (upper part) and $x_1$ (lower part). Two SD modulo adders are required to generate $X'$ according to Eq. 11. To make sure that the result is in the positive range, $M = [1^{2n}\, 0^n]_{SD}$ is added to $X$ when $X'$ is negative. Note that this has no effect on the lower part of $X$.
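The New CRT-I of Eq. 9 and its specialization to the reordered set $S_2$ (Eqs. 10 and 11) can be cross-checked with a short Python sketch. The function names `new_crt1` and `decode_s2` are hypothetical, the computation is done on ordinary integers rather than SD vectors, and the coefficients $A$, $B$, $C$ are those obtained by expanding Eq. 10.

```python
from math import prod


def new_crt1(residues, moduli):
    """New CRT-I (Eq. 9) for an ordered moduli list (m_1, ..., m_L)."""
    m1 = moduli[0]
    rest = prod(moduli[1:])        # m_2 * ... * m_L
    acc = 0
    lead = m1                      # m_1 * ... * m_i
    prefix = 1                     # m_2 * ... * m_i (empty product for i = 1)
    for i in range(1, len(moduli)):
        tail = prod(moduli[i:])    # m_{i+1} * ... * m_L in 1-based notation
        k = pow(lead, -1, tail)    # |k_i * m_1 ... m_i| = 1 modulo the tail
        acc += k * prefix * (residues[i] - residues[i - 1])
        lead *= moduli[i]
        prefix *= moduli[i]
    return residues[0] + m1 * (acc % rest)


def decode_s2(x1, x2, x3, n):
    """Eqs. 10-11 with m1 = 2^n, m2 = 2^n + 1, m3 = 2^n - 1."""
    A = -(1 << n)
    B = -(1 << (2 * n - 1)) + (1 << (n - 1))
    C = (1 << (2 * n - 1)) + (1 << (n - 1))
    x_prime = (A * x1 + B * x2 + C * x3) % ((1 << (2 * n)) - 1)
    return x1 + (x_prime << n)     # concatenation of X' and x1


# Example with n = 3: moduli ordered as (8, 9, 7), dynamic range 504.
n = 3
S2 = (1 << n, (1 << n) + 1, (1 << n) - 1)
k1 = pow(S2[0], -1, S2[1] * S2[2])
k2 = pow(S2[0] * S2[1], -1, S2[2])
```

With this ordering, `k1` and `k2` come out as $2^n$ and $2^{n-1}$, matching the two claims.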
Figure 3 shows the hardware architecture of the RNS/SD reverse converter for moduli set $S_2$.

Figure 3 RNS/SD decoder for moduli set $S_2$.

5.3 RNS/SD Decoder for Moduli Set $S_3$

The four-moduli set $S_3 = \{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$ is an extension of the popular $S_2$ moduli set, and has been suggested as a way to increase the dynamic range of the RNS. The resulting RNS decoder is as efficient as the $S_2$ decoder, while the dynamic range is increased from $3n - 1$ bits to $5n - 1$ bits. An adder-based binary-residue decoder for the $S_3$ set is presented in [16]. By applying a new moduli reordering scheme and by exploiting the properties of the SD number system, the number of terms which need to be added has been reduced from six for the decoder from [16] to four for the decoder proposed in this section.

For an RNS with four moduli, the New CRT-I procedure from Eq. 9 is reduced to

$$\begin{aligned}
X &= x_1 + m_1 X', \\
X' &= \left| k_1 (x_2 - x_1) + k_2 m_2 (x_3 - x_2) + k_3 m_2 m_3 (x_4 - x_3) \right|_{m_2 m_3 m_4}.
\end{aligned} \tag{12}$$

The elements of $S_3$ are rearranged such that $m_1 = 2^n$, $m_2 = 2^{2n} + 1$, $m_3 = 2^n + 1$ and $m_4 = 2^n - 1$. Using this ordering, $k_1$, $k_2$ and $k_3$ in Eq. 12 are powers of two.

Claim: $k_1 = 2^{3n}$

Proof:
$$\left| k_1 m_1 \right|_{m_2 m_3 m_4} = \left| 2^{3n} \cdot 2^n \right|_{2^{4n} - 1} = \left| 2^{4n} \right|_{2^{4n} - 1} = 1$$

Claim: $k_2 = 2^{n-1}$

Proof:
$$\left| k_2 m_1 m_2 \right|_{m_3 m_4} = \left| 2^{n-1} \cdot 2^n (2^{2n} + 1) \right|_{(2^n - 1)(2^n + 1)} = \left| 2^{n-1} \cdot 2^n \cdot 2 \right|_{2^{2n} - 1} = \left| 2^{2n} \right|_{2^{2n} - 1} = 1$$

Claim: $k_3 = 2^{n-2}$

Proof:
$$\left| k_3 m_1 m_2 m_3 \right|_{m_4} = \left| 2^{n-2} \cdot 2^n (2^{2n} + 1)(2^n + 1) \right|_{2^n - 1} = \left| 2^{n-2} \cdot 1 \cdot 2 \cdot 2 \right|_{2^n - 1} = \left| 2^n \right|_{2^n - 1} = 1$$

Inserting the expressions for $m_1, \ldots, m_4$ and $k_1, \ldots, k_3$ into Eq. 12 yields

$$\begin{aligned}
X &= x_1 + 2^n X', \\
X' &= \left| 2^{3n} (x_2 - x_1) + 2^{n-1} (2^{2n} + 1)(x_3 - x_2) + 2^{n-2} (2^{2n} + 1)(2^n + 1)(x_4 - x_3) \right|_{2^{4n} - 1}.
\end{aligned} \tag{13}$$

By expanding all terms in the expression for $X'$ of Eq. 13 and grouping the coefficients of each residue $x_1, \ldots, x_4$, we find that $X'$ can be rewritten as

$$X' = \left| A x_1 + B x_2 + C x_3 + D x_4 \right|_{2^{4n} - 1}, \tag{14}$$

where

$$\begin{aligned}
A &= -2^{3n}, \\
B &= 2^{3n-1} - 2^{n-1}, \\
C &= -2^{4n-2} + 2^{3n-2} - 2^{2n-2} + 2^{n-2}, \\
D &= 2^{4n-2} + 2^{3n-2} + 2^{2n-2} + 2^{n-2}.
\end{aligned}$$

Studying $A$, $B$, $C$ and $D$, we find that the distance between two consecutive non-zero digits in the SD representation of each term is equal to the word length of the corresponding residue ($n$ bits for $A$, $C$, $D$ and $2n$ bits for $B$). Consequently, no logic gates are required to form $A x_1$, $B x_2$, $C x_3$ and $D x_4$. Again, we use the rules for multiplication by powers of two from Eq. 5 to form terms using concatenation, rotation and negation.

$$\begin{aligned}
A x_1 &= [\bar{x}_{1,n-1} \ldots \bar{x}_{1,0} \; 0^{3n}]_{SD} \\
B x_2 &= [x_{2,n} \ldots x_{2,0} \; \bar{x}_{2,2n-1} \ldots \bar{x}_{2,0} \; x_{2,2n-1} \ldots x_{2,n+1}]_{SD} \\
C x_3 &= [\bar{x}_{3,1}\, \bar{x}_{3,0} \; x_{3,n-1} \ldots x_{3,0} \; \bar{x}_{3,n-1} \ldots \bar{x}_{3,0} \; x_{3,n-1} \ldots x_{3,0} \; \bar{x}_{3,n-1} \ldots \bar{x}_{3,2}]_{SD} \\
D x_4 &= [x_{4,1}\, x_{4,0} \; x_{4,n-1} \ldots x_{4,0} \; x_{4,n-1} \ldots x_{4,0} \; x_{4,n-1} \ldots x_{4,0} \; x_{4,n-1} \ldots x_{4,2}]_{SD}
\end{aligned}$$

Here $x_{i,j}$ denotes digit $j$ of residue $x_i$ and a bar denotes a negated digit. Three modulo $2^{4n} - 1$ SD adders are required to generate $X'$. The result $X$ is the concatenation of $X'$ and $x_1$. As for the decoder for moduli set $S_2$, range correction is carried out by adding $M = [1^{4n}\, 0^n]_{SD}$ to $X$ when $X'$ is negative. Figure 4 depicts the hardware architecture of the RNS/SD decoder.

5.4 RNS/SD Decoder for Moduli Sets $S_4$ and $S_5$

The set $S_4 = \{2^{n-1}, 2^{n-1} - 1, 2^n - 1, 2^n + 1\}$ is a balanced moduli set, well suited for large dynamic ranges. However, the elements of $S_4$ are pairwise prime for even values of $n$ only. This might be a disadvantage when tailoring the set for a given dynamic range. To overcome this problem, the set $S_5 = \{2^n, 2^n - 1, 2^{n-1} - 1, 2^{n-1} + 1\}$ will be used as a complement to $S_4$ for odd values of $n$. $S_5$ has a similar form to $S_4$; only the exponents differ. The elements of $S_5$ are pairwise prime for odd values of $n$ only. The two sets can be expressed in a common form as

$$\{m_1, m_2, m_3, m_4\} = \{2^a,\; 2^a - 1,\; 2^b - 1,\; 2^b + 1\},$$

where $a = n - 1$, $b = n$ for $S_4$ and $a = n$, $b = n - 1$ for $S_5$.

The proposed decoders for RNS/SD number systems using these moduli are SD implementations of a two-level approach to RNS decoding, detailed in [17]. On the first level, the moduli set is decomposed into two subsets, $\{m_1, m_2\}$ and $\{m_3, m_4\}$. The corresponding residue subsets ($\{x_1, x_2\}$ for $\{m_1, m_2\}$ and $\{x_3, x_4\}$ for $\{m_3, m_4\}$) are decoded using two reverse converters operating in parallel. The second level is a decoder for an RNS with moduli set $\{m_1 m_2, m_3 m_4\}$, where the residues $X_1 = |X|_{m_1 m_2}$ and $X_2 = |X|_{m_3 m_4}$ are the results from the first conversion step.
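The parity conditions stated above for $S_4$ and $S_5$ can be checked numerically. The following Python sketch uses hypothetical helper names (`pairwise_coprime`, `s4`, `s5`) and simply tests every pair of moduli with `math.gcd` for a range of $n$.

```python
from itertools import combinations
from math import gcd


def pairwise_coprime(moduli):
    """True if every pair of moduli has gcd 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def s4(n):
    """S4 = {2^(n-1), 2^(n-1) - 1, 2^n - 1, 2^n + 1}."""
    return (2 ** (n - 1), 2 ** (n - 1) - 1, 2 ** n - 1, 2 ** n + 1)


def s5(n):
    """S5 = {2^n, 2^n - 1, 2^(n-1) - 1, 2^(n-1) + 1}."""
    return (2 ** n, 2 ** n - 1, 2 ** (n - 1) - 1, 2 ** (n - 1) + 1)


# Which values of n in a small range give valid moduli sets?
valid_s4 = [n for n in range(3, 17) if pairwise_coprime(s4(n))]
valid_s5 = [n for n in range(3, 17) if pairwise_coprime(s5(n))]
```

In this range the valid values of $n$ are exactly the even ones for $S_4$ and the odd ones for $S_5$; the offending pairs in the other cases share the factor 3 (for example 15 and 33 in $S_4$ with $n = 5$).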
The first-level converter for moduli subset $\{m_1, m_2\} = \{2^a, 2^a - 1\}$ is a variant of the New CRT-I decoders presented in Sections 5.2 and 5.3. The required multiplicative inverse has the value 1. The proof of this is trivial, since $|2^a|_{2^a - 1} = 1$. Inserting the expressions for $m_1$ and $m_2$ into Eq. 9 yields

$$\begin{aligned}
X_1 &= x_1 + 2^a X_1', \\
X_1' &= \left| x_2 - x_1 \right|_{2^a - 1}.
\end{aligned} \tag{15}$$

The other decoder on the first level is the CRT decoder from Section 5.1, where

$$X_2 = \left| (2^{2b-1} + 2^{b-1})\, x_3 + (2^{2b-1} - 2^{b-1})\, x_4 \right|_{2^{2b} - 1}. \tag{16}$$

One modulo $2^a - 1$ SD adder is needed to compute $X_1'$ in Eq. 15. $X_1$ is the concatenation of $X_1'$ and $x_1$. The computation of $X_2$ according to Eq. 16 requires one modulo $2^{2b} - 1$ SD adder.

On the second level, two residues are decoded with respect to the moduli set $\{2^a (2^a - 1), 2^{2b} - 1\}$. Equation 9 is reduced to

$$\begin{aligned}
X &= X_1 + 2^a (2^a - 1)\, X', \\
X' &= \left| k (X_2 - X_1) \right|_{2^{2b} - 1}.
\end{aligned} \tag{17}$$

Equation 17 differs from the applications of the New CRT-I seen so far. The multiplicative inverse $k$ does not have a closed-form expression. A modulo $2^{2b} - 1$ adder/scaler is required to compute $X' = |k (X_2 - X_1)|_{2^{2b} - 1}$ for a precalculated value of $k$. The final result $X$ is computed as the regular (not modulo) sum of two terms, $[X'\, X_1]$ and $-2^a X'$, where $[X'\, X_1]$ is the concatenation of $X'$ and $X_1$. Range correction is carried out by adding $M = m_1 m_2 m_3 m_4$ to negative values of $X$ using simplified SD adder cells before $X$ is converted to binary form. The complete decoder is depicted in Fig. 5.

Figure 4 RNS/SD decoder for moduli set $S_3$.

Figure 5 RNS/SD decoder for moduli sets $S_4$ and $S_5$.

6 RNS/SD Decoders for General Moduli Sets

A reverse conversion technique for RNS/SD number systems using general moduli sets is presented. The only constraint given for the moduli set is that one of the elements, say $m_1$, should be a power of two. The conversion technique presented here is inspired by the work of Wang et al. [18]. Although the general sets studied in this work consist of low-cost moduli exclusively, the technique is applicable to all coprime moduli sets with one element of the form $2^n$.

Figure 6 RNS/SD decoder for general moduli sets.

In [18], Wang et al. propose a new formulation of the Chinese Remainder Theorem. For an RNS with moduli set $\{m_1, \ldots, m_L\}$ and residues $\{x_1, \ldots, x_L\}$, the value of $X$ is

$$X = x_1 + m_1 \left| X' \right|_{m_2 m_3 \cdots m_L}, \qquad X' = \sum_{i=1}^{L} k_i x_i, \tag{18}$$

where

$$k_1 = \frac{\hat{M}_1 \left| \hat{M}_1^{-1} \right|_{m_1} - 1}{m_1}, \qquad k_i = \frac{\hat{M}_i \left| \hat{M}_i^{-1} \right|_{m_i}}{m_1} \quad \text{for } i = 2, 3, \ldots, L.$$

$\hat{M}_i$ and $|\hat{M}_i^{-1}|_{m_i}$ are from the original formulation of the CRT in Eq. 1, that is

$$M = \prod_{i=1}^{L} m_i, \qquad \hat{M}_i = \frac{M}{m_i}, \qquad \left| \hat{M}_i \hat{M}_i^{-1} \right|_{m_i} = 1.$$

    MSD(k):
      if (k = 0): return {}
      else:
        find e, such that 2^e <= k < 2^(e+1)
        if (3k < 2^(e+2)): return {2^e, MSD(k - 2^e)}
        else: return {2^(e+1), -MSD(2^(e+1) - k)}

Figure 7 Algorithm for finding a minimal signed-digit representation of an integer $k$. Here $-$MSD(·) denotes the returned set with every element negated.
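Both the constants of Eq. 18 and the recoding algorithm of Fig. 7 are easy to prototype. The Python sketch below is a behavioural model only, with hypothetical function names (`eq18_constants`, `decode_general`, `msd`); it uses the five-moduli set {16, 17, 9, 7, 5} from the example in Section 6.1 as test data, and it reads the recursive call in Fig. 7 as returning signed powers of two, so that the terms of `msd(k)` sum to `k`.

```python
from math import prod


def eq18_constants(moduli):
    """Constants k_1..k_L of Eq. 18; moduli[0] plays the role of m_1."""
    M = prod(moduli)
    m1 = moduli[0]
    ks = []
    for i, m in enumerate(moduli):
        M_hat = M // m
        t = M_hat * pow(M_hat, -1, m)      # M_hat_i times its inverse mod m_i
        ks.append((t - 1) // m1 if i == 0 else t // m1)
    return ks


def decode_general(residues, moduli):
    """Eq. 18: X = x_1 + m_1 * |sum(k_i * x_i)| mod (m_2 * ... * m_L)."""
    ks = eq18_constants(moduli)
    rest = prod(moduli[1:])
    x_prime = sum(k * x for k, x in zip(ks, residues))
    return residues[0] + moduli[0] * (x_prime % rest)


def msd(k):
    """Signed powers of two that sum to k, following the recursion of Fig. 7."""
    if k == 0:
        return []
    e = k.bit_length() - 1                 # 2**e <= k < 2**(e + 1)
    if 3 * k < 2 ** (e + 2):               # k is closer to 2**e
        return [2 ** e] + msd(k - 2 ** e)
    # k is closer to 2**(e + 1): overshoot and negate the remainder's terms.
    return [2 ** (e + 1)] + [-t for t in msd(2 ** (e + 1) - k)]


moduli = (16, 17, 9, 7, 5)
constants = eq18_constants(moduli)
nonzero_digits = sum(len(msd(k)) for k in constants)
```

For this set the constants evaluate to 1004, 4725, 2380, 1530 and 1071, and their recodings contain 22 non-zero digits in total, in agreement with the example in Section 6.1.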

The proposed converter has two parts, a multiplication-accumulation (MA) array and a modulo reduction unit. The MA array is used to generate $X'$. The factors $k_1, k_2, \ldots, k_L$ are constants and are calculated a priori. The modulo operation of Eq. 18 and the final range correction are carried out by the modulo reduction unit. As described in earlier sections, $X$ of Eq. 18 can be formed using concatenation if the modulus $m_1$ is chosen to be a power of two. The hardware architecture of the general converter is depicted in Fig. 6. Implementations of the MA array and the modulo reduction unit are detailed in Sections 6.1 and 6.2.

6.1 The SD Multiplication-Accumulation Array

The task of the MA array is to compute $\sum_{i=1}^{L} k_i x_i$, where $x_1, x_2, \ldots, x_L$ are variables and $k_1, k_2, \ldots, k_L$ are integer constants. Wang et al. present an implementation of an MA array for variables in binary form. The MA architecture outlined in [18] uses the Modified Booth recoding algorithm to form partial products, which are added using a Wallace tree adder.

The partial product generation of the MA array proposed here relies on a minimal signed-digit recoding scheme. The algorithm MSD($k$) is used to find SD representations for the constant factors $k_1, k_2, \ldots, k_L$. In [19], it is proven that the algorithm given in Fig. 7 results in representations of minimal Hamming weight, that is, with a minimum number of non-zero digits. The resulting SD representation has no two adjacent non-zero digits. Thus, for an integer $k$, at most every second digit is non-zero, so the number of non-zero digits is bounded by roughly $\log_2 k / 2 + 1$. For example, MSD(383) returns $\{512, -128, -1\}$, which corresponds to an SD representation of $[10\bar{1}000000\bar{1}]_{SD}$. Each non-zero digit in the minimal SD representations results in a partial product. The partial products are formed using shift and negation operations. For example, $383x$ is computed as $(x \ll 9) - (x \ll 7) - x$.

The operation of the MA array is best explained using an example. Consider the five-moduli set $S = \{16, 17, 9, 7, 5\}$ with a dynamic range of 16 bits. The constants $k_1, \ldots, k_5$ and the corresponding minimal SD representations are precalculated.

$$\begin{aligned}
k_1 &= 1{,}004 = [100000\bar{1}0\bar{1}00]_{SD}, \\
k_2 &= 4{,}725 = [10010100\bar{1}0101]_{SD}, \\
k_3 &= 2{,}380 = [100101010\bar{1}00]_{SD}, \\
k_4 &= 1{,}530 = [10\bar{1}00000\bar{1}010]_{SD}, \\
k_5 &= 1{,}071 = [100010\bar{1}000\bar{1}]_{SD}.
\end{aligned}$$

The SD representations of $k_1, \ldots, k_5$ contain a total of 22 non-zero digits. Consequently, 22 partial products need to be added. Figure 8 shows the resulting partial product array, where each row represents a partial product. The array contains a large number of constant zero operands, depicted by "-" in Fig. 8. The zero operands will not affect the result and can be eliminated by compression of the partial product array. As many zero operands as possible are removed, while the weights of non-constant operands are preserved. Figure 9 shows the compressed partial product array for the given example.

Figure 8 Partial product array.

Figure 9 Compressed partial product array.

As seen in Fig. 9, the computation of $X' = 1{,}004 x_1 + 4{,}725 x_2 + 2{,}380 x_3 + 1{,}530 x_4 + 1{,}071 x_5$ is achieved by adding eight terms, each 16 bits wide. Because of the carry-free properties of SD adders, there is no need to employ a complicated adder tree structure (Wallace, Dadda etc.). A binary tree of SD adders is used. Since some of the compressed partial products contain constant zeros, simplified SD adder cells are used where possible. For the example case of $S = \{16, 17, 9, 7, 5\}$, the adder tree has three levels. Thus, the total delay of the multiplication-accumulation unit is approximately three times the delay of an SD full adder cell.

6.2 The SD Modulo Reduction Unit

Figure 10 Modulo reduction unit.

The modulo reduction unit computes $|X'|_M$, where $X'$ is the result from the MA step and $M$ here denotes the moduli product with $m_1$ excluded, that is $M = m_2 m_3 \cdots m_L$. Since no modulo reduction is performed in the MA stage, the word length of $X'$ is greater than the word length of $M$. Let $n$ be the digit-length of $X'$ and let $a = \lceil \log_2 M \rceil$.
Two SD vectors are created from $X$:

$$X = 2^{a-1} X_{high} + X_{low},$$
$$X_{high} = [X_{n-1} \ldots X_{a-1}]_{SD},$$
$$X_{low} = [X_{a-2} \ldots X_0]_{SD}.$$

Since $X_{low}$ has digit-length $a - 1$, it is guaranteed that $-M < X_{low} < M$. A ROM look-up table is used to generate $X_{LUT} = |2^{a-1} X_{high}|_M$. It is not practical to use redundant signed-digit numbers for ROM addressing. Instead, $X_{high}$ is decomposed into its binary components $X^+_{high}$ and $X^-_{high}$, which are unsigned binary numbers. Two ROM look-up tables are used to find $X_{LUT}$:

$$X^+_{LUT} = |2^{a-1} X^+_{high}|_M,$$
$$X^-_{LUT} = |2^{a-1} X^-_{high}|_M,$$
$$X_{LUT} = X^+_{LUT} - X^-_{LUT}.$$

Figure 11: Transposed form FIR filter.
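The split and look-up step for the modulo reduction unit can be modelled with plain integers, as in the following Python sketch. It uses $M = 17 \cdot 9 \cdot 7 \cdot 5$ from the example moduli set; the digit length `N = 18` and all function names are assumptions made for illustration only. The model checks that the look-up result plus $X_{low}$ is congruent to $X$ modulo $M$ and that both parts stay inside $(-M, M)$.

```python
import random

M = 17 * 9 * 7 * 5          # moduli product with m_1 = 16 excluded
A = (M - 1).bit_length()    # a = ceil(log2 M)
N = 18                      # assumed digit length of X (hypothetical value)

# One ROM image serves both look-ups: entry j holds |2^(a-1) * j|_M.
ROM = [(j << (A - 1)) % M for j in range(1 << (N - A + 1))]


def sd_value(digits):
    """Integer value of an SD vector, least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def reduce_step(digits):
    """Return (x_lut, x_low) for an N-digit SD vector with digits in {-1, 0, 1}."""
    low, high = digits[:A - 1], digits[A - 1:]
    # Binary components of X_high: positions holding +1 and positions holding -1.
    pos = sum(1 << i for i, d in enumerate(high) if d == 1)
    neg = sum(1 << i for i, d in enumerate(high) if d == -1)
    x_lut = ROM[pos] - ROM[neg]   # difference of two values in [0, M)
    return x_lut, sd_value(low)


def check(trials=1000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        digits = [rng.choice((-1, 0, 1)) for _ in range(N)]
        x_lut, x_low = reduce_step(digits)
        assert -M < x_lut < M and -M < x_low < M
        assert (x_lut + x_low - sd_value(digits)) % M == 0
    return True


print(A, len(ROM), check())
```

With these assumed parameters the ROM is addressed by $n - a + 1 = 6$ bits per port.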

The two look-up tables are identical, so a single two-port ROM, addressed by $n - a + 1$ bits, can be used for the look-up operations. $X^+_{LUT}$ and $X^-_{LUT}$ are unsigned binary numbers in the range $[0, M)$. The SD subtractor cell for unsigned binary operands consists of just two logic gates and has a gate depth of one. The result, $X_{LUT}$, is an $a$-digit SD number in the range $(-M, M)$.

The result of the modulo operation is the sum of $X_{LUT}$ and $X_{low}$. These two numbers are both in the range $(-M, M)$. Thus, their sum is in the range $(-2M, 2M)$. Four potential results are computed:

$$R_0 = X_{LUT} + X_{low},$$
$$R_1 = X_{LUT} + X_{low} - M,$$
$$R_2 = X_{LUT} + X_{low} + M,$$
$$R_3 = X_{LUT} + X_{low} + 2M.$$

The constant terms $-M$, $M$ and $2M$ are added to $X_{low}$ using simplified SD adders in parallel with the look-up operation. One of the potential results is in the desired range $[0, M)$. $R_0, \ldots, R_3$ are converted to binary form using four carry-look-ahead (CLA) adders operating in parallel. The correct result is selected by examining the carry-out bits of the CLA adders. The hardware architecture of the modulo reduction unit is depicted in Fig. 10.

7 RNS/SD FIR Filters

In order to evaluate the performance of RNS/SD processing using the presented moduli sets, RNS/SD finite impulse response (FIR) filters have been implemented as reference designs. The filter designs implement programmable $N$-tap FIR filters,

$$y(n) = \sum_{k=1}^{N} a_k \, x(n - k),$$

realized in transposed form as shown in Fig. 11. The filter coefficients $a_1, \ldots, a_N$ are calculated a priori. For an RNS with moduli set $S = \{m_1, m_2, \ldots, m_L\}$, the FIR filter is decomposed into $L$ subfilters operating in parallel, each subfilter using modulo $m_i$ arithmetic. Each filter tap consists of a modulo adder, a modulo multiplier and a register. Figure 12 shows an SD filter tap and Fig. 13 depicts an RNS/SD FIR filter, complete with forward and reverse RNS/SD converters. Implementation results are presented in Section 8.

Figure 12: Mod $m_i$ FIR filter tap.

Figure 13: RNS/SD FIR filter.

8 VLSI Implementation Results

The presented designs have been coded in structural-level VHDL and mapped to standard cells using Synopsys Design Compiler and a UMC 0.13 μm CMOS cell library with eight metal layers and a core voltage of 1.2 V. The VHDL designs were compiled for typical operating and wire load conditions and synthesised for four different equivalent (binary) word lengths (16, 24, 32 and 40 bits). For the parameterized moduli sets, the parameter $n$ was chosen such that the resulting moduli product was as small as possible, but at least equal to the desired dynamic range. General moduli sets of length five and six have also been evaluated. The general sets were chosen according to the criteria for effective moduli sets given in Section 3. The moduli sets used for VLSI implementation are presented in Table 2. Note that no six-moduli set has been selected for the 16-bit dynamic range: it is not possible to form a set of six coprime low-cost moduli with a moduli product that small.
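The choice of the parameter $n$ for the parameterized sets can be sketched in Python. Two assumptions are made here and neither is stated in this form by the paper. First, the family formulas in `FAMILIES` are inferred from the values listed in Table 2. Second, a moduli product is taken to cover a $b$-bit dynamic range when it is at least $b$ bits long; a strict test against $2^b$ would reject $n = 8$ for $S_1$ at 16 bits, since $255 \cdot 257 = 2^{16} - 1$. All names are illustrative.

```python
from itertools import combinations
from math import gcd, prod

# Family formulas inferred from the values in Table 2 (an assumption).
FAMILIES = {
    "S1": lambda n: [2**n - 1, 2**n + 1],
    "S2": lambda n: [2**n - 1, 2**n, 2**n + 1],
    "S3": lambda n: [2**n - 1, 2**n, 2**n + 1, 2**(2 * n) + 1],
    "S4/S5": lambda n: [2**n, 2**n - 1, 2**(n - 1) - 1, 2**(n - 1) + 1],
}


def pairwise_coprime(moduli):
    """True when every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def smallest_n(family, bits):
    """Smallest n >= 2 giving a valid moduli set whose product has >= bits bits."""
    n = 2
    while True:
        moduli = FAMILIES[family](n)
        if pairwise_coprime(moduli) and prod(moduli).bit_length() >= bits:
            return n, moduli
        n += 1


for family in FAMILIES:
    print(family, [smallest_n(family, bits)[0] for bits in (16, 24, 32, 40)])
```

Under these assumptions the search returns the same $n$ values as Table 2. For the $S_4$/$S_5$ form, even values of $n$ are skipped by the coprimality check, because $2^n - 1$ and $2^{n-1} + 1$ are then both divisible by 3.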

Table 2: Moduli sets used for VLSI implementation. For the parameterized sets, the value of the parameter n precedes the moduli.

| Moduli set | 16 bits | 24 bits | 32 bits | 40 bits |
|---|---|---|---|---|
| S1 | n = 8: {255, 257} | n = 12: {4095, 4097} | n = 16: {65535, 65537} | n = 20: {2^20 - 1, 2^20 + 1} |
| S2 | n = 6: {63, 64, 65} | n = 8: {255, 256, 257} | n = 11: {2047, 2048, 2049} | n = 14: {16383, 16384, 16385} |
| S3 | n = 4: {15, 16, 17, 257} | n = 5: {31, 32, 33, 1025} | n = 7: {127, 128, 129, 16385} | n = 8: {255, 256, 257, 65537} |
| S4/S5 | n = 5: {32, 31, 15, 17} | n = 7: {128, 127, 63, 65} | n = 9: {512, 511, 255, 257} | n = 11: {2048, 2047, 1023, 1025} |
| Five moduli | {16, 17, 9, 7, 5} | {64, 65, 31, 17, 7} | {128, 129, 127, 65, 17} | {512, 511, 257, 129, 65} |
| Six moduli | (none) | {32, 33, 31, 17, 7, 5} | {128, 127, 65, 31, 17, 7} | {256, 257, 129, 127, 31, 17} |

Table 3: Performance evaluation for RNS/SD encoders. Columns: moduli set; delay [ns] at 16, 24, 32 and 40 bits; area [mm²] at 16, 24, 32 and 40 bits. S ,24 0,24 0, S S 3 1,34 1,34 1,34 1, S 4 /S 5 1,35 1,35 1,35 1, Five moduli Six moduli

Table 4: Performance evaluation of RNS/SD decoders. Columns: moduli set; delay [ns] at 16, 24, 32 and 40 bits; area [mm²] at 16, 24, 32 and 40 bits. S 1 3, ,65 6, S S 3 4,50 5, , S 4 /S Five moduli Six moduli 8,

Table 5: Performance evaluation of 8-tap RNS/SD FIR filters. Columns: moduli set; delay [ns], area [mm²] and area-delay product, each at 16, 24, 32 and 40 bits. S 1 4,17 4, S 2 3,86 4,11 4,82 4, S 3 4,16 4,66 4,81 4, S 4 /S Five moduli ,89 4, Six moduli 3,65 4,

8.1 RNS/SD Encoders

Table 3 shows VLSI implementation results for RNS/SD encoders using the moduli sets from Table 2. For the parameterized moduli sets, the circuit delay is not affected by the value of the parameter $n$. The RNS/SD encoders for general moduli sets, on the other hand, consist of different adder-tree structures for different dynamic ranges. Thus, the circuit delay is not constant. The circuit area grows linearly with increased dynamic range for all encoders.

8.2 RNS/SD Decoders

Table 4 shows VLSI implementation results for the proposed RNS/SD decoders. As seen in Table 4, the decoder for moduli set $S_1$ has the smallest area and, for the 16-bit dynamic range, also the shortest circuit delay. For the larger dynamic ranges, the decoder for moduli set $S_2$ has the shortest delay. The decoders for sets $S_4$ and $S_5$ have considerably longer delay and larger area, due to the constant multipliers in the second stage of the converters.

8.3 RNS/SD FIR Filters

Implementation results for 8-tap FIR filters are presented in Table 5. When implementing the FIR filters, pipeline stages were added in the RNS/SD converters to maintain a clock cycle that is determined by the critical path of the filter taps. The RNS/SD forward conversions introduce an additional latency of one clock cycle. The reverse conversions introduce an additional latency of two clock cycles for filters using moduli sets $S_1, \ldots, S_3$ and three clock cycles for filters with moduli sets $S_4$ and $S_5$. The reverse conversions for general moduli sets introduce a latency of three clock cycles. For the case of 8-tap FIR filters, generic moduli sets with five or six moduli result in designs with the best area-delay products. For even longer filters, the impact of the forward and reverse converters on the total circuit area will decrease, which further favors the longer moduli sets.

9 Conclusions

This work has presented new forward and reverse converters for signed-digit residue number systems. Since the performance of RNS/SD arithmetic circuits depends on the choice of the moduli set (a set of pairwise coprime numbers), the purpose of this work has been to compare RNS/SD number systems based on different sets. Four moduli-set-specific conversion techniques are proposed. A conversion technique for general moduli sets consisting of any number of coprime moduli has also been presented. Finite impulse response (FIR) filters have been used in order to evaluate the performance of RNS/SD processing using the proposed moduli sets. All designs have been implemented in a commercially available 0.13 μm CMOS process. The designs have been compared with respect to delay, area and area-delay products. The implementation results show that the complexity of RNS/SD converters grows as the number of moduli is increased. However, if the designs are large enough, the increased complexity of the converters is outweighed by area savings in the RNS/SD processing units. For the case of FIR filters, it is shown that generic moduli sets with five or six moduli result in designs with the best area-delay products.

References

1. Lindström, A., Nordseth, M., Bengtsson, L., & Omondi, A. (2004). Arithmetic circuits combining residue and signed-digit representations. In Lecture notes in computer science (LNCS) (Vol. 2823). Springer.
2. Szabó, N. S., & Tanaka, R. I. (1967). Residue arithmetic and its applications to computer technology. McGraw-Hill (December).
3. Soderstrand, M., & Jenkins, W. (1986). Residue number system arithmetic: Modern applications in digital signal processing. IEEE Press.
4. Wang, W., Swamy, M., & Ahmad, M. (2003). RNS application in digital image processing. In Proceedings of the 3rd IEEE international workshop on system-on-chip for real-time applications (July).
5. Avizienis, A. (1961). Signed-digit number representation for fast parallel arithmetic.
IRE Transactions on Electronic Computers, EC-10.
6. Parhami, B. (1988). Carry-free addition of recoded binary signed-digit numbers. IEEE Transactions on Computers, 37(11) (November).
7. Wei, S., & Shimizu, K. (2000). A novel residue arithmetic hardware algorithm using a signed-digit number representation. IEICE Transactions on Information and Systems, E83-D(12) (December).
8. Wei, S., & Shimizu, K. (2001). Fast residue arithmetic multipliers based on a signed-digit number system. In Proceedings of the 8th IEEE international conference on electronics, circuits and systems (Vol. 1) (September).

9. Wei, S., & Shimizu, K. (2002). Residue signed-digit arithmetic circuit with a complement of modulus and the application to RSA encryption processor. In Proceedings of the 9th IEEE international conference on electronics, circuits and systems (Vol. 2) (September).
10. Lindahl, A., & Bengtsson, L. (2005). A low-power FIR filter using combined residue and radix-2 signed-digit representation. In Proceedings of the 8th EUROMICRO conference on digital system design (DSD 05). Porto, Portugal: IEEE Computer Society Press (August-September).
11. Abdallah, M., & Skavantzos, A. (1995). A systematic approach for selecting practical moduli sets for residue number systems. In Proceedings of the 27th IEEE southeastern symposium on system theory (March).
12. Vinnakota, B., & Rao, V. B. (1994). Fast conversion techniques for binary-residue number systems. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, CAS-41(12) (December).
13. Wang, W., Swamy, M., Ahmad, M., & Wang, Y. (1999). The applications of the new Chinese remainder theorems for three moduli sets. In Proceedings of the 1999 IEEE Canadian conference on electrical and computer engineering (Vol. 1) (May).
14. Wang, Y., Song, X., Aboulhamid, M., & Shen, H. (2002). Adder based residue to binary number converters for $(2^n - 1, 2^n, 2^n + 1)$. IEEE Transactions on Signal Processing, 50(7) (July).
15. Wang, Y. (2000). Residue-to-binary converters based on new Chinese remainder theorems. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 47(3) (March).
16. Cao, B., Chang, C., & Srikanthan, T. (2003). An efficient reverse converter for the 4-moduli set $\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$ based on the new Chinese remainder theorem. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 50(10) (October).
17. Skavantzos, A., & Stouraitis, T. (1999). Grouped-moduli residue number systems for fast signal processing. In Proceedings of the 1999 IEEE international symposium on circuits and systems (ISCAS 99) (Vol. 3) (May).
18. Wang, W., Swamy, M., & Ahmad, M. (2000). An area-time efficient residue-to-binary converter. In Proceedings of the 43rd IEEE midwest symposium on circuits and systems (August).
19. Shallit, J. (2005). A primer on balanced binary representations. Retrieved October 2005, from ca/ shallit/papers/bbr.pdf.

Andreas Persson obtained the M.Sc. degree from Chalmers University of Technology, Gothenburg, Sweden. He is now pursuing the Ph.D. degree at the Centre for Research on Embedded Systems (CERES) at Halmstad University, Sweden.

Lars Bengtsson obtained the M.Sc. and Ph.D. degrees from Chalmers University of Technology, Gothenburg, Sweden, in 1983 and 1997, respectively. After working in industry for some years as a hardware and software engineer, he was recruited for a position as senior lecturer and later promoted to associate professor at Halmstad University, Sweden. He subsequently moved to Chalmers, where he was appointed associate professor. His research interests lie in the areas of embedded and networked processors, active RFID, and digital VLSI circuits.


More information

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding Binary Multipliers The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 2 2 4 6 8 2 4 6 8 3 3 6 9 2 5 8 2 24 27 4 4 8 2 6

More information

Logic. Combinational. inputs. outputs. the result. system can

Logic. Combinational. inputs. outputs. the result. system can Digital Electronics Combinational Logic Functions Digital logic circuits can be classified as either combinational or sequential circuits. A combinational circuit is one where the output at any time depends

More information

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases

A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases A New Bit-Serial Architecture for Field Multiplication Using Polynomial Bases Arash Reyhani-Masoleh Department of Electrical and Computer Engineering The University of Western Ontario London, Ontario,

More information

Lecture 11. Advanced Dividers

Lecture 11. Advanced Dividers Lecture 11 Advanced Dividers Required Reading Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design Chapter 15 Variation in Dividers 15.3, Combinational and Array Dividers Chapter 16, Division

More information

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks

ECE 545 Digital System Design with VHDL Lecture 1. Digital Logic Refresher Part A Combinational Logic Building Blocks ECE 545 Digital System Design with VHDL Lecture Digital Logic Refresher Part A Combinational Logic Building Blocks Lecture Roadmap Combinational Logic Basic Logic Review Basic Gates De Morgan s Law Combinational

More information

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives

Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Miloš D. Ercegovac Computer Science Department Univ. of California at Los Angeles California Robert McIlhenny

More information

THE discrete sine transform (DST) and the discrete cosine

THE discrete sine transform (DST) and the discrete cosine IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BIREFS 1 New Systolic Algorithm and Array Architecture for Prime-Length Discrete Sine Transform Pramod K. Meher Senior Member, IEEE and M. N. S. Swamy

More information

Area-Time Optimal Adder with Relative Placement Generator

Area-Time Optimal Adder with Relative Placement Generator Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is

More information

Addition of QSD intermediat e carry and sum. Carry/Sum Generation. Fig:1 Block Diagram of QSD Addition

Addition of QSD intermediat e carry and sum. Carry/Sum Generation. Fig:1 Block Diagram of QSD Addition 1216 DESIGN AND ANALYSIS OF FAST ADDITION MECHANISM FOR INTEGERS USING QUATERNARY SIGNED DIGIT NUMBER SYSTEM G.MANASA 1, M.DAMODHAR RAO 2, K.MIRANJI 3 1 PG Student, ECE Department, Gudlavalleru Engineering

More information

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1> Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building

More information

GF(2 m ) arithmetic: summary

GF(2 m ) arithmetic: summary GF(2 m ) arithmetic: summary EE 387, Notes 18, Handout #32 Addition/subtraction: bitwise XOR (m gates/ops) Multiplication: bit serial (shift and add) bit parallel (combinational) subfield representation

More information

ECE380 Digital Logic. Positional representation

ECE380 Digital Logic. Positional representation ECE380 Digital Logic Number Representation and Arithmetic Circuits: Number Representation and Unsigned Addition Dr. D. J. Jackson Lecture 16-1 Positional representation First consider integers Begin with

More information

Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology

Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology Novel Bit Adder Using Arithmetic Logic Unit of QCA Technology Uppoju Shiva Jyothi M.Tech (ES & VLSI Design), Malla Reddy Engineering College For Women, Secunderabad. Abstract: Quantum cellular automata

More information

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers*

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers* Quantum Algorithm Processor For Finding Exact Divisors Professor J R Burger Summary Wiring diagrams are given for a quantum algorithm processor in CMOS to compute, in parallel, all divisors of an n-bit

More information

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder M.S.Navya Deepthi M.Tech (VLSI), Department of ECE, BVC College of Engineering, Rajahmundry. Abstract: Quantum cellular automata (QCA) is

More information

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m )

A Digit-Serial Systolic Multiplier for Finite Fields GF(2 m ) A Digit-Serial Systolic Multiplier for Finite Fields GF( m ) Chang Hoon Kim, Sang Duk Han, and Chun Pyo Hong Department of Computer and Information Engineering Taegu University 5 Naeri, Jinryang, Kyungsan,

More information

Logic BIST. Sungho Kang Yonsei University

Logic BIST. Sungho Kang Yonsei University Logic BIST Sungho Kang Yonsei University Outline Introduction Basics Issues Weighted Random Pattern Generation BIST Architectures Deterministic BIST Conclusion 2 Built In Self Test Test/ Normal Input Pattern

More information

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture

Numbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose

More information

Algorithms (II) Yu Yu. Shanghai Jiaotong University

Algorithms (II) Yu Yu. Shanghai Jiaotong University Algorithms (II) Yu Yu Shanghai Jiaotong University Chapter 1. Algorithms with Numbers Two seemingly similar problems Factoring: Given a number N, express it as a product of its prime factors. Primality:

More information

A Bit-Plane Decomposition Matrix-Based VLSI Integer Transform Architecture for HEVC

A Bit-Plane Decomposition Matrix-Based VLSI Integer Transform Architecture for HEVC IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 64, NO. 3, MARCH 2017 349 A Bit-Plane Decomposition Matrix-Based VLSI Integer Transform Architecture for HEVC Honggang Qi, Member, IEEE,

More information

What s the Deal? MULTIPLICATION. Time to multiply

What s the Deal? MULTIPLICATION. Time to multiply What s the Deal? MULTIPLICATION Time to multiply Multiplying two numbers requires a multiply Luckily, in binary that s just an AND gate! 0*0=0, 0*1=0, 1*0=0, 1*1=1 Generate a bunch of partial products

More information

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10, A NOVEL DOMINO LOGIC DESIGN FOR EMBEDDED APPLICATION Dr.K.Sujatha Associate Professor, Department of Computer science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu,

More information

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach

Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach Subquadratic space complexity multiplier for a class of binary fields using Toeplitz matrix approach M A Hasan 1 and C Negre 2 1 ECE Department and CACR, University of Waterloo, Ontario, Canada 2 Team

More information

Chapter 2 Basic Arithmetic Circuits

Chapter 2 Basic Arithmetic Circuits Chapter 2 Basic Arithmetic Circuits This chapter is devoted to the description of simple circuits for the implementation of some of the arithmetic operations presented in Chap. 1. Specifically, the design

More information

Class Website:

Class Website: ECE 20B, Winter 2003 Introduction to Electrical Engineering, II LECTURE NOTES #5 Instructor: Andrew B. Kahng (lecture) Email: abk@ece.ucsd.edu Telephone: 858-822-4884 office, 858-353-0550 cell Office:

More information

Information redundancy

Information redundancy Information redundancy Information redundancy add information to date to tolerate faults error detecting codes error correcting codes data applications communication memory p. 2 - Design of Fault Tolerant

More information

VHDL DESIGN AND IMPLEMENTATION OF C.P.U BY REVERSIBLE LOGIC GATES

VHDL DESIGN AND IMPLEMENTATION OF C.P.U BY REVERSIBLE LOGIC GATES VHDL DESIGN AND IMPLEMENTATION OF C.P.U BY REVERSIBLE LOGIC GATES 1.Devarasetty Vinod Kumar/ M.tech,2. Dr. Tata Jagannadha Swamy/Professor, Dept of Electronics and Commn. Engineering, Gokaraju Rangaraju

More information

CMP 334: Seventh Class

CMP 334: Seventh Class CMP 334: Seventh Class Performance HW 5 solution Averages and weighted averages (review) Amdahl's law Ripple-carry adder circuits Binary addition Half-adder circuits Full-adder circuits Subtraction, negative

More information

On LUT Cascade Realizations of FIR Filters

On LUT Cascade Realizations of FIR Filters On LUT Cascade Realizations of FIR Filters Tsutomu Sasao 1 Yukihiro Iguchi 2 Takahiro Suzuki 2 1 Kyushu Institute of Technology, Dept. of Comput. Science & Electronics, Iizuka 820-8502, Japan 2 Meiji University,

More information

Synthesis of Saturating Counters Using Traditional and Non-traditional Basic Counters

Synthesis of Saturating Counters Using Traditional and Non-traditional Basic Counters Synthesis of Saturating Counters Using Traditional and Non-traditional Basic Counters Zhaojun Wo and Israel Koren Department of Electrical and Computer Engineering University of Massachusetts, Amherst,

More information

Carry Look Ahead Adders

Carry Look Ahead Adders Carry Look Ahead Adders Lesson Objectives: The objectives of this lesson are to learn about: 1. Carry Look Ahead Adder circuit. 2. Binary Parallel Adder/Subtractor circuit. 3. BCD adder circuit. 4. Binary

More information

Design of A Efficient Hybrid Adder Using Qca

Design of A Efficient Hybrid Adder Using Qca International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 PP30-34 Design of A Efficient Hybrid Adder Using Qca 1, Ravi chander, 2, PMurali Krishna 1, PG Scholar,

More information

FAST FIR ALGORITHM BASED AREA-EFFICIENT PARALLEL FIR DIGITAL FILTER STRUCTURES

FAST FIR ALGORITHM BASED AREA-EFFICIENT PARALLEL FIR DIGITAL FILTER STRUCTURES FAST FIR ALGORITHM BASED AREA-EFFICIENT PARALLEL FIR DIGITAL FILTER STRUCTURES R.P.MEENAAKSHI SUNDHARI 1, Dr.R.ANITA 2 1 Department of ECE, Sasurie College of Engineering, Vijayamangalam, Tamilnadu, India.

More information

VHDL Implementation of Reed Solomon Improved Encoding Algorithm

VHDL Implementation of Reed Solomon Improved Encoding Algorithm VHDL Implementation of Reed Solomon Improved Encoding Algorithm P.Ravi Tej 1, Smt.K.Jhansi Rani 2 1 Project Associate, Department of ECE, UCEK, JNTUK, Kakinada A.P. 2 Assistant Professor, Department of

More information

OPTIMAL DESIGN AND SYNTHESIS OF FAULT TOLERANT PARALLEL ADDER/SUBTRACTOR USING REVERSIBLE LOGIC GATES. India. Andhra Pradesh India,

OPTIMAL DESIGN AND SYNTHESIS OF FAULT TOLERANT PARALLEL ADDER/SUBTRACTOR USING REVERSIBLE LOGIC GATES. India. Andhra Pradesh India, OPTIMAL DESIGN AND SYNTHESIS OF FAULT TOLERANT PARALLEL ADDER/SUBTRACTOR USING REVERSIBLE LOGIC GATES S.Sushmitha 1, H.Devanna 2, K.Sudhakar 3 1 MTECH VLSI-SD, Dept of ECE, ST. Johns College of Engineering

More information

Design and Study of Enhanced Parallel FIR Filter Using Various Adders for 16 Bit Length

Design and Study of Enhanced Parallel FIR Filter Using Various Adders for 16 Bit Length International Journal of Soft Computing and Engineering (IJSCE) Design and Study of Enhanced Parallel FIR Filter Using Various Adders for 16 Bit Length D.Ashok Kumar, P.Samundiswary Abstract Now a day

More information

Low Power and Low Complexity Shift-and-Add Based Computations

Low Power and Low Complexity Shift-and-Add Based Computations Linköping Studies in Science and Technology Dissertations, No. 2 Low Power and Low Complexity Shift-and-Add Based Computations Kenny Johansson Department of Electrical Engineering Linköping University,

More information

Numeration and Computer Arithmetic Some Examples

Numeration and Computer Arithmetic Some Examples Numeration and Computer Arithmetic 1/31 Numeration and Computer Arithmetic Some Examples JC Bajard LIRMM, CNRS UM2 161 rue Ada, 34392 Montpellier cedex 5, France April 27 Numeration and Computer Arithmetic

More information

Outline. Computer Arithmetic for Cryptography in the Arith Group. LIRMM Montpellier Laboratory of Computer Science, Robotics, and Microelectronics

Outline. Computer Arithmetic for Cryptography in the Arith Group. LIRMM Montpellier Laboratory of Computer Science, Robotics, and Microelectronics Outline Computer Arithmetic for Cryptography in the Arith Group Arnaud Tisserand LIRMM, CNRS Univ. Montpellier 2 Arith Group Crypto Puces Porquerolles, April 16 18, 2007 Introduction LIRMM Laboratory Arith

More information

CS 140 Lecture 14 Standard Combinational Modules

CS 140 Lecture 14 Standard Combinational Modules CS 14 Lecture 14 Standard Combinational Modules Professor CK Cheng CSE Dept. UC San Diego Some slides from Harris and Harris 1 Part III. Standard Modules A. Interconnect B. Operators. Adders Multiplier

More information

Binary addition (1-bit) P Q Y = P + Q Comments Carry = Carry = Carry = Carry = 1 P Q

Binary addition (1-bit) P Q Y = P + Q Comments Carry = Carry = Carry = Carry = 1 P Q Digital Arithmetic In Chapter 2, we have discussed number systems such as binary, hexadecimal, decimal, and octal. We have also discussed sign representation techniques, for example, sign-bit representation

More information

Low-complexity generation of scalable complete complementary sets of sequences

Low-complexity generation of scalable complete complementary sets of sequences University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2006 Low-complexity generation of scalable complete complementary sets

More information

EXPLOITING RESIDUE NUMBER SYSTEM FOR POWER-EFFICIENT DIGITAL SIGNAL PROCESSING IN EMBEDDED PROCESSORS

EXPLOITING RESIDUE NUMBER SYSTEM FOR POWER-EFFICIENT DIGITAL SIGNAL PROCESSING IN EMBEDDED PROCESSORS EXPLOITING RESIDUE NUMBER SYSTEM FOR POWER-EFFICIENT DIGITAL SIGNAL PROCESSING IN EMBEDDED PROCESSORS Rooju Chokshi 1, Krzysztof S. Berezowski 2,3, Aviral Shrivastava 2, Stanisław J. Piestrak 4 1 Microsoft

More information