Radix-4 Vectoring CORDIC Algorithm and Architectures. July 1998 Technical Report No: UMA-DAC-98/20


Radix-4 Vectoring CORDIC Algorithm and Architectures

J. Villalba    E. Antelo    J.D. Bruguera    E.L. Zapata

July 1998
Technical Report No: UMA-DAC-98/20

Published in: J. of VLSI Signal Processing Systems for Signal, Image, and Video Technology, vol. 19, no. 2, pp. , July 1998

University of Malaga
Department of Computer Architecture
C. Tecnologico, PO Box 4114, E Malaga, Spain

Radix-4 vectoring CORDIC algorithm and architectures

by J. Villalba (1), E. Antelo (2), J.D. Bruguera (2) and E.L. Zapata (1)

(1) Dept. of Computer Architecture, University of Malaga, Malaga, Spain. julio@ac.uma.es
(2) Dept. Electronica y Computacion, University of Santiago de Compostela, Spain. elisardo@usc.es

MANUSCRIPT FOR PARHI AND TAYLOR ASAP-96 SPECIAL ISSUE

Author responsible for correspondence:
Prof. Julio Villalba Moreno
Dpto. Arquitectura de Computadores, Universidad de Malaga, Complejo Tecnologico
P.O. BOX 4114, Malaga E (SPAIN)
Tfno , fax , julio@ac.uma.es

This work was supported in part by the Ministry of Education and Science (CICYT) of Spain under contract TIC C03-01.

Radix-4 vectoring CORDIC algorithm and architectures

Abstract

In this work we extend the radix-4 CORDIC algorithm to the vectoring mode (the radix-4 CORDIC algorithm was proposed recently by the authors for the rotation mode). The extension to the vectoring mode is not straightforward, since the digit selection function is more complex in the vectoring case than in the rotation case; as in the rotation mode, the scale factor is not constant. Although the radix-4 CORDIC algorithm in vectoring mode has a recurrence similar to that of the radix-4 division algorithm, there are specific issues concerning the vectoring algorithm that demand dedicated study. We present the digit selection for non-redundant and redundant arithmetic (following two different approaches: arithmetic comparisons and table look-up), the computation and compensation of the scale factor, and the implementation of the algorithm (with both types of digit selection) in a word-serial architecture. When compared with conventional radix-2 (redundant and non-redundant) architectures, the radix-4 algorithms present a significant speed-up for angle calculation. For the computation of the magnitude the speed-up is very slight, due to the non-constant scale factor in the radix-4 algorithm.

1 Introduction

The CORDIC algorithm (COordinate Rotation DIgital Computer) was introduced to compute trigonometric functions and generalized to compute linear and hyperbolic functions [1][2]. It is an iterative algorithm suitable for VLSI implementation because it employs only adders and shifters and it has a broad application field. Special attention has been paid by different researchers to the improvement of the algorithm in the last few years, as referenced in [3].

By means of the CORDIC algorithm, a vector (x, y) is rotated by an angle (rotation mode) or it is taken to the coordinate axis (vectoring mode). The algorithm is based on rotations over given elementary angles. The basic iteration or microrotation is:

    x_{i+1} = x_i + \sigma_i 2^{-i} y_i
    y_{i+1} = y_i - \sigma_i 2^{-i} x_i                                  (1)
    z_{i+1} = z_i + \sigma_i \tan^{-1}(2^{-i})

where $(x_0, y_0)$ are the initial coordinates of the vector, the z coordinate accumulates the angle, and $\sigma_i \in \{-1, +1\}$ specifies the direction of each microrotation. Each iteration introduces a scaling over both coordinates given by the expression $k_i = (1 + \sigma_i^2 2^{-2i})^{1/2}$, and thus, after n iterations the final x and y coordinates are scaled by the factor:

    K = \prod_{i=0}^{n-1} (1 + \sigma_i^2 2^{-2i})^{1/2}                 (2)
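As a point of reference, the following is a minimal Python sketch of the radix-2 vectoring iteration built directly from microrotation (1); the driving rule $\sigma_i = \mathrm{sign}(y_i)$, the function name and the test values are illustrative choices of this sketch, not taken from the paper.

    import math

    def radix2_cordic_vectoring(x0, y0, n):
        """Drive y to zero with n radix-2 microrotations (1), choosing
        sigma_i = sign(y_i).  Returns the scaled magnitude in x, the
        accumulated angle in z, and the scale factor K of equation (2)."""
        x, y, z = x0, y0, 0.0
        K = 1.0
        for i in range(n):
            sigma = 1 if y >= 0 else -1                 # sigma_i in {-1, +1}
            x, y = x + sigma * 2.0**(-i) * y, y - sigma * 2.0**(-i) * x
            z += sigma * math.atan(2.0**(-i))
            K *= math.sqrt(1.0 + 2.0**(-2 * i))         # k_i with sigma_i^2 = 1
        return x, z, K

    x, z, K = radix2_cordic_vectoring(0.6, 0.45, 24)
    print(z, math.atan2(0.45, 0.6))       # both close to 0.6435 rad
    print(x / K, math.hypot(0.6, 0.45))   # both close to 0.75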

In radix-2 CORDIC, the scale factor is constant since $|\sigma_i| = 1$ (for a detailed review of CORDIC see [3]). To preserve the magnitude of the vector, this factor must be compensated.

In recent years many computationally intensive applications have been proposed, including matrix computations and computer graphics algorithms that need to compute angles [4][5][6]. In [4][5][7] the angle computation and rotation operation is performed by means of CORDIC modules. It has been shown that the processors for angle calculation and rotation based on CORDIC modules present a significant improvement in performance as compared to the more conventional approach using standard modules like division, multiplication or square root [4].

In the classic radix-2 CORDIC algorithm, roughly only one bit of the result is computed in each of the iterations. In [8], we design a rotator based on a radix-4 CORDIC algorithm (rotation mode only). In that paper we prove that if radix-4 instead of radix-2 is used, the total number of microrotations of the CORDIC algorithm is halved because two bits of the result are computed in each of the iterations. However, the radix-4 CORDIC algorithm presents two drawbacks:

1. The selection of $\sigma_i$ is more complex than in the radix-2 case.

2. The scale factor is not constant, since $\sigma_i$ takes values different from 1 or -1.

As we show in [8], for the rotation mode the selection of $\sigma_i$ only depends on $z_i$ and the resulting digit selection table is simple. The scale factor is computed by combining a look-up table and linear approximations of the scale factor of each microrotation. The resulting architectures prove to be efficient when compared to the conventional radix-2 architectures (both with redundant and non-redundant arithmetic).

In this paper we extend the radix-4 CORDIC algorithm to the vectoring mode (both for redundant and non-redundant representation of the operands). This extension is very interesting for applications that are based on the angle computation and rotation operation. However, the extension is not straightforward due to the selection of $\sigma_i$. In the vectoring case the selection of $\sigma_i$ depends on both the x and y coordinates, and the resulting selection is more complex than in the rotation mode. On the other hand, the scale factor may be computed and compensated in a similar way as in the rotation mode.

The organization of the paper is as follows. First, in Section 2 we present the radix-4 CORDIC vectoring algorithm and demonstrate its convergence.

In Section 3 we deal with the selection of $\sigma_i$. Section 4 is dedicated to the scale factor computation and compensation. In Section 5 we illustrate the implementation of the algorithm in a word-serial architecture. Finally, Section 6 is dedicated to the evaluation and comparison, and Section 7 to the conclusions.

2 Radix-4 CORDIC Algorithm

In this section we develop a radix-4 CORDIC algorithm in the vectoring mode. We prove the convergence and give the precision and number of iterations.

First we perform an extension of the iterative equations of the radix-2 CORDIC algorithm to radix-4 [8]. Basically, we use elementary angles of the form $\tan^{-1}(\sigma_i 4^{-i})$ instead of $\tan^{-1}(2^{-i})$, in such a way that equations (1) become:

    x_{i+1} = x_i + \sigma_i 4^{-i} y_i
    y_{i+1} = y_i - \sigma_i 4^{-i} x_i                                  (3)
    z_{i+1} = z_i + \alpha_i(\sigma_i)

where $\alpha_i(\sigma_i) = \tan^{-1}(\sigma_i 4^{-i})$ and $\sigma_i$ takes values in the digit set $\{-a, \ldots, 0, \ldots, +a\}$, with $2 \le a \le 3$. The number of iterations to achieve n bits of precision is n/2. The rotated coordinates are scaled by the factor

    K = \prod_i (1 + \sigma_i^2 4^{-2i})^{1/2}                           (4)

The scale factor K is not constant, as it depends on the $\sigma_i$ values.

In the vectoring mode, coordinate y is taken to 0 as the iterations progress. Therefore, in each iteration, the $\sigma_i$ value must be selected so that the y coordinate approaches zero as the iterations progress. At the end of the iterations we only have to compensate the x coordinate, as the y coordinate is zero within the precision. Coordinate z does not require any correction.

2.1 Convergence of the radix-4 CORDIC algorithm in the vectoring mode

In order to prove the convergence of the radix-4 CORDIC algorithm we have to prove that variable y approaches zero as the index of the microrotations increases. In order to obtain a set of iterations where the radix-4 CORDIC vectoring may be efficiently performed, we consider the scaled value of $y_i$. That is, we define a new variable $w_i$:

    w_i = 4^i y_i                                                        (5)

This equation introduces a scaling on $y_i$ of the same order as the decrease produced in $y_i$ in each iteration. In this way, we manage to maintain the value of $w_i$ bounded. This simplifies the selection criteria and eliminates possible

imprecisions in the calculation. A similar solution is used in [5][9]. With this change, equations (3) look like:

    x_{i+1} = x_i + \sigma_i 4^{-2i} w_i
    w_{i+1} = 4 (w_i - \sigma_i x_i)                                     (6)
    z_{i+1} = z_i + \alpha_i(\sigma_i)

These equations reduce the number of barrel shifters to only one (see Section 5). Based on these new equations, we may obtain a selection criterion for $\sigma_i$ in each microrotation that guarantees the convergence of the algorithm.

The digit set for $\sigma_i$ is $\{-a, \ldots, 0, \ldots, +a\}$. This radix-4 digit set can be minimally redundant if $a = 2$ or maximally redundant if $a = 3$. As in the case of the radix-4 SRT division [10], it is convenient to define a redundancy factor $\rho = a/3$.

The main drawback of the radix-4 algorithm is the selection of the $\sigma_i$ values. In order to bound the w variable, in each iteration $\sigma_i$ must be a function of the value of both $x_i$ and $w_i$ (see iteration w in equation (6)). In this way, the selection process seems to be more complex than in the case of the radix-4 algorithm in the rotation mode [8], where the $\sigma_i$ values only depend on the z variable in each iteration.

We now propose the selection intervals for $\sigma_i$, that is, if $w_i \in [L_q(x_i), U_q(x_i)]$ then $\sigma_i = q$, with $q \in \{-a, \ldots, 0, \ldots, +a\}$. The selection interval $[L_q(x_i), U_q(x_i)]$ must assure that $w_{i+1}$ is bounded when selecting $\sigma_i = q$. The selection intervals we propose are defined by:

    L_q(x_i) = (q - \rho) x_i                                            (7)

and

    U_q(x_i) = (q + \rho) x_i                                            (8)

These intervals are similar to the selection intervals used in the radix-4 SRT division [10]. This is due to the fact that iteration w is similar to the iteration used in division. The iteration for division has the form $w_{i+1} = 4(w_i - q_i d)$, where $q_i$ is the quotient digit and d is the divisor. However, there are two important differences between the division algorithm and the CORDIC algorithm: first, in division the divisor is constant for all of the iterations, whereas in the CORDIC algorithm the x coordinate takes different values in each one of the iterations. Secondly, if redundant arithmetic is considered, in the division algorithm the divisor is represented in a non-redundant form, but in the CORDIC algorithm the x coordinate is represented in redundant form, leading to a more complex selection function. Although both algorithms are very similar, these two differences impose different constraints on the two algorithms. Consequently, a particular study must be made for the radix-4 CORDIC algorithm.

The intervals given in expressions (7) and (8) have to assure a) the continuity condition [10] and b) the bounding of $w_i$. The continuity condition implies that all of the selection intervals must cover the whole range of $w_i$. The continuity condition is assured if $L_q(x_i) \le U_{q-1}(x_i)$ for all i and q. Based on the definition of $L_q(x_i)$ and $U_{q-1}(x_i)$ (see equations (7) and (8)), the continuity

condition is assured if $\rho \ge 1/2$. As we have defined $\rho = a/3$ and the minimum value of a we can select is $a = 2$, the minimum value for $\rho$ is 2/3, and then the continuity condition is satisfied.

Next, we demonstrate that $w_i$ is bounded in each of the iterations with the selection intervals given in expressions (7) and (8). Without any loss of generality, in what follows we assume that $x_0 \ge 0$ and that $x_0$ and $y_0$ are fractional values, one of them normalized within the interval [0.5, 1). The bound for $w_i$ is:

    |w_i| \le 4 \rho x_i                                                 (9)

Proof: we prove this by induction.

Base case (i = 1). We consider two sets of values that $w_0$ may take in relation to $x_0$: either $|w_0|$ takes values that are lower than or equal to $4 \rho x_0$, or it takes larger values.

a) Let us consider the set of values $|w_0| \le 4 \rho x_0$. Since $|w_0| \le 4 \rho x_0$, it is clear that there exists $q \in \{-a, \ldots, 0, \ldots, +a\}$ that satisfies

    q x_0 - \rho x_0 \le w_0 \le q x_0 + \rho x_0                        (10)

Subtracting $q x_0$, multiplying by 4 and observing (6) we have:

    -4 \rho x_0 \le w_1 \le 4 \rho x_0                                   (11)

Thus, as $x_i$ is an increasing sequence [5], we may write:

    |w_1| \le 4 \rho x_1                                                 (12)

b) We now consider the set of values $|w_0| > 4 \rho x_0$. Let us assume that for this set of values we select $q = 2$; we are going to see that with this selection expression (9) is satisfied and there is still a bound for $w_1$. The worst case occurs when the ratio between $w_0$ and $x_0$ is infinite, that is, when $x_0 = 0$. If $q = 2$ and $x_0 = 0$, in the first iteration (i = 0) we have (see equations (6)):

    x_1 = 2 w_0
    w_1 = 4 w_0

We can thus write that:

    w_1 = 4 w_0 = 2 \cdot 2 w_0 = 2 x_1 \le 4 \rho x_1                   (13)

since $\rho = a/3$ with $a = 2$ or $a = 3$.

Induction hypothesis (i = m-1). We assume as true that $|w_{m-1}| \le 4 \rho x_{m-1}$.

Induction step (i = m): Because of the induction hypothesis, it is true that there is a q satisfying:

    q x_{m-1} - \rho x_{m-1} \le w_{m-1} \le q x_{m-1} + \rho x_{m-1}    (14)

Subtracting $q x_{m-1}$, multiplying by 4 and taking (6) into account, we may write:

    -4 \rho x_{m-1} \le w_m \le 4 \rho x_{m-1}                           (15)

Therefore, as $x_i$ is an increasing sequence [5], we may write:

    |w_m| \le 4 \rho x_m                                                 (16)

Q.E.D.

Now, we prove that $x_i$ is bounded. If we substitute expression (9) in the first equation of (6) we obtain:

    x_{i+1} \le x_i + 4 \rho \sigma_i 4^{-2i} x_i = x_i (1 + 4 \rho \sigma_i 4^{-2i})      (17)

Expressing this inequality as a function of $x_0$ we have:

    x_{i+1} \le x_0 \prod_{k=0}^{i} (1 + 4 \rho \sigma_k 4^{-2k})        (18)

The maximum value of this expression is reached when $\sigma_j = a$ for all $j \le i$ and $\rho = 1$. Therefore, we obtain the expression:

    x_{i+1} \le x_0 \prod_{k=0}^{i} (1 + 12 \cdot 4^{-2k})               (19)

This product is convergent since the corresponding infinite product is also convergent. The infinite product is convergent since it fulfills two conditions: the general term $(1 + 12 \cdot 4^{-2k})$ tends to 1 as k goes to infinity, and the series $u_1 + u_2 + \ldots + u_p + \ldots$ with $u_p = 12 \cdot 4^{-2p}$ is convergent too. Therefore, taking into account expression (9) and the bound of $x_i$, we conclude that $w_i$ is bounded.

In Figure 1 we show the selection intervals for the cases $\rho = 1$ and $\rho = 2/3$. In this Figure we only show the intervals for positive values of $w_i$; the intervals are symmetrical for the negative values of $w_i$. As we can see in this Figure, there is an overlap between the selection intervals. Observe that the overlap is greater in the case $\rho = 1$ than in the case $\rho = 2/3$. This overlap implies that we do not need to make exact comparisons to determine the selection interval. We can determine the suitable interval based on estimations of $w_i$ and $x_i$. This is very useful, both for non-redundant and redundant representations. For example, for redundant carry-save arithmetic we only have to assimilate (convert from redundant to non-redundant representation) a reduced number of most significant bits of $w_i$ and $x_i$ to determine the $\sigma_i$ value. In the next section we obtain the selection function based on estimations.

Precision and number of iterations obtained in radix-4

After p iterations, the angle between the vector $(x_p, y_p)$ and the x axis is (see expressions (5) and (9)):

    \tan^{-1}(y_p / x_p) = \tan^{-1}(4^{-p} w_p / x_p) \le \tan^{-1}(2 \rho \, 2^{-2p+1})      (20)

If $\rho = 2/3$ this value is slightly greater than the value obtained by means of 2p standard radix-2 CORDIC iterations ($\tan^{-1}(2^{-2p+1})$) and slightly less than the value obtained with 2p + 1 iterations ($\tan^{-1}(2^{-2p+2})$). If $\rho = 1$ this value coincides with 2p + 1 standard radix-2 iterations. Therefore, the radix-4 CORDIC algorithm in vectoring mode basically halves the number of microrotations with respect to the standard radix-2 CORDIC algorithm.
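To make the recurrence concrete, the following Python sketch runs (6) for n/2 microrotations. The digit selection is idealized here: $\sigma_i$ is $w_i/x_i$ rounded to the nearest digit and clamped to the digit set, which falls inside the selection intervals (7)-(8); the hardware selection functions of Section 3 use only truncated estimates. The function name and test values are illustrative, and $x_0 > 0$ is assumed.

    import math

    def radix4_vectoring(x0, y0, n_bits, a=2):
        """Radix-4 vectoring with the recurrence (6): w_i = 4^i * y_i is
        driven to zero in n_bits/2 microrotations.  sigma_i is chosen with
        exact arithmetic (w_i/x_i rounded to the nearest digit, clamped to
        the digit set {-a,...,+a}); x0 > 0 is assumed."""
        x, w, z = x0, y0, 0.0          # w_0 = 4^0 * y_0
        K = 1.0                        # accumulated (non-constant) scale factor
        for i in range(n_bits // 2):
            sigma = max(-a, min(a, round(w / x)))
            x, w = x + sigma * 4.0**(-2 * i) * w, 4.0 * (w - sigma * x)
            z += math.atan(sigma * 4.0**(-i))               # alpha_i(sigma_i)
            K *= math.sqrt(1.0 + sigma * sigma * 4.0**(-2 * i))
        return x, z, K

    x, z, K = radix4_vectoring(0.75, 0.4, 32)
    print(z - math.atan2(0.4, 0.75))          # angle error, tiny (cf. bound (20))
    print(x / K - math.hypot(0.75, 0.4))      # magnitude error after compensation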

3 Selection function

In this section we obtain the selection functions for the radix-4 CORDIC algorithm in the vectoring mode. We obtain a selection function that is valid for its hardware implementation in redundant as well as non-redundant arithmetic. To do this, we use two different techniques that produce two different selection functions. The first technique is based on arithmetic comparison and the second one is based on a look-up table. There is no clear difference in terms of area and time between the two techniques, and only a real implementation would tell us which one of them is better. For this reason, we explain both techniques.

3.1 Selection function by arithmetic comparisons

This method is based on comparing coordinate w to a couple of comparison points so that the $\sigma_i$ value is obtained. To reduce the number of comparisons we choose $a = 2$, so that $q \in \{0, \pm 1, \pm 2\}$. On the other hand, the only restrictions on the input data are $|x_0| < 1$, $|w_0| < 1$, and one of them must be normalized. Without any loss of generality we also assume that $x_0 \ge 0$. For clarity in the presentation, in what follows we assume non-redundant arithmetic. The extension to redundant arithmetic is considered at the end of this subsection.

Obtaining fixed comparison points for all the iterations

Let us assume the selection intervals given by $[L_q(x_i), U_q(x_i)]$ (see expressions (7)-(8)). We define $P_i(1)$ as the comparison point used for discriminating between the values $\sigma_i = 0$ and $\sigma_i = 1$, and we define $P_i(2)$ as the comparison point used for discriminating between the values $\sigma_i = 1$ and $\sigma_i = 2$ (we define $P_i(-1)$ and $P_i(-2)$ in a similar way). The comparison points that we have defined must belong to the overlap intervals and be easy to calculate and implement. Two suitable selections for the comparison points are:

    P_i(1) = (1/2) x_i        P_i(2) = (3/2) x_i                         (21)

because they are simple and belong to the overlap intervals (see Figure 1b). However, it is necessary to recalculate the comparison points in each iteration, as they depend on the numerical value of $x_i$.

The alternative we now present makes it only necessary to calculate the comparison points $P_i(1)$ and $P_i(2)$ in a few initial iterations; they remain fixed for the rest of the iterations. We are going to calculate from which iteration the comparison points obtained are valid for the remaining iterations.

As $x_i$ is a sequence of growing terms, the sequences of terms $L_q(x_i)$ and $U_q(x_i)$ are also growing (see expressions (7), (8) and Figure 1b). According to this, for the comparison points of the i-th stage ($P_i(1)$ and $P_i(2)$) to still be valid as comparison points in the remaining iterations, they must belong to the overlap intervals of these iterations. In Figure 2 we see how there is a common overlap area between all the iterations for q = 0 and q = 1. We are going to seek an iteration i such that the comparison points belong to the common overlap area, that is, we seek $P_i(1)$ and $P_i(2)$ such that (see Figure 2):

    L_1(x_\infty) \le P_i(1) \le U_0(x_i)                                (22)

    L_2(x_\infty) \le P_i(2) \le U_1(x_i)                                (23)

(The arguments would be the same for $P_i(-1)$ and $P_i(-2)$.)

Let us analyze equation (22); taking into account expressions (7), (8) and that the value of $P_i(1)$ is $(1/2) x_i$, we may write:

    (1/3) x_\infty \le (1/2) x_i \le (2/3) x_i                           (24)

A top bound for $x_i$ is obtained by making $\sigma_j = 2$ for every $j \ge i$ in the equation on x in (6). If we consider $\sigma_i = 2$ and substitute expression (9) with $\rho = 2/3$ in the equation on x of (6), we obtain:

    x_{i+1} \le x_i + (16/3) 4^{-2i} x_i = x_i (1 + (16/3) 4^{-2i})      (25)

Now, we obtain the value $x_{i+k}$ as a function of $x_i$:

    x_{i+k} \le x_i \prod_{j=i}^{i+k-1} (1 + (16/3) 4^{-2j})             (26)

As shown in Section 2.1, this series is convergent. The first inequality of (24) can be written as

    x_\infty / x_i \le 3/2                                               (27)

For practical implementations it is enough to take $x_{i+k}$ with a large k value instead of $x_\infty$. Therefore, condition (27) can be substituted by

    x_{i+k} / x_i \le 3/2                                                (28)

with k large enough. From expression (26) we can write

    x_{i+k} / x_i \le \prod_{j=i}^{i+k-1} (1 + (16/3) 4^{-2j})           (29)

This series converges very quickly. We have verified that for i = 0 and k = 500 a bound for this product is 8.7, which does not fulfill condition (28), whereas for i = 1 a good bound is 1.4, which does fulfill condition (28).
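As a quick numeric check (under the reconstruction of the factor $(1 + (16/3) 4^{-2j})$ used above), the product in (29) can be evaluated directly in Python:

    def product_bound(i, k=500):
        """Evaluate the right-hand side of (29) for k terms starting at j = i,
        using the factor (1 + (16/3) * 4**(-2j))."""
        p = 1.0
        for j in range(i, i + k):
            p *= 1.0 + (16.0 / 3.0) * 4.0**(-2 * j)
        return p

    print(product_bound(0))   # about 8.6: exceeds 3/2, condition (28) fails for i = 0
    print(product_bound(1))   # about 1.36: below 3/2, condition (28) holds for i = 1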

Therefore, we conclude that the value $P_1(1)$ can be used as a comparison point for the rest of the iterations, as this value is always within the overlap that is produced in the following iterations, that is:

    P_i(1) = P_1(1)   for all i \ge 1                                    (30)

Proceeding in the same way from expression (23), we find that the value $P_2(2)$ may be used as a comparison point for the rest of the iterations, that is:

    P_i(2) = P_2(2)   for all i \ge 2                                    (31)

Therefore, we only have to calculate the comparison points in the first three iterations. From this iteration on, the values calculated as comparison points ($P_2(2)$ and $P_1(1)$) are valid for the next iterations. Figure 3 shows the evolution of the common overlap bound and the location of $P_1(1)$ and $P_2(2)$. As a consequence, the selection function is:

    \sigma_i = +2   if w_i > P_i(2)
    \sigma_i = +1   if P_i(1) < w_i \le P_i(2)
    \sigma_i =  0   if P_i(-1) < w_i \le P_i(1)                          (32)
    \sigma_i = -1   if P_i(-2) < w_i \le P_i(-1)
    \sigma_i = -2   if w_i \le P_i(-2)

where

    P_i(1) = (1/2) x_0  if i = 0,    (1/2) x_1  if i \ge 1
    P_i(2) = (3/2) x_i  if i \le 2,  (3/2) x_2  if i > 2                 (33)

Size of the comparators

As the comparison points depend on the values of the x coordinate, the comparison with these points must be carried out by means of an add/subtract operation. This operation may be carried out in parallel with the shift associated with equations (6) in each iteration. However, we need n-bit additions/subtractions, which increase the hardware and slow down the comparison process (which would be significantly longer than the delay through the shifters). In what follows we prove that it is enough to perform the comparison with a few most significant bits of $P_i(1)$ and $P_i(2)$ for the coefficients to still be correct. This implies that fast adders/subtractors may be used for performing the comparison, saving hardware and obtaining comparison times of the same order as the delay of the shifters.

Reducing the number of bits to be compared

We now calculate the number of most significant bits of $w_i$ and $P_i(q)$ needed to perform the comparison without making errors. We must prevent

the comparison from producing different results for the truncated and non-truncated values of $w_i$ and $P_i(q)$. Let us call $\hat{w}_i$ and $\hat{P}_i(q)$ the truncated values of $w_i$ and $P_i(q)$ using f fractional bits. Let b and c be any values taken by $w_i$ and let $\hat{b}$ and $\hat{c}$ be the corresponding truncated values. Let us assume that the truncated values, with f fractional bits, of the higher limit of the overlap interval $U_q(x_i)$ and of the comparison point $P_i(q+1)$ correspond to the same value (see Figure 4). The values of $\hat{b}$ and $\hat{c}$ are the same as that of $\hat{P}_i(q+1)$ and for them we select $\sigma_i = q$; however, point c is higher than $U_q(x_i)$ and does not permit any selection except $\sigma_i = q + 1$, and thus the decision made for point c with the truncated values is not correct. In order to prevent this situation, it is necessary for the truncated values $\hat{P}_i(q+1)$ and $\hat{U}_q(x_i)$ to be different. Thus, the distance between $P_i(q+1)$ and $U_q(x_i)$ must be higher than the precision we observe for the truncated values of $w_i$ ($2^{-f}$ = distance between two consecutive points of $\hat{w}_i$, see Figure 4). In this way, we always have $\hat{P}_i(q+1) \ne \hat{U}_q(x_i)$. We can express this mathematically as:

    |U_q(x_i) - P_i(q+1)| > 2^{-f}                                       (34)

From Figure 1 we see that the overlap interval is $(1/3) x_i$, and as $x_i$ is a growing function, the amplitude of the intervals also grows (see Figure 3). For the same reason, the extremes of the intervals $L_q(x_i)$ and $U_q(x_i)$ are shifted in the same direction (see Figure 3). Consequently, the smallest of the distances between $P_i(q+1)$ and $U_q(x_i)$ arises in the initial iterations, and depends on the smallest value that $x_i$ may have in these iterations. Due to the normalization employed, the minimum and maximum magnitudes of the initial vector $(x_0, w_0)$ are 0.5 and $\sqrt{2}$. The extremes of the intervals are $U_0(x_i) = (2/3) x_i$ and $U_1(x_i) = (5/3) x_i$ (see Figure 1).

We are going to calculate the number of fractional bits needed. Let us assume that $i > 0$; due to the normalization of the input data, the minimum value of $x_i$ is 0.5. Thus, taking into account that the distance between $P_i(q+1)$ and $U_q(x_i)$ is greater than or equal to $(1/6) x_i$, it must happen that $(1/6) x_i > 2^{-f}$. From this expression we can deduce that $f > 3.5$, that is, we need at least 4 fractional bits. Reasoning in a similar fashion, we find that for $i = 0$ we need at least 5 fractional bits. Summarizing, we need to assimilate 5 fractional bits for $x_0$ and $w_0$ (a total of 7 bits taking into account the two bits of the integer part, including sign) and 4 fractional bits for $x_i$ and $w_i$ ($i \ge 1$) (a total of 8 bits taking into account the four bits of the integer part, including sign). We may now rewrite the selection function (32) as follows:

    \sigma_i = +2   if \hat{w}_i > \hat{P}_i(2)
    \sigma_i = +1   if \hat{P}_i(1) < \hat{w}_i \le \hat{P}_i(2)
    \sigma_i =  0   if \hat{P}_i(-1) < \hat{w}_i \le \hat{P}_i(1)        (35)
    \sigma_i = -1   if \hat{P}_i(-2) < \hat{w}_i \le \hat{P}_i(-1)
    \sigma_i = -2   if \hat{w}_i \le \hat{P}_i(-2)

where $\hat{P}_i(1)$ and $\hat{P}_i(2)$ are the truncated values of $P_i(1)$ and $P_i(2)$ (see expressions (33)).
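A behavioural Python sketch of the truncated selection (35) is given below; the truncation helper and the use of $-P_i(1)$ and $-P_i(2)$ as the negative comparison points are assumptions of this sketch, not details taken from the paper.

    import math

    def truncate(v, f):
        """Truncate v toward minus infinity to f fractional bits."""
        return math.floor(v * 2**f) / 2**f

    def select_sigma(w, p1, p2, f=4):
        """Selection function (35): compare the truncated w_i against the
        truncated comparison points P_i(1), P_i(2) and their negatives."""
        w_hat = truncate(w, f)
        if w_hat > truncate(p2, f):
            return 2
        if w_hat > truncate(p1, f):
            return 1
        if w_hat > truncate(-p1, f):
            return 0
        if w_hat > truncate(-p2, f):
            return -1
        return -2

    # Per (33), after the first iterations the points stay fixed, e.g.
    # sigma = select_sigma(w_i, 0.5 * x1, 1.5 * x2) for every i >= 2.
    print(select_sigma(0.9, 0.5 * 0.7, 1.5 * 0.8))   # returns 1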

Extension to redundant arithmetic

Without any loss of generality, we assume that we use carry-save redundant arithmetic. We truncate $w_i$ with f fractional bits. Since we are not taking into account bits with weight less than $2^{-f}$ for $w_i$, the maximum error is $2^{-f}$ in the sum word and $2^{-f}$ in the carry word, so the total error is $2^{-f+1}$. As this error is positive, the truncated value $\hat{w}_i$ and the real value $w_i$ satisfy:

    \hat{w}_i \le w_i < \hat{w}_i + 2^{-f+1}                             (36)

Now, for the truncated values $\hat{P}_i(q+1)$ and $\hat{U}_q(x_i)$ to be different, the distance between $P_i(q+1)$ and $U_q(x_i)$ must be larger than the precision we observe for the truncated values of $w_i$ in redundant arithmetic: $2^{-f+1}$. Condition (34) is now transformed into:

    |U_q(x_i) - P_i(q+1)| > 2^{-f+1}                                     (37)

Using the same arguments we find that in redundant arithmetic it is necessary to observe one more fractional bit. However, if the input data $(x_0, w_0)$ are in conventional arithmetic, this condition does not have to be applied for i = 0, and thus the number of fractional bits is 5 for all the iterations. Consequently, the selection function we propose in redundant carry-save arithmetic coincides with (35), truncating $P_i(q)$ and $w_i$ with 5 fractional bits.

3.2 Selection function by look-up table

We prefer to develop this method in redundant arithmetic. The non-redundant arithmetic version can be easily obtained from the redundant one and it will be studied later in this section. For the selection process we have to take into account that both $w_i$ and $x_i$ are represented in redundant carry-save form, but due to the overlap between the selection intervals, we can take an estimation of these values to obtain $\sigma_i$. Assume that we assimilate $w_i$ up to the t-th fractional bit, and $x_i$ up to the $\delta$-th fractional bit. We call the assimilated values $\hat{w}_i$ and $\hat{x}_i$ respectively. Therefore we can write that:

    \hat{w}_i \le w_i < \hat{w}_i + 2^{-t+1}    and    \hat{x}_i \le x_i < \hat{x}_i + 2^{-\delta+1}        (38)

Now we have to obtain relations between t and $\delta$ that assure the convergence of the algorithm, that is, the conditions that assure a correct selection of $\sigma_i$. We follow Figure 5 to obtain the suitable values of t and $\delta$. To make a correct selection of $\sigma_i$ from an estimation of $x_i$ ($\hat{x}_i$), the overlap $\Delta_q[\hat{x}_i]$ we have to consider between the intervals q and q - 1 is:

    \Delta_q[\hat{x}_i] = U_{q-1}[\hat{x}_i] - L_q[\hat{x}_i + 2^{-\delta+1}]        (39)

The value of $\Delta_q[\hat{x}_i]$ is the worst-case overlap, dependent only on the value of the estimation of $x_i$. In this way the selection only depends on the assimilated value of $x_i$ and not on the true value. On the other hand, to make a correct selection using an estimation of $w_i$, it is necessary that the overlap between the intervals, $\Delta_q[\hat{x}_i]$, be greater than $2^{-t}$ (this is the same case as the radix-4 SRT division [10]). Therefore, the selection with estimations will be correct if the following condition is satisfied:

    \Delta_q[\hat{x}_i] \ge 2^{-t}                                       (40)

From condition (40) and taking into account equations (7), (8) and (39), we obtain:

    (\rho + q - 1) \hat{x}_i - (q - \rho)(\hat{x}_i + 2^{-\delta+1}) \ge 2^{-t}        (41)

The worst-case condition is obtained for the greatest allowable value of q, that is $q = a = 3\rho$. Then we obtain a new expression:

    (2\rho - 1) \hat{x}_i - 2\rho \, 2^{-\delta+1} \ge 2^{-t}            (42)

The values of $\delta$ and t are constrained by the values of $\rho$ and $\hat{x}_i$. To obtain $\delta$ and t independent of the value of $x_i$, we must consider the worst case in expression (42), that is, we have to take $\hat{x}_i$ as the minimum possible value of $x_i$. We assume that $x_0$ or $y_0$ is normalized in the range [0.5, 1), and then $x_1 \ge 0.5$ [5]. The minimum value of $x_i$ is 0.5 since $x_{i+1} \ge x_i \ge x_1$ [5]. Then we take $\hat{x}_i = 0.5$ in expression (42). The parameter $\rho$ can take the values 2/3 or 1. From condition (42) we obtain suitable values of $\delta$ and t for both cases, $\rho = 2/3$ and $\rho = 1$:

a) $\rho = 2/3$: For this case $a = 2$ and $\sigma_i \in \{-2, -1, 0, +1, +2\}$. As in the CORDIC iteration (see equation (6)), $\sigma_i$ multiplies the value of $x_i$ and $w_i$. This digit set is very interesting since all digits are powers of two, and the multiplication by $\sigma_i$ can be done only by shifting. We obtain that suitable values are $\delta = 5$ and $t = 5$ (actually $t = 4$ could be used, but the resulting digit selection would be very complex).

b) $\rho = 1$: In this case $a = 3$ and then $\sigma_i \in \{-3, -2, -1, 0, +1, +2, +3\}$. The introduction of the value 3 in the digit set implies that an additional adder must be incorporated to perform the multiplication by 3. For this case we obtain $\delta = 4$ and $t = 3$ (as in the previous case, $t = 2$ could be used but this would result in a complex selection function).

As the selection of $\sigma_i$ is in the critical path of our architecture, we make design decisions in order to achieve a reduced critical path time at the cost of more silicon area, so we take $\rho = 1$, which leads to a less complex selection than the case $\rho = 2/3$, since the values of $\delta$ and t are lower in case b (note that for a digit selection table the number of input bits is critical).

We have determined the number of fractional bits of $x_i$ and $w_i$ that must be assimilated ($\delta$ and t). Now we have to determine the number of integer bits to assimilate of both operands, to obtain the total number of bits to be assimilated.
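The estimates in (38) come from assimilating only the most significant bits of the carry-save operands. A small Python sketch of that assimilation, reproducing the error model of (38), follows; representing the sum and carry words as two floats is only an illustration.

    import math

    def assimilate(sum_word, carry_word, frac_bits):
        """Assimilate (add and truncate) the top bits of a carry-save operand.

        Both words are truncated to frac_bits fractional bits before the
        short addition, so the estimate e satisfies
        e <= v < e + 2**(-frac_bits + 1), the error model of equation (38)."""
        scale = 2**frac_bits
        return (math.floor(sum_word * scale) + math.floor(carry_word * scale)) / scale

    # w = 3.3 held in carry-save form as 1.9 + 1.4; estimate with t = 3.
    w_hat = assimilate(1.9, 1.4, 3)
    print(w_hat, 3.3 - w_hat)   # w_hat <= 3.3 and the error is below 2**-2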

The maximum value of $x_i$ ($x_{max}$) is given by [5]: $x_{max} = K_{max} (x_0^2 + w_0^2)^{1/2}$, where $K_{max}$ is the maximum value of the scale factor. The maximum value of the scale factor depends on the value of a. If the scale factor is too large, the range of x is also large and then more integer bits must be assimilated. In order to reduce the value of $K_{max}$ we make the microrotation i = 0 a radix-2 microrotation with $\sigma_i \in \{-1, +1\}$. We can do that since we assume that the maximum angle to be computed is within the interval $[-\pi/2, +\pi/2]$. It can be easily demonstrated that, making the microrotation i = 0 radix-2, this range is covered. In this way, according to equations (2) and (4), and taking the maximum value of $\sigma_i$ in every iteration (that is, $\sigma_0 = 1$ and $\sigma_i = 3$ for all $i \ge 1$), we obtain a scale factor of $K_{max} = 1.80068$, and taking into account the expression for $x_{max}$, we obtain $x_{max} = 2.55$. As the range for the angle is $[-\pi/2, +\pi/2]$, the value of $x_i$ is always positive. Then we have to assimilate six bits of $x_i$ in each iteration: two integer bits (without sign) and four fractional bits.

The maximum value of $w_i$ ($w_{max}$) is easy to obtain, taking into account that $|w_i| \le 4\rho x_i$. As we have selected $\rho = 1$, $w_{max} = 4\rho x_{max} = 10.2$. Then we have to assimilate eight bits of $w_i$: five integer bits and three fractional bits.

Size of the look-up table

The selection of $\sigma_i$ is done by implementing a selection function (usually a look-up table) whose inputs are the assimilated bits of $w_i$ and $x_i$. In this way the look-up table would have a total of 14 input bits. This seems very large and the look-up operation would be too slow.

To reduce the complexity of the look-up table, we make use of the scaling technique. This technique has been widely used for division [10], and consists of the scaling of the dividend and the divisor such that the scaled divisor is within a certain range. The scaling does not affect the result since the quotient only depends on the ratio between the dividend and the divisor, and scaling does not affect this ratio. This idea can be applied to our CORDIC algorithm, that is, the scaling of $w_i$ and $x_i$ does not affect the angle to be computed. To make the implementation simpler we perform the scaling over the assimilated values, and not over the full-length words of $w_i$ and $x_i$. We propose scaling the value of $\hat{x}_i$ to have the scaled value in the interval [0.5, 1). As the range of $\hat{x}_i$ is [0.5, 2.55), the scaling operation involves only right shifts, and then the scaling does not affect the result, since the assimilation error is also reduced. The scaling operation is very simple: if $\hat{x}_i \in [0.5, 1)$ then no scaling is performed, if $\hat{x}_i \in [1, 2)$ then a right shift is performed, and if $\hat{x}_i \in [2, 2.55)$ two right shifts are carried out. The scaling also affects $\hat{w}_i$, and the scaled value is within the range (-4.5, 4.25). We only have to consider 3 bits of $\hat{x}_i$ (since $\hat{x}_i \ge 0.5$, the bit with weight 0.5 is always one) and 7 bits of $\hat{w}_i$. Following a procedure similar to the case of the radix-4 SRT division [10], we have obtained the selection function for $\sigma_i$. In Table 1 we show the selection function to be implemented by means of a look-up table.

    \hat{x}_i (scaled)   | \sigma_i = -3 | \sigma_i = -2 | \sigma_i = -1 | \sigma_i = 0 | \sigma_i = 1 | \sigma_i = 2 | \sigma_i = 3
    ---------------------|---------------|---------------|---------------|--------------|--------------|--------------|-------------
    [0.5000, 0.5625)     | [-11, -8]     | [-7, -6]      | [-5, -3]      | [-2, 0]      | [1, 2]       | [3, 4]       | [5, 10]
    [0.5625, 0.6250)     | [-12, -8]     | [-7, -6]      | [-5, -3]      | [-2, 0]      | [1, 2]       | [3, 5]       | [6, 11]
    [0.6250, 0.6875)     | [-13, -8]     | [-7, -6]      | [-5, -3]      | [-2, 0]      | [1, 2]       | [3, 5]       | [6, 12]
    [0.6875, 0.7500)     | [-14, -8]     | [-7, -6]      | [-5, -3]      | [-2, 0]      | [1, 4]       | [5, 6]       | [7, 13]
    [0.7500, 0.8125)     | [-15, -8]     | [-7, -6]      | [-5, -3]      | [-2, 0]      | [1, 4]       | [5, 6]       | [7, 14]
    [0.8125, 0.8750)     | [-16, -8]     | [-7, -6]      | [-5, -3]      | [-2, 0]      | [1, 4]       | [5, 8]       | [9, 15]
    [0.8750, 0.9375)     | [-17, -13]    | [-12, -6]     | [-5, -3]      | [-2, 0]      | [1, 4]       | [5, 8]       | [9, 16]
    [0.9375, 1.0000)     | [-18, -13]    | [-12, -6]     | [-5, -3]      | [-2, 0]      | [1, 4]       | [5, 8]       | [9, 17]

    Table 1: Selection table for \sigma_i (each entry is the interval of $4\hat{w}_i$, with $\hat{w}_i$ scaled, that selects the corresponding digit).

As we can see in this table, the least significant bit of the scaled value of $\hat{w}_i$ does not affect the selection, and therefore the look-up table will have 9 input bits (3 corresponding to $\hat{x}_i$ and 6 corresponding to $\hat{w}_i$) and 3 output bits.

The look-up table method in non-redundant arithmetic

The non-redundant arithmetic case can be seen as a simplification of the redundant arithmetic case. Now, the maximum assimilation errors in coordinates w and x are $2^{-t}$ and $2^{-\delta}$ respectively, and expression (38) becomes:

    \hat{w}_i \le w_i < \hat{w}_i + 2^{-t}    and    \hat{x}_i \le x_i < \hat{x}_i + 2^{-\delta}        (43)

In this case the overlap is $\Delta_q[\hat{x}_i] \ge 0$ and condition (42) becomes:

    2^{-\delta} \le (2\rho - 1) \hat{x}_i / 2                            (44)

In this case the best option is to choose $\rho = 2/3$, since we avoid the adders needed to work with the value $\sigma_i = 3$, and the size of the table obtained is 9 input bits (6 bits for w and 3 bits for x), which is similar to the redundant case with $\rho = 1$.

4 Scale Factor

If we are interested in the magnitude of the vector, it is necessary to compensate the scale factor. We use the same technique that appears in [8] to solve the non-constant scale factor problem. Basically, after the first n/8 + 1 microrotations we access a table in order to obtain the value of the scale factor, and in the next iterations ($n/8 + 1 \le i \le n/4$) we calculate the scale factor by a shift-and-add operation in each iteration, since for these iterations the scale factor generated in each microrotation, $k_i = (1 + \sigma_i^2 4^{-2i})^{1/2}$, may be approximated by the first two terms of its Taylor series expansion ($k_i \approx 1 + (1/2) \sigma_i^2 4^{-2i}$). Finally, we perform the division by K ($K = \prod k_i$) using the radix-4 CORDIC algorithm in the vectoring mode and linear coordinates (this is a conventional radix-4 division).
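A behavioural Python sketch of this scale-factor evaluation is given below. The contribution of the first n/8 + 1 microrotations, which in hardware comes from a look-up table addressed by their digits, is simply computed exactly here to stand in for that table; the function name and the digit sequence are illustrative (with $\sigma_0$ restricted to $\{-1, +1\}$ as in Section 3.2). The compensation itself is then performed with the linear-coordinate recurrence (45) given next.

    import math

    def scale_factor(sigmas, n):
        """Approximate K = prod k_i as in Section 4: microrotations 0 .. n/8
        (a table look-up in hardware) are computed exactly here, microrotations
        n/8+1 .. n/4 use the two-term Taylor approximation
        k_i ~ 1 + (1/2) sigma_i^2 4^(-2i), and later factors are below the
        working precision."""
        table_part = math.prod(math.sqrt(1.0 + s * s * 4.0**(-2 * i))
                               for i, s in enumerate(sigmas[: n // 8 + 1]))
        taylor_part = math.prod(1.0 + 0.5 * s * s * 4.0**(-2 * i)
                                for i, s in enumerate(sigmas[: n // 4 + 1])
                                if i > n // 8)
        return table_part * taylor_part

    sigmas = [1, 3, -2, 1, 0, 2, -1, 3, 1, 0, -2, 1, 2, 0, 1, 1]  # example digits
    K_exact = math.prod(math.sqrt(1.0 + s * s * 4.0**(-2 * i))
                        for i, s in enumerate(sigmas))
    print(scale_factor(sigmas, 32), K_exact)   # agree to better than 2**-32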

The radix-4 CORDIC equations in linear coordinates are the following:

    x_{i+1} = x_i
    w_{i+1} = 4 (w_i - \sigma_i x_i)                                     (45)
    z_{i+1} = z_i + \sigma_i 4^{-i}

Performing the same analysis as in Section 2, we can test the convergence of the algorithm in linear coordinates. Also, the same selection functions given in Section 3 can be used. After n/2 iterations, we obtain the following value in coordinate z:

    z_{n/2} = z_0 + w_0 / x_0                                            (46)

Therefore, performing a suitable selection of the coordinates, we can carry out the division. In the next section we analyze the hardware requirements for calculating and compensating the scale factor.

5 Architectures

In this section we obtain different architectures that implement the radix-4 CORDIC algorithm in the vectoring mode. There are different architectures that may implement the algorithm: redundant or non-redundant arithmetic; selection with arithmetic comparisons or selection by table; word-serial or pipelined. We illustrate the implementation of the radix-4 vectoring algorithm with two of these architectures. First, we consider a word-serial architecture using selection by the arithmetic comparison method in conventional arithmetic. Then we present a word-serial architecture using selection by look-up table in redundant arithmetic. Finally, some comments are given on the implementation in a pipelined architecture.

5.1 Architecture for the arithmetic comparison method

In this subsection we develop a word-serial architecture in non-redundant arithmetic with selection by arithmetic comparisons. First, we design the hardware to implement the selection function (35), in which the zero-skipping technique is incorporated. Then, we design the data path, explaining the different operations that are carried out over this architecture.

Implementation of the selection function with the zero-skipping technique

If a microrotation obtains $\sigma_i = 0$, then $x_{i+1} = x_i$ and $w_{i+1} = 4 w_i$ (see equations (6)), and the value of $\sigma_{i+1}$ can be obtained directly from $w_i$ (note that $4 w_i$ is no more than a shift of $w_i$). Thus, if in a microrotation we obtain, in parallel, coefficients $\sigma_i$ and $\sigma_{i+1}$ and the first is zero, it is not necessary to carry out microrotation i because the rotation angle is zero, and we can directly proceed to microrotation i + 1. In this way, we skip iteration i, reducing the total number of microrotations.
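The idea can be sketched in Python as follows; the nearest-digit selection used here is only a stand-in for the comparison-based selection (35), and the function names are illustrative.

    def select(w, x, a=2):
        """Illustrative digit selection: nearest digit to w/x, clamped to
        [-a, a] (stands in for the comparison-based selection (35))."""
        return max(-a, min(a, round(w / x)))

    def select_with_zero_skip(w, x):
        """Obtain sigma_i and, in parallel, the digit of the next iteration
        under the assumption sigma_i = 0 (then x_{i+1} = x_i and
        w_{i+1} = 4 w_i, i.e. w_i shifted left by two bits).  The architecture
        uses the second value only when the first one is zero, skipping the
        null microrotation."""
        return select(w, x), select(4.0 * w, x)

    print(select_with_zero_skip(0.05, 0.8))   # (0, 0): two consecutive skips
    print(select_with_zero_skip(0.2, 0.8))    # (0, 1): skip, then sigma = 1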

This technique is called the zero-skipping technique and it was initially developed for the division algorithm [11]. In [8] we used this technique for the rotation mode. In that paper we conclude that a reduction of about 20% in the total number of iterations can be achieved.

In Figure 6 we present the hardware implementation of the selection functions incorporating the zero-skipping technique. Registers P1 and P2 keep the values of the comparison points $\hat{P}_i(1)$ and $\hat{P}_i(2)$. In order to be able to apply the zero-skipping technique, we must carry out a double comparison with the comparison points. On the one hand, we use two 7-bit comparators (basically the hardware necessary for generating the carry $c_7$ of a 7-bit CLA) for comparing the 8 MSBs of $w_i$ to the comparison points in order to obtain $\sigma_i$; on the other hand, we need a twin architecture (indicated with dotted lines in Figure 6) that carries out the comparison with the 8 bits that follow the two most significant bits of $w_i$ for obtaining $\sigma_{i+1}$ (if $\sigma_i = 0$, $w_{i+1} = 4 w_i$ and the two MSBs of $w_i$ are zero). The Control Logic block found in Figure 6 generates the value of $\sigma_i$ from the analysis of the signal $c_7$ of each comparator and the sign of $w_i$. Also, the skip is activated when $\sigma_i = 0$. For low values of n this technique may not be efficient enough. In this case, we eliminate the hardware indicated with dotted lines in Figure 6.

Design of a data path

In this section we analyze in detail a possible word-serial architecture in non-redundant arithmetic. Figure 7 shows the architecture of paths x, w and z. The realization of the vectoring mode is carried out by programming the data paths as a function of the iteration we are in. The basic processes are the calculation of the comparison points, the realization of the radix-4 CORDIC iterations in circular coordinates (equations (6)), the calculation of the scale factor, and finally its compensation (radix-4 CORDIC in linear coordinates). Some of these processes are carried out in parallel, and we now describe how the system works as a function of the iteration. Module A in Figure 7 corresponds to the hardware that implements the selection function with the zero-skipping technique, shown in detail in Figure 6. Table 2 may help in the description we now present; it reflects the operation mode of the paths x, w and z, the function carried out by each path, and the operation performed in Module A for each of the iterations.

1. Iterations i = 0 to i = n/8

The main processes carried out in these iterations are the calculation of the comparison points, the processing of the corresponding radix-4 CORDIC iterations and the generation of the address in the scale factor table corresponding to the angle to be rotated.

    Iteration(s)   | x path mode          | w, z path mode  | x function                                                | w function                          | z function                                   | Module A
    ---------------|----------------------|-----------------|-----------------------------------------------------------|-------------------------------------|----------------------------------------------|--------------------------------
    0 (setup)      | Eval. P_i(2)         | --              | P_0(2) = (3/2) x_0                                        | --                                  | --                                           | Load P1, P2
    0              | CORDIC circular      | CORDIC circular | x_1 = x_0 + \sigma_0 w_0                                  | w_1 = 4(w_0 - \sigma_0 x_0)         | z_1 = z_0 + \alpha_0(\sigma_0)               | Comput. \sigma_0
    1 (setup)      | Eval. P_i(2)         | --              | P_1(2) = (3/2) x_1                                        | --                                  | --                                           | Load P1, P2
    1              | CORDIC circular      | CORDIC circular | x_2 = x_1 + \sigma_1 4^{-2} w_1                           | w_2 = 4(w_1 - \sigma_1 x_1)         | z_2 = z_1 + \alpha_1(\sigma_1)               | Comput. \sigma_1
    2 (setup)      | Eval. P_i(2)         | --              | P_2(2) = (3/2) x_2                                        | --                                  | --                                           | Load P1, P2
    2 to n/8       | CORDIC circular      | CORDIC circular | x_{i+1} = x_i + \sigma_i 4^{-2i} w_i                      | w_{i+1} = 4(w_i - \sigma_i x_i)     | z_{i+1} = z_i + \alpha_i(\sigma_i)           | Comput. \sigma_i
    n/8+1 to n/4   | CORDIC circular      | CORDIC circular | x_{i+1} = x_i + \sigma_i 4^{-2i} w_i                      | w_{i+1} = 4(w_i - \sigma_i x_i)     | z_{i+1} = z_i + \alpha_i(\sigma_i)           | Comput. and store \sigma_i
    n/4+1 to n/2   | Compute scale factor | CORDIC circular | k_{i+1} = k_i (1 + \sigma_j^2 2^{-4j-1}), j = i - n/8     | w_{i+1} = 4(w_i - \sigma_i x_{n/4}) | z_{i+1} = z_i + \alpha_i(\sigma_i)           | Comput. \sigma_i
    n/2 to n**     | CORDIC linear        | CORDIC linear   | x_{r+1} = x_r                                             | w_{r+1} = 4(w_r - \sigma_r K)       | z_{r+1} = z_r + \sigma_r 4^{-r}, r = i - n/2 | Comput. \sigma_r

    ** Only for scale factor compensation purposes.

    Table 2: Operation mode and function of the paths x, w, z and Module A.

The evaluation of the comparison points $\hat{P}_0(1)$, $\hat{P}_0(2)$, $\hat{P}_1(1)$, $\hat{P}_1(2)$ and $\hat{P}_2(2)$ is performed by means of special-purpose iterations using data path x. In the first, third and fifth iterations we program this path for the evaluation of the comparison point $P_i(2)$ (Eval. $P_i(2)$ mode in Table 2). In the second, fourth and from the sixth iteration on, the hardware is programmed for obtaining the radix-4 CORDIC equations in circular coordinates (CORDIC circular mode in Table 2), that is, equations (6) are evaluated. MUX-1 allows obtaining the value $(3/2) x_i$ (leftmost input) and, together with MUX-2, 3 and 4, permits selecting the appropriate input for supporting $\sigma_i = \pm 1, \pm 2$ or $\sigma_{i+1} = \pm 1, \pm 2$ if a zero skip takes place. As we can see in Table 2, it is not necessary to calculate $P_i$ after i = 2, since the comparison point obtained in iteration i = 2 is valid as comparison point for the remaining iterations. The $\sigma_i$ values obtained in these first iterations are also used for addressing the table that stores the different scale factors (see Section 4).

2. Iterations i = n/8+1 to i = n/4

During these iterations, paths x, w and z evaluate equations (6) (radix-4 CORDIC in circular coordinates). Module A calculates $\sigma_i$ ($\sigma_{i+1}$ if a zero skipping occurs) and puts the values $|\sigma_i|$ in a shift register, necessary for the calculation of the scale factor in later iterations.

3. Iterations i = n/4+1 to i = n/2

The main processing carried out during this period is the calculation of the scale factor (over data path x) and the ending of the angle computation (over data path z). Taking into account that the maximum value obtained experimentally for $w_i$ is 5.3, we have that $\sigma_i 4^{-2i} w_i < 2^{-n+1}$ if $i \ge n/4 + 1$. Therefore we obtain, from (6), that $x_{i+1} = x_i$ to the precision considered, and it is not necessary to use path x from this iteration on.

Now, path x is free, and it is used to evaluate the scale factor (Compute scale factor mode in Table 2). To preserve the value $x_{n/4}$ obtained in the previous iterations, we use the auxiliary register RX' in Figure 7. The scale factor produced by the first n/8 + 1 rotations is obtained from the scale factor table, and it is initially loaded onto register RX (see Figure 7). From then on, the approximation $k_i \approx 1 + (1/2) \sigma_i^2 4^{-2i}$ [8] is carried out over path x (see Section 4). At the end of these iterations we have obtained the value of the scale factor K, which is in register RX, and the value of the final angle, which is in register RZ. At that moment we have obtained one of the two results generated by the CORDIC algorithm: the value of the angle (argument of the initial vector). In applications that do not require the evaluation of the magnitude of the vector (for example, angle calculation and rotation [5]), the scale factor table and the process for the calculation of the scale factor carried out in data path x are not necessary.

4. Iterations i = n/2+1 to i = n

These iterations have the aim of compensating the scale factor in those applications that require it. In order to do this, we program paths x, w and z for performing the radix-4 CORDIC algorithm in the vectoring mode in linear coordinates (CORDIC linear mode in Table 2; see also the radix-4 CORDIC equations in linear coordinates (45)). Registers RW and RX were loaded with $x_{n/4}$ and K respectively in iteration $i = n/2$. The value $4^{-i}$ of the equation in z of (45) is directly obtained from the angle table as follows: from iteration $i \ge n/6$ we have that $\tan^{-1}(\sigma_i 4^{-i}) \approx \sigma_i 4^{-i}$, and thus the values $4^{-i}$ with $i \ge n/6$ are already present in the table; we only have to add to the primitive angle table the values $4^{-i}$ with $i < n/6$ (for example, for n = 32 we would have to add six values: $4^0, 4^{-1}, \ldots, 4^{-5}$). Consequently, after iteration i = n we obtain in RZ the value of the magnitude of the initial vector (see equation (46)).

5.2 Architecture for the look-up table method

Figure 9 shows the architecture that implements the radix-4 algorithm using this method with redundant carry-save arithmetic. The considerations for calculating and compensating the scale factor are similar to those of the previous subsection, and they have been omitted to make the understanding of the architecture easier. Since for the selection with table we have considered that $\sigma_i$ may take the value 3, two 4-to-2 adders are needed to form the products $3 x_i$ and $3 w_i$. These values will be needed if $|\sigma_i| = 3$. Two word multiplexers permit computing $|\sigma_i| x_i$ and $|\sigma_i| w_i$. Finally, two 4-to-2 adder/subtracters permit obtaining $x_{i+1}$ and $w_{i+1}$.
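A behavioural sketch of one iteration of this datapath is given below; ordinary floating-point values stand in for the carry-save words, and the function name is an illustrative choice.

    def lookup_datapath_step(x, w, sigma, i):
        """One iteration of the Figure 9 style datapath (behavioural sketch).

        The hardware precomputes 3*x and 3*w with two extra 4-to-2 adders;
        a word multiplexer then picks |sigma|*x and |sigma|*w (|sigma| in
        {0,1,2,3}) and two 4-to-2 adder/subtractors produce x_{i+1} and
        w_{i+1} according to (6)."""
        mag = abs(sigma)
        mult_x = (0.0, x, 2.0 * x, 3.0 * x)[mag]   # 2*x is a wired shift; 3*x needs the extra adder
        mult_w = (0.0, w, 2.0 * w, 3.0 * w)[mag]
        sign = 1.0 if sigma >= 0 else -1.0
        x_next = x + sign * 4.0**(-2 * i) * mult_w   # add/subtract the selected multiple of w
        w_next = 4.0 * (w - sign * mult_x)           # add/subtract the selected multiple of x
        return x_next, w_next

    print(lookup_datapath_step(0.8, 2.1, 3, 1))   # one radix-4 microrotation with sigma = 3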

Figure 8 shows the block diagram of the digit selection network, which is in charge of the selection of $\sigma_i$. Six bits of $x_i$ and eight bits of $w_i$ are assimilated. From the two most significant bits of $\hat{x}_i$ we obtain the suitable shift to perform the scaling. By means of multiplexers we perform the scaling of both $\hat{x}_i$ and $\hat{w}_i$. At the output of the multiplexers we obtain the inputs to the look-up table. The table has 9 input bits and three output bits. A look-up table is needed to store the microrotation angles, a multiplexer to select the suitable value of the microrotation angle depending on the value of $\sigma_i$, and finally a 3-to-2 adder/subtractor performs the iteration to obtain $z_{i+1}$. We use a 3-to-2 adder/subtractor since the microrotation angle has a non-redundant representation.

5.3 Pipelined architectures

For a pipeline, the crucial points are the hardware cost, the latency and the throughput (related to the cycle time). For radix-4 CORDIC vectoring, the selection of $\sigma_i$ must be implemented in each of the iterations. As we have seen in previous sections, the selection (using arithmetic comparison or table) is more complex (in time and in hardware) than in the case of the radix-2 algorithm. The problems with the pipelined implementation are two-fold:

1. Since in each microrotation the shift to be performed is known in advance, hardwired shifts are used. Thus, in this architecture there is no overlap between the digit selection and the shift operation, and the complex digit selection is fully in the critical path.

2. The replication of the hardware associated with each microrotation. This implies replicating the hardware for digit selection. Furthermore, in the case of the selection with table, two additional adders are needed in each microrotation to perform the multiplication by 3.

Due to these factors, it seems that a fully pipelined radix-4 CORDIC vectoring implementation needs additional research to be efficient.

6 Evaluation and comparison

In this section we compare the word-serial architectures proposed in the previous sections with each other and with the architectures proposed in [5] and [9] in the case of using redundant arithmetic, and with a conventional radix-2 architecture (with n iterations) when non-redundant arithmetic is used.

To carry out the evaluation we express the delay and area of each hardware element in terms of the delay and area of one full adder ($t_{fa}$ and $a_{fa}$). We have used the reference technology and library (ES2-ECPD10 Standard Cells Library, 1 um double metal CMOS [12]) for the hardware elements which do not

have recognized delays in terms of the delay of one full adder. The delays and areas assumed for the different hardware elements are shown in Table 3. For the 4-to-2 carry-save adder/subtracter we have assumed the implementation given in [13], which possesses the same delay as a 4-to-2 carry-save adder. We have taken into account the delay introduced by the buffers, which are necessary for control signals that are heavily loaded.

    Element                                  | Delay (t_fa)        | Area (a_fa)
    -----------------------------------------|---------------------|------------
    Buffer (t_buf)                           | --                  | --
    2-to-1 mux (t_2-1mux)                    | --                  | n
    3-to-1 mux (t_3-1mux)                    | --                  | n
    4-to-1 mux (t_4-1mux)                    | 0.5                 | n
    6-to-1 mux (t_6-1mux)                    | --                  | n
    Register (t_reg)                         | 1.5                 | n
    3-to-2 csa (t_3-2csa)                    | 1                   | n
    4-to-2 csa (t_4-2csa)                    | --                  | n
    7-CLA* (t_7-CLA)                         | --                  | --
    Ripple adder (t_ripple)                  | 0.83 n              | n
    Constant-width carry skip** (t_cwcs)     | 13 (15)             | 45 (87)
    CLA (t_cla-n)                            | ceil(log2(n))       | 2 n
    Barrel shifter, r levels (t_bs)          | 0.5 ceil(log2(r))   | n log2(r)

    * Only the logic to obtain c_7.
    ** Values for n = 32 and n = 64 (the latter in parentheses).

    Table 3: Delays and areas assumed for the hardware elements.

We would like to emphasize that a true comparison between different implementations is possible only if an actual implementation is considered and logic-level simulations are carried out. Therefore, we present a rough, first-order approximation comparison based on Table 3. Nevertheless, we claim that it can express the general trend between different designs.

In [5] a radix-2 CORDIC architecture is proposed with on-line redundant arithmetic, but a word-serial architecture is also considered. In that work the $\sigma_i$ values can be in $\{-1, 0, +1\}$, resulting in a non-constant scale factor. This architecture is especially interesting when we are only interested in calculating the angle. In [9] a radix-2 CORDIC architecture in redundant arithmetic is proposed with a constant scale factor. In this case, the most significant bits of w are used to estimate the $\sigma_i$ values (truncated to t fractional bits). Due to the estimation error it is necessary to repeat some microrotations to assure convergence. To be exact, it is necessary to repeat one microrotation every t - 1 iterations during the first n/2 microrotations. For i > n/2, the value $\sigma_i = 0$ is allowed, and only one repetition is necessary. The conventional radix-2 architecture with non-redundant arithmetic can be found in [3].

Basically, the word-serial version of the three architectures has the same


More information

Lecture 8: Sequential Multipliers

Lecture 8: Sequential Multipliers Lecture 8: Sequential Multipliers ECE 645 Computer Arithmetic 3/25/08 ECE 645 Computer Arithmetic Lecture Roadmap Sequential Multipliers Unsigned Signed Radix-2 Booth Recoding High-Radix Multiplication

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER Jesus Garcia and Michael J. Schulte Lehigh University Department of Computer Science and Engineering Bethlehem, PA 15 ABSTRACT Galois field arithmetic

More information

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

Carry Look Ahead Adders

Carry Look Ahead Adders Carry Look Ahead Adders Lesson Objectives: The objectives of this lesson are to learn about: 1. Carry Look Ahead Adder circuit. 2. Binary Parallel Adder/Subtractor circuit. 3. BCD adder circuit. 4. Binary

More information

CORDIC, Divider, Square Root

CORDIC, Divider, Square Root 4// EE6B: VLSI Signal Processing CORDIC, Divider, Square Root Prof. Dejan Marković ee6b@gmail.com Iterative algorithms CORDIC Division Square root Lecture Overview Topics covered include Algorithms and

More information

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers*

The goal differs from prime factorization. Prime factorization would initialize all divisors to be prime numbers instead of integers* Quantum Algorithm Processor For Finding Exact Divisors Professor J R Burger Summary Wiring diagrams are given for a quantum algorithm processor in CMOS to compute, in parallel, all divisors of an n-bit

More information

Binary addition example worked out

Binary addition example worked out Binary addition example worked out Some terms are given here Exercise: what are these numbers equivalent to in decimal? The initial carry in is implicitly 0 1 1 1 0 (Carries) 1 0 1 1 (Augend) + 1 1 1 0

More information

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding

Binary Multipliers. Reading: Study Chapter 3. The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding Binary Multipliers The key trick of multiplication is memorizing a digit-to-digit table Everything else was just adding 2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9 2 2 4 6 8 2 4 6 8 3 3 6 9 2 5 8 2 24 27 4 4 8 2 6

More information

Chapter 1: Solutions to Exercises

Chapter 1: Solutions to Exercises 1 DIGITAL ARITHMETIC Miloš D. Ercegovac and Tomás Lang Morgan Kaufmann Publishers, an imprint of Elsevier, c 2004 Exercise 1.1 (a) 1. 9 bits since 2 8 297 2 9 2. 3 radix-8 digits since 8 2 297 8 3 3. 3

More information

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California

VLSI Arithmetic. Lecture 9: Carry-Save and Multi-Operand Addition. Prof. Vojin G. Oklobdzija University of California VLSI Arithmetic Lecture 9: Carry-Save and Multi-Operand Addition Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel Carry-Save Addition* *from Parhami 2 June 18, 2003 Carry-Save

More information

A 32-bit Decimal Floating-Point Logarithmic Converter

A 32-bit Decimal Floating-Point Logarithmic Converter A 3-bit Decimal Floating-Point Logarithmic Converter Dongdong Chen 1, Yu Zhang 1, Younhee Choi 1, Moon Ho Lee, Seok-Bum Ko 1, Department of Electrical and Computer Engineering, University of Saskatchewan

More information

ECE380 Digital Logic. Positional representation

ECE380 Digital Logic. Positional representation ECE380 Digital Logic Number Representation and Arithmetic Circuits: Number Representation and Unsigned Addition Dr. D. J. Jackson Lecture 16-1 Positional representation First consider integers Begin with

More information

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs Article Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs E. George Walters III Department of Electrical and Computer Engineering, Penn State Erie,

More information

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute

DIGITAL TECHNICS. Dr. Bálint Pődör. Óbuda University, Microelectronics and Technology Institute DIGITAL TECHNICS Dr. Bálint Pődör Óbuda University, Microelectronics and Technology Institute 4. LECTURE: COMBINATIONAL LOGIC DESIGN: ARITHMETICS (THROUGH EXAMPLES) 2016/2017 COMBINATIONAL LOGIC DESIGN:

More information

1 Matrices and Systems of Linear Equations

1 Matrices and Systems of Linear Equations Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 207, v 260) Contents Matrices and Systems of Linear Equations Systems of Linear Equations Elimination, Matrix Formulation

More information

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1>

Chapter 5. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 5 <1> Chapter 5 Digital Design and Computer Architecture, 2 nd Edition David Money Harris and Sarah L. Harris Chapter 5 Chapter 5 :: Topics Introduction Arithmetic Circuits umber Systems Sequential Building

More information

Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid Number System

Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid Number System Computer Networks and Communications, Article ID 890354, 7 pages http://dx.doi.org/10.1155/2014/890354 Research Article Implementation of Special Function Unit for Vertex Shader Processor Using Hybrid

More information

Linear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02)

Linear Algebra (part 1) : Matrices and Systems of Linear Equations (by Evan Dummit, 2016, v. 2.02) Linear Algebra (part ) : Matrices and Systems of Linear Equations (by Evan Dummit, 206, v 202) Contents 2 Matrices and Systems of Linear Equations 2 Systems of Linear Equations 2 Elimination, Matrix Formulation

More information

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing

CSE140: Components and Design Techniques for Digital Systems. Decoders, adders, comparators, multipliers and other ALU elements. Tajana Simunic Rosing CSE4: Components and Design Techniques for Digital Systems Decoders, adders, comparators, multipliers and other ALU elements Tajana Simunic Rosing Mux, Demux Encoder, Decoder 2 Transmission Gate: Mux/Tristate

More information

Design of Sequential Circuits

Design of Sequential Circuits Design of Sequential Circuits Seven Steps: Construct a state diagram (showing contents of flip flop and inputs with next state) Assign letter variables to each flip flop and each input and output variable

More information

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)

International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational

More information

What s the Deal? MULTIPLICATION. Time to multiply

What s the Deal? MULTIPLICATION. Time to multiply What s the Deal? MULTIPLICATION Time to multiply Multiplying two numbers requires a multiply Luckily, in binary that s just an AND gate! 0*0=0, 0*1=0, 1*0=0, 1*1=1 Generate a bunch of partial products

More information

Tree and Array Multipliers Ivor Page 1

Tree and Array Multipliers Ivor Page 1 Tree and Array Multipliers 1 Tree and Array Multipliers Ivor Page 1 11.1 Tree Multipliers In Figure 1 seven input operands are combined by a tree of CSAs. The final level of the tree is a carry-completion

More information

3. Combinational Circuit Design

3. Combinational Circuit Design CSEE 3827: Fundamentals of Computer Systems, Spring 2 3. Combinational Circuit Design Prof. Martha Kim (martha@cs.columbia.edu) Web: http://www.cs.columbia.edu/~martha/courses/3827/sp/ Outline (H&H 2.8,

More information

Lecture 18: Datapath Functional Units

Lecture 18: Datapath Functional Units Lecture 8: Datapath Functional Unit Outline Comparator Shifter Multi-input Adder Multiplier 8: Datapath Functional Unit CMOS VLSI Deign 4th Ed. 2 Comparator 0 detector: A = 00 000 detector: A = Equality

More information

Residue Number Systems Ivor Page 1

Residue Number Systems Ivor Page 1 Residue Number Systems 1 Residue Number Systems Ivor Page 1 7.1 Arithmetic in a modulus system The great speed of arithmetic in Residue Number Systems (RNS) comes from a simple theorem from number theory:

More information

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces

Contents. 2.1 Vectors in R n. Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v. 2.50) 2 Vector Spaces Linear Algebra (part 2) : Vector Spaces (by Evan Dummit, 2017, v 250) Contents 2 Vector Spaces 1 21 Vectors in R n 1 22 The Formal Denition of a Vector Space 4 23 Subspaces 6 24 Linear Combinations and

More information

Introduction and mathematical preliminaries

Introduction and mathematical preliminaries Chapter Introduction and mathematical preliminaries Contents. Motivation..................................2 Finite-digit arithmetic.......................... 2.3 Errors in numerical calculations.....................

More information

DIGIT-SERIAL ARITHMETIC

DIGIT-SERIAL ARITHMETIC DIGIT-SERIAL ARITHMETIC 1 Modes of operation:lsdf and MSDF Algorithm and implementation models LSDF arithmetic MSDF: Online arithmetic TIMING PARAMETERS 2 radix-r number system: conventional and redundant

More information

Part II Addition / Subtraction

Part II Addition / Subtraction Part II Addition / Subtraction Parts Chapters I. Number Representation 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems Elementary Operations

More information

On-line Algorithms for Computing Exponentials and Logarithms. Asger Munk Nielsen. Dept. of Mathematics and Computer Science

On-line Algorithms for Computing Exponentials and Logarithms. Asger Munk Nielsen. Dept. of Mathematics and Computer Science On-line Algorithms for Computing Exponentials and Logarithms Asger Munk Nielsen Dept. of Mathematics and Computer Science Odense University, Denmark asger@imada.ou.dk Jean-Michel Muller Laboratoire LIP

More information

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))

A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1)) The Computer Journal, 47(1), The British Computer Society; all rights reserved A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form ( n ( p ± 1)) Ahmad A. Hiasat Electronics Engineering

More information

L8/9: Arithmetic Structures

L8/9: Arithmetic Structures L8/9: Arithmetic Structures Acknowledgements: Materials in this lecture are courtesy of the following sources and are used with permission. Rex Min Kevin Atkinson Prof. Randy Katz (Unified Microelectronics

More information

DIVIDER IMPLEMENTATION

DIVIDER IMPLEMENTATION c n = cn-= DAIL LLAOCCA CLab@OU DIVID IPLTATIO The division of two unsigned integer numbers A (where A is the dividend and the divisor), results in a quotient and a residue. These quantities are related

More information

Adders, subtractors comparators, multipliers and other ALU elements

Adders, subtractors comparators, multipliers and other ALU elements CSE4: Components and Design Techniques for Digital Systems Adders, subtractors comparators, multipliers and other ALU elements Adders 2 Circuit Delay Transistors have instrinsic resistance and capacitance

More information

Area-Time Optimal Adder with Relative Placement Generator

Area-Time Optimal Adder with Relative Placement Generator Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is

More information

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits

Logic and Computer Design Fundamentals. Chapter 5 Arithmetic Functions and Circuits Logic and Computer Design Fundamentals Chapter 5 Arithmetic Functions and Circuits Arithmetic functions Operate on binary vectors Use the same subfunction in each bit position Can design functional block

More information

Remainders. We learned how to multiply and divide in elementary

Remainders. We learned how to multiply and divide in elementary Remainders We learned how to multiply and divide in elementary school. As adults we perform division mostly by pressing the key on a calculator. This key supplies the quotient. In numerical analysis and

More information

Gravitational potential energy *

Gravitational potential energy * OpenStax-CNX module: m15090 1 Gravitational potential energy * Sunil Kumar Singh This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 2.0 The concept of potential

More information

Stochastic dominance with imprecise information

Stochastic dominance with imprecise information Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH

More information

SRT Division and the Pentium FDIV Bug (draft lecture notes, CSCI P415)

SRT Division and the Pentium FDIV Bug (draft lecture notes, CSCI P415) SRT Division and the Pentium FDIV Bug (draft lecture notes, CSCI P4) Steven D. Johnson September 0, 000 Abstract This talk explains the widely publicized design error in a 994 issue of the Intel Corp.

More information

Tunable Floating-Point for Energy Efficient Accelerators

Tunable Floating-Point for Energy Efficient Accelerators Tunable Floating-Point for Energy Efficient Accelerators Alberto Nannarelli DTU Compute, Technical University of Denmark 25 th IEEE Symposium on Computer Arithmetic A. Nannarelli (DTU Compute) Tunable

More information

Divisor matrices and magic sequences

Divisor matrices and magic sequences Discrete Mathematics 250 (2002) 125 135 www.elsevier.com/locate/disc Divisor matrices and magic sequences R.H. Jeurissen Mathematical Institute, University of Nijmegen, Toernooiveld, 6525 ED Nijmegen,

More information

REDUNDANT TRINOMIALS FOR FINITE FIELDS OF CHARACTERISTIC 2

REDUNDANT TRINOMIALS FOR FINITE FIELDS OF CHARACTERISTIC 2 REDUNDANT TRINOMIALS FOR FINITE FIELDS OF CHARACTERISTIC 2 CHRISTOPHE DOCHE Abstract. In this paper we introduce so-called redundant trinomials to represent elements of nite elds of characteristic 2. The

More information

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 VLSI Design Adder Design [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 Major Components of a Computer Processor Devices Control Memory Input Datapath

More information

`First Come, First Served' can be unstable! Thomas I. Seidman. Department of Mathematics and Statistics. University of Maryland Baltimore County

`First Come, First Served' can be unstable! Thomas I. Seidman. Department of Mathematics and Statistics. University of Maryland Baltimore County revision2: 9/4/'93 `First Come, First Served' can be unstable! Thomas I. Seidman Department of Mathematics and Statistics University of Maryland Baltimore County Baltimore, MD 21228, USA e-mail: hseidman@math.umbc.edui

More information

14:332:231 DIGITAL LOGIC DESIGN. Why Binary Number System?

14:332:231 DIGITAL LOGIC DESIGN. Why Binary Number System? :33:3 DIGITAL LOGIC DESIGN Ivan Marsic, Rutgers University Electrical & Computer Engineering Fall 3 Lecture #: Binary Number System Complement Number Representation X Y Why Binary Number System? Because

More information

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman. SP esign Lecture 7 Unfolding cont. & Folding r. Fredrik Edman fredrik.edman@eit.lth.se Unfolding Unfolding creates a program with more than one iteration, J=unfolding factor Unfolding is a structured way

More information

Number representation

Number representation Number representation A number can be represented in binary in many ways. The most common number types to be represented are: Integers, positive integers one-complement, two-complement, sign-magnitude

More information

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,

ISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10, A NOVEL DOMINO LOGIC DESIGN FOR EMBEDDED APPLICATION Dr.K.Sujatha Associate Professor, Department of Computer science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu,

More information

Lecture 12: Datapath Functional Units

Lecture 12: Datapath Functional Units Lecture 2: Datapath Functional Unit Slide courtey of Deming Chen Slide baed on the initial et from David Harri CMOS VLSI Deign Outline Comparator Shifter Multi-input Adder Multiplier Reading:.3-4;.8-9

More information

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples

Overview. Arithmetic circuits. Binary half adder. Binary full adder. Last lecture PLDs ROMs Tristates Design examples Overview rithmetic circuits Last lecture PLDs ROMs Tristates Design examples Today dders Ripple-carry Carry-lookahead Carry-select The conclusion of combinational logic!!! General-purpose building blocks

More information

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 19: Adder Design [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN 411 L19

More information

Roots of Unity, Cyclotomic Polynomials and Applications

Roots of Unity, Cyclotomic Polynomials and Applications Swiss Mathematical Olympiad smo osm Roots of Unity, Cyclotomic Polynomials and Applications The task to be done here is to give an introduction to the topics in the title. This paper is neither complete

More information

Combinational Logic Design Arithmetic Functions and Circuits

Combinational Logic Design Arithmetic Functions and Circuits Combinational Logic Design Arithmetic Functions and Circuits Overview Binary Addition Half Adder Full Adder Ripple Carry Adder Carry Look-ahead Adder Binary Subtraction Binary Subtractor Binary Adder-Subtractor

More information

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH

More information

A VLSI Algorithm for Modular Multiplication/Division

A VLSI Algorithm for Modular Multiplication/Division A VLSI Algorithm for Modular Multiplication/Division Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering Nagoya University Nagoya, 464-8603, Japan mkaihara@takagi.nuie.nagoya-u.ac.jp

More information

14:332:231 DIGITAL LOGIC DESIGN. 2 s-complement Representation

14:332:231 DIGITAL LOGIC DESIGN. 2 s-complement Representation 4:332:23 DIGITAL LOGIC DESIGN Ivan Marsic, Rutgers University Electrical & Computer Engineering Fall 203 Lecture #3: Addition, Subtraction, Multiplication, and Division 2 s-complement Representation RECALL

More information

EECS150. Arithmetic Circuits

EECS150. Arithmetic Circuits EE5 ection 8 Arithmetic ircuits Fall 2 Arithmetic ircuits Excellent Examples of ombinational Logic Design Time vs. pace Trade-offs Doing things fast may require more logic and thus more space Example:

More information

UNSIGNED BINARY NUMBERS DIGITAL ELECTRONICS SYSTEM DESIGN WHAT ABOUT NEGATIVE NUMBERS? BINARY ADDITION 11/9/2018

UNSIGNED BINARY NUMBERS DIGITAL ELECTRONICS SYSTEM DESIGN WHAT ABOUT NEGATIVE NUMBERS? BINARY ADDITION 11/9/2018 DIGITAL ELECTRONICS SYSTEM DESIGN LL 2018 PROFS. IRIS BAHAR & ROD BERESFORD NOVEMBER 9, 2018 LECTURE 19: BINARY ADDITION, UNSIGNED BINARY NUMBERS For the binary number b n-1 b n-2 b 1 b 0. b -1 b -2 b

More information

EECS150 - Digital Design Lecture 21 - Design Blocks

EECS150 - Digital Design Lecture 21 - Design Blocks EECS150 - Digital Design Lecture 21 - Design Blocks April 3, 2012 John Wawrzynek Spring 2012 EECS150 - Lec21-db3 Page 1 Fixed Shifters / Rotators fixed shifters hardwire the shift amount into the circuit.

More information

Squared Error. > Steady State Error Iteration

Squared Error. > Steady State Error Iteration The Logarithmic Number System for Strength Reduction in Adaptive Filtering John R. Sacha and Mary Jane Irwin Computer Science and Engineering Department The Pennsylvania State University University Park,

More information

Multiplication Ivor Page 1

Multiplication Ivor Page 1 Multiplication 1 Multiplication Ivor Page 1 10.1 Shift/Add Multiplication Algorithms We will adopt the notation, a Multiplicand a k 1 a k 2 a 1 a 0 b Multiplier x k 1 x k 2 x 1 x 0 p Product p 2k 1 p 2k

More information

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139

Upper and Lower Bounds on the Number of Faults. a System Can Withstand Without Repairs. Cambridge, MA 02139 Upper and Lower Bounds on the Number of Faults a System Can Withstand Without Repairs Michel Goemans y Nancy Lynch z Isaac Saias x Laboratory for Computer Science Massachusetts Institute of Technology

More information

Neri Merhav. and. Vasudev Bhaskaran. Abstract. A method is developed and proposed to eciently implement spatial domain ltering

Neri Merhav. and. Vasudev Bhaskaran. Abstract. A method is developed and proposed to eciently implement spatial domain ltering A Fast Algorithm for DCT Domain Filtering Neri Merhav HP Israel Science Center and Vasudev Bhaskaran Computer Systems Laboratory y Keywords: DCT domain ltering, data compression. Abstract A method is developed

More information

Combinational Logic. By : Ali Mustafa

Combinational Logic. By : Ali Mustafa Combinational Logic By : Ali Mustafa Contents Adder Subtractor Multiplier Comparator Decoder Encoder Multiplexer How to Analyze any combinational circuit like this? Analysis Procedure To obtain the output

More information

1 RN(1/y) Ulp Accurate, Monotonic

1 RN(1/y) Ulp Accurate, Monotonic URL: http://www.elsevier.nl/locate/entcs/volume24.html 29 pages Analysis of Reciprocal and Square Root Reciprocal Instructions in the AMD K6-2 Implementation of 3DNow! Cristina Iordache and David W. Matula

More information

A High-Speed Realization of Chinese Remainder Theorem

A High-Speed Realization of Chinese Remainder Theorem Proceedings of the 2007 WSEAS Int. Conference on Circuits, Systems, Signal and Telecommunications, Gold Coast, Australia, January 17-19, 2007 97 A High-Speed Realization of Chinese Remainder Theorem Shuangching

More information

Low Power and Low Complexity Shift-and-Add Based Computations

Low Power and Low Complexity Shift-and-Add Based Computations Linköping Studies in Science and Technology Dissertations, No. 2 Low Power and Low Complexity Shift-and-Add Based Computations Kenny Johansson Department of Electrical Engineering Linköping University,

More information