A HIGH-SPEED PROCESSOR FOR RECTANGULAR-TO-POLAR CONVERSION WITH APPLICATIONS IN DIGITAL COMMUNICATIONS *


Copyright IEEE 1999. Published in the Proceedings of Globecom 1999, Rio de Janeiro, Dec 5-9, 1999.

Dengwei Fu and Alan N. Willson, Jr.
Electrical Engineering Department, UCLA
dwf@icsl.ucla.edu, willson@icsl.ucla.edu

Abstract - In many communication system implementations, such as M-ary PSK and FSK modems, DMT for ADSL, as well as carrier and clock synchronization, the efficient conversion from rectangular to polar coordinates is essential. In this paper, we present a novel architecture for a circuit that computes the angle, as well as the magnitude, of a polar-form representation of a complex number, given the real and imaginary components of that number.

1. Introduction

Conversions between rectangular and polar coordinates are common requirements in communication applications. We have presented a novel approach in [1] for fast and low-cost polar-to-rectangular conversion. For a class of communication problems, such as constant-amplitude FSK and PSK modem implementations [2], as well as carrier and clock synchronization [3], the computation of phase and magnitude from the rectangular coordinates is essential.

There are several well-known implementations of rectangular-to-polar coordinate conversion, i.e., obtaining the magnitude and phase of a complex number. One method uses a ROM lookup table with both the real and imaginary components as input. This is practical only for low bit-accuracy requirements, as the ROM size grows exponentially with the number of input bits. To reduce the ROM size, we can first divide the imaginary component by the real component, then use the quotient to index the lookup table. But the hardware for a full-speed divider is complicated and power consuming. An iterative divider implemented using shifting and subtraction requires less hardware, but it is usually quite slow. Recently, CORDIC has been applied to this coordinate conversion [2]. However, due to the sequential nature of CORDIC, it is difficult to pipeline, which limits the throughput rate.

In burst-mode communication systems, rapid carrier and clock synchronization is crucial [3]. Therefore, a fast rectangular-to-polar conversion is desired. In this paper, we present an algorithm that implements the angle computation for rectangular-to-polar conversion with low latency and low hardware cost.

* This research was supported by NSF Grant MIP and by California Micro Grant 99-4.

2. Background and motivation

Fig. 1 displays a point in the Cartesian X-Y plane having coordinates $(X_0, Y_0)$. The angle $\phi$ can be computed as

$$\phi = \tan^{-1}(Y_0 / X_0). \tag{1}$$

In deriving the core of our algorithm, we assume the dividend and divisor satisfy

$$X_0 \ge Y_0 \ge 0. \tag{2}$$

We will discuss how to extend the result to arbitrary values in Section 5. To achieve the highest precision for given hardware, the inputs $X_0$ and $Y_0$ should be scaled such that

$$1 \le X_0 < 2. \tag{3}$$

[Fig. 1. Cartesian to polar conversion: the point $(X_0, Y_0)$ and its angle $\phi$.]

A straightforward method for fast implementation of (1) can be devised as follows:
1) Obtain the reciprocal of $X_0$ from a lookup table.
2) Compute $Y_0 \times (1/X_0)$ with a fast multiplier.
3) Use this product to index an arctangent table for $\phi$.

However, the size of the two tables grows exponentially with increased precision requirements on $\phi$, and rather large tables would be required to achieve accurate results. Therefore, for high-precision applications, such an implementation seems impractical.

If we approximate $1/X_0$ by the reciprocal of the most significant bits (MSBs) of $X_0$, denoted by $1/[X_0]$, then the required reciprocal table is much smaller. We can then multiply the table output by $Y_0$ to yield $Y_0/[X_0]$, which is an approximation of $Y_0/X_0$. This quotient can then be used to index an arctangent table. As with the reciprocal table, a much smaller arctangent table is needed if we use only the MSBs of $Y_0/[X_0]$, denoted by $[Y_0/[X_0]]$, to address the table, which returns $\phi_1 = \tan^{-1}([Y_0/[X_0]])$. Obviously, this result is just an approximation to $\phi$. We will subsequently refer to the computation of $\phi_1$ as the coarse computation stage.

Let $\phi_2$ be the difference between $\phi$ and $\phi_1$. Using the trigonometric identity

$$\tan\phi_2 = \tan(\phi - \phi_1) = \frac{\tan\phi - \tan\phi_1}{1 + \tan\phi\,\tan\phi_1} \tag{4}$$

and the definitions $\tan\phi = Y_0/X_0$ and $\tan\phi_1 = [Y_0/[X_0]]$, we have

$$\tan\phi_2 = \frac{Y_0/X_0 - [Y_0/[X_0]]}{1 + (Y_0/X_0)\,[Y_0/[X_0]]} = \frac{Y_0 - X_0\,[Y_0/[X_0]]}{X_0 + Y_0\,[Y_0/[X_0]]}. \tag{5}$$

Using this relationship, $\phi_2$ can be determined from $[Y_0/[X_0]]$, the coarse computation result. The desired result $\phi$ is then obtained by adding the fine correction angle $\phi_2$ to the coarse approximation $\phi_1$. This procedure of finding $\phi_2$ will subsequently be referred to as the fine computation stage.

By partitioning the computation of (1) into two stages, the table size in the coarse stage can be reduced significantly at the expense of additional computations, which are handled by the fine stage. Let us now examine the complexity of the fine stage. To find $\phi_2$ using (5), we can first compute

$$X_1 = X_0 + Y_0\,[Y_0/[X_0]], \qquad Y_1 = Y_0 - X_0\,[Y_0/[X_0]] \tag{6}$$

and then find $\phi_2$ as

$$\phi_2 = \tan^{-1}(Y_1/X_1). \tag{7}$$

The computation in (6) involves only adders and multipliers, while (7) requires lookup tables. Moreover, it seems we cannot use the same coarse-stage tables because they have low resolution and thus cannot satisfy the high precision requirements for the fine angle $\phi_2$.

Now let us analyze $\phi_2$ to see if there is any property that can help in this situation. If $\phi_1$ is a good approximation of $\phi$, then $\phi_2 = \phi - \phi_1$ is close to zero. In view of (7), $Y_1/X_1$ should be very small too. This property helps us in two respects:

1) The difference between $Y_1/X_1$ and $Y_1/[X_1]$ is much smaller than that between $1/X_1$ and $1/[X_1]$. This suggests that if we use the same low-resolution reciprocal table as in the coarse stage, the error contributed to the final result will be very small. We will demonstrate this in the next section.

2) If $Y_1/X_1$ is sufficiently small to satisfy

$$|Y_1/X_1| = |\tan\phi_2| < 2^{-N/3}, \tag{8}$$

where $N$ denotes the desired number of bits in $\phi$, then

$$\phi_2 = \tan^{-1}(Y_1/X_1) \approx Y_1/X_1 \tag{9}$$

and we can compute $\phi_2$ without using an arctangent table. This is explained as follows. From the Taylor expansion of $\tan^{-1}(Y_1/X_1)$ near $Y_1/X_1 = 0$, we obtain

$$\tan^{-1}(Y_1/X_1) = Y_1/X_1 - \frac{(Y_1/X_1)^3}{3} + O\big((Y_1/X_1)^5\big). \tag{10}$$

Since $O((Y_1/X_1)^5)$ is negligible in comparison to $(Y_1/X_1)^3/3$, it can be omitted. Therefore, if $Y_1/X_1$ is used to approximate $\tan^{-1}(Y_1/X_1)$, an error

$$\Delta_{\tan} = \tan^{-1}(Y_1/X_1) - Y_1/X_1 = -\frac{(Y_1/X_1)^3}{3} \tag{11}$$

will occur. However, according to (8), $\Delta_{\tan}$ is bounded by

$$|\Delta_{\tan}| < \frac{2^{-N}}{3}, \tag{12}$$

which is very small. This indicates that the approximation (9) is quite accurate if (8) is satisfied.

From the above analysis, no additional tables are needed for the fine stage if $\phi_1$ is sufficiently close to $\phi$. On the other hand, the better $\phi_1$ approximates $\phi$, the larger the tables required for its computation become. As mentioned previously, table size grows exponentially as the precision increases. In view of these conflicting requirements, a good trade-off is obtained when the result $\phi_1$ of the coarse stage is just close enough to $\phi$ that (8) is satisfied, thereby eliminating the additional tables in the fine stage. Next, a detailed analysis and description of the algorithm are provided.

3. The Algorithm

In this section we first analyze how the coarse approximation error $\phi_2 = \phi - \phi_1$ depends upon the precision of the tables, in order to determine the amount of hardware that must be allocated to the coarse stage. Next we explore ways to simplify the computations in the fine stage.

A. Simplification in the coarse computation stage

The main concern in the coarse stage design is how the lookup table values are generated to produce results that are as precise as possible for a given table size. As mentioned previously, there are two lookup tables.

1) The reciprocal table: The input to this table, $1 \le X_0 < 2$, can be expressed as

$$X_0 = 1.x_1 x_2 \cdots x_i \cdots \tag{13}$$

where only bits $x_1$ through $x_i$ are used to index the table. To generate the table value, if we merely truncate $X_0$ as

$$[X_0] = 1.x_1 x_2 \cdots x_i \tag{14}$$

then the quantization error $\Delta_{X_0} = X_0 - [X_0]$ is bounded by

$$0 \le \Delta_{X_0} < 2^{-i}. \tag{15}$$

Thus, the difference between the table value and $1/X_0$,

$$\frac{1}{X_0} - \frac{1}{[X_0]} = \frac{[X_0] - X_0}{[X_0]\,X_0} \approx -\frac{\Delta_{X_0}}{X_0^2}, \tag{16}$$

is bounded by

$$-2^{-i} < \frac{1}{X_0} - \frac{1}{[X_0]} \le 0. \tag{17}$$

But if we generate the table value corresponding to

$$[X_0] = 1.x_1 x_2 \cdots x_i 1, \tag{18}$$

with a 1 bit appended as the LSB, then the quantization error in (15) is centered around zero:

$$-2^{-(i+1)} \le \Delta_{X_0} < 2^{-(i+1)}, \tag{19}$$

hence the error in the reciprocal is also centered around zero:

$$-2^{-(i+1)} < \frac{1}{X_0} - \frac{1}{[X_0]} \le 2^{-(i+1)}. \tag{20}$$
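The two table-generation rules can be compared numerically. The following is a minimal floating-point sketch, not the authors' implementation: the function name `max_reciprocal_error`, the sampling density, and the choice $i = 4$ in the usage lines are assumptions made only for illustration. It sweeps $1 \le X_0 < 2$ and records the largest reciprocal error for a truncated table entry versus an entry with a 1 appended as the extra LSB.

```python
def max_reciprocal_error(i, append_one, samples_per_cell=64):
    """Largest |1/X0 - 1/[X0]| over 1 <= X0 < 2 for an i-bit table index.

    append_one=False models plain truncation, [X0] = 1.x1...xi.
    append_one=True models [X0] = 1.x1...xi1 (a 1 appended as the LSB).
    Hypothetical helper for illustration only.
    """
    step = 2.0 ** -i                      # width of one table cell
    worst = 0.0
    for cell in range(2 ** i):
        base = 1.0 + cell * step          # truncated value of X0 in this cell
        ref = base + step / 2 if append_one else base
        for s in range(samples_per_cell):
            x0 = base + step * s / samples_per_cell
            worst = max(worst, abs(1.0 / x0 - 1.0 / ref))
    return worst


if __name__ == "__main__":
    i = 4                                 # assumed index width for the demo
    print("truncated :", max_reciprocal_error(i, False))
    print("appended 1:", max_reciprocal_error(i, True))
```

Under these assumptions the truncated table stays below $2^{-i}$ and the appended-1 table below $2^{-(i+1)}$, with the same number of table entries in both cases.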

Comparing (20) to (17), the maximum absolute error is reduced without having to increase the table size. This is the technique introduced in [4].

Since the output of the table will be multiplied by $Y_0$, the fewer the bits in the table value, the smaller the required multiplier hardware. Let the table value $Z_0$ be generated by rounding $1/[X_0]$ to $j$ bits:

$$Z_0 = 0.1 z_2 z_3 \cdots z_j. \tag{21}$$

The quantization error $\Delta_{Z_0} = 1/[X_0] - Z_0$ is then bounded by

$$-2^{-(j+1)} \le \Delta_{Z_0} < 2^{-(j+1)}. \tag{22}$$

We can then multiply $Z_0$ by $Y_0$ to approximate $Y_0/X_0$. This result is used to address the arctangent table for $\phi_1$.

2) The arctangent table: In order to use a very small table, $Y_0 Z_0$ is rounded to $k$ bits to the right of the radix point to become $[Y_0 Z_0]$, with the rounding error bounded by

$$-2^{-(k+1)} \le \Delta_{Y_0 Z_0} = Y_0 Z_0 - [Y_0 Z_0] < 2^{-(k+1)}. \tag{23}$$

Then $[Y_0 Z_0]$ is used to index the arctangent table, which returns the coarse angle $\phi_1 = \tan^{-1}([Y_0 Z_0])$.

Now we must determine the minimum $i$, $j$ and $k$ values such that (8) is satisfied. First, let us examine $X_1$ and $Y_1$, which are computed using $[Y_0 Z_0]$ as

$$X_1 = X_0 + Y_0\,[Y_0 Z_0] \tag{24}$$

$$Y_1 = Y_0 - X_0\,[Y_0 Z_0]. \tag{25}$$

Dividing (25) by (24), and then dividing both the numerator and denominator by $X_0$, we have

$$\left|\frac{Y_1}{X_1}\right| = \frac{\big|Y_0/X_0 - [Y_0 Z_0]\big|}{1 + (Y_0/X_0)\,[Y_0 Z_0]} \le \big|Y_0/X_0 - [Y_0 Z_0]\big|. \tag{26}$$

The inequality is true because $X_0 \ge Y_0 \ge 0$ and $[Y_0 Z_0] \ge 0$. Writing $\Delta_{1/X_0} = 1/X_0 - 1/[X_0]$ for the reciprocal error in (20), and taking into account all the quantization errors in (20), (22) and (23), we can express $Y_0/X_0$ in terms of $[Y_0 Z_0]$ as

$$\frac{Y_0}{X_0} = Y_0\left(\frac{1}{[X_0]} + \Delta_{1/X_0}\right) = Y_0\,(Z_0 + \Delta_{Z_0} + \Delta_{1/X_0}) = [Y_0 Z_0] + \Delta_{Y_0 Z_0} + Y_0\Delta_{Z_0} + Y_0\Delta_{1/X_0}. \tag{27}$$

Substituting this result into (26), we have

$$\left|\frac{Y_1}{X_1}\right| \le \big|\Delta_{Y_0 Z_0} + Y_0\Delta_{Z_0} + Y_0\Delta_{1/X_0}\big|. \tag{28}$$

Since $Y_0\,\Delta_{1/X_0} \approx -(Y_0/X_0)(\Delta_{X_0}/X_0)$, from (2) and (19), and because $X_0 \ge 1$,

$$-2^{-(i+1)} < Y_0\,\Delta_{1/X_0} < 2^{-(i+1)}. \tag{29}$$

Also, according to (2), (3) and (22), we have

$$-2^{-j} < Y_0\,\Delta_{Z_0} < 2^{-j}. \tag{30}$$

Applying (23), (29) and (30) to (28), we obtain

$$\left|\frac{Y_1}{X_1}\right| < 2^{-(i+1)} + 2^{-j} + 2^{-(k+1)}.$$

If we choose $i \ge N/3 + 1$, $j \ge N/3 + 2$ and $k \ge N/3 + 1$, then

$$\left|\frac{Y_1}{X_1}\right| < 0.75 \times 2^{-N/3}. \tag{31}$$

Therefore, since the inputs $X_1$ and $Y_1$ to the fine stage satisfy (8), no additional tables are needed for the fine stage. Henceforth we choose $i = N/3 + 1$, $j = N/3 + 2$ and $k = N/3 + 1$.

B. Hardware reduction in the fine computation stage

Since (8) is satisfied, we can obtain the fine angle $\phi_2$ by computing the quotient $Y_1/X_1$. From (24), we have $X_0 \le X_1 \le X_0 + Y_0$, hence $1 \le X_1 < 4$. In order to use the same reciprocal table as in the coarse stage, $X_1$ should be scaled such that

$$1 \le X_1 < 2. \tag{32}$$

This can be satisfied by shifting $X_1$ to the right if $X_1 \ge 2$. Of course $Y_1$ should also be shifted accordingly so that $Y_1/X_1$ remains unchanged.

As in the coarse stage, the reciprocal table accepts the $N/3 + 1$ MSBs of $X_1$ and returns $Z_1$. We define the reciprocal error $\delta_1 = 1/X_1 - Z_1$. Since the same reciprocal table is used as in the coarse stage, $\delta_1$ and $\delta_0$ must have the same bound. Since

$$\delta_0 = \frac{1}{X_0} - Z_0 = \left(\frac{1}{X_0} - \frac{1}{[X_0]}\right) + \Delta_{Z_0}, \tag{33}$$

we can use (20) and (22) to obtain

$$-0.375 \times 2^{-N/3} < \delta_1 < 0.375 \times 2^{-N/3}. \tag{34}$$

The bound on $Y_1$ can be found using (31) and (32):

$$|Y_1| < 1.5 \times 2^{-N/3}. \tag{35}$$

Now we can obtain the final error bound in approximating $Y_1/X_1$ by $Y_1 Z_1$, according to (34) and (35), as

$$\left|\frac{Y_1}{X_1} - Y_1 Z_1\right| = |Y_1\,\delta_1| < (0.75)^2 \times 2^{-2N/3}. \tag{36}$$

Clearly, this approximation error is too large. To reduce the maximum error below $2^{-N}$, the bound on $\delta_1$ should be approximately $2^{-2N/3}$, which would require the reciprocal table to accept $2N/3$ bits as input. That is, the table needed for such a high-resolution input would be significantly larger than the one already employed by the coarse stage.

To overcome this difficulty, we can apply the Newton-Raphson iteration method [5] to reduce the initial approximation error $\delta_1$. Using $Z_1$ as the initial guess, the updated $1/X_1$ value after one iteration is

$$Z_1' = Z_1\,(2 - X_1 Z_1). \tag{37}$$

Substituting $Z_1 = 1/X_1 - \delta_1$ into (37), we have

$$Z_1' = \left(\frac{1}{X_1} - \delta_1\right)\left(2 - X_1\left(\frac{1}{X_1} - \delta_1\right)\right) = \frac{1}{X_1} - X_1\,\delta_1^2. \tag{38}$$

According to (32), (34) and (35), after one Newton-Raphson iteration, the error in $Y_1 Z_1'$ is reduced to

$$\left|\frac{Y_1}{X_1} - Y_1 Z_1'\right| = |Y_1 X_1\,\delta_1^2| < (0.75)^3 \times 2^{-N}. \tag{39}$$

A rather accurate result is obtained with just one iteration. Finally, the fine angle can be computed by multiplying $Y_1$ by $Z_1'$:

$$\phi_2 \approx Y_1 Z_1' = Y_1 Z_1\,(2 - X_1 Z_1). \tag{40}$$

Although there are three multipliers involved in (40), the size of these multipliers can be reduced with just a slight accuracy loss by truncating the data before feeding them to the multipliers. The computational procedure of (40) is as follows:

1) The inputs to the fine stage, $X_1$ and $Y_1$, are truncated to … and … bits to the right of their radix points, respectively. Since the leading MSBs in $Y_1$ are just sign bits, as indicated by (35), they do not influence the complexity of the multiplier that produces $Y_1 Z_1$. The corresponding quantization errors are bounded by

$$0 \le \Delta_{X_1} < \ldots \tag{41}$$

$$0 \le \Delta_{Y_1} < \ldots \tag{42}$$

2) Both quantized $X_1$ and $Y_1$ are multiplied by $Z_1$.

3) To form $2 - X_1 Z_1$, instead of generating the two's complement of $X_1 Z_1$, we can use the ones complement with only an insignificant error. Since this error is much smaller than the truncation error in the next step, we can neglect it.

4) The product $Y_1 Z_1$ is truncated to … bits. We would also truncate the ones complement of $X_1 Z_1$. But since the inverted LSBs of $X_1 Z_1$ will be discarded, we can truncate $X_1 Z_1$ to … bits and then take its ones complement. The corresponding quantization errors, as discussed above, are:

$$0 \le \Delta_{X_1 Z_1} < \ldots \tag{43}$$

$$0 \le \Delta_{Y_1 Z_1} < \ldots \tag{44}$$

After including all the error sources due to simplification, we now analyze the effects of these errors on the final result $\phi_2$. Taking the errors into account, we can rewrite (40) as:

$$\phi_2 \approx \left((Y_1 - \Delta_{Y_1})\left(\frac{1}{X_1} - \delta_1\right) - \Delta_{Y_1 Z_1}\right)\left(2 - (X_1 - \Delta_{X_1})\left(\frac{1}{X_1} - \delta_1\right) + \Delta_{X_1 Z_1}\right). \tag{45}$$

Expanding this product and neglecting terms whose magnitudes are insignificant, we have

$$\phi_2 \approx \frac{Y_1}{X_1} - Y_1 X_1\,\delta_1^2 + \frac{Y_1}{X_1^2}\,\Delta_{X_1} - \frac{1}{X_1}\,\Delta_{Y_1} + \frac{Y_1}{X_1}\,\Delta_{X_1 Z_1} - \Delta_{Y_1 Z_1}. \tag{46}$$

As mentioned in Section 2, $Y_1/X_1$ is an approximation of $\tan^{-1}(Y_1/X_1)$. Its approximation error, defined in (11), is bounded by

$$|\Delta_{\tan}| = \frac{|Y_1/X_1|^3}{3} < \frac{(0.75)^3 \times 2^{-N}}{3}. \tag{47}$$

Replacing $Y_1/X_1$ by $\tan^{-1}(Y_1/X_1) + (Y_1/X_1)^3/3$ in (46), we have

$$\phi_2 \approx \tan^{-1}\!\left(\frac{Y_1}{X_1}\right) + \frac{(Y_1/X_1)^3}{3} - Y_1 X_1\,\delta_1^2 + \frac{Y_1}{X_1^2}\,\Delta_{X_1} - \frac{1}{X_1}\,\Delta_{Y_1} + \frac{Y_1}{X_1}\,\Delta_{X_1 Z_1} - \Delta_{Y_1 Z_1}. \tag{48}$$

The total error, $\varepsilon = \phi_2 - \tan^{-1}(Y_1/X_1)$, is

$$\varepsilon = \frac{Y_1}{X_1}\left(\frac{(Y_1/X_1)^2}{3} - (X_1\delta_1)^2 + \frac{\Delta_{X_1}}{X_1} + \Delta_{X_1 Z_1}\right) - \frac{1}{X_1}\,\Delta_{Y_1} - \Delta_{Y_1 Z_1}. \tag{49}$$

Apart from $-(X_1\delta_1)^2$, the subtotal $(Y_1/X_1)^2/3 - (X_1\delta_1)^2 + \Delta_{X_1}/X_1 + \Delta_{X_1 Z_1}$ has only non-negative terms. Thus, a lower bound on this subtotal is the minimum value of $-(X_1\delta_1)^2$, which is $-(0.75)^2 \times 2^{-2N/3} = -0.56 \times 2^{-2N/3}$, according to (32) and (34). Correspondingly, an upper bound is the sum of the maximum values of the other three terms (…).

[Fig. 2. Proposed architecture (blocks: input mux, reciprocal ROM, output demux, arctan ROM, scaling shifter, ones complement).]

Finally, we can obtain the total error bound as:

$$|\varepsilon| < \ldots = 0.76 \times 2^{-N}. \tag{50}$$

The proposed architecture is shown in Fig. 2, with the sizes of the various multipliers displayed in parentheses.

4. Magnitude calculation

Once the angle of the vector $(X_0, Y_0)$ is known, its magnitude can be obtained by multiplying $X_0$ by $1/\cos\phi$, whose values can be pre-calculated and stored in a ROM, thereby requiring only a single multiplication. However, if we use all the available bits to index the ROM table, it is likely that a very large ROM will be needed.

As we know from the preceding discussion, the coarse angle $\phi_1$ is an approximation of $\phi$. Similarly, $1/\cos\phi_1$ approximates $1/\cos\phi$. Therefore, we can expand the coarse-stage ROM to include the $1/\cos\phi_1$ values as well. That is, for each input $[Y_0 Z_0]$, the coarse-stage ROM would output both $\phi_1 = \tan^{-1}([Y_0 Z_0])$ and $1/\cos\phi_1$. Since $X_0$ and $Y_0$ satisfy (2) and (3), the $1/\cos\phi$ value is within $[1, \sqrt{2}]$.

For many applications, the magnitude value is used only to adjust the scaling of some signal level, and high precision is not necessary. For applications where a higher precision is desired, we propose the following approach. First, instead of using the above-mentioned table of $1/\cos\phi_1$ values, we pre-calculate and store in ROM the $1/\cos\phi_M$ values, where $\phi_M$ contains only the $m$ MSBs of $\phi$. Obviously only a small table, one of comparable size to the $1/\cos\phi_1$ table, is needed. Then we can look up the table entries for the two nearest values to $\phi$, namely $\phi_M$ and $\phi_{M1} = \phi_M + 2^{-m}$. A better approximation of $1/\cos\phi$ can then be obtained by interpolating between the table values $1/\cos\phi_M$ and $1/\cos\phi_{M1}$ as

$$\frac{1}{\cos\phi} \approx \frac{1}{\cos\phi_M} + \frac{1/\cos\phi_{M1} - 1/\cos\phi_M}{\phi_{M1} - \phi_M}\,(\phi - \phi_M). \tag{51}$$

Let $\phi_L = \phi - \phi_M$. Obviously, $\phi_L$ simply contains the LSBs of $\phi$. We can now rewrite (51) as

$$\frac{1}{\cos\phi} \approx \frac{1}{\cos\phi_M} + \left(\frac{1}{\cos\phi_{M1}} - \frac{1}{\cos\phi_M}\right)\phi_L\,2^{m}, \tag{52}$$

which involves only a multiplication and a shift operation, in addition to two adders.

5. Converting arbitrary inputs

In previous sections we have restricted the input values to lie within the bounds of (2) and (3). If the coordinates of $(X_0, Y_0)$ do not satisfy those conditions, we must map the given point so that they do. Of course, the resulting angle must be modified accordingly. To do that, we replace $X_0$ and $Y_0$ by their absolute values. This maps $(X_0, Y_0)$ into the first quadrant. Next, the larger of $|X_0|$ and $|Y_0|$ is used as the denominator in (1) and the other as the numerator. This places the corresponding angle in the interval $[0, \pi/4]$. We can now use the procedure discussed previously to obtain $\phi$.

Once we get $\phi$, we can find the angle $\varphi$ that corresponds to the original coordinates. First, if originally $|X_0| < |Y_0|$, we should map $\phi$ to $[\pi/4, \pi/2]$ using $\varphi = \pi/2 - \phi$. Otherwise $\varphi = \phi$. We then map this result to the original quadrant according to Table I.

TABLE I
Original coordinates        Modification to the angle
$X_0 < 0,\ Y_0 > 0$         $\varphi = \pi - \varphi$
$X_0 < 0,\ Y_0 < 0$         $\varphi = \pi + \varphi$
$X_0 > 0,\ Y_0 < 0$         $\varphi = 2\pi - \varphi$

Next, let us examine the effect of the above-mentioned mapping on the magnitude calculation. Since the negation and exchange of the original $X_0$ and $Y_0$ values do not change the magnitude, whose value is $\sqrt{X_0^2 + Y_0^2}$, the result obtained using the $X_0$ and $Y_0$ values after the mapping needs no correction. However, if the input values were scaled to satisfy (3), we then need to scale the computed magnitude back to the original scale of $X_0$ and $Y_0$.

6. Test results

We have verified our error bound estimation by a bit-level simulation of the structure of Fig. 2. To test the algorithm, we generated the pair of inputs $X_0$ and $Y_0$ randomly within the range described by (2) and (3). This test was run repeatedly over many different values of $X_0$ and $Y_0$, and the maximum error value was recorded. Choosing $N = 9$ for this simulation, the error bound estimate according to (50) is 0.0015. Our test results yielded the error bounds [ 00004, 00005], well within the calculated bound.

References

[1] D. Fu and A. N. Willson, Jr., "A high-speed processor for digital sine/cosine generation and angle rotation," in Conf. Record 32nd Annual Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 177-181, Nov. 1998.
[2] A. Chen and S. Yang, "Reduced complexity CORDIC demodulator implementation for D-AMPS and digital IF-sampled receiver," in Proc. Globecom 98, vol. 3, pp. …, Nov. 1998.
[3] M. Andronico, et al., "A new algorithm for fast synchronization in a burst mode PSK demodulator," in Proc. 1995 IEEE International Conference on Communications, vol. 3, pp. …, June 1995.
[4] D. L. Fowler and J. E. Smith, "An accurate high speed implementation of division by reciprocal approximation," in Proc. 9th Symp. on Computer Arithmetic, pp. 60-67, Sep. 1989.
[5] I. Koren, Computer Arithmetic Algorithms. Prentice Hall, Englewood Cliffs, NJ, 1993.
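As a supplementary illustration of the coarse and fine stages described in Sections 2 and 3, the following floating-point sketch walks through the same sequence of operations. It is not the authors' bit-level model: the function names `recip_table` and `rect_to_polar_angle` are hypothetical, `math.atan` stands in for the arctangent ROM, the truncation of multiplier inputs and the ones-complement shortcut are omitted, and the widths `i=4`, `j=5`, `k=4` are assumed values chosen only for the demonstration.

```python
import math


def recip_table(x, i, j):
    """Hypothetical reciprocal ROM: index with i fractional bits of x,
    evaluate 1/x at the cell midpoint (a 1 appended as LSB), round to j bits."""
    midpoint = math.floor(x * 2 ** i) / 2 ** i + 2.0 ** -(i + 1)
    return round((1.0 / midpoint) * 2 ** j) / 2 ** j


def rect_to_polar_angle(x0, y0, i=4, j=5, k=4):
    """Two-stage estimate of atan(y0/x0), assuming 1 <= x0 < 2, 0 <= y0 <= x0."""
    # Coarse stage: low-resolution reciprocal, then a small arctangent lookup.
    z0 = recip_table(x0, i, j)
    q = round(y0 * z0 * 2 ** k) / 2 ** k   # [Y0 Z0], rounded to k bits
    phi1 = math.atan(q)                    # stands in for the arctangent ROM

    # Fine stage: rotate by -phi1 without trigonometry, as in (24) and (25).
    x1 = x0 + y0 * q
    y1 = y0 - x0 * q
    while x1 >= 2.0:                       # scale so the same table applies
        x1 /= 2.0
        y1 /= 2.0
    z1 = recip_table(x1, i, j)
    z1_refined = z1 * (2.0 - x1 * z1)      # one Newton-Raphson step, as in (37)
    phi2 = y1 * z1_refined                 # small-angle approximation, as in (9)
    return phi1 + phi2


if __name__ == "__main__":
    x0, y0 = 1.3, 0.9
    print(rect_to_polar_angle(x0, y0), math.atan2(y0, x0))
```

The sketch reuses one small reciprocal table for both stages and needs no second arctangent table, which is the trade-off the paper argues for; a hardware-accurate model would additionally need the fixed-point word lengths that this sketch leaves out.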
