IMPLEMENTATION OF SIGNAL POWER ESTIMATION METHODS


Sei-Yeu Cheng and Joseph B. Evans
Telecommunications & Information Sciences Laboratory
Department of Electrical Engineering & Computer Science
University of Kansas
Lawrence, KS

Abstract

Estimates of signal power are widely used in signal processing algorithms, particularly in adaptive algorithms. For example, normalized gradient search adaptive algorithms based on both transversal and lattice forms use an estimate of the signal power in order to provide robustness to the input signal environment. A limitation on the use of such methods is the complexity of traditional implementation strategies. In this work, methods for performing sum-of-squares signal power estimation which may be implemented in an extremely simple manner are presented. These strategies are shown to provide excellent estimates with substantial implementation savings. Comparisons and example designs based on FPGAs and custom implementations illustrate the advantages of the new methods.

The authors can be contacted via e-mail at evans@tisl.ukans.edu. Portions of this work were presented at ISCAS '92 in San Diego, California. This research is partially supported by the University of Kansas General Research allocation. J. B. Evans is currently on sabbatical at the University of Cambridge Computer Laboratory and ORL Limited, Cambridge, England.

1 Introduction

Signal power or L2 norm estimation using a sum-of-squares computation is widely used in signal processing algorithms [2, 7, 8, 10]. For example, the normalized LMS (NLMS) algorithm uses a signal power estimate for step size normalization in the coefficient update term. This normalization provides robustness in applications where the input signal power is not known a priori. Signal power estimates are utilized in a similar manner in other adaptive signal processing algorithms, such as gradient-based lattice structures. Simplified implementations of these operations can yield significant benefits in hardware implementations of such systems. The power of the N-element signal vector x, or more precisely its squared L2 norm, is calculated according to

$$\|\mathbf{x}\|^2 = \sum_{k=0}^{N-1} x_k^2, \tag{1}$$

where $\|\cdot\|$ denotes the L2 norm and the $x_k$ are the individual elements of the signal vector. The traditional method of implementing such an estimator is based on a multiplier-accumulator architecture, as shown in Figure 1. This paper investigates techniques to simplify the implementation of this operator.

In a related vein, in order to simplify the implementation of gradient-based transversal adaptive algorithms, power-of-two quantizers have been employed in the coefficient update term [3, 4]. A $B_q$-bit power-of-two quantizer is characterized by the relationship

$$q(u) = \begin{cases} \operatorname{sgn}(u), & |u| \ge 1, \\ \operatorname{sgn}(u)\, 2^{\lfloor \log_2 |u| \rfloor}, & 2^{-B_q+1} \le |u| < 1, \\ 0, & |u| < 2^{-B_q+1}, \end{cases} \tag{2}$$

where $\operatorname{sgn}(u)$ is the sign of u and $\lfloor u \rfloor$ is the largest integer less than or equal to u. The multiplication of the error and data terms is then replaced by a simple shift operation, which reduces the complexity of the implementation and shortens the processing delay. Applying this operator to the power estimate in normalized adaptive algorithms allows the division operation to be replaced by a single shift, resulting in significant reductions in implementation complexity and improvements in the speed of the system.
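A minimal Python sketch of the $B_q$-bit power-of-two quantizer of Equation (2), and of how division by a power-of-two quantized value reduces to a shift, is given below. The function names `ptq` and `ptq_divide` are hypothetical, floating point with `math.frexp`/`math.ldexp` stands in for the fixed-point words and shifter of a hardware design, and the power estimate passed to `ptq_divide` is assumed to be positive.

```python
import math


def ptq(u: float, bq: int) -> float:
    """B_q-bit power-of-two quantizer q(u) of Equation (2)."""
    mag = abs(u)
    if mag < 2.0 ** (1 - bq):
        return 0.0                      # dead zone: |u| < 2^(-B_q+1)
    sign = 1.0 if u > 0 else -1.0
    if mag >= 1.0:
        return sign                     # saturation: |u| >= 1
    # math.frexp gives mag = m * 2**e with 0.5 <= m < 1, so
    # floor(log2(mag)) = e - 1 exactly, without rounding from a float log.
    _, e = math.frexp(mag)
    return sign * 2.0 ** (e - 1)


def ptq_divide(v: float, p: float, bq: int) -> float:
    """Return v / q(p) for a positive power estimate p using only a shift.

    Dividing by 2**(e-1) is the same as scaling v by 2**(1-e);
    math.ldexp adjusts the exponent directly, the software analogue
    of a barrel shift (assumption: p > 0).
    """
    if p >= 1.0:
        return v                        # q(p) = 1, so no shift is needed
    if p < 2.0 ** (1 - bq):
        raise ZeroDivisionError("q(p) = 0: power estimate below quantizer range")
    _, e = math.frexp(p)
    return math.ldexp(v, 1 - e)


print(ptq(0.3, 8), ptq(-0.3, 8), ptq(1.7, 8), ptq(0.001, 8))  # 0.25 -0.25 1.0 0.0
print(ptq_divide(3.0, 0.3, 8))  # 12.0, since q(0.3) = 0.25 is a 2-bit left shift
```

For $2^{-B_q+1} \le p < 1$ we have $q(p) \le p < 2q(p)$, so for positive v the shifted result lies between v/p and 2v/p: the shift trades the divider for a bounded overestimate of the quotient.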
Power-of-two quantizers have also been studied as a means of simplifying the implementation of adaptive lattice algorithms [9]. Both the convergence rate and the steady-state error resulting from this simplification were shown to be comparable to those of the ideal method. Power-of-two quantizer multipliers can be implemented such that they require half the area and are twice as fast as the equivalent array multiplier [4]. The elimination of the division operation likewise provides significant advantages in both area and speed.

In this paper, we propose methods for simplifying the implementation of signal power estimation which, in a typical system, allow a multiplier to be replaced with simple combinational logic. We also consider simplified implementation schemes specifically for use with a power-of-two quantizer, so that power normalization can be realized with only simple logic and a shifter, a system less complex than even that of [9]. In resource-limited environments such as FPGA-based systems, such methods may enable the use of normalized algorithms which would otherwise be impractical. It will be shown that the proposed algorithms reduce the implementation complexity of their respective systems with little or no degradation in overall algorithm performance.

Two sum-of-squares signal power estimators are presented in Section 2; one is intended for use with a power-of-two quantizer and the other without. The performance of these algorithms is studied in Section 3, their implementation is investigated in Section 4, and the conclusions from this work appear in Section 5.

2 Simplified Estimation Methods

The proposed methods are based on the observation that many applications of L2 norm estimates may not require extremely high accuracy, as suggested by the results in [6, 9]. By exploiting the inherent relationships between the input and output of a module which computes the square, the multiplication operation can be replaced by a simple quantizer. Two different implementations of the bit manipulation logic are discussed in this paper. The first method should be used when the signal power estimate will be applied directly, that is, when the most accurate estimate is needed for subsequent use. The second method should be used when the estimator is followed by a power-of-two quantizer.
The second method, while less accurate in isolation, provides a better final estimate when used in conjunction with the power-of-two quantizer because the errors of the two stages partially cancel. The fundamental structure for the first method is illustrated in Figure 2(a), where the multiplier of the traditional structure is replaced by a quantizer implemented with simple combinational logic. The output of the system in Figure 2(a) is fed into a power-of-two quantizer to form the structure for the second method, shown in Figure 2(b). This circuit is designed for use in a simplified normalization algorithm based on PTQ division, that is, division by a power-of-two quantized value (which reduces to a shift operation). Both of these approximations involve only a single gate delay, and their area grows linearly with the number of bits. Note that, for simplicity of discussion, the circuits presented here are configured only for positive inputs. Additional circuitry can be used to accommodate inputs in various standard signed number representations with a minimal increase in complexity. The resulting signal power estimates show excellent agreement with the ideal case, despite the coarse approximations used. Further, these methods have been applied to typical adaptive filtering scenarios with very good results.

2.1 The Simplified Estimation Algorithm

Each input bit plays a different role in each multiplication stage and in the final result. We can constrain the number of input bits included in each output bit of the simplified square operator based on the error generated in each case. Under this constraint, we can find the optimum logic combination for generating the bits of the square output for selected input distributions. Let $x_q$ be the output of an ideal linear quantizer. It can be expressed as

$$x_q = \sum_{k=0}^{B-1} x_k 2^k,$$

where the $x_k$ are the individual bits of a single B-bit input word x. Let $Q^*(x_q)$ be the square of $x_q$:

$$Q^*(x_q) = \left(\sum_{k=0}^{B-1} x_k 2^k\right)^2 = \left(\sum_{k=0}^{B-1} x_k 2^k\right)\left(\sum_{l=0}^{B-1} x_l 2^l\right) = \sum_{k=0}^{B-1}\sum_{l=0}^{B-1} (x_k \cdot x_l)\, 2^{k+l} = \sum_{j=0}^{2B-1} w_j 2^j, \tag{3}$$

where $\cdot$ denotes a logical AND operation and $w_j$ denotes an individual output bit of the 2B-bit output word w. An example 6-bit by 6-bit multiplier-based square operation is given in Table 1, which shows the formation of partial products to yield a 12-bit product by shift-and-add. Since this is a square operation, the $2 x_k x_l$ terms ($k > l$) can be promoted one bit level higher without knowing the actual input value; this provides the information used to generate Table 2, which expresses the relationship between the partial products and their related output bits for a 6-bit by 6-bit multiplier. By examining the partial products at each output bit level in Table 2, we conclude that all partial products can be separated into two output groups, so that the final result can be written in a two-term form.
Thus, we rewrite Equation (3) as

$$Q^*(x_q) = \sum_{k=0}^{B-1} \left( w_{2k}\, 2^{2k} + w_{2k+1}\, 2^{2k+1} \right). \tag{4}$$
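The promotion rule behind Table 2 can be checked exhaustively. The Python sketch below, whose helper names are illustrative, places each diagonal term $x_k \cdot x_k$ at level 2k and each promoted cross term $x_k \cdot x_l$ ($k > l$) at level k + l + 1, leaves carries unresolved, and confirms that the weighted levels sum to $x_q^2$ for every 6-bit input, as Equations (3) and (4) require.

```python
def partial_products_by_level(x: int, nbits: int) -> list[list[int]]:
    """Group the partial products of x*x by output bit level, as in Table 2.

    The diagonal term x_k AND x_k (which equals x_k) sits at level 2k.
    Each cross product x_k AND x_l with k > l occurs twice at level k+l,
    so the pair is promoted to a single term at level k+l+1.
    Carries between levels are not propagated here.
    """
    xb = [(x >> k) & 1 for k in range(nbits)]   # x_0 (LSB) ... x_{B-1}
    levels: list[list[int]] = [[] for _ in range(2 * nbits)]
    for k in range(nbits):
        levels[2 * k].append(xb[k] & xb[k])
        for l in range(k):
            levels[k + l + 1].append(xb[k] & xb[l])
    return levels


def square_from_partial_products(x: int, nbits: int) -> int:
    """Weighted sum of all levels; the integer additions resolve every carry."""
    levels = partial_products_by_level(x, nbits)
    return sum(sum(terms) << j for j, terms in enumerate(levels))


# Every 6-bit input reproduces its exact square.
print(all(square_from_partial_products(x, 6) == x * x for x in range(64)))  # True
```

Level 1 is always empty and level 11 receives only carries, matching the w1 and w11 entries of Table 2.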

Equation (3) can thus be converted into a summation of B pairs of output bits, where each pair is composed of an even bit, $w_{2k}$, and an odd bit, $w_{2k+1}$. Based on the two expressions for $Q^*(x_q)$ in Equations (3) and (4), and on the information in Table 2, we express the output of the power estimator in the general form

$$Q(x_q) = \sum_{k=0}^{B-1} \left( A_k\, 2^{2k} + B_k\, 2^{2k+1} \right), \tag{5}$$

where $A_k$ and $B_k$ are implemented by the bit set logic in Figures 2(a) and 2(b). To simplify the implementation, each of the $A_k$ and $B_k$ terms in Equation (5) is formed from at most two bits of the quantized input word. Although this constraint is somewhat arbitrary, the small estimation errors obtained justify it. In particular, the estimates appear to be sufficiently accurate for use in adaptive filtering systems, as suggested by the results presented in the next section. All possible logic functions within the neighborhood of bit k were evaluated in the search for the optimum function; more specifically, all functions of $x_i$ for $i = k+2, k+1, k, k-1, k-2, k-3$ were evaluated. The logic functions $A_k$ and $B_k$ were chosen to minimize the mean absolute error (MAE) of the power estimator, that is,

$$\mathrm{MAE} = E\left[\, \left| \frac{1}{N} \sum_{j=0}^{N-1} \left( x_j^2 - Q\big((x_j)_q\big) \right) \right| \,\right], \tag{6}$$

where N is the number of samples and the expectation was evaluated for inputs with Gaussian and uniform distributions prior to quantization. A similar methodology can be used for other distributions as required.

2.2 Method 1 - Direct Application Method

For the first method, we want to directly replace the traditional multiplier with the proposed bit manipulation logic of Equation (5) while keeping the approximation error of the new method as small as possible. As expressed in Equation (3), the output of an ideal multiplier is the sum of the products of two input bits at each bit level plus the carry from preceding stages.
In the proposed simplified method, formulated in Equation (5), the output instead takes the simplified two-term form. All combinations of a constrained set of logic functions were evaluated by exhaustive search using the mean absolute error criterion for Gaussian and uniform inputs. The candidate logic functions are all of the functions of two input variables, where only the bits within three places of the bit in question were selected as input variables, on the assumption that other bits would be of lesser significance. The results appear to support this conjecture, although it is possible to construct pathological cases where it does not hold.
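The exhaustive search can be sketched at small scale in Python. This is an illustration under stated assumptions, not the authors' procedure: the candidate set is a hypothetical handful of two-input functions (forms that also appear in Equations (7) and (8)), the paper's `+` inside a logic function is read as logical OR, bits outside the word are taken as 0, and in place of the N-sample Gaussian and uniform criterion of Equation (6) the code computes the per-sample mean absolute error exactly over all $2^B$ equally likely input codes.

```python
from itertools import product
from typing import Callable

# A logic function receives a bit accessor and the pair index k, returns 0 or 1.
BitAccessor = Callable[[int], int]
LogicFn = Callable[[BitAccessor, int], int]

# Hypothetical candidate subset; "|" reads the paper's "+" as OR (assumption).
CANDIDATES: dict[str, LogicFn] = {
    "x[k]": lambda bit, k: bit(k),
    "x[k+1]": lambda bit, k: bit(k + 1),
    "x[k+1] | x[k]": lambda bit, k: bit(k + 1) | bit(k),
    "x[k] & x[k-1]": lambda bit, k: bit(k) & bit(k - 1),
    "x[k] & x[k-2]": lambda bit, k: bit(k) & bit(k - 2),
    "x[k+1] & x[k-1]": lambda bit, k: bit(k + 1) & bit(k - 1),
}


def approx_square(x: int, nbits: int, fa: LogicFn, fb: LogicFn) -> int:
    """Equation (5): sum over k of A_k * 2^(2k) + B_k * 2^(2k+1)."""
    def bit(i: int) -> int:
        return (x >> i) & 1 if 0 <= i < nbits else 0   # out-of-word bits read as 0
    return sum((fa(bit, k) << (2 * k)) + (fb(bit, k) << (2 * k + 1))
               for k in range(nbits))


def exact_code_mae(nbits: int, fa: LogicFn, fb: LogicFn) -> float:
    """Per-sample MAE over all 2^B equally likely codes (a simplification of Eq. (6))."""
    n_codes = 2 ** nbits
    return sum(abs(x * x - approx_square(x, nbits, fa, fb))
               for x in range(n_codes)) / n_codes


def search(nbits: int) -> tuple[float, str, str]:
    """Try every (A_k, B_k) pair; return (MAE, A_k name, B_k name) of the best."""
    return min((exact_code_mae(nbits, fa, fb), name_a, name_b)
               for (name_a, fa), (name_b, fb) in product(CANDIDATES.items(), repeat=2))


print(search(8))
```

Because the criterion and the candidate set differ from those used for Tables 3 and 4, the pair returned by `search(8)` should not be read as reproducing those tables; widening `CANDIDATES` to all two-input functions of $x_{k+2}, \ldots, x_{k-3}$ would approach the full search described above.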

The functions providing the lowest MAE for Method 1, with B ranging from 6 to 20 bits, are listed in order in Table 3. The best quantizers for the two example distributions are identical. Under the indicated constraints, the selected quantizer is

$$Q_1(x_q) = \sum_{k=0}^{B-1} \left[ (x_{k+1} + x_k)\, 2^{2k} + (x_k \cdot x_{k-1})\, 2^{2k+1} \right]. \tag{7}$$

Although the optimal logic function for Method 1 in Equation (7) bears no easily recognizable resemblance to the form in Table 2, the small MAE for both input distributions partially justifies the constraints on the logic functions.

2.3 Method 2 - Method for Use with PTQ

The second method is used to improve the performance of a power-of-two quantizer when a division operation is needed. In this case, the bit manipulation logic is designed so that the approximation errors in this stage help to cancel the errors in the subsequent power-of-two quantization stage. The linearized analysis of the power-of-two quantizer in [11] implies that its transfer characteristic can be approximated as $q(u) \approx \frac{2}{3} u$, so the quantizer tends to underestimate its input. In order to reduce the error generated by this linearized approximation, the bit set logic for this method must introduce an overestimation error to cancel the error produced by the power-of-two quantizer. We applied the same search procedure to find the logic function for the second method. The functions providing the lowest errors are listed in Table 4. The best quantizer for both distributions has the form

$$Q_2(x_q) = \sum_{k=0}^{B-1} \left[ (x_k \cdot x_{k-1})\, 2^{2k} + x_k\, 2^{2k+1} \right]. \tag{8}$$

In particular, note that the functions $A_k = x_{k+1}$ and $B_k = x_k$ provide performance close to that of the best quantizers and are trivial to implement.

3 Performance Evaluation

Simulations of the proposed functions for Gaussian and uniform inputs were performed. The input sequences generated for our simulation are assumed to be scaled for representation within the range between -1 and 1 with limited or no clipping.
For each simulation, the input vector size N is 32. The final simulation result is the average of 500 simulation runs.
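A Python sketch of $Q_1$ (Equation (7)) and $Q_2$ (Equation (8)) inside the simulation setup just described (N = 32, 500 runs) follows. Several details are assumptions of the sketch rather than statements from the paper: `+` in Equation (7) is read as logical OR, bits below the LSB are 0, inputs are Gaussian, clipped to [-1, 1], and quantized to an unsigned B = 10-bit magnitude code, each estimate is rescaled by $2^{-2B}$ so it can be compared with the mean of $x^2$, and a 20-bit power-of-two quantizer is applied to the Method 2 estimate.

```python
import math
import random


def bit(x: int, i: int) -> int:
    """Bit i of the unsigned code x; negative indices (below the LSB) read as 0."""
    return (x >> i) & 1 if i >= 0 else 0


def q1(x: int, nbits: int) -> int:
    """Equation (7): A_k = x_{k+1} OR x_k (OR reading assumed), B_k = x_k AND x_{k-1}."""
    return sum(((bit(x, k + 1) | bit(x, k)) << (2 * k))
               + ((bit(x, k) & bit(x, k - 1)) << (2 * k + 1))
               for k in range(nbits))


def q2(x: int, nbits: int) -> int:
    """Equation (8): A_k = x_k AND x_{k-1}, B_k = x_k."""
    return sum(((bit(x, k) & bit(x, k - 1)) << (2 * k))
               + (bit(x, k) << (2 * k + 1))
               for k in range(nbits))


def ptq(u: float, bq: int = 20) -> float:
    """Equation (2) restricted to u >= 0."""
    if u >= 1.0:
        return 1.0
    if u < 2.0 ** (1 - bq):
        return 0.0
    return 2.0 ** (math.frexp(u)[1] - 1)


def quantize(v: float, nbits: int) -> int:
    """Unsigned B-bit magnitude code of |v|, assuming |v| <= 1 (full scale clips)."""
    return min(int(abs(v) * 2 ** nbits), 2 ** nbits - 1)


def average_estimates(sigma: float, nbits: int = 10, n: int = 32,
                      runs: int = 500, seed: int = 0) -> dict[str, float]:
    """Average N-sample power estimates over many runs of clipped Gaussian input."""
    rng = random.Random(seed)
    scale = 4.0 ** nbits                 # Q(x_q) is in units of 2^(-2B)
    totals = {"ideal": 0.0, "method1": 0.0, "method2": 0.0, "method2_ptq": 0.0}
    for _ in range(runs):
        xs = [max(-1.0, min(1.0, rng.gauss(0.0, sigma))) for _ in range(n)]
        codes = [quantize(v, nbits) for v in xs]
        est2 = sum(q2(c, nbits) for c in codes) / (n * scale)
        totals["ideal"] += sum(v * v for v in xs) / n
        totals["method1"] += sum(q1(c, nbits) for c in codes) / (n * scale)
        totals["method2"] += est2
        totals["method2_ptq"] += ptq(est2)
    return {key: val / runs for key, val in totals.items()}


print(average_estimates(0.2))
```

Sweeping `sigma` produces curves of the same kind as Figures 3 and 4, although the exact values depend on the representation choices assumed here.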

3.1 Simulation Results

A comparison of the performance of the proposed algorithms with that of the ideal finite word length implementation, and with an implementation using an ideal square computation followed by a power-of-two quantizer [9], is shown in Figure 3 for Gaussian inputs, where the solid line is the ideal case. The curve immediately below the ideal one is the result of Method 1, which is followed by the quantized ideal case. The second lowest curve is the method in [9], and the lowest curve is the method using a power-of-two quantizer only. The results for the direct use approximation (Method 1) are almost indistinguishable from the ideal case, while the simplified method for use with a power-of-two quantizer provides better performance than the power-of-two quantizer alone.

A similar comparison of the various algorithms with uniformly distributed inputs is shown in Figure 4. Once again, the solid smooth curve is the ideal estimate, which is overlapped by the curve of the quantized ideal case. The curves of both methods oscillate around the ideal curve, but the oscillations of the method in [9] are larger. The results for the direct use approximation are in good agreement with the ideal case, although not as close as in the Gaussian case. The simplified method for use with a power-of-two quantizer is once again superior to the power-of-two quantizer alone, and is closer to the ideal than in the Gaussian case.

In order to confirm the utility of these techniques in the proposed adaptive signal processing application, two example scenarios were simulated using the technique in which an approximation of the square is used in conjunction with a power-of-two quantizer (Method 2). The result of the first example is shown in Figure 5.
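The normalization used in these adaptive examples can be illustrated with a small Python sketch comparing a conventional NLMS update with one in which the division by the power estimate p is replaced by a shift, that is, division by $2^{\lfloor \log_2 p \rfloor}$. It is a hypothetical noise-free system-identification toy, not the equalizer experiments of Figures 5 and 6: the 4-tap unknown system, the step size, the input variance, and the use of an exact sum of squares (rather than the Method 2 logic on fixed-point data) are assumptions, and the saturation and dead zone of Equation (2) are ignored.

```python
import math
import random


def identify(h: list[float], use_ptq: bool, mu: float = 0.25,
             steps: int = 3000, seed: int = 1) -> list[float]:
    """Noise-free system identification with NLMS or shift-normalized NLMS (toy)."""
    rng = random.Random(seed)
    taps = len(h)
    w = [0.0] * taps
    x = [0.0] * taps                                   # delay line, x[0] is the newest sample
    for _ in range(steps):
        x = [rng.gauss(0.0, 0.3)] + x[:-1]             # assumed white Gaussian input
        d = sum(hi * xi for hi, xi in zip(h, x))       # output of the unknown system
        e = d - sum(wi * xi for wi, xi in zip(w, x))   # a priori error
        p = sum(xi * xi for xi in x)                   # exact sum of squares, Equation (1)
        if p == 0.0:
            continue
        if use_ptq:
            # Divide by 2**floor(log2 p) instead of p: only an exponent shift.
            # Saturation and the dead zone of Equation (2) are ignored here.
            shift = math.frexp(p)[1] - 1
            g = math.ldexp(mu * e, -shift)
        else:
            g = mu * e / p                              # conventional NLMS normalization
        w = [wi + g * xi for wi, xi in zip(w, x)]
    return w


h_true = [0.5, -0.3, 0.2, 0.1]
for flag in (False, True):
    w = identify(h_true, use_ptq=flag)
    print("PTQ" if flag else "NLMS", max(abs(a - b) for a, b in zip(w, h_true)))
```

Because $2^{\lfloor \log_2 p \rfloor} \le p < 2 \cdot 2^{\lfloor \log_2 p \rfloor}$, the shift-based update behaves like NLMS with an effective step size between mu and 2 mu, so keeping mu at or below 1 leaves it inside the usual NLMS stability range (0, 2).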
This example compares a simplified normalized 11-tap transversal adaptive equalizer with the traditional NLMS implementation, both adapting to a raised cosine channel with shaping factor W = 3.1 [7] and additive Gaussian noise. It can be seen that the new algorithm provides performance comparable to that of the ideal implementation. Both the convergence rate and the steady-state error of the new method are slightly higher than those of the traditional method because the same step size numerator was used in the comparison; the two perform comparably after a minor adjustment of this coefficient. The simulation results of the second example, which compares a 20-tap simplified normalized adaptive lattice equalizer with alternative lattice implementations, are shown in Figure 6. In this case, the telephone network channel of [5] is used. The performance of the simplified algorithm is comparable to that of the traditional lattice algorithm. The normalized lattice algorithm in [9] has the same convergence rate as the traditional lattice algorithm, but a higher steady-state error. The simplified normalized lattice algorithm, which uses the new quantizer, has a convergence rate comparable to that of the other two and a steady-state error almost identical to that of the classical lattice algorithm.

We also investigated the effects of finite input data wordlength on the performance of both methods using the MAE criterion. The results are listed in Table 5. The same 20-bit power-of-two quantizer was applied in every case for Method 2. The results show that a larger number of input bits does not guarantee a smaller MAE for these methods and input distributions. The quantization effects of the power-of-two quantizer on the performance of Method 2 with various input data wordlengths were also investigated. The MAE results are shown in Figures 7 and 8 for Gaussian and uniform input distributions, respectively. The logic functions for the power estimator are taken from the optimal case, $B_k = x_k$ and $A_k = x_k \cdot x_{k-1}$, for both input distributions. The results show that the MAE starts to increase when the number of power-of-two quantizer bits is reduced below 8 for both input distributions.

3.2 Analytical Results

General Analysis

Let $Q(x)$ be the output of one of the new square quantizers, $Q_1(x)$ or $Q_2(x)$. The mean absolute error (MAE) of the approximation is given by

$$E[|e|] = E\left[\, |x^2 - Q(x)| \,\right]. \tag{9}$$

We assume that the input word length is B + 1 bits, so that the total number of levels is $2^{B+1}$, of which $2^B$ are positive. The mean of the approximated output is given by

$$E[Q(x)] = \int_{-\infty}^{\infty} Q(x)\, p(x)\, dx = 2 \sum_k L_k \int_{I_k} p(x)\, dx, \tag{10}$$

where $I_k$ is the k-th input quantization interval and $L_k$ is the (constant) output level in the k-th interval; the sum runs over the positive intervals, and the factor of 2 accounts for the symmetric negative half. The MAE then reduces to

$$E[|e|] = 2 \sum_k \int_{I_k} |x^2 - L_k|\, p(x)\, dx = 2 \sum_k \left( \int_{I_k^+} (x^2 - L_k)\, p(x)\, dx + \int_{I_k^-} (L_k - x^2)\, p(x)\, dx \right), \tag{11}$$

where $I_k^+$ is the region of $I_k$ where $x^2 - L_k \ge 0$ and $I_k^-$ is the region where $x^2 - L_k < 0$. These regions can be determined in a straightforward manner for each interval. The mean square error (MSE) of the quantizers is given by

$$E[e^2] = E\left[ (x^2 - Q(x))^2 \right] = E[x^4] - 2E[x^2 Q(x)] + E[Q^2(x)]. \tag{12}$$

The last term can be rewritten as

$$E[Q^2(x)] = \int_{-\infty}^{\infty} Q^2(x)\, p(x)\, dx = 2 \sum_k L_k^2 \int_{I_k} p(x)\, dx. \tag{13}$$

The first term can be expressed in a similar manner as

$$E[x^4] = \int_{-\infty}^{\infty} x^4\, p(x)\, dx = 2 \sum_k \int_{I_k} x^4\, p(x)\, dx, \tag{14}$$

and the cross term is

$$E[x^2 Q(x)] = \int_{-\infty}^{\infty} x^2 Q(x)\, p(x)\, dx = 2 \sum_k L_k \int_{I_k} x^2\, p(x)\, dx, \tag{15}$$

so that

$$E[e^2] = 2 \sum_k \int_{I_k} \left( x^4 - 2x^2 L_k + L_k^2 \right) p(x)\, dx = 2 \sum_k \int_{I_k} (x^2 - L_k)^2\, p(x)\, dx. \tag{16}$$

Analysis for Gaussian Inputs

The performance of these algorithms for Gaussian inputs can be studied by applying the results derived above for arbitrary input distributions. We assume that the input is an independent and identically distributed (i.i.d.) zero-mean Gaussian sequence with variance $\sigma_x^2$. The mean of the approximation output is then

$$E[Q(x)] = \frac{2}{\sqrt{2\pi}\,\sigma_x} \sum_k L_k \int_{I_k} e^{-x^2/(2\sigma_x^2)}\, dx. \tag{17}$$

Letting $I_k = (x_k, x_{k+1}]$,

$$E[Q(x)] = \sum_k L_k \left[ \operatorname{erf}\!\left( \frac{x_{k+1}}{\sqrt{2}\,\sigma_x} \right) - \operatorname{erf}\!\left( \frac{x_k}{\sqrt{2}\,\sigma_x} \right) \right], \tag{18}$$

where $\operatorname{erf}(u) = \frac{2}{\sqrt{\pi}} \int_0^u e^{-t^2}\, dt$. According to Equation (11), we have

$$E[|e|] = \frac{2}{\sqrt{2\pi}\,\sigma_x} \sum_k \int_{I_k} |x^2 - L_k|\, e^{-x^2/(2\sigma_x^2)}\, dx = \frac{2}{\sqrt{2\pi}\,\sigma_x} \sum_k \left( \int_{I_k^+} (x^2 - L_k)\, e^{-x^2/(2\sigma_x^2)}\, dx + \int_{I_k^-} (L_k - x^2)\, e^{-x^2/(2\sigma_x^2)}\, dx \right), \tag{19}$$

which can be evaluated in a straightforward manner for most wordlengths B of interest.

Analysis for Uniform Inputs

In this section we assume that the inputs are uniformly distributed over the interval [-a, a], with zero mean and variance $a^2/3$. The mean of the approximation output is

$$E[Q(x)] = 2 \sum_k L_k \int_{I_k} \frac{1}{2a}\, dx. \tag{20}$$

Letting $I_k = (x_k, x_{k+1}]$,

$$E[Q(x)] = \frac{1}{a} \sum_k L_k (x_{k+1} - x_k), \tag{21}$$

where only the portions of the intervals within [0, a] contribute, since p(x) = 0 for |x| > a. We can also find that the MAE is

$$E[|e|] = \frac{1}{a} \sum_k \int_{I_k} |x^2 - L_k|\, dx = \frac{1}{a} \sum_k \left( \int_{I_k^+} (x^2 - L_k)\, dx + \int_{I_k^-} (L_k - x^2)\, dx \right). \tag{22}$$

Power of Two Quantizer Output

The performance of the cascade of one of the square quantizers with a power-of-two quantizer can be approximated by assuming that the input to the power-of-two quantizer $q(\cdot)$ is Gaussian with mean $\mu = E[Q(x)]$ and variance $\sigma^2 = (E[Q^2(x)] - \mu^2)/N$, which are the mean and variance of the accumulated and scaled estimator logic outputs $Q(x)$, respectively. While this assumption is clearly not ideal, it does provide reasonable results, particularly for large N. Letting $Q_T(x)$ denote a $B_q$-bit power-of-two quantizer function, we find that

$$E[Q_T(x)] = \int_{-\infty}^{\infty} Q_T(x)\, p(x)\, dx = \sum_{k=0}^{B_q} L_k^T \int_{I_k} p(x)\, dx = \frac{1}{2} \sum_{k=0}^{B_q} L_k^T \left[ \operatorname{erf}\!\left( \frac{x_{k+1} - \mu}{\sqrt{2}\,\sigma} \right) - \operatorname{erf}\!\left( \frac{x_k - \mu}{\sqrt{2}\,\sigma} \right) \right], \tag{23}$$

where $L_k^T$ is the (constant) output level in the k-th interval, derived from Equation (2).
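Equations (22) and (23) can be evaluated directly. In the Python sketch below, whose function names are hypothetical, the caller supplies the positive interval edges and levels for Equation (22), and each integral of $|x^2 - L_k|$ is taken in closed form by splitting the interval at $x = \sqrt{L_k}$. For Equation (23), the bins are the power-of-two levels of Equation (2) on both sides of zero (the negative bins matter only when the Gaussian approximation puts mass below zero), and the Gaussian interval probabilities come from `math.erf`.

```python
import math


def _abs_gap_integral(lo: float, hi: float, level: float) -> float:
    """Integral of |x^2 - level| over [lo, hi], for 0 <= lo <= hi and level >= 0."""
    def f(t: float) -> float:                 # antiderivative of x^2 - level
        return t ** 3 / 3.0 - level * t
    r = math.sqrt(level)                      # x^2 - level changes sign at x = r
    if r <= lo:                               # whole interval lies in I_k^+
        return f(hi) - f(lo)
    if r >= hi:                               # whole interval lies in I_k^-
        return f(lo) - f(hi)
    return (f(lo) - f(r)) + (f(hi) - f(r))    # split into I_k^- and I_k^+


def mae_uniform(edges: list[float], levels: list[float], a: float) -> float:
    """Equation (22): (1/a) * sum over k of the integral of |x^2 - L_k| over I_k.

    edges[k] and edges[k+1] bound the positive interval I_k; intervals are
    clipped to [0, a] because p(x) = 0 outside [-a, a].
    """
    total = 0.0
    for lo, hi, level in zip(edges[:-1], edges[1:], levels):
        lo, hi = min(lo, a), min(hi, a)
        if hi > lo:
            total += _abs_gap_integral(lo, hi, level)
    return total / a


def ptq_bins(bq: int) -> list[tuple[float, float, float]]:
    """(lower edge, upper edge, output level) for the B_q-bit PTQ of Equation (2)."""
    dead = 2.0 ** (1 - bq)
    bins = [(-dead, dead, 0.0)]                                # dead zone
    for m in range(1, bq):                                     # 2^-m <= |u| < 2^(1-m)
        lo, hi = 2.0 ** -m, 2.0 ** (1 - m)
        bins += [(lo, hi, lo), (-hi, -lo, -lo)]
    bins += [(1.0, math.inf, 1.0), (-math.inf, -1.0, -1.0)]   # saturation
    return bins


def ptq_mean_gaussian(mu: float, sigma: float, bq: int) -> float:
    """Equation (23): E[Q_T] when the PTQ input is Gaussian with mean mu, std sigma."""
    def erf_term(t: float) -> float:
        return math.erf((t - mu) / (math.sqrt(2.0) * sigma))
    return 0.5 * sum(level * (erf_term(hi) - erf_term(lo))
                     for lo, hi, level in ptq_bins(bq))


print(mae_uniform([0.0, 1.0], [0.25], 1.0))    # integral of |x^2 - 1/4| over [0, 1]
print(ptq_mean_gaussian(0.75, 1e-6, 20))       # input concentrated inside [0.5, 1)
```

In the setting of the paper, `mu` and `sigma` would be $E[Q(x)]$ from Equation (10) and $\sqrt{(E[Q^2(x)] - \mu^2)/N}$ using Equation (13).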

Figure 9 shows a comparison of the simulation and analytical results with Gaussian inputs for N = 32. For both proposed methods, the agreement is excellent. The results for the system with the power-of-two quantizer show excellent agreement even for values as low as N = 4, and the results for the individual square quantizers are excellent for all values of N. Similar agreement is obtained for other input distributions, such as the uniform distribution, shown in Figure 10.

4 Implementation Results

4.1 FPGA Implementation

The new methods can be used to implement elements of an adaptive signal processing system using commercially available FPGAs, such as the Xilinx XC3100 and XC4000 series parts. The simplicity of the logic for the square operation implies that at most a single configurable logic block (CLB) column will be required to implement the square approximation, while the simplest functions require no additional logic. Several parallel adaptive algorithm tap update operations can be implemented on a single FPGA if a power-of-two quantizer divider is used for normalization and a power-of-two quantizer multiplier is used for the error-data multiplication. A typical configuration may, for example, require 8 bits for the data and error terms and 16 bits for the tap weights. With proper pipelining, the update operations can be performed at 30 MHz in the XC3100 series parts and at 38 MHz in the XC4000 series parts. Only 42N + 74 XC3100 CLBs or 21N + 53 XC4000 CLBs are required to implement the tap weight update for a normalized LMS algorithm, as shown in Table 6. This implies that up to 8 parallel updates may be possible on a single XC3195 FPGA, and up to 16 tap updates on an XC4010 FPGA (although routing constraints will make approaching this limit difficult). Such density would clearly be difficult to obtain with conventional implementation schemes, since an 8-bit by 8-bit parallel multiplier alone requires approximately 64 CLBs.

4.2 Full Custom Implementation

The advantages of the new methods can also be quantified by comparing a CMOS implementation of the proposed methods with traditional methods. Prototype systems were designed and fabricated in the MOSIS 2.0 μm double-level metal SCMOS process. A layout of an implementation of the algorithm for direct use is shown in Figure 11. The input word size was 10 bits, with a 20-bit accumulator. A Brent-Kung adder design was used for the accumulator, while a Baugh-Wooley

array multiplier was used for the comparison system. Improved performance (particularly in speed) could clearly have been attained with more optimized primitive modules, but these results serve to illustrate the advantages of the proposed systems. The results are summarized in Table 7. It can be seen that the new methods are approximately 57.5% smaller in area and 196% faster than the traditional implementation. The substantial area and speed advantages of the new methods are even more apparent when comparisons are made to normalization systems using high-speed division circuitry. Such division implementations typically require two and a half times the area of the equivalent array multiplier and impose two to two and a half times the delay [1]. Under these assumptions, the comparisons shown in Table 8 can be made. The power-of-two division method is approximately 71% smaller in area and 248% faster than a traditional implementation.

5 Conclusion

In this paper, we have presented easily implemented methods for estimating signal power using the sum-of-squares approach. These methods can be applied to a variety of algorithms, such as normalized adaptive algorithms. The selection of the optimum logic functions for both methods was based on an MAE criterion. The optimum functions depend on B (the number of bits) and k (the bit position), which makes the general optimization problem difficult to solve. Under the restrictions that the functions $A_k$ and $B_k$ be composed of no more than two terms and be similar in form for all k, programmable and full-custom circuits based on the new methods were developed and were shown to compare favorably with the traditional strategies.

References

[1] H. Ahmed and K.-H. Fu. A VLSI array CORDIC architecture. In IEEE Int. Conf. Acoust., Speech, Signal Processing, Glasgow, Scotland, May.
[2] C. F. N. Cowan and P. M. Grant. Adaptive Filters. Prentice-Hall.
[3] D. L. Duttweiler. Adaptive filter performance with nonlinearities in the correlation multiplier. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-30(4), Aug.
[4] J. B. Evans and B. Liu. A CMOS implementation of a variable step size digital adaptive filter. In IEEE Int. Conf. Acoust., Speech, Signal Processing, May.
[5] J. B. Evans, P. Xue, and B. Liu. Analysis and implementation of variable step size adaptive algorithms. IEEE Trans. Signal Processing, Aug.
[6] S. Gazor and B. Farhang-Boroujeny. Quantization effects in transform-domain normalized LMS algorithms. IEEE Trans. Circuits and Syst. II, 39(1):1-7, Jan.
[7] S. Haykin. Adaptive Filter Theory. Prentice-Hall.
[8] M. L. Honig and D. G. Messerschmitt. Adaptive Filters: Structures, Algorithms, and Applications. Kluwer Academic.
[9] M. J. Reed and B. Liu. Analysis of simplified gradient adaptive lattice algorithms using power-of-two quantization. In IEEE Int. Symp. Circuits and Syst., May.
[10] B. Widrow and S. D. Stearns. Adaptive Signal Processing. Prentice-Hall.
[11] P. Xue and B. Liu. Adaptive equalizer using finite-bit power-of-two quantizer. IEEE Trans. Acoust., Speech, Signal Processing, ASSP-34, Dec.

Table 1: Formation of the product of two 6-bit words by shift-and-add.

                                        x5    x4    x3    x2    x1    x0
                                   ×    x5    x4    x3    x2    x1    x0
------------------------------------------------------------------------
                                      x5x0  x4x0  x3x0  x2x0  x1x0  x0^2
                                x5x1  x4x1  x3x1  x2x1  x1^2  x0x1
                          x5x2  x4x2  x3x2  x2^2  x1x2  x0x2
                    x5x3  x4x3  x3^2  x2x3  x1x3  x0x3
              x5x4  x4^2  x3x4  x2x4  x1x4  x0x4
        x5^2  x4x5  x3x5  x2x5  x1x5  x0x5
------------------------------------------------------------------------
   w11   w10    w9    w8    w7    w6    w5    w4    w3    w2    w1    w0

Table 2: Partial products of a 12-bit product and their related output bits.

Even bit | Partial products                  | Odd bit | Partial products
w0       | x0^2                              | w1      | 0
w2       | x1^2 + x1x0 + carry               | w3      | x2x0 + carry
w4       | x2^2 + x3x0 + x2x1 + carry        | w5      | x4x0 + x3x1 + carry
w6       | x3^2 + x5x0 + x4x1 + x3x2 + carry | w7      | x5x1 + x4x2 + carry
w8       | x4^2 + x5x2 + x4x3 + carry        | w9      | x5x3 + carry
w10      | x5^2 + x5x4 + carry               | w11     | carry

Table 3: Comparison of the simulation MAE of the best logic functions for Method 1 for the Gaussian and uniform distributions.

Index | Logic function B_k, A_k (Gaussian) | MAE (Gaussian) | Logic function B_k, A_k (uniform) | MAE (uniform)

linear quantizer, x_k·x_{k-1}, x_{k+1}+x_k, x_k·x_{k-1}, x_{k+1}+x_k, x_k·x_{k-2}, x_k, x_{k+1}, x_{k+1}+x_k, x_k·x_{k-1}, x_k, x_{k+1}, x_k, x_k·x_{k-3}, x_k, x_{k+1}·x_k, x_{k+1}+x_k, x_k·x_{k-2}, x_{k+1}+x_k, x_k·x_{k-3}, x_k·x_{k-?}, x_{k+1}, x_{k+1}+x_k, x_k·x_{k-2}, x_k·x_{k-?}, x_k·x_{k-1}, x_k·x_{k-?}, x_k·x_{k-1}, x_k, x_k·x_{k-3}, x_k·x_{k-?}, x_k·x_{k-2}, x_k, x_k·x_{k-2}, x_k·x_{k-?}, x_{k+1}·x_{k-1}, x_{k+1}+x_k, x_{k+1}, x_k, x_{k+1}·x_{k-2}, x_{k+1}+x_k

Table 4: Comparison of the simulation MAE of the best logic functions for Method 2 for the Gaussian and uniform distributions.

Index | Logic function B_k, A_k (Gaussian) | MAE (Gaussian) | Logic function B_k, A_k (uniform) | MAE (uniform)

1, x_k, x_k·x_{k-?}, x_k, x_k·x_{k-?}, x_k, x_k·x_{k-?}, x_k, x_k, x_k, x_k·x_{k-?}, x_k, x_{k+1}·x_{k-?}, x_{k+1}+x_k, x_{k+1}·x_k, x_k, x_{k+1}·x_{k-?}, x_{k+1}+x_k, x_{k+1}·x_{k-?}, x_k, x_{k+1}·x_{k-?}, x_{k+1}+x_k, x_{k+1}·x_{k-?}, x_k, x_{k+1}·x_{k-?}, x_{k+1}+x_k, x_{k+1}·x_{k-?}, x_k, x_{k+1}·x_{k-?}, x_{k+1}+x_k, x_{k+1}·x_{k-?}, x_k, x_{k+1}·x_{k-?}, x_{k+1}+x_k, x_{k+1}·x_{k-?}, x_k, x_{k+1}·x_k, x_k, x_k, x_k, x_{k+1}·x_k

Table 5: Comparison of the simulation MAE of the best design for both methods with different numbers of input bits, for the Gaussian and uniform distributions. For Method 2, a 20-bit power-of-two quantizer is used.

Number of input bits | Method 1 MAE (Gaussian) | Method 1 MAE (uniform) | Method 2 MAE (Gaussian) | Method 2 MAE (uniform)

Table 6: Adaptive filter update implementation, Xilinx XC3100 and XC4000 series FPGAs.

FPGA | Operation | Square | Accumulate | Shifter | Total
XC3100 series | normalize
XC3100 series | weight update
XC4000 series | normalize
XC4000 series | weight update

Table 7: L2 Norm Estimation Circuits (2.0 μm DLM CMOS)

System | Area (mm²) | Speed (ns)
original system (multiply, add, latch)
new system (logic, add, latch)

Table 8: Normalization Circuits (2.0 μm DLM CMOS)

Method | Area (mm²) | Speed (ns)
full-precision division method
power-of-two quantization division

Figure 1: Traditional signal power estimation circuit based on the sum-of-squares. (Block diagram: Input → Array Multiplier (B-bit by B-bit) → Accumulator (M-bit) → Output.)

Figure 2: New sum-of-squares signal power estimation circuits (a) for direct application; (b) for the method with a power-of-two quantizer. (Block diagrams: (a) Input → Bit Set Logic → Accumulator → Output; (b) Input → Bit Set Logic → Accumulator → Power-of-two Quantizer → Output.)

Figure 3: Simulation results for various sum-of-squares signal power estimation methods with Gaussian input signal. (Panel: Power Estimator Performance, Gaussian Input; x-axis: Input Standard Deviation; y-axis: Output Power Estimate; curves: Ideal, Quantized Ideal, Simplified Estimator, Power of Two Quantizer, Simplified Estimator with PTQ.)

Figure 4: Simulation results for various sum-of-squares signal power estimation methods with uniform input signal. (Panel: Power Estimator Performance, Uniform Input; x-axis: Input Standard Deviation; y-axis: Output Power Estimate; curves: Ideal, Quantized Ideal, Simplified Estimator, Power of Two Quantizer, Simplified Estimator with PTQ.)

Figure 5: Simulation results, adaptive transversal filter with traditional and new normalization methods. (Panel: Performance of Adaptive Algorithms; x-axis: Samples; y-axis: Mean Square Error; curves: NLMS, Simplified NLMS, Minimum MSE.)

Figure 6: Simulation results, adaptive lattice filter with traditional and new normalization methods. (Panel: Performance of Adaptive Algorithms; x-axis: Samples; y-axis: Mean Square Error; curves: Lattice, Lattice with Various Simplifications, Lattice with PTQ Divide, Minimum MSE.)

Figure 7: Dependence of the MAE of Method 2 on the number of input bits and the number of power-of-two quantizer bits for the Gaussian distribution. (Panel: Method 2 with PTQ, Gaussian Input (32 input elements); x-axis: Number of Bits (Power-of-Two Quantizer); y-axis: Mean Absolute Error (MAE); curves: 8, 10, 12, 14, and 16 input bits.)

Figure 8: Dependence of the MAE of Method 2 on the number of input bits and the number of power-of-two quantizer bits for the uniform distribution. (Panel: Method 2 with PTQ, Uniform Input (32 input elements); x-axis: Number of Bits (Power-of-Two Quantizer); y-axis: Mean Absolute Error (MAE); curves: 8, 10, 12, 14, and 16 input bits.)

Figure 9: Comparison of analytical and simulation results for sum-of-squares signal power estimation methods with Gaussian input signal. (Panel: Power Estimator Performance, Gaussian Input (10 input bits); x-axis: Input Standard Deviation; y-axis: Output Power Estimate; curves: Method 1, Theory & Simulation; Method 2, Theory & Simulation.)

Figure 10: Comparison of analytical and simulation results for sum-of-squares signal power estimation methods with uniform input signal. (Panel: Power Estimator Performance, Uniform Input (10 input bits); x-axis: Input Standard Deviation; y-axis: Output Power Estimate; curves: Method 1, Theory & Simulation; Method 2, Theory & Simulation.)

Figure 11: Power estimation chip layout.
