Implementation of Digital FIR Filter Using Improved Table Look-Up Scheme for Residue Number System

G. Suresh, G. Indira Devi, P. Pavankumar

Abstract

The improved table look-up Residue Number System (RNS) and the Dynamic Distributed Arithmetic Algorithm (DDAA) are becoming increasingly important in modern telecommunication and multimedia applications because they offer advantages in area, power consumption and speed. This paper presents a general conversion procedure based on the $\{2^n - 1, 2^n, 2^n + 1\}$ moduli set. Based on the improved table look-up RNS and the DDAA, an architecture that efficiently implements the digital FIR filter is synthesized for the Xilinx VirtexE. Reductions of up to 82.85% in the number of slices, up to 100% in the number of flip-flops and up to 87.21% in the number of Look-Up Tables (LUTs) are observed. The speed of the filter is improved by 30.98%.

Keywords: Residue arithmetic, Distributed arithmetic, FIR filters, high speed, VLSI.

1 Introduction

The advantages of Residue Number System (RNS) processing are discussed in several publications and books [5], [8], [9]. Error-free computation, simplified and fast addition and multiplication, and the possibility of parallel architectures are among the most important. The RNS decomposes a given dynamic range into slices of smaller range on which the computation can be implemented efficiently in parallel [1], [2]. However, the input and output converters constitute a fixed overhead on the total area, delay and power dissipation [6], [7]. For these reasons the output conversion, which is generally performed using the Chinese Remainder Theorem (CRT) or Mixed Radix Conversion (MRC), remains a crucial point in the realization of competitive RNS subsystems and is therefore one of the main topics of recent RNS research.
Previous work demonstrated that FIR filters implemented in the Residue Number System (RNS) outperform filters realized in the traditional binary system in terms of area and power dissipation [3], [4]. This paper deals with the design of a digital FIR filter based on an improved table look-up scheme for the CRT and on Dynamic Distributed Arithmetic (DDA). The block diagram of the digital FIR filter based on DDA and the CRT is shown in Fig. 1.

X(n) -> Binary to Residue converter -> Dynamic Distributed Arithmetic based FIR filter -> Residue to Binary converter -> Y(n)

Figure 1: Block diagram of digital FIR filter

2 Residue Number System

A residue number system is defined by a set of N integer constants $\{m_1, m_2, m_3, \ldots, m_N\}$, referred to as the moduli. Let $M$ be the least common multiple of all the $m_i$. Any integer $X$ with $0 \le X < M$ can be represented in this residue number system as a set of N smaller integers $\{x_1, x_2, x_3, \ldots, x_N\}$ with $x_i = X \bmod m_i$, the residue of $X$ with respect to that modulus. The only requirement on the moduli is that they be pairwise relatively prime [5]; $M$ is then the product of all the $m_i$. The moduli need not be prime themselves, provided that no two of them share a common factor. An example of such a moduli set, used in this work, is $\{2^n - 1, 2^n, 2^n + 1\}$.

Using the CRT, a binary number $X$ can be recovered from its residue representation by

$$X = \left| \sum_{j=1}^{n} \hat{m}_j \left| \frac{R_j}{\hat{m}_j} \right|_{m_j} \right|_M \qquad (1)$$

where $X$ is the decimal representation of the number, $R_j$ is the $j$-th residue digit of $X$, i.e. $R_j = X \bmod m_j$, $m_j$ is the $j$-th modulus, $\hat{m}_j = M / m_j$ and $M = \prod_{j} m_j$. Here $\hat{m}_j$ and $\left| 1/\hat{m}_j \right|_{m_j}$ are constants, so equation (1) can be rewritten as

$$X = \left| \sum_{j=1}^{n} C_j R_j \right|_M \qquad (2)$$

where $C_j = \hat{m}_j \left| 1/\hat{m}_j \right|_{m_j}$. The quantity $a_j = \left| 1/\hat{m}_j \right|_{m_j}$ is called the multiplicative inverse of $\hat{m}_j$ modulo $m_j$. It can be calculated using

$$a_j = \left| \frac{1}{\hat{m}_j} \right|_{m_j} = (\hat{m}_j)^{m_j - 2} \bmod m_j$$

This formula holds when $m_j$ is prime (Fermat's little theorem); in general $a_j$ is the integer satisfying $(a_j \hat{m}_j) \bmod m_j = 1$.

Equation (2) expands to

$$X = (C_1 R_1 + C_2 R_2 + \cdots + C_n R_n) \bmod M$$

For $j = 1$ to $n$, $C_j$ is known and can be stored in a look-up table. However, implementing $X$ in this way requires $n$ multiplication units and a summation unit capable of handling $n$ inputs, so this method does not meet the goal of maximizing the conversion speed.

For example, consider $m_1 = 7$, $m_2 = 8$, $m_3 = 9$, so that $M = 504$. From the definition of the multiplicative inverse, $a_1 = 4$, $a_2 = 7$ and $a_3 = 5$, so $C_1 = 288$, $C_2 = 441$ and $C_3 = 280$. Therefore $X = (288 R_1 + 441 R_2 + 280 R_3) \bmod 504$. A direct implementation of $X$ requires the 504-entry table shown in Table 1, which increases the size of the look-up table and the latency, and decreases the speed of operation.

International Journal of Science and Engineering Research (IJ0SER)

3 Dynamic Distributed Arithmetic Based FIR Filter

The input sequence is fed into the input buffer register at the input sample rate. The coefficients are likewise fed to the coefficient buffer. The serial output is presented to the RAM-based shift registers, which store the data at a particular address. The outputs of the registered LUTs are added and loaded into the scaling accumulator from LSB to MSB, and the result, which is the filter output, is accumulated in the output register over time. For an n-bit input, n+1 clock cycles are needed for a symmetrical filter to generate the output. If h[n] changes, the LUT contents are updated accordingly. This is shown in Fig. 2.
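The bit-serial look-up-and-accumulate scheme of Section 3 can be illustrated in software. The following Python sketch is not the authors' Verilog design: it assumes unsigned input samples, ignores the symmetry optimization, and uses hypothetical names (`build_da_lut`, `da_fir_output`, `da_fir`). It only shows how one LUT access per input bit, followed by a shift and an accumulation, reproduces the usual FIR sum without multipliers.

```python
def build_da_lut(coeffs):
    """Precompute, for every K-bit address, the sum of the selected coefficients."""
    taps = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << taps)
    ]


def da_fir_output(window, lut, bits):
    """Compute one output sample.

    window[i] holds x[n - i] as an unsigned integer of `bits` bits (assumption).
    """
    acc = 0
    for b in range(bits):  # process bit planes from LSB to MSB
        addr = 0
        for i, sample in enumerate(window):
            addr |= ((sample >> b) & 1) << i  # bit b of each tap forms the address
        acc += lut[addr] << b  # scaling accumulator: weight 2**b
    return acc


def da_fir(samples, coeffs, bits=4):
    """Filter a sequence; the LUT must be rebuilt whenever coeffs change."""
    lut = build_da_lut(coeffs)
    window = [0] * len(coeffs)
    out = []
    for x in samples:
        window = [x] + window[:-1]  # shift the newest sample in
        out.append(da_fir_output(window, lut, bits))
    return out
```

For a 4-tap filter with 4-bit inputs, as used in the experiments, the LUT has 16 entries and each output needs four look-ups. Calling `build_da_lut` again after a coefficient change corresponds to the LUT update described in the text.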
Figure 2: Block diagram of Dynamic Distributed Algorithm based FIR filter

4 Proposed method

In the proposed method of RNS-to-binary conversion, the speed is increased by reducing the size of the look-up table. This paper also proposes new formulae for the multiplicative inverses. The new formulae require only shift operations, whereas calculating the multiplicative inverses by the conventional method is difficult and time-consuming. The new formulae are

$$a_1 = 2^{n-1}, \qquad a_2 = 2^n - 1, \qquad a_3 = 2^{n-1} + 1 \qquad (3)$$

R3   R2   R1   X
0    0    1    288
0    0    2    576
0    0    3    864
0    0    4    1152
0    0    5    1440
0    0    6    1728
0    1    0    441
0    1    1    729
(continued up to 504 entries)

Table 1: Direct implementation of X

The depth of Table 1 can be reduced by using three separate tables, one for each modulus, to facilitate parallel access, as shown in Table 2.
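The closed-form inverses in (3) and the constants of the {7, 8, 9} example can be checked numerically. The following Python sketch is an illustration only; the function names are hypothetical and n >= 2 is assumed. It builds the moduli set for a given n, takes the inverses from (3), confirms that each one satisfies $(a_j \hat{m}_j) \bmod m_j = 1$, and converts a residue triple back to an integer with equation (2).

```python
from math import prod


def moduli(n):
    """Moduli set {2^n - 1, 2^n, 2^n + 1}, ordered as m1, m2, m3."""
    return [2**n - 1, 2**n, 2**n + 1]


def inverses_closed_form(n):
    """Multiplicative inverses a1, a2, a3 taken from equation (3)."""
    return [2 ** (n - 1), 2**n - 1, 2 ** (n - 1) + 1]


def crt_constants(n):
    """Return (moduli, M, [C1, C2, C3]) with C_j = (M / m_j) * a_j."""
    m = moduli(n)
    big_m = prod(m)
    m_hat = [big_m // mj for mj in m]
    a = inverses_closed_form(n)
    for mh, aj, mj in zip(m_hat, a, m):
        # Each a_j must really be the inverse of M / m_j modulo m_j.
        assert (mh * aj) % mj == 1
    return m, big_m, [mh * aj for mh, aj in zip(m_hat, a)]


def to_residues(x, n):
    """Forward conversion: [R1, R2, R3] = [x mod m1, x mod m2, x mod m3]."""
    return [x % mj for mj in moduli(n)]


def from_residues(residues, n):
    """Reverse conversion using equation (2)."""
    _, big_m, c = crt_constants(n)
    return sum(cj * rj for cj, rj in zip(c, residues)) % big_m
```

With n = 3 this yields the moduli 7, 8, 9, M = 504 and the constants 288, 441, 280 quoted in the example, and `from_residues([3, 4, 5], 3)` gives 500.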

Access   R3   R2   R1   X mod M
R1       0    0    1    288
         0    0    2    72
         0    0    3    360
         0    0    4    144
         0    0    5    432
         0    0    6    216
R2       0    1    0    441
         0    2    0    378
         0    3    0    315
         0    4    0    252
         0    5    0    189
         0    6    0    126
         0    7    0    63
R3       1    0    0    280
         2    0    0    56
         3    0    0    336
         4    0    0    112
         5    0    0    392
         6    0    0    168
         7    0    0    448
         8    0    0    224

Table 2: Look-up table for parallel access

The items fetched from the three tables are added together and a mod M operation is performed to obtain the final value of X. For example, consider converting the residue numbers $\{R_1, R_2, R_3\} = \{3, 4, 5\}$ with moduli set {7, 8, 9}. Without reducing the individual products modulo M, $R_1 = 3$ corresponds to 864, $R_2 = 4$ corresponds to 1764 and $R_3 = 5$ corresponds to 1400, so X = (864 + 1764 + 1400) mod 504 = 500. In this method of conversion the mod operation must be performed, which is time-consuming and costly to implement in hardware. This mod operation can be eliminated by considering the following property:

$$(X + aM) \bmod M = X \quad \text{for } a = 1, 2, \ldots, n$$

The same holds for subtraction: $(X - aM) \bmod M = X$ for $a = 1, 2, \ldots, n$.

From Table 2, the residue number $\{R_1, R_2, R_3\} = \{1, 0, 0\}$ with moduli set {7, 8, 9} corresponds to the decimal number 288. By the above property it also corresponds to -216. The improved table is shown in Table 3. If $\{R_1, R_2, R_3\} = \{3, 4, 5\}$ with moduli set {7, 8, 9}, the corresponding decimal number is computed as follows.

Conventional method (from Table 2):

{3,4,5} = {3,0,0} =  360
        + {0,4,0} =  252
        + {0,0,5} =  392
                    ----
                    1004 mod 504

So X = 500. Here the mod operation is required.

Proposed method (from Table 3):

{3,4,5} = {3,0,0} = -144
        + {0,4,0} =  252
        + {0,0,5} = -112
                    ----
                      -4

If the sum is negative, which is not a desired result ($0 \le X < M$), simply add M to it. So X = -4 + 504 = 500. In this method, $R_1 = 6$ corresponds to the decimal number 216 and $R_1 = 1$ corresponds to the decimal number -216, so $R_1 = 1$ and $R_1 = 6$ are duals of each other.
Hence it is possible to eliminate half of the entries in Table 3. In the proposed method, the speed is increased by accessing the residues $R_1$, $R_2$ and $R_3$ in parallel, and the area required for the converter is reduced by shrinking the look-up table using this property of the residue number system. (Note that $\{R_1, R_2, R_3\} = \{0, 0, 0\}$ is trivial and is eliminated from the table.)

The Xilinx VirtexE is programmed using Verilog HDL, a popular hardware description language [10]. The language can describe the behaviour of a design, its data flow, its structural composition, delays and a waveform generation mechanism. Models written in this language can be verified using a Verilog simulator.

Access   R3   R2   R1   X mod M
R1       0    0    1    288 (-216)
         0    0    2    72
         0    0    3    360 (-144)
         0    0    4    144
         0    0    5    432 (-72)
         0    0    6    216
R2       0    1    0    441 (-63)
         0    2    0    378 (-126)
         0    3    0    315 (-189)
         0    4    0    252
         0    5    0    189
         0    6    0    126
         0    7    0    63
R3       1    0    0    280 (-224)
         2    0    0    56
         3    0    0    336 (-168)
         4    0    0    112
         5    0    0    392 (-112)
         6    0    0    168
         7    0    0    448 (-56)
         8    0    0    224

Table 3: Proposed look-up table
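The signed entries of Table 3 and the duality between $R_j$ and $m_j - R_j$ can be sketched in Python for the {7, 8, 9} example. This is an illustration under stated assumptions, not the synthesized design: the names are hypothetical, only the entries for $R_j \le m_j/2$ are stored, and the final range correction is written as a small loop. The loop is an assumption that goes beyond the single addition of M described in the text, because some triples need more than one correction (for instance {1, 3, 1} sums to -216 - 189 - 224 = -629) and some sums exceed M.

```python
M = 504
MODULI = (7, 8, 9)
C = (288, 441, 280)  # constants C1, C2, C3 from the worked example


def signed_entry(c, r):
    """Table 3 entry: (c * r) mod M, replaced by its negative form above M / 2."""
    v = (c * r) % M
    return v - M if v > M // 2 else v


def half_table(c, m):
    """Store only the entries for r = 1 .. m // 2 (r = 0 is trivial)."""
    return [signed_entry(c, r) for r in range(1, m // 2 + 1)]


TABLES = [half_table(c, m) for c, m in zip(C, MODULI)]


def lookup(table, m, r):
    """Fetch the signed entry for residue r, using the dual m - r when r > m // 2."""
    if r == 0:
        return 0
    if r <= m // 2:
        return table[r - 1]
    return -table[m - r - 1]


def rns_to_binary(residues):
    """Add the three fetched entries and bring the sum into the range 0 .. M - 1."""
    total = sum(lookup(t, m, r) for t, m, r in zip(TABLES, MODULI, residues))
    while total < 0:  # assumption: repeat the correction if needed
        total += M
    while total >= M:
        total -= M
    return total
```

For {3, 4, 5} the three fetched values are -144, 252 and -112, which sum to -4 and give 500 after adding M once, matching the proposed-method example. The three half tables hold 11 entries in total, compared with 21 in Table 3.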

5 Results and Discussion

The goal of this work is to compare the resources consumed by the DDA method with the improved table look-up scheme for RNS against those consumed by conventional methods. For our experiments, we considered 4-tap FIR filters and targeted the Xilinx VirtexE device. The constants were normalized to 4 digits of precision and the input samples were assumed to be 4 bits wide. We used the Xilinx Integrated Software Environment (ISE) for synthesis and implementation of the designs. All the designs were synthesized for maximum performance. Table 4 shows the reduction in resources in terms of the number of slices, Look-Up Tables (LUTs) and flip-flops (FFs). From Table 4, the number of slices is reduced by 82.85% and the number of LUTs by 87.21%. The number of flip-flops is reduced by 100% compared with the other methods of filter implementation. The proposed method requires only 90 Input Output Blocks (IOBs), compared with 126 for the other methods.

Method / parameter   Direct   Braun   Baugh Wooley   Array   Proposed method
Slices (3584)        175      217     229            208     30
Flip Flops (7168)    48       48      48             48      0
LUT (7168)           255      381     399            363     51
IOB (141)            126      126     126            126     90

Table 4: Resource utilization for the various filters

From Table 5, the proposed method needs only shifters, adders and subtractors, whereas the direct method requires multipliers and registers. All other methods require only adder-subtractors, registers and shifters.

Method / parameter    Direct   Braun   Baugh Wooley   Array   Proposed method
Registers             12       12      12             12      0
Adders/Subtractors    9        9       9              9       9

Multipliers: 12
Shifters: 9

Table 5: Macro statistics for the various filters

From Figure 3, it is clear that the delay of the proposed method is much lower than that of the other methods of filter implementation, so its speed is increased by approximately 31%.
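The quoted reductions can be reproduced from the counts in Table 4. The short Python sketch below is only an arithmetic check. The choice of baselines is inferred from the numbers rather than stated in the text: the slice figure matches the 175-slice conventional design, and the LUT figure matches the largest conventional LUT count, 399.

```python
def reduction_percent(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Counts taken from Table 4; the pairing with baselines is an assumption.
slices = reduction_percent(175, 30)     # about 82.86 %
flip_flops = reduction_percent(48, 0)   # 100 %
luts = reduction_percent(399, 51)       # about 87.22 %
```

Measured against the other conventional designs, the slice reduction would be somewhat larger (for example relative to 229 slices) and the LUT reduction somewhat smaller (for example relative to 255 LUTs).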
Figure 3: Delay comparison of various filters (vertical axis: Delay (ns); horizontal axis: Types of filter)

The area occupied by the proposed method is reduced by approximately 7% compared with the other methods of filter implementation, as shown in Figure 4.

Figure 4: Area comparison of various filters (vertical axis: Memory (MB); horizontal axis: Types of filter)

6 Conclusion

This work presented a multiplierless technique, based on the DDA method with an improved table look-up based RNS, for low-area and high-speed implementations of FIR filters. The validation was carried out on VirtexE devices, where we observed significant speed improvements and area reductions over traditional methods. In future, we would like to modify our

algorithm to make use of the limited number of embedded resources available on the FPGA devices.

References

1. A. Del Re, A. Nannarelli, and M. Re. Implementation of Digital Filters in Carry-Save Residue Number System, Proc. of 35th Asilomar Conference on Signals, Systems, and Computers, November 4-7 (2001).
2. A. Nannarelli, M. Re, and G. C. Cardarilli. Tradeoffs between Residue Number System and Traditional FIR Filters, Proc. of 2001 IEEE International Symposium on Circuits and Systems, Vol. II, pp. 305-308 (2001).
3. G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re. Residue Number System Reconfigurable Datapath, Proc. of IEEE International Symposium on Circuits and Systems, Vol. II, pp. 756-759 (2002).
4. G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re. Power Characterization of Digital Filters Implemented on FPGA, Proc. of IEEE International Symposium on Circuits and Systems, Vol. V, pp. 801-804 (2002).
5. M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, and F. J. Taylor. Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, New York: IEEE Press (1986).
6. M. A. Soderstrand and K. Al Marayati. VLSI implementation of very high-order FIR filters, IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 1436-1439 (1995).
7. M. N. Mahesh and M. Mehndale. Low power realization of residue number system based FIR filters, Thirteenth International Conference on VLSI Design, pp. 30-33 (2000).
8. N. S. Szabo and R. I. Tanaka. Residue Arithmetic and its Applications in Computer Technology, New York: McGraw-Hill.
9. S. K. Mitra and J. F. Kaiser. Handbook for Digital Signal Processing, John Wiley & Sons.
10. S. Palnitkar. Verilog HDL primer, SunSoft Press.