Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System


G. Suresh, G. Indira Devi, P. Pavankumar

International Journal of Science and Engineering Research (IJ0SER)

Abstract

The improved table look-up Residue Number System (RNS) and the Dynamic Distributed Arithmetic Algorithm (DDAA) are becoming increasingly important in modern telecommunication and multimedia applications because they offer advantages in area, power consumption and speed. This paper presents a general conversion procedure based on the moduli set $\{2^n - 1, 2^n, 2^n + 1\}$. Based on the improved table look-up RNS and the DDA algorithm, an architecture that efficiently implements a digital FIR filter is synthesized on a Xilinx VirtexE device. Reductions of up to 82.85% in the number of slices, up to 100% in the number of flip-flops and up to 87.21% in the number of Look-Up Tables (LUTs) are observed. The speed of the filter is improved by 30.98%.

Keywords: Residue arithmetic, Distributed arithmetic, FIR filters, high speed, VLSI.

1 Introduction

The advantages of Residue Number System (RNS) processing are discussed in several publications and books [5], [8], [9]. Error-free computation, simplified and fast addition and multiplication, and the possibility of parallel architectures are among the most important. The RNS decomposes a given dynamic range into slices of smaller range on which the computation can be implemented efficiently in parallel [1], [2]. However, the input and output converters constitute a fixed overhead on the total area, delay and power dissipation [6], [7]. For this reason the output conversion, which is generally performed using the Chinese Remainder Theorem (CRT) or Mixed Radix Conversion (MRC), remains a crucial point in the realization of competitive RNS subsystems and is one of the main topics of recent RNS research.

Previous work demonstrated that FIR filters implemented in the RNS offer better performance than filters realized in the traditional binary system in terms of area and power dissipation [3], [4]. This paper deals with the design of a digital FIR filter based on an improved table look-up scheme for the CRT and on Dynamic Distributed Arithmetic (DDA). The block diagram of the digital FIR filter based on DDA and CRT is shown in Figure 1.

X(n) -> Binary to Residue converter -> Dynamic Distributed Arithmetic based FIR filter -> Residue to Binary converter -> Y(n)

Figure 1: Block diagram of digital FIR filter

2 Residue Number System

A residue number system is defined by a set of N integer constants $\{m_1, m_2, m_3, \ldots, m_N\}$, referred to as the moduli. Let M be the least common multiple of all the $m_i$. Any integer X smaller than M can be represented in the residue number system as a set of N smaller integers $\{x_1, x_2, x_3, \ldots, x_N\}$ with $x_i = X \bmod m_i$ representing the residue class of X with respect to that modulus. The only requirement for a modulus to be in the set is that it is pairwise relatively prime to every other modulus in the set [5]; M is then the product of all the $m_i$. Moduli that share a common factor are not relatively prime to one another and cannot appear together in the set. The moduli set used in this research is $\{2^n - 1, 2^n, 2^n + 1\}$, whose members are pairwise relatively prime.

Using the CRT, a binary number X can be found from its residue representation by

$$X = \left| \sum_{j=1}^{N} \hat{M}_j \left| \frac{R_j}{\hat{M}_j} \right|_{m_j} \right|_M \qquad (1)$$

where X is the decimal representation of the number, $R_j$ is the j-th residue digit of X, i.e. $R_j = X \bmod m_j$, $m_j$ is the j-th modulus, $\hat{M}_j = M / m_j$ and $M = \prod_{j=1}^{N} m_j$. Here $\hat{M}_j$ and $|1/\hat{M}_j|_{m_j}$ are constants for a given moduli set, so equation (1) can be rewritten as

$$X = \left| \sum_{j=1}^{N} C_j R_j \right|_M \qquad (2)$$

where $C_j = \hat{M}_j \, |1/\hat{M}_j|_{m_j}$. The quantity $a_j = |1/\hat{M}_j|_{m_j}$ is called the multiplicative inverse. When $m_j$ is prime it can be calculated using the formula

$$a_j = \left| \frac{1}{\hat{M}_j} \right|_{m_j} = (\hat{M}_j)^{m_j - 2} \bmod m_j$$

Equation (2) expands to

$$X = (C_1 R_1 + C_2 R_2 + \cdots + C_N R_N) \bmod M$$

For j = 1 to N, $C_j$ is known and can be stored in a look-up table. However, implementing X this way requires N multiplication units and summation units capable of handling N inputs. This method is not suitable for maximizing the speed of the conversion.

3 Dynamic Distributed Arithmetic Based FIR Filter

The input sequence is fed into the input buffer register at the input sample rate. The coefficients are also fed to the corresponding buffer. The serial output is presented to the RAM-based shift registers, which store the data at a particular address. The outputs of the registered LUTs are added and loaded into the scaling accumulator from LSB to MSB, and the result, which is the filter output, is accumulated in the output register over time. For an n-bit input, n+1 clock cycles are needed for a symmetrical filter to generate the output. If there is any change in h[n], it is updated and the resulting content is stored in the LUTs. This is shown in Figure 2.

[Figure 2 blocks: X[n], input buffer, h[n], coefficient buffer, shift, LUT, ADD, Reg, Y[n]]

Figure 2: Block diagram of Dynamic Distributed Algorithm based FIR filter

4 Proposed method

In the proposed method of RNS-to-binary conversion, the speed is increased by reducing the size of the look-up table. This paper also proposes new formulae for the multiplicative inverses. The new formulae require only shifting operations, whereas calculating the multiplicative inverses in the conventional method is a difficult and time-consuming process. The new formulae for the multiplicative inverses are

$$a_1 = 2^{n-1}, \quad a_2 = 2^n - 1, \quad a_3 = 2^{n-1} + 1 \qquad (3)$$

For example, consider $m_1 = 7$, $m_2 = 8$, $m_3 = 9$ (n = 3), so that M = 504. Using the formulae stated above, $a_1 = 4$, $a_2 = 7$ and $a_3 = 5$, so $C_1 = 288$, $C_2 = 441$ and $C_3 = 280$. Therefore $X = (288 R_1 + 441 R_2 + 280 R_3) \bmod 504$. The direct implementation of X requires 504 entries in the table, as shown in Table 1. This increases the size of the look-up table and the latency, and decreases the speed of operation.

| R3 | R2 | R1 | X |
|----|----|----|------|
| 0 | 0 | 1 | 288 |
| 0 | 0 | 2 | 576 |
| 0 | 0 | 3 | 864 |
| 0 | 0 | 4 | 1152 |
| 0 | 0 | 5 | 1440 |
| 0 | 0 | 6 | 1728 |
| 0 | 1 | 0 | 441 |
| 0 | 1 | 1 | 729 |

(continued up to 504 entries)

Table 1: Direct implementation of X

The depth of Table 1 can be reduced by using three separate tables, one for each modulus, to facilitate parallel access, as shown in Table 2.
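The shift-based inverses in (3) and the resulting weights $C_j$ can be checked numerically. The following is a minimal Python sketch, not the authors' hardware design; the function names are illustrative assumptions, and the sketch assumes n >= 2 so that all three moduli exceed 1.

```python
def moduli_set(n):
    """Return the moduli (2^n - 1, 2^n, 2^n + 1)."""
    return ((1 << n) - 1, 1 << n, (1 << n) + 1)


def shift_inverses(n):
    """Multiplicative inverses a_1, a_2, a_3 from formula (3), built from shifts."""
    return (1 << (n - 1), (1 << n) - 1, (1 << (n - 1)) + 1)


def crt_weights(n):
    """Return (M, (C_1, C_2, C_3)) with C_j = (M / m_j) * a_j."""
    moduli = moduli_set(n)
    M = moduli[0] * moduli[1] * moduli[2]
    weights = []
    for m_j, a_j in zip(moduli, shift_inverses(n)):
        m_hat = M // m_j
        # a_j must really be the inverse of m_hat modulo m_j.
        if (m_hat * a_j) % m_j != 1:
            raise ValueError("formula (3) does not give an inverse for n=%d" % n)
        weights.append(m_hat * a_j)
    return M, tuple(weights)


def rns_to_binary(residues, n):
    """Direct evaluation of equation (2): X = (sum of C_j * R_j) mod M."""
    M, weights = crt_weights(n)
    return sum(c * r for c, r in zip(weights, residues)) % M


if __name__ == "__main__":
    print(shift_inverses(3))            # inverses for the moduli set {7, 8, 9}
    print(crt_weights(3))               # M and the weights C_1, C_2, C_3
    print(rns_to_binary((3, 4, 5), 3))  # residues of the worked example
```

For n = 3 the sketch reproduces the values used in the text: inverses 4, 7, 5 and weights 288, 441, 280 with M = 504.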

| Access | R3 | R2 | R1 | X mod M |
|--------|----|----|----|---------|
| R1 | 0 | 0 | 1 | 288 |
| R1 | 0 | 0 | 2 | 72 |
| R1 | 0 | 0 | 3 | 360 |
| R1 | 0 | 0 | 4 | 144 |
| R1 | 0 | 0 | 5 | 432 |
| R1 | 0 | 0 | 6 | 216 |
| R2 | 0 | 1 | 0 | 441 |
| R2 | 0 | 2 | 0 | 378 |
| R2 | 0 | 3 | 0 | 315 |
| R2 | 0 | 4 | 0 | 252 |
| R2 | 0 | 5 | 0 | 189 |
| R2 | 0 | 6 | 0 | 126 |
| R2 | 0 | 7 | 0 | 63 |
| R3 | 1 | 0 | 0 | 280 |
| R3 | 2 | 0 | 0 | 56 |
| R3 | 3 | 0 | 0 | 336 |
| R3 | 4 | 0 | 0 | 112 |
| R3 | 5 | 0 | 0 | 392 |
| R3 | 6 | 0 | 0 | 168 |
| R3 | 7 | 0 | 0 | 448 |
| R3 | 8 | 0 | 0 | 224 |

Table 2: Look-up table for parallel access

The items fetched from the three tables are added together and a mod M operation gives the final value of X. For example, consider converting the residue number $\{R_1, R_2, R_3\} = \{3, 4, 5\}$ with moduli set {7, 8, 9}. Before reduction modulo M, $R_1 = 3$ contributes $288 \times 3 = 864$, $R_2 = 4$ contributes $441 \times 4 = 1764$ and $R_3 = 5$ contributes $280 \times 5 = 1400$. The value of X is then X = (864 + 1764 + 1400) mod 504 = 500. In this method of conversion a mod operation must be performed, which is time consuming and costly to implement in hardware.

The mod operation can be eliminated by considering the following property: $(X + aM) \bmod M = X \bmod M$ for a = 1, 2, ..., n. The same holds for subtraction: $(X - aM) \bmod M = X \bmod M$ for a = 1, 2, ..., n. From Table 2, the residue number $\{R_1, R_2, R_3\} = \{1, 0, 0\}$ with moduli set {7, 8, 9} corresponds to the decimal number 288. By the above property it also corresponds to -216 (288 - 504). The improved table is shown in Table 3.

If $\{R_1, R_2, R_3\} = \{3, 4, 5\}$ with moduli set {7, 8, 9}, the corresponding decimal number is computed as follows.

Conventional method (from Table 2):

{3, 4, 5} = {3, 0, 0} = 360
+ {0, 4, 0} = 252
+ {0, 0, 5} = 392
Sum = 1004, and 1004 mod 504 = 500

So the value of X is 500. Here the mod operation is required.

Proposed method (from Table 3):

{3, 4, 5} = {3, 0, 0} = -144
+ {0, 4, 0} = 252
+ {0, 0, 5} = -112
Sum = -4

If the sum is negative, which is outside the desired range (0 <= X < M), the value of M is simply added to it. So the value of X is -4 + 504 = 500. In this method, $R_1 = 6$ corresponds to the decimal number 216 and $R_1 = 1$ corresponds to the decimal number -216, so $R_1 = 1$ and $R_1 = 6$ are duals of each other.
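The two conversions just compared can be sketched in Python. This is an illustrative model under stated assumptions, not the synthesized converter: the table-building rule (replace any entry above M/2 by entry minus M) is inferred from the signed values quoted in the text, and all names are hypothetical. One caveat the sketch makes explicit: for some residue combinations, such as {1, 3, 1}, the signed sum falls below -M or above M - 1, so the range correction may need to be applied more than once or in the opposite direction.

```python
MODULI = (7, 8, 9)
M = 504
WEIGHTS = (288, 441, 280)  # C_1, C_2, C_3 from the worked example


def build_tables(signed):
    """One table per modulus, indexed by the residue value.

    With signed=True, entries above M/2 are replaced by (entry - M),
    which is the assumed rule behind the negative values in the text.
    """
    tables = []
    for m_j, c_j in zip(MODULI, WEIGHTS):
        table = []
        for r in range(m_j):
            value = (c_j * r) % M
            if signed and value > M // 2:
                value -= M
            table.append(value)
        tables.append(table)
    return tables


PARALLEL_TABLES = build_tables(signed=False)  # values as in Table 2
SIGNED_TABLES = build_tables(signed=True)     # signed values of the proposed scheme


def convert_conventional(residues):
    """Three parallel look-ups, one addition, then an explicit mod M."""
    total = sum(t[r] for t, r in zip(PARALLEL_TABLES, residues))
    return total % M


def convert_proposed(residues):
    """Three parallel look-ups of signed entries, then add/subtract M as needed."""
    total = sum(t[r] for t, r in zip(SIGNED_TABLES, residues))
    while total < 0:      # at most two additions for this moduli set
        total += M
    while total >= M:     # at most one subtraction for this moduli set
        total -= M
    return total


if __name__ == "__main__":
    print(convert_conventional((3, 4, 5)))
    print(convert_proposed((3, 4, 5)))
```

Both functions return 500 for the residues {3, 4, 5}, matching the worked example; the signed version replaces the general modulo reduction with a small number of conditional additions or subtractions of M.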
Hence it is possible to eliminate half of the entries in Table 3. In the proposed method, the speed is increased by accessing the residues $R_1$, $R_2$ and $R_3$ in parallel. The area required to implement the converter is also reduced, because this property of the residue number reduces the size of the look-up table. (Note that $\{R_1, R_2, R_3\} = \{0, 0, 0\}$ is trivial and is eliminated from the table.)

The Xilinx VirtexE devices are programmed using Verilog HDL, a popular hardware description language [10]. The language can describe the behavioral nature of a design, its data flow, its structural composition, delays and a waveform generation mechanism. Models written in this language can be verified using a Verilog simulator.

| Access | R3 | R2 | R1 | X mod M | Signed equivalent |
|--------|----|----|----|---------|-------------------|
| R1 | 0 | 0 | 1 | 288 | -216 |
| R1 | 0 | 0 | 2 | 72 | |
| R1 | 0 | 0 | 3 | 360 | -144 |
| R1 | 0 | 0 | 4 | 144 | |
| R1 | 0 | 0 | 5 | 432 | -72 |
| R1 | 0 | 0 | 6 | 216 | |
| R2 | 0 | 1 | 0 | 441 | -63 |
| R2 | 0 | 2 | 0 | 378 | -126 |
| R2 | 0 | 3 | 0 | 315 | -189 |
| R2 | 0 | 4 | 0 | 252 | |
| R2 | 0 | 5 | 0 | 189 | |
| R2 | 0 | 6 | 0 | 126 | |
| R2 | 0 | 7 | 0 | 63 | |
| R3 | 1 | 0 | 0 | 280 | -224 |
| R3 | 2 | 0 | 0 | 56 | |
| R3 | 3 | 0 | 0 | 336 | -168 |
| R3 | 4 | 0 | 0 | 112 | |
| R3 | 5 | 0 | 0 | 392 | -112 |
| R3 | 6 | 0 | 0 | 168 | |
| R3 | 7 | 0 | 0 | 448 | -56 |
| R3 | 8 | 0 | 0 | 224 | |

Table 3: Proposed look-up table
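The halving of the table can be illustrated with a short Python sketch. It relies on the duality noted in the text: because $C_j m_j$ is a multiple of M, the signed entry for residue $m_j - r$ is the negative of the entry for residue r. The storage layout and function names below are assumptions for illustration only, not the layout used in the synthesized design.

```python
MODULI = (7, 8, 9)
M = 504
WEIGHTS = (288, 441, 280)  # C_1, C_2, C_3 for the moduli set {7, 8, 9}


def signed_entry(c_j, r):
    """Signed table value: (C_j * r) mod M, shifted down by M if above M/2."""
    value = (c_j * r) % M
    return value - M if value > M // 2 else value


# Store only residues 1 .. floor(m_j / 2) for each modulus.
HALF_TABLES = [
    [signed_entry(c_j, r) for r in range(1, m_j // 2 + 1)]
    for m_j, c_j in zip(MODULI, WEIGHTS)
]


def lookup(j, r):
    """Return the signed contribution of residue r for modulus index j."""
    m_j = MODULI[j]
    if r == 0:
        return 0                          # trivial entry, not stored
    if r <= m_j // 2:
        return HALF_TABLES[j][r - 1]      # stored directly
    return -HALF_TABLES[j][m_j - r - 1]   # dual entry: negate the stored value


def convert(residues):
    """Sum the three contributions and bring the result into 0 <= X < M."""
    total = sum(lookup(j, r) for j, r in enumerate(residues))
    while total < 0:
        total += M
    while total >= M:
        total -= M
    return total


if __name__ == "__main__":
    print(sum(len(t) for t in HALF_TABLES))  # number of stored entries
    print(convert((3, 4, 5)))
```

With this layout only 11 entries are stored instead of the 21 non-trivial entries of the full table, at the cost of a conditional negation on each look-up.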

5 Results and Discussion

The goal of this work is to compare the resources consumed by the DDA method with the improved table look-up scheme for RNS against those consumed by other conventional methods. For our experiments, we considered 4-tap FIR filters and targeted the Xilinx VirtexE device. The constants were normalized to 4 digits of precision and the input samples were assumed to be 4 bits wide. We used the Xilinx Integrated Software Environment (ISE) for synthesis and implementation of the designs. All the designs were synthesized for maximum performance.

Table 4 shows the reduction in resources in terms of the number of slices, Look-Up Tables (LUTs) and flip-flops (FFs). From Table 4, the number of slices is reduced by up to 82.85% and the number of LUTs by up to 87.21%. The number of flip-flops is reduced by 100% compared with the other methods of filter implementation. The proposed method requires only 90 Input/Output Blocks (IOBs), compared with 126 for the other methods.

| Parameter (available) | Direct | Braun | Array | Baugh-Wooley | Proposed method |
|-----------------------|--------|-------|-------|--------------|-----------------|
| Slices (3584) | 175 | 217 | 229 | 208 | 30 |
| Flip-flops (7168) | 48 | 48 | 48 | 48 | 0 |
| LUTs (7168) | 255 | 381 | 399 | 363 | 51 |
| IOBs (141) | 126 | 126 | 126 | 126 | 90 |

Table 4: Resource utilization for the various filters

From Table 5, the proposed method needs only shifters, adders and subtractors, whereas the direct method requires multipliers and registers. All other methods require only adder-subtractors, registers and shifters.

| Parameter | Direct | Braun | Baugh-Wooley | Array | Proposed method |
|-----------|--------|-------|--------------|-------|-----------------|
| Registers | 12 | 12 | 12 | 12 | 0 |
| Adders/Subtractors | 9 | 9 | 9 | 9 | 9 |
| Multipliers | 12 | | | | |
| Shifters | | | | | 9 |

Table 5: Macro statistics for the various filters

Figure 3 shows that the delay of the proposed method is considerably lower than that of the other methods of filter implementation, so its speed is increased by approximately 31%.
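The percentage reductions quoted from Table 4 can be reproduced with a few lines of Python. This sketch assumes the reduction is defined as (baseline - proposed) / baseline, which the paper does not state explicitly; the dictionary layout and names are illustrative, and the baseline lists simply hold the four conventional-design values of each row in the order given in the table.

```python
def reduction_percent(baseline, proposed):
    """Relative saving of the proposed design against one baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values taken from Table 4: four conventional designs, then the proposed one.
TABLE4 = {
    "slices": {"baselines": [175, 217, 229, 208], "proposed": 30},
    "flip_flops": {"baselines": [48, 48, 48, 48], "proposed": 0},
    "luts": {"baselines": [255, 381, 399, 363], "proposed": 51},
}


def reductions(resource):
    """Reduction against every baseline for one resource type."""
    row = TABLE4[resource]
    return [reduction_percent(b, row["proposed"]) for b in row["baselines"]]


if __name__ == "__main__":
    for name in TABLE4:
        print(name, [round(value, 2) for value in reductions(name)])
```

Under this assumed definition, the slice figure quoted in the text corresponds to the baseline of 175 slices (about 82.86%) and the LUT figure to the baseline of 399 LUTs (about 87.22%); the flip-flop reduction is 100% against every baseline.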
[Figure 3: bar chart "Delay Comparison", delay (ns) versus type of filter: Direct, Braun, Array, Baugh-Wooley, proposed method]

Figure 3: Delay comparison of various filters

The area occupied by the proposed method is reduced by approximately 7% compared with the other methods of filter implementation, as shown in Figure 4.

[Figure 4: bar chart "Area Comparison", memory (MB) versus type of filter: Direct, Braun, Array, Baugh-Wooley, proposed method]

Figure 4: Area comparison of various filters

6 Conclusion

This work presented a multiplierless technique, based on the DDA method with improved table look-up based RNS, for low-area and high-speed implementations of FIR filters. The validation was carried out on VirtexE devices, where we observed significant speed improvements and area reductions over traditional methods. In future, we would like to modify our algorithm to make use of the limited number of embedded resources available on FPGA devices.

References

1. A. Del Re, A. Nannarelli, and M. Re. Implementation of Digital Filters in Carry-Save Residue Number System, Proc. of 35th Asilomar Conference on Signals, Systems, and Computers, November 4-7 (2001).
2. A. Nannarelli, M. Re and G. C. Cardarilli. Tradeoffs between Residue Number System and Traditional FIR Filters, Proc. of 2001 IEEE International Symposium on Circuits and Systems, Vol. II, pp. 305-308 (2001).
3. G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re. Residue Number System Reconfigurable Datapath, Proc. of IEEE International Symposium on Circuits and Systems, Vol. II, pp. 756-759 (2002).
4. G. C. Cardarilli, A. Del Re, A. Nannarelli, and M. Re. Power Characterization of Digital Filters Implemented on FPGA, Proc. of IEEE International Symposium on Circuits and Systems, Vol. V, pp. 801-804 (2002).
5. M. A. Soderstrand, W. K. Jenkins, G. A. Jullien, F. J. Taylor. Residue Number System Arithmetic: Modern Applications in Digital Signal Processing, New York: IEEE Press (1986).
6. M. A. Soderstrand and K. Al Marayati. VLSI implementation of very high-order FIR filters, IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 1436-1439 (1995).
7. M. N. Mahesh and M. Mehendale. Low power realization of residue number system based FIR filters, Thirteenth International Conference on VLSI Design, pp. 30-33 (2000).
8. N. S. Szabo and R. I. Tanaka. Residue Arithmetic and its Applications in Computer Technology, New York: McGraw-Hill.
9. S. K. Mitra, J. F. Kaiser. Handbook for Digital Signal Processing, John Wiley & Sons.
10. S. Palnitkar. Verilog HDL primer, SunSoft Press.