A New Design of Multiplier using Modified Booth Algorithm and Reversible Gate Logic

Similar documents
Implementation of Parallel Multiplier Accumulator based on Radix-2 Modified Booth Algorithm Shashi Prabha Singh 1 Uma Sharma 2

A Novel, Low-Power Array Multiplier Architecture

TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION

Parallel MAC Based On Radix-4 & Radix-8 Booth Encodings

Department of Electrical & Electronic Engineeing Imperial College London. E4.20 Digital IC Design. Median Filter Project Specification

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem

Uncertainty in measurements of power and energy on power networks

CSE4210 Architecture and Hardware for DSP

NUMERICAL DIFFERENTIATION

Pulse Coded Modulation

Power Efficient Design and Implementation of a Novel Constant Correction Truncated Multiplier

Module 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

High Performance Rotation Architectures Based on the Radix-4 CORDIC Algorithm

One-sided finite-difference approximations suitable for use with Richardson extrapolation

Lecture 4: Adders. Computer Systems Laboratory Stanford University

Suppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl

Formulas for the Determinant

A High-Performance and Energy-Efficient FIR Adaptive Filter using Approximate Distributed Arithmetic Circuits

Module #6: Combinational Logic Design with VHDL Part 2 (Arithmetic)

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Foundations of Arithmetic

Continued..& Multiplier

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

Improvement of Histogram Equalization for Minimum Mean Brightness Error

FFT Based Spectrum Analysis of Three Phase Signals in Park (d-q) Plane

Combinational Circuit Design

CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION

VQ widely used in coding speech, image, and video

Kernel Methods and SVMs Extension

COEFFICIENT DIAGRAM: A NOVEL TOOL IN POLYNOMIAL CONTROLLER DESIGN

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Implementation and Study of Reversible Binary Comparators

Generalized Linear Methods

Topic 23 - Randomized Complete Block Designs (RCBD)

FPGA Implementation of Pipelined CORDIC Sine Cosine Digital Wave Generator

Bit Juggling. Representing Information. representations. - Some other bits. - Representing information using bits - Number. Chapter

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

A Hybrid Variational Iteration Method for Blasius Equation

On balancing multiple video streams with distributed QoS control in mobile communications

x = , so that calculated

A Low Error and High Performance Multiplexer-Based Truncated Multiplier

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Energy Storage Elements: Capacitors and Inductors

Hongyi Miao, College of Science, Nanjing Forestry University, Nanjing ,China. (Received 20 June 2013, accepted 11 March 2014) I)ϕ (k)

Bit-Parallel Word-Serial Multiplier in GF(2 233 ) and Its VLSI Implementation. Dr. M. Ahmadi

ONE DIMENSIONAL TRIANGULAR FIN EXPERIMENT. Technical Advisor: Dr. D.C. Look, Jr. Version: 11/03/00

Novel Pre-Compression Rate-Distortion Optimization Algorithm for JPEG 2000

Lecture 10 Support Vector Machines II

arxiv:cs.cv/ Jun 2000

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

System in Weibull Distribution

Air Age Equation Parameterized by Ventilation Grouped Time WU Wen-zhong

A Novel Feistel Cipher Involving a Bunch of Keys supplemented with Modular Arithmetic Addition

829. An adaptive method for inertia force identification in cantilever under moving mass

Low Complexity Soft-Input Soft-Output Hamming Decoder

Improving XOR-Dominated Circuits by Exploiting Dependencies between Operands

A Fast FPGA based Architecture for Determining the Sine and Cosine Value

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

Lab 2e Thermal System Response and Effective Heat Transfer Coefficient

Aging model for a 40 V Nch MOS, based on an innovative approach F. Alagi, R. Stella, E. Viganò

Some Consequences. Example of Extended Euclidean Algorithm. The Fundamental Theorem of Arithmetic, II. Characterizing the GCD and LCM

Temperature. Chapter Heat Engine

Single-Facility Scheduling over Long Time Horizons by Logic-based Benders Decomposition

Lecture 14: Forces and Stresses

Calculation of time complexity (3%)

Three-Phase Distillation in Packed Towers: Short-Cut Modelling and Parameter Tuning

Application of Nonbinary LDPC Codes for Communication over Fading Channels Using Higher Order Modulations

Estimating Delays. Gate Delay Model. Gate Delay. Effort Delay. Computing Logical Effort. Logical Effort

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Section 8.3 Polar Form of Complex Numbers

Numerical Heat and Mass Transfer

Investigation of a New Monte Carlo Method for the Transitional Gas Flow

Tutorial 2. COMP4134 Biometrics Authentication. February 9, Jun Xu, Teaching Asistant

Physics 5153 Classical Mechanics. Principle of Virtual Work-1

Quadratic speedup for unstructured search - Grover s Al-

Hardware Implementation of Multiple Vector Quantization Decoder

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Area Time Efficient Scaling Free Rotation Mode Cordic Using Circular Trajectory

The optimal delay of the second test is therefore approximately 210 hours earlier than =2.

ECE559VV Project Report

Appendix B: Resampling Algorithms

Digital Signal Processing

9 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

8.6 The Complex Number System

Errors for Linear Systems

Entropy Coding. A complete entropy codec, which is an encoder/decoder. pair, consists of the process of encoding or

Clock-Gating and Its Application to Low Power Design of Sequential Circuits

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

The Improved Montgomery Scalar Multiplication Algorithm with DPA Resistance Yanqi Xu, Lin Chen, Moran Li

Lecture 3: Shannon s Theorem

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Problem Set 9 Solutions

EGR 544 Communication Theory

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

FE REVIEW OPERATIONAL AMPLIFIERS (OP-AMPS)( ) 8/25/2010

EEE 241: Linear Systems

Pop-Click Noise Detection Using Inter-Frame Correlation for Improved Portable Auditory Sensing

Variability-Driven Module Selection with Joint Design Time Optimization and Post-Silicon Tuning

Fundamental loop-current method using virtual voltage sources technique for special cases

Message modification, neutral bits and boomerangs

Transcription:

Internatonal Journal of Computer Applcatons Technology and Research A New Desgn of Multpler usng Modfed Booth Algorthm and Reversble Gate Logc K.Nagarjun Department of ECE Vardhaman College of Engneerng, Hyderabad, Inda S.Srnvas Department of ECE Vardhaman College of Engneerng, Hyderabad, Inda Abstract: In ths paper we propose a new concept for multplcaton by usng modfed booth algorthm and reversble logc functons. Modfed booth algorthm produces less delay compare to normal multplcaton process. Modfed booth algorthm reduces the number partal products whch wll reduces maxmum delay count a the output. by combnng modfed booth algorthm wth reversble gate logc t wll produces further less delay compare to all other. In the past years reversble logc functons has developed as an mportant research area. Implementng reversble logc has the advantage of reducng the gate count, garbage outputs as well as constant nputs. Addton subtracton operatons are realzed usng reversble DKG gate. Ths modfed booth algorthm wth reversble gate logc are syntheszed and smulated by usng Xlnx 13.2 ISE smulator. Keywords: MBE, Reversble logc gate, DKG, reversble logc crcuts. Carry save adder tree. 1. INTRODUCTION The Contnuous advances of the mcroelectronc technologes makes better use of nput energy, to encode the data more effcently, to transmt the nformaton more faster and relable, etc. In Partcular, many of these technologes handle low power consumpton to meet the requrements of varous outboard applcatons. In these applcatons, a multpler s a fundamental arthmetc unt and used n a great extent n crcuts. The fastness of multplcaton and addton arthmetc s decdes the executon speed and performance of the total calculaton. Because the multpler needs the longest delay wthn the basc operatonal blocks n dgtal systems, the crtcal path s evaluated by multpler n general. For hgh speed multplcaton, the modfed radx-4 booth s algorthm (MBA)[1] s generally used however ths cannot completely solve the problem. In general, the multpler uses Booth s algorthm and array of full adders (FAs)[3], or Wallace tree rather than the array of FAs.,.e., ths multpler prmarly conssts of the three parts: Booth encoder[6], a tree to compact the partal products such as Wallace tree[1], and the fnal adder[1]. Because Wallace tree s to add the partal products. The most effcent way to gan the fastness of a multpler s to cut down the number of the partal products as multplcaton proceeds a seres of addtons for the partal products. To cut down the number of calculaton steps for the partal products, MBA[1] algorthm has been employed mostly where Wallace tree has taken the role of ncreasng the fastness to add the partal products. To ncrease the fastness of the MBA algorthm, many parallel multplcaton archtectures have been explored and they have employed to varous dgtal flterng calculatons. here In ths desgn we use reversble logc gates[2] n the place of full adders to reduce the power and delay. A reversble logc crcut should have features lke use mnmum number of reversble gates[2], use mnmum number of garbage outputs, use mnmum constant nputs. Fg 1: Basc arthmetc steps for multplcaton and accumulaton In ths paper, A New Desgn of Multpler usng Modfed Booth Algorthm And Reversble Gate Logc are mplemented.secton2 dscusses overvew of MAC.Secton3 ntroduces reversble logc gate functon Secton4 covers the proposed MAC and archtecture of CSA tree. Expermental results are showng the smulaton results of proposed desgn. 2. OVERVIEW OF MAC In ths secton, general basc MAC[1] operaton s ntroduced. A multpler can be dvded nto three functonal steps. The frst s radx-2 modfed Booth encodng[1] n whch a partal product s produced from the multplcand and the multpler. second s the adder array or partal product compresson to www.jcat.com 743

Internatonal Journal of Computer Applcatons Technology and Research add all the partal products and change them nto the form of sum and carry. The last step s the fnal addton n whch the fnal multplcaton result s produced by addng the sum and carry. If the process to accumulate the multpled results s ncluded, a MAC conssts of four steps, as shown n Fg. 1, whch shows the operatonal steps. Fg 2: Hardware archtecture of general MAC A Basc hardware archtecture of ths MAC s shown n Fg. 2. It carres out the multplcaton operaton by multplyng the nput multpler X and the multplcand Y. Ths result s added to the prevous multplcaton result Z as accumulaton step. Fg. 2. Hardware archtecture of the general MAC. The - bt 2 s complement bnary number can be expressed as N 2 N 1 2 N 1 2 0 X x x x 0,1.. (1) If the equaton (1) s expressed n base-4 type redundant sgn dgt form to apply the radx-2 Booth s algorthm, t would be. N /21 X 0 d 4 21 2 21... (2) d 2x x x.. (3) If the equaton (2) s used, multplcaton can be expressed as N / 21 2 X Y d 2 Y 0..(4) If these equatons are used, the before mentoned multplcaton accumulaton results can be expressed as N /21 2N 1 0 j0.(5) P X Y Z d 2 Y z 2. Each of two terms on the rght sde of (5) s evaluated ndependently and the fnal result s produced by addng the two results. The MAC archtecture enforced by (5) s called the standard desgn. If the N-bt data are multpled, the number of generated partal products s proportonal to N. In order to add that partal products serally, the tme of executon s also proportonal to N. The fastest archtecture of a multpler, whch uses radx-2 Booth encodng to generate partal products and a Wallace tree based on CSA as the adder array to add the partal products. If radx-2 Booth encodng s used, the number of partal products,.e., the nputs to the Wallace tree, s reduced to half, resultng n the decrease n CSA tree step. In addton, the sgned multplcaton based on 2 s complement numbers s also possble. Due to these reasons, most current used multplers adopt the Booth encodng. 3.Reversble Logc DKG Gate Reversble functon s the man objectve of the reversble logc theory. A 4* 4 reversble logc DKG[2] gate that can work unquely as a reversble Full adder and a reversble Full subtractor s shown n Fg 3. It can be verfed that nput pattern representng to a partcular output pattern can be unquely determned. If the nput A=0, the proposed gate acts as a reversble Full adder DKG gate, and f nput A=1, then t acts as a reversble Full subtractor DKG gate. It has been proved that at least the two garbage outputs are requred by the reversble full-adder crcut to make the output combnatons unque. P B Fg 3: Reversble DKG gate Q AC A D R ( A B)( C D) CD S B C D The bnary full adder/subtractor s capable of handlng the one bt of each nput along wth a carry n/borrow n generated as a carry out/ borrow from addton of prevous lower order bt poston. n bnary full adders/subtractors are cascaded to add two bnary numbers each of n bts. P A g1 Q B g2 R A( B C) BC carry S A B C sum Fg 4: DKG gate mplemented as Full adder P A g1 _ Q C g2 R A( B C) BC Barrow S A B C Dfference Fg 5: DKG gate mplemented as Full subtractor A Parallel adder/subtractor s an cascaded of full adders/subtractors and nputs are smultaneously appled. The www.jcat.com 744

Internatonal Journal of Computer Applcatons Technology and Research carry/borrow produced at a stage s propagated to the next stage. When the control nput A=0, the crcut behaves as a parallel adder, generates a 4 bt sum and a carry out, as shown n Fg 4. If the control nput A=1, the crcut behaves as a parallel subtractor, generates a 4 bt dfference and a borrow out, as shown n Fg 5. Fg 7: Hardware archtecture of the proposed MAC Fg 6: Reversble Parallel adder when A/S=0 4. PROPOSED ARCHITECTURE In ths secton, By examnng the varous archtectures, we have observed that delay can be mproved by usng hgher radx MBA whch reduces number of partal product rows that eventually reduces number of multplcaton thereby mprovng speed. Thus we propose a new hgh speed and area effcent MAC archtectures whch wll be an mprovement over the exstng Archtecture by replacng Radx-2 wth Radx-4 and Modfed Booth Encoder n the Multplcaton Stage. The output of multplcaton and accumulaton stages wll be combned usng hybrd reducton through CSA, CLA and HA whch enhancng speed and effcency. In addton, the Fnal Stage of CSA wll nclude a New Reversble logc DKG gate was replaced a full adder. The new adder wll be an mprovement over the exstng CSA. by replacng Reversble logc DKG[2] gate n CSA The desgn wll be mplemented usng Verlog language and smulated usng Xlnx ISE13.2 Smulator. We expect the proposed MAC wll be useful n hgh performance sgnal processng system. In Fg. 7 the proposed archtecture were shown and t contan booth encoder and CSA & Accumulator tree and the tree contans half adder reversble logc DKG gate whch wll reduce the delay of the adder and a fnal adder to add the fnal sum and carry. The archtecture of the hybrd-type CSA that comples wth the operaton of the proposed MAC s shown n Fg.8, whch performs 8X8-bt operaton. It was formed based on the prevous archtectures. In Fg.8, S s to smplfy the sgn expanson and N s to compensate 1 s complement number nto 2 s complement number. S[] and C[] and correspond to the th bt of the feedback sum and carry. Z[] s the th bt of the sum of the lower bts for each partal product that were added n advance and Z[]s the prevous result. In addton corresponds to the th bt of the I th partal product. Snce the multpler s for 8 bts, totally four partal products(p 0 [7:0]~P 3 [7:0]) are generated from the Booth encoder., Ths CSA requres at least four rows of DKG full adders for the four partal products. Thus, totally fve DKG full adder rows are necessary snce one more level of rows are needed for accumulaton. For an 8-bt MAC operaton, the level of CSA s (n/2+1). The whte square n Fg. 8 represents an reversble DKG full adder and the gray square s a half adder (HA). The rectangular symbol wth fve nputs s a 2-bt CLA wth a carry nput. The crtcal path n ths CSA s determned by the 2-bt CLA. It s also possble to use FAs to mplement the CSA wthout CLA. However, f the lower bts of the prevously generated partal product are not processed n advance by the CLAs, the number of bts for the fnal adder wll ncrease. When the entre multpler or MAC s consdered, t degrades the performance. the characterstcs of the proposed CSA archtecture have been summarzed and brefly compared wth other archtectures. For the number system, the proposed CSA uses 1 s complement, but ours uses a modfed CSA array wthout sgn extenson. The bggest dfference between ours and the others s the type of values that s fed back for accumulaton. Ours has the smallest number of nputs to the fnal adder. In the carry save adder archtecture more number of full adders were used and that wll replaced by reversble DKG gate n the proposed system by ths the overall performance was further ncreased the half adders n carry save adder were used as t s because no bgger dfference n the normal half adder and reversble DKG half adder. In addton, we compared the proposed archtecture wth that the prevous booth multpler. Because of the dffcultes n comparng other factors, only delay s compared. The szes of both MACs were 8X8 bts and mplemented by usng Xlnx 13.2 smulator. The delay of ours was 30.38ns whle n prevous t was 34.35ns,whch means that ours mproved about 4.2ns of the delay performance. Ths mprovement s manly due to the reversble logc DKG full adder. The archtecture from the prevous should nclude a normal full adders that wll consumes maxmum delay for that we used s very effectve and gves maxmum performance. www.jcat.com 745

Internatonal Journal of Computer Applcatons Technology and Research S0 C[1] P0[7] C[0] Z[6] P0[7] Z[6] P0[6] Z[5] P0 [5] Z[4] P0[4] Z[3] P0[3] Z[2] P0[2] Z[1] P0 [1] Z[0] P0[0] 1 0 S1 C[3] C[2] N0 P1[7] P1[7] P1[6] P1[5] P1[4] P1[3] P1[2] P1[1] P1[0] Z[1:0] S2 C[5] C[4] N1 P1[7] P2[7] P2[6] P2[5] P2[4] P2[3] P2[2] P2[1] P2[0] Z[3:2] S3 C[7] C[6] N2 P3[7] P3[7] P3[6] P3[5] P3[4] P3[3] P3[2] P3[1] P3[0] Z[5:4] N3 S[7] S[6] S[5] S[4] S[3] S[2] S[1] S[0] Z[7:6] C[7] S[7] C[6] S[6] C[5] S[5] C[4] S [4] C[3] S[3] C[2] S[2] C[1] S[1] C[0] S[0] Fg 8: Hardware archtecture of proposed CSA tree 5. EXPERIMENTAL RESULTS The proposed archtecture was smulated and syntheszed by usng Xlnx 13.2 tool and cadence vrtuoso Fg 9: TOP Level Schematc of proposed MAC www.jcat.com 746

Internatonal Journal of Computer Applcatons Technology and Research Fg 10: Output of Modfed booth algorthm wth reversble gate logc Table 1.Comparson table of dfferent parameters Parameter MBE [1] MBE wth reversble gate logc No. of slces used 178 out of 4656 (3%) 162 out of 4656 (3%) No. of LUT s 355 out of 9312(3%) 294 out of 9312 (3%) Total Delay 34.535ns 30.38ns 6.CONCLUSION In ths paper, a new MAC archtecture to perform the multplcaton-accumulaton, To effcently process the dgtal sgnal processng and multmeda applcaton ths archtecture was proposed. the overall MAC performance has been mproved By elmnatng the ndependent accumulaton process that has the greatest delay and mergng t to the compresson process of the partal products, almost twce as much as n the prevous work and by replacng full adder n the CSA wth reversble logc gate further mproves the performance. The proposed hardware was mplemented and syntheszed through Xlnx ISE 13.2 tool. Consequently, the proposed archtecture can be used effectvely where we requrng hgh throughput such as a real-tme dgtal sgnal processng. [3] O. L. MacSorley, Hgh speed arthmetc n bnary computers, Proc.IRE, vol. 49, pp. 67 91, Jan. 1961. [4] S. Weser and M. J. Flynn, Introducton to Arthmetc for Dgtal Systems Desgners. New York: Holt, Rnehart and Wnston, 1982. [5] A.R. Omond,Computer Arthmetc Systems. Englewood Clffs, NJ:Prentce-Hall, 1994. [6] A.D.Booth, A sgned bnary multplcaton technque, Quart. J.Math., vol. IV, pp. 236 240, 1952. [7] C. S. Wallace, A suggeston for a fast multpler, IEEE Trans. Electron Computer., vol. EC-13, no. 1, pp. 14 17, Feb. 1964. [8] A.R. Cooper, Parallel archtecture modfed booth multpler, Proc. Inst. Electr Engg. Vol. 135, pp 1251-1988. 7.REFERENCES [1] A New VLSI Archtecture of Parallel Multpler Accumulator Based on Radx-2 Modfed Booth Algorthm Young-Ho Seo, Member,IEEE,and Dong-Wook Km, Member, IEEE [2] A Dstngush Between Reversble And Conventonal Logc Gates B.Raghu Kanth, B.Mural Krshna, M. Srdhar, V.G. Shant Swaroop, www.jcat.com 747