On the Design of an On-line Complex FIR Filter

Robert McIlhenny
Computer Science Department
California State University, Northridge
Northridge, CA 91330
rmcilhen@csun.edu

Miloš D. Ercegovac
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90095
milos@cs.ucla.edu

Abstract

In this paper, we present a novel implementation for an N-tap complex finite impulse response (FIR) filter, using complex number on-line arithmetic, based on adopting a redundant complex number system (RCNS) to represent complex operands as a single number. We present cost comparisons with (i) a real number on-line arithmetic approach, and (ii) a real number parallel arithmetic approach, to demonstrate a significant improvement in cost.

I. INTRODUCTION

The N-tap finite impulse response (FIR) filter is defined as an output sequence $y_n$ ($n = 1, \ldots$) of an input sequence $x_n$ ($n = 1, \ldots$), in which

$$y_n = \sum_{k=0}^{N-1} h_k x_{n-k} \qquad (1)$$

where $h_k$ ($k = 0, 1, \ldots, N-1$) are the filter coefficients. The standard implementation is shown in Figure 1. Assuming m-bit precision, it requires N m-bit multipliers and an N-operand m-bit adder. For a complex FIR filter, the filter coefficients as well as the input and output sequences are complex numbers. This significantly increases the size of the design, since an m-bit complex number multiplier is equivalent to 4 m-bit real number multipliers and 2 m-bit real number adders. Since area is a critical factor in FPGA design, we propose an approach that utilizes a radix 2j number system, and that yields a significantly lower cost than an alternative radix 2 on-line implementation and a real number bit-parallel implementation.

Fig. 1. FIR filter implementation (x(n) feeds a chain of registers; the taps are weighted by h(0), h(1), ..., h(n-1) and summed by an adder to produce y(n))

II. COMPLEX NUMBER ON-LINE FLOATING-POINT ARITHMETIC

On-line arithmetic [3] is a class of arithmetic operations in which all operations are performed digit serially, in a most significant digit first (MSDF) manner.
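To make the most-significant-digit-first convention concrete, the following minimal sketch evaluates a digit-serial fraction one digit at a time. It assumes a radix-2 signed-digit set {-1, 0, 1}, chosen only for illustration, and the function name `msdf_prefix_values` is hypothetical rather than taken from the paper.

```python
def msdf_prefix_values(digits, radix=2):
    """Return the running value after each digit of a fraction 0.d1 d2 ...

    Digits arrive most significant first; digit k has weight radix**-k.
    """
    value = 0.0
    prefixes = []
    for k, d in enumerate(digits, start=1):
        value += d * radix ** -k  # each new digit refines the estimate
        prefixes.append(value)
    return prefixes


# Illustrative radix-2 signed-digit fraction with digits 1, 0, -1, 1.
prefixes = msdf_prefix_values([1, 0, -1, 1])
print(prefixes)  # [0.5, 0.5, 0.375, 0.4375]
```

With this digit set, the value after k digits is already within $2^{-k}$ of the final result, so a consumer can start working on the leading digits before the producer has finished.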
Several advantages, compared to conventional parallel arithmetic, include: (i) the ability to overlap dependent operations, since on-line algorithms produce the output serially, most-significant digit first, enabling successive operations to begin before previous operations have completed; (ii) low-bandwidth communication, since intermediate results pass to and from modules digit-serially, so connections need only be one digit wide; and (iii) support for variable precision, since once a desired precision is obtained, successive outputs can be ignored.

One of the key parameters of on-line arithmetic is the on-line delay, defined as the number of digits of the operand(s) necessary in order to generate the first digit of the result. Each successive digit of the result is generated one per cycle. This is illustrated in Figure 2, with on-line delay $\delta = 4$. The latency of an on-line arithmetic operator, assuming m-digit precision, is then $\delta + m - 1$.

Fig. 2. On-line delay of a function (timing of input, compute, and output for $\delta = 4$)

Complex number on-line arithmetic [5] uses a class of on-line arithmetic operators on complex number operands. For efficient representation, a Redundant Complex Number System (RCNS) [1] is adopted. An RCNS is a radix-rj system in which digits are in the set $\{-a, \ldots, 0, \ldots, a\}$, where $r \ge 2$ and $\lceil r^2/2 \rceil \le a \le r^2 - 1$. Such a number system can be denoted RCNS$_{rj,a}$. A Redundant Complex Number System with $r = 2$, $a = 3$, denoted RCNS$_{2j,3}$, allows ease of the definition of primitive on-line arithmetic modules, as well as ease of conversion to and from other representations. This number system was introduced as the quater-imaginary number system in [4].

For implementation of the complex FIR filter, in order to permit a relatively wide range of input values, we assume floating-point arithmetic. Two on-line floating-point arithmetic operations are used: (i) RCNS$_{2j,3}$ on-line floating-point addition; and (ii) RCNS$_{2j,3}$ on-line floating-point constant coefficient multiplication. The recurrence algorithms and implementation parameters when mapped to a Xilinx Virtex FPGA are discussed in detail.

0-7803-8622-1/04/$20.00 ©2004 IEEE. Authorized licensed use limited to: Univ of Calif Los Angeles. Downloaded on December 8, 2008 at 00:29 from IEEE Xplore. Restrictions apply.

Using RCNS$_{2j,3}$, a floating-point complex number $x = (X_R + jX_I) \cdot (2j)^{e_x}$ can be normalized with regard to either the real component $X_R$ or the imaginary component $X_I$, depending on which has the larger absolute value. The exponent $e_x$ is shared between the real and imaginary components. Exponent overflow/underflow can be handled by setting an exception flag, and allowing processing of results (although erroneous) to continue. An RCNS$_{2j,3}$ fraction x is considered normalized if $2^{-1} \le \max(|X_R|, |X_I|) < 1$. The output of a complex number operation can be undernormalized for several reasons:

1. The range of an output determined by the on-line algorithm allows it to be undernormalized.
2. Digit cancellation resulting from the addition/subtraction of numbers with the same exponent value.

In this paper, we assume operands of an RCNS$_{2j,3}$ on-line algorithm have non-zero most significant digits and are normalized. When the result Z exceeds the range of a normalized fraction (i.e., $\max(|Z_R|, |Z_I|) \ge 1$), the exponent is incremented. When the result is below the range of a normalized fraction (i.e., $\max(|Z_R|, |Z_I|) < \frac{1}{2}$), the exponent is decremented and leading zeros are discarded. The normalization algorithm, which takes as input the generated output digit $z_k$, the output exponent $e_z$ and the on-line delay $\delta$ for the arithmetic operation, is shown below. This is similar to the normalization algorithm presented in [2] for radix-2 on-line rotation.

    normalize($z_k$, $e_z$, $\delta$)
    done = 0
    /* Computation */
    if $k = (\delta - 2)$ and $z_k \ne 0$ then
        $e_z = e_z + 2$
    if $k = (\delta - 1)$ and $z_k \ne 0$ and not(done) then
        $e_z = e_z + 1$
    else if $k \ge \delta$ and $z_k = 0$ and not(done) then
        $e_z = e_z - 1$
    else if ($k \ge \delta$ and $z_k \ne 0$) then
        done = 1
    end if

III. RECODING ALGORITHMS

Although RCNS$_{2j,3}$ allows flexibility in representation, there are also several drawbacks:

• Handling digits 3 and $-3$ requires producing significand multiples $3X$ and $-3X$, requiring an extra addition step.
• A significand X with fractional real and imaginary components $X_R$ and $X_I$ can have integer digits, such as $(11.32\bar{1}\bar{2})_{2j} = \frac{3}{8} + \frac{3}{8}j$, which can complicate ensuring complex significands within the range $\max(|X_R|, |X_I|) < 1$.

To handle these cases, several recoding modules are presented: (i) digit-set recoding; and (ii) most-significant-digit recoding.

A. Digit-set recoding

In order to reduce the complexity introduced by handling digits 3 and $-3$, digit-set recoding initially recodes an RCNS$_{2j,3}$ digit $x_k \in \{-3, \ldots, 3\}$ into a pair of digits $(t_{k-2}, w_k)$, in which $t_{k-2} \in \{-1, 0, 1\}$ and $w_k \in \{-2, \ldots, 2\}$, such that $x_k = -4t_{k-2} + w_k$ (a digit two positions to the left carries weight $(2j)^2 = -4$ relative to position k). Then an RCNS$_{2j,2}$ digit $\chi_k$ is computed as $\chi_k = t_k + w_k$. In order to restrict $\chi_k \in \{-2, \ldots, 2\}$, two cases of pairs of values must be prevented: (i) $t_k = 1$, $w_k = 2$; and (ii) $t_k = -1$, $w_k = -2$. To do so, $x_{k+2}$ is examined. If $x_{k+2} \le -2$ and $x_k = 2$, which could allow the first case, $x_k$ is recoded as $(-1, -2)$, otherwise as $(0, 2)$. In the same way, if $x_{k+2} \ge 2$ and $x_k = -2$, which could allow the second case, $x_k$ is recoded as $(1, 2)$, otherwise as $(0, -2)$. Then it is assured that $\chi_k \in \{-2, \ldots, 2\}$. The digit-set recoding algorithm DSREC is shown below.

DSREC($x_k$, $x_{k+2}$)

$$(t_{k-2}, w_k) = \begin{cases}
(1, 1) & \text{if } x_k = -3 \\
(1, 2) & \text{if } x_k = -2 \text{ and } x_{k+2} \ge 2 \\
(0, -2) & \text{if } x_k = -2 \text{ and } x_{k+2} < 2 \\
(0, -1) & \text{if } x_k = -1 \\
(0, 0) & \text{if } x_k = 0 \\
(0, 1) & \text{if } x_k = 1 \\
(0, 2) & \text{if } x_k = 2 \text{ and } x_{k+2} > -2 \\
(-1, -2) & \text{if } x_k = 2 \text{ and } x_{k+2} \le -2 \\
(-1, -1) & \text{if } x_k = 3
\end{cases}$$

$$\chi_k = t_k + w_k$$

B. Most-significant-digit recoding

In order to handle carries produced when performing operations on significands consisting of RCNS$_{2j,3}$ digits, most-significant-digit recoding recodes most-significant residual digits $w_{-1}, w_0 \in \{-1, 0, 1\}$ of respective weights $(2j)^1 = 2j$ and $(2j)^0 = 1$, and digits $w_1, w_2 \in \{-3, \ldots, 3\}$, of respective weights $(2j)^{-1}$ and $(2j)^{-2}$, into digits $\omega_1, \omega_2 \in \{-3, \ldots, 3\}$ of respective weights $(2j)^{-1}$ and $(2j)^{-2}$.
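Returning to the digit-set recoding of Section III-A, the sketch below checks the recoding rule numerically. It is a minimal illustration under stated assumptions, not the authors' hardware design: the signs in the recoding table are reconstructed from the constraint $x_k = -4t_{k-2} + w_k$, digits beyond the last fraction digit are taken as zero, and the helper names `pow2j`, `rcns_value`, `dsrec_pair` and `dsrec` are hypothetical.

```python
from itertools import product


def pow2j(n):
    """Exact (2j)**n for integer n, using (2j)**n = 2**n * j**n."""
    return 2.0 ** n * (1, 1j, -1, -1j)[n % 4]


def rcns_value(digits, msd_exp):
    """Value of a radix-2j digit string whose first digit has weight (2j)**msd_exp."""
    return sum(d * pow2j(msd_exp - i) for i, d in enumerate(digits))


def dsrec_pair(xk, xk2):
    """Recode x_k into (t_{k-2}, w_k) with x_k == -4*t + w, looking ahead at x_{k+2}."""
    if xk == -3:
        return (1, 1)
    if xk == -2:
        return (1, 2) if xk2 >= 2 else (0, -2)
    if xk == 2:
        return (-1, -2) if xk2 <= -2 else (0, 2)
    if xk == 3:
        return (-1, -1)
    return (0, xk)  # digits -1, 0, 1 pass through unchanged


def dsrec(x):
    """Recode fraction digits x_1..x_n (set {-3..3}) into digits at positions -1, 0, 1..n."""
    n = len(x)
    padded = list(x) + [0, 0]  # assumption: digits beyond x_n are zero
    out = [0] * (n + 2)        # out[i] holds the digit at position i - 1
    for i in range(n):
        t, w = dsrec_pair(padded[i], padded[i + 2])
        out[i] += t            # transfer lands two positions to the left
        out[i + 2] += w
    return out


# Exhaustive check over all 4-digit fractions: the value is preserved, the two
# leading transfer digits stay in {-1, 0, 1}, and the rest stay in {-2, ..., 2}.
for x in product(range(-3, 4), repeat=4):
    y = dsrec(x)
    assert rcns_value(y, 1) == rcns_value(x, -1)
    assert all(abs(d) <= 1 for d in y[:2]) and all(abs(d) <= 2 for d in y[2:])

# Two integer digits followed by four fraction digits, read with the last two negative.
print(rcns_value([1, 1, 3, 2, -1, -2], 1))  # (0.375+0.375j)
```

The final line evaluates a six-digit string with two integer digits; reading its last two digits as negative gives a value whose real and imaginary parts are both fractional, which is the situation most-significant-digit recoding is meant to handle.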
The algorithm MSDREC for recoding general digits $w_{k-2}$ and $w_k$ into digit $\omega_k$ is shown below. Each case satisfies $\omega_k = w_k - 4w_{k-2}$.

MSDREC($w_{k-2}$, $w_k$)

$$\omega_k = \begin{cases}
-3 & \text{if } (w_{k-2} = 0 \text{ and } w_k = -3) \text{ or } (w_{k-2} = 1 \text{ and } w_k = 1) \\
-2 & \text{if } (w_{k-2} = 0 \text{ and } w_k = -2) \text{ or } (w_{k-2} = 1 \text{ and } w_k = 2) \\
-1 & \text{if } (w_{k-2} = 0 \text{ and } w_k = -1) \text{ or } (w_{k-2} = 1 \text{ and } w_k = 3) \\
0 & \text{if } w_{k-2} = 0 \text{ and } w_k = 0 \\
1 & \text{if } (w_{k-2} = 0 \text{ and } w_k = 1) \text{ or } (w_{k-2} = -1 \text{ and } w_k = -3) \\
2 & \text{if } (w_{k-2} = 0 \text{ and } w_k = 2) \text{ or } (w_{k-2} = -1 \text{ and } w_k = -2) \\
3 & \text{if } (w_{k-2} = 0 \text{ and } w_k = 3) \text{ or } (w_{k-2} = -1 \text{ and } w_k = -1)
\end{cases}$$

IV. RCNS$_{2j,3}$ ON-LINE FLOATING-POINT ADDITION

RCNS$_{2j,3}$ floating-point addition ($z = x + y$) is defined such that given inputs $x = (X_R + jX_I) \cdot (2j)^{e_x}$ and $y = (Y_R + jY_I) \cdot (2j)^{e_y}$, the output $z = (Z_R + jZ_I) \cdot (2j)^{e_z}$ is produced such that

$$\begin{aligned}
Z_R &= X_R + Y_R \\
Z_I &= X_I + Y_I \\
e_z &= \max(e_x, e_y)
\end{aligned} \qquad (2)$$

Each output digit at step k, namely $z_k$, is generated based on input digits $x_{k+\delta-1}$ and $y_{k+\delta-1}$. The algorithm is shown below, where $W^E[k]$ is the low-precision estimate of the even-indexed (real) component of the recurrence $W[k]$. The design of an m-digit significand and e-bit exponent RCNS$_{2j,3}$ on-line floating-point adder is shown in Figure 3. The SUBE unit computes the difference of the exponents. The ALIGN unit performs alignment of operand y to synchronize the arrival of the input digits. The SWAP unit exchanges the operands if necessary. The PPM and MMP modules are simple full-adders that appropriately negate (indicated by "-" on the port) inputs and outputs to perform borrow-save addition. The normalization unit normalizes the result by updating the output exponent $e_z$. A summary of cost of individual modules is shown in Table I. The design requires $3m + 4e + 4$ CLB slices. Assuming $m = 24$ and $e = 8$, the cost is 108 CLB slices.

TABLE I
COST OF RCNS$_{2j,3}$ ON-LINE FLOATING-POINT ADDER

    Module                  CLB slices
    SUBE, ALIGN, SWAP       3m + 2e (combined)
    PPM/MMP                 4
    Normalization unit      2e
    Total cost              3m + 4e + 4

Fig. 3. RCNS$_{2j,3}$ on-line floating-point adder (exponent calculation: SUBE, ALIGN and SWAP units operating on $e_x$, $e_y$ and the digit streams $x_k$, $y_k$; significand calculation: PPM and MMP modules producing the output digits $z_{k-\delta}$)

RCNS$_{2j,3}$ On-line Floating-Point Addition

    $e_d = e_x - e_y$
    $e_z = \max(e_x, e_y)$
    $W[-\delta+1] = 0$
    $z_0 = 0$
    for $k = -\delta+2$ to $0$ do
        $(x_{k+\delta-1}, y_{k+\delta-1}) = \begin{cases} (0, y_{k+\delta-1}) & \text{if } e_d < 0 \\ (x_{k+\delta-1}, 0) & \text{if } e_d > 0 \\ (x_{k+\delta-1}, y_{k+\delta-1}) & \text{if } e_d = 0 \end{cases}$
        $W[k] = 2j\,(W[k-1]) + (2j)^{-\delta+1}(x_{k+\delta-1} + y_{k+\delta-1})$
    /* Recurrence */
    for $k = 1$ to $m$ do
        $(x_{k+\delta-1}, y_{k+\delta-1}) = \begin{cases} (0, y_{k+\delta-1}) & \text{if } k \le |e_d| \text{ and } e_d < 0 \\ (x_{k+\delta-1}, 0) & \text{if } k \le |e_d| \text{ and } e_d \ge 0 \\ (x_{k+\delta-1-|e_d|}, y_{k+\delta-1}) & \text{if } k > |e_d| \text{ and } e_d < 0 \\ (x_{k+\delta-1}, y_{k+\delta-1-|e_d|}) & \text{if } k > |e_d| \text{ and } e_d \ge 0 \end{cases}$
        $W[k] = 2j\,(W[k-1] - z_{k-1}) + (2j)^{-\delta+1}(x_{k+\delta-1} + y_{k+\delta-1})$
        $z_k = \lfloor W^E[k] + \frac{1}{2} \rfloor$
        $e_z$ = normalize($z_k$, $e_z$, $\delta$)
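As a cross-check on the definition of floating-point addition with a shared radix-2j exponent, the following value-level sketch may help. It is a reference model under stated assumptions, not the digit-serial recurrence above: significands are ordinary Python complex numbers, and the names `pow2j`, `fp_add` and `normalize_value` are hypothetical.

```python
def pow2j(n):
    """Exact (2j)**n for integer n, using (2j)**n = 2**n * j**n."""
    return 2.0 ** n * (1, 1j, -1, -1j)[n % 4]


def fp_add(X, ex, Y, ey):
    """Return (Z, ez) such that Z*(2j)**ez == X*(2j)**ex + Y*(2j)**ey."""
    ed = ex - ey
    ez = max(ex, ey)
    if ed >= 0:
        Z = X + Y * pow2j(-ed)  # y has the smaller exponent: scale it down
    else:
        Z = X * pow2j(ed) + Y   # x has the smaller exponent: scale it down
    return Z, ez


def normalize_value(Z, ez):
    """Adjust ez until 1/2 <= max(|Z_R|, |Z_I|) < 1 (zero is left unchanged)."""
    if Z == 0:
        return Z, ez
    while max(abs(Z.real), abs(Z.imag)) >= 1:
        Z, ez = Z * pow2j(-1), ez + 1
    while max(abs(Z.real), abs(Z.imag)) < 0.5:
        Z, ez = Z * pow2j(1), ez - 1
    return Z, ez


# Illustrative operands (arbitrary values, not from the paper).
X, ex = 0.5 + 0.25j, 2
Y, ey = 0.75 - 0.5j, 0
Z, ez = fp_add(X, ex, Y, ey)
Zn, en = normalize_value(Z, ez)
print(Z, ez)   # (0.3125+0.375j) 2
print(Zn, en)  # (-0.75+0.625j) 1
```

Scaling by an odd power of 2j exchanges the roles of the real and imaginary parts, which is why the normalized significand in the example has its components swapped relative to the unnormalized one while the represented value is unchanged.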

V. RCNS$_{2j,3}$ ON-LINE FLOATING-POINT CONSTANT COEFFICIENT MULTIPLICATION

RCNS$_{2j,3}$ floating-point coefficient multiplication ($z = xy$) is defined such that given constant coefficient parallel input $x = (X_R + jX_I) \cdot (2j)^{e_x}$ and variable input $y = (Y_R + jY_I) \cdot (2j)^{e_y}$, the output $z = (Z_R + jZ_I) \cdot (2j)^{e_z}$ is produced such that

$$\begin{aligned}
Z_R &= X_R Y_R - X_I Y_I \\
Z_I &= X_R Y_I + X_I Y_R \\
e_z &= e_x + e_y
\end{aligned} \qquad (3)$$

Each output digit at step k, namely $z_k$, is generated based on parallel input vector X and input digit $y_{k+\delta-1}$. The algorithm is shown below, where $W^E[k]$ is the low-precision estimate of the even-indexed (real) component of the recurrence $W[k]$. The design of an m-digit significand and e-bit exponent RCNS$_{2j,3}$ on-line floating-point constant coefficient multiplier is shown in Figure 4. The ADDER unit computes the sum of the exponents. The digit-vector multiplier computes the product $Xy_k$ at each iteration. The borrow-save adder computes the sum $W_k$ of the previous residual $W_{k-1}$ and the intermediate product $Xy_k$, and stores the result in the register REG W. The normalization unit normalizes the result based on the current output exponent, the on-line delay $\delta$, and the output digit $z_k$. A summary of cost of individual modules is shown in Table II. The design requires $8m + 3e + 32$ CLB slices. Assuming $m = 24$ and $e = 8$, the cost is 248 CLB slices.

TABLE II
COST OF RCNS$_{2j,3}$ ON-LINE FLOATING-POINT CONSTANT COEFFICIENT MULTIPLIER

    Module                     CLB slices
    Adder                      e
    DSREC                      12
    Digit-vector multiplier    4m
    Borrow-save adder          4m
    Normalization unit         2e
    MSDREC                     20
    Total cost                 8m + 3e + 32

Fig. 4. RCNS$_{2j,3}$ on-line floating-point constant coefficient multiplier (exponent adder; DSREC on the input digit $y_k$; digit-vector multiplier with parallel input X; borrow-save adder; register Reg W; MSDREC producing $z_{k-\delta}$)

RCNS$_{2j,3}$ On-line Floating-Point Constant Coefficient Multiplication

    $e_z = e_x + e_y$
    $W[-\delta+1] = 0$
    $Y[-\delta+1] = 0$
    $z_0 = 0$
    for $k = -\delta+2$ to $0$ do
        $W[k] = (2j)(W[k-1]) + (2j)^{-\delta+1}(Xy_{k+\delta-1})$
        $Y[k] = Y[k-1] + y_{k+\delta-1}(2j)^{-k-\delta+1}$
    /* Recurrence */
    for $k = 1$ to $m$ do
        $W[k] = (2j)(W[k-1] - z_{k-1}) + (2j)^{-\delta+1}(Xy_{k+\delta-1})$
        $z_k = \lfloor W^E[k] + \frac{1}{2} \rfloor$
        $Y[k] = Y[k-1] + y_{k+\delta-1}(2j)^{-k-\delta+1}$
        $e_z$ = normalize($z_k$, $e_z$, $\delta$)

VI. IMPLEMENTATION

Each tap (slice) of the FIR filter, assuming 24-digit significand and 8-bit exponent floating-point operands, consists of 24 digit-wide registers to store individual digits of inputs $x(n), x(n-1), \ldots, x(n-N+1)$ and a complex number on-line floating-point constant coefficient multiplier. Each multiplier product is fed to one of the operands of a complex number on-line floating-point adder. The parallel adder in Figure 1 can be implemented as a binary tree of complex number on-line floating-point adders, each one initially adding two intermediate multiplier outputs and producing an intermediate sum output, until the final output y(n) is computed.

A. Radix 2j on-line network

An N-tap complex FIR filter can be designed as a network of radix-2j on-line floating-point arithmetic operators, where, assuming m-digit significands and e-bit exponents, a radix 2j on-line floating-point adder has a cost of $3m + 4e + 4$ CLB slices, and a radix 2j on-line floating-point multiplier has a cost of $8m + 3e + 32$ CLB slices. For $m = 24$ and $e = 8$, the cost of a radix 2j on-line floating-point adder is 108 CLB slices and the cost of a radix 2j on-line floating-point multiplier is 248 CLB slices. Since an N-tap complex FIR filter uses $N - 1$ radix 2j floating-point adders and N radix 2j floating-point multipliers, the cost is $356N - 108$ CLB slices.

B. Radix 2 on-line network

An N-tap complex FIR filter can be alternatively designed as a network of radix-2 on-line floating-point arithmetic operators, where, assuming m-digit significands and e-bit exponents, a radix 2 on-line floating-point adder has a cost of $1.5m + 3e + 2$ CLB slices, and a radix 2 on-line floating-point multiplier has a cost of $3m + 3e + 2$ CLB slices. For $m = 24$ and $e = 8$, the cost of a radix 2 on-line floating-point adder is 70 CLB slices and the cost of a radix 2 on-line floating-point multiplier is 98 CLB slices. Since an N-tap complex FIR filter uses $3N - 1$ radix 2 floating-point adders and $4N$ radix 2 floating-point multipliers, the cost is $602N - 70$ CLB slices.

C. Radix 2 parallel network

An N-tap complex FIR filter can be alternatively designed as a network of radix-2 parallel arithmetic operators. The library of Xilinx CORE arithmetic modules [6], which can be scaled in terms of precision, is used. Since the modules are defined for fixed-point arithmetic, appropriate exponent handling units are used to support floating-point arithmetic. For 24-bit significands and 8-bit exponents, the cost of a radix 2 parallel floating-point adder is 30 CLB slices and the cost of a radix 2 parallel floating-point multiplier is 320 CLB slices. Since an N-tap complex FIR filter uses $3N - 1$ radix 2 floating-point adders and $4N$ radix 2 floating-point multipliers, the cost is $1370N - 30$ CLB slices.

D. Cost comparison

The costs of the proposed radix 2j on-line network, the alternative radix 2 on-line network, and the radix 2 parallel network are compared for the implementation of an N-tap complex FIR filter for common values of N, including N = 8, 16, 64, and 256. In each case, we assume floating-point operands consisting of 24-digit (or bit) significands and 8-bit exponents, as shown in Table III.

TABLE III
COMPARISON OF COSTS (CLB SLICES) FOR FLOATING-POINT N-TAP COMPLEX FIR FILTER (m = 24, e = 8)

    N      RCNS 2j,3 on-line    Radix-2 on-line    Radix-2 parallel
    8      2740                 4746               10930
    16     5588                 9562               21890
    64     22676                38458              87650
    256    91028                154042             350690

VII. CONCLUSION

We have demonstrated a new approach for implementing an N-tap complex FIR filter, based on using complex number on-line arithmetic modules which adopt a redundant complex number system (RCNS) for efficient representation. A significant improvement in cost in comparison to a radix-2 on-line approach and a radix-2 parallel approach has been shown: for the values of N in Table III, the radix 2j network uses roughly 41-42% fewer CLB slices than the radix-2 on-line network and roughly 74-75% fewer than the radix-2 parallel network. This offers motivation for further research into other applications utilizing complex number operations.

REFERENCES

[1] T. Aoki, Y. Ohi, and T. Higuchi, "Redundant complex number arithmetic for high-speed signal processing," 1995 IEEE Workshop on VLSI Signal Processing, Oct. 1995, pp. 523-532.
[2] M. D. Ercegovac and T. Lang, "On-line scheme for computing rotation factors," Journal of Parallel and Distributed Computing, 1988, pp. 209-227.
[3] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, 2004.
[4] D. E. Knuth, The Art of Computer Programming, Vol. 2, 1973.
[5] R. McIlhenny, "Complex number on-line arithmetic for reconfigurable hardware: algorithms, implementations, and applications," Ph.D. dissertation, University of California, Los Angeles, 2002.
[6] Xilinx Corporation, Xilinx Data Book, 2004.
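The cost expressions of Section VI can be cross-checked against Table III with a few lines of arithmetic. The sketch below is only a bookkeeping aid, and the names `filter_costs` and the constants are hypothetical. The radix 2j module costs are computed from the quoted expressions, while the radix-2 module costs are taken as the quoted per-module figures for m = 24 and e = 8; the quoted radix-2 adder expression 1.5m + 3e + 2 would evaluate to 62 rather than 70, so the sketch follows the quoted figure, which is the one that reproduces 602N - 70.

```python
M, E = 24, 8  # significand digits and exponent bits assumed in Section VI

RCNS_ADDER = 3 * M + 4 * E + 4                       # 108 CLB slices
RCNS_MULTIPLIER = 8 * M + 3 * E + 32                 # 248 CLB slices
R2_ONLINE_ADDER, R2_ONLINE_MULTIPLIER = 70, 98       # quoted figures
R2_PARALLEL_ADDER, R2_PARALLEL_MULTIPLIER = 30, 320  # quoted figures


def filter_costs(n):
    """CLB-slice cost of an n-tap complex FIR filter for the three networks."""
    rcns = (n - 1) * RCNS_ADDER + n * RCNS_MULTIPLIER
    r2_online = (3 * n - 1) * R2_ONLINE_ADDER + 4 * n * R2_ONLINE_MULTIPLIER
    r2_parallel = (3 * n - 1) * R2_PARALLEL_ADDER + 4 * n * R2_PARALLEL_MULTIPLIER
    return rcns, r2_online, r2_parallel


for n in (8, 16, 64, 256):
    print(n, *filter_costs(n))
# 8 2740 4746 10930
# 16 5588 9562 21890
# 64 22676 38458 87650
# 256 91028 154042 350690
```

The printed rows agree with Table III, and the ratio between the first and second columns stays close to 0.58 to 0.59 across the listed values of N.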