VLSI IMPLEMENTATION OF PARALLEL-SERIAL LMS ADAPTIVE FILTERS

Run-Bo Fu, Paul Fortier
Dept. of Electrical and Computer Engineering, Laval University
Québec, Québec, Canada G1K 7P4
email: fortier@gel.ulaval.ca

Abstract - In this paper, a parallel realization of the LMS algorithm in FPGA is presented. It is based on a parallel-serial multiplier, in which one of the inputs and the outputs are transferred in series, most significant digit first. The implementation is relatively low in complexity. The modularity exhibited is attractive for VLSI implementations. It can be realized on a single chip or on a few modular chips for most practical applications.

I. Introduction

Adaptive filters have found many applications in the areas of communications and signal processing, such as echo cancellation, channel equalization, noise cancellation, and system identification [1]. The difference between adaptive filters and conventional digital filters is that the former need an appropriate algorithm for updating the filter coefficients. The least-mean-square (LMS) algorithm, due to its simplicity and good convergence behavior, has been widely used in practical applications [2].

Since LMS adaptive filtering involves a large number of computations, parallel implementation of the LMS algorithm is necessary for real-time applications. However, the implementation would be expensive in this case because N multipliers would be needed for N filter coefficients. Multiplier arrays are known to be fast, but because of their complexity, just one array can be implemented on a chip. Another alternative is the use of parallel-serial multipliers, which are slower but also less complex.

In this paper, a parallel-serial multiplier is used [3], [4]. It is based on an algorithm which performs multiplication in the most significant digit (MSD) first manner. The pipelining of the multiplier depends on the use of a redundant number system. The implementation of the LMS algorithm presented in this paper has three steps: the computation of the convolution terms, the summation of these terms, and the adaptation algorithm.
By an elaborate design of the three kinds of multipliers used, the three steps are overlapped in a full pipeline fashion.

The redundant arithmetic adders and multipliers which are used for the realization of the LMS algorithm are presented in Section II. The LMS algorithm is described briefly in Section III. Then, the parallel realization of the LMS algorithm is shown in Section IV. Finally, the implementation of the algorithm in FPGA is discussed in Section V.

II. Redundant arithmetic adders and multipliers

The multiplication discussed in this section is carried out by summing the partial products in decreasing order. The partial result is represented as a signed-digit-binary (SDB) number. The partial products are computed in two's complement (2C) representation or SDB representation.

1. Redundant and hybrid adders

Two kinds of number representation are used in the multiplication. Let X and Y be 2C and SDB numbers, respectively:

$$X = -x_0 + \sum_{j \ge 1} x_j 2^{-j}, \quad x_0, x_j \in \{0, 1\}, \quad X \in [-1, 1] \tag{1}$$

$$Y = \sum_{j \ge 1} y_j 2^{-j}, \quad y_j \in \{-1, 0, 1\}, \quad Y \in [-1, 1] \tag{2}$$

where digit $y_j$ is coded by two bits $c_j$ and $s_j$, such that

$$y_j = c_j + s_j - 1, \quad c_j, s_j \in \{0, 1\} \tag{3}$$

Under this representation, the redundant adder (Fig. 1.a) and the hybrid adder (Fig. 1.b) are obtained. The redundant adder carries out the addition of two SDB numbers, and the hybrid adder carries out the addition of an SDB number with a 2C number.

We will also need to convert an SDB number into 2C representation in a data block fashion. Let X and Y be a 2C and an SDB number, respectively.
$$Y = \sum_{j \ge 1} y_j 2^{-j} = \sum_{j \ge 1} (c_j + s_j - 1)\, 2^{-j} = \sum_{j \ge 1} (c_j + s_j)\, 2^{-j} - \sum_{j \ge 1} 2^{-j} \tag{4}$$

By comparing equations (4) and (1), we obtain the conversion circuit illustrated in Fig. 2. The conversion is carried out by simply adding the two 2C numbers C and S, which represent the vectors formed by $c_j$ and $s_j$, with an offset of $-1$; the bit $x_0$ is the inverse of the most significant carry bit.

2. Serial-parallel multiplication

The serial-parallel multiplication described in this paper is carried out by summing the partial products in decreasing order. This allows the result to be generated digit by digit, most significant first, in a pipelined manner [5]. Assuming that the weights of the partial products decrease by a factor $r$, the sequence of the partial products can be described as

$$\{\, y_j r^{-j} : |y_j| \le y_{\max},\ j = 1, \ldots, k \,\} \tag{5}$$

The factor $r$ is the basis of the number system of the multiplier operand, and $k$ is the total number of steps of the accumulation. Without loss of generality, we assume that the initial value of the partial accumulation is $x_0$. The partial accumulation is expressed as

$$X_k = x_0 + \sum_{j=1}^{k} y_j r^{-j} \tag{6}$$

$z_j$ is the result digit, which is extracted at each step, most significant first, from the partial accumulation. The partial result at step $k$ is represented as

$$Z_k = \sum_{j=1}^{k} z_j r^{-j}, \quad z_j \in \{-\rho, \ldots, \rho\} \tag{7}$$

where [6]

$$\frac{r-1}{2} < \rho < r \tag{8}$$

The residual $W_k$, which is the scaled difference between the two partial accumulations, is given by the following expressions:

$$W_k = r^k (X_k - Z_k) = r^k \Big( x_0 + \sum_{j=1}^{k} y_j r^{-j} - \sum_{j=1}^{k} z_j r^{-j} \Big) \tag{9}$$

$$W_k = r W_{k-1} + y_k - z_k \tag{10}$$

The multiplication is realized as follows:

Initialize $W_0 = x_0$
For $k = 1, \ldots, n$:
  $W'_k = r W_{k-1} + y_k$
  $z_k = \mathrm{Select}(W'_k)$
  $W_k = W'_k - z_k$
End for

In this procedure, three issues must be addressed: the selection rule, the domain of $y_j$, and the error bound. To extract the result digit from the partial accumulation, we can convert the most significant n bits of $W'_k$ into a 2C number. The most significant 3 bits of the 2C number represent the extracted digit.
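As an illustration, the residual recurrence of this section, $W_k = r W_{k-1} + y_k - z_k$, can be exercised in software. The sketch below is a minimal Python model under stated assumptions, not the circuit of the paper: the selection rule is plain rounding to the nearest integer, clamped to the digit set, which stands in for the bit-level extraction from the most significant bits of $W'_k$; exact rationals replace the SDB and 2C hardware formats; and the names `msd_first_accumulate` and `digits_value` are hypothetical. With this rounding rule the residual stays within $\pm 1/2$ provided $|x_0| \le 1/2$ and $|y_k| \le \rho + 1/2 - r/2$.

```python
from fractions import Fraction


def msd_first_accumulate(x0, ys, r=4, rho=3):
    """Extract result digits most significant first.

    Implements W_k = r*W_{k-1} + y_k - z_k with W_0 = x0.
    Returns the digit list [z_1, ..., z_n] and the final residual W_n.
    """
    w = Fraction(x0)
    digits = []
    for y in ys:
        w_prime = r * w + Fraction(y)
        # Assumed selection rule: nearest integer, clamped to {-rho..rho}.
        z = max(-rho, min(rho, round(w_prime)))
        digits.append(z)
        w = w_prime - z
    return digits, w


def digits_value(digits, r=4):
    """Value Z_n = sum_j z_j * r**(-j) of an MSD-first digit string."""
    return sum(Fraction(z, r ** (j + 1)) for j, z in enumerate(digits))


if __name__ == "__main__":
    x0 = Fraction(1, 4)
    ys = [1, -1, Fraction(3, 2), 0]
    digits, residual = msd_first_accumulate(x0, ys)
    print(digits, residual, digits_value(digits))
```

After n steps the extracted digits satisfy $X_n - Z_n = W_n r^{-n}$, so under these assumptions the error left by stopping the accumulation shrinks by a factor $r$ with every extracted digit.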
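To make sure that the extracted digit does not exceed the domain $\{-\rho, \ldots, \rho\}$, $y_j$ is bounded through a two-sided condition on $W'_k$ in terms of $\rho$:

[Equation (11): bound on $W'_k$; constants not recoverable from the source.]

With the expression of $W'_k$ in terms of $W_{k-1}$ and $y_k$, we have

[Equation (12): bound on $W_{k-1} + y_k$; not recoverable from the source.]

After selection of $z_k$, the residual $W_k$ satisfies

[Equation (13): bound on $W_k$; not recoverable from the source.]

From (12) and (13), we have

[Equation (14): bound on $y_k$ in terms of $\rho$; not recoverable from the source.]

The input domain of $y_j$ is defined as

[Equation (15): interval for $Y$ in terms of $\rho$; not recoverable from the source.]

The accumulation procedure stops after n steps. Since the result is generated one digit at each step, the less significant part of the total result is then ignored. The number which is ignored in the last step is $W_n$. Since the weight of the result decreases by a factor $r$ as the accumulation progresses, a bound on the error is found:

$$X_n - Z_n = W_n r^{-n} \tag{16}$$

with $W_n$ limited by (13).

The procedure described above results in an easier design than the method described in [3]. We need not determine the domain of convergence and the selection interval by a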
complicated procedure. It is enough to determine the domains of W and Y through equations (13) and (15). From (13) and (15), we observe that the domains of W and Y depend not only on $\rho$ and $r$, but also on $m$, the number of bits to be converted from redundant notation into 2C. With definite $\rho$ and $r$, there is a lower bound for $m$.

III. LMS algorithm

The LMS algorithm is described as

$$\begin{aligned} Y(n) &= \sum_{i=0}^{N-1} x_i(n)\, c_i(n) \\ E(n) &= Z(n) - Y(n) \\ c_i(n+1) &= c_i(n) + \mu\, E(n)\, x_i(n) \end{aligned} \tag{17}$$

The convergence speed is a function of the step size $\mu$, which must obey the following condition to ensure that the convergence will be in a quadratic mean sense:

[Equation (18): upper bound on $\mu$, a fraction whose denominator involves $N$ and $P$; not recoverable from the source.]

where P is the power of the primary signal.

IV. Pipeline realization of the LMS algorithm

By using the multiplier of Section II, the LMS algorithm is realized in three steps. The three steps are overlapped in a pipelined manner.

The first step is to compute the convolution terms $x_i c_i$, $i = 0, 1, 2, \ldots, N-1$. We assume that the input $x_i$ is represented as a 2C number and serves as the multiplicand, the coefficient $c_i$ is expressed as a redundant number with $\rho = 1$, and the result of the multiplication is represented as

$$Z = \sum_{j} z_j 4^{-j} \tag{19}$$

This means that the parameters $r$ and $\rho$ are defined as follows:

$$r = 4, \quad \rho = 3 \tag{20}$$

A hybrid adder is used. For $m = 2$ and from equation (14), we have

[Equation (21): lower and upper bounds on $x_i$; not recoverable from the source.]

Fig. 3 illustrates the multiplier used to carry out the convolution terms.

The summation of the N extracted digits and the computation of the error are carried out in the second step. The N digits are summed up by a binary adder tree. The adder implementation is like that of Fig. 1.a, except that the pair of bits to the far right must be replaced by a fixed pair of bits (values not recoverable from the source). In this way, the adder can add four 2C numbers and give two numbers in 2C as its result. For the addition in the binary tree, the least significant bit of the result in every stage cannot simply be dropped, as is the case for usual adders, because the extracted digit is produced most significant first.
This means that it needs a 3-bit adder in the first stage of the binary tree, a 4-bit adder in the second stage, a 5-bit adder in the third stage, and so on. The sample period depends on the number of stages of the adder tree. If four numbers instead of two are added in one stage, the number of stages of the tree decreases from $\log_2 N$ to $\log_4 N$.

The sequence generated by the binary tree decreases by a factor of 4 at each step. An accumulator (Fig. 4) is used to sum this sequence, generating the final result digit by digit, most significant first. In this accumulator, a redundant adder is used. Since the result will be used as the multiplicand in the next step, the result of the accumulation is represented as

$$Z = \sum_{j} z_j 4^{-j} \tag{22}$$

This means that

$$r = 4, \quad \rho = 2 \tag{23}$$

From equation (14), with $m = 3$, we have

[Equation (24): lower and upper bounds on $x_i$; not recoverable from the source.]

The last step is to compute the new coefficients. It also needs N multipliers (Fig. 5). The multiplicand is in 2C and saved in a parallel register. The multiplier is the error from step 2, most significant first. The resulting coefficient is read back in step 1 to carry out the next sample period. Hence, the result of the multiplication is also represented as in equation ([...]). The difference with the accumulation used in step 2 is that a hybrid adder is used here. For the LMS filter with N = 8 coefficients, the pipeline period is indicated in Table 1. The sample period is [...] clock cycles.

V. FPGA implementation

Our implementation of the LMS filter is modular. We need two types of modules, one for the convolution computations and one for the updates. Moreover, all communications between modules are serial, which requires only narrow buses. Each module is quite easy to build because the multipliers can be placed regularly. The binary trees exhibit some regularity, which is an attractive characteristic for VLSI implementations. Fast and simple hardware programmability are the key FPGA features that reduce manufacturing costs and allow the rapid development of custom components. With in-circuit reprogrammability, FPGAs can play an important role in research on algorithm implementations.
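When checking a hardware realization of this kind, a software reference model of the LMS recursion of Section III can be a convenient point of comparison. The following is a minimal floating-point sketch in Python, assuming the error is defined as desired minus output and the coefficients are updated with a positive sign, as in the recursion of Section III. It does not model the redundant arithmetic, the digit-serial transfers, or the pipeline; the function names `lms_step` and `identify` and the system-identification setup are illustrative assumptions.

```python
def lms_step(coeffs, window, desired, mu):
    """One LMS iteration.

    coeffs  : c_i(n), i = 0..N-1
    window  : x_i(n), the N most recent input samples
    desired : Z(n), the primary (desired) signal sample
    mu      : step size
    Returns (c_i(n+1), Y(n), E(n)).
    """
    y = sum(x * c for x, c in zip(window, coeffs))   # Y(n)
    e = desired - y                                   # E(n) = Z(n) - Y(n)
    new_coeffs = [c + mu * e * x for c, x in zip(coeffs, window)]
    return new_coeffs, y, e


def identify(plant, inputs, mu):
    """Adapt an N-tap filter toward the FIR response `plant`.

    The desired signal is the noise-free output of `plant` driven by
    `inputs`; this test setup is an assumption of the sketch.
    """
    n_taps = len(plant)
    coeffs = [0.0] * n_taps
    window = [0.0] * n_taps
    for x in inputs:
        window = [x] + window[:-1]                    # shift in the new sample
        desired = sum(p * w for p, w in zip(plant, window))
        coeffs, _, _ = lms_step(coeffs, window, desired, mu)
    return coeffs
```

Feeding the same input samples to such a model and to the hardware simulation allows the extracted result digits to be compared against the floating-point values, within the truncation error of the digit-serial arithmetic.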
To achieve customization flexibility, FPGAs sacrifice area and some processing speed. The parallel realization of the LMS
algorithm proposed in this paper has been designed with several Xilinx FPGA chips (XC4). The clock period is 5 ns. To implement the LMS algorithm, two types of FPGA chips are designed. The convolution computation and the adaptor, which share the same coefficients, compose the first module (Fig. 6), and the summator composes a second module. More than one Module 1 chip is used to compute the convolution and the new coefficients. The partial convolution results are summed and the error signal is produced by the second type of chip. The error signal is read back to the first kind of chip (Fig. 7).

As an example of implementation, 8 multipliers and 1 adder can be put on a Xilinx 4 circuit to realize a 4-coefficient Module 1 chip with [...]-bit input data and [...]-bit coefficients. This design uses 7 control logic blocks, 39 I/O blocks and 4 ports. The clock speed is [...] MHz and, since it needs [...] clock cycles for each adaptation, the data rate is [...] MHz.

After implementing the algorithm in FPGA, we plan to carry on with the realization of the proposed pipelined LMS filter as an application-specific integrated circuit (ASIC).

VI. Conclusion

We have implemented in FPGA an LMS filter using an architecture based on redundant arithmetic. The implementation is both simple and modular. The performance (sampling rate) is adequate for many DSP and digital communication applications.

References

[1] B. Widrow and S. D. Stearns, Adaptive signal processing, Series in signal processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
[2] S. Haykin, Adaptive filter theory, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[3] M. Lapointe, H. T. Huynh, and P. Fortier, Systematic design of pipeline recursive filters, IEEE Transactions on Computers, vol. 42, no. 4, pp. 413-426, April 1993.
[4] M. Lapointe, P. Fortier, and H. T. Huynh, Fast parallel realization of the LMS algorithm in O(logN) computation time, 15th Biennial Symposium on Communications, Kingston, Canada, June 1990.
[5] N. Weste and K. Eshraghian, Principles of CMOS VLSI design: A system perspective, Addison-Wesley, Reading, MA, 1985.
[6] A.
Avizienis, Signed-digit number representations for fast parallel arithmetic, IRE Trans. Electron. Comput., vol. EC-10, pp. 389-400, September 1961.

Figure 1. (a) Redundant adder, (b) hybrid adder.

Figure 2. SDB to 2C converter.

Table 1. Pipeline cycle of the LMS algorithm (rows $C_i(n)X$, $M_i(n)$, $E(n)$ and $C_i(n)$ against the clock cycle n; cell contents not recoverable from the source).
Figure 3. Multiplier for convolution computation.

Figure 4. Convolution accumulator.

Figure 5. Multiplier for adaptation computation.
Figure 6. Block diagram of Module 1.

Figure 7. Block diagram of the FPGA implementation of the LMS algorithm.