Design and Implementation of Cosine Transforms Employing a CORDIC Processor

C16

Sharaf El-Din El-Nahas, Ammar Mottie Al Hosainy, Magdy M. Saeb
Arab Academy for Science and Technology, School of Engineering, Alexandria, EGYPT

ABSTRACT

The COordinate Rotation DIgital Computer (CORDIC) is widely used in recent DSP applications because of its simple, well-designed approach, which uses only add and shift operations instead of multipliers. In this work, we introduce a Field Programmable Gate Array (FPGA) implementation of a CORDIC processor that provides high performance and, at the same time, an efficient implementation area. This is achieved by increasing the number of segments of the signs vector curve. This increases the number of comparators, but with only a small effect on the implementation area. As an application, we have developed a Discrete Cosine Transform (DCT) module that employs the CORDIC core to generate the cosine terms.

I. INTRODUCTION

The coordinate rotation algorithm, first introduced by Volder [1] for fast computation of trigonometric functions and their inverses, is a well-known and widely studied method for plane vector analysis. The CORDIC method can be used for multiplication and division, as well as for conversion between binary and mixed radix number systems. Walther [2] demonstrated a unified CORDIC algorithm that can be used to calculate trigonometric, exponential and square root functions along with their inverses. Fundamentally, the CORDIC method evaluates elementary functions using only lookup tables, shifts and additions instead of multiplications. Only a small number of pre-calculated fixed constants, of the order of n when n bits of precision are required in the evaluation of the functions, has to be stored in the look-up table. The CORDIC algorithm has advantageous geometrical interpretations. This is clear with the trigonometric and exponential functions, which are evaluated via rotations in the circular, hyperbolic and linear coordinate systems, respectively. Their inverses can be implemented in a vectoring mode in the appropriate coordinate system [3].
The organization of this paper is as follows: Section II explains the theory of CORDIC, Section III illustrates the CORDIC implementation, Section IV explains the DCT architecture, Section V shows the simulation results, and finally we end the paper with our conclusions.

II. CORDIC THEORY

Rotating a vector (x, y) in a Cartesian plane by the angle Φ gives x' = x cos Φ - y sin Φ and y' = y cos Φ + x sin Φ, which can be rearranged so that

$$x' = \cos\Phi\,[x - y\tan\Phi] \tag{1}$$
$$y' = \cos\Phi\,[y + x\tan\Phi] \tag{2}$$

If the rotation angles are restricted so that $\tan\Phi = \pm 2^{-i}$, the multiplication by the tangent term is reduced to a simple shift operation. Arbitrary angles of rotation are obtainable by performing a series of successively smaller elementary rotations. If the decision at each iteration i is which direction to rotate, rather than whether or not to rotate, then the cos Φ term becomes a constant (because cos(-Φ) = cos Φ). The iterative rotation can now be expressed as:

$$x_{i+1} = K_i\,[x_i - m\,d_i\,2^{-i}\,y_i] \tag{3}$$
$$y_{i+1} = K_i\,[y_i + d_i\,2^{-i}\,x_i] \tag{4}$$

where $d_i = \pm 1$, $K_i = \cos(\tan^{-1} 2^{-i})$ is a constant, and m selects the linear (m = 0), circular (m = 1), or hyperbolic (m = -1) coordinate system. In this work we operate in the circular coordinate system, so m is always equal to one. Without the $K_i$ factors, the micro-rotations are not perfect rotations: each one increases the length of the vector. To maintain a constant vector length, the results have to be scaled by a scaling factor K. Because every iteration rotates in either the positive or the negative direction, the scaling factor is constant for a given number of iterations n and can be precomputed according to the following equation:

$$K = \prod_{i=0}^{n-1} K_i = \prod_{i=0}^{n-1} \left(1 + 2^{-2i}\right)^{-1/2} \tag{5}$$

Removing the scaling constant from the iterative equations yields a shift-add algorithm for vector rotation. The product of the $K_i$ can be applied elsewhere in the system, treated as part of the system processing gain, or absorbed by initializing the rotating vector with K, the reciprocal of the CORDIC gain for the chosen number of iterations.

The angle of a composite rotation is uniquely defined by the sequence of the directions of the elementary rotations. That sequence can be represented by a decision vector. The set of all possible decision vectors is an angular measurement system based on binary arctangents. Conversion between this system and a conventional angular unit could use a lookup table; a better method uses an additional adder-subtractor that accumulates the elementary rotation angles at each iteration. The elementary angles can be expressed in any convenient angular unit. Those angular values are supplied by a small lookup table (one entry per iteration) or are hardwired, depending on the implementation. The angle accumulator adds a third difference equation to the CORDIC algorithm:

$$z_{i+1} = z_i - d_i\,\tan^{-1}(2^{-i}) \tag{6}$$

The CORDIC rotator is normally operated in one of two modes, the rotation mode and the vectoring mode [5]. In the rotation mode, a vector (x, y) is rotated by an angle θ. The angle accumulator is initialized with the desired rotation angle θ. The rotation decision at each iteration is made to diminish the magnitude of the residual angle in the angle accumulator. The decision at each iteration is therefore based on the sign of the residual angle after each step.
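As an illustration of the rotation mode just described, the following Python sketch applies the micro-rotations together with the angle accumulator of equation (6) and the scaling factor of equation (5). It is a floating-point model, not the fixed-point hardware: the function name `cordic_rotate`, the default of 16 iterations, and the multiplication by 2^-i standing in for a hardwired shift are assumptions made for this example. The sign convention follows equations (1) and (2).

```python
import math

def cordic_rotate(x, y, theta, n=16):
    """Rotate (x, y) by theta radians with n shift-add micro-rotations.

    Floating-point sketch; converges for |theta| up to about 1.74 rad.
    """
    # Lookup table of elementary angles atan(2^-i), one entry per iteration.
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    # Scaling factor K of equation (5): product of (1 + 2^-2i)^(-1/2).
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = theta  # angle accumulator initialized with the desired rotation
    for i in range(n):
        d = -1.0 if z < 0 else 1.0   # direction from the sign of the residual angle
        shift = 2.0 ** -i            # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]           # equation (6)
    return k * x, k * y, z

# Rotating the unit vector (1, 0) by 30 degrees approximates (cos 30, sin 30).
cx, sy, residual = cordic_rotate(1.0, 0.0, math.radians(30.0))
```

With 16 iterations the residual angle is bounded by the last table entry, atan(2^-15), so the results agree with `math.cos` and `math.sin` to roughly four decimal places.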
Naturally, if the input angle is already expressed in the binary arctangent base, the angle accumulator may be eliminated. For rotation mode the CORDIC equations are

$$x_{i+1} = x_i - y_i\,d_i\,2^{-i} \tag{7}$$
$$y_{i+1} = y_i + x_i\,d_i\,2^{-i} \tag{8}$$
$$z_{i+1} = z_i - d_i\,\tan^{-1}(2^{-i}) \tag{9}$$

where $d_i = -1$ if $z_i < 0$, and $+1$ otherwise. This provides the following results:

$$x_n = A_n\,[x_0\cos z_0 - y_0\sin z_0] \tag{10}$$
$$y_n = A_n\,[y_0\cos z_0 + x_0\sin z_0] \tag{11}$$
$$z_n = 0 \tag{12}$$
$$A_n = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \tag{13}$$

In vectoring mode, the length R and the angle α towards the x-axis of a vector (x, y) are computed. The CORDIC rotator rotates the input vector through whatever angle is necessary to align the result vector with the x-axis. The result of the vectoring operation is a rotation angle and the scaled magnitude of the original vector (the x component of the result). The vectoring function works by seeking to minimize the y component of the residual vector at each rotation. The sign of the residual y component is used to determine which direction to rotate next. If the angle accumulator is initialized with zero, it will contain the traversed angle at the end of the iterations. In vectoring mode the CORDIC equations are:

$$x_{i+1} = x_i - y_i\,d_i\,2^{-i} \tag{14}$$
$$y_{i+1} = y_i + x_i\,d_i\,2^{-i} \tag{15}$$
$$z_{i+1} = z_i - d_i\,\tan^{-1}(2^{-i}) \tag{16}$$

where $d_i = +1$ if $y_i < 0$, and $-1$ otherwise. This provides the following results:

$$x_n = A_n\sqrt{x_0^2 + y_0^2} \tag{17}$$
$$y_n = 0 \tag{18}$$
$$z_n = z_0 + \tan^{-1}\left(\frac{y_0}{x_0}\right) \tag{19}$$
$$A_n = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}} \tag{20}$$

This set of equations will be used to design the micro-architecture shown in the following section.

III. IMPLEMENTATION OF THE CORDIC

We have used the unfolded structure of the CORDIC [5] to implement the three main equations: (7), (8) and (9), or (14), (15) and (16). Figure 1 shows that each iteration needs three adder-subtractors and a ROM to store the arctangent value; the shift operation is hardwired in the system. Based on an innovative method to implement the CORDIC [4], we can replace the z accumulation path, which is used to decide the rotation direction, with a block of comparators, an adder, and a register that stores the decision vector, or signs vector. The micro-architecture is shown in Figure 2.

Figure 1: CORDIC Implementation

Figure 2: New architecture

The signs vector is generated from a linear equation that depends only on the input angle (Figure 3):

-C0 -C 0 1 d = θ + -C 0 1 2 -C 0 1 2 3 (21)

All these constants depend only on the number of iterations and can be calculated using a Matlab model of the CORDIC. The only way to increase the accuracy of the signs vector is to use another curve that has more segments, but this increases the number of comparators and accordingly requires a larger implementation area.

Figure 3: A four-segment curve

IV. DCT MODULE USING THE CORDIC

According to the equation of the DCT,

$$C(u) = \alpha(u)\sum_{x=0}^{N-1} f(x)\cos\left[\frac{\pi(2x+1)u}{2N}\right], \quad u = 0, 1, 2, \ldots, N-1 \tag{22}$$

Taking N = 16 input points, each a positive or negative number, there are 16 cosine functions in the transformation. For the first one, at u = 0, the angle is always zero, so its value is 1. The input points to be transformed are entered sequentially, so at each clock tick a new point is ready to be transformed. For this reason we use 15 CORDIC modules to calculate the DCT of 16 points, which increases throughput and decreases latency to 32 clock cycles. Each CORDIC module calculates the cosine functions for a single point in the frequency domain, as shown in Figure 4. The cosine argument is given by:

$$\theta = \frac{\pi(2x+1)u}{2N}$$

It contains two variables, x and u, both of which take values from 0 to 15. Since we do not have multipliers in our design, we store these arguments in look-up tables (LUTs). In addition, the CORDIC module accepts input angles only in the range from 0 to 90 degrees. Therefore, we have to replace θ with another angle in the range from 0 to 90 degrees whose cosine equals the absolute value of the cosine of the original angle. Moreover, we need one more bit for each new angle to indicate the half-plane of the original angle: the cosine is positive in the right half-plane and negative in the left half-plane. These half bits are stored in a LUT and connected to the accumulator to decide whether the next operation is an addition or a subtraction, as shown in Figure 4.
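The angle folding and the add-or-subtract accumulation described above can be sketched in Python as follows. This is an illustrative software model under stated assumptions, not the authors' hardware: the names `fold_angle` and `dct_by_accumulation` are hypothetical, `math.cos` stands in for the CORDIC cosine output, and the α(u) scaling of equation (22) is left out so that the accumulators hold the raw sums.

```python
import math

N = 16  # transform length used in the paper

def fold_angle(x, u, n=N):
    """Return (folded angle in radians, half bit) for theta = pi*(2x+1)*u/(2n).

    The folded angle lies in [0, pi/2]; the half bit is 1 when the original
    angle lies in the left half-plane, where the cosine is negative.
    """
    steps = (2 * x + 1) * u      # theta measured in units of pi/(2n)
    r = steps % (4 * n)          # the cosine has period 2*pi = 4n steps
    if r > 2 * n:                # cos(2*pi - t) = cos(t)
        r = 4 * n - r
    half_bit = 0
    if r > n:                    # cos(pi - t) = -cos(t)
        r = 2 * n - r
        half_bit = 1
    return r * math.pi / (2 * n), half_bit

def dct_by_accumulation(samples):
    """Model the sequential dataflow with one accumulator per index u."""
    n = len(samples)
    acc = [0.0] * n
    for x, f in enumerate(samples):        # one input point per clock tick
        acc[0] += f                        # u = 0: the cosine term is always 1
        for u in range(1, n):              # the n - 1 cosine modules
            angle, half_bit = fold_angle(x, u, n)
            term = f * math.cos(angle)     # stand-in for the CORDIC output
            acc[u] += -term if half_bit else term
    return acc

# Example input: 16 small positive and negative sample values.
samples = [float((3 * i) % 7 - 3) for i in range(N)]
sums = dct_by_accumulation(samples)
```

Because the angle is always an integer multiple of π/(2N), the folding needs only integer arithmetic, which is why the folded angles and half bits can be tabulated ahead of time.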
We could generate all angles using counters, but there is an advantage in using a LUT instead: with a LUT we can store Φ instead of the original angle θ, so that the resulting cosine component is free from the CORDIC gain and is also multiplied by √2. Moreover, we can adjust this angle to keep the CORDIC error to a minimum.

$$\Phi = \cos^{-1}\left(\frac{\sqrt{2}}{1.64676025}\cos\theta\right) \tag{23}$$
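A small numerical check of equation (23) is sketched below in Python. It is a floating-point illustration under stated assumptions: the helper names `precompensated_angle` and `cordic_cos_unscaled` are hypothetical, 16 iterations are assumed, and θ is taken to be already folded into the range 0 to 90 degrees. Feeding Φ to a CORDIC rotation whose gain is not removed should give approximately √2 cos θ.

```python
import math

A = 1.64676025  # CORDIC gain as quoted in equation (23)

def precompensated_angle(theta):
    """Angle Phi of equation (23) for theta already folded into [0, pi/2]."""
    return math.acos(math.sqrt(2.0) / A * math.cos(theta))

def cordic_cos_unscaled(phi, n=16):
    """x output of an n-iteration rotation of (1, 0) by phi, gain not removed."""
    x, y, z = 1.0, 0.0, phi
    for i in range(n):
        d = -1.0 if z < 0 else 1.0
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(2.0 ** -i)
    return x                               # approximately A * cos(phi)

theta = math.radians(5.625)                # one angle step of the 16-point DCT
phi = precompensated_angle(theta)
result = cordic_cos_unscaled(phi)          # close to sqrt(2) * cos(theta)
```

Since √2/1.64676025 is less than one, the argument of the inverse cosine always stays within its domain, and Φ stays between roughly 31 and 90 degrees, inside the convergence range of the rotation.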

To save time and area, we can store the signs vector in LUTs instead of the angle. In this case, we do not need any comparators, and we can use a signs vector curve that gives very accurate results. Using the LUT of half bits to control the accumulator operations saves the area and delay of the pre- and post-processors that would otherwise determine the quadrant of the input angle and place the CORDIC output in the correct quadrant.

Figure 4: The DCT architecture

V. EXPERIMENTAL RESULTS

The design is implemented using Xilinx ISE 8.1 tools for synthesis on a Virtex-II Pro, and ModelSim SE 6.1 for simulation. The simulation of the 16-point DCT is shown in Figure 5. The sixteen input points are entered sequentially, one every clock cycle. Each point is fed to all 15 CORDIC modules, but with different angles from the LUTs. It takes 15 clock cycles to finish the CORDIC iterations and enter the accumulator; after 32 clock cycles the accumulator results are passed to the output ports, all at the same time, scaled by the factor 2^15. If the step between angles is 0.5 degree, then the maximum error in the cosine values of the CORDIC module is 1.831054*10^-4, due to angle quantization. The step between any two angles must be more than 0.447624 degree. This error is acceptable in our application because in the 16-point DCT the step between the angles equals 5.625 degrees. The maximum error in the DCT values is 0.002746.

Figure 5: Simulation of the 16-point DCT

The maximum usable clock frequency is 266.951 MHz, the total number of slice registers is 10,327 out of 27,392 (37%), and the total equivalent gate count for the design is 189,090. The floor plan is shown in Figure 6. This floor plan provides the required verification of a completed implementation.

Figure 6: Floor plan of the 16-point DCT in Virtex-II Pro

CONCLUSIONS

CORDIC-based designs have several advantages when building micro-architectures that would otherwise need multipliers. This principle is used to save implementation area, measured by the gate count of the FPGA. By adopting the CORDIC design principle we can reduce the area and at the same time keep a relatively high performance. Moreover, to improve our design, we use a linear correlation between the rotation angle θ and the corresponding directions of all micro-rotations, and we increase the number of segments to improve the accuracy of the results. This improvement in accuracy comes at the expense of a relatively small increase in implementation area. In the DCT module, we have made full use of this approach by storing the angles in a LUT, so that the results already include the DCT factor. The reciprocal of the CORDIC gain is applied at the same time, without using any additional multipliers.

REFERENCES

[1] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. Electronic Computers, vol. EC-8, pp. 330-334, Sep. 1959.
[2] J. S. Walther, "A Unified Algorithm for Elementary Functions," in Proceedings of the 38th Spring Joint Computer Conference, Atlantic City, New Jersey, pp. 379-385, 1971.
[3] E. Grass, B. Sarker, and K. Maharatna, "A Dual-Mode Synchronous/Asynchronous CORDIC Processor," Proceedings of the 8th IEEE International Symposium on Asynchronous Circuits and Systems, Manchester, UK, 2002.
[4] M. W. Kharrat, M. Loulou, and N. Masmoudi, "A New Method to Implement CORDIC Algorithm," in Proc. IEEE Int. Conf. Electronics, Circuits and Systems, Malta, vol. 2, pp. 715-718, Sept. 2001.
[5] R. Andraka, "A Survey of CORDIC Algorithms for FPGA Based Computers," Proc. of the 1998 ACM/SIGDA Sixth International Symposium on FPGAs, Monterey, CA, pp. 191-200, February 1998.