Microcontrollers ApNote AP634: Fast Fourier Transformation (additional file AP634.EXE available)

K. Westerholz / Siemens HL MCB AT
Semiconductor Group, 3.97, Rel.

Contents
1 Abstract
2 FFT - Derivation of the algorithm
3 Implementation
4 Flow chart of 1024-point real valued FFT
5 Flowchart of the Midloop
6 Register Use and content

AP634 ApNote - Revision History
Actual Revision: Rel.
Previous Revision: none

1 Abstract

The Fast Fourier Transformation (FFT) is an algorithm frequently used in various applications such as telecommunication, signal processing and image processing. It transforms a signal from the time domain into the frequency domain, where the spectrum of amplitude and frequency can be analyzed. This application note describes an implementation of a real-valued 1024-point decimation-in-time radix-2 FFT for the C166 microcontroller family. Assuming that the code is started out of the internal ROM via a 16-bit demultiplexed bus, an execution time of [...] ms has been achieved for a C165 running at 5 MHz internal clock. The code comprises 88 bytes. This application note is based on an application note prepared by pls (Programmierbare Logik Systeme, Hoyerswerda, Germany).

2 FFT - Derivation of the algorithm

Starting from the continuous-time Fourier Transformation, the discrete Fourier Transformation (DFT) can be derived as a function of the N sample points:

$$X(k) = \sum_{n=0}^{N-1} W_N^{nk}\, x(n), \qquad k = 0, 1, \ldots, N-1$$

$$W_N^{nk} = e^{-j(2\pi/N)nk} = \cos\left(\frac{2\pi nk}{N}\right) - j \sin\left(\frac{2\pi nk}{N}\right)$$

This formula can be regarded as a matrix $W_N^{nk}$ multiplied by the input data vector x(n). This simple DFT has the complexity $N^2$. The coefficients of the matrix $W_N^{nk}$ are denoted in the following as twiddle factors.

In order to derive the radix-2 FFT algorithm, we decompose the transformation into two partial transformations, one containing the input data with even indices and the other the input data with odd indices. Exploiting the symmetries $W_N^{k+N/2} = -W_N^{k}$ and $W_N^{k+N} = W_N^{k}$ of the twiddle factors, a decomposition is achieved that only requires $(N^2 + N)/2$ multiplications:

$$X(k) = \sum_{n=0}^{N/2-1} x(2n)\, W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\, W_{N/2}^{nk} = P(k) + W_N^{k}\, Q(k)$$

$$X(k + N/2) = \sum_{n=0}^{N/2-1} x(2n)\, W_{N/2}^{nk} - W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\, W_{N/2}^{nk} = P(k) - W_N^{k}\, Q(k)$$

$$k = 0, 1, \ldots, N/2 - 1$$
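The decomposition can be checked numerically. The following is a minimal floating-point sketch, not the microcontroller code: the function names `dft` and `dft_split` and the test vector are illustrative assumptions. It compares the direct DFT with a single even/odd split into P(k) and Q(k).

```python
import cmath

def dft(x):
    """Direct DFT: X(k) = sum over n of W_N^(nk) * x(n), about N^2 multiplications."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dft_split(x):
    """One radix-2 decimation-in-time step built on two half-length DFTs."""
    N = len(x)
    P = dft(x[0::2])          # transform of the even-indexed samples
    Q = dft(x[1::2])          # transform of the odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * Q[k]   # W_N^k * Q(k)
        X[k] = P[k] + t
        X[k + N // 2] = P[k] - t
    return X

# Arbitrary test vector (assumption, any even length works)
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
max_err = max(abs(a - b) for a, b in zip(dft(x), dft_split(x)))
print("largest deviation:", max_err)
```

Applying the same split recursively to P and Q leads to the full radix-2 algorithm.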

The number of multiplications can be further cut to $(N/2)\log_2 N$ by repeatedly applying this process to the partial transformations P(k) and Q(k) until they are completely decomposed into 2-point DFTs (Discrete Fourier Transformations). The 2-point DFT is commonly referred to as the butterfly operation. Computing the term $W^k Q(k)$ of a butterfly takes one complex multiplication, which amounts to four real valued multiplications.

$$P_{n+1} = P_n + Q_n W^k, \qquad Q_{n+1} = P_n - Q_n W^k, \qquad n = \text{stage}$$

since

$$W^k = e^{-j(2\pi/N)k} = \cos(u) - j\sin(u), \qquad u = (2\pi/N)\,k$$

[Figure: butterfly operation. Inputs Re(P_n), Im(P_n), Re(Q_n), Im(Q_n) and twiddle factor W^k; outputs Re(P_{n+1}), Im(P_{n+1}), Re(Q_{n+1}), Im(Q_{n+1}).]

Having only real valued input data x(n), the computational effort of an N point FFT can be reduced to that of an N/2 point complex FFT. Firstly, even indexed data h(n) = x(2n) and odd indexed data g(n) = x(2n+1) are separated. The index k is running from 0 to N-1. h(n) and g(n) have the spectra H(k) and G(k) respectively. The spectrum X(k) can be decomposed into the spectra H(k) and G(k) as follows:

$$x(n) \;\xrightarrow{\text{Fourier Transformation}}\; X(k) = H(k) + \left(\cos\frac{2\pi k}{N} - j\sin\frac{2\pi k}{N}\right) G(k)$$

In order to cut the above N point transformation into an N/2 point transformation, a complex input vector y(n) = h(n) + j g(n) is formed with the index n running from 0 to N/2-1. The real part is formed by the even indexed input data h(n). The imaginary part is formed by the odd indexed input data g(n). Then y(n) is transformed into the frequency domain, resulting in a spectrum consisting of a superposition of the spectra H(k) and G(k):

$$y(n) = h(n) + j\,g(n) \;\xrightarrow{\text{Fourier Transformation}}\; Y(k) = H(k) + j\,G(k) = R(k) + j\,I(k)$$

Now the complex spectra H(k) and G(k) have to be extracted out of the complex spectrum Y(k) = H(k) + jG(k) = R(k) + jI(k). By employing symmetry relations, the spectra H(k) and G(k) can be derived from the spectrum Y(k) as follows:

$$\mathrm{Re}\{H(k)\} = \frac{R(k) + R(N/2 - k)}{2}, \qquad \mathrm{Im}\{H(k)\} = \frac{I(k) - I(N/2 - k)}{2}$$

$$\mathrm{Re}\{G(k)\} = \frac{I(k) + I(N/2 - k)}{2}, \qquad \mathrm{Im}\{G(k)\} = -\frac{R(k) - R(N/2 - k)}{2}$$

These spectra inserted into the equation for X(k) deliver the full spectrum.
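The extraction of H(k) and G(k) from the half-length complex spectrum can be tried out in floating point. This is a hedged sketch under stated assumptions: a direct DFT stands in for the FFT, the names `dft` and `real_dft_via_half` are made up for illustration, and the index N/2 - k is taken modulo N/2, using the periodicity of the spectrum.

```python
import cmath

def dft(x):
    """Direct DFT, used here only as a reference transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def real_dft_via_half(x):
    """Bins X(0)..X(N/2-1) of a real sequence from one N/2-point complex DFT."""
    N = len(x)
    M = N // 2
    # y(n) = h(n) + j g(n): even samples as real part, odd samples as imaginary part
    y = [complex(x[2 * n], x[2 * n + 1]) for n in range(M)]
    Y = dft(y)
    X = []
    for k in range(M):
        R, I = Y[k].real, Y[k].imag
        Rm, Im = Y[(M - k) % M].real, Y[(M - k) % M].imag   # R(N/2-k), I(N/2-k)
        H = complex((R + Rm) / 2, (I - Im) / 2)
        G = complex((I + Im) / 2, -(R - Rm) / 2)
        X.append(H + cmath.exp(-2j * cmath.pi * k / N) * G)  # unweave step
    return X

x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, 3.0]
X_half = real_dft_via_half(x)
err = max(abs(a - b) for a, b in zip(X_half, dft(x)[:len(x) // 2]))
print("largest deviation:", err)
```

The remaining bins of a real input follow from the conjugate symmetry X(N-k) = X*(k).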

In order to illustrate the FFT algorithm, the example given below demonstrates the computational process during an 8-point FFT. The input data consists of 8 complex numbers X0, ..., X7, which can also be regarded as 16 real numbers x(0), ..., x(15).

[Figure: signal flow graph of the 8-point FFT with the columns Input, Stage 1, Stage 2, Stage 3 and Output. The inputs X0 ... X7 (real values x(0) ... x(15)) enter in sequential order; the outputs appear in the order X0, X4, X2, X6, X1, X5, X3, X7.]

Corresponding to the above butterfly operation, the input data are connected with each other. Due to the nature of this operation the input data is sequentially ordered, while the output data is in bit-reversed order. In each stage four (= N/2) butterflies are computed. The twiddle factor W in the 1st stage is always $W_8^0 = 1$. In the 2nd stage the upper half consists again of the butterflies from the previous stage. The second half of the butterflies is computed with the twiddle factor $W_8^2$. In the third stage the twiddle factors of the previous stage appear again in the upper half, and the lower half contains the factors $W_8^1$ and $W_8^3$.

For an 8-point FFT the twiddle factors $W_8^k = e^{-j(2\pi/8)k} = \cos(x) - j\sin(x)$ have the following values:

Twiddle factor                      cos(x)    sin(x)
W_8^0 = e^{-j0}                      1         0
W_8^1 = e^{-j(π/4)}                  ½√2       ½√2
W_8^2 = e^{-j(π/2)}                  0         1
W_8^3 = e^{-j(3π/4)}                -½√2       ½√2
W_8^4 = e^{-jπ}      = -W_8^0       -1         0
W_8^5 = e^{-j(5π/4)} = -W_8^1       -½√2      -½√2
W_8^6 = e^{-j(3π/2)} = -W_8^2        0        -1
W_8^7 = e^{-j(7π/4)} = -W_8^3        ½√2      -½√2

[Figure: the twiddle factors W_8^0 ... W_8^7 on the unit circle, axes Re(W) and Im(W).]

Because of the symmetry of the twiddle factors only the first four ($W_8^0, \ldots, W_8^3$) must be computed. These four twiddle factors lead to degenerated butterflies. They can be used to precompute the butterflies in the first three stages of the FFT to reduce the number of multiplications. For the twiddle factors $W^0$, $W^1$, $W^2$ and $W^3$ the butterfly computation can be simplified.

For $W^k = W^0 = 1$ we have cos(x) = 1; sin(x) = 0. This results in a butterfly without any multiplications:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Re}(Q_n)] + j\,[\mathrm{Im}(P_n) + \mathrm{Im}(Q_n)]$$

$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Re}(Q_n)] + j\,[\mathrm{Im}(P_n) - \mathrm{Im}(Q_n)]$$
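The twiddle factor values and their symmetry can be reproduced with a few lines of floating-point code. This is only an illustrative check; the variable names `W` and `symmetric` are assumptions.

```python
import cmath
import math

N = 8
# W_8^k = exp(-j * 2*pi*k / 8) = cos(x) - j sin(x)
W = [cmath.exp(-2j * math.pi * k / N) for k in range(N)]

for k, w in enumerate(W):
    # cos(x) is the real part, sin(x) is the negated imaginary part
    print(f"W_8^{k}: cos(x) = {w.real:+.4f}   sin(x) = {-w.imag:+.4f}")

# Symmetry W_8^(k+4) = -W_8^k: only the first four factors are independent
symmetric = all(abs(W[k + N // 2] + W[k]) < 1e-12 for k in range(N // 2))
print("second half is the negated first half:", symmetric)
```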

For $W^k = W^1 = \tfrac{1}{2}\sqrt{2}\,(1 - j)$ we get $\cos(x) = \sin(x) = \tfrac{1}{2}\sqrt{2}$. This results in a butterfly for the third stage where only two multiplications must be executed:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Re}(Q_n)\cos(x) + \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) + \mathrm{Im}(Q_n)\cos(x) - \mathrm{Re}(Q_n)\cos(x)]$$

$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Re}(Q_n)\cos(x) - \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) - \mathrm{Im}(Q_n)\cos(x) + \mathrm{Re}(Q_n)\cos(x)]$$

For $W^k = W^2 = -j$ we have cos(x) = 0; sin(x) = 1. This results in a butterfly without any multiplications, used in the second and third stage:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Im}(Q_n)] + j\,[\mathrm{Im}(P_n) - \mathrm{Re}(Q_n)]$$

$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Im}(Q_n)] + j\,[\mathrm{Im}(P_n) + \mathrm{Re}(Q_n)]$$

For $W^k = W^3 = \tfrac{1}{2}\sqrt{2}\,(-1 - j)$ we get $-\cos(x) = \sin(x) = \tfrac{1}{2}\sqrt{2}$. This results in a butterfly for the third stage where only two multiplications must be executed:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Re}(Q_n)\cos(x) - \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) + \mathrm{Im}(Q_n)\cos(x) + \mathrm{Re}(Q_n)\cos(x)]$$

$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Re}(Q_n)\cos(x) + \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) - \mathrm{Im}(Q_n)\cos(x) - \mathrm{Re}(Q_n)\cos(x)]$$

The output of the decimation-in-time FFT appears in bit-reversed order and has to be reordered to obtain the final frequency spectrum. Supposing the input data has been in sequential order, the indices of the output data can easily be computed by bit reversing the binary representation of the input indices. The table below gives an example of the bit reversal for an 8 point FFT.

Order of input data        Order of output data (bit-reversed)
index    binary            binary    index
0        000               000       0
1        001               100       4
2        010               010       2
3        011               110       6
4        100               001       1
5        101               101       5
6        110               011       3
7        111               111       7
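Two of the steps above lend themselves to a short sketch: the W^1 butterfly with only two real multiplications, and the bit reversal of the indices. The code below is a floating-point illustration under assumptions; the names `butterfly_w1` and `bit_reverse` are hypothetical, and the two products are obtained here by factoring out cos(x), which is one possible reading of "only two multiplications".

```python
import math

C = math.sqrt(0.5)   # cos(pi/4) = sin(pi/4) = 0.5 * sqrt(2)

def butterfly_w1(p, q):
    """Butterfly for W^1 = C * (1 - j) using two real multiplications."""
    a = C * (q.real + q.imag)   # Re(Q * W^1)
    b = C * (q.imag - q.real)   # Im(Q * W^1)
    return complex(p.real + a, p.imag + b), complex(p.real - a, p.imag - b)

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (index & 1)
        index >>= 1
    return r

order = [bit_reverse(i, 3) for i in range(8)]
print(order)

p, q = complex(1.0, 2.0), complex(3.0, -4.0)
p_next, q_next = butterfly_w1(p, q)
print(p_next, q_next)
```

A precomputed table, as used in the implementation, avoids recomputing the reversal at run time.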

3 Implementation

For the implementation we assume real valued input data. Based on this assumption, a real valued 1024 point FFT can be reduced to a 512 point complex FFT followed by an unweave phase in order to compute the 1024 point spectrum. The 512 point complex FFT consists of log2(512) = 9 stages and each stage calculates 512/2 = 256 butterflies. The twiddle factors in the first three stages are the same as for the 8 point complex FFT.

The input data is stored in the table FFT_IN which consists of 1024 16-bit words. Since we perform an in-place FFT, the input data field will be overwritten by the output data. However, the final output will be stored in the separate FFT_OUT output field. To provide the trigonometric functions, a table (FFT_DAT) is stored which holds the precomputed sine and cosine values. Since cos(x) = sin(x + π/2), the table consists of a ¼ sine period and a ½ cosine period. Together they form a ¾ sine period. To rearrange the bit-reversed output of the complex FFT and to calculate the twiddle factor W^k, a bit-reversal table (FFT_BR) has also been precomputed.

The input data and the trigonometric function table are represented by a 15-bit fixed-point fraction in two's complement. This means that the MSB of such a number represents the sign, followed by the 15 bits representing a fraction of one.

bit:      s      b1      b2      b3      b4     ...   b15
weight:   sign   2^-1    2^-2    2^-3    2^-4   ...   2^-15

Examples:

binary                 hex     dec       value
0111 1111 1111 1111    7FFF     32767    +0.99997 (approximately +1)
0110 0000 0000 0000    6000     24576    +0.75
1010 0000 0000 0000    A000    -24576    -0.75
1000 0000 0000 0000    8000    -32768    -1

Since the addition of two numbers having the same sign might cause an overflow, numbers have to be divided by 2 before adding them. This is done by an arithmetic shift right. When multiplying two 15-bit numbers, note that the signs of both are multiplied too, and the result is stored in the 32-bit wide multiplication register:

s 15 bit  x  s 15 bit  =  ss 30 bit

The result therefore equals a scaled multiplication: it consists of the multiplication and, as a side effect, a subsequent arithmetic shift right.
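The number format and the scaling side effect can be illustrated with plain integers. This is a sketch under assumptions, not the C166 code: the helper names are hypothetical, and keeping the upper 16 bits of the 32-bit product is modelled with a right shift by 16.

```python
def to_q15(value):
    """Convert a float in [-1, 1) to a 16-bit two's-complement fraction."""
    return max(-32768, min(32767, int(round(value * 32768))))

def q15_to_float(q):
    """Interpret a signed 16-bit integer as sign + 15 fraction bits."""
    return q / 32768.0

def q15_mul_high(a, b):
    """Signed 16x16 multiply, keeping only the upper 16 bits of the product.

    The 32-bit product carries two sign bits and 30 fraction bits, so the
    upper word is the true product scaled by 1/2.
    """
    return (a * b) >> 16      # Python's >> on ints is an arithmetic shift

def q15_add_scaled(a, b):
    """Halve both operands before adding so the sum cannot overflow."""
    return (a >> 1) + (b >> 1)

a, b = to_q15(0.75), to_q15(0.5)
print(hex(a & 0xFFFF), hex(to_q15(-0.75) & 0xFFFF))
print(q15_to_float(q15_mul_high(a, b)))       # half of 0.75 * 0.5
print(q15_add_scaled(32767, 32767))           # stays within 16 bits
```

The factor of one half introduced by the multiplication matches the halving applied before additions, so both operand paths of a butterfly stay on the same scale.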
Because 32-bit precision is not required, only the upper 16 bits, contained in the MDH register, are used for further calculations.

Attached you will find a program flow chart. The first part of the program consists of a 512-point complex radix-2 FFT which is executed in nine stages. To avoid multiplications by means of the degenerated butterflies, 5 different Mid- and Inloops are implemented. The idea is to cut the amount of calculation needed by using the degenerated butterflies in all stages. This is very effective in the first three stages, but the time saving decreases in the later stages. For the 512 point complex FFT the number of twiddle factors amounts to 512. However, due to the symmetry only the first 256 (0..255) are used. In the program source the twiddle factors denoted as W0, W4, W8 and W12 refer to the angles 0, 90, 45 and 135 degrees. Thus $W_8^0$ corresponds to W0, $W_8^2$ to W4, $W_8^1$ to W8, and $W_8^3$ to W12.

Stage 1: Since in stage one all twiddle factors are W0, all 256 butterflies are performed in Inloop_0.

Stage 2: 128 butterflies with W0 in Inloop_0
         128 butterflies with W4 in Inloop_1

Stage 3: 64 butterflies with W0 in Inloop_0
         64 butterflies with W4 in Inloop_1
         64 butterflies with W8 in Inloop_2
         64 butterflies with W12 in Inloop_3

Stage 4: 32 butterflies with W0 in Inloop_0
         32 butterflies with W4 in Inloop_1
         32 butterflies with W8 in Inloop_2
         32 butterflies with W12 in Inloop_3
         128 butterflies with Wk in Inloop

The second part of the program unweaves the bit-reversed output of the 512-point FFT to extract the 1024 point real valued FFT. Optionally, the final stage of the algorithm calculates an amplitude spectrum. On the last page you will find the register use during program execution.

Assuming that the code is started out of the internal ROM and that the input data is stored in the external RAM and accessed via a 16-bit demultiplexed bus without wait states (Syscon: ROMEN=, Buscon: MTTC=, MCTC=, BTYP=), an execution time of [...] ms for a real valued 1024-point FFT is achieved for the C165 running at 5 MHz internally. The code size amounts to 88 bytes without data tables. The optional computation of the amplitude spectrum consumes an additional [...] ms. The execution time is independent of the input data. By changing the number of sample points and the number of stages (exp), and by reducing the tables, the algorithm can be tailored to various resolutions.
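The distribution of butterflies over the loops can be tabulated with a small sketch. It rests on an assumption extrapolated from the stages listed above: stage s uses 2^(s-1) distinct twiddle factors with an equal share of the 256 butterflies each, and only the first four are served by the dedicated loops. The function name `inloop_schedule` is hypothetical.

```python
def inloop_schedule(n_complex=512):
    """Butterflies per loop and stage for an n_complex-point radix-2 FFT.

    Assumption: stage s uses 2**(s-1) distinct twiddle factors, and the
    first four (W0, W4, W8, W12) are handled by Inloop_0 .. Inloop_3.
    """
    stages = n_complex.bit_length() - 1          # log2(512) = 9
    per_stage = n_complex // 2                   # 256 butterflies per stage
    special = ["Inloop_0", "Inloop_1", "Inloop_2", "Inloop_3"]
    schedule = {}
    for s in range(1, stages + 1):
        twiddles = 2 ** (s - 1)
        per_twiddle = per_stage // twiddles
        row = {name: per_twiddle for name in special[:min(twiddles, 4)]}
        if twiddles > 4:
            # everything else needs the general loop with real multiplications
            row["Inloop"] = (twiddles - 4) * per_twiddle
        schedule[s] = row
    return schedule

schedule = inloop_schedule()
for stage, row in schedule.items():
    print(stage, row)
```

Under this assumption the share handled by the multiplication-free loops halves with every stage after the third, which is consistent with the decreasing time saving in the later stages.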
The table below lists execution times for different numbers of sample points.

                      64 points    256 points    1024 points
SAB-C165 (5 MHz)      ,56 ms       ,6 ms         ,4 ms
SAB-C167CR ( MHz)     ,7 ms        3,3 ms        3 ms

These figures demonstrate that the C166 architecture is even superior to some signal processors. This result is founded on a multiplication execution time of 4ns and the RISC-like register file. Using a more complex radix-4 FFT algorithm, which combines two butterflies, a further run time saving can be expected. Keeping the input and output tables in the internal RAM, an additional speed-up can be achieved for the 64 and 256 point FFT. If only parts of the spectrum have to be analyzed, a partial FFT can be performed [7] by omitting all calculations not contributing to the frequency window.

Literature

[1] Application Note AP-275, "An FFT Algorithm for MCS-96 Products including Supporting Routines and Examples", Intel, September 1986
[2] MS-C-Program
[3] TMS3 High Performance 16/32-Bit Digital Signal Processor, Product Description, Texas Instruments, 1985
[4] Digital Signal Processing Applications with the TMS320 Family, Texas Instruments, 1988
[5] TMS3C5 Digital Signal Processor, Product Description, Texas Instruments, 1986
[6] S.K. Mitra and J.F. Kaiser, Handbook for Digital Signal Processing, John Wiley & Sons, 1993
[7] Maurice Bellanger, Digital Processing of Signals, John Wiley & Sons, 1989

4 Flow chart of 1024-point real valued FFT

Start
Outloop for the stages 1 - 9
    Midloop_0
        Inloop_0: compute butterflies with twiddle factor W0
        all elements processed?
    Midloop_1
        Inloop_1: compute butterflies with twiddle factor W4
        all elements processed?
    Midloop_2
        Inloop_2: compute butterflies with twiddle factor W8
    Midloop_3
        Inloop_3: compute butterflies with twiddle factor W12
        all elements processed?
    Midloop
        Inloop: computation of remaining butterflies
        all elements processed?
    END_Midloop
    9th stage executed?
Unweave complex FFT to get the result of the real valued FFT
    bitreversal of FFT_IN and generate butterflies
Absolute value "Absolute" (calculate absolute values and write results to FFT_OUT)
    calculate absolute values
    last element processed?
End

5 Flowchart of the Midloop

Midloop
    Init counter of Inloop
    set input pointers to index
    Determination of twiddle factor
    Inloop
        Read input values Re(P_m), Im(P_m), Re(Q_m), Im(Q_m) from FFT_IN
        Calculate butterfly
            P_{m+1} = PR + QR cos(X) + QI sin(X) + j [PI + QI cos(X) - QR sin(X)]
            Q_{m+1} = PR - QR cos(X) - QI sin(X) + j [PI - QI cos(X) + QR sin(X)]
        Write output values Re(P_{m+1}), Im(P_{m+1}), Re(Q_{m+1}), Im(Q_{m+1}) to FFT_IN
        increment index pointers
        decrement counter in_loop
        counter in_loop > 0?
    increment counter mid_loop
End Midloop
    counter mid_loop < 4?
modify help counters
decrement counter out_loop
9th stage executed?
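The general butterfly of the Inloop follows from P ± Q·W with W = cos(X) - j sin(X). The sketch below shows it in floating point with four real multiplications; the function name and argument order are assumptions, and the halving used in the fixed-point version is left out for clarity.

```python
import math

def butterfly(pr, pi, qr, qi, x):
    """General radix-2 butterfly with twiddle factor W = cos(x) - j sin(x).

    Returns ((Re P', Im P'), (Re Q', Im Q')) with P' = P + Q*W, Q' = P - Q*W.
    """
    c, s = math.cos(x), math.sin(x)
    tr = qr * c + qi * s      # Re(Q * W)
    ti = qi * c - qr * s      # Im(Q * W)
    return (pr + tr, pi + ti), (pr - tr, pi - ti)

# Cross-check against complex arithmetic for an arbitrary angle
p, q, angle = complex(1.0, 2.0), complex(3.0, -4.0), 0.3
w = complex(math.cos(angle), -math.sin(angle))
(p_re, p_im), (q_re, q_im) = butterfly(p.real, p.imag, q.real, q.imag, angle)
print(complex(p_re, p_im), p + q * w)
print(complex(q_re, q_im), p - q * w)
```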

6 Register Use and content

Stage 1 to 9:

Reg.     Stage1  Stage2  Stage3  Stage4  Stage5  Stage6  Stage7  Stage8  Stage9  Note
R0       9       8       7       6       5       4       3       2       1       Outloop counter
R1       1024    512     256     128     64      32      16      8       4
R2       2048    1024    512     256     128     64      32      16      8
R3, R4   *-...   *...    *...    *...    *...    *...    *...    *...    *       Midloop
         4-4     -4      -4      -4      -4      -4      -4      -4      -4
R5       pp      pp      pp      pp      pp      pp      pp      pp      pp      pp_t_i
R6       pq      pq      pq      pq      pq      pq      pq      pq      pq      pq_t_i
R7       B       B       cos[k]  cos[k]  cos[k]  cos[k]  cos[k]  cos[k]  cos[k]
R8       B       B       B       sin[k]  sin[k]  sin[k]  sin[k]  sin[k]  sin[k]
R9       B       B       B       B       B       B       B       B       B
R10      B       B       B       B       B       B       B       B       B
R11      B       B       B       B       B       B       B       B       B
R12      B       B       B       B       B       B       B       B       B
R13      256...  128...  64...   32...   16...   8...    4...    ...     ,       Inloop
R14      256     128     64      32      16      8       4       2       1
R15                                                                              _I

n = number of samples (1024); B = butterfly; Twid = twiddle factor; pp = pointer onto element P_n; pq = pointer onto element Q_n

Stage 10 and procedure Absolute (where a register has two entries, they are listed in the order Stage 10, Absolute):

Reg.   Content                     Reg.   Content
R0     Input (Low)                 R8     cos[k]
R1     Input a, Input (High)       R9     Input c
R2     Output                      R10
R3     temp, temp                  R11    Input d
R4     sin[k], temp                R12    0,2,4,6,8,...
R5     Input b, temp               R13
R6     temp                        R14    FFT_OUT
R7     temp, temp                  R15