Microcontrollers: Fast Fourier Transformation

Microcontrollers ApNote AP634
Additional file AP634.EXE available

Fast Fourier Transformation

The Fast Fourier Transformation (FFT) is an algorithm frequently used in various applications, like telecommunication, signal and image processing. It transforms the time domain into the frequency domain, where the spectrum of amplitude and frequency can be analyzed.

K. Westerholz / Siemens HL MCB AT
Semiconductor Group 3.97, Rel.

1 Abstract
2 FFT - Derivation of the algorithm
3 Implementation
4 Flow chart of 1024-point real valued FFT
5 Flowchart of the Midloop
6 Register Use and content

AP634 ApNote - Revision History
Actual Revision: Rel.
Previous Revision: none

1 Abstract

The Fast Fourier Transformation (FFT) is an algorithm frequently used in various applications, like telecommunication, signal and image processing. It transforms the time domain into the frequency domain, where the spectrum of amplitude and frequency can be analyzed. This application note describes an implementation of a real-valued 1024-point decimation-in-time radix-2 FFT for the C166 microcontroller family. Assuming that the code is started out of the internal ROM and the data is accessed via a 16-bit demultiplexed bus, an execution time of [...] ms has been achieved for a C165 running at 25 MHz internal clock. The code comprises 88 bytes. This application note is based on an application note performed by pls (Programmierbare Logik Systeme, Hoyerswerda, Germany).

2 FFT - Derivation of the algorithm

Starting from the continuous-time Fourier Transformation, the discrete Fourier Transformation (DFT) can be derived as a function of N sample points:

$$X(k) = \sum_{n=0}^{N-1} W_N^{nk}\, x(n), \qquad k = 0, 1, \ldots, N-1$$

$$W_N^{nk} = e^{-j 2\pi nk/N} = \cos\frac{2\pi nk}{N} - j \sin\frac{2\pi nk}{N}$$

This formula can be regarded as a matrix $W_N^{nk}$ multiplied by the input data vector x(n). This simple DFT has the complexity $N^2$. The coefficients of the matrix $W_N^{nk}$ will be denoted in the following as twiddle factors.

In order to derive the radix-2 FFT algorithm, we decompose the transformation into two partial transformations, one containing the input data with even indices and the other the input data with odd indices. Exploiting the symmetries $W_N^{2k} = W_{N/2}^{k}$ and $W_N^{k+N/2} = -W_N^{k}$ of the twiddle factors, a decomposition is achieved that only requires $(N + N^2/2)$ multiplications. For $k = 0, \ldots, N/2-1$:

$$X(k) = \sum_{n=0}^{N/2-1} x(2n)\, W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\, W_{N/2}^{nk} = P(k) + W_N^{k}\, Q(k)$$

$$X(k + N/2) = \sum_{n=0}^{N/2-1} x(2n)\, W_{N/2}^{nk} - W_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\, W_{N/2}^{nk} = P(k) - W_N^{k}\, Q(k)$$
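The even/odd decomposition can be checked numerically. The following is only a minimal floating-point sketch in Python, not the microcontroller code; the function names `dft` and `dft_split` are hypothetical and chosen for illustration.

```python
import cmath

def dft(x):
    """Direct DFT: X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def dft_split(x):
    """One radix-2 step: combine the DFTs of the even and odd samples."""
    n = len(x)               # assumed to be even
    half = n // 2
    p = dft(x[0::2])         # P(k): spectrum of the even-indexed samples
    q = dft(x[1::2])         # Q(k): spectrum of the odd-indexed samples
    out = [0j] * n
    for k in range(half):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor W_N^k
        out[k] = p[k] + w * q[k]                # X(k)
        out[k + half] = p[k] - w * q[k]         # X(k + N/2)
    return out
```

Calling `dft_split` on any even-length sequence should reproduce `dft` up to rounding error, while using two half-length transforms instead of one full-length transform.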

The number of multiplications can be further cut to the order of $N \log_2 N$ by repeatedly applying this process to the partial transformations P(k) and Q(k) until they are completely decomposed into 2-point DFTs (Discrete Fourier Transformations). The 2-point DFT is commonly referred to as the butterfly operation. The butterfly requires the complex product $Q_n W^k$, that is, four real-valued multiplications.

$$P_{n+1} = P_n + Q_n W^k, \qquad Q_{n+1} = P_n - Q_n W^k \qquad (n = \text{stage})$$

with $W^k = e^{-j(2\pi/N)k} = \cos(u) - j \sin(u)$ and $u = (2\pi/N)k$.

[Figure: butterfly operation. Inputs Re(P_n), Im(P_n), Re(Q_n), Im(Q_n) and twiddle factor W^k; outputs Re(P_{n+1}), Im(P_{n+1}), Re(Q_{n+1}), Im(Q_{n+1}).]

Having only real-valued input data x(n), the computational effort of an N-point FFT can be reduced to that of an N/2-point complex FFT. First, the even-indexed data $h(n) = x(2n)$ and the odd-indexed data $g(n) = x(2n+1)$ are separated. The index k runs from 0 to N-1. h(n) and g(n) have the spectra H(k) and G(k), respectively. The spectrum X(k) can be decomposed into the spectra H(k) and G(k) as follows:

$$x(n) \;\xrightarrow{\text{Fourier Transformation}}\; X(k) = H(k) + \left(\cos\frac{2\pi k}{N} - j \sin\frac{2\pi k}{N}\right) G(k)$$

In order to cut the above N-point transformation into an N/2-point transformation, a complex input vector $y(n) = h(n) + j\,g(n)$ is formed with the index n running from 0 to N/2-1. The real part is formed by the even-indexed input data h(n), the imaginary part by the odd-indexed input data g(n). Then y(n) is transformed into the frequency domain, resulting in a spectrum that is a superposition of the spectra H(k) and G(k):

$$y(n) = h(n) + j\,g(n) \;\xrightarrow{\text{Fourier Transformation}}\; Y(k) = H(k) + j\,G(k) = R(k) + j\,I(k)$$

Now the complex spectra H(k) and G(k) have to be extracted out of the complex spectrum Y(k). By employing symmetry relations, the spectra H(k) and G(k) can be derived from the spectrum Y(k) as follows, where R and I are periodic in k with period N/2:

$$\mathrm{Re}\{H(k)\} = \frac{R(k) + R(N/2 - k)}{2}, \qquad \mathrm{Im}\{H(k)\} = \frac{I(k) - I(N/2 - k)}{2}$$

$$\mathrm{Re}\{G(k)\} = \frac{I(k) + I(N/2 - k)}{2}, \qquad \mathrm{Im}\{G(k)\} = -\frac{R(k) - R(N/2 - k)}{2}$$

These spectra, inserted into the equation for X(k), deliver the full spectrum.
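The packing and extraction steps for real-valued input can be sketched as follows. This is a floating-point illustration in Python under the assumption that the index N/2 - k is taken modulo N/2; the names `dft` and `real_fft_via_half` are hypothetical, and a direct DFT stands in for the N/2-point complex FFT.

```python
import cmath

def dft(x):
    """Direct DFT, used here in place of the N/2-point complex FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def real_fft_via_half(x):
    """Spectrum of N real samples from one N/2-point complex transform."""
    n = len(x)                     # assumed to be even
    m = n // 2
    # y(n) = h(n) + j*g(n): even samples as real part, odd samples as imaginary part
    y = [complex(x[2 * i], x[2 * i + 1]) for i in range(m)]
    spec = dft(y)                  # Y(k) = R(k) + j*I(k)
    out = [0j] * n
    for k in range(m):
        r_k, i_k = spec[k].real, spec[k].imag
        mirror = spec[(m - k) % m]             # Y(N/2 - k), index taken modulo N/2
        r_m, i_m = mirror.real, mirror.imag
        h = complex((r_k + r_m) / 2, (i_k - i_m) / 2)     # H(k)
        g = complex((i_k + i_m) / 2, -(r_k - r_m) / 2)    # G(k)
        w = cmath.exp(-2j * cmath.pi * k / n)             # cos(2*pi*k/N) - j*sin(2*pi*k/N)
        out[k] = h + w * g                     # X(k)
        out[k + m] = h - w * g                 # X(k + N/2), since W^(k+N/2) = -W^k
    return out
```

For a real input sequence the result should agree with a direct N-point DFT of the same data, which makes the sketch easy to verify before moving to fixed-point code.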

In order to illustrate the FFT algorithm, the example given below demonstrates the computational process during an 8-point FFT. The input data consists of 8 complex numbers X0, ..., X7, which can also be perceived as 16 real numbers x(0) ... x(15).

[Figure: signal flow graph of the 8-point FFT. The input X0 ... X7 (real and imaginary parts x(0) ... x(15)) is in sequential order, followed by stages 1 to 3 with their twiddle factors W; the output appears in the bit-reversed order X0, X4, X2, X6, X1, X5, X3, X7.]

Corresponding to the above butterfly operation, the input data are connected with each other. Due to the nature of this operation the input data is sequentially ordered, while the output data is in bit-reversed order. In each stage four (= N/2) butterflies are computed. The twiddle factor W in the 1st stage is always $W_8^0 = 1$. In the 2nd stage the upper half consists again of the butterflies from the previous stage. The second half of the butterflies is computed with twiddle factor $W_8^2$. In the third stage the twiddle factors of the previous stage appear again in the upper half, and the lower half contains the factors $W_8^1$ and $W_8^3$. For an 8-point FFT the twiddle factors $W^k = e^{-j(2\pi/N)k} = \cos(x) - j \sin(x)$ have the following values:

Twiddle factor                        cos(x)    sin(x)
W_8^0 = 1                              1         0
W_8^1 = e^{-j(pi/4)}                   ½√2       ½√2
W_8^2 = e^{-j(pi/2)}                   0         1
W_8^3 = e^{-j(3pi/4)}                 -½√2       ½√2
W_8^4 = e^{-j(pi)}    = -W_8^0        -1         0
W_8^5 = e^{-j(5pi/4)} = -W_8^1        -½√2      -½√2
W_8^6 = e^{-j(3pi/2)} = -W_8^2         0        -1
W_8^7 = e^{-j(7pi/4)} = -W_8^3         ½√2      -½√2

[Figure: the twiddle factors 0 to 7 on the unit circle, axes Re(W) and Im(W).]

Because of the symmetry of the twiddle factors, only the first four ($W_8^0, \ldots, W_8^3$) must be computed. These four twiddle factors lead to degenerated butterflies. They can be used to precompute the butterflies in the first three stages of the FFT to reduce the number of multiplications. For the twiddle factors $W^0$, $W^1$, $W^2$ and $W^3$ the butterfly computation can be simplified.

For $W^k = W^0 = 1$ we have cos(x) = 1; sin(x) = 0. This results in a butterfly without any multiplications:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Re}(Q_n)] + j\,[\mathrm{Im}(P_n) + \mathrm{Im}(Q_n)]$$
$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Re}(Q_n)] + j\,[\mathrm{Im}(P_n) - \mathrm{Im}(Q_n)]$$
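The twiddle factor table and its sign symmetry can be reproduced with a few lines of Python. This is a floating-point sketch for illustration only; `twiddle_from_half` is a hypothetical helper that shows how the second half of the table follows from the first four entries.

```python
import cmath
import math

N = 8  # transform length of the example

# W_N^k = exp(-j*2*pi*k/N) = cos(x) - j*sin(x) with x = 2*pi*k/N
twiddles = [cmath.exp(-2j * math.pi * k / N) for k in range(N)]

def twiddle_from_half(k):
    """Return W_N^k using only the first half of the table (k = 0 .. N/2-1)."""
    half = N // 2
    if k < half:
        return twiddles[k]
    return -twiddles[k - half]   # symmetry W^(k+N/2) = -W^k

# cos(x) and sin(x) columns of the table; note that W^k = cos(x) - j*sin(x)
table = [(k, twiddles[k].real, -twiddles[k].imag) for k in range(N)]
```

Each entry of `table` holds k, cos(x) and sin(x), so it can be compared row by row with the values listed for the 8-point FFT.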

For $W^k = W^1 = ½\sqrt{2}\,(1 - j)$ we get cos(x) = sin(x) = ½√2. This results in a butterfly for the third stage where only two multiplications must be executed:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Re}(Q_n)\cos(x) + \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) + \mathrm{Im}(Q_n)\cos(x) - \mathrm{Re}(Q_n)\cos(x)]$$
$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Re}(Q_n)\cos(x) - \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) - \mathrm{Im}(Q_n)\cos(x) + \mathrm{Re}(Q_n)\cos(x)]$$

For $W^k = W^2 = -j$ we have cos(x) = 0; sin(x) = 1. This results in a butterfly without any multiplications, used in the second and third stage:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Im}(Q_n)] + j\,[\mathrm{Im}(P_n) - \mathrm{Re}(Q_n)]$$
$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Im}(Q_n)] + j\,[\mathrm{Im}(P_n) + \mathrm{Re}(Q_n)]$$

For $W^k = W^3 = ½\sqrt{2}\,(-1 - j)$ we get -cos(x) = sin(x) = ½√2. This results in a butterfly for the third stage where only two multiplications must be executed:

$$P_{n+1} = [\mathrm{Re}(P_n) + \mathrm{Re}(Q_n)\cos(x) - \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) + \mathrm{Im}(Q_n)\cos(x) + \mathrm{Re}(Q_n)\cos(x)]$$
$$Q_{n+1} = [\mathrm{Re}(P_n) - \mathrm{Re}(Q_n)\cos(x) + \mathrm{Im}(Q_n)\cos(x)] + j\,[\mathrm{Im}(P_n) - \mathrm{Im}(Q_n)\cos(x) - \mathrm{Re}(Q_n)\cos(x)]$$

The output of the decimation-in-time FFT appears in bit-reversed order and has to be reordered to obtain the final frequency spectrum. Supposing the input data has been in sequential order, the indices of the output data can easily be computed by bit-reversing the binary representation of the input indices. The table below gives an example of the bit reversal for an 8-point FFT.

Order of input data        Order of output data (bit-reversed)
index    binary            binary    index
0        000               000       0
1        001               100       4
2        010               010       2
3        011               110       6
4        100               001       1
5        101               101       5
6        110               011       3
7        111               111       7
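The bit-reversal table can be generated programmatically. The helper below is a small Python sketch; the name `bit_reverse` is hypothetical and the loop-based approach is only one of several possible ways to build such a table.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)   # move the lowest bit to the top end
        index >>= 1
    return result

# Bit-reversal table for an 8-point FFT (3 index bits)
order = [bit_reverse(i, 3) for i in range(8)]
print(order)   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the same permutation twice restores the original order, so one table serves both for reordering and for looking up bit-reversed indices.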

3 Implementation

For the implementation we assume real-valued input data. Based on this assumption, a real-valued 1024-point FFT can be reduced to a 512-point complex FFT followed by an unweave phase in order to compute the 1024-point spectrum. The 512-point complex FFT consists of log2(512) = 9 stages, and each stage calculates 512/2 = 256 butterflies. The twiddle factors in the first three stages are the same as for the 8-point complex FFT.

The input data is stored in the table FFT_IN, which consists of 1024 16-bit words. Since we perform an in-place FFT, the input data field will be overwritten by the output data. However, the final output will be stored in the separate FFT_OUT output field. To provide the trigonometric functions, a table (FFT_DAT) is stored that holds the precomputed sine and cosine values. Since cos(x) = sin(x + π/2), the table consists of ¼ sine period and ½ cosine period. Together they form a ¾ sine period. To rearrange the bit-reversed output of the complex FFT and to calculate the twiddle factor W^k, a bit-reversal table (FFT_BR) has also been precomputed.

The input data and the trigonometric function table are represented as 15-bit fixed-point fractions in two's complement. This means that the MSB of such a number represents the sign, followed by the 15 bits representing a fraction of one.

bit:      s      b1      b2      b3     ...    b15
weight:   sign   2^-1    2^-2    2^-3   ...    2^-15

Examples:

binary                  hex     decimal    value
0111 1111 1111 1111     7FFF     32767     ≈ +1
0110 0000 0000 0000     6000     24576     +0.75
1010 0000 0000 0000     A000    -24576     -0.75
1000 0000 0000 0000     8000    -32768     -1

Since the addition of two numbers having the same sign might cause an overflow, numbers have to be divided by 2 before adding them. This is done by an arithmetic shift right. When multiplying two 15-bit numbers, note that the signs of both are multiplied too, and the result is stored in the 32-bit wide multiplication register:

s.15 bit × s.15 bit = ss.30 bit

Therefore the result is equal to a scaled multiplication: it consists of the multiplication and a subsequent arithmetic shift right as a side effect. Because 32-bit precision is not required, only the first 16 bits, contained in the MDH register, are used for further calculations.

Attached you will find a program flow chart. The first part of the program consists of a 512-point complex radix-2 FFT, which is executed in nine stages. To avoid multiplications by means of the degenerated butterflies, five different Mid- and Inloops are implemented. The idea is to cut the amount of calculation needed by using the degenerated butterflies in all stages. This is very effective in the first three stages, but the time savings decrease in the later stages. For the 512-point complex FFT the number of twiddle factors amounts to 512. However, due to the symmetry only the first 256 (0..255) are used. In the program source the twiddle factors denoted as W0, W4, W8 and W12 refer to the angles 0°, 90°, 45° and 135°. Thus, $W_8^0$ corresponds to W0, $W_8^2$ to W4, $W_8^1$ to W8, and $W_8^3$ to W12.

Stage 1: Since in stage one all twiddle factors are W0, all 256 butterflies are performed in Inloop_0.
Stage 2: 128 butterflies with W0 in Inloop_0
         128 butterflies with W4 in Inloop_1
Stage 3: 64 butterflies with W0 in Inloop_0
         64 butterflies with W4 in Inloop_1
         64 butterflies with W8 in Inloop_2
         64 butterflies with W12 in Inloop_3
Stage 4: 32 butterflies with W0 in Inloop_0
         32 butterflies with W4 in Inloop_1
         32 butterflies with W8 in Inloop_2
         32 butterflies with W12 in Inloop_3
         128 butterflies with Wk in Inloop

The second part of the program unweaves the bit-reversed output of the 512-point FFT to extract the 1024-point real-valued FFT. Optionally, the final stage of the algorithm calculates an amplitude spectrum. On the last page you will find the register use during program execution.

Assuming that the code is started out of the internal ROM and that the input data is stored in the external RAM and accessed via a 16-bit demultiplexed bus without wait states (SYSCON: ROMEN = [...]; BUSCON: MTTC = [...], MCTC = [...], BTYP = [...]), an execution time of [...] ms for a real-valued 1024-point FFT is achieved for the C165 running at 25 MHz internally. The code size amounts to 88 bytes without data tables. The optional computation of the amplitude spectrum consumes an additional [...] ms. The execution time is independent of the input data. By changing the number of sample points and the number of stages (exp) and by reducing the tables, the algorithm can be tailored for various resolutions.
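The fixed-point conventions described in this section can be illustrated with a short Python sketch. It only models the arithmetic on plain integers and is not the C166 code; the helper names (`to_q15`, `from_q15`, `mul_high_word`, `add_scaled`) are hypothetical, and taking the upper 16 bits of the product is used here as a stand-in for reading the MDH register.

```python
def to_q15(value):
    """Convert a fraction in [-1, 1) to a signed 16-bit value (sign + 15 fraction bits)."""
    raw = int(round(value * 32768))
    return max(-32768, min(32767, raw))      # saturate at the ends of the range

def from_q15(raw):
    """Convert a signed 16-bit fixed-point value back to a float."""
    return raw / 32768.0

def mul_high_word(a, b):
    """Upper 16 bits of the 32-bit product: the fractional product divided by 2."""
    return (a * b) >> 16                     # Python's >> is an arithmetic shift

def add_scaled(a, b):
    """Halve both operands by an arithmetic shift right before adding (no overflow)."""
    return (a >> 1) + (b >> 1)

print(hex(to_q15(0.75) & 0xFFFF))            # 0x6000
print(hex(to_q15(-0.75) & 0xFFFF))           # 0xa000
print(from_q15(mul_high_word(to_q15(0.75), to_q15(0.75))))   # 0.28125
```

The last line shows the scaling side effect: 0.75 × 0.75 = 0.5625, but the high word represents half of that, which is why the result has to be interpreted with one extra factor of 2.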
The table below shows execution times for different numbers of sample points. The timing values and the second clock frequency are reproduced as they appear in this transcription, in which digits are missing.

                        64 points    256 points    1024 points
SAB-C165 (25 MHz)       ,56 ms       ,6 ms         ,4 ms
SAB-C167CR ([...] MHz)  ,7 ms        3,3 ms        3 ms

These figures demonstrate that the C166 architecture is even superior to some signal processors. This result is founded on a multiplication execution time of 400 ns and the RISC-like register file. By using a more complex radix-4 FFT algorithm, which combines two butterflies, a further run-time saving can be expected. By keeping the input and output tables in the internal RAM, an additional speed-up can be achieved for the 64- and 256-point FFT. If only parts of the spectrum have to be analyzed, a partial FFT can be performed [7] by omitting all calculations not contributing to the frequency window.

Literature

[1] Application Note AP-275, "An FFT Algorithm for MCS-96 Products Including Supporting Routines and Examples", Intel, September 1986
[2] MS-C-Program
[3] TMS3[...] High Performance 16/32-Bit Digital Signal Processor, Product Description, Texas Instruments, 1985
[4] Digital Signal Processing Applications with the TMS320 Family, Texas Instruments, 1988
[5] TMS3[...]C[...]5 Digital Signal Processor, Product Description, Texas Instruments, 1986
[6] S.K. Mitra and J.F. Kaiser, Handbook for Digital Signal Processing, John Wiley & Sons, 1993
[7] Maurice Bellanger, Digital Processing of Signals, John Wiley & Sons, 1989

4 Flow chart of 1024-point real valued FFT

Start
Outloop for the stages 1 - 9
    Midloop_0
        Inloop_0: compute butterflies with twiddle factor W0
        all elements processed?
    Midloop_1
        Inloop_1: compute butterflies with twiddle factor W4
        all elements processed?
    Midloop_2
        Inloop_2: compute butterflies with twiddle factor W8
    Midloop_3
        Inloop_3: compute butterflies with twiddle factor W12
        all elements processed?
    Midloop
        Inloop: computation of remaining butterflies
        all elements processed?
    END_Midloop
    9th stage executed?
Unweave complex FFT to get the result of the real valued FFT
    bit reversal of FFT_IN and generate butterflies
Absolute value
    procedure "Absolute" (calculate absolute values and write results to FFT_OUT)
    calculate absolute values
    last element processed?
End

5 Flowchart of the Midloop

Midloop
    Init counter of Inloop
    Set input pointers to index
    Determination of twiddle factor
    Inloop
        Read input values Re(P_m), Im(P_m), Re(Q_m), Im(Q_m) from FFT_IN
        Calculate butterfly
            P_{m+1} = PR + QR cos(X) + QI sin(X) + j [PI + QI cos(X) - QR sin(X)]
            Q_{m+1} = PR - QR cos(X) - QI sin(X) + j [PI - QI cos(X) + QR sin(X)]
        Write output values Re(P_{m+1}), Im(P_{m+1}), Re(Q_{m+1}), Im(Q_{m+1}) to FFT_IN
        Increment index pointers
        Decrement counter in_loop
        counter in_loop > 0?
    Increment counter mid_loop
End Midloop
    counter mid_loop < 4?
    Modify help counters
    Decrement counter out_loop
    9th stage executed?
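The nesting of Outloop, Midloop and Inloop can be sketched in Python as three nested loops over stages, butterfly groups and butterflies. This is a floating-point illustration under stated assumptions, not the C166 code: it takes sequentially ordered input, produces bit-reversed output, reads the twiddle factors in bit-reversed order, and does not special-case the degenerated butterflies. The names `bit_reverse` and `fft_bitrev_out` are hypothetical.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def fft_bitrev_out(x):
    """In-place radix-2 FFT: sequential input order, bit-reversed output order."""
    n = len(x)                          # assumed to be a power of two
    bits = n.bit_length() - 1
    data = [complex(v) for v in x]
    groups = 1
    while groups < n:                   # outer loop: one pass per stage
        span = n // (2 * groups)        # distance between P and Q in this stage
        for g in range(groups):         # middle loop: one twiddle factor per group
            k = bit_reverse(g, bits - 1)
            w = cmath.exp(-2j * cmath.pi * k / n)
            base = 2 * g * span
            for i in range(base, base + span):   # inner loop: the butterflies
                t = data[i + span] * w           # Q * W^k
                data[i + span] = data[i] - t     # Q' = P - Q * W^k
                data[i] = data[i] + t            # P' = P + Q * W^k
        groups *= 2
    return data
```

With this loop structure the first stage uses only W^0, the second stage W^0 and W^(N/4), and so on, which matches the order in which the degenerated butterflies appear in the stages. The spectrum value X(k) is found at position `bit_reverse(k, bits)` of the returned list.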

6 Register Use and content

Stage 1 to 9 (entries marked [...] are not recoverable from this transcription):

Register  Stage1  Stage2  Stage3  Stage4  Stage5  Stage6  Stage7  Stage8  Stage9  Note
R0        9       8       7       6       5       4       3       2       1       Outloop counter
R1        1024    512     256     128     64      32      16      8       4
R2        2048    1024    512     256     128     64      32      16      8
R3, R4    [...]   [...]   [...]   [...]   [...]   [...]   [...]   [...]   [...]   Midloop
R5        pp      pp      pp      pp      pp      pp      pp      pp      pp      pp_t_i
R6        pq      pq      pq      pq      pq      pq      pq      pq      pq      pq_t_i
R7        B       B       cos[k]  cos[k]  cos[k]  cos[k]  cos[k]  cos[k]  cos[k]
R8        B       B       B       sin[k]  sin[k]  sin[k]  sin[k]  sin[k]  sin[k]
R9        B       B       B       B       B       B       B       B       B
R10       B       B       B       B       B       B       B       B       B
R11       B       B       B       B       B       B       B       B       B
R12       B       B       B       B       B       B       B       B       B
R13       256...  128...  64...   32...   16...   8...    4...    [...]   [...]   Inloop
R14       256     128     64      32      16      8       4       2       1
R15       [...]

n = number of samples (1024); B = butterfly; Twid = twiddle factor; pp = pointer onto element P_n; pq = pointer onto element Q_n

Stage 10 and procedure Absolute (where two entries are given, the first belongs to Stage 10 and the second to Absolute):

R0:  Input (Low)
R1:  Input a / Input (High)
R2:  Output
R3:  temp / temp
R4:  sin[k] / temp
R5:  Input b / temp
R6:  temp
R7:  temp / temp
R8:  cos[k]
R9:  Input c
R10:
R11: Input d
R12: 0, 2, 4, 6, 8, ...
R13:
R14: FFT_OUT
R15: