Computational Platforms
Viktor Öwall, viktor.owall@eit.lth.se

Outline
  Computational Platforms
  Numbering Systems
  Basic Building Blocks
  Scaling and Round-off Noise

Standard Processors or Special Purpose
  An algorithm can be mapped onto a standard processor or onto a special-purpose architecture. Special purpose here means a dedicated architecture.
  Standard processor (μprocessor or DSP)
    Programmable/flexible
    Short design time/time to market (TTM)
    Low price?
  Dedicated architecture
    High calculation capacity
    Low power consumption
    Low price at volume. What is volume?

Special Purpose
  An architecture is developed to fulfill special requirements. It could be
    hardware mapped
    time-multiplexed
  The architecture can then be implemented on either
    FPGA
    ASIC
Fixed-Point DSP: Motorola DSP56x
  [Figure: DSP56x datapath with X and Y operand registers, multiplier, 56-bit ALU, shifter, 56-bit accumulators A and B, and shifter/limiter; guard bits and scaling are marked.]
  Standard DSPs are MAC (Multiply-Accumulate) based and usually have a single-cycle multiplier, which may be pipelined.
  The multiplier delivers a double-wordlength output; the alternative is a multiplier with reduced-wordlength output.

Architectural Options
  OTS (Off The Shelf) processors
    Programmable microprocessors or DSPs
    Based on generic computational units, for DSPs usually a MAC
    Prefabricated or IP cores
  Time-multiplexed application-specific processors
    Several algorithmic operations performed on the same hardware unit
    Trades reduced hardware for longer computation time
  Hardware-mapped architectures
    One (or more) hardware unit per algorithmic operation
    High hardware cost and high throughput

Hardware Implementation Techniques
  Hardware solution: design for FPGA or ASIC
  Design flow: hardware description language (e.g. VHDL or Verilog), simulation, synthesis, place and route (P&R), post-layout simulation, then configuration (FPGA) or fabrication (ASIC)
  Field Programmable Gate Arrays
    Already fabricated silicon
    Reconfigurable
    Fast turnaround
    Prototyping
  Cell library or full custom (ASIC)
    Fabrication necessary
    High calculation capacity
    High utilization
    Low power
    Low price at volume
FPGA
  Preprocessed array that can be programmed, e.g. using VHDL/Verilog

Heterogeneous Programmable Platforms
  [Figure: Xilinx Virtex-II Pro. FPGA fabric with CLBs, routing, IOBs arranged in banks, block RAM (embedded memories), hardwired multipliers, embedded PowerPC, timing resources, and high-speed I/O. Courtesy Xilinx.]
  We will discuss this again at the end of the course.

Number Representation
Floating vs. Fixed Point
  In floating point a value is represented by
    a mantissa, determining the resolution/precision
    an exponent, determining the dynamic range
  In fixed point we only have a single value.
  Floating point gives a higher dynamic range, but the cost is high in
    energy
    area
    calculation time
  For energy-efficient implementations fixed point is preferred.

Binary Numbers, Unsigned Integers
  MSB: Most Significant Bit
  LSB: Least Significant Bit
  [Figure: an N-bit word from MSB to LSB, with example values.]

Dynamic Range and Resolution
  [Table: number of bits, number of levels, resolution V_LSB, and dynamic range V_fs for several wordlengths.]
  How do we use the bits? Depends on the application!

Unsigned Number Representation
  Fixed radix (base) systems
  The digits $a_i \in \{0, 1, 2, \ldots, r-1\}$ in a radix $r$ system:
  \[ x = \sum_{i=-l}^{k-1} a_i r^i = a_{k-1} r^{k-1} + \cdots + a_1 r + a_0 + a_{-1} r^{-1} + \cdots + a_{-l} r^{-l} \]
  described in a fixed-point positional number system:
  \[ a_{k-1} a_{k-2} \cdots a_1 a_0 \,.\, a_{-1} a_{-2} \cdots a_{-l} \]
  The digits after the radix point form the fractional part.
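The positional sum above can be evaluated directly. The following is a minimal Python sketch, not part of the original slides; the function name `positional_value` and the example digit strings are hypothetical, and the index convention (k integer digits, l fractional digits) is an assumption about the slide's notation.

```python
from fractions import Fraction


def positional_value(int_digits, frac_digits, radix):
    """Value of (a_{k-1} ... a_0 . a_{-1} ... a_{-l}) in a fixed-radix system.

    Digits are given most significant first. Exact rational arithmetic is
    used so that the fractional part is not affected by float rounding.
    """
    for d in list(int_digits) + list(frac_digits):
        if not 0 <= d < radix:
            raise ValueError(f"digit {d} is outside 0..{radix - 1}")
    k = len(int_digits)
    value = Fraction(0)
    # Integer part: the leftmost digit has weight radix**(k-1).
    for pos, d in enumerate(int_digits):
        value += d * Fraction(radix) ** (k - 1 - pos)
    # Fractional part: the first digit after the point has weight radix**-1.
    for pos, d in enumerate(frac_digits, start=1):
        value += d * Fraction(radix) ** (-pos)
    return value


if __name__ == "__main__":
    # Hypothetical examples: binary 1011.01 and decimal 42.5.
    print(float(positional_value([1, 0, 1, 1], [0, 1], 2)))
    print(float(positional_value([4, 2], [5], 10)))
```

The same function covers both the radix 10 and the radix 2 case, since only the digit set and the weights change.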
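Example: Unsigned Number
  Radix 10, digits $a_i \in \{0, 1, 2, \ldots, 9\}$:
  \[ x = \sum_{i=-l}^{k-1} a_i 10^i = a_{k-1} 10^{k-1} + \cdots + a_1 10 + a_0 + a_{-1} 10^{-1} + \cdots + a_{-l} 10^{-l} \]

Example: Unsigned Number
  Radix 2, digits $a_i \in \{0, 1\}$:
  \[ x = \sum_{i=-l}^{k-1} a_i 2^i = a_{k-1} 2^{k-1} + \cdots + a_1 2 + a_0 + a_{-1} 2^{-1} + \cdots + a_{-l} 2^{-l} \]
  [Worked example: an 8-bit binary number with four integer and four fractional bits, expanded bit by bit into its weights.]

Signed Digit Number Representation
  The digits $a_i \in \{-\alpha, \ldots, r-1-\alpha\}$ in a radix $r$ system:
  \[ x = \sum_{i=-l}^{k-1} a_i r^i \]
  Example: radix 10 with $a_i \in \{-4, -3, \ldots, 4, 5\}$

Signed Number Representation
  Sign Magnitude
  One's Complement
  Two's Complement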
Signed Magnitude
  Unsigned numbers with a sign bit
  - Two zeros
  + Low power?
  + Easy to convert to negative

One's Complement
  Signed numbers by inverting (complement)
  - Two zeros
  + Easy to convert to negative

Two's Complement
  Complement + LSB
  + Easy addition
  - Not so easy to convert to negative
  Most widely used fixed-point numbering system

Two's Complement
  The digits $a_i \in \{0, 1\}$ in a radix 2 system:
  \[ x = -a_{k-1} 2^{k-1} + \sum_{i=-l}^{k-2} a_i 2^i = -a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots + a_1 2 + a_0 + a_{-1} 2^{-1} + \cdots + a_{-l} 2^{-l} \]
  described in a fixed-point positional number system:
  \[ a_{k-1} a_{k-2} \cdots a_1 a_0 \,.\, a_{-1} \cdots a_{-l} \]
  where $a_{k-1}$ is the sign bit and the digits after the radix point form the fractional part.
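To make the three signed representations concrete, the sketch below encodes the same integer in each of them. This is an illustration under assumptions, not material from the slides: the function names and the 4-bit wordlength are hypothetical choices.

```python
def sign_magnitude(x, n):
    """n-bit sign-magnitude pattern: a sign bit followed by the magnitude."""
    if abs(x) > 2 ** (n - 1) - 1:
        raise ValueError("value does not fit")
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{n - 1}b")


def ones_complement(x, n):
    """n-bit one's complement: a negative value is the inverted positive one."""
    if abs(x) > 2 ** (n - 1) - 1:
        raise ValueError("value does not fit")
    mask = (1 << n) - 1
    pattern = x if x >= 0 else (~(-x)) & mask
    return format(pattern, f"0{n}b")


def twos_complement(x, n):
    """n-bit two's complement: invert and add one LSB for negative values."""
    if not -(2 ** (n - 1)) <= x <= 2 ** (n - 1) - 1:
        raise ValueError("value does not fit")
    return format(x & ((1 << n) - 1), f"0{n}b")


if __name__ == "__main__":
    for value in (3, -3):
        print(value, sign_magnitude(value, 4),
              ones_complement(value, 4), twos_complement(value, 4))
    # Sign magnitude and one's complement each have a second pattern for
    # zero ("1000" and "1111" in 4 bits); two's complement uses "1000" for -8.
    print(twos_complement(-8, 4))
```

Negating in sign magnitude or one's complement only flips bits, whereas two's complement needs an extra addition, which is the trade-off the lists above point at.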
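Example: Two's Complement
  [Worked example: an 8-bit two's complement number with four integer and four fractional bits, expanded bit by bit into its weights.]
  If nothing else is said, we assume fractional numbers with magnitude less than one.

Sign Extension in Two's Complement
  \[ -a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots = -a_{k-1} 2^{k} + a_{k-1} 2^{k-1} + a_{k-2} 2^{k-2} + \cdots \]
  since $-a_{k-1} 2^{k} + a_{k-1} 2^{k-1} = -a_{k-1} 2^{k-1}$. Repeating the sign bit to the left therefore does not change the value.

The Wordlength, i.e. Number of Bits
  [Figure: FIR filter with delay elements D and coefficients h (UMTS filter), floating-point response compared with a reduced-wordlength response.]
  Every extra bit costs
    energy/power
    delay
    area
  so the wordlength has to be reduced.

The Wordlength, i.e. Number of Bits
  The output of an adder needs an extra bit to be sure of no overflow (illustrated on the slide with a decimal and a binary addition).
  A multiplier with M x N input bits needs M+N bits for full precision, sometimes M+N-1.
  Precision has to be limited.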
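The sign-extension identity and the wordlength growth rules above can be checked by brute force. The sketch below is only an illustration under assumptions: the helper names are hypothetical, and 4-bit integer operands are an arbitrary choice.

```python
def twos_value(bits, frac_bits=0):
    """Value of a two's complement bit string; the MSB has negative weight."""
    raw = int(bits, 2)
    if bits[0] == "1":
        raw -= 1 << len(bits)
    return raw / (1 << frac_bits)


def sign_extend(bits, new_len):
    """Extend a two's complement word by repeating its sign bit."""
    return bits[0] * (new_len - len(bits)) + bits


def fits(value, n):
    """True if the integer value is representable in n-bit two's complement."""
    return -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1


if __name__ == "__main__":
    print(twos_value("1101"), twos_value(sign_extend("1101", 8)))

    n = 4
    operands = range(-(1 << (n - 1)), 1 << (n - 1))
    # Addition: one extra bit is always enough.
    print(all(fits(a + b, n + 1) for a in operands for b in operands))
    # Multiplication: M+N-1 bits suffice except for the single most negative
    # times most negative product, which needs the full M+N bits.
    misses = [(a, b) for a in operands for b in operands
              if not fits(a * b, 2 * n - 1)]
    print(misses)
```

The single exception in the multiplication check is one way to read the "sometimes M+N-1" remark.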
Basic Building Blocks
  [Figure: FIR filter with delay elements D and coefficients h.]
  In the FIR filter:
    adders
    multipliers
    registers
  In other algorithms also: shift, minus, division, ...
  A left shift is a multiplication by 2 and a right shift is a division by 2, but at low complexity!

Comparing Basic Building Blocks
  From high to low complexity:
    Divider
    Generic multiplier
    Fixed multiplier
    Adder/subtractor
    Shifter
  [Figure: 4-bit operands a and b feeding each block, with sum bits s and product bits p.]

Scaling and Round-off Noise
Two Types of Quantization
  Coefficient quantization
    Non-ideal transfer function
    Compare to analog component variations
  Signal quantization
    Round-off noise
    Limit cycles

Signal Quantization
  Round-off noise
    Affects the output as a random disturbance
  Limit cycle oscillations
    Undesired periodic components
    Due to non-linear behavior in the feedback (rounding or overflow)

Quantization Analysis
  Using real rounding, truncation, and overflow
    Gives exact results
    Tricky: needs an integer representation
  Using noise models
    Floating-point representation can still be used
    Suitable for Matlab, C/C++, ...

Rounding and Truncation
  Rounding/truncation is always there!
  Especially necessary in recursive systems
  [Figure: recursive structure with a quantizer Q in the loop.]
  Without quantization: infinite wordlength
    Multiplication: n+m output bits
    Addition: n+1 output bits
Truncation and Rounding
  [Figure: quantizer characteristics (output level versus input level) for truncation, rounding, and truncation towards zero.]
  Truncation
    All values approximated in the same direction
    Max error: 1 LSB
    DC error
    All values go towards -infinity
  Rounding
    Values approximated up or down
    Max error: 1/2 LSB
    Rounded to even
  Truncation towards zero
    No energy added to the system
    Often used in recursive algorithms
    Add LSB before truncation if negative

Scaling
  Adjust the signal range to fit the hardware
  Unchanged transfer function (scaled coefficients might move the poles and zeros)

Example: Where Scaling Is Needed
  [Figure: signal-flow graph with input u(n) and a coefficient multiplication, annotated with signal ranges.]

Trade-off
  Scale up to reduce roundoff noise, at the risk of overflow
  Scale down to avoid overflow, but you lose precision!
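The three quantization modes listed above can be compared on integers by dropping s least significant bits. This is a minimal sketch under assumptions: the function names are hypothetical, the rounding variant shown is plain round-half-up (not round-to-even), and Python's `>>` on negative integers is an arithmetic shift, which matches two's complement truncation.

```python
def truncate(x, s):
    """Drop s LSBs of a two's complement integer: always towards -infinity."""
    return x >> s


def round_half_up(x, s):
    """Add half an output LSB before truncation (s must be at least 1)."""
    return (x + (1 << (s - 1))) >> s


def truncate_towards_zero(x, s):
    """Magnitude truncation: for negative inputs, add one output LSB minus
    one input LSB before truncating, so the magnitude never grows."""
    if x < 0:
        x += (1 << s) - 1
    return x >> s


if __name__ == "__main__":
    s = 3  # drop three bits: one output LSB equals 8 input LSBs
    for x in (-13, -5, 5, 13):
        print(x / 8, truncate(x, s), round_half_up(x, s),
              truncate_towards_zero(x, s))

    # Average error in output LSBs over a full range of inputs: truncation
    # shows a clear DC error, rounding a much smaller one.
    xs = range(-64, 64)
    for fn in (truncate, round_half_up, truncate_towards_zero):
        mean_err = sum(fn(x, s) - x / 8 for x in xs) / len(xs)
        print(fn.__name__, mean_err)
```

Truncation towards zero never increases the magnitude of a value, which is the sense in which it adds no energy to a recursive loop.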
Scaling
  [Figure: node with unit sample response f(n), with the scaling factor β applied at the input.]
  Safe scaling if
  \[ \beta \le \frac{1}{\sum_i |f(i)|} \]
  where f(i) is the unit sample response.

Example: Safe Scaling
  [Worked example: linear-phase FIR filter h(n) with bounded input x(n) and output y(n). Note the strength reduction.]
  Increased roundoff noise
  Internal scaling might improve

Example: Safe Scaling
  [Worked example: the unit sample response is a geometric series, so the sum of |f(i)| has a closed form.]

Scaling
  Safe scaling is pessimistic
  Alternative is scaling with
  \[ \beta = \frac{1}{\sqrt{\sum_i f(i)^2}} \]
  In practice: scaling with $\beta = 2^{\pm n}$. Easy to do: a shift
  Increased internal wordlength is an alternative
  [Figure: original filter with overflow.]
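The scaling rules above can be tried on a unit sample response. The sketch below is an assumption-laden illustration rather than one of the slide examples: it uses a hypothetical first-order recursion y(n) = a·y(n-1) + x(n) with a = 0.6, whose unit sample response is the geometric series f(i) = a^i, and hypothetical function names.

```python
import math


def unit_sample_response(a, n_terms):
    """f(i) = a**i for the assumed recursion y(n) = a*y(n-1) + x(n)."""
    return [a ** i for i in range(n_terms)]


def safe_scaling(f):
    """beta = 1 / sum(|f(i)|): overflow is impossible for inputs below 1."""
    return 1.0 / sum(abs(v) for v in f)


def l2_scaling(f):
    """beta = 1 / sqrt(sum(f(i)^2)): less pessimistic, overflow is possible."""
    return 1.0 / math.sqrt(sum(v * v for v in f))


def power_of_two_at_most(beta):
    """Largest power of two not exceeding beta, so the scaling is a shift."""
    return 2.0 ** math.floor(math.log2(beta))


if __name__ == "__main__":
    f = unit_sample_response(0.6, 200)  # later terms are negligible
    for name, beta in (("safe", safe_scaling(f)), ("L2", l2_scaling(f))):
        print(name, round(beta, 6), power_of_two_at_most(beta))
```

For this geometric series the sums have the closed forms 1/(1-|a|) and 1/(1-a^2), so the safe factor is noticeably smaller than the alternative, which is what "pessimistic" refers to.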
Pacemaker Example
  [Figure: the Electrocardiogram (EGM); time in ms.]
  [Figure: the interfered signal, EGM + interference from an AC hand drill, with the SNR given in dB; time in ms.]

Filtering Performance
  [Figure: EGM with added interference, and the output T of the GLRT together with the threshold; time in ms.]
Wavelet Filterbank Bit-Optimization
  Signals have been monitored to determine the upper bound of the wordlength.
  Comparison of worst-case wordlength and implemented wordlength at the wavelet outputs:
  [Table: worst-case wordlength N_wc and implemented wordlength N_imp, given as offsets from the input wordlength N, for the filterbank outputs y1 to y6 built from the filters F(z) and G(z).]

Example: Internal Scaling
  VHDL bit-level simulation of a five-stage FFT
  Compared with Matlab floating-point simulation
  Optimized internal scaling
  White noise input
  [Figure: pipeline from Data In through five radix stages (Stage 5 down to Stage 1) to Data Out, driven by a counter and a clock.]
  Source: Fredrik Kristensen

A 16-point Radix-2 FFT
  [Figure: signal-flow graph built from basic butterfly units with twiddle factors W.]
Example: Internal Scaling
  [Figure: the five-stage FFT pipeline (Data In, Stage 5 down to Stage 1, Data Out) annotated with the optimized wordlength at each stage boundary. White noise input.]
  Source: Fredrik Kristensen

Limit Cycles
  Example: zero-input oscillations in a 2nd order IIR filter
  [Figure: second-order recursive section with input X(n), coefficients b1 and b2, and a quantizer Q in the loop.]
  Two cases are shown for zero input:
    Rounding after multiplication
    Truncation after multiplication
  Coefficients: b1 = 489/256 = 1.91015625; b2 = 15/16 = 0.9375
  Source: Lars Wanhammar, DSP Integrated Circuits
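A quick look at the pole positions helps explain why this example is prone to zero-input oscillations. The sketch below is an illustration under stated assumptions: it takes the coefficient magnitudes as b1 = 489/256 and b2 = 15/16, assumes the denominator polynomial z^2 - b1·z + b2, and the function name is hypothetical.

```python
import cmath


def poles_second_order(b1, b2):
    """Roots of z**2 - b1*z + b2 (assumed denominator 1 - b1*z^-1 + b2*z^-2)."""
    disc = cmath.sqrt(b1 * b1 - 4.0 * b2)
    return (b1 + disc) / 2.0, (b1 - disc) / 2.0


if __name__ == "__main__":
    b1, b2 = 489 / 256, 15 / 16
    for p in poles_second_order(b1, b2):
        print(p, abs(p))  # complex pair with radius sqrt(b2), near the unit circle

    # Quantizing b1 to four fractional bits moves the poles noticeably.
    b1_q = round(b1 * 16) / 16
    print(b1_q, poles_second_order(b1_q, b2))
```

With these assumed values the pole radius is sqrt(15/16), about 0.968, and the coarser b1 places one pole on the unit circle, which illustrates how sensitive such a narrow-band section is to coefficient precision.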
Limit Cycles
  Poles close to the unit circle
  Changing the precision moves the poles!
  Matlab: zplane(1,[1 -1.91015625 0.9375])
  Matlab: zplane(1,[1 -1.9375 0.9375])
  [Figure: pole-zero plots for the two coefficient sets; axes: Real Part, Imaginary Part.]

Limit Cycles
  Zero-input oscillations
    Very difficult problem
    Often not accepted in audio
    In general, no solutions for structures > 2nd order
    Can be limited by increased internal wordlength
    Can in some 2nd order structures be eliminated by pole positioning
    Wave Digital Filters are free from parasitic oscillations

Overflow Oscillations
  Overflow changes the sign
  Overflow if Cout-msb differs from Cin-msb
  Oscillations are limited by saturation

Saturation Arithmetic
  [Figure: saturating adder. The two's complement sum from the adder is replaced by a saturated sum when the carry into the MSB (Cin-msb) and the carry out of the MSB (Cout-msb) differ; the flags NOF and POF together with the sign bit select between the correct sum and the saturated output.]
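The overflow rule and the effect of saturation can be sketched in a few lines. This is a hypothetical illustration, not the circuit from the slide: the function names and the 4-bit wordlength are assumptions.

```python
def add_wrap(a, b, n):
    """n-bit two's complement addition with wrap-around on overflow."""
    s = (a + b) & ((1 << n) - 1)
    return s - (1 << n) if s & (1 << (n - 1)) else s


def add_saturate(a, b, n):
    """n-bit addition that clips to the most positive/negative value."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return max(lo, min(hi, a + b))


def overflow_by_carries(a, b, n):
    """Overflow flag: carry into the MSB differs from carry out of the MSB."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask
    low_mask = (1 << (n - 1)) - 1
    carry_in = ((ua & low_mask) + (ub & low_mask)) >> (n - 1)
    carry_out = (ua + ub) >> n
    return carry_in != carry_out


if __name__ == "__main__":
    n = 4
    for a, b in ((5, 4), (-5, -4), (3, 2)):
        print(a, b, add_wrap(a, b, n), add_saturate(a, b, n),
              overflow_by_carries(a, b, n))
```

In the wrap-around case the result changes sign on overflow, which is the large error that can sustain an oscillation; saturation keeps the error bounded.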
Limit Cycles Due to Overflow
  Lab example, zero input
  [Figure: wrap-around (two's complement arithmetic) compared with saturation (saturated arithmetic). Curves: input, unquantized output, quantized output. Axes: Time, Amplitude.]
  Source: Lars Wanhammar, DSP Integrated Circuits

Simple Noise Analysis
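Scaling and White Noise Input
  Safe scaling:
  \[ \beta = \frac{1}{\sum_i |f(i)|} \]
  Scaling with possible overflow:
  \[ \beta = \frac{1}{\delta \sqrt{\sum_i f(i)^2}} \]
  f(i) is the unit sample response; $\sum_i f(i)^2$ gives the variance for a white noise input.
  This is safe scaling, but not guaranteed: δ sets the probability of an overflow.
  Typically one overflow every $10^6$ samples is accepted in audio [Wanhammar].

Roundoff Noise Model
  [Figure: a signal u(n) is multiplied by a coefficient a and a quantizer Q rounds the longer product back to the signal wordlength. In the model, the quantizer is replaced by an added noise source e(n).]
  The rounding is modeled with added noise as an input error.

Roundoff Noise
  The quantization error is assumed to be uniformly distributed in the interval
  \[ -\frac{\Delta}{2} \le e(n) \le \frac{\Delta}{2}, \qquad \Delta = 2^{-(W-1)} \]
  where W is the number of bits after the rounding.
  [Figure: probability density $P_e(e) = 1/\Delta$ for $|e| \le \Delta/2$, i.e. ½ LSB.]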
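Roundoff Noise
  Mean value:
  \[ \bar{e} = E[e(n)] = 0 \]
  Variance, with the error uniformly distributed over $[-\Delta/2, \Delta/2]$ and $\Delta = 2^{-(W-1)}$:
  \[ \sigma_e^2 = E[(e - \bar{e})^2] = \int_{-\Delta/2}^{\Delta/2} (e - \bar{e})^2 P_e(e)\, de = \frac{1}{\Delta} \int_{-\Delta/2}^{\Delta/2} e^2\, de = \frac{1}{3\Delta} \left( \frac{\Delta^3}{8} + \frac{\Delta^3}{8} \right) = \frac{\Delta^2}{12} \]
  [Figure: probability density $P_e(e) = 1/\Delta$ for $|e| \le \Delta/2$, i.e. ½ LSB.]

Example: Roundoff Noise
  In the case of rounding (zero mean) the variance and the average power are the same, i.e. if a value is rounded to W bits the quantization noise becomes:
  \[ \sigma_e^2 = \frac{\Delta^2}{12} = \frac{2^{-2W}}{3} \]
  If we scale down one bit:
  \[ \frac{2^{-2(W-1)}}{3} = 4 \cdot \frac{2^{-2W}}{3} = 4 \sigma_e^2 \]
  One extra bit reduces the quantization error power by a factor 4.

Signal to Noise Ratio (SNR)
  \[ SNR = 10 \log_{10} \frac{\text{signal power (variance)}}{\text{roundoff error power (variance)}} = 10 \log_{10} \frac{\sigma_x^2}{\sigma_e^2} = 10 \log_{10} \left( 3 \cdot 2^{2W} \sigma_x^2 \right) \ \text{dB} \]
  One bit corresponds to
  \[ 10 \log_{10} \frac{4 \sigma_e^2}{\sigma_e^2} = 6.02 \ \text{dB} \]
  Good to remember: 6 dB increase in SNR per bit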
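The variance result and the 6 dB per bit rule can be checked numerically. The sketch below is only an illustration: it assumes Δ = 2^-(W-1) for a W-bit word, uses hypothetical function names, and estimates the integral of e^2 over the uniform density with a simple midpoint rule.

```python
import math


def roundoff_variance(w):
    """sigma_e^2 = Delta^2 / 12, assuming Delta = 2**-(w-1) for a w-bit word."""
    delta = 2.0 ** (-(w - 1))
    return delta * delta / 12.0


def numeric_variance(delta, n=100_000):
    """Midpoint-rule estimate of E[e^2] for e uniform on [-delta/2, delta/2]."""
    total = 0.0
    for k in range(n):
        e = -delta / 2 + (k + 0.5) * delta / n
        total += e * e
    return total / n


if __name__ == "__main__":
    w = 8
    delta = 2.0 ** (-(w - 1))
    print(roundoff_variance(w), numeric_variance(delta), 2.0 ** (-2 * w) / 3)
    # Noise power ratio between w and w+1 bits, expressed in dB.
    gain_db = 10 * math.log10(roundoff_variance(w) / roundoff_variance(w + 1))
    print(gain_db)
```

All three printed variance values should agree closely, and the ratio between consecutive wordlengths is a factor of four, about 6.02 dB.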
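Signal to Noise Ratio (SNR)
  Example: full-scale sine wave (amplitude A = 1) rounded to 8 bits
  \[ SNR = 10 \log_{10} \frac{A^2/2}{2^{-2 \cdot 8}/3} \approx 50 \ \text{dB} \]

Roundoff Noise: Addition
  [Figure: two signals u_1(n) and u_2(n), with noise sources e_1(n) and e_2(n), are added.]
  \[ E[(e_1 + e_2)^2] = E[e_1^2 + 2 e_1 e_2 + e_2^2] = E[e_1^2] + 2 E[e_1 e_2] + E[e_2^2] = E[e_1^2] + E[e_2^2] \]
  The cross term is zero if $u_1$ and $u_2$ (and hence $e_1$ and $e_2$) are independent.

Example: Roundoff Noise
  First-order IIR filter with feedback coefficient a and rounding noise e(n) injected in the loop. The variance is:
  \[ \sigma^2 = \sigma_e^2 \sum_i f(i)^2 = \sigma_e^2 \left( 1 + a^2 + a^4 + a^6 + \cdots \right) = \frac{\sigma_e^2}{1 - a^2} \]
    a = 0.1:   $\sigma^2 = 1.01\, \sigma_e^2$
    a = 0.5:   $\sigma^2 = 1.33\, \sigma_e^2$
    a = 0.998: $\sigma^2 = 250\, \sigma_e^2$

Example: SNR
  Full-scale sine, rounded to 8 bits in an IIR filter
    No feedback: SNR = 50 dB
    a = 0.998 (narrow-band filter): the noise variance grows to $250\, \sigma_e^2$ and the SNR drops accordingly
  [Figure: first-order recursive filter with input u(n), coefficient a, and noise source e(n).]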
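The noise gain of the first-order recursion and the sine-wave SNR can be reproduced with a short calculation. This is a sketch under assumptions: the function names are hypothetical, the unit sample response from the noise source to the output is taken as f(i) = a^i, and the noise variance for W bits is taken as 2^(-2W)/3.

```python
import math


def noise_gain_first_order(a, n_terms=20_000):
    """Sum of f(i)^2 with f(i) = a**i, truncated after n_terms terms."""
    return sum((a ** i) ** 2 for i in range(n_terms))


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB."""
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    for a in (0.1, 0.5, 0.998):
        gain = noise_gain_first_order(a)
        # Compare the truncated sum with the closed form 1 / (1 - a^2).
        print(a, round(gain, 4), round(1.0 / (1.0 - a * a), 4),
              round(10 * math.log10(gain), 2))

    # Full-scale sine (A = 1, power A^2/2) against 8-bit rounding noise.
    sigma_e2 = 2.0 ** (-16) / 3
    print(round(snr_db(0.5, sigma_e2), 2))
```

The last column of the loop output expresses the noise gain in dB, which shows how strongly a pole close to the unit circle amplifies the roundoff noise.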