Finite Word Length Effects and Quantisation Noise. Professors A G Constantinides & L R Arnaut

Finite Word Length Effects
Finite register lengths and A/D converters cause errors at different levels:
(i) input: input quantisation
(ii) system: coefficient (= multiplier) quantisation
(iii) output: results of operations (products) truncated or rounded due to finite machine word length

Input Quantisation
Discretisation of time (sampling) vs. discretisation of signal values (quantisation).
- For each sample: find the nearest level and round the actual value off to this level (S&H), giving a finite-precision digitised value
- Most values cannot be represented exactly: quantisation error e(k)
- The rounded value can be above or below the actual value: the random, bounded fluctuation of the quantisation error is conceived as quantisation noise $n_q(t)$
- With input $e_i(t)$ and quantised output $e_o(t)$: $e_o(t) = e_i(t) + n_q(t)$
[Figure: sampling followed by quantiser Q, with input e_i(t), added noise n_q(t) and output e_o(t)]

Quantisation Error (Linear)
Quantisation error e for quantisation step Q, with nearest-level rounding:
$$-\frac{Q}{2} \le e(k) \le \frac{Q}{2}$$
- Q = least significant bit (determines precision, resolution)
- Quantisation is an irreversible process (loss of information), unlike sampling above the Nyquist rate
- The input $e_i(k)$ has to stay within the dynamic range R to avoid clipping of $e_o(k)$ (overflow error)
[Figure: linear mid-tread quantiser, output e_o(k) versus input e_i(k), with step Q and range R]
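The rounding and clipping behaviour described above can be sketched in a few lines of Python. This is only an illustration under assumptions not stated in the slides: the dynamic range R is taken to be symmetric about zero, and the function name `quantise` and the sample values are hypothetical.

```python
import math

def quantise(x, Q, R):
    """Mid-tread quantiser: round x to the nearest multiple of Q,
    then clip to the (assumed symmetric) dynamic range [-R/2, R/2]."""
    level = math.floor(x / Q + 0.5) * Q        # nearest-level rounding, 0 is a level
    return max(-R / 2, min(R / 2, level))      # clipping models overflow

Q, R = 0.25, 2.0                               # hypothetical step and range
for x in [0.1, 0.3, -0.6, 0.99, 1.7]:
    xq = quantise(x, Q, R)
    print(f"x = {x:5.2f}  quantised = {xq:5.2f}  error = {xq - x:+.3f}")
```

For inputs inside the range the error magnitude stays within Q/2; the last sample lies outside the range and is clipped, which gives a much larger (overflow) error.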

Uniform Quantisation for Rounding
Ideally, the pdf of e for rounding is uniform between the bounds (granular noise):
$$p(e) = \frac{1}{Q}, \quad -\frac{Q}{2} \le e \le \frac{Q}{2}$$
- Not exactly true for a sinusoidal signal, but corrections are generally small for high-amplitude (= spanning the range of the quantiser), wide-band signals
- Rounding is preferred over truncation, because it yields an unbiased (zero-mean) quantisation error
Quantisation noise power (mean-square error):
$$\sigma^2 = \int_{-Q/2}^{Q/2} e^2\, p(e)\, de = E\{e^2\} = \frac{(Q/2)^2}{3} = \frac{Q^2}{12}$$
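The value Q²/12 can be checked numerically. The sketch below sweeps a deterministic ramp over the range [-1, 1) as a stand-in for a signal that spans the quantiser range uniformly; the ramp, the step Q = 0.125 and the helper name `rounding_error` are assumptions made for illustration.

```python
import math

def rounding_error(x, Q):
    """Error of nearest-level rounding with step Q."""
    return math.floor(x / Q + 0.5) * Q - x

Q = 0.125                                  # hypothetical quantisation step
N = 100_000
# Ramp through [-1, 1): every error value in [-Q/2, Q/2] is visited equally often
errors = [rounding_error(-1 + 2 * (i + 0.5) / N, Q) for i in range(N)]

mean = sum(errors) / N                     # should be close to 0 (unbiased)
power = sum(e * e for e in errors) / N     # should be close to Q^2 / 12
print(mean, power, Q * Q / 12)
```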

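Quantisation Unit-Amplitude Signal
Let the input $e_i(t)$ be a signal of unit amplitude (e.g. a sine), ranging between -1 and +1.
Normalised total signal power: $P = \frac{1}{2}$
If b bits are used for binary representation, there are $2^b$ subintervals, so that
$$Q = \frac{2}{2^b} = 2^{-(b-1)}, \qquad \sigma^2 = \frac{Q^2}{12} = \frac{2^{-2b}}{3}$$
hence
$$\mathrm{SNR} = \frac{P}{\sigma^2} = (1.5)\, 2^{2b}$$
$$\mathrm{SNR} = 10\log_{10}(1.5) + 10\log_{10}(2^{2b}) \approx (1.76 + 6b)\ \mathrm{dB}$$
- also called SQNR (signal-to-quantisation-noise ratio)
- finiteness of the dynamic range affects (reduces) the value of 1.76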
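As a rough check of the (1.76 + 6b) dB rule, the following sketch compares the formula with the SQNR measured on a quantised unit-amplitude sine. The test frequency (based on the golden ratio, so that the sampled phases spread evenly), the sample count and the function names are assumptions chosen for illustration only.

```python
import math

def sqnr_db_formula(b):
    """SNR = 1.5 * 2^(2b) expressed in dB."""
    return 10 * math.log10(1.5) + 20 * b * math.log10(2)

def sqnr_db_measured(b, n=50_000):
    """Quantise a unit-amplitude sine with step Q = 2 / 2^b and measure the SQNR."""
    Q = 2.0 / 2 ** b
    f = (math.sqrt(5) - 1) / 2             # hypothetical frequency in cycles/sample
    sig = err = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * f * i)
        xq = math.floor(x / Q + 0.5) * Q   # nearest-level rounding
        sig += x * x
        err += (xq - x) ** 2
    return 10 * math.log10(sig / err)

for b in (8, 10):
    print(b, round(sqnr_db_formula(b), 2), round(sqnr_db_measured(b), 2))
```

The measured values are expected to sit close to the formula, with small deviations because the error of a quantised sine is not exactly uniformly distributed.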

Applications
DECT (speech):
- One requires SNR > 25 dB, i.e., 4 bits
- In practice: an extra 30 dB of dynamic range is required; the largest signal for a nonsinusoidal signal corresponds to 0 dB, not -3 dB
- Thus, -30 - 25 = -55 dB below maximum; requires minimum 9-bit encoding
HiFi (e.g., CD/DVD, in-flight data streaming):
- Min. 1 bit-encoding (SNR = 78.3 dB); typ. 16-bit (96 dB) because of imperfect A/D devices
In practice: using companders (compressor + expander):
- Pre-emphasis of the input signal before quantisation
- Takes advantage of the nonuniform probability of analogue values
- Result: nonuniformly quantised output signal (use small Q when the signal is small)
- S-shaped (nonlinear) I/O characteristic
- Noise reduction in audio: Dolby, dbx

Companders
Comparison:
- Uniform quantisation, 7 bits + 1 sign bit: SNR = 44 dB
- Nonuniform quantisation (nonlinear transformation): see the two characteristics below
µ-255 characteristic (US, Japan) (UNIX sound, JAVA, etc.):
- Exploits the logarithmic characteristic of loudness (human sound perception)
- Allows for digitisation of speech using only 8 bits for SNR = 77 dB
$$f(x) = \operatorname{sgn}(x)\, \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}, \quad -1 < x < +1$$
A-87.6 characteristic (Europe): international convention
$$f(x) = \begin{cases} \operatorname{sgn}(x)\, \dfrac{A|x|}{1 + \ln A}, & 0 \le |x| \le 1/A \\[2mm] \operatorname{sgn}(x)\, \dfrac{1 + \ln(A|x|)}{1 + \ln A}, & 1/A \le |x| \le 1 \end{cases}$$
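The µ-law characteristic above can be tried out with a short sketch: compress, quantise uniformly, then expand. The function names, the 8-bit uniform quantiser over [-1, 1] and the test value x = 0.01 are assumptions for illustration; this is not the bit-exact coding used in telephony standards.

```python
import math

MU = 255.0

def mu_compress(x, mu=MU):
    """f(x) = sgn(x) ln(1 + mu|x|) / ln(1 + mu) for |x| <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantise(v, bits):
    """Uniform rounding quantiser over [-1, 1] with step Q = 2 / 2^bits."""
    Q = 2.0 / 2 ** bits
    return math.floor(v / Q + 0.5) * Q

def companded(x, bits=8):
    """Compress, quantise uniformly, expand."""
    return mu_expand(quantise(mu_compress(x), bits))

x = 0.01                                   # small-amplitude sample
print("uniform error  :", abs(quantise(x, 8) - x))
print("companded error:", abs(companded(x) - x))
```

For small amplitudes the companded path gives a noticeably smaller error than uniform quantisation with the same number of bits, which is the point of using a small effective Q when the signal is small.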

Digital Telephone
Digital telephone transmission:
- Voice bandwidth: B = 3.4 kHz (analogue)
- Sampling: F = 2B = 6.8 kHz (ideal filters)
- With filter margin: 8 kHz
- 8 bits/sample × 8k samples/sec = 64 kbps

Multilevel Coding
Data transmission: multilevel coding
- Instead of transmitting single bits: group bits, e.g., in pairs: 00, 01, 10, 11
- Gives a lower symbol rate (i.e., smaller channel bandwidth) or a higher bit rate, but decoding is more difficult (larger quantisation error, or larger power needed)
- Illustrates the dilemma noise vs. bandwidth (speed) vs. accuracy vs. power
Symbol rate (modulation rate) = 1 / duration of 1 pulse; expressed in baud
Bit rate = symbol rate × number of bits per symbol; expressed in bps
In multilevel coding: symbol rate < bit rate
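The relation between bit rate and symbol rate is simple arithmetic; the sketch below applies it to the 64 kbps telephone stream. Sending that stream with 2 bits per symbol (4 levels) is a hypothetical choice made only to illustrate the formula, and the function names are not from the source.

```python
def bit_rate(symbol_rate_baud, bits_per_symbol):
    """Bit rate (bps) = symbol rate (baud) x number of bits per symbol."""
    return symbol_rate_baud * bits_per_symbol

def levels(bits_per_symbol):
    """Number of distinct symbol levels needed for the given grouping."""
    return 2 ** bits_per_symbol

pcm_bps = 8 * 8000                 # 8 bits/sample at 8k samples/s
bits_per_symbol = 2                # hypothetical: bits grouped in pairs
symbol_rate = pcm_bps // bits_per_symbol

print(pcm_bps, "bps carried at", symbol_rate, "baud using",
      levels(bits_per_symbol), "levels")
```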

Coefficient Quantisation: Second-Order Systems
Consider a simple example of finite precision of the coefficients a, b of a second-order system with two complex-conjugate poles $\rho e^{\pm j\theta}$:
$$H(z) = \frac{1}{1 + a z^{-1} + b z^{-2}} = \frac{1}{1 - 2\rho\cos\theta\, z^{-1} + \rho^2 z^{-2}}$$
where $a = -2\rho\cos\theta$, $b = \rho^2$.
- Quantisation error of the coefficients affects the location of poles and zeroes: imperfect frequency response
- Sensitivity of the frequency response of a filter to quantisation error is minimised if the filter is implemented as a cascade of 2nd-order filters (can be shown)
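How far the poles move when the coefficients are rounded can be seen with a small sketch. The pole radius 0.95, the angle of 15 degrees and the 4 fractional bits are hypothetical values chosen to make the shift visible; the helper names are likewise assumptions.

```python
import cmath
import math

def poles(a, b):
    """Poles of 1 / (1 + a z^-1 + b z^-2), i.e. roots of z^2 + a z + b = 0."""
    d = cmath.sqrt(a * a - 4 * b)
    return (-a + d) / 2, (-a - d) / 2

def quantise(c, bits):
    """Round a coefficient to a grid with step 2^-bits."""
    Q = 2.0 ** -bits
    return math.floor(c / Q + 0.5) * Q

rho, theta = 0.95, math.pi / 12            # hypothetical pole radius and angle
a, b = -2 * rho * math.cos(theta), rho ** 2
aq, bq = quantise(a, 4), quantise(b, 4)    # coefficients stored with 4 fractional bits

p, _ = poles(a, b)
pq, _ = poles(aq, bq)
print("ideal pole    : radius %.4f, angle %.2f deg" % (abs(p), math.degrees(cmath.phase(p))))
print("quantised pole: radius %.4f, angle %.2f deg" % (abs(pq), math.degrees(cmath.phase(pq))))
```

With these values the quantised coefficients are -1.8125 and 0.875, so both the pole radius and the pole angle change, and with them the frequency response.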

Coefficient Quantisation
For
$$H(z) = \frac{1}{1 + b_1 z^{-1} + b_2 z^{-2}}$$
instability (oscillation) can occur for $b_2 \to 1$ (with $b_2 < 1$), i.e., when the poles of H(z) are either
(i) both on the unit circle when complex, or
(ii) one real pole outside the unit circle
Instability under the "effective pole" model is considered as follows.

Effective Pole Model
In the time domain, from $H(z) = Y(z)/X(z)$:
$$y(n) = x(n) - b_1 y(n-1) - b_2 y(n-2)$$
With $b_2 \to 1$, the instability issue means that $Q[b_2 y(n-2)]$ is indistinguishable from $y(n-2)$, where $Q[\cdot]$ represents the quantisation operation.

Effective Pole Model
With rounding, $b_2 y(n-2) \pm 0.5$ and $y(n-2)$ are indistinguishable (for integers) if
$$b_2 y(n-2) \pm 0.5 = y(n-2)$$
Hence
$$y(n-2) = \frac{\pm 0.5}{1 - b_2}$$
With both positive and negative $b_2$:
$$y(n-2) = \frac{\pm 0.5}{1 - |b_2|}$$
$b_2 = 1$ is the effective pole for coefficient quantisation noise (oscillation).

Dead Band Limit Cycle
The range of integers
$$\frac{\pm 0.5}{1 - |b_2|}$$
constitutes a set of integers that cannot be individually distinguished as separate, or distinguished from the asymptotic system behaviour. This band of integers is known as the dead band.
In the second-order system, under rounding, the output assumes a cyclic set of values of the dead band. This is a limit cycle.
Dead band hysteresis (no action in dead band):
$$\left[ -\frac{0.5}{1 - |b_2|},\ +\frac{0.5}{1 - |b_2|} \right]$$
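The dead band formula can be evaluated directly, and the "indistinguishable" property checked for integers inside the band. In this sketch the coefficient value 0.9 and the function names are hypothetical, and rounding is implemented as round-half-up.

```python
import math

def dead_band(b2):
    """Half-width 0.5 / (1 - |b2|) of the dead band."""
    return 0.5 / (1.0 - abs(b2))

def sticks(y, b2):
    """True if rounding b2 * y gives y back, i.e. Q[b2 y] is indistinguishable from y."""
    return math.floor(b2 * y + 0.5) == y

b2 = 0.9                                   # hypothetical coefficient close to 1
print("dead band half-width:", dead_band(b2))
for y in range(-7, 8):
    print(y, sticks(y, b2))
```

Integers strictly inside the band satisfy |b2 y - y| < 0.5 and are therefore returned unchanged by the rounded product, while integers outside the band are pulled towards zero.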

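Effective Pole: Oscillations
Consider the transfer function
$$G(z) = \frac{1}{1 + b_1 z^{-1} + b_2 z^{-2}}$$
with
$$y_k = x_k - b_1 y_{k-1} - b_2 y_{k-2}$$
If the poles are complex, then the discrete impulse (unit sample) response sequence is
$$h_k = \frac{\rho^k}{\sin\theta}\, \sin[(k+1)\theta], \qquad \rho = \sqrt{b_2}, \quad \theta = \cos^{-1}\!\left(-\frac{b_1}{2\sqrt{b_2}}\right)$$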

Effective Pole: Oscillations
If $b_2 = 1$, then the response is sinusoidal (oscillatory) with angular frequency
$$\omega = \frac{1}{T} \cos^{-1}\!\left(-\frac{b_1}{2}\right)$$
Thus, product quantisation causes instability, implying an effective pole at $b_2 = 1$.
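The closed-form impulse response of the second-order section can be compared against the difference equation, and the purely oscillatory case with the second coefficient equal to 1 inspected. The coefficient values used here (b1 = -1 with b2 = 0.9, and b1 = -1 with b2 = 1) and the function names are assumptions for illustration.

```python
import math

def impulse_response_recursive(b1, b2, n):
    """Run y_k = x_k - b1 y_{k-1} - b2 y_{k-2} on a unit sample."""
    h, y1, y2 = [], 0.0, 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0
        y = x - b1 * y1 - b2 * y2
        h.append(y)
        y1, y2 = y, y1
    return h

def impulse_response_closed(b1, b2, n):
    """h_k = rho^k sin((k+1) theta) / sin(theta), valid for complex poles."""
    rho = math.sqrt(b2)
    theta = math.acos(-b1 / (2 * rho))
    return [rho ** k * math.sin((k + 1) * theta) / math.sin(theta) for k in range(n)]

damped_rec = impulse_response_recursive(-1.0, 0.9, 50)
damped_cf = impulse_response_closed(-1.0, 0.9, 50)
print(max(abs(a - b) for a, b in zip(damped_rec, damped_cf)))

# b2 = 1: poles on the unit circle, theta = acos(0.5) = pi/3, period of 6 samples
print(impulse_response_recursive(-1.0, 1.0, 12))
```

With b2 = 1 the sequence repeats 1, 1, 0, -1, -1, 0 without decaying, which is the sustained oscillation described above.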

Limit Cycle of 2nd-Order System: Example
Consider infinite-precision computations for
$$y_k = x_k + y_{k-1} - 0.\ldots\, y_{k-2}$$
with $x_0 = 10$; $x_k = 0$ for $k \ne 0$; $y_k = 0$ for $k < 0$.
The response converges to the origin without limit.
[Figure: phase-plane plot of y_{k-1} versus y_k, both axes from -10 to 10, with the points for k = 1, 2, 3 marked]

Limit Cycle of 2nd-Order System: Example
Now the same operation with integer precision:
- The response does not converge to the origin, but cyclically assumes a set of values with nondecreasing quantisation error: the limit cycle
- The system can be driven out of its limit cycle only if a new and sufficiently large input is applied
- Truncation (as opposed to rounding) can eliminate most limit cycles. However, truncation can cause biased quantisation error
[Figure: phase-plane plot of the integer-precision response, both axes from -10 to 10]
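The contrast between infinite-precision and integer-precision computation can be reproduced with a short simulation. The feedback coefficient 0.9 used here is an assumption chosen for illustration, as are the function name and the round-half-up rule; the input (a single sample of 10) and the zero initial state follow the example.

```python
def simulate(n, integer_precision, num=9, den=10):
    """y_k = x_k + y_{k-1} - (num/den) y_{k-2}, with x_0 = 10 and zero initial state.
    The coefficient num/den = 0.9 is an assumed value."""
    y1 = y2 = 0
    out = []
    for k in range(n):
        x = 10 if k == 0 else 0
        if integer_precision:
            # product rounded to the nearest integer (ties upward), exact integer arithmetic
            product = (num * y2 + den // 2) // den
        else:
            product = num * y2 / den
        y = x + y1 - product
        out.append(y)
        y1, y2 = y, y1
    return out

exact = simulate(400, integer_precision=False)
rounded = simulate(400, integer_precision=True)
print("exact tail  :", exact[-1])
print("rounded tail:", rounded[-12:])
```

Under these assumptions the exact response decays towards zero, while the rounded response settles into a repeating set of six non-zero integers that stays inside the dead band of 0.5 / (1 - 0.9) = 5.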

Output Quantisation
Linear modelling of product quantisation: the quantiser $x(n) \to Q[\cdot] \to \tilde{x}(n)$ is modelled as additive noise,
$$\tilde{x}(n) = x(n) + q(n)$$
Now interested in the output of the multiplier.

Output Quantisation
Recall: for rounding operations, q(n) is uniformly distributed between $-\frac{Q}{2}$ and $\frac{Q}{2}$, where Q is the quantisation step (i.e. in a word length of b bits with sign+magnitude representation mod 2, $Q = 2^{-b}$).
A discrete-time system with quantisation at the output of each multiplier may be considered as a multi-input linear system.

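Output Quantisation
Each $q_\lambda(n)$ contributes to output accuracy.
[Figure: multi-input linear system with inputs {x(n)}, q_1(n), ..., q_λ(n), ..., q_p(n), impulse response h(n), and output {y(n)}]
Then
$$y(n) = \sum_{r=0}^{\infty} x(r)\, h(n-r) + \sum_{\lambda=1}^{p} \sum_{r=0}^{\infty} q_\lambda(r)\, h_\lambda(n-r)$$
where $h_\lambda(n)$ is the impulse response of the system from the output of the $\lambda$-th multiplier to y(n).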

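Output Quantisation
Avoid output quantisation error (clipping) by avoiding overflow in the output caused by q.
For zero input (free running), i.e., $x(n) = 0,\ \forall n$:
$$|y(n)| \le \sum_{\lambda=1}^{p} \hat{q}_\lambda \sum_{r=0}^{\infty} |h_\lambda(n-r)|$$
where $\hat{q}_\lambda$ is an upper bound for $|q_\lambda(r)|$, $\forall \lambda, r$. This bound is the maximum error $Q/2$. Hence
$$|y(n)| \le \frac{Q}{2} \sum_{\lambda=1}^{p} \sum_{n=0}^{\infty} |h_\lambda(n)|$$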

Output Quantisation
However, the error does not exceed that for a unit step input:
$$\sum_{n=0}^{\infty} |h_\lambda(n)| \le \sum_{n=0}^{\infty} |h(n)|$$
hence
$$|y(n)| \le \frac{pQ}{2} \sum_{n=0}^{\infty} |h(n)| \quad \text{(conservative)}$$
Thus, we can estimate the maximum range at the output to avoid clipping from the system parameters h(n) and the quantisation level Q.
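The conservative bound can be compared with the error actually observed in a simulated filter. The sketch below assumes a hypothetical second-order section (feedback coefficients -1 and 0.9, so p = 2 multipliers), a step Q = 2^-8, a sine test input, and an impulse response truncated after 400 samples; none of these values come from the slides.

```python
import math

def impulse_response(b1, b2, n):
    """Unit sample response of y_k = x_k - b1 y_{k-1} - b2 y_{k-2}."""
    h, y1, y2 = [], 0.0, 0.0
    for k in range(n):
        y = (1.0 if k == 0 else 0.0) - b1 * y1 - b2 * y2
        h.append(y)
        y1, y2 = y, y1
    return h

def noise_bound(h, p, Q):
    """Conservative bound p * Q / 2 * sum |h(n)| on the output error."""
    return p * Q / 2 * sum(abs(v) for v in h)

def rnd(v, Q):
    """Round v to the nearest multiple of Q."""
    return math.floor(v / Q + 0.5) * Q

def run(x, b1, b2, Q=None):
    """Filter x; if Q is given, round each multiplier output to step Q."""
    out, y1, y2 = [], 0.0, 0.0
    for xk in x:
        p1, p2 = b1 * y1, b2 * y2
        if Q is not None:
            p1, p2 = rnd(p1, Q), rnd(p2, Q)
        y = xk - p1 - p2
        out.append(y)
        y1, y2 = y, y1
    return out

b1, b2, Q = -1.0, 0.9, 2.0 ** -8           # hypothetical coefficients and step
h = impulse_response(b1, b2, 400)          # truncated; the tail is negligible here
bound = noise_bound(h, p=2, Q=Q)
x = [0.5 * math.sin(0.3 * k) for k in range(500)]
worst = max(abs(a - b) for a, b in zip(run(x, b1, b2, Q), run(x, b1, b2)))
print("observed worst-case error:", worst)
print("conservative bound       :", bound)
```

The observed error is expected to stay well below the bound, which assumes that every rounding error takes its maximum value with the least favourable sign at every instant.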