Digital Speech Processing Lecture Short-Time Fourier Analysis Methods - Filter Bank Design
Review of STFT j j ˆ m ˆ. X e x[ mw ] [ nˆ m] e nˆ function of nˆ looks like a time sequence function of ˆ looks like a transform j ˆ X e defined for nˆ 3,,,...; ˆ π nˆ m j ˆ. Interpretations of X e nˆ j. nˆ fixed, ˆ variable; X ˆ nˆ e DTFT x[ mw ] [ n m] DFT View OLA implementation n nˆ ˆ X e x n e w n j ˆ j ˆ n. variable, fixed; n [ ] [ ] Linear Filtering view filter bank implementation FBS implementation
Review of STFT 3. Sampling Rates in Time and Frequency ˆ j ˆ recover xn [ ] exactly from Xnˆ e j. time: We has bandwidth of BHertz B samples/sec rate FS Hamming Window: B Hz sample at L 4F S Hz or every L/4 samples L. frequency: wn [ ] is time limited to Lsamples need at least L frequency samples to avoid time aliasing 3
Review of STFT with OLA method can recover xn exactly using lower sampling rates in either time or frequency, e.g., can sample every L samples and divide by window, or can use fewer than L frequency samples filter bank channels; but these methods are highly subject to aliasing errors with any modifications to STFT can use windows LPF that are longer than L samples and still recover with N < L frequency channels; e.g., ideal LPF is infinite in time duration, but with zeros spaced N samples apart where F / N is the BW of the ideal LPF S 4
Review of STFT H e j Xe j H e j + Ye j H N- e j h [ n] w[ n] e k N % j j He H e k N hn %[ ] h[ n] δ [ n] wnpn [ ] [ ] k k j n pn [ ] N δ [ n rn] r hn %[ ] N wrn [ ] δ [ n rn] w[ ] + wn [ ] +... r k k need to design digital filters that match criteria for exact reconstruction of xn [ ] and which still work with modifications to STFT 5
Tree-Decimated Filter Banks 6
Tree-Decimated Filter Banks can sample STFT in time and frequency using lowpass filter window which is moved in jumps of R<L samples if L N where: L is the window length, R is the window shift, and N is the number of frequency channels for a given channel at k, the sampling rate of the STFT need only be twice the bandwidth of the window Fourier transform down-sample STFT estimates by a factor of R at the transmitter up-sample back to original sampling rate at the receiver final output formed by convolution of the up-sampled STFT with an appropriate lowpass filter, f [n] 7
Filter Bank Channels Fully decimated and interpolated filter bank channels; a analysis with bandpass filter, downshifting frequency and down-sampling; b synthesis with up-sampler followed by lowpass interpolation filter and frequency up-shift; c analysis with frequency down-shift followed by lowpass filter followed by down-sampling; d synthesis with up-sampling followed by frequency up-shift followed by bandpass filter. 8
Full Implementation of Analysis-Synthesis System Full implementation of STFT analysis/synthesis system with channel decimation by a factor of R, and channel interpolation by the same factor R including a box for short-time modifications. 9
Review of Down and Up-Sampling x[n] x [n] xd [ n] x[ nr] R j R j πr/ R r X e X e d aliasing addition R xn [ / R] n, ±, ±,... xu[ n] otherwise j jr X e X e u imaging removal
Filter Bank Reconstruction assuming no short time modifications k X k X e w rr m x m e rr N k yn PkY rr e e N k N Pk [ XrR[ kfn ] [ rr] ] e N k rr j j m m W rr j j n xm wrr mfn rr Pke m WrR r N k recognizing that when Pk, k N j n m N k e δ n m qn k q k j n k k N j n m can show that the condition for yn xn, nis r wrr n+ qnfn rr δ q m n k
Filter Bank Reconstruction frequency domain equations, assuming: jkn jkn h [ n] w[ n] e ; g [ n] f[ n] e ; k k k jk j k jk j k ; ; H e W e G e F e k R l jk N j Ye PkG k e N R j Xe RN π π j l j l R R Hk e X e N k k jk j PkG e H e + R π N π j l j j R l k R Xe PkGk e Hk e l RN k j j He % Xe + aliasing terms k k
Filter Bank Reconstruction conditions for perfect reconstruction ˆ j j Ye Xe N j j k j He % PkG k e Hk e RN k and RN N k π j l R jk PkG e H e for l,,..., R k Flat Gain k Alias Cancellation 3
Filter Bank Reconstruction 4
Decimated Analysis- Synthesis Filter Bank practical solution for the case wn fn, N R -band solution where the conditions for exact reconstruction become 4 and j j j π j π P F e W e + P F e W e j j π j π j π P F e W e 4 + P F e W e j j aliasing cancels exactly if P P with F e W e giving We We e 4 yn ˆ xn M j j π jm 5
Decimated Analysis- Synthesis Filter Bank when aliasing is completely eliminated by suitable choice of filters, the overall analysis-synthesis system has frequency response: where N j j k PkW [ ] e e RN k He % j j j We e W e F e h% [ n] w [ n] p[ n] where e w [ n] w[ n] f[ n] e N jk pn [ ] Pk [ ] e RN n k 6
Decimated Analysis- Synthesis Filter Bank To design a decimated analysis/synthesis filter bank system, need to determine the following:. the number of channels, N, and the decimation/interpolation ratio R. Usually determined by the desired frequency resolution. the window, w[n], and the interpolation filter impulse response, f [n]. These are lowpass filters that should have good frequency-selective properties such that the bandpass channel responses do not overlap into more than one band on either side of the channel. 3. the complex gain factors, P[k], k,,,n-. These constants are important for achieving flat overall response and alias cancellation. 7
Maximally Decimated Filter Banks we show here that it is possible to obtain exact reconstruction with RN and N<L. RN is termed maximal decimation because this is the largest decimation factor that can be used with an N- channel filter bank and still achieve exact reconstruction. nominal bandwidth of the bandpass channel signal is increased from ±π/n to ±π down-sampling by N accomplishes frequency downshifting normally accomplished by modulation 8
Maximally Decimated Filter Banks 9
Analysis-Synthesis Filter Bank all modulators removed
Two Channel Filter Banks
Two Channel Filter Banks For this two-band system, we have: R N ; πk, k, k Applying the frequency domain analysis we get: j j j j j j Ye G e H e + G e H e Xe j j π j j π j π + G e H e + G e H e X e not including multiplier / N or the gain factors P[], P[] Aliasing can be eliminated completely by choosing the filters such that: j j π j j π G e H e + G e H e aliasing cancelation Perfect reconstruction requires: j j j j + flat gain condition G e H e G e H e
Quadrature Mirror Filter QMF Banks Alias cancellation achieved if the filters are chosen to satisfy the conditions: h n e h n H e H e jπn j j π [ ] [ ] j g [ n] h [ n] G e H e j g n h n G e H e j j π [ ] [ ] Note that there is only a single lowpass filter, with impulse response h [ ] n on which all other filters are based; filter has nominal cutoff of π / rad with a narrow transition to a stopband with adequate attenuation to isolate the two bands The filter h[ n] e h [ n] is the complementary highpass filter jπ n The filters h [ n] and h[ n] are called Quadrature Mirror Filters QMF 3
Quadrature Mirror Filter QMF Banks Substituting for the filter relationships we get: H e H e H e H e j j π j π j π i.e., perfect alias cancellation for any choice of lowpass filter j j j π Y e H e H e j % j j X e H e X e For perfect reconstruction we require that: j j π jm H e H e e i.e., flat magnitude with delay of M samples, due to the delay of the causal filters of the filter bank 4
Quadrature Mirror Filters Assume FIR lowpass filter of length L samples such that h [ n] h [ L n], n L ; this filter has linear phase with a delay of L / samples and a frequency response of the form: H e A e j j j L / e L j where A e is real. Using this lowpass filter in the must be odd filter bank, the condition for perfect reconstruction becomes: He % A e e A e e If L is odd, we get: j j jπ L j π j L j j j j L H% π e A e A e + e linear phase is guaranteed, but not perfect reconstruction 5
Quadrature Mirror Filters For flat gain we need to satisfy the condition: j j A e A e π + Solution is to use high order linear phase QMF filters Johnston proposed an algorithm for design of the basic lowpass filter that minimized the deviation from unity while providing a desired level of stopband attenuation 6
Quadrature Mirror Filters π h[ n] w[ n] g[ n] j n h[ n] w[ n] e g [ n] 7
Quadrature Mirror Filters Lowpass filter designed by windowing Highpass filter is mirror image of lowpass filter practical realization of QMF filters note that aliasing is cancelled, but the overall frequency response is not perfectly flat 8
Quadrature Mirror Filters 9
Issues with Perfect Reconstruction The motivation for the subband approach is to process different speech properties independently in each band and thus being able to localize the band compact bandwidth is important. Also,based on the original motivation of short-time analysis, temporal focus within a limited time duration is also important. Filter bank design is thus result of a trade-off between perfect reconstruction and bandwidth concentration in a joint criterion: J α π s j j W e d + α T e d Stop-band leakage Deviation from PR π When processing e.g., quantization, a non-linear operation is involved, harmonic distortions are inevitable and perfect reconstruction cannot be achieved, as seen in the simple example. There exists a design procedure Smith-Barnwell for QMF filters that attempts to meet the design constraints as best as possible 3
3. Start with a sequence of length, say, L- L even that satisfies that is, even samples, except at n., the value of the sequence at n, needs to be large enough to satisfy the positivity condition see below.. Factor such that all roots of are inside the unit circle. 3. Positivity condition: 4. The mirror channel: 5. To eliminate aliasing: Smith-Barnwell QMF Design { } z W z W n g z G Z C ; n C n g n g n δ + W z > j j j j e W e W e W e G, L N e e W e W N j j j or equivalently n N w n w z z W z W n N or equivalently n w n f z W z F n or equivalently n w n f z W z F n and z W z F z W z F
Smith-Barnwell QMF Design - Example L 8, N L 7 G z { g n } W z W Z z Length of w n L, then g n has length L choose sin πn / g n, n, ±, ±, L, ± 7 πn g n [-.455.6 Factorize G z W z W z w n [.89 n w n w N n [-.46 f.383 at n.346 -.8...637,. is added -.33 -.6 to -.3 -. -. satisfy th -.6.637 e positivity.59.8.. condition.383 W z :length L with all roots inside the unit circle.59.3 -.33 -.346 -.455] -.46].89 n f n w n [-.46.8.59 -.3 -.33.346.89 ] n n w n [.89 -.346 -.33.3.59 -.8 -.46] all arrays start with n -] 3
QMF Frequency Responses Example Magnitude db Phase degree, x 5-5 - -5 - - -4-6 -8 - - Normalized frequency xπ 33
Tree-Structured Filter Banks 34
Tree-Structured Filter Banks 35
Sampling Pattern for Tree- Structured Filter Banks 36
Tree-Structured Filter Banks 37
Tree-Structured QMF Filterbank 38
Quantized Channel Signals δ [n] h [ ] n e [ n ] g [ n] δ [ n ] yˆ [ n] yˆ [ n] δ [ n ] h [ ] g [ n ] n e [ n ] δ [n] yˆ n [ ] Quantization noise is concentrated in the band that it is generated in. j j + j e e σ σ P e G e G e ˆ y 39
Subband Coding advantages of subband coding each subband can be encoded according to the perceptual criterion appropriate to that band quantization noise can be confined to the band that produces it low energy bands can be encoded so as to produce less noise applicable to coding in the 9.6-6 Kbps range 4
Practical Example of Subband Coder subband coder using tree-structured filer bank Crochiere, 98 4
Subband Coder Subjective Quality 4
Subband Coder Waveforms 43
Concept of Time-Frequency Resolution When interested in the changing characteristics of the signal, short time Fourier transform or short time spectral analysis is used. j F, t f τ g t τ e τ dτ Temporal spread : σ t gt is the window function providing focus in time. t t c g t dt g t Frequency spread : σ c G d G d If g t is Gaussian, it achieves the minimum time - bandwidth product σ t σ.5. Continuous prolate spheroidal wave function: a bandlimited signal that have the highest energy concentration in a specified time interval. dt To maintain a similar level of uncertainty, the window function should be short to provide time resolution for high frequency components and long to provide frequency resolution for low frequency components. 44
Wavelet-Based Methods Analysis Filters Subband method Synthesis Filters Band R Q Q - R Band Band R Q Q - R Band........ Band N R Q Q - R Band N Analysis Filters Wavelet Transform Wavelet method R Q Q - R R Q Q - R R Q Q - R Synthesis Filters Inverse Transform What is in the Wavelet Transform block? What is the difference? 45
Wavelet Transform Built on set of expansion or basis functions that are derived from a mother wavelet through scaling and translation: t τ Let ψ t be a mother wavelet : ψ τ t ψ Continuous Wavelet Transform: * * t τ * τ F w a, τ f t ψ a, τ t dt f t ψ dt f τ * ψ a a a a * τ F w a, τ f at ψ t dt stretch or compress the signal for analysis a a a, τ scalogram; F, τ spectrogram Let Ψ F{ ψ t} and Ψ a, τ F{ ψ a, τ t} Inverse transform: Admissibility condition: f t F w a, Fw a, τ ψ a, τ t Cψ a Ψ C ψ d < a dadτ a Ψ 46
47 Wavelet and Fourier Transform a f a dt a t t f a a F w τ ψ τ τ ψ τ * * *, } { } {,, t t a a τ τ ψ ψ F and F Ψ Ψ } { t f F F τ τ τ τ ψ ψ j a a e a a a t a t Ψ Ψ,, *,, * τ ψ τ τ a af a f a a F a W w Ψ F F transform. the wavelet is the frequency domain representation of, a W
Wavelet and Its Spectrum Ψt σ Ψ t / a aσ, < a < / a σ a σ,< a / a / a 48
Scaling and Translation For discrete wavelet transform, a m m a and τ nτ a ψ a, τ m n τ If a ψ and τ m/ m m, n t ψ t and, m, n are integers m/ m t ψ, t a ψ a t n m, n Z n m, n Z If the set {, t} is complete, it is called affine wavelets. F ψ m n Orthogonality condition : ψmn t ψ pq t dt δ m p δ n q m/ m w a, τ wm, n a f t ψ a t nτ dt f w m t t, nψ m, n m n *,, dyadic or octave sampling The orthogonality condition may not be easy to satisfy for a general set of wavelets; but it is still possible to invert discrete wavelet transform, sometimes. 49
STFT and CWT 3 5 4 8 frequency time frequency time Constant bandwidth Constant-Q bandwidth 5
Discrete Wavelet Transforms 5
Other Simpler Transforms or Filter Banks Short-time analysis can also be viewed as data transformation, executed in a block-by-block manner, using for example: Discrete Cosine Transform Discrete Sine Transform Filter bank based on DFT/FFT General remarks: Discussion so far aims at a filter bank framework that allows perfect reconstruction i.e., no loss of information in the original signal if desired; But many speech processing systems do not require perfect reconstruction; rather, one may want to focus on extraction of key parameters or features from the speech signal; Therefore, other analysis methods e.g., transformations that do not have inverse may still apply. 5
DFT and DCT N-point DFT Discrete Fourier Transform X k N n x n e jπ / N kn N ~ j N k j N kn x π / π / n X e e x n + rn N k r implies Possible excessive discontinuity at edges N-point symmetric N-point DFT N-point DCT Relates to real part of N-point DFT when the signal is symmetric X k N n x n e jπ / N kn and x n xn n X k N n x ncosπkn / N But, be careful about the point of symmetry!! 53
This image cannot currently be displayed. Discrete Cosine Transforms Extensively used in audio, image and video N knπ c x + x N + x ncos n N k DCT-I k DCT-II c k c kπ N x n cos n + n N k Most common DCT; equivalent up to a scale factor of to a DFT of 4N real inputs of even symmetry with the even-indexed elements set to zero; i.e., an artificial shift of half sample in the original sampling setup. DCT-III IDCT N nπ x + x n cos k + n N even around x and odd around xn ; c k is even around k and odd around k N. Still more variants. 54
Discrete Sine Transform S [ ] N s ij i, j -D case; eliminate one dimension for -D s ij j + i + π sin, i, j,,, L, N N + N + Relates to imaginary part of ~N-point DFT 3-point sequence 8-point antisymmetric sequence Sequence corresponds To 8-point DFT 55
56 Discrete Time Fourier Transform - Revisited The DTFT discrete time Fourier transform of an N- point sequence is Sample the DTFT at The result is the DFT Discrete Fourier Transform If we compute the inverse DFT, we obtain k π/nk, k,,k, N. continuous. Frequency is N n n j j e n x e X + r N k kn N j k N j rn n x e e X N n x ~ / / π π / / k X e n x e X N n kn N j N k j π π
DFT for Power Spectral Density Estimation X e j N n x n e jn P k S X e j : k th channel processing weighting function Examples of non-uniform filter banks S k S P, k k Perfect reconstruction may not be a requirement. 57