Noise Reduction. Two Stage Mel-Warped Wiener Filter Approach


Aubrey Fowler

1 Noise Reduction: Two Stage Mel-Warped Wiener Filter Approach

2 Intellectual Property Advanced front-end feature extraction algorithm: ETSI ES 202 050 V1.1.3, European Telecommunications Standards Institute (ETSI), Technical Committee Speech Processing, Transmission and Quality Aspects (STQ).

3 Noise Reduction Based on Wiener filter theory. Noise reduction is performed in two stages: the input signal is de-noised in the first stage, and the second stage applies dynamic noise reduction based on the SNR of the processed signal.

4 First Stage [Block diagram: Spectrum Estimation -> PSD Mean -> WF Design -> Mel Filter-Bank -> Mel IDCT -> Apply Filter -> To Second Stage, with a VADNest block.]

5 Second Stage [Block diagram: From First Stage -> Spectrum Estimation -> PSD Mean -> WF Design -> Mel Filter-Bank -> Gain Factorization -> Mel IDCT -> Apply Filter -> OFF (offset compensation) -> Output.]

6 Buffering 1 frame = 80 samples; 1 buffer = 4 frames. [Diagram: Buffer 1 and Buffer 2, four frames each (positions 0 to 3, frames A B C D and E F G H); labels: "De-noised (1st Stage)", "De-noised (output)", "new".]

7 Spectrum Estimation The input signal is divided into overlapping frames of $N_{in} = 200$ samples, i.e. a 25 ms frame length with a 10 ms frame shift (80 samples). Each frame $s_{in}(n)$ is multiplied by a Hanning window of length $N_{in}$ to give the windowed frame $s_w(n)$.

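8 Spectrum Estimation
$s_w(n) = s_{in}(n)\, w_{Hann}(n)$, where $w_{Hann}(n) = 0.5 - 0.5 \cos\left(\frac{2\pi (n + 0.5)}{N_{in}}\right)$, $0 \le n \le N_{in} - 1$.
Zero padding from $N_{in}$ up to $N_{FFT} - 1$, with $N_{FFT} = 256$:
$s_{FFT}(n) = s_w(n)$ for $0 \le n \le N_{in} - 1$, and $s_{FFT}(n) = 0$ for $N_{in} \le n \le N_{FFT} - 1$.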

9 Spectrum Estimation
Frequency representation: $X(bin) = FFT\{s_{FFT}(n)\}$, where $bin$ is the frequency index.
Power spectrum: $P(bin) = |X(bin)|^2$, $0 \le bin \le N_{FFT}/2$.
Smoothing: $P_{in}(bin) = \frac{P(2\,bin) + P(2\,bin + 1)}{2}$, $0 \le bin < N_{FFT}/4$.
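The spectrum estimation steps on slides 7 to 9 can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function names are hypothetical, a naive DFT stands in for the FFT so the sketch needs only the standard library, and only the pairwise-averaged bins given on the slide are returned.

```python
import cmath
import math

N_IN = 200    # frame length in samples (from the slides)
N_FFT = 256   # FFT length after zero padding (from the slides)


def hann_window(n_in=N_IN):
    """w_Hann(n) = 0.5 - 0.5*cos(2*pi*(n + 0.5)/N_in)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / n_in)
            for n in range(n_in)]


def power_spectrum(frame, n_fft=N_FFT):
    """Window, zero-pad, transform, and return |X(bin)|^2 for bins 0..N_FFT/2."""
    window = hann_window(len(frame))
    padded = [s * w for s, w in zip(frame, window)] + [0.0] * (n_fft - len(frame))
    # Naive O(N^2) DFT keeps the sketch dependency-free; a real FFT is faster.
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def smooth_spectrum(p):
    """Average adjacent bin pairs: P_in(bin) = (P(2*bin) + P(2*bin + 1)) / 2."""
    half = (len(p) - 1) // 2          # N_FFT/4 when len(p) == N_FFT/2 + 1
    return [(p[2 * b] + p[2 * b + 1]) / 2.0 for b in range(half)]
```

Calling `smooth_spectrum(power_spectrum(frame))` on one 200-sample frame yields the halved-resolution power spectrum that the later stages average over time.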

10 Power Spectral Density Mean For each $P_{in}(bin)$, compute the mean over the last $T_{PSD} = 2$ frames:
$P_{in\_psd}(bin, t) = \frac{1}{T_{PSD}} \sum_{i=0}^{T_{PSD}-1} P_{in}(bin, t - i)$
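The PSD mean is a short moving average per bin. A minimal sketch follows; the class name `PsdMean` is hypothetical, and averaging over only the frames seen so far at start-up is an assumption, since the slides do not say how the first frame is handled.

```python
from collections import deque

T_PSD = 2  # number of frames averaged (from the slides)


class PsdMean:
    """Per-bin mean of the last T_PSD smoothed power spectra."""

    def __init__(self, t_psd=T_PSD):
        self.history = deque(maxlen=t_psd)   # oldest frame drops out automatically

    def update(self, p_in):
        self.history.append(list(p_in))
        n = len(self.history)                # assumption: divide by frames available
        return [sum(frame[b] for frame in self.history) / n
                for b in range(len(p_in))]
```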

11 Wiener Filter Design A forgetting factor (weight) $\lambda_{NSE}$ is computed for each frame: if $t < 100$ frames, $\lambda_{NSE} = 1 - 1/t$; otherwise $\lambda_{NSE} = 0.99$.

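12 Wiener Filter Design The first-stage noise spectrum estimate is updated according to the VAD flag.
If flag = 0 (non-speech frame $t_n$): $P_{noise}^{1/2}(bin, t_n) = \max\left(\lambda_{NSE}\, P_{noise}^{1/2}(bin, t_n - 1) + (1 - \lambda_{NSE})\, P_{in\_psd}^{1/2}(bin, t_n),\ \exp(-10)\right)$
If flag = 1 (speech frame): $P_{noise}^{1/2}(bin, t) = P_{noise}^{1/2}(bin, t_n)$, i.e. the estimate from the last non-speech frame is kept.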
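The first-stage update can be sketched as below. The function names are hypothetical, the exp(-10) term is assumed to act as a lower bound (floor) on the estimate, and a single running frame index `t` is used in place of the separate non-speech index $t_n$ for brevity.

```python
import math

EPS = math.exp(-10.0)  # assumed floor for the noise amplitude estimate


def forgetting_factor(t, threshold=100, lam=0.99):
    """lambda_NSE = 1 - 1/t for the first frames, then a constant."""
    return 1.0 - 1.0 / t if t < threshold else lam


def update_noise_stage1(noise_sqrt, psd_mean_sqrt, t, vad_flag):
    """Return the updated sqrt-noise spectrum for frame t (t starts at 1)."""
    if vad_flag:                       # speech: keep the last non-speech estimate
        return list(noise_sqrt)
    lam = forgetting_factor(t)
    return [max(lam * n + (1.0 - lam) * p, EPS)
            for n, p in zip(noise_sqrt, psd_mean_sqrt)]
```

Because $\lambda_{NSE} = 1 - 1/t$ during the first frames, the early estimate is a plain running average of the input, which lets the noise spectrum initialise quickly before the slower 0.99 smoothing takes over.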

13 Wiener Filter Design The second-stage noise estimate is updated on every frame, regardless of the VAD flag:
If $t < 11$:
    $P_{noise}(bin,t) = \lambda_{NSE}\, P_{noise}(bin,t-1) + (1 - \lambda_{NSE})\, P_{in\_psd}(bin,t)$
else:
    $update = 0.9 + 0.1 \cdot \frac{P_{in\_psd}(bin,t)}{P_{in\_psd}(bin,t) + P_{noise}(bin,t-1)} \left(1 + \frac{1}{1 + 0.1\, P_{in\_psd}(bin,t) / P_{noise}(bin,t-1)}\right)$
    $P_{noise}(bin,t) = P_{noise}(bin,t-1) \cdot update$

14 Wiener Filter Design The noiseless spectrum is estimated as
$P_{den}^{1/2}(bin,t) = 0.98\, P_{den}^{1/2}(bin,t-1) + (1 - 0.98)\, T\left[P_{in\_psd}^{1/2}(bin,t) - P_{noise}^{1/2}(bin,t)\right]$
where the threshold function $T$ is
$T[z(bin,t)] = z(bin,t)$ if $z(bin,t) > 0$, and $0$ otherwise.

15 Wiener Filter Design The a priori SNR is calculated as
$\eta(bin,t) = \frac{P_{den}(bin,t)}{P_{noise}(bin,t)}$
The filter transfer function is
$H(bin,t) = \frac{\sqrt{\eta(bin,t)}}{1 + \sqrt{\eta(bin,t)}}$

16 Wiener Filter Design The filter transfer function is used to improve the noiseless signal estimate:
$P_{den2}^{1/2}(bin,t) = H(bin,t)\, P_{in\_psd}^{1/2}(bin,t)$
The improved a priori SNR is
$\eta_2(bin,t) = \max\left(\frac{P_{den2}(bin,t)}{P_{noise}(bin,t)},\ \eta_{TH}^2\right)$, where $\eta_{TH}$ corresponds to $-22$ dB.
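Slides 14 to 16 describe a two-pass gain computation for each bin, which the sketch below follows. The function name is hypothetical; it assumes all spectra are passed as amplitude (square-root) values, that the gain uses the square root of the a priori SNR, and that the -22 dB floor is an amplitude ratio that is squared before being compared with the power-domain SNR.

```python
import math

GAMMA = 0.98                       # smoothing constant from the slides
ETA_TH = 10.0 ** (-22.0 / 20.0)    # -22 dB expressed as an amplitude ratio


def wiener_gain(psd_sqrt, noise_sqrt, den_prev_sqrt):
    """One bin of the gain design; returns (H, improved a priori SNR)."""
    # Noiseless amplitude estimate: smoothed, half-wave rectified subtraction.
    den_sqrt = GAMMA * den_prev_sqrt + (1.0 - GAMMA) * max(psd_sqrt - noise_sqrt, 0.0)
    # A priori SNR as a power ratio, then the first-pass gain.
    eta = (den_sqrt / noise_sqrt) ** 2
    h = math.sqrt(eta) / (1.0 + math.sqrt(eta))
    # Second pass: refine the noiseless estimate with the gain, then floor the SNR.
    den2_sqrt = h * psd_sqrt
    eta2 = max((den2_sqrt / noise_sqrt) ** 2, ETA_TH ** 2)
    return h, eta2
```

The floor keeps the improved SNR from reaching zero in bins dominated by noise. `noise_sqrt` must be positive, which an exp(-10) floor on the noise estimate would guarantee.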

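17 Voice Activity Detection VAD is used to detect noise frames. First set the long term energy factor (LTE): if $t < 10$ (frame threshold), LTE $= 1 - 1/t$; else LTE $= 0.97$. Then calculate the frame energy over the $M$ samples of the frame:
$frameen = 0.5 + \frac{16}{\ln 2} \ln\left(\frac{64 + \sum_{i=0}^{M-1} s_{in}(i)^2}{64}\right)$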

18 Voice Activity Detection Use the frame energy to update the mean energy:
If (frameen - meanen) < 20 (SNR threshold) or t < 10
    If (frameen < meanen) or (t < 10)
        meanen = meanen + (1 - LTE) * (frameen - meanen)
    Else
        meanen = meanen + (1 - 0.99) * (frameen - meanen)
    If (meanen < 80)
        meanen = 80

19 Voice Activity Detection Is the current frame speech?
If t > 4
    If (frameen - meanen) > 15
        IT IS SPEECH
        nbspeechframe++
    Else
        If nbspeechframe > 4
            hangover = 15
        nbspeechframe = 0
        If (hangover != 0)
            hangover--
            IT IS SPEECH
        Else
            IT IS NOT SPEECH
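The VAD logic on slides 17 to 19 can be put together as a small state machine. This is a sketch under stated assumptions: the class name `VadNest` is hypothetical, the frame energy uses a 16/ln 2 log scale (an assumption chosen to be consistent with the thresholds 15, 20 and 80), the hangover counter is decremented on each hangover frame, and `nbspeechframe` is reset on every non-speech decision.

```python
import math


class VadNest:
    """Energy-based VAD using the thresholds quoted on the slides (sketch)."""

    def __init__(self):
        self.t = 0                 # frame counter, first frame is t = 1
        self.mean_en = 0.0
        self.nb_speech_frame = 0
        self.hangover = 0

    def step(self, frame):
        self.t += 1
        t = self.t
        lte = 1.0 - 1.0 / t if t < 10 else 0.97
        energy = sum(s * s for s in frame)
        # Assumed log-energy scale: 16 units per doubling of (64 + energy).
        frame_en = 0.5 + 16.0 / math.log(2.0) * math.log((64.0 + energy) / 64.0)

        # Update the long-term mean energy only when the frame looks like noise.
        if (frame_en - self.mean_en) < 20 or t < 10:
            if frame_en < self.mean_en or t < 10:
                self.mean_en += (1.0 - lte) * (frame_en - self.mean_en)
            else:
                self.mean_en += (1.0 - 0.99) * (frame_en - self.mean_en)
            if self.mean_en < 80:
                self.mean_en = 80.0

        # Speech decision with hangover after a run of speech frames.
        is_speech = False
        if t > 4:
            if (frame_en - self.mean_en) > 15:
                is_speech = True
                self.nb_speech_frame += 1
            else:
                if self.nb_speech_frame > 4:
                    self.hangover = 15
                self.nb_speech_frame = 0
                if self.hangover != 0:
                    self.hangover -= 1
                    is_speech = True
        return is_speech
```

Feeding one 80-sample frame per call to `step` returns the speech flag for that frame. The hangover keeps the flag raised for 15 frames after a run of more than four speech frames, so trailing low-energy speech is not used to update the noise estimate.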

20 Mel Filter Bank The linear-frequency Wiener filter coefficients are smoothed and transformed to the Mel-frequency scale. The mel scale is a scale of pitches judged by listeners to be equally spaced from one another.

21 Mel IDCT The time-domain impulse response of the Wiener filter is computed from the Mel-Wiener filter coefficients by using a Mel-warped inverse Discrete Cosine Transform:
$h_{WF}(n) = \sum_{k=0}^{24} H_{2\_mel}(k)\, IDCT_{mel}(k, n)$, $0 \le n \le 24$
$IDCT_{mel}(k, n) = \cos\left(\frac{2\pi\, n\, f_{centr}(k)}{f_{samp}}\right) df(k)$
$df(k) = \frac{f_{centr}(k+1) - f_{centr}(k-1)}{f_{samp}}$
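The Mel-warped IDCT is a weighted cosine sum over the filter-bank centre frequencies. The sketch below assumes the caller supplies the 25 centre frequencies `f_centr` (they are not listed on the slides), and it assumes a one-sided width for $df(k)$ at the first and last index, where $k-1$ or $k+1$ would fall outside the range. The function name is hypothetical.

```python
import math


def mel_idct(h_mel, f_centr, f_samp):
    """h_WF(n) = sum_k H_mel(k) * cos(2*pi*n*f_centr(k)/f_samp) * df(k)."""
    last = len(h_mel) - 1
    df = []
    for k in range(len(h_mel)):
        lo = f_centr[max(k - 1, 0)]      # assumption: one-sided width at the edges
        hi = f_centr[min(k + 1, last)]
        df.append((hi - lo) / f_samp)
    return [sum(h_mel[k] * math.cos(2.0 * math.pi * n * f_centr[k] / f_samp) * df[k]
                for k in range(len(h_mel)))
            for n in range(len(h_mel))]
```

The $df(k)$ weights compensate for the non-uniform spacing of the mel centre frequencies, so the sum approximates an integral over linear frequency.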

22 Gain Factorization Factorization of the Mel-warped Wiener filter coefficients is performed to control the aggression of noise reduction in the second stage. The de-noised frame signal energy is calculated over the 65 spectral bins as:
$E_{den}(t) = \sum_{bin=0}^{64} P_{den3}^{1/2}(bin, t)$

23 Gain Factorization The noise energy of the current frame is estimated as:
$E_{noise}(t) = \sum_{bin=0}^{64} P_{noise}^{1/2}(bin, t)$

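24 Gain Factorization The smoothed SNR is evaluated using 3 de-noised frame energies and the noise energy:
$Ratio = \frac{E_{den}(t-2)\, E_{den}(t-1)\, E_{den}(t)}{E_{noise}(t)\, E_{noise}(t)\, E_{noise}(t)}$
If Ratio exceeds a threshold (value not legible in the source), then $SNR_{avg}(t) = 6.67 \log_{10}(Ratio)$; else $SNR_{avg}(t)$ is set to a fixed value (not legible in the source).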

25 Gain Factorization To decide the degree of aggression, the SNR is tracked:
If $(SNR_{avg}(t) - SNR_{low\text{-}track}(t-1)) < 10$ or $t < 10$
    calculate $\lambda_{SNR}(t)$
    $SNR_{low\text{-}track}(t) = \lambda_{SNR}(t)\, SNR_{low\text{-}track}(t-1) + (1 - \lambda_{SNR}(t))\, SNR_{avg}(t)$
Else
    $SNR_{low\text{-}track}(t) = SNR_{low\text{-}track}(t-1)$
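The smoothed SNR and its low-level tracker can be sketched as two small functions. The names are hypothetical. The ratio threshold and the fallback SNR value are not legible in the slides, so they are required arguments here rather than defaults, and $\lambda_{SNR}(t)$ is passed in by the caller because the slides do not say how it is calculated.

```python
import math


def smoothed_snr(e_den, e_noise, ratio_threshold, snr_floor):
    """e_den holds the three latest de-noised frame energies (t-2, t-1, t)."""
    ratio = (e_den[0] * e_den[1] * e_den[2]) / (e_noise ** 3)
    if ratio > ratio_threshold:
        return 6.67 * math.log10(ratio)
    return snr_floor                      # caller-supplied fallback value


def track_low_snr(snr_avg, low_track_prev, lam_snr, t):
    """Update SNR_low-track only when the SNR stays close to the tracked value."""
    if (snr_avg - low_track_prev) < 10 or t < 10:
        return lam_snr * low_track_prev + (1.0 - lam_snr) * snr_avg
    return low_track_prev
```

Freezing the tracker when the smoothed SNR jumps by 10 or more keeps speech bursts from raising the low-SNR reference, so it follows the noise-only level.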

26 Gain Factorization Gain factorization applies more aggressive noise reduction to purely noisy frames and less to frames containing speech. The aggression coefficient takes on a value of 10% for speech + noise frames and 80% for noise frames.

27 Apply Filter The causal impulse response is obtained, truncated and weighted by a Hanning window. The input signal is filtered with the filter impulse response to produce the noise-reduced signal.

28 Offset Compensation A filter is used to remove the DC offset over the frame length interval (80 samples):
$s_{nr\_of}(n) = s_{nr}(n) - s_{nr}(n-1) + (1 - 1/1024)\, s_{nr\_of}(n-1)$
where $s_{nr}$ is the noise-reduced signal.
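The offset compensation filter is a first-order recursive high-pass filter and translates directly into a loop. In this sketch the function name and the `prev_in` / `prev_out` state arguments are hypothetical; they carry the filter memory across frames and are assumed to start at zero.

```python
def remove_offset(s_nr, prev_in=0.0, prev_out=0.0):
    """s_nr_of(n) = s_nr(n) - s_nr(n-1) + (1 - 1/1024) * s_nr_of(n-1)."""
    out = []
    for x in s_nr:
        y = x - prev_in + (1.0 - 1.0 / 1024.0) * prev_out
        out.append(y)
        prev_in, prev_out = x, y     # carry state to the next sample
    return out
```

A constant input decays towards zero at a rate of 1023/1024 per sample, which is how the DC component is removed while faster signal variations pass through.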

29 Results Noisy test file: After de-noise:

30 Results Footloose: Not Footloose:

31 Results: why didn't this work? Hair dryer: Still there?!?!:

32 Results Hair dryer: Gone:
