Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs

Size: px

Start display at page:

Download "Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs"

Angel Dalton
5 years ago
Views:

1 Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University 5. Oktober 2016 Computer Science, Electrical Engineering and Mathematics Communications Engineering Prof. Dr.-Ing. Reinhold Häb-Umbach

2 Table of Contents 1 Problem formulation and motivation 2 Causal DNN-based noise PSD estimator 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 1 / 8

3 Noise PSD estimation in spectral speech enhancement Clean speech s(t) contaminated by an additive noise d(t) resulting in y(t) = s(t)+d(t) STFT - Y(k, l) = S(k, l)+d(k, l) Y(k, l) ˆγ(k, l) Calculate Noise spectral 2 PSD gain estimator ˆλD (k, l) function G(k, l) Ŝ(k, l) DNN Noise power spectral density (PSD) λ D (k, l) = E [ D(k, l) 2] is used in a priori SNR estimation and calculation of gain function G(k, l) From Y(k, l) 2 in presence of non-stationary noise challenging task Use a Deep Neural Network (DNN) for noise PSD estimation! But how? A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 1 / 8

4 Techniques of 10 state-of-the-art noise PSD trackers techniques All estimators are causal! 1. SPP estimation 2. Minimum search 3. Bias compensation 4. Bayesian inference 5. Output smoothing noise PSD trackers 1. SPP-based (Hirsch-93, Gerkmann-12) 2. MS-based (Martin-01, Chinaev-15) 3. MCRA-based (Cohen-02, Fan-07, Kum-09) 4. IMCRA (Cohen-03) 5. MMSE-SPP (Yu-09) 6. MMSE-BM (Hendriks-10) proposed DNN-based A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 2 / 8

5 Table of Contents 1 Problem formulation and motivation 2 Causal DNN-based noise PSD estimator 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 3 / 8

6 LSTM network for causal estimation of noise spectral mask DNN task: estimate a soft noise spectral mask M D (k, l) from Y(k, l) by targeting an ideal binary mask (IBM) for noise { 1, D(k, l) > 10 S(k, l) IBM D (k, l) = 0, else. Table : Network configuration for STFT window length of 1024 Layer Units Type Non-Linearity p dropout L1 512 LSTM Tanh 0.5 L FF ELU 0.5 L FF ELU 0.5 L4 513 FF Sigmoid 0.0 Causality: estimation of M D (k, l) based only on data of previous frames M D (k, l) [0, 1] is soft noise-only presence probability estimation A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 3 / 8

Example of noise-only presence probability

von Gerkmann-12 max 1 frequency 0 frequency

structured noise spectral mask M D (k, l) A.

7 Example of noise-only presence probability (NPP) estimation Noisy spectrogram (1-SPP) von Gerkmann-12 max 1 frequency 0 frequency time min time 0 IBM for noise DNN-based NPP estimate 1 1 frequency frequency time 0 time 0 DNN provides smoothed however structured noise spectral mask M D (k, l) A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 4 / 8

8 Output smoothing using DNN-based noise spectral mask Noisy spectrogram DNN-based noise spectral mask max 1 frequency 0 frequency time min time 0 Low-complexity NPP-based noise PSD estimator with Output smoothing ˆλ D (k, l) = [1 M D (k, l)] ˆλ D (k, l 1)+M D (k, l) Y(k, l) 2 controlled by a soft noise spectral mask M D (k, l) [0, 1] Speech presence M D (k, l) = 0: ˆλD (k, l) = ˆλ D (k, l 1) hold Noise only M D (k, l) = 1: ˆλ D (k, l) = Y(k, l) 2 update A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 5 / 8

9 Table of Contents 1 Problem formulation and motivation 2 Causal DNN-based noise PSD estimator 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 6 / 8

10 Performance measures and experimental setup Evaluation of noise PSD estimation Noise PSD reference: periodogram D(k, l) 2 Measures: log-error mean (LEM) and log-error variance (LEV) Impact on enhanced signal Mean opinion score - listening quality objective (MOS-LQO) Output global signal-to-noise ratio SNR out Development data of CHiME-3 database (5 th microfone) 4 highly non-stationary noisy environments (bus, cafe, ped, str) 3 h of speech with averaged MOS-LQO in = 1.29 and SNR in 5.6 db 25% for parameter optimization of 10 used noise PSD trackers 75% for comparison with the proposed DNN-based approach A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 6 / 8

11 Experimental results on CHiME-3 challenge LEV state-of-the-art noise trackers for recommended (rec) parameters (a) the smaller the better (b) the bigger the better opt param rec Hirsch Gerkmann-12 Martin-01 Chinaev-15 Cohen-02 Fan Kum-09 Cohen-03 Yu-09 Hendriks-10 DNN-based LEM SNR out Optimized (opt) parameter over all performance measures scaled on [0, 1] Improvement of 10% in LEM and 24% in SNR out Length of the window for Minimum search 0.25s compared to [0.6, 1.1]s DNN-based approach clearly outperforms all state-of-the-art noise trackers MOS-LQO A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 7 / 8

12 Table of Contents 1 Problem formulation and motivation 2 Causal DNN-based noise PSD estimator 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 8 / 8

13 Conclusions and outlook Conclusions 10 state-of-the-art noise PSD trackers are categorized and optimized DNN-based causal noise-only presence probability estimator proposed In nonstationary noise compared to considered noise PSD trackers Reduced estimation error and estimator s variance Better trade-off between speech quality and noise reduction Outlook Applying DNN for speech presence probability estimation used in gain functions with speech presence uncertainty A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach 8 / 8

14 Thank you for your attention! Questions? Paderborn University Department of Communications Engineering Web: nt.upb.de Computer Science, Electrical Engineering and Mathematics Communications Engineering Prof. Dr.-Ing. Reinhold Häb-Umbach

A Priori SNR Estimation Using Weibull Mixture Model

A Priori SNR Estimation Using Weibull Mixture Model 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University