A Priori SNR Estimation Using Weibull Mixture Model

Size: px

Start display at page:

Download "A Priori SNR Estimation Using Weibull Mixture Model"

Edwina Lawson
6 years ago
Views:

1 A Priori SNR Estimation Using Weibull Mixture Model 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach Department of Communications Engineering Paderborn University 7. Oktober 2016 Computer Science, Electrical Engineering and Mathematics Communications Engineering Prof. Dr.-Ing. Reinhold Häb-Umbach

2 Table of contents 1 Problem formulation and motivation 2 A priori SNR estimation based on Weibull mixture model 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 1 / 10

3 Problem formulation and motivation Single-channel clean speech s(t) contaminated by an additive noise n(t): y(t) = s(t)+n(t) STFT - Y(k, l) = S(k, l)+n(k, l) Noise PSD tracker ˆλN(k, l) noise power spectral density (PSD) k - frequency bin l - frame index Y(k, l) 2 Y(k, l) 2 A priori SNR estimator ˆξ(k, l) Gain G(k, l) Ŝ(k, l) ŝ(t) ISTFT function A priori SNR ξ(k, l) = λ S(k,l) λ N a key component in enhancement system (k,l) λ S (k, l) = E [ S(k, l) 2] - clean speech PSD, λ N (k, l) = E [ N(k, l) 2] - noise PSD Motivated by a generalized spectral subtraction (GSS) denoising Y(k, l) α for α R >0 not restricted to (α = 1) or (α = 2) with assumption Y(k, l) α = S(k, l) α + N(k, l) α A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 1 / 10

4 Table of contents 1 Problem formulation and motivation 2 A priori SNR estimation based on Weibull mixture model 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 1 / 10

5 Normalized α-order magnitude (NAOM) domain A priori SNR estimator Y(k, l) 2 ˆλN(k, l) Estimate PSα(k) and go into NAOM domain Yα(k, l) λnα(k, l) Estimate parameter of WMM psα(s) λm(k, l) πm(k, l) Estimate clean speech NAOMs Ŝα(k, l) Calculate ˆξ(k, l) a priori SNR Normalize Y(k, l) α to a root of an averaged power P Sα (k) of S(k, l) α Y α(k, l) = Y(k, l) α = Sα(k, l)+nα(k, l) with PSα (k) P Statistical models independent of speaker loudness S α (k) = 1 L Normalized energy of clean speech NAOMs E[S 2 α (k)] = 1 L S(k, l) 2α S α(k, l) & N α(k, l) realizations of random variables S α(k) & N α(k) l=1 Estimate S α(k, l) from Y α(k, l) given models for S α(k)& N α(k) A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 2 / 10

6 Modeling of noise NAOM coefficients N α (k, l) N(k, l) N c(n; 0, λ N (k, l)) N α(k, l) Weibull distributed p Nα(k,l)(n) = Weib(n; λ Nα (k, l), α) Shape parameter α R >0 Scale parameter λ Nα (k, l) = λ N(k, l) R α >0 PSα (k) Model N α(k) with Weibull PDF p Nα(k)(n) = Weib(n; λ Nα (k), α) with λ Nα (k) = 1 L λ Nα (k, l) L l=1 NAOM coefficients of white noise signal and estimated p Nα(k)(n) Weib(n; 1, α) p Nα(n) Weibull PDF for λ = 1 and different α n Histogram and Weibull PDF for α = Noise NAOMs Weibull PDF n A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 3 / 10

7 Modeling of NAOM coefficients of clean speech S α (k, l) S(k, l) N c(n; 0, λ S (k, l)) Bimodal Weibull mixture model (WMM) to model S α(k) 2 p Sα(k)(s) = π m(k) Weib(s; λ m(k), β) m=1 10 Histogram and estimated WMM Clean speech NAOMs Bimodal WMM m = 1 component m = 2 component m = 1 : silence m = 2 : activity π m(k) [0, 1]: weights λ m(k): scale parameters β: shape parameter p Sα (s) 1 β α : additional degree of freedom in the model Clean speech NAOMs & estimated WMM (α = 0.7; β = 2.5) s A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 4 / 10

8 Estimation of WMM parameters and clean speech NAOMs A priori SNR estimator Y(k, l) 2 ˆλN(k, l) Estimate PSα(k) and go into NAOM domain Yα(k, l) λnα(k, l) Estimate parameter of WMM psα(s) λm(k, l) πm(k, l) Estimate clean speech NAOMs Ŝα(k, l) Calculate ˆξ(k, l) a priori SNR Set λ 1(k) acc. to ξ min usually used in a priori SNR estimation [Cappe 94] Expectation Maximization algorithm to estimate λ 2(k), π m(k) After EM, weights π m(k) are corrected with the constraint E[S 2 α(k)] = 1 A priori SNR estimator Y(k, l) 2 ˆλN(k, l) Estimate PSα(k) and go into NAOM domain Yα(k, l) λnα(k, l) Estimate parameter of WMM psα(s) λm(k, l) πm(k, l) Estimate clean speech NAOMs Ŝα(k, l) Calculate ˆξ(k, l) a priori SNR Maximum a posteriori (MAP) estimation: Ŝα MAP (k, l) = argmax p Sα(k) Y α(k,l)(s y) s Y α(k, l) is a realisation of random variable Y α(k) = S α(k) + N α(k) Approximative computationally efficient solution for β = α = 1 A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 5 / 10

9 Calculation of a priori SNR and causal implementation A priori SNR estimator Y(k, l) 2 ˆλN(k, l) Estimate PSα(k) and go into NAOM domain Yα(k, l) λnα(k, l) Estimate parameter of WMM psα(s) λm(k, l) πm(k, l) Estimate clean speech NAOMs Ŝα(k, l) Calculate ˆξ(k, l) a priori SNR Go back into domain of power spectral density by calculating [Ŝα(k, l) P ] 2 α Sα(k) ˆξ(k, l) = max, ξ min λ N (k, l) Causal implementation of WMM-based a priori SNR estimators Calculate P Sα (k) and λ Nα (k) in a causal way Causal EM for λ 2(k) and π 2(k) with one EM-iteration per time frame Note, parameters α and β have to be set appropriately optimization A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 6 / 10

10 Table of contents 1 Problem formulation and motivation 2 A priori SNR estimation based on Weibull mixture model 3 Experimental evaluation 4 Conclusions and outlook A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 6 / 10

11 Experimental evaluation Data and setup Clean speech: Wall Street Journal database 16 khz (male and female) 7 different noise types of Noisex92 database: white, pink, f16, hfchannel, factory-1, factory-2, babble Input global SNR from 5 db up to 25 db in 5 db steps Spectral speech enhancement framework Noise PSD tracking using Minimum statistics approach [Martin 01] A priori SNR estimation with ξ min = 18 db [Cappe 94] Proposed WMM-based approach with Wiener filter Reference approach: Decision Directed [Ephraim 84] A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 7 / 10

12 Optimization of α and β Speech quality maximization in terms of wide-band mean opinion score listening quality objective (MOS-LQO) with MOS-LQO = max( MOS-LQO WMM MOS-LQO DD, 0) Averaging over genders, noise types and input global SNR values (α opt, β opt) = (0.64, 2.7) MOS-LQO β α A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 8 / 10

13 Final experimental results Clean speech: WSJ database signals other than used for optimization Estimation error Itakura-Saito distance (ISD) and estimator s variance logarithmic error variance (LEV): the smaller the better Resulting ISD, LEV and MOS-LQO values averaged over noise types SNR, db AVG ISD LEV MOS-LQO DD WMM DD WMM DD WMM A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 9 / 10

14 Conclusions and outlook Conclusions Novel causal a priori SNR estimator based on a bimodal Weibull mixture model for the normalized α-order spectral magnitudes (NAOMs) Optimization of the proposed approach by maximization of speech quality Power exponent α opt = 0.64 smaller than 1 (spectral magnitudes) Shape factor β opt = 2.7 a heavier tailed Weibull distribution Compared to the wide-spread Decision Directed approach: Reduced error and variance of the WMM-based a priori SNR estimator Improvement of speech quality of the enhanced signals Higher computational effort Outlook Reduction of computational effort fixed speaker-independent models Development of model-based spectral enhancement using generalized (arbitrary) power exponent in the spirit of generalized spectral subtraction A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 10 / 10

15 Thank you for your attention! Questions? Paderborn University Department of Communications Engineering Web: nt.upb.de Computer Science, Electrical Engineering and Mathematics Communications Engineering Prof. Dr.-Ing. Reinhold Häb-Umbach

16 Resulting WMM parameter and audio samples log(λ) π k Figure : Resulting WMM parameter over frequency bins λ mean λ mean 1 (k) 2 (k) π mean π mean 1 (k) 2 (k) Exemplarily speech samples: Noisy DD WMM A. Chinaev, J. Heitkaemper, R. Haeb-Umbach 10 / 10

Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs

Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs 12. ITG Fachtagung Sprachkommunikation Aleksej Chinaev, Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach Department of Communications