Enhancement of Noisy Speech. State-of-the-Art and Perspectives


1 Enhancement of Noisy Speech: State-of-the-Art and Perspectives
Rainer Martin, Institute of Communications Technology (IFN), Technical University of Braunschweig, July 2003

2 Applications of Noise Reduction
Hands-free telephony.
Robust speech recognition.
Robust speech coding (ETSI/3GPP AMR, MELPe, ITU-T 4 kbit/s codecs).
Hearing aids and cochlear implants.
Restoration of historic recordings.
Forensic applications.

3 Ingredients
Models of speech production, signal theory, room acoustics, psychoacoustics, models of speech perception.
Objective: Improve quality and intelligibility!
Combine signal theoretic and perceptive approaches!

4 Noise Reduction in the Spectral Domain
Processing chain: spectral analysis, noise reduction, synthesis (segmentation and analysis, DFT, noise reduction, IDFT, overlap/add synthesis).
Advantages of spectral processing: good separation of speech and noise; decorrelation of spectral components; integration of psychoacoustic models.
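The analysis, modification, and overlap/add synthesis chain on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the frame length of 16, the periodic Hann analysis window, the 50% overlap, and the names `enhance` and `gain` are all assumptions, chosen so that a unit gain reconstructs the input in the fully overlapped region. A naive DFT is used to keep the sketch dependency-free.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT of one frame."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * i / n)
                 for k in range(n)) / n).real
            for i in range(n)]

def enhance(y, gain, frame_len=16):
    """Segment y, apply gain(k, Y_k) to every DFT bin, and overlap/add the frames."""
    hop = frame_len // 2
    # Periodic Hann window: shifted copies at 50% overlap sum to one.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    out = [0.0] * len(y)
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = [y[start + i] * window[i] for i in range(frame_len)]
        spectrum = [gain(k, coeff) * coeff for k, coeff in enumerate(dft(frame))]
        for i, value in enumerate(idft(spectrum)):
            out[start + i] += value
    return out

# Usage: a unit gain leaves the signal unchanged wherever two frames overlap.
noisy = [math.sin(0.3 * i) for i in range(64)]
restored = enhance(noisy, lambda k, coeff: 1.0)
```

A noise reduction rule would replace the unit gain with a value between 0 and 1 per bin, computed from the estimated speech and noise powers.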

5 Principles of Noise Reduction
Signal model: $y(i) = s(i) + n(i)$, with frame index $\lambda$ and frequency bin index $k$.
Block diagram: the DFT of $y(i)$ gives $Y(\lambda, \Omega_k)$; the noise power spectral density $P_{nn}(\lambda, \Omega_k)$ is estimated; the speech coefficients $\hat{S}(\lambda, \Omega_k)$ are estimated; the IDFT gives $\hat{s}(i)$. Both estimation stages use a priori knowledge.

6 Principles of Noise Reduction
[Figure: power in dB versus frequency in Hz, showing the noisy signal spectrum, the car noise spectrum, and the enhanced spectrum.]

7 Estimation of Speech Coefficients
Linear estimators, e.g. the Wiener filter.
Non-linear estimators:
MMSE Short Time Spectral Amplitude estimator [Ephraim & Malah, 1984, 1985]
Psychoacoustic methods [Gustafsson et al. 1998]
MMSE estimation based on supergaussian priors [Martin 2002]
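As a reference point for the linear case, the textbook Wiener gain for a single DFT bin can be sketched as below. The function names and the assumption that the speech and noise power spectral densities are already known for the bin are illustrative; the slides do not prescribe how these quantities are obtained.

```python
def wiener_gain(speech_psd, noise_psd):
    """Wiener gain for one DFT bin: speech power over total power."""
    total = speech_psd + noise_psd
    return speech_psd / total if total > 0.0 else 0.0

def wiener_estimate(noisy_coeff, speech_psd, noise_psd):
    """Linear estimate of the clean speech coefficient from the noisy one."""
    return wiener_gain(speech_psd, noise_psd) * noisy_coeff

# Usage: at 0 dB SNR the noisy coefficient is scaled by one half.
estimate = wiener_estimate(2 + 2j, speech_psd=1.0, noise_psd=1.0)
```

The gain is real and depends only on the two powers, which is what makes this estimator linear in the noisy coefficient.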

8 MMSE Estimation
Optimal estimate for independent real and imaginary parts:
$E\{S \mid Y\} = E\{S_R \mid Y_R\} + j\,E\{S_I \mid Y_I\}$
Estimation of either the real or the imaginary part (below, $S$ and $Y$ stand for $S_R, Y_R$ or $S_I, Y_I$):
$E\{S \mid Y\} = \int S\, p(S \mid Y)\, dS$
Application of Bayes' theorem:
$E\{S \mid Y\} = \frac{1}{p(Y)} \int S\, p(Y \mid S)\, p(S)\, dS$
What is the appropriate prior density $p(S)$?

9 Some Answers and Some Questions
DFT coefficients are asymptotically complex Gaussian distributed! [Brillinger, 1981]
Typical frame size in mobile communications: ms < span of correlation of (voiced) speech!
Do the asymptotic assumptions hold for speech signals?
No! See, e.g., [Porter and Boll, 1984].

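10 Prior Densities for Real and Imaginary Part
Gaussian pdf (leads to the Wiener filter): $p(S) = \frac{1}{\sqrt{\pi}\,\sigma_s} \exp\left(-\frac{S^2}{\sigma_s^2}\right)$
Laplacian pdf: $p(S) = \frac{1}{\sigma_s} \exp\left(-\frac{2|S|}{\sigma_s}\right)$
Gamma pdf: $p(S) = \frac{\sqrt[4]{3}}{2\sqrt{\pi}\,\sqrt[4]{2}\,\sqrt{\sigma_s}}\, |S|^{-1/2} \exp\left(-\frac{\sqrt{3}\,|S|}{\sqrt{2}\,\sigma_s}\right)$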
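To see how the choice of prior changes the estimator, the conditional mean of one real or imaginary part can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the closed-form estimators from the cited work: it assumes Gaussian noise with variance $\sigma_n^2/2$ per part, uses the Gaussian and Laplacian priors (each with variance $\sigma_s^2/2$), and replaces the integrals by Riemann sums on a hypothetical grid. The Gamma prior is left out because its singularity at zero needs more careful quadrature. All function names are invented for the example.

```python
import math

def gaussian_prior(s, sigma_s):
    """Gaussian prior with variance sigma_s^2 / 2."""
    return math.exp(-s * s / sigma_s ** 2) / (math.sqrt(math.pi) * sigma_s)

def laplacian_prior(s, sigma_s):
    """Laplacian prior with variance sigma_s^2 / 2."""
    return math.exp(-2.0 * abs(s) / sigma_s) / sigma_s

def mmse_estimate(y, prior, sigma_s, sigma_n, half_width=12.0, steps=6001):
    """Approximate E{S | Y = y} = int S p(y|S) p(S) dS / int p(y|S) p(S) dS."""
    ds = 2.0 * half_width / (steps - 1)
    numerator = denominator = 0.0
    for i in range(steps):
        s = -half_width + i * ds
        # Gaussian noise likelihood p(y|s), up to a constant that cancels.
        weight = math.exp(-(y - s) ** 2 / sigma_n ** 2) * prior(s, sigma_s)
        numerator += s * weight
        denominator += weight
    return numerator / denominator

# Usage: compare the two priors at 0 dB SNR for an observation y = 1.
linear = mmse_estimate(1.0, gaussian_prior, 1.0, 1.0)
nonlinear = mmse_estimate(1.0, laplacian_prior, 1.0, 1.0)
```

With the Gaussian prior the result coincides with the Wiener solution, $\sigma_s^2 / (\sigma_s^2 + \sigma_n^2) \cdot y$; with the Laplacian prior the input-output characteristic becomes non-linear in $y$.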

11 Histogram of DFT Coefficients for Speech
[Figure: histogram of $S_R$ with fitted pdfs. Dotted: Gaussian pdf; dashed: Laplacian pdf; solid: Gamma pdf.]

12 Histogram of Speech Coefficients (enlarged)
[Figure: enlarged histogram of $S_R$ with fitted pdfs. Dotted: Gaussian pdf; dashed: Laplacian pdf; solid: Gamma pdf.]

13 Histogram of DFT Coefficients for Car Noise
[Figure: histogram of $N_R$ with fitted pdfs. Dotted: Gaussian pdf; dashed: Laplacian pdf.]

14 Histogram of Car Coefficients (enlarged)
[Figure: enlarged histogram of $N_R$ with fitted pdfs. Dotted: Gaussian pdf; dashed: Laplacian pdf.]

15 Non-linear MMSE Estimator
[Figure: $E\{S_R \mid Y_R\}$ versus $Y_R$ for a Laplacian noise and Gamma speech prior, compared with the Wiener filter, for $10 \log(\sigma_s^2 / \sigma_n^2)$ = +15 dB, 0 dB, and 10 dB, with $\sigma_s^2 + \sigma_n^2 = 2$.]

16 Segmental SNR Improvement (White Noise)
[Figure: segmental SNR after enhancement versus segmental SNR before enhancement. Curves: Laplace/Laplace, Gamma/Gauss, Wiener, no enhancement.]

17 Relative Improvement w.r.t. Wiener Filter
[Figure: relative improvement versus segmental SNR of input signal.]

18 Background Noise PSD Estimation
Methods: voice activity detection; soft-decision methods; bias-compensated tracking of spectral minima [Martin 1994, 2001].
Assumptions: speech and noise are statistically independent; speech is not always present; noise is more stationary than speech.

19 Minimum Statistics: Basic Principle
[Figure: power in dB versus frame index for frequency bin k = 25, showing the periodogram, the smoothed periodogram, and the minimum of the smoothed periodogram.]
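The basic principle, smoothing the periodogram over time and then taking the minimum within a sliding window of frames, can be sketched for a single frequency bin as below. This is a simplified illustration and not the published algorithm: the fixed smoothing parameter `alpha`, the window length of 40 frames, the constant `bias_comp` factor, and the function name are assumptions made for the example.

```python
from collections import deque

def track_noise_psd(periodogram, alpha=0.8, window_len=40, bias_comp=1.0):
    """Noise PSD estimate per frame for one frequency bin.

    periodogram: sequence of |Y(lambda, Omega_k)|^2 values over frames.
    """
    smoothed = periodogram[0]
    recent = deque(maxlen=window_len)  # last window_len smoothed values
    estimates = []
    for value in periodogram:
        # First-order recursive smoothing of the periodogram.
        smoothed = alpha * smoothed + (1.0 - alpha) * value
        recent.append(smoothed)
        # The minimum underestimates the mean, so a factor >= 1 compensates.
        estimates.append(bias_comp * min(recent))
    return estimates

# Usage: noise power 1 with a 10-frame burst standing in for speech activity.
frames = [1.0] * 50 + [100.0] * 10 + [1.0] * 50
noise_estimate = track_noise_psd(frames)
```

Because the burst is shorter than the search window, the minimum stays close to the noise level throughout, which is why no voice activity detector is needed.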

20 Minimum Statistics: Bias
[Figure: probability density functions of the smoothed periodogram and of the minimum of D = 40 values, with the mean error indicated.]

21 Mean of Minimum
[Figure: E{minimum} versus D for several values of $Q_{eq}$, including 512, 128, 64, 32, 8, and 4.]
D: length of minimum search window
$Q_{eq} = 1 / \mathrm{var}\{P(\lambda, \Omega_k)\}_{norm}$

22 Minimum Statistics: What's New?
Minimum Statistics, version 1994: fixed smoothing parameter α; fixed bias compensation.
Minimum Statistics, version 2001: signal dependent optimal smoothing; signal dependent bias compensation; fast minimum update.

23 Minimum Statistics (version 2001)
[Figure: power in dB versus frame index for frequency bin k = 25, showing the periodogram, the smoothed periodogram, and the minimum of the smoothed periodogram.]
Estimation of noise power spectral density without voice activity detection!

24 Relative Estimation Error (in parentheses: variance of estimation error)
Speech pause:
Algorithm                 white noise    vehicular noise    street noise
MinStat 1994 (α = 0.6)    (0.11)         (0.13)             (0.21)
MinStat                   (0.041)        (0.041)            (0.13)
Speech activity (3 without speech pauses):
Algorithm                 white noise    vehicular noise    street noise
MinStat 1994 (α = 0.6)    0.64 (0.77)    0.77 (1.04)        0.59 (1.9)
MinStat                   (0.14)         0.02 (0.17)        (0.28)

25 Two Channel Noise Reduction
[Block diagram with the elements: inputs $x_1(k)$, $x_2(k)$; adaptive time delay estimation; preemphasis; signals $y_{hppre1}(k)$, $y_{hppre2}(k)$; filters h1, h, hw; deemphasis.]

26 Coherence of Noise (Diffuse Sound Field)
The complex coherence $\gamma_{x_1 x_2}(\Omega)$ of two signals $x_1(k)$ and $x_2(k)$ is defined as
$\gamma_{x_1 x_2}(\Omega) = \frac{\Phi_{x_1 x_2}(e^{j\Omega})}{\sqrt{\Phi_{x_1 x_1}(e^{j\Omega})\, \Phi_{x_2 x_2}(e^{j\Omega})}}$.
[Figure: magnitude squared coherence $|\gamma_{x_1 x_2}(f)|^2$ versus frequency f for microphone distances d = 10 cm, 20 cm, 40 cm, and 60 cm.]
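For an ideal diffuse (spherically isotropic) sound field picked up by two omnidirectional microphones, the standard theoretical coherence is $\sin(x)/x$ with $x = 2\pi f d / c$. The sketch below evaluates this model; the speed of sound of 340 m/s and the function names are assumptions for the example, and the curves shown on the slide may have been obtained differently.

```python
import math

def diffuse_coherence(freq_hz, mic_distance_m, speed_of_sound=340.0):
    """Theoretical coherence of an ideal diffuse field: sin(x) / x."""
    x = 2.0 * math.pi * freq_hz * mic_distance_m / speed_of_sound
    return 1.0 if x == 0.0 else math.sin(x) / x

def magnitude_squared_coherence(freq_hz, mic_distance_m, speed_of_sound=340.0):
    """Squared magnitude of the diffuse-field coherence."""
    return diffuse_coherence(freq_hz, mic_distance_m, speed_of_sound) ** 2

# Usage: the first zero lies at f = c / (2 d), i.e. 425 Hz for d = 40 cm.
first_zero_hz = 340.0 / (2.0 * 0.4)
value_at_zero = magnitude_squared_coherence(first_zero_hz, 0.4)
```

Larger microphone distances move the first zero to lower frequencies, so widely spaced microphones see largely uncorrelated noise over most of the speech band.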

27 Coherence of Speech in a Car
[Figure: power spectral density in dB and coherence $\gamma_{x_1 x_2}(f)$, each plotted versus f/Hz.]

28 Two Channel Noise Reduction
[Diagram with the labels: source s; noise n; 2 microphones, $d_{mic}$ = 0.4 m; microphone signals $s_1 + n_1$ and $s_2 + n_2$; noise reduction; prompt memory; output $\hat{s}$.]

29 First-Order Differential Microphone
$Y(j\omega) = S(j\omega)\, e^{j\omega \frac{d}{2c} \cos(\alpha)} \left[ 1 - e^{-j\omega \frac{d}{c} \left( \cos(\alpha) + \frac{cT}{d} \right)} \right]$
$\left| \frac{Y(j\omega)}{S(j\omega)} \right| = 2 \left| \sin\left( \frac{\omega d}{2c} \left( \cos(\alpha) + \frac{cT}{d} \right) \right) \right|$

30 Directivity Patterns (d m, f = 1 kHz)
[Figure: four polar plots at f = 1000 Hz, radial scale in dB, azimuth angle in degrees: Dipole (Tc/d = 0), Hyper Cardioid (Tc/d = 0.34), Super Cardioid (Tc/d = 0.57), Cardioid (Tc/d = 1).]
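The four patterns differ only in the ratio Tc/d. A short sketch evaluates the magnitude response $2\,|\sin(\frac{\omega d}{2c}(\cos(\alpha) + \frac{cT}{d}))|$ of a first-order differential microphone and the angle of its null. The microphone distance of 2 cm, the speed of sound of 340 m/s, and the function names are assumptions made for the example, since the slide's value of d did not survive.

```python
import math

# Tc/d ratios as labelled on the directivity plots.
PATTERNS = {"dipole": 0.0, "hypercardioid": 0.34,
            "supercardioid": 0.57, "cardioid": 1.0}

def differential_response(angle_rad, freq_hz, delay_ratio,
                          mic_distance_m=0.02, speed_of_sound=340.0):
    """|Y/S| of a first-order differential microphone; delay_ratio is cT/d."""
    omega = 2.0 * math.pi * freq_hz
    phase = omega * mic_distance_m / (2.0 * speed_of_sound)
    return 2.0 * abs(math.sin(phase * (math.cos(angle_rad) + delay_ratio)))

def null_angle_deg(delay_ratio):
    """Angle at which cos(alpha) = -cT/d, so the response vanishes."""
    return math.degrees(math.acos(-delay_ratio))

# Usage: list the null direction of each pattern at f = 1 kHz.
nulls = {name: null_angle_deg(ratio) for name, ratio in PATTERNS.items()}
```

The dipole has its null at 90 degrees and the cardioid at 180 degrees; the hypercardioid and supercardioid place it in between.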

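31 Delay-and-Sum Beamformer
[Block diagram: a source s(k) at angle θ; microphone signals $y_1(k), \dots, y_N(k)$ with additive noise $n_l(k)$; delays $T_1, \dots, T_N$ give the aligned signals $\tilde{y}_1(k), \dots, \tilde{y}_N(k)$, which are summed to $\hat{y}(k)$.]
For i.i.d. noise the gain is G = 10 log(N).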
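The gain of 10 log(N) for i.i.d. noise can be checked with a small simulation. The sketch below is a toy setup under stated assumptions: integer sample delays that are known exactly, unit-variance white Gaussian noise per microphone, four microphones, and hypothetical names throughout.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Advance channel l by delays[l] samples, then average the channels."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(length)]

def power(x):
    """Mean square value of a sequence."""
    return sum(v * v for v in x) / len(x)

random.seed(0)
num_mics, num_samples = 4, 20000
delays = [0, 3, 6, 9]  # assumed propagation delays in samples
source = [math.sin(0.05 * i) for i in range(num_samples)]

channels = []
for d in delays:
    delayed = [0.0] * d + source[:num_samples - d]
    channels.append([v + random.gauss(0.0, 1.0) for v in delayed])

output = delay_and_sum(channels, delays)
# Residual noise before (first microphone) and after beamforming.
noise_in = power([channels[0][i] - source[i] for i in range(len(output))])
noise_out = power([output[i] - source[i] for i in range(len(output))])
gain_db = 10.0 * math.log10(noise_in / noise_out)
```

With four microphones `gain_db` comes out close to 10 log10(4), about 6 dB, because the aligned speech adds coherently while the independent noise adds in power.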

32 Design of Fixed Beamformers with MATLAB

33 Directivity Pattern

34 Arrays for Speech Acquisition in Cars
[Figure: microphone positions in the X-Y plane; legible spacings: 4 cm, 4 cm, 5 cm, 5.25 cm.]
Microphones 1, 2, 3, 4, 5: linear array
Microphones 1, 2, 7, 4, 5: planar array [Martin et al. 2001]

35 Delay-and-sum vs. Superdirective Arrays
[Figure: gain in dB versus frequency in Hz for the superdirective and the delay-and-sum array.]

36 Linear and Planar Microphone Arrays
[Figure: gain in dB versus frequency in Hz for the planar superdirective and the linear superdirective array.]

37 Adaptive Beamformer (GSC)

38 Conclusions
Find better ways to exploit statistics of signals!
Incorporate models of speech production.
Develop better background noise estimation methods.
Design algorithms for high quality and intelligibility.
Exploit spatial selectivity using multiple microphones.
Understand processing in the auditory system:
Enhance perceptually important features.
Use perceptive models to reduce complexity of algorithms.

39 Selected References
