Noise Compensation for Subspace Gaussian Mixture Models


1 Noise Compensation for Subspace Gaussian Mixture Models. Liang Lu, University of Edinburgh. Joint work with KK Chin, A. Ghoshal and S. Renals. Liang Lu, Interspeech, September, 2012

2 Outline. Motivation: Subspace GMM (SGMM) works well in matched speech conditions [Povey et al., 2011]. In mismatched conditions (i.e. noise), the gain disappears. Goal: noise compensation for SGMM. Method: model space compensation; joint uncertainty decoding (JUD) [Liao and Gales, 2005].

3 HMM-GMM acoustic model. [Figure: HMM states j-1, j, j+1]

4 Subspace Gaussian Mixture Models [Povey et al., 2011]. [Figure: HMM states j-1, j, j+1 with vectors v_jk, and globally shared w_i, M_i, Σ_i for i = 1,...,I] Global parameters: M_i is the basis for means; w_i is the basis for weights; Σ_i is the covariance matrix. State-dependent parameters: v_jk is a low-dimensional vector (e.g. 40 dimensions). Gaussian mean: µ_jki = M_i v_jk.
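The parameterisation on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `sgmm_state_gmm` and the toy sizes are hypothetical, and the softmax form of the weights is an assumption taken from the standard SGMM formulation in [Povey et al., 2011], since the slide only states that w_i is the basis for weights.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sgmm_state_gmm(M, w, v_jk):
    """Expand one state-dependent vector v_jk into per-region means and weights.

    M:    list of I globally shared matrices M_i (each D x S).
    w:    list of I globally shared vectors w_i (each of length S).
    v_jk: state-dependent vector of length S.
    """
    # Gaussian mean for every region i: mu_jki = M_i v_jk (from the slide).
    means = [matvec(M_i, v_jk) for M_i in M]
    # Weights as a softmax over w_i . v_jk (assumed, standard SGMM form).
    scores = [sum(a * b for a, b in zip(w_i, v_jk)) for w_i in w]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights

# Toy sizes (hypothetical): I = 2 regions, D = 3 feature dims, S = 2 subspace dims.
M = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[2.0, 0.0], [0.0, 2.0], [1.0, -1.0]]]
w = [[0.5, 0.5], [-0.5, 0.5]]
v_jk = [1.0, 2.0]
means, weights = sgmm_state_gmm(M, w, v_jk)
```

Only v_jk is stored per state, so the I Gaussians of a state are generated from the shared M_i and w_i rather than stored individually.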

5 Subspace Gaussian Mixture Models. More intuitively, suppose we have an acoustic space like this. [Figure: acoustic space]

6 Subspace Gaussian Mixture Models. We then partition the whole acoustic space into I regions. This can be done by learning a GMM using the training data. [Figure: acoustic space partitioned into regions 1, 2, 3, ..., I]

7 Subspace Gaussian Mixture Models. We then introduce some parameters to structure each region: Σ_i models the covariance of this region; M_i spans the basis for the Gaussian mean; w_i spans the basis for the Gaussian weight. [Figure: one region annotated with w_i, Σ_i, M_i]

8 Subspace Gaussian Mixture Models. Given a class with some data, such as an HMM state. [Figure: HMM states j-1, j, j+1 and vector v_jk in the partitioned acoustic space]

9 Subspace Gaussian Mixture Models. Then we learn a GMM for this class. [Figure: HMM states j-1, j, j+1 and vector v_jk in the partitioned acoustic space]

10 Noise compensation. Larger modelling power, higher recognition accuracy. In our systems on Aurora 4, the number of Gaussians is 6.4M (SGMM) vs. 50k (GMM). SGMM vs. GMM: 5.2% vs. 7.7% WER on the clean condition. SGMM vs. GMM: 59.9% vs. 59.3% WER on the noisy condition. Can we do noise compensation for SGMMs? [Figure: WER of the GMM and SGMM systems on clean and noisy conditions]

11 Noise compensation. There is a large body of work on noise compensation for robust ASR [Deng, 2011]. Feature domain: spectral subtraction, CMN/CVN; cepstral mean square error estimation; Algonquin; SPLICE; feature space vector Taylor series (VTS). Model domain: MLLR, noise constraint MLLR; PMC, data-driven PMC (DPMC), iterative DPMC; VTS, joint uncertainty decoding (JUD); linear spline interpolation (LSI); unscented transform (UT). Hybrid: noise adaptive training.

12 Noise compensation for SGMM. Model space compensation for SGMM: not data-driven but using heuristic knowledge. Mismatch function y = f(x, h, n, α) [Acero, 1990], where α denotes the phase term between noise and speech [Deng et al., 2004]. [Figure: clean speech x, channel noise h and additive noise n combine into noisy speech y]


16 Noise compensation for SGMM. The mismatch function is

$$y = f(x, h, n, \alpha) = x + h + C \log\left[ 1 + \exp\left( C^{-1}(n - x - h) \right) + \underbrace{2\alpha \exp\left( C^{-1}(n - x - h)/2 \right)}_{\text{phase term}} \right] \qquad (1)$$

where C is the DCT matrix.
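Equation (1) can be evaluated directly, as in the minimal sketch below. Several things here are assumptions for illustration rather than details from the slides: C is taken to be a square orthonormal DCT-II matrix (so that its inverse is its transpose, whereas MFCC front-ends usually truncate the DCT), α is treated as a scalar, and the helper names `dct_matrix`, `matvec`, `transpose` and `mismatch` are hypothetical.

```python
import math

def dct_matrix(D):
    """Square orthonormal DCT-II matrix; its inverse is its transpose."""
    C = []
    for k in range(D):
        scale = math.sqrt(1.0 / D) if k == 0 else math.sqrt(2.0 / D)
        C.append([scale * math.cos(math.pi * (i + 0.5) * k / D) for i in range(D)])
    return C

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mismatch(x, h, n, alpha, C):
    """y = x + h + C log(1 + exp(C^-1 (n-x-h)) + 2 alpha exp(C^-1 (n-x-h) / 2))."""
    C_inv = transpose(C)  # valid only because C is orthonormal in this sketch
    d = matvec(C_inv, [ni - xi - hi for xi, hi, ni in zip(x, h, n)])
    g = [math.log(1.0 + math.exp(di) + 2.0 * alpha * math.exp(di / 2.0)) for di in d]
    return [xi + hi + ci for xi, hi, ci in zip(x, h, matvec(C, g))]

# Toy cepstral vectors (made-up values) for clean speech, channel and noise.
C = dct_matrix(4)
x = [1.0, -0.5, 0.2, 0.1]
h = [0.1, 0.0, 0.0, 0.0]
n = [0.5, 0.3, -0.2, 0.0]
y_no_phase = mismatch(x, h, n, 0.0, C)
y_phase = mismatch(x, h, n, 1.0, C)
```

With α = 0 the function reduces to adding speech and noise power in the spectral domain; with α = 1 the bracket becomes (1 + exp(d/2))^2, which corresponds to adding magnitudes instead.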

17 Noise compensation. Aim: estimate µ_y and Σ_y for each Gaussian component. Difficulty: y = f(x, h, n, α) is highly nonlinear, so there is no analytic solution. Solution: vector Taylor series (VTS) approximation [Moreno et al., 1996]. Cost: real time factor > 100 and memory > 10G for a (medium size) SGMM with 6.4M Gaussians. Inelegant: directly applying VTS would destroy the compact structure of SGMMs.
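To make the per-Gaussian cost concrete, the sketch below applies a first-order VTS update to a single Gaussian. It is deliberately simplified and these simplifications are assumptions, not the configuration used in the talk: features are log-spectral (C is the identity), α = 0, covariances are diagonal, only static features are handled, and h is a fixed offset. The function name `vts_compensate` is hypothetical.

```python
import math

def vts_compensate(mu_x, var_x, mu_n, var_n, mu_h):
    """First-order VTS update of one diagonal Gaussian, dimension by dimension.

    Assumes log-spectral features (C = identity), alpha = 0, static features
    only, and a deterministic channel offset mu_h.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn, mh in zip(mu_x, var_x, mu_n, var_n, mu_h):
        d = mn - mx - mh
        # Zeroth-order term: the mismatch function evaluated at the means.
        mu_y.append(mx + mh + math.log1p(math.exp(d)))
        # g = dy/dx at the expansion point, and dy/dn = 1 - g.
        g = 1.0 / (1.0 + math.exp(d))
        # Linearised variance: speech and noise are assumed independent.
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y

# Dimension 0: speech dominates. Dimension 1: speech and noise are equal.
mu_y, var_y = vts_compensate(
    mu_x=[10.0, 0.0], var_x=[1.0, 2.0],
    mu_n=[-100.0, 0.0], var_n=[4.0, 6.0],
    mu_h=[0.5, 0.0],
)
```

The expansion point depends on each Gaussian's own mean, so the update has to be repeated for every component, which is where the cost quoted above comes from when there are millions of Gaussians.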


22 Noise compensation. Solution: joint uncertainty decoding (JUD). [Figure: VTS vs. JUD]

23 Noise compensation. Applying JUD to SGMM. [Figure: regions 1, 2, 3, ..., I] Cost: real time factor 10 for an SGMM with 6.4M Gaussians.

24 Experiments. Database: Aurora 4 dataset; clean speech and noisy speech with SNR [5dB - 15dB]; close-talking microphone and desk-mounted microphone; 15 hours of training data; 330 testing utterances. System configuration: 39-dimensional MFCC; #triphone states: 3.1k (GMM) vs. 3.9k (SGMM); #Gaussians: 50k (GMM) vs. 6.4M (SGMM); #regression classes: 112 (GMM) vs. 400 (SGMM).

25 Noise compensation experiments. [Figure: results for Baseline, JUD and VTS on the GMM and SGMM systems]

26 Experiments. Results by tuning the value of the phase factor. [Figure: Word Error Rate (%) against the value of the phase factor for the VTS/GMM, JUD/GMM and JUD/SGMM systems] The JUD/SGMM system achieved 16.8% WER on the Aurora 4 database.

27 Remarks. The phase term is very effective for noise compensation. Similar improvements were also observed in other studies, e.g. [Li et al., 2009]. The reason may be that it compensates for the linearization bias and performs domain compensation [Li et al., 2009]. Our insight is that it may help to avoid overestimation of the noise model.


31 Conclusion. SGMM is a promising alternative for acoustic modelling. Noise compensation using JUD works well for SGMMs. The phase term is particularly effective for noise compensation. Future work will be on noise adaptive training and compensation in the log-spectral domain.


33 Noise compensation. With JUD, the marginal likelihood can be obtained as

$$p(y \mid m) \approx |A^{(r)}| \, \mathcal{N}\left( A^{(r)} y + b^{(r)};\ \mu_m,\ \Sigma_m + \Sigma_b^{(r)} \right) \qquad (2)$$

The transformation is done in the feature space and applied to each frame. Computation is saved because #frames ≪ #Gaussians. The transformation should be diagonalized in GMM systems, but not in the SGMM system, since we used full covariance matrices.
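Equation (2) is cheap to evaluate because the transform depends only on the regression class r, not on the individual Gaussian m. The sketch below shows the diagonal case described for GMM systems, under stated assumptions: A, b and Σ_b are given as per-dimension lists, the function name `jud_log_likelihood` is hypothetical, and the full-covariance SGMM case (which would need a matrix log-determinant and inverse) is not covered.

```python
import math

def jud_log_likelihood(y, A, b, var_b, mu_m, var_m):
    """log p(y | m) ~= log|A| + log N(A y + b; mu_m, var_m + var_b), diagonal case.

    A, b, var_b belong to the regression class r of component m;
    mu_m, var_m are the clean-speech parameters of component m.
    """
    log_p = 0.0
    for yi, ai, bi, vbi, mi, vmi in zip(y, A, b, var_b, mu_m, var_m):
        z = ai * yi + bi            # feature transform, shared within class r
        v = vmi + vbi               # model variance plus class bias variance
        log_p += math.log(abs(ai))  # log-determinant of a diagonal A
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (z - mi) ** 2 / v)
    return log_p

# Identity transform with zero bias variance: an ordinary Gaussian score.
plain = jud_log_likelihood([0.3], [1.0], [0.0], [0.0], [0.0], [1.0])
# Scaling by A = 2 against a variance-4 Gaussian gives the same density.
scaled = jud_log_likelihood([0.3], [2.0], [0.0], [0.0], [0.0], [4.0])
```

In a decoder the transformed frame A y + b would be computed once per regression class and reused for every Gaussian in that class, so only the variance offset differs per component.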

34 Experiments.
Table: GMM systems with α = 0. Columns: Methods, Clean, Avg. Rows: Clean model, MST model, VTS, JUD. [values not recoverable]
Table: SGMM systems with α = 0. Columns: Methods, Clean, Avg. Rows: Clean model, MST model, JUD. [values not recoverable]

35 References
Acero, A. (1990). Acoustic and Environmental Robustness in Automatic Speech Recognition. PhD thesis, Carnegie Mellon University.
Deng, L., Droppo, J., and Acero, A. (2004). Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise. IEEE Transactions on Speech and Audio Processing, 12(2).
Droppo, J., Acero, A., and Deng, L. (2002). Uncertainty decoding with SPLICE for noise robust speech recognition. In Proc. ICASSP. IEEE.
Gales, M. (1995). Model-based techniques for noise robust speech recognition. PhD thesis, Cambridge University.

36 Hu, Y. and Huo, Q. (2006). An HMM compensation approach using unscented transformation for noisy speech recognition. Chinese Spoken Language Processing.
Li, J., Deng, L., Yu, D., Gong, Y., and Acero, A. (2009). A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions. Computer Speech & Language, 23(3).
Liao, H. and Gales, M. (2005). Joint uncertainty decoding for noise robust speech recognition. In Proc. INTERSPEECH. Citeseer.
Moreno, P., Raj, B., and Stern, R. (1996). A vector Taylor series approach for environment-independent speech recognition. In Proc. ICASSP, volume 2. IEEE.

37 Povey, D., Burget, L., Agarwal, M., Akyazi, P., Kai, F., Ghoshal, A., Glembek, O., Goel, N., Karafiát, M., Rastrow, A., Rose, R., Schwarz, P., and Thomas, S. (2011). The subspace Gaussian mixture model: A structured model for speech recognition. Computer Speech & Language, 25(2).
