Full-covariance model compensation for

1 Presentation for Toshiba, 12 Mar 2008

2 Outline

4 Noise model
$x$: clean speech; $n$: additive noise; $h$: convolutional noise. The noise-corrupted speech $y$ is
\[ y[k] = x[k] \ast h[k] + n[k]. \]
In the mel-cepstral domain, for the static coefficients (superscript $s$),
\[ y^s = x^s + h + C \log\bigl(1 + \exp\bigl(C^{-1}(n^s - x^s - h)\bigr)\bigr) = x^s + f(x^s, n^s, h). \]
Noise model $\mathcal{M}_n = \{\mu_n, \Sigma_n, h\}$: the additive noise $n$ is Gaussian; the convolutional noise $h$ is stationary; they are ML-estimated [1].

5 Vector Taylor series
Approximate with a first-order vector Taylor series expansion [2, 3]:
\[ y^s = x^s + f(x^s, n^s, h) \approx x^s + f(x^s_0, n^s_0, h) + (x^s - x^s_0)\,\frac{\partial f}{\partial x^s} + (n^s - n^s_0)\,\frac{\partial f}{\partial n^s}. \]
To compensate the static mean $\mu^s_y$:
\[ \mu^s_y = E\{y^s\} \approx \mu^s_x + f(\mu^s_x, \mu^s_n, h), \]
and similarly for $\Sigma$.
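The first-order compensation can be sketched numerically as follows. This is a minimal sketch, not the implementation behind the slides. It assumes a square orthonormal DCT matrix for $C$ (so that $C^{-1} = C^{\mathsf T}$) and expands around the means. The covariance formula $\Sigma_y \approx J_x \Sigma_x J_x^{\mathsf T} + J_n \Sigma_n J_n^{\mathsf T}$ is the usual first-order result, which the slide only abbreviates. The names `dct_matrix`, `mismatch` and `vts_compensate` are illustrative.

```python
import numpy as np

def dct_matrix(d):
    """Orthonormal DCT-II matrix (assumption: square, so inv(C) == C.T)."""
    k = np.arange(d)[:, None]
    j = np.arange(d)[None, :]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * k * (2 * j + 1) / (2 * d))
    C[0, :] /= np.sqrt(2.0)
    return C

def mismatch(x, n, h, C):
    """y = x + h + C log(1 + exp(C^{-1}(n - x - h))) for single vectors."""
    z = C.T @ (n - x - h)
    return x + h + C @ np.logaddexp(0.0, z)

def vts_compensate(mu_x, Sigma_x, mu_n, Sigma_n, h, C):
    """First-order VTS compensation, expanded around (mu_x, mu_n)."""
    z = C.T @ (mu_n - mu_x - h)
    s = 1.0 / (1.0 + np.exp(-z))        # derivative of log(1 + exp(z))
    J_n = C @ np.diag(s) @ C.T          # dy/dn at the expansion point
    J_x = np.eye(len(mu_x)) - J_n       # dy/dx at the expansion point
    mu_y = mismatch(mu_x, mu_n, h, C)   # mu_x + f(mu_x, mu_n, h)
    Sigma_y = J_x @ Sigma_x @ J_x.T + J_n @ Sigma_n @ J_n.T
    return mu_y, Sigma_y, J_x, J_n
```

Because the two Jacobians sum to the identity, the compensated covariance interpolates between the clean-speech and noise covariances depending on the local signal-to-noise ratio.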

6 DPMC
Data-driven parallel model combination [4]: a Monte Carlo method.
Draw $N$ samples from the distributions of $x^s$, $n^s$, $h$;
apply $y^s = x^s + f(x^s, n^s, h)$;
estimate $y^s \sim \mathcal{N}(\mu^s_y, \Sigma^s_y)$.
As $N \to \infty$, DPMC compensation becomes optimal.
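The three DPMC steps (draw, apply, estimate) can be sketched as below for static coefficients. This is a hedged illustration rather than the presenter's code: it assumes Gaussian clean speech and additive noise, a fixed $h$, and a square orthonormal DCT matrix for $C$. The names `dct_matrix`, `mismatch` and `dpmc_compensate` are hypothetical.

```python
import numpy as np

def dct_matrix(d):
    """Orthonormal DCT-II matrix (assumption: square, so inv(C) == C.T)."""
    k = np.arange(d)[:, None]
    j = np.arange(d)[None, :]
    C = np.sqrt(2.0 / d) * np.cos(np.pi * k * (2 * j + 1) / (2 * d))
    C[0, :] /= np.sqrt(2.0)
    return C

def mismatch(x, n, h, C):
    """Apply y = x + h + C log(1 + exp(C^{-1}(n - x - h))) to each row."""
    z = (n - x - h) @ C                          # row form of C^{-1} v
    return x + h + np.logaddexp(0.0, z) @ C.T    # row form of C w

def dpmc_compensate(mu_x, Sigma_x, mu_n, Sigma_n, h, C, num_samples, rng):
    """Monte Carlo estimate of the corrupted-speech Gaussian."""
    x = rng.multivariate_normal(mu_x, Sigma_x, size=num_samples)  # draw
    n = rng.multivariate_normal(mu_n, Sigma_n, size=num_samples)
    y = mismatch(x, n, h, C)                                      # apply
    return y.mean(axis=0), np.cov(y, rowvar=False)                # estimate
```

The estimate is a full covariance matrix by construction, and its accuracy is limited only by the number of samples, which is also what makes the method slow.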

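7 Joint uncertainty decoding
Joint uncertainty decoding finds the compensation
\[ p(y_t \mid m) \approx \mathcal{N}\bigl(A^{(r)} y_t + b^{(r)};\ \mu^{(m)},\ \Sigma^{(m)} + \Sigma^{(r)}_{\text{bias}}\bigr) \]
from the joint distribution of the clean speech $x$ and the noise-corrupted speech $y$ for regression class $r$:
\[ \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu^{(r)}_x \\ \mu^{(r)}_y \end{bmatrix}, \begin{bmatrix} \Sigma^{(r)}_x & \Sigma^{(r)}_{xy} \\ \Sigma^{(r)}_{yx} & \Sigma^{(r)}_y \end{bmatrix} \right). \]
Normally, VTS is used to find the joint distribution. But DPMC can also be used!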
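The slide does not state how $A^{(r)}$, $b^{(r)}$ and $\Sigma^{(r)}_{\text{bias}}$ follow from the joint distribution. The sketch below uses the expressions usually given in the joint uncertainty decoding literature, $A = \Sigma_x \Sigma_{yx}^{-1}$, $b = \mu_x - A\mu_y$ and $\Sigma_{\text{bias}} = A \Sigma_y A^{\mathsf T} - \Sigma_x$. These are an assumption here, not something taken from the slides, and the function name `jud_transform` is illustrative.

```python
import numpy as np

def jud_transform(mu_x, mu_y, Sigma_x, Sigma_y, Sigma_xy):
    """Derive a JUD transform from one regression class's joint Gaussian.

    Assumed formulas (from the JUD literature, not from the slides):
      A = Sigma_x Sigma_yx^{-1},  b = mu_x - A mu_y,
      Sigma_bias = A Sigma_y A^T - Sigma_x.
    """
    Sigma_yx = Sigma_xy.T
    # Solve A Sigma_yx = Sigma_x without forming an explicit inverse.
    A = np.linalg.solve(Sigma_yx.T, Sigma_x.T).T
    b = mu_x - A @ mu_y
    Sigma_bias = A @ Sigma_y @ A.T - Sigma_x
    return A, b, Sigma_bias
```

As a sanity check, if $y = x + e$ with $e$ independent of $x$, then $\Sigma_{yx} = \Sigma_x$, so $A$ is the identity and $\Sigma_{\text{bias}}$ is exactly the covariance of $e$.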

10 Dynamic parameter compensation with VTS
VTS compensation uses the continuous time approximation [5] for delta and delta-delta parameters:
\[ \Delta y_t \approx \frac{\partial y^s}{\partial t}, \]
so that
\[ \mu_{\Delta y} = E\{\Delta y\} \approx \frac{\partial y^s}{\partial x^s}\,\mu_{\Delta x}. \]
Attractive because simple.

11 Dynamic parameter compensation with DPMC
Dynamic parameters are linear combinations of consecutive observations. If we knew $y^s_{t-w}, \ldots, y^s_{t+w}$, compensation could be exact:
\[ \begin{bmatrix} y^s_t \\ \Delta y_t \end{bmatrix} = A \begin{bmatrix} y^s_{t-1} \\ y^s_t \\ y^s_{t+1} \end{bmatrix}. \]
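To make the matrix $A$ concrete, the sketch below builds it for a window of $\pm w$ frames. The slide does not say which delta formula is used, so the regression weights $\theta / (2\sum_\theta \theta^2)$ are an assumption (a common HTK-style definition), and `delta_matrix` is a hypothetical helper name.

```python
import numpy as np

def delta_matrix(dim, window=1):
    """Matrix A mapping [y_{t-w}; ...; y_{t+w}] to [y_t; delta y_t].

    Assumption: deltas are regression coefficients with weights
    theta / (2 * sum(theta^2)), theta = -w..w.
    """
    offsets = np.arange(-window, window + 1)
    weights = offsets / (2.0 * np.sum(np.arange(1, window + 1) ** 2))
    select = (offsets == 0).astype(float)      # picks out the centre frame
    rows = np.vstack([select, weights])        # shape (2, 2w + 1)
    # Each scalar weight acts on a whole dim-dimensional frame.
    return np.kron(rows, np.eye(dim))

# Usage: three 2-dimensional frames stacked into one vector.
stacked = np.concatenate([[1.0, 2.0], [3.0, 5.0], [7.0, 4.0]])
static_and_delta = delta_matrix(2) @ stacked
```

With `window=1` the delta reduces to half the difference between the following and preceding frames.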

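12 Dynamic parameter compensation with DPMC
Draw $N$ samples from the distributions of
\[ \begin{bmatrix} x^s_{t-w} \\ \vdots \\ x^s_{t+w} \end{bmatrix}, \quad \begin{bmatrix} n^s_{t-w} \\ \vdots \\ n^s_{t+w} \end{bmatrix}, \quad h; \]
estimate
\[ \begin{bmatrix} y^s_{t-w} \\ \vdots \\ y^s_{t+w} \end{bmatrix} \sim \mathcal{N}(\mu_y, \Sigma_y); \]
convert to
\[ \begin{bmatrix} y^s \\ \Delta y \\ \Delta^2 y \end{bmatrix} \sim \mathcal{N}\bigl(A\mu_y,\ A\Sigma_y A^{\mathsf T}\bigr). \]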
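The "estimate, then convert" step relies on the fact that a linear map of a Gaussian is again Gaussian. A minimal sketch follows, assuming the corrupted static windows have already been sampled into an array with one stacked window per row. The function names are illustrative, and `A` is whatever matrix computes the static and dynamic parameters from a window.

```python
import numpy as np

def gaussian_from_samples(window_samples):
    """Estimate (mu_y, Sigma_y) from stacked windows, one sample per row."""
    return window_samples.mean(axis=0), np.cov(window_samples, rowvar=False)

def transform_gaussian(A, mu, Sigma):
    """Gaussian of A y when y ~ N(mu, Sigma): N(A mu, A Sigma A^T)."""
    return A @ mu, A @ Sigma @ A.T
```

Converting the estimated Gaussian gives the same statistics as transforming every sample and re-estimating, so $A$ only needs to be applied once per component.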

13 Dynamic parameter compensation
Average KL divergence per feature dimension of components to a single-pass retrained system:
[Figure: KL divergence (vertical axis) against feature dimension (horizontal axis) for the uncompensated, VTS and DPMC systems.]
DPMC is better at model-based compensation than VTS.
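A per-dimension KL divergence of this kind could be computed as sketched below. The slide does not give the exact definition, so two things are assumptions: each dimension is treated as a univariate Gaussian (means and variances only), and the divergence is taken from the reference (single-pass retrained) component to the compensated one. `kl_per_dimension` and `average_kl` are hypothetical names.

```python
import numpy as np

def kl_per_dimension(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between univariate Gaussians, one per feature dimension."""
    mu_p, var_p, mu_q, var_q = map(np.asarray, (mu_p, var_p, mu_q, var_q))
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

def average_kl(mu_ref, var_ref, mu_cmp, var_cmp):
    """Average over components (rows), leaving one value per dimension."""
    return kl_per_dimension(mu_ref, var_ref, mu_cmp, var_cmp).mean(axis=0)
```

The inputs to `average_kl` are arrays of shape (components, dimensions), with matching components in matching rows.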

15 Different correlations in different noise conditions. Thanks to Hank Liao for the figure and Toshiba Research Ltd. for the data.

16 Compensating correlations
System: Resource Management with artificial noise; single-component system for robustness; known noise profile with diagonal $\Sigma_n$.
Word error rates:

Compensation                 Diagonal   Full
Uncompensated                38.2 %     64.6 %
VTS (diagonal statistics)    15.5 %     18.1 %
VTS (block statistics)       14.4 %     15.5 %
DPMC (full statistics)       13.0 %     10.8 %
Single-pass retrained        12.4 %      7.5 %

DPMC can compensate correlations. VTS cannot.

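18 Joint uncertainty decoding with DPMC
Joint uncertainty decoding currently uses VTS to find the joint distribution
\[ \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu^{(r)}_x \\ \mu^{(r)}_y \end{bmatrix}, \begin{bmatrix} \Sigma^{(r)}_x & \Sigma^{(r)}_{xy} \\ \Sigma^{(r)}_{yx} & \Sigma^{(r)}_y \end{bmatrix} \right). \]
But it can use DPMC.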

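19 Joint uncertainty decoding with DPMC
Draw $N$ samples from the distributions of
\[ \begin{bmatrix} x^s_{t-w} \\ \vdots \\ x^s_{t+w} \end{bmatrix}, \quad \begin{bmatrix} n^s_{t-w} \\ \vdots \\ n^s_{t+w} \end{bmatrix}, \quad h; \]
estimate
\[ \begin{bmatrix} x^s_{t-w} \\ \vdots \\ x^s_{t+w} \\ y^s_{t-w} \\ \vdots \\ y^s_{t+w} \end{bmatrix} \sim \mathcal{N}(\mu_{x,y}, \Sigma_{x,y}); \]
convert to
\[ \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_x & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_y \end{bmatrix} \right). \]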
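Estimating the joint distribution from paired samples can be sketched as follows. To keep the example short it makes two simplifying assumptions that the slides do not: only static parameters are used (no window, no conversion step), and $C$ is taken as the identity, so the mismatch function reduces to $y = \log(e^{x+h} + e^{n})$. The names `corrupt_log_spectral` and `estimate_joint` are illustrative.

```python
import numpy as np

def corrupt_log_spectral(x, n, h):
    """Mismatch function with C = I (assumption): y = log(exp(x + h) + exp(n))."""
    return np.logaddexp(x + h, n)

def estimate_joint(x, y):
    """Fit a joint Gaussian to paired samples; rows of x and y are samples."""
    d = x.shape[1]
    stacked = np.hstack([x, y])
    mu = stacked.mean(axis=0)
    Sigma = np.cov(stacked, rowvar=False)
    return {
        "mu_x": mu[:d], "mu_y": mu[d:],
        "Sigma_x": Sigma[:d, :d], "Sigma_xy": Sigma[:d, d:],
        "Sigma_yx": Sigma[d:, :d], "Sigma_y": Sigma[d:, d:],
    }

# Usage: draw clean speech and noise, corrupt, and fit the joint.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([1.0, 0.0], [[1.0, 0.2], [0.2, 0.5]], size=5000)
n = rng.multivariate_normal([0.0, 0.0], 0.1 * np.eye(2), size=5000)
joint = estimate_joint(x, corrupt_log_spectral(x, n, np.array([0.1, 0.1])))
```

The cross-covariance block comes for free because each corrupted sample is generated from a known clean sample.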

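20 Predictive semi-tied covariance matrices
Predictive semi-tied covariance matrices [6] convert joint uncertainty decoding,
\[ p(y_t \mid m) \approx \mathcal{N}\bigl(\tilde{y}_t;\ \mu^{(m)},\ \Sigma^{(m)} + \Sigma^{(r)}_{\text{bias}}\bigr), \]
to
\[ \mathcal{N}\bigl(A^{(r)} \tilde{y}_t;\ A^{(r)} \mu^{(m)},\ \Sigma^{(m)}_{\text{diag}}\bigr). \]
$\Sigma^{(r)}_{\text{bias}}$ is full, but $\Sigma^{(m)}_{\text{diag}}$ is diagonal, so that decoding is faster.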

21 Putting it together
Recognition hypothesis
Noise estimate: $\mu_n$, $\Sigma_n$, $\mu_h$
JUD transform: $A^{(r)}_{\text{jnt}}$, $b^{(r)}_{\text{jnt}}$, $\Sigma^{(r)}_{\text{bias}}$
Semi-tied covariance matrices: $A^{(r)}$, $\Sigma^{(m)}_{\text{diag}}$

23 Resource Management corpus (1000 word vocabulary); artificial noise at 20 dB; 9.5K components; 12 MFCCs, 0th coefficient, delta, acceleration; 16 regression classes; estimated noise profile [1] per speaker (other than SPR).

24 Joint uncertainty decoding

Joint    Transform   WER
VTS      Diagonal    9.5 %
DPMC     Diagonal    8.6 %
DPMC     Full        6.8 %

DPMC-based joint outperforms VTS-based joint. Full compensation is better than diagonal.

25 Overview

Compensation        WER
Clean               38.0 %
VTS, diagonal        8.5 %
DPMC-joint, full     6.8 %
semi-tied            6.8 %
SPR joint            7.4 %
SPR semi-tied        6.7 %

Speaker-specific transform except single-pass retrained (SPR). Semi-tied is faster to decode than joint, without the loss of accuracy.

27 DPMC compensation outperforms VTS: it models dynamic parameters better; it can model correlations; but it is slower: VTS form required.
DPMC-joint outperforms VTS-joint.
With predictive semi-tied covariance matrices, decoding is fast.
Future work: a faster method than DPMC; a more accurate form for the model.

28 References I
[1] H. Liao and M. J. F. Gales, "Joint uncertainty decoding for robust large vocabulary speech recognition," Cambridge University Engineering Department, Tech. Rep. CUED/F-INFENG/TR.552, November.
[2] P. J. Moreno, "Speech recognition in noisy environments," Ph.D. dissertation, Carnegie Mellon University.
[3] A. Acero, L. Deng, T. Kristjansson, and J. Zhang, "HMM adaptation using vector Taylor series for noisy speech recognition," in Proceedings of the International Conference on Spoken Language Processing, vol. 3, 2000.
[4] M. J. F. Gales, "Model-based techniques for noise robust speech recognition," Ph.D. dissertation, Cambridge University, 1995.

29 References II
[5] R. A. Gopinath, M. J. F. Gales, P. S. Gopalakrishnan, S. Balakrishnan-Aiyer, and M. A. Picheny, "Robust speech recognition in noise - performance of the IBM continuous speech recognizer on the ARPA noise spoke task," in Proceedings of the ARPA Workshop on Spoken Language System Technology, 1999.
[6] M. J. F. Gales and R. C. van Dalen, "Predictive linear transforms for noise robust speech recognition," in Proceedings of the Automatic Speech Recognition and Understanding Workshop, 2007.
