Statistical Phrase-Based Speech Translation
Lambert Mathias (1), William Byrne (2)
(1) Center for Language and Speech Processing, Department of Electrical and Computer Engineering, Johns Hopkins University
(2) Machine Intelligence Laboratory, Department of Engineering, Cambridge University
May 5, 2006 / CLSP Student Seminar
Outline 1 2 3 4
A model-based approach to translation is easy to formulate
- Target Speech: $A$, with model $P(A \mid t_1^J)$
- Target Sentence: $t_1^J$, with model $P(t_1^J \mid s_1^I)$
- Source Sentence: $s_1^I$, with model $P(s_1^I)$
Serial Architecture
- Recognition: $\hat{t}_1^J = \operatorname{argmax}_{t_1^J} P(A \mid t_1^J)\, P(t_1^J)$
- Translation: $\hat{s}_1^I = \operatorname{argmax}_{s_1^I} P(\hat{t}_1^J \mid s_1^I)\, P(s_1^I)$
Integrated Architecture
- $\hat{s}_1^I = \operatorname{argmax}_{s_1^I} P(s_1^I) \left\{ \max_{t_1^J} P(t_1^J \mid s_1^I)\, P(A \mid t_1^J) \right\}$
Given the ASR models and the translation models, speech translation is easy to do!
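The difference between the two decision rules can be seen on a toy example. The sketch below is only illustrative: the two ASR hypotheses, the two candidate translations, and every probability value are invented, and the function names `serial_decode` and `integrated_decode` are hypothetical.

```python
# Toy comparison of the serial and integrated decision rules.
# All probabilities are invented for illustration only.
# Notation follows the slides: t = spoken (target) string, s = output (source) string.

# Acoustic likelihoods P(A | t) for two competing ASR hypotheses.
acoustic = {"t1": 0.6, "t2": 0.4}
# ASR language model P(t), used only by the serial recognizer.
asr_lm = {"t1": 0.5, "t2": 0.5}
# Translation model P(t | s).
trans = {
    ("t1", "s1"): 0.3, ("t2", "s1"): 0.1,
    ("t1", "s2"): 0.2, ("t2", "s2"): 0.8,
}
# Language model P(s) on the translation output side.
out_lm = {"s1": 0.5, "s2": 0.5}


def serial_decode():
    """Commit to the 1-best ASR string, then translate it."""
    t_hat = max(acoustic, key=lambda t: acoustic[t] * asr_lm[t])
    s_hat = max(out_lm, key=lambda s: trans[(t_hat, s)] * out_lm[s])
    return t_hat, s_hat


def integrated_decode():
    """Search jointly over s and t: argmax_s P(s) max_t P(t|s) P(A|t)."""
    def score(s):
        return out_lm[s] * max(trans[(t, s)] * acoustic[t] for t in acoustic)

    s_hat = max(out_lm, key=score)
    t_best = max(acoustic, key=lambda t: trans[(t, s_hat)] * acoustic[t])
    return t_best, s_hat


print("serial:    ", serial_decode())
print("integrated:", integrated_decode())
```

With these made-up numbers the serial system commits to `t1` before translation, while the integrated search is free to pick the acoustically weaker `t2` because it translates much better.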
- Recovering from ASR errors
- Translating alternative hypotheses
- Processing the monolingual information available on the target side
- Correcting disfluencies on the target side
Coupling ASR to MT
- 1-best transcription
- N-best lists (R. Zhang et al. 2004, V.H. Quan et al. 2005)
- Word graphs (S. Saleem et al. 2004, E. Matusov et al. 2005)
Key Idea: Maximum ASR signal transfer to the translation component
Variables, models, and FSMs
- Target Speech $A$: model $P(A \mid t_1^J)$, FSM $L$ (ASR Word Lattice)
- Target Sentence $t_1^J$: model $P(t_1^J \mid v_1^R)$, FSM $\Omega$ (Target Phrase Segmentation Transducer)
- Target Phrase $v_1^R$: model $P(v_1^R \mid u_1^K)$, FSM $\Phi$ (Phrase Translation, Reordering Transducer)
- Source Phrase $u_1^K$: model $P(u_1^K \mid s_1^I)$, FSM $W$ (Source Phrase Segmentation Transducer)
- Source Sentence $s_1^I$: model $P(s_1^I)$, FSM $G$ (Source Language Model)
The final translation is given by
$\hat{s}_1^I = \operatorname{argmax}_{s_1^I} \left\{ \max_{t_1^J \in L} \underbrace{P(A \mid t_1^J)}_{\text{ASR Word Lattice}} \; \max_{v_1^R, u_1^K, K} \underbrace{P(t_1^J, v_1^R, u_1^K \mid s_1^I)\, P(s_1^I)}_{\text{Translation Model}} \right\}$
The translation is from a lattice of phrase sequences
- Target phrase segmentation: $P(A \mid v_1^R) = P(v_1^R \mid t_1^J)\, P(A \mid t_1^J)$
- Corresponding FSM: $Q = \Omega \circ L$
- Acoustic scores retained during target segmentation
- Implemented as a best-path search through the translation FSM $T$: $T = G \circ W \circ \Phi \circ Q$
- Simple formulation with minimal changes to existing models
We are now ready to translate speech!
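To make the idea of a phrase lattice that retains acoustic scores concrete, here is a minimal plain-Python stand-in for the segmentation step. It is not the FSM implementation from the slides: the lattice, the phrase inventory, the costs (treated as negative log acoustic scores), and the function name `phrase_paths` are all assumptions made for this sketch.

```python
# Plain-Python stand-in for mapping a word lattice to phrase sequences.
# Hypothetical data; costs play the role of negative log acoustic scores.

# Word lattice arcs: (from_state, to_state, word, acoustic_cost).
ARCS = [
    (0, 1, "buenos", 1.0),
    (0, 1, "bueno", 1.5),
    (1, 2, "dias", 0.5),
    (2, 3, "senor", 0.7),
]
# Phrase inventory: word tuples that may be emitted as one phrase.
INVENTORY = {
    ("buenos",), ("bueno",), ("dias",), ("senor",), ("buenos", "dias"),
}


def phrase_paths(arcs, start, final, inventory, max_len=3):
    """Enumerate (phrase sequence, acoustic cost) pairs for an acyclic lattice."""
    outgoing = {}
    for arc in arcs:
        outgoing.setdefault(arc[0], []).append(arc)
    results = []

    def extend(state, phrases, cost):
        # A phrase has just been closed (or we are at the start).
        if state == final:
            results.append((tuple(phrases), cost))
            return
        grow(state, phrases, (), cost)

    def grow(state, phrases, words, cost):
        # Extend the currently open phrase by one word along each arc.
        for _, nxt, word, arc_cost in outgoing.get(state, []):
            cand = words + (word,)
            if len(cand) > max_len:
                continue
            if cand in inventory:
                # Close the phrase here; the acoustic cost is carried along.
                extend(nxt, phrases + [" ".join(cand)], cost + arc_cost)
            # Also keep the phrase open and try to make it longer.
            grow(nxt, phrases, cand, cost + arc_cost)

    extend(start, [], 0.0)
    return results


paths = phrase_paths(ARCS, start=0, final=3, inventory=INVENTORY)
for phrases, cost in sorted(paths, key=lambda p: p[1]):
    print(f"{cost:.2f}  {' | '.join(phrases)}")
```

Each word path through the lattice can yield several phrase segmentations, all sharing the same acoustic cost, which is why ambiguity in phrase extraction grows quickly with lattice size.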
NOTE
Original Problem: How to translate ASR word lattices?
New Problem: How to efficiently extract phrases from ASR lattices?
- Phrases are extracted using the GRM Library
- Controlling ambiguity in phrase extraction
  - Pruning the ASR word lattice
  - Extract phrases under the posterior distribution
    $Q = P(v_1^R \mid A) = \sum_{t_1^J} \frac{P(v_1^R \mid t_1^J)\, P(A \mid t_1^J)\, P(t_1^J)}{P(A)}$
The target LM $P(t_1^J)$ does not show up in the original formulation!
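The posterior over phrase sequences can be illustrated on a tiny N-best list standing in for a pruned lattice. This is a sketch under stated assumptions: the hypotheses and log scores are invented, $P(A)$ is approximated by summing over the list only, and $P(v_1^R \mid t_1^J)$ is assumed uniform over the segmentations allowed by a hypothetical phrase inventory.

```python
import math
from collections import defaultdict

# Hypothetical N-best list: (words, log P(A|t), log P(t)).
NBEST = [
    (("buenos", "dias"), -10.0, -1.0),
    (("bueno", "dias"), -9.5, -3.0),
]
# Hypothetical phrase inventory.
INVENTORY = {("buenos",), ("bueno",), ("dias",), ("buenos", "dias")}


def hypothesis_posteriors(nbest):
    """P(t|A) = P(A|t) P(t) / P(A), with P(A) summed over the list only."""
    joint = [ac + lm for _, ac, lm in nbest]
    top = max(joint)
    log_p_a = top + math.log(sum(math.exp(j - top) for j in joint))
    return {words: math.exp(j - log_p_a) for (words, _, _), j in zip(nbest, joint)}


def segmentations(words, inventory, max_len=3):
    """All ways to split a word tuple into phrases from the inventory."""
    if not words:
        return [()]
    segs = []
    for n in range(1, min(max_len, len(words)) + 1):
        head = tuple(words[:n])
        if head in inventory:
            for rest in segmentations(words[n:], inventory, max_len):
                segs.append((head,) + rest)
    return segs


def phrase_sequence_posteriors(nbest, inventory):
    """Q(v) = sum_t P(v|t) P(t|A), assuming P(v|t) is uniform over segmentations."""
    q = defaultdict(float)
    for words, p_t in hypothesis_posteriors(nbest).items():
        segs = segmentations(words, inventory)
        for v in segs:
            q[v] += p_t / len(segs)
    return dict(q)


post_t = hypothesis_posteriors(NBEST)
post_v = phrase_sequence_posteriors(NBEST, INVENTORY)
for v, p in sorted(post_v.items(), key=lambda kv: -kv[1]):
    print(f"{p:.3f}  {v}")
```

In this toy list the acoustic score alone prefers the second hypothesis, but the target LM term reverses the ranking, which is the role $P(t_1^J)$ plays in the posterior.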
- Well-formedness of target sentence in text-based MT
- Weak translation models
- Need to choose the right $t_1^J$ from a set of hypotheses
- Experimentally shown to improve translation quality [1]
- Need to correctly incorporate the target LM (in progress...)
[1] D. Dechelotte, H. Schwenk, Jean-Luc Gauvain, Olivier Galibert, and Lori Lamel. Investigating Translation of Parliament Speeches. ASRU, 2005.
A neat trick to include the target LM $P(t_1^J)$ [2]
$\hat{s}_1^I = \operatorname{argmax}_{s_1^I} \left\{ \max_{t_1^J} P(t_1^J, s_1^I)\, P(A \mid t_1^J) \right\}$
$P(t_1^J, s_1^I) = \prod_{j=1}^{J} P(t_j, \tilde{s}_j \mid t_{j-m}^{j-1}, \tilde{s}_{j-m}^{j-1})$
- Involves a complicated procedure in order to estimate the m-gram tuple-based model
- Considers only a single segmentation of the parallel text
So what's new in our framework?
- Unified modeling framework of the underlying ASR and SMT system
- Different model parameterization
- Direct extension of the text-based MT system, straightforward implementation
[2] E. Matusov, S. Kanthak, H. Ney. On the integration of speech recognition and statistical machine translation. InterSpeech, 2005.
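The tuple m-gram factorization can be evaluated mechanically once tuple probabilities are available. The sketch below only shows how the product is accumulated; it does not reproduce the estimation procedure of the cited work, and the tuples, the probability table, the padding symbol, and the floor value are all hypothetical.

```python
import math

# Hypothetical bigram (m = 1) model over tuples (t_j, s_j):
# key = (history of m tuples, current tuple), value = probability.
PAD = ("<s>", "<s>")
MODEL = {
    ((PAD,), ("buenos", "good")): 0.5,
    ((("buenos", "good"),), ("dias", "morning")): 0.4,
}


def joint_log_prob(tuples, model, m=1, floor=1e-6):
    """log P(t, s) = sum_j log P(tuple_j | previous m tuples).

    Unseen events get a small floor probability (an assumption of this sketch,
    standing in for proper smoothing).
    """
    padded = [PAD] * m + list(tuples)
    total = 0.0
    for j in range(m, len(padded)):
        history = tuple(padded[j - m:j])
        total += math.log(model.get((history, padded[j]), floor))
    return total


aligned = [("buenos", "good"), ("dias", "morning")]
log_p = joint_log_prob(aligned, MODEL)
print(f"log P(t, s) = {log_p:.4f}")
```

Because the model scores one fixed sequence of tuples, it commits to a single segmentation of the sentence pair, which is the limitation noted on the slide.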
OpenLab 2006 EPPS Spanish to English task, 2005 TC-STAR evaluation data

Monotone phrase order:

| Spanish Source | DEV | EVAL |
|---|---|---|
| Verbatim Transcription | - | 44.16 |
| ASR 1-Best | 44.48 | 39.71 |
| ASR lattice | 44.74 | 39.93 |

How many new foreign phrases does speech translation introduce?

| | Verbatim transcription | ASR 1-best | ASR pruned |
|---|---|---|---|
| #Spanish phrases | 58438 | 96991 | 163065 |

How many new foreign phrases were found in bitext?

| | Verbatim transcription | ASR 1-best | ASR pruned |
|---|---|---|---|
| #Spanish phrases | 24511 | 24481 | 33138 |
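A few derived quantities help in reading these tables. The sketch below uses only the numbers reported above; it assumes that the second phrase count is the subset of the first that was found in the bitext, which the slide suggests but does not state outright.

```python
# Numbers copied from the results tables; None marks the missing DEV entry.
scores = {
    "Verbatim Transcription": {"DEV": None, "EVAL": 44.16},
    "ASR 1-Best": {"DEV": 44.48, "EVAL": 39.71},
    "ASR lattice": {"DEV": 44.74, "EVAL": 39.93},
}
extracted = {"Verbatim transcription": 58438, "ASR 1-best": 96991, "ASR pruned": 163065}
in_bitext = {"Verbatim transcription": 24511, "ASR 1-best": 24481, "ASR pruned": 33138}

# Gain of lattice translation over 1-best translation on each set.
lattice_gain = {
    split: scores["ASR lattice"][split] - scores["ASR 1-Best"][split]
    for split in ("DEV", "EVAL")
}
# Remaining gap between lattice translation and verbatim transcripts (EVAL only).
gap_to_verbatim = scores["Verbatim Transcription"]["EVAL"] - scores["ASR lattice"]["EVAL"]
# Fraction of extracted phrases found in the bitext (assumed subset relation).
coverage = {name: in_bitext[name] / extracted[name] for name in extracted}

for split, gain in lattice_gain.items():
    print(f"lattice vs 1-best on {split}: {gain:+.2f}")
print(f"verbatim vs lattice on EVAL: {gap_to_verbatim:+.2f}")
for name, frac in coverage.items():
    print(f"{name}: {frac:.1%} of extracted phrases found in bitext")
```

Under that reading, the lattice nearly triples the number of extracted phrases relative to the verbatim transcript, but a shrinking fraction of them is covered by the bitext.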
- Presented a generative model of speech-to-text translation [3]
- Tight coupling of the ASR and SMT models via word lattices
- Initial results on speech translation a little disappointing
- Modeling problems
  - Proper integration of the target LM
  - Phrases extracted from ASR lattices are not in bitext
- ASR errors
- Disfluencies, silences and other spoken language phenomena
- ASR output: error correction, disfluency removal, inserting phrase boundaries, SU detection, etc.
[3] L. Mathias and W. Byrne. Statistical phrase-based speech translation. ICASSP, 2006.