Triplet Lexicon Models for Statistical Machine Translation


Triplet Lexicon Models for Statistical Machine Translation Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer lastname@cs.rwth-aachen.

1 Triplet Lexicon Models for Statistical Machine Translation
Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer
CLSP Student Seminar, February 6, 2009
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Germany
Ganitkevitch Triplet Lexicon Models for SMT 1 / 18 February 6, 2009

2 Introduction - Models in SMT
Lexical translation models, e.g. IBM model 1 (Brown et al.; 93):
\[ p(f_1^J \mid e_1^I) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} p_{\mathrm{ibm1}}(f_j \mid e_i) \]
Phrase-based translation (Zens, Och; 02), using a segmentation $S = (\tilde{f}_1^K, \tilde{e}_1^K, \tilde{a}_1^K)$:
\[ Pr(f_1^J \mid e_1^I) = \sum_{S} Pr(S \mid e_1^I) \cdot Pr(\tilde{f}_1^K \mid \tilde{e}_1^K, \tilde{a}_1^K) \]
[Figure: alignment matrix for "wenn ich eine Uhrzeit vorschlagen darf ?" and "if I may suggest a time of day ?"]
Language modeling: n-gram models
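The IBM model 1 formula above can be evaluated directly in log space. The following is a minimal sketch, not the authors' implementation: the function name `ibm1_log_prob`, the `<NULL>` token standing in for the empty word e_0, the probability floor, and the toy lexicon values are all assumptions made for illustration.

```python
import math

def ibm1_log_prob(f_words, e_words, lex, floor=1e-10):
    """Return log p(f_1^J | e_1^I) under IBM model 1.

    lex maps (f, e) to p_ibm1(f | e). The floor (an assumption of this
    sketch) avoids log(0) for source words without any lexicon entry.
    """
    e_ext = ["<NULL>"] + list(e_words)        # position i = 0 is the empty word
    I, J = len(e_words), len(f_words)
    log_p = -J * math.log(I + 1)              # the factor 1 / (I + 1)^J
    for f in f_words:                         # product over j = 1..J
        s = sum(lex.get((f, e), 0.0) for e in e_ext)   # sum over i = 0..I
        log_p += math.log(max(s, floor))
    return log_p

# Toy lexicon with invented probabilities.
toy_lex = {("pagar", "pay"): 0.9, ("factura", "bill"): 0.5, ("ley", "bill"): 0.4}
toy_lp = ibm1_log_prob(["pagar", "factura"], ["pay", "bill"], toy_lex)
```

With I = 2 and J = 2 the toy call evaluates (1/9) · 0.9 · 0.5 in log space.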

3 Motivation & Related Work
Most current approaches feature local context models:
  ... the bill    el proyecto de ley
  ... pay         pagar
Global context information could help with word sense disambiguation (WSD), paraphrasing and stylistic coherence
Approaches to broaden context:
  IBM model 1
  WSD features for phrases (Carpuat, Wu; 07)
  WSD features for hierarchical rules (Chan, Ng, Chiang; 07)

4 Triplet Model
Extension of IBM model 1 by a second trigger
Capture word splits (e.g. verb-prefix)
Train dependencies across phrases
Lexical WSD with broad context
Maximum likelihood training via the EM algorithm
Probability for the unconstrained triplet model p_all:
\[ p_{\mathrm{all}}(f_j \mid e_1^I) = \frac{2}{I(I-1)} \sum_{i=1}^{I} \sum_{k=i+1}^{I} \alpha_{\mathrm{all}}(f_j \mid e_i, e_k) \]
[Figure: alignment example for "dann fange ich einfach mal an ." and "I will just start ."]
Optional: min ≤ k - i ≤ max
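As a rough illustration of the unconstrained triplet probability, the sketch below averages the triplet parameters over all trigger pairs (e_i, e_k) with i < k. The function name `p_all`, the dictionary layout `(f, e, e')` for the parameters, and the example sentence are assumptions. The single lexicon value is the "pagar" entry for (bill, taxpayer) from the lexicon examples in this talk.

```python
from itertools import combinations

def p_all(f, e_words, alpha):
    """Return p_all(f | e_1^I): the average of alpha(f | e_i, e_k) over i < k.

    alpha maps (f, e, e') to a probability. Keys follow sentence order,
    i.e. e occurs before e' in e_words (a convention of this sketch).
    """
    I = len(e_words)
    if I < 2:                                  # no trigger pair exists
        return 0.0
    total = sum(alpha.get((f, e_i, e_k), 0.0)
                for e_i, e_k in combinations(e_words, 2))
    return 2.0 * total / (I * (I - 1))         # normalization 2 / (I (I - 1))

toy_alpha = {("pagar", "bill", "taxpayer"): 0.50}
toy_p = p_all("pagar", ["bill", "taxpayer", "pays"], toy_alpha)
```

Three target words give three trigger pairs, of which only (bill, taxpayer) has an entry, so the toy result is 0.50 / 3.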

5 Trigger Space
For every f_j, sum over all (e, e') in the trigger space TS(j):
\[ TS(j) = \{ (e_i, e_k) \mid i, k \in \{1, \dots, I\},\ i \neq k \} \]
[Figure: trigger space drawn as a matrix over e_1^I (axis e) and e_1^I (axis e')]
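The trigger space definition can be enumerated in a few lines. This is a sketch under the reading that the condition on the indices is i ≠ k, so that ordered pairs are produced; the function name `trigger_space` is hypothetical. For the unconstrained model the set does not depend on j.

```python
def trigger_space(e_words):
    """Enumerate all ordered trigger pairs (e_i, e_k) with i != k."""
    I = len(e_words)
    return [(e_words[i], e_words[k])
            for i in range(I)
            for k in range(I)
            if i != k]

toy_ts = trigger_space(["I", "will", "just", "start"])
```

For I = 4 this yields I(I - 1) = 12 ordered pairs, or 6 unordered ones, which matches the normalization 2 / (I(I - 1)) used when summing over i < k only.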

6 Trigger Space Comparison
Unconstrained triplet model p_all(f | e, e'):
  Context analogous to IBM1
  Large lexica (4.6G triplets on EPPS)
  Long training times (9h per iteration on EPPS)
Unconstrained triplet model with maximum distance constraint max:
  Restricts e' to local words around e
  Lexicon size reduced (2.2G for max = 10)
  Significantly shorter training time (3h on EPPS)
[Figures: trigger space matrices over e and e' for both variants]
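The effect of the maximum distance constraint on the number of trigger pairs can be sketched as follows. The function name `trigger_pairs` and the parameter name `max_dist` are assumptions. Positions are 1-based, as in the formulas of this talk.

```python
def trigger_pairs(I, max_dist=None):
    """Return index pairs (i, k) with 1 <= i < k <= I.

    If max_dist is given, only pairs with k - i <= max_dist are kept,
    which restricts the second trigger to words near the first one.
    """
    return [(i, k)
            for i in range(1, I + 1)
            for k in range(i + 1, I + 1)
            if max_dist is None or k - i <= max_dist]

toy_unconstrained = trigger_pairs(5)
toy_local = trigger_pairs(5, max_dist=2)
```

For a five-word sentence the constraint max = 2 keeps 7 of the 10 pairs. On long sentences the saving grows, which is in line with the smaller lexicon and shorter training time reported above.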

7 Trigger Space Comparison
Phrase-bounded triplet model p_phr(f | e, e'):
  Distinguishes local and distant context via forced alignments
  Significantly smaller lexica (1.3G on EPPS)
  Shorter training times (2.5h on EPPS)
Path-aligned triplet model p_align(f | e, e'):
  Fixes e to pre-aligned words
  Small lexicon size (260M on EPPS)
  Short training times (1h on EPPS)
[Figures: trigger space matrices over e and e' for both variants, with the phrase ẽ marked in the phrase-bounded case]

8 Examples - Lexicon Entries
EPPS, TC-Star 2007, English-Spanish, p_all & p_align

model     e      e'            f               p(f | e, e')
p_all     bill   countries     paguen          0.52
                               agrario         0.36
                               países          0.08
p_all     bill   taxpayer      pagar           0.50
                               factura         0.30
                               contribuyente   0.16
p_align   bill   taxpayer      factura         1.00
                 draft         ley             0.96
                 agriculture   factura         0.60
                               ley             0.39

9 Corpora & Rescoring
Models applied in n-best rescoring framework
Corpora and n-best lists used in experiments:

Corpus         Lang.   Training sent.   n-best lists        TER 1-best   TER Oracle
IWSLT 2007     ChEn    43k              10k-best, PBT
IWSLT 2008     ChEn    38k + 120k       10k-best, SysComb
TC-Star 2007   EnEs    1.3M             10k-best, PBT
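A minimal sketch of n-best rescoring follows. It assumes the common setup in which the triplet model log-probability is added to the baseline score with a scaling factor. The slide does not spell out the combination, so the function name `rescore`, the weight and all scores below are invented for illustration. The two hypotheses are taken from the translation examples in this talk.

```python
def rescore(nbest, weight):
    """Return the hypothesis with the best combined score.

    nbest: list of (hypothesis, baseline_score, triplet_log_prob).
    weight: scaling factor of the triplet model (assumed to be tuned on
    a development set).
    """
    best = max(nbest, key=lambda h: h[1] + weight * h[2])
    return best[0]

# Scores are made up. Only the hypothesis strings come from the talk.
toy_nbest = [
    ("I would like Hyde Park Hotel.", -10.0, -8.0),
    ("I would like a hotel close to Hyde Park.", -10.5, -6.0),
]
toy_first = rescore(toy_nbest, weight=0.0)   # baseline ranking only
toy_second = rescore(toy_nbest, weight=0.5)  # with the triplet model
```

With weight 0 the baseline 1-best is returned. With weight 0.5 the combined scores are -14.0 and -13.5, so the second hypothesis wins.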

10 Training Iterations - IWSLT 2007 ChEn
Perplexity of training and development sets, p_all, fe, e_0, occ 1
IWSLT 2007 Chinese-English, test05 used as development set
[Figure: train PPL, dev PPL and dev TER plotted over EM iterations; axes: perplexity, TER, EM iterations]

11 Histogram Pruning - IWSLT 2007 ChEn
Histogram pruning & coverage: p_all, fe
[Figure: lexicon size and coverage (% of events covered) over the occurrence threshold; TER on dev over the occurrence threshold]
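One way to read the plot is that triplets seen fewer times than an occurrence threshold are dropped, and coverage measures how many training events the remaining lexicon still explains. The sketch below follows that reading. It is an assumption about the procedure, and the function name `prune_by_occurrence` and the toy counts are invented.

```python
def prune_by_occurrence(counts, threshold):
    """Keep triplets whose count reaches the threshold.

    counts maps a triplet (f, e, e') to its number of occurrences.
    Returns the pruned dictionary and the coverage in percent, i.e. the
    share of all counted events that the kept triplets account for.
    """
    kept = {t: c for t, c in counts.items() if c >= threshold}
    total = sum(counts.values())
    coverage = 100.0 * sum(kept.values()) / total if total else 0.0
    return kept, coverage

toy_counts = {
    ("pagar", "bill", "taxpayer"): 5,
    ("factura", "bill", "taxpayer"): 3,
    ("ley", "bill", "taxpayer"): 1,
    ("agrario", "bill", "taxpayer"): 1,
}
toy_kept, toy_coverage = prune_by_occurrence(toy_counts, threshold=2)
```

In the toy data half of the entries are removed while 8 of the 10 events stay covered.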

12 Effect of Maximum Distance Constraint - EPPS EnEs
TC-Star 2007, English-Spanish, PBT lists, oracle TER
p_align, ef + fe, 10 EM iterations, using IBM4 word alignments

                  dev07         test06        test07
                  BLEU   TER    BLEU   TER    BLEU   TER
baseline
p_align, occ
  max =
  max =
  max =

13 Variant Comparison - EPPS EnEs
TC-Star 2007, English-Spanish, PBT lists, oracle TER
All ef + fe, 10 EM iterations

                          dev07         test06        test07
                          BLEU   TER    BLEU   TER    BLEU   TER    Memory   Time
baseline
IBM1 fe
p_align, max = 5, occ                                               G        8.4h
p_phr, max =, occ                                                   G        24h
p_all, max = 10, occ                                                G        28h

14 Evaluation Results - IWSLT 2008 ChEn
IWSLT 2008, Chinese-English, BTEC CRR
System combination lists, oracle TER 20.13, optimized on test05
Using p_all, ef + fe, no e_0, occ 2, 20 EM iterations, trained on additional HIT data

                     test05        test08
                     BLEU   TER    BLEU   WER
baseline
IBM1
p_all
NGram + WP + LM
  IBM1
  p_all

15 Examples - Translation Improvements
IWSLT 2008, Chinese-English, BTEC CRR
  source     我要靠近海德公园的酒店
  reference  I would like a hotel near the Hyde Park.
  baseline   I would like Hyde Park Hotel.
  p_all      I would like a hotel close to Hyde Park.
GALE 2008, Chinese-English, Newswire
  source     中国队下半场也换了 3 名球员, 效果则不佳
  reference  china also substituted three players in the second half...
  baseline   the chinese team in the second half, have been changed for three...
  p_all      the chinese team also replaced three players in the second half...

16 Summary
Introduced triplet model p_all(f | e, e')
Large lexicon sizes
  Histogram pruning
  Constrained models p_phr and p_align
Long training times
  Maximum distance constraint max
EM training converges quickly
Triplet model is competitive in rescoring on a wide range of tasks
Slight improvements over IBM1 (BLEU)

17 Outlook
Incorporation into the decoder
Explicitly restrict trigger positions: i < k for (f, e_i, e_k)
Train triplet model on word classes to reduce lexicon size: p_all(f | e, c_{e'})
Introduce a trigger distance prior: p_all(f | e, e') · d(f, e, e')

18 Thank you for your attention
Juri Ganitkevitch

19 Literature
Dempster, Laird; 77
  A. P. Dempster, N. M. Laird, and D. B. Rubin: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, pp. 1-22, 1977.
Brown, Mercer; 93
  Brown, Della Pietra, Della Pietra, Mercer (IBM Research): The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, Vol. 19, No. 2, 1993.
Rosenfeld; 96
  Rosenfeld (CMU): A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language, Vol. 10, 1996.
Tillmann, Ney; 97
  Tillmann, Ney (RWTH Aachen): Word Triggers and the EM Algorithm. Proc. of SIG Workshop on Computational Natural Language Learning (ACL), 1997.
Zens, Och; 02
  Zens, Och, Ney (RWTH Aachen): Phrase-Based Statistical Machine Translation. German Conf. on Artificial Intelligence, September 2002.

20 Literature
Och, Ney; 04
  Och, Ney (RWTH Aachen): The alignment template approach to statistical machine translation. Computational Linguistics, Vol. 30, No. 4, 2004.
Carpuat, Wu; 07
  Carpuat, Wu (HKU): Improving Statistical Machine Translation using Word Sense Disambiguation. Proc. of 2007 Joint Conference on Empirical Methods in NLP and CoNLL, 2007.
Chan, Ng, Chiang; 07
  Chan, Ng, Chiang (University of Singapore, ISI): Word Sense Disambiguation improves Statistical Machine Translation. Proc. of the 45th Annual Meeting of the ACL, 2007.
Hasan et al.; 08
  S. Hasan, J. Ganitkevitch, H. Ney and J. Andrés Ferrer (RWTH Aachen): Triplet Lexicon Models for Statistical Machine Translation. Proc. of the Conference on Empirical Methods in Natural Language Processing, 2008.

21 Triplet Model Equations
Unconstrained triplet model:
\[ p_{\mathrm{all}}(f_j \mid e_1^I) = \frac{2}{I(I-1)} \sum_{i=1}^{I} \sum_{k=i+1}^{I} \alpha_{\mathrm{all}}(f_j \mid e_i, e_k) \]
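The talk states that the parameters are trained with the EM algorithm but gives no update equations. The sketch below shows one EM iteration in the style of IBM model 1 training, treating the trigger pair as the hidden variable. This is an assumption about the procedure, and the function name `em_iteration`, the uniform start value `init` and the toy corpus are invented.

```python
from collections import defaultdict
from itertools import combinations

def em_iteration(corpus, alpha, init=1.0):
    """One EM iteration for the unconstrained triplet model (sketch).

    corpus: list of (f_words, e_words) sentence pairs.
    alpha:  dict (f, e, e') -> probability; missing entries fall back to
            init, so an empty dict gives a uniform start.
    Returns new parameters, normalized over f for every pair (e, e').
    """
    counts = defaultdict(float)
    for f_words, e_words in corpus:
        pairs = list(combinations(e_words, 2))       # trigger pairs, i < k
        for f in f_words:
            weights = [alpha.get((f, e, e2), init) for e, e2 in pairs]
            z = sum(weights)
            if z == 0:                               # no pairs or no mass
                continue
            for (e, e2), w in zip(pairs, weights):   # E-step: posteriors
                counts[(f, e, e2)] += w / z
    totals = defaultdict(float)
    for (f, e, e2), c in counts.items():
        totals[(e, e2)] += c
    # M-step: relative frequencies of the expected counts
    return {(f, e, e2): c / totals[(e, e2)] for (f, e, e2), c in counts.items()}

toy_corpus = [(["casa", "verde"], ["green", "house"]),
              (["casa"], ["the", "house"])]
toy_params = em_iteration(toy_corpus, {})
```

In the toy corpus every sentence has a single trigger pair, so the posteriors are trivially 1 and the estimates are plain relative frequencies.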

22 Triplet Model Equations
For a given sentence pair and word alignment $(f_1^J, e_1^I, A)$:
Let $a_{ij} = 1$ if $f_j$ is aligned to $e_i$, otherwise $a_{ij} = 0$; $A = \{a_{ij}\}$
Path-aligned triplet model:
\[ p_{\mathrm{align}}(f_j \mid e_1^I, A) = \frac{1}{Z_j} \sum_{i=1}^{I} \sum_{i'=1}^{I} a_{ij} \, (1 - \delta(i, i')) \, \alpha_{\mathrm{align}}(f_j \mid e_i, e_{i'}) \]
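The path-aligned sum can be sketched as below. The slide does not define the normalization Z_j, so this sketch assumes it is the number of (i, i') pairs that pass both indicator terms. The function name `p_align`, the 0-based alignment set and the example sentence are also assumptions. The lexicon value is the "factura" entry for (bill, taxpayer) from the lexicon examples.

```python
def p_align(j, f_words, e_words, alignment, alpha):
    """Return p_align(f_j | e_1^I, A) for source position j (0-based).

    alignment: set of (i, j) pairs meaning f_j is aligned to e_i (a_ij = 1).
    alpha:     dict (f, e, e') -> probability, with e the aligned word.
    """
    f = f_words[j]
    total, z = 0.0, 0
    for i in range(len(e_words)):
        if (i, j) not in alignment:        # a_ij = 0: e_i is not aligned
            continue
        for i2 in range(len(e_words)):
            if i2 == i:                    # 1 - delta(i, i') = 0
                continue
            total += alpha.get((f, e_words[i], e_words[i2]), 0.0)
            z += 1                         # assumed definition of Z_j
    return total / z if z else 0.0

toy_alpha_align = {("factura", "bill", "taxpayer"): 1.00}
toy_p_align = p_align(0, ["factura"], ["taxpayer", "pays", "bill"],
                      {(2, 0)}, toy_alpha_align)
```

The first trigger is fixed to the aligned word "bill", and the second trigger ranges over the two remaining words, so the toy result is 1.00 / 2.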

23 Triplet Model Equations
For a given sentence pair and segmentation $(f_1^J, e_1^I, s_1^M)$ with $s_m = (\tilde{f}_m, \tilde{e}_m)$:
Let $\pi_{ij} = 1$ if $\exists m : f_j \in \tilde{f}_m \wedge e_i \in \tilde{e}_m$, otherwise $\pi_{ij} = 0$; $\Pi = \{\pi_{ij}\}$
Phrase-bounded triplet model:
\[ p_{\mathrm{phr}}(f_j \mid e_1^I, \Pi) = \frac{1}{Z_j} \sum_{i=1}^{I} \sum_{i'=1}^{I} \pi_{ij} \, (1 - \pi_{i'j}) \, \alpha_{\mathrm{phr}}(f_j \mid e_i, e_{i'}) \]
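The phrase-bounded sum takes the first trigger from inside the phrase pair that covers f_j and the second trigger from outside it. The sketch below follows that reading. As in the path-aligned case, Z_j is assumed to be the number of contributing pairs, and the function name `p_phr`, the segmentation format, the example sentence and the probability value are invented.

```python
def p_phr(j, f_words, e_words, phrases, alpha):
    """Return p_phr(f_j | e_1^I, Pi) for source position j (0-based).

    phrases: list of (f_positions, e_positions) pairs of sets, one per
             phrase pair of the segmentation.
    alpha:   dict (f, e, e') -> probability.
    """
    f = f_words[j]
    inside = set()                          # all i with pi_ij = 1
    for f_pos, e_pos in phrases:
        if j in f_pos:
            inside |= set(e_pos)
    total, z = 0.0, 0
    for i in sorted(inside):                # first trigger: inside the phrase
        for i2 in range(len(e_words)):
            if i2 in inside:                # 1 - pi_i'j = 0
                continue
            total += alpha.get((f, e_words[i], e_words[i2]), 0.0)
            z += 1                          # assumed definition of Z_j
    return total / z if z else 0.0

toy_phrases = [({0}, {0}), ({1, 2}, {1, 2})]   # "pagar | pay", "la factura | the bill"
toy_alpha_phr = {("factura", "bill", "pay"): 0.8}
toy_p_phr = p_phr(2, ["pagar", "la", "factura"], ["pay", "the", "bill"],
                  toy_phrases, toy_alpha_phr)
```

For "factura" the inside words are "the" and "bill" and the only outside word is "pay", so two pairs contribute and the toy result is 0.8 / 2.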

24 Evaluation Results - IWSLT 2008 ArEn
IWSLT 2008, Arabic-English, BTEC CRR
Using p_all, ef + fe, no e_0, occ 2, 20 EM iterations
23k training sentences, only LM uses additional data

                           test05               test08
                           BLEU   TER   WER     BLEU   WER
baseline
IBM1 fe
p_all
p_all + 6-gram LM + WP

25 Examples - Lexicon Entries
EPPS, TC-Star 2007, English-Spanish, IBM model 1

model   e      f          p(f | e)
IBM1    bill   factura    0.19
               ley        0.18
               proyecto   0.11
               pagar      0.07
