Multiple System Combination. Jinhua Du CNGL July 23, 2008

1 Multiple System Combination Jinhua Du CNGL July 23, 2008

2 Outline
- Introduction
- Motivation
- Current Achievements
- Combination Strategies
- Key Techniques
- System Combination Framework in IA
- Large-Scale Experiments & Analysis
- Conclusions & Future Work

3 System Combination Introduction
Motivation
- Many kinds of SMT methods have been proposed
- Methods shown: Rule-based, Example-based MT, Syntax-based, Phrase-based, Hierarchical PB, String-to-tree, Tree-to-string, Tree-to-tree

4 Motivation (cont.)
- The decoding process is an optimal search with risk (pruning)
- Different translation models give translation hypotheses with different characteristics
- Combine different and useful features to get better performance
  - Phrase-based: the continuity of N-grams
  - Syntax-based: more syntactic structure
  - Others: different characteristics
- Combination: MBR decoding & CN decoding
- Hewavitharana (CMU): ROVER was used in the IWSLT 2005 Eval
- Matusov: uses GIZA++ to align the hypotheses and constructs a WTN to generate a new result
- Sim (CU): uses the TER metric to align the N-best list and builds a CN to generate a new hypothesis
- Rosti (BBN): 1) three-level combination; 2) improved word-level combination

5 Current achievements
- In the 2006 NIST MT Eval, ISI was the first to use combination
  - Rank 1 in the C2E Large Data Track, overtaking Google for the first time
- In the 2008 NIST MT Eval, most of the organizations used combination (ISI, Google, Microsoft, SRI, BBN, ICT, IA, etc.)
  - Rank 1: MSR, which combined the results of MSR, NRC and SRI
  - IA (rank 13): combined 3 systems: PB, HPB & SBMT
- In the 2008 IWSLT Eval, NLPR in IA performed well by using combination techniques
  - Systems: PB (2), HPB, SAMT (CMU), BTG-based SMT (Xiong), Moses
- Current combination algorithms are robust, but new methods are also under way

6 Combination Strategies
Sentence-level
- By nature a kind of n-best re-ranking
- The key differences: system confidence & hypothesis confidence
- Log-linear model
- No new hypothesis is generated
Phrase-level
- Extract a new phrase table from each system's target-to-source phrase alignments
- Estimate the phrase confidence
- Add the phrase confidence as a feature to the log-linear model and re-decode
- New hypotheses are generated
Word-level
- The dominant combination strategy
- Derived from ASR
- Key techniques: MBR, Confusion Network

7 Key Techniques in Word-level Combination
Confusion Network
- Principle: directed acyclic graph
- Hyps:
  - May I have a table?
  - Could I get a desk?
  - Can I have the table?
- Alignment:
  - A skeleton
  - Alignment metrics
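The three example hypotheses on this slide happen to have the same number of tokens, so a toy confusion network can be sketched by aligning them position by position against the first hypothesis as skeleton and taking a majority vote in each slot. This is only an illustration under that equal-length assumption: the function names are hypothetical, and a real system would align with WER, TER or GIZA-TER instead.

```python
# Toy confusion network: one slot per skeleton word, filled by position.
# Assumption: all hypotheses have the same length, so no real aligner is needed.

HYPS = [
    "May I have a table?".split(),    # used as the skeleton
    "Could I get a desk?".split(),
    "Can I have the table?".split(),
]


def build_confusion_network(skeleton, others):
    """Return a list of slots; each slot holds the words competing at that position."""
    slots = [[word] for word in skeleton]
    for hyp in others:
        if len(hyp) != len(skeleton):
            raise ValueError("this toy aligner only handles equal-length hypotheses")
        for slot, word in zip(slots, hyp):
            slot.append(word)
    return slots


def vote(slots):
    """Pick the most frequent word in each slot.

    max() returns the first maximal element, and the skeleton word is first
    in every slot, so ties are broken in favour of the skeleton.
    """
    return [max(slot, key=slot.count) for slot in slots]


network = build_confusion_network(HYPS[0], HYPS[1:])
print(" ".join(vote(network)))  # May I have a table?
```

The first slot is a three-way tie (May / Could / Can), which is why the tie-breaking rule matters even in this small case.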

8 Key Techniques in Word-level Combination
Decoding
- Voting:
  $$\hat{w}_j = \arg\max_k P(w_{j,k}) = \arg\max_k \frac{N(w_{j,k})}{N_j}, \qquad j = 1, \dots, L$$
- With LM:
  $$\hat{E} = \arg\max \prod_{j=1}^{L} \Pr(w_j)\, P(w_j \mid w_{j-1} \dots w_1)^{\lambda}$$
- Log-linear Model:
  $$\log p(E_{j,n} \mid F_j) = \sum_{i=1}^{N_j - 1} \log\Big( \sum_{l=1}^{N_s} \lambda_l\, p(w \mid l, i) \Big) + \nu L(E_{j,n}) + \gamma P_L(E_{j,n}) + \mu N_{nulls}(E_{j,n}) + \xi N_{words}(E_{j,n})$$
Minimum Bayes-Risk Decoding
- Minimize the expected loss of translation errors to select the best hypothesis as the alignment skeleton
- Loss function: 4-gram BLEU
- MBR:
  $$L_{BLEU}(E, E') = 1 - \mathrm{BLEU}(E, E')$$
  $$R(E', A') = E_{P(E, A \mid F)}\big[ L\big((E, A), (E', A')\big) \big]$$
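A minimal sketch of MBR skeleton selection, assuming a pluggable loss and optional hypothesis posteriors (uniform when omitted). The slide uses 1 - BLEU with 4-gram BLEU as the loss; to stay short, the example swaps in a much cruder word-overlap loss, so the function names and the loss itself are illustrative assumptions rather than the presented system.

```python
from collections import Counter


def overlap_loss(e, e_prime):
    """Stand-in loss (NOT BLEU): 1 - shared word count / length of the longer sentence."""
    a, b = e.split(), e_prime.split()
    shared = sum((Counter(a) & Counter(b)).values())  # multiset intersection
    return 1.0 - shared / max(len(a), len(b), 1)


def mbr_select(hyps, posteriors=None, loss=overlap_loss):
    """Return the hypothesis with the lowest expected loss against all hypotheses."""
    if posteriors is None:
        posteriors = [1.0 / len(hyps)] * len(hyps)  # uniform posterior

    def risk(candidate):
        # Expected loss of `candidate` under the hypothesis posterior
        return sum(p * loss(e, candidate) for e, p in zip(hyps, posteriors))

    return min(hyps, key=risk)


HYPS = ["May I have a table?", "Could I get a desk?", "Can I have the table?"]

print(mbr_select(HYPS))                   # May I have a table?
print(mbr_select(HYPS, [0.2, 0.2, 0.6]))  # Can I have the table?
```

With uniform posteriors the first hypothesis has the lowest risk because it shares the most words with the other two; shifting weight toward the third hypothesis changes the selection, which is the effect that system-confidence weighting is meant to have.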

9 Key Techniques in Word-level Combination
Hypothesis posterior estimation
$$P(E_j \mid F) = \frac{P(E_j, F)}{\sum_{i=1,\dots,N} P(E_i, F)}$$
In fact, for computational convenience, we have two methods to estimate this:
- Uniform distribution:
  $$P(E_j \mid F) = \frac{1}{N}$$
- Adjust the weight according to system confidence:
  $$P(E_j \mid F) = \lambda_j \cdot \frac{1}{N}, \qquad \sum_{j=1}^{N} \lambda_j = 1$$

10 Key Techniques in Word-level Combination
Alignment Metrics
- WER (Word Error Rate):
  $$\mathrm{WER}(E, E_r) = \frac{Ins + Del + Sub}{N} \times 100\%$$
- TER (Translation Error Rate):
  $$\mathrm{TER}(E, E_r) = \frac{Ins + Del + Sub + Shft}{N} \times 100\%$$
- In both metrics, N is the number of words in the reference $E_r$
- GIZA-TER (improved):
  - Extract a target-to-target vocabulary from the N-best hypotheses
  - GIZA++ -> target-to-target alignment, then re-ordering
  - Use TER to build the CN

11 An example: use GIZA-TER to build CN
Original hypotheses
1. Economic Indicators Will Show That the European Economy Remained Weak Euro Area this Week
2. Euroland Economic Indicator Will Show That Europe Economy Remained Weak this Week
Alignment Pair by GIZA++
{1:2 2:3 3:4 4:5 5:6 7:7 8:8 9:9 10:10 13:11 14:12}
Original WER
1. ***** Economic Indicators Will Show That the European Economy Remained Weak Euro Area this Week
2. Euroland Economic Indicator Will Show That ***** Europe Economy Remained Weak ***** ***** this Week
Edit Distance: Subs-2; Ins-1; Del-3
WER: 42.86%
Phrase Shift according to Alignment Pair
1. Economic Indicators Will Show That the European Economy Remained Weak Euro Area this Week
2. Economic Indicator Will Show That Euroland Europe Economy Remained Weak this Week
WER after Phrase Shift
1. Economic Indicators Will Show That the European Economy Remained Weak Euro Area this Week
2. Economic Indicator Will Show That Euroland Europe Economy Remained Weak ***** ***** this Week
Edit Distance: Subs-3; Ins-0; Del-2
WER: 35.71%
WTN
Economic Indicators Will Show That the European Economy Remained Weak Euro Area this Week
Economic Indicator Will Show That Euroland Europe Economy this Week
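The two WER figures in this example can be checked with a plain word-level edit distance, dividing by the length of hypothesis 1 (14 words), which plays the role of the reference. The sketch below is a standard Levenshtein computation written for illustration; the function and constant names are assumptions, and the phrase shift itself is taken from the slide rather than computed.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word has no counterpart
                            curr[j - 1] + 1,      # hypothesis word has no counterpart
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """WER in percent, normalised by the reference length."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


REF = ("Economic Indicators Will Show That the European Economy "
       "Remained Weak Euro Area this Week")
HYP = ("Euroland Economic Indicator Will Show That Europe Economy "
       "Remained Weak this Week")
HYP_SHIFTED = ("Economic Indicator Will Show That Euroland Europe Economy "
               "Remained Weak this Week")

print(round(wer(REF, HYP), 2))          # 42.86
print(round(wer(REF, HYP_SHIFTED), 2))  # 35.71
```

Moving "Euroland" next to "Europe" turns one insertion plus one deletion into a single substitution, which is where the drop from 6 to 5 edits comes from.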

12 System Combination Framework in IA
Framework
- Use MBR decoding to select an optimal hypothesis from the N-best lists of all systems as the alignment reference.
- Use GIZA-TER to align the other hypotheses in the N-best lists with the reference.
- Use beam search to generate the optimal translation.
CN decoding
$$\log p(E_{j,n} \mid F_j) = \sum_{i=1}^{N_j - 1} \log\Big( \sum_{l=1}^{N_s} \lambda_l\, p(w \mid l, i) \Big) + \nu L(E_{j,n}) + \gamma P_L(E_{j,n}) + \mu N_{nulls}(E_{j,n}) + \xi N_{words}(E_{j,n})$$
$$\hat{E}_j(F_j \mid \theta) = \arg\max_{E_{j,n}} p(E_{j,n} \mid F_j)$$

13 Large-Scale Experiments & Analysis
Systems
System   Type               Features   Decoding
P-B      Phrase-based       10         Beam
H-PB     Hierarchical P-B   6          CKY
Corpus (LDC resources)
- Parallel: HK News & Hansards; Xinhua News; FBIS; NEs & Dic; etc.
- Monolingual: Gigaword 1&2 (Xinhua part)
- Test set: 2005 NIST MT Eval test set (1082)
Data Type     #Sent.        #Eng. Voc   #Chn. Voc
Parallel      3.4 Million   131K        118K
Monolingual   9.6 M M 1,679K

14 Experiment Tasks & Results
Task 1: MBR decoding
- Systems compared: P-B, H-PB, MBR-BLEU (1) (uniform), MBR-BLEU+ (2) (weights)
- Metrics: 2005-BLEU, NIST
Task 2: Alignment Reference
- Alignment references compared: P-B, H-PB, MBR-BLEU
- Metrics: BLEU, NIST
Notes
1: MBR-BLEU: uses a uniform distribution to estimate the hypothesis posterior
2: MBR-BLEU+: uses system weights to estimate the hypothesis posterior

15 Experiment Tasks & Results (Cont.)
Task 3: Alignment Methods
- Methods compared: WER, TER, GIZA-TER-NoVoc, GIZA-TER
- Metrics: 2005-BLEU, NIST
Task 4: Log-linear Model with More Features
- Feature sets compared: P(w), P(w) + P_LM
- Metrics: BLEU, NIST

16 Conclusions & Future Work
Conclusions
- Described the combination techniques and proposed a complete and effective combination framework
- Proposed an improved alignment metric: GIZA-TER
- Experimental results show that the combination framework performs better than any individual system used in the experiments
Future Work
- Introduce more features and useful information, such as word lattices, to improve combination performance
- Combine more systems of different types, such as tree-to-string (Liu), string-to-tree (Knight), BTG-based (Xiong) and so on
- Do research on different combination algorithms

17 Questions?

18 CAS-IA System Description Jinhua Du CNGL July 23, 2008

19 Outline
- Hardware in IA
- Pre-process & Data
- MT System Configuration for Evaluation
- Achievements
- Conclusions

20 Hardware
Machines (columns: Type; Operating System; Number; CPU; RAM)
- Desktop PC; Windows; Pentium 4, 3.0G; 2.0G
- Server; Linux (Ubuntu); 1; Xeon 2.0G; G
Parallel Computing
- Condor
- Grid Computing Module developed by ASR group

21 Pre-process & Data
Pre-processing
- Encoding conversion & filtering
- Punctuation and number conversion (full-shaped -> half-shaped, etc.)
- Case conversion (only the initial letter of the first word), abbreviation processing
- Chinese word segmentation (ICT or IA tool), English tokenization
Data for NIST
- Parallel: 3.4M (up to 10M if the UN corpus is added)
- Monolingual: 3.4M + 9.6M (Gigaword 1&2) + 1.4M (Gigaword 3) = 14.4M
Data for IWSLT
- Parallel: BTEC (20K or 40K); LDC
- Monolingual: BTEC; Gigaword
Data filter: only highly correlated data is needed, which is very important for the spoken-language evaluation (more and better data, better performance)

22 System Configuration
Modules
- Pre-processing
- Alignment
- Post-preprocessing & Models Generation
- Decoding & MER Training
- System Combination & Post-Processing

23 Achievements (zh-en)
The 3rd MT Symposium in China (rank 3)
- Limited (830K pairs)
- Unlimited (3M pairs)

24 Achievements (zh-en)
NIST MT Eval
- Systems: Primary (combination), HPB, STTB, PB
- Metrics: BLEU, IBM BLEU

25 Achievements (zh-en)
IWSLT2008
- More systems to be combined
  - 2 PB systems developed by CASIA
  - Moses
  - SAMT (CMU)
  - Hierarchical PB
  - BTG-based system (Xiong)
- Better performance
Result table columns: (bleu+meteor)/2, bleu, meteor (given twice); rows: tch.crr, nlpr.crr

26 Conclusions
- More and better data, better performance
- System combination is very helpful for improving performance
- Evaluation is different from theoretical research: empirical methods and tricks are usually more effective
- For a better rank, one should prepare in advance and build a temporary team for the evaluation
- Evaluation is a horrible thing for students: more time, more energy and fewer papers (a joke, but true)
- Develop systems for application purposes

27 Thanks
