Bitext Alignment for Statistical Machine Translation
1 Bitext Alignment for Statistical Machine Translation
Yonggang Deng
Advisor: Prof. William Byrne
Thesis Committee: Prof. William Byrne, Prof. Trac Tran, Prof. Jerry Prince, and Prof. Gerard Meyer
Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD
December 2005
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 1 / 42
2 Bitext and Bitext Alignment
Bitext: a collection of text in two languages.
Bitext alignment: finding translation equivalence within bitext.
Example (the Chinese source text is garbled in this transcription), English side: "It is necessary to resolutely remove obstacles in rivers and lakes. 4. It is necessary to strengthen monitoring and forecast work and scientifically dispatch people and materials. It is necessary to take effective measures and try by every possible means to provide precision forecast. Before the flood season comes, it is necessary to seize the time to formulate plans for forecasting floods and to carry out work with clear..."
5 Why automatic bitext alignment?
Critical and beneficial in many multilingual NLP tasks: it provides the basic ingredients for building a machine translation system, and hand alignment is expensive for large corpora.
Desired properties:
- language independent: Chinese, Arabic, Spanish, French, ...
- no linguistic knowledge: from scratch, unsupervised, statistical
- huge amounts of data: effectiveness and efficiency
6 Statistical Machine Translation (SMT)
Source-channel model: a target-language sentence E passes through a noisy channel P(F|E) to produce the observed foreign sentence F; decoding inverts the channel:
Ê = argmax_E P(E) P(F|E)
The translation model P(F|E) needs BITEXTs. (The Chinese/English example on this slide is garbled in the transcription.)
7 Outline
1. Bitext Chunk Alignment: Chunk Alignment, Chunking Algorithms, Sentence Alignment Results
2. Bitext Word Alignment: Introduction/Motivation, Word-to-Phrase HMM Model, Parameter Estimation, Word Alignment Results
3. Bitext Phrase Alignment: Inducing from Word Alignments, Model-based Phrase Pair Posterior, Translation Results
4. Conclusions
8 Outline (section marker): Bitext Chunk Alignment, Model
13 Chunk Alignment: Bitext Chunk Alignment Model
Problem: sentences are not translated 1-to-1 in sequence; 1-to-n, n-to-1, m-to-n, and order changes make real data a challenge.
A statistical generative chunk alignment model (Deng et al., 04):
- introduce a hidden chunk alignment variable
- document generating: fill in the blank
- two alignment algorithms are derived in a straightforward manner
Generative story: given an English document e = e_1^m with marked sentence boundaries (in the example, 5 sentences spanning words 1-8, 9-20, 21-30, 31-38, 39-50), choose the number of foreign sentences n with probability α(n|m), the number of chunks K with β(K|m,n) (K = 3 in the example, with chunk tuples a_1 = (1,2,1,1), a_2 = (3,4,2,3), a_3 = (5,5,4,4)), the chunk alignment a_1^K with P(a_1^K|m,n,K), and translate each aligned chunk:
P(f_1^n, a_1^K | e_1^m) = α(n|m) β(K|m,n) P(a_1^K|m,n,K) ∏_{k=1}^K P(f(a_k) | e(a_k))
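The factored likelihood above is just a product of table lookups once the component distributions are estimated; a minimal sketch, with all tables and chunk labels invented for illustration:

```python
def chunk_alignment_likelihood(m, n, alignment, alpha, beta, p_align, chunk_prob):
    """P(f_1^n, a_1^K | e_1^m) = alpha(n|m) * beta(K|m,n) * P(a_1^K|m,n,K)
    * prod_k P(f(a_k) | e(a_k)). Every table here is a hypothetical
    stand-in for the estimated distributions."""
    K = len(alignment)
    p = alpha.get((n, m), 0.0) * beta.get((K, m, n), 0.0)
    p *= p_align.get(tuple(alignment), 0.0)
    for chunk_pair in alignment:            # P(f(a_k) | e(a_k)) per chunk
        p *= chunk_prob.get(chunk_pair, 0.0)
    return p

# Toy example: 3 English sentences, 2 foreign sentences, K = 2 chunk pairs.
alignment = [("e1", "f1"), ("e2 e3", "f2")]
p = chunk_alignment_likelihood(
    m=3, n=2, alignment=alignment,
    alpha={(2, 3): 0.5},                    # alpha(n=2 | m=3)
    beta={(2, 3, 2): 0.4},                  # beta(K=2 | m=3, n=2)
    p_align={tuple(alignment): 0.5},        # P(a_1^K | m, n, K)
    chunk_prob={("e1", "f1"): 0.6, ("e2 e3", "f2"): 0.5})
```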
14 Outline (section marker): Bitext Chunk Alignment, Algorithms
15 Bitext Chunk Alignment Algorithms: Dynamic Programming (DP)
Monotone chunk alignment; guarantees the global optimum. Each DP step scores a candidate chunk pair over sentences e1..e5 and f1..f4, e.g. p(1,2)p(f1|e1,e2), p(2,2)p(f2,f3|e3,e4), p(1,1)p(f5|e5).
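The monotone DP above can be sketched in a few lines; the chunk-pair scoring function and the maximum chunk sizes are hypothetical placeholders for the model's chunk translation probabilities.

```python
def dp_align(m, n, score, max_x=2, max_y=2):
    """Monotone chunk alignment by dynamic programming.
    score(x, y, i, j) = log-prob of aligning the x source sentences ending
    at i to the y target sentences ending at j; moves range over
    (x, y) in {0..max_x} x {0..max_y}, excluding (0, 0).
    Returns the globally optimal log-score and the (x, y) move sequence."""
    NEG = float("-inf")
    best = [[NEG] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if best[i][j] == NEG:
                continue
            for x in range(max_x + 1):
                for y in range(max_y + 1):
                    if (x, y) == (0, 0) or i + x > m or j + y > n:
                        continue
                    s = best[i][j] + score(x, y, i + x, j + y)
                    if s > best[i + x][j + y]:
                        best[i + x][j + y] = s
                        back[i + x][j + y] = (x, y)
    moves, i, j = [], m, n                 # backtrace the optimal path
    while (i, j) != (0, 0):
        x, y = back[i][j]
        moves.append((x, y))
        i, j = i - x, j - y
    return best[m][n], moves[::-1]

# Toy scores: 1-1 chunk pairs are free, everything else costs 1.
def toy_score(x, y, i, j):
    return 0.0 if (x, y) == (1, 1) else -1.0

value, moves = dp_align(3, 3, toy_score)
```

Under the toy scores the optimum is three 1-1 moves, mirroring how the real DP trades 1-1 links against merges and deletions via the chunk-pair probabilities.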
16 Bitext Chunk Alignment Algorithms: Divisive Clustering (DC)
Divide and conquer: iterative binary parallel splitting, with reorder. (The Chinese side of this example is garbled in the transcription.) English side: Since the Korean Peninsula was split into two countries, the Republic of Korea has, while leaning its back on the " big tree " of the United States for security, carefully and consistently sought advanced weapons from the United States in a bid to confront the Democratic People 's Republic of Korea. An informed source in Seoul revealed to the Washington Post that the United States had secretly agreed to the request of South Korea earlier this year to " extend its existing missile range " to strike Pyongyang direct. This should have elated South Korea. But since the situation surrounding the peninsula has changed dramatically and the two heads of state of the two Koreas have met with each other and signed a joint statement, what should South Korea do now? It has no choice but spit back the " greasy meat " from its mouth and put the " missile expansion plan " on the back burner. A knowledgeable South Korean speaks the truth : " Because of the summit meeting, we have shelved our own missile plan. If we go ahead with it, it will spoil the excellent situation opened up by the summit meeting. "
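The binary splitting with reorder described above can be sketched as a recursion: try every split-point pair in monotone and reversed order, split when it beats keeping the chunk pair whole, and recurse. The scoring function below is a hypothetical stand-in for the model likelihood, and the toy tokens are invented.

```python
def divisive_cluster(e, f, score, min_len=1):
    """Iterative binary parallel splitting with reorder. `score` scores a
    chunk pair (hypothetical stand-in for the chunk translation model)."""
    best_score, best_split = score(e, f), None
    for i in range(min_len, len(e) - min_len + 1):
        for j in range(min_len, len(f) - min_len + 1):
            mono = score(e[:i], f[:j]) + score(e[i:], f[j:])
            if mono > best_score:
                best_score, best_split = mono, (i, j, False)
            swapped = score(e[:i], f[j:]) + score(e[i:], f[:j])  # reorder
            if swapped > best_score:
                best_score, best_split = swapped, (i, j, True)
    if best_split is None:
        return [(e, f)]                    # no split improves: stop here
    i, j, rev = best_split
    left_f, right_f = (f[j:], f[:j]) if rev else (f[:j], f[j:])
    return (divisive_cluster(e[:i], left_f, score) +
            divisive_cluster(e[i:], right_f, score))

# Toy score: matching (lowercase, UPPERCASE) token pairs reward,
# all other cross-chunk pairs penalize.
def toy_score(ec, fc):
    hits = sum(1 for x in ec for y in fc if y == x.upper())
    return hits - (len(ec) * len(fc) - hits)

# The foreign side is in reversed order, so the reordering split wins.
chunks = divisive_cluster(["a", "b"], ["B", "A"], toy_score)
```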
20 Bitext Chunk Alignment Algorithms: A hierarchical chunking scheme (DP+DC)
DP at the sentence level followed by DC at the sub-sentence level: from coarse to fine, deriving short chunk pairs.
Advantages:
- significantly reduces machine training time: 21 hrs vs. 8 hrs
- makes most of the bitext usable for machine training: 78% vs. 98%
- "There is no data like more data" (Robert Mercer, 1988)
- improves system performance through higher coverage
21 Outline (section marker): Bitext Chunk Alignment, Results
22 Bitext Chunk Alignment Results: Unsupervised Sentence Alignment
122 Chinese/English document pairs selected from the FBIS corpus, sentence-aligned by humans: 2,200 sentence pairs. Unsupervised, from scratch; measured by precision/recall.
(Figure: precision and recall at each iteration for systems A: DP w/ uniform P(x,y); B: DP w/ biased P(x,y); C: DC; D: DP+DC w/ uniform P(x,y); E: DP+DC w/ biased P(x,y); F: DP+DC w/ uniform P(x,y) seeded from E Ite. 0; G: DC seeded from E (iteration garbled in transcription). Curve values not recoverable.)
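The precision/recall evaluation against the human sentence alignment reduces to set intersection over hypothesized links; a minimal sketch with invented sentence-pair IDs:

```python
def precision_recall(hyp_pairs, ref_pairs):
    """Precision/recall of hypothesized sentence-pair links against a
    human reference, both given as sets of (src_id, tgt_id) tuples."""
    hyp, ref = set(hyp_pairs), set(ref_pairs)
    correct = len(hyp & ref)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall

# Two of three hypothesized links agree with the reference.
p, r = precision_recall({(0, 0), (1, 1), (2, 3)},
                        {(0, 0), (1, 1), (2, 2)})
```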
23 Outline (section marker): Bitext Word Alignment, Introduction/Motivation
24 Bitext Word Alignment Introduction/Motivation: Word Alignment
A fundamental problem in machine translation, and the basis for phrase/syntax models.
Model relations from source s = s_1^I to target t = t_1^J.
Word alignment a = a_1^J: s_{a_j} → t_j, j = 1, 2, ..., J (hidden random variable).
Conditional likelihood P(t, a | s): complete data.
Sentence translation P(t | s) = Σ_a P(t, a | s): incomplete data.
Example: s_0 = NULL plus four Chinese source words (garbled in the transcription) aligned to "china s accession to the world trade organization at an early date" (t_1 ... t_12).
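The complete/incomplete data distinction above can be made concrete by brute force: sum P(t, a | s) over every alignment. The sketch below uses a toy Model-1-style setup (uniform alignment prior, invented t-table entries), not the thesis model.

```python
from itertools import product

def joint(t, s, a, ttable):
    """Complete data P(t, a | s): uniform 1/I alignment prior per target
    word times word translation probabilities (toy tables)."""
    p = 1.0
    for j, i in enumerate(a):
        p *= ttable.get((s[i], t[j]), 0.0) / len(s)
    return p

def marginal(t, s, ttable):
    """Incomplete data P(t | s) = sum over all alignments a of P(t, a | s)."""
    return sum(joint(t, s, a, ttable)
               for a in product(range(len(s)), repeat=len(t)))

s = ["NULL", "zhongguo"]                   # s_0 = NULL plus one source word
t = ["china"]
ttable = {("zhongguo", "china"): 0.8, ("NULL", "china"): 0.1}
p_t = marginal(t, s, ttable)               # (0.1 + 0.8) / 2
```

Enumerating alignments is exponential in J in general; tractable models (Model-1, the HMMs below) factor the sum so it can be computed by dynamic programming instead.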
25 Bitext Word Alignment Introduction/Motivation: State of the Art
IBM Model-4 alignments generated by the GIZA++ toolkit (Och & Ney, 03): the state of the art, especially on large bitexts. But:
- exact EM is intractable, so sub-optimal estimation algorithms are used
- statistics are difficult to compute under the model
- applications are limited to the word alignments alone
GOAL: improve word alignments of bitexts for better translation
- comparable performance to Model-4
- fast, efficient training with controlled memory usage
- use the model, not just the alignments
26 Bitext Word Alignment Introduction/Motivation: IBM Model-4 Word Alignments (Brown et al., 93)
Generative story (source words s_0 = NULL, s_1 ... s_4; the Chinese words are garbled in the transcription):
1. Create a tablet for each source word.
2. Table lookup to decide fertility: the number of target words connected to each source word.
3. Sample target words from the translation table i.i.d. (in the example: "the", "china s", "at an early date", "accession to", "world trade organization").
4. A distortion model arranges them into the target order: "china s accession to the world trade organization at an early date" (t_1 ... t_12).
What makes the model powerful also makes computation complex. Typical training procedure: Model-1, HMM, Model-4. Can we do something to the HMM?
31 Bitext Word Alignment Introduction/Motivation: HMM Word-to-Word Model (Vogel et al., 96; Och & Ney, 03)
Source words s_1 ... s_4 are the HMM states; state sequences are word-to-word alignments. Words are generated one by one: each transition emits one target word (t_1 ... t_12: "china s accession to the world trade organization at an early date").
34 Bitext Word Alignment Introduction/Motivation: Make the HMM More Powerful in Generating Observations
Target phrases rather than words are emitted after jumping into a state; state sequences are word-to-phrase alignments. This is the Word-to-Phrase (WtoP) HMM (Deng & Byrne, 05).
36 Outline (section marker): Bitext Word Alignment, Word-to-Phrase HMM Model
37 Bitext Word Alignment WtoP HMM Model: Word-to-Phrase HMM Alignment Models
The target sentence is segmented into K phrases: phrase length sequence φ_1^K, t = v_1^K; phrase alignment sequence a_1^K.
NULL: h_1^K is a Bernoulli process with d(h_k = 1) = 1 - p_0 and d(h_k = 0) = p_0; when h_k = 1, s_{a_k} emits v_k, and when h_k = 0, NULL emits v_k.
Hidden random variable: word-to-phrase alignment a = (K, a_1^K, φ_1^K, h_1^K).
P(t, a | s) = P(v_1^K, K, a_1^K, h_1^K, φ_1^K | s)
= P(K | J, s) P(a_1^K, φ_1^K, h_1^K | K, J, s) P(v_1^K | a_1^K, h_1^K, φ_1^K, K, J, s)
where:
P(K | J, I) ∝ η^K (phrase count)
P(a_1^K, φ_1^K, h_1^K | K, J, s) = ∏_{k=1}^K p(a_k | a_{k-1}, h_k; I) d(h_k) n(φ_k; h_k s_{a_k}) (Markov transition, NULL process, phrase length)
P(v_1^K | a_1^K, h_1^K, φ_1^K, K, J, s) = ∏_{k=1}^K P(v_k | h_k s_{a_k}, φ_k) (word-to-phrase translation)
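For a single fixed word-to-phrase alignment, the WtoP decomposition above is a product of table lookups; a minimal sketch in which every parameter table (names and numbers) is a hypothetical stand-in for the estimated distributions:

```python
def wtop_joint_prob(phrases, states, hs, params):
    """P(t, a | s) for one word-to-phrase alignment a = (K, a_1^K, phi_1^K, h_1^K):
    eta^K * prod_k p(a_k | a_{k-1}, h_k) * d(h_k) * n(phi_k; s_{a_k})
                  * P(v_k | h_k s_{a_k}, phi_k)."""
    p = params["eta"] ** len(phrases)              # P(K | J, I) ∝ eta^K
    prev = None
    for v, a, h in zip(phrases, states, hs):
        p *= params["init"].get(a, 1.0) if prev is None \
            else params["trans"].get((prev, a), 0.0)       # Markov transition
        p *= (1.0 - params["p0"]) if h == 1 else params["p0"]  # d(h_k)
        p *= params["plen"].get((a, len(v)), 0.0)          # n(phi_k; s_{a_k})
        p *= params["emit"].get((a, tuple(v)), 0.0)        # P(v_k | h_k s_{a_k})
        prev = a
    return p

params = {"eta": 0.5, "p0": 0.2,
          "init": {1: 1.0}, "trans": {(1, 2): 0.9},
          "plen": {(1, 2): 0.6, (2, 1): 0.7},
          "emit": {(1, ("china", "s")): 0.3, (2, ("accession",)): 0.4}}
# Two phrases, both emitted from real source states (h_k = 1).
p = wtop_joint_prob([("china", "s"), ("accession",)], [1, 2], [1, 1], params)
```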
40 Bitext Word Alignment WtoP HMM Model: Word-to-Phrase Translation Probabilities
Replace the weak i.i.d. word-for-word translation. For a Chinese source word f (garbled in the transcription) emitting a phrase of length 3:
i.i.d.: P(world trade organization | f; 3) = t(world | f) t(trade | f) t(organization | f)
bigram: P(world trade organization | f; 3) = t(world | f) t_2(trade | world, f) t_2(organization | trade, f)
The bigram model assigns higher probability to the correct translation than i.i.d., and incorporates context without losing algorithmic efficiency: DP still applies. It uses the same estimation techniques as bigram LMs; data sparseness is handled with Witten-Bell smoothing.
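The two phrase emission probabilities above can be compared directly; the table entries below are invented toy numbers chosen only to show the bigram model rewarding the in-order phrase:

```python
def iid_phrase_prob(phrase, t1):
    """i.i.d. word-for-word emission: prod_j t(t_j | f)."""
    p = 1.0
    for w in phrase:
        p *= t1.get(w, 0.0)
    return p

def bigram_phrase_prob(phrase, t1, t2):
    """Bigram t-table emission: t(t_1 | f) * prod_j t2(t_j | t_{j-1}, f)."""
    p = t1.get(phrase[0], 0.0)
    for prev, w in zip(phrase, phrase[1:]):
        p *= t2.get((prev, w), 0.0)
    return p

# Hypothetical tables for one source word f.
t1 = {"world": 0.3, "trade": 0.3, "organization": 0.3}
t2 = {("world", "trade"): 0.9, ("trade", "organization"): 0.9}
phrase = ["world", "trade", "organization"]
p_iid = iid_phrase_prob(phrase, t1)        # 0.3^3
p_bi = bigram_phrase_prob(phrase, t1, t2)  # 0.3 * 0.9 * 0.9
```

Because the bigram factors condition on the previous target word, a scrambled phrase like "trade world organization" would score near zero here, while the i.i.d. product is order-blind.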
41 Bitext Word Alignment WtoP HMM Model: Comparing the Word-to-Phrase HMM to...
- Segmental hidden Markov models (Ostendorf et al., 96): states emit observation sequences
- WtoW HMM (Vogel et al., 96; Och & Ney, 03): the special case N = 1
- Extensions to the WtoW HMM (Toutanova et al., 02): P(stay | s) vs. P(stay = φ | s) in modeling state durations
- IBM Model-4: fertility vs. phrase length
42 Outline (section marker): Bitext Word Alignment, Parameter Estimation
43 Bitext Word Alignment Parameter Estimation: Forward-Backward Algorithm
State space S = {(i, φ, h) : 1 ≤ i ≤ I, 1 ≤ φ ≤ N, h = 0 or 1}: a grid of 2NI × J.
α_j(i, φ, h) = [ Σ_{i', φ', h'} α_{j-φ}(i', φ', h') p(i | i', h; I) ] η n(φ; h s_i) P(t_{j-φ+1}^j | h s_i, φ)
β_j(i, φ, h) = Σ_{i', φ', h'} β_{j+φ'}(i', φ', h') p(i' | i, h'; I) η n(φ'; h' s_{i'}) P(t_{j+1}^{j+φ'} | h' s_{i'}, φ')
γ_j(i, φ, h) = P(h s_i → v = t_{j-φ+1}^j | θ, s, t) = α_j(i, φ, h) β_j(i, φ, h) / Σ_{i', φ', h'} α_J(i', φ', h')
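The α recursion above can be sketched as a forward pass over the (state, phrase length) grid. To keep the sketch short it drops the NULL dimension (i.e. it assumes p_0 = 0), and all parameter tables are hypothetical stand-ins:

```python
def wtop_forward(t, I, N, init, trans, plen, emit, eta):
    """Forward pass for a simplified WtoP HMM (NULL states omitted).
    alpha[j][i][phi] = prob of generating t[:j] with the last phrase,
    of length phi, emitted from source state i. Returns P(t | s)."""
    J = len(t)
    alpha = [[[0.0] * (N + 1) for _ in range(I)] for _ in range(J + 1)]
    for j in range(1, J + 1):
        for i in range(I):
            for phi in range(1, min(N, j) + 1):
                v = tuple(t[j - phi:j])
                e = eta * plen.get((i, phi), 0.0) * emit.get((i, v), 0.0)
                if e == 0.0:
                    continue
                if j == phi:                       # first phrase in the sentence
                    alpha[j][i][phi] = init.get(i, 0.0) * e
                else:                              # sum over predecessor states
                    alpha[j][i][phi] = e * sum(
                        alpha[j - phi][ip][pp] * trans.get((ip, i), 0.0)
                        for ip in range(I) for pp in range(1, N + 1))
    return sum(alpha[J][i][phi] for i in range(I) for phi in range(1, N + 1))

# Toy instance: state 0 emits a length-2 phrase, then state 1 a length-1 phrase.
p = wtop_forward(["china", "s", "accession"], I=2, N=2,
                 init={0: 1.0}, trans={(0, 1): 1.0},
                 plen={(0, 2): 1.0, (1, 1): 1.0},
                 emit={(0, ("china", "s")): 0.5, (1, ("accession",)): 0.4},
                 eta=1.0)
```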
44 Bitext Word Alignment Parameter Estimation: Embedded Estimation of the Word-to-Phrase HMM
Unsupervised training from scratch:
- Model-1, 10 iterations (initial t-table)
- Model-2, 5 iterations (better t-table)
- WtoW HMM, 5 iterations (initial Markov model)
- WtoP HMM, N = 2, 3, ..., each 5 iterations (Markov model, phrase length); experience from ASR
- WtoP HMM with bigram t-table, 5 iterations (bigram t-table)
Parallel implementation: partition the training bitext; the E-step collects counts from each partition in parallel; the M-step merges the counts to update model parameters. Memory efficient, with virtually no limit on training bitext size.
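The parallel E-step/M-step split described above amounts to count accumulation per partition followed by a merge-and-renormalize; a minimal sketch with toy fractional counts (in the real system the posteriors come from the forward-backward pass):

```python
from collections import Counter

def e_step(partition):
    """Toy E-step on one bitext partition: accumulate fractional counts
    c(s_word, t_word) from per-link posteriors."""
    counts = Counter()
    for s_word, t_word, posterior in partition:
        counts[(s_word, t_word)] += posterior
    return counts

def m_step(partial_counts):
    """M-step: merge counts from all partitions, then renormalize per
    source word to get updated translation probabilities."""
    total = Counter()
    for c in partial_counts:
        total.update(c)
    z = Counter()
    for (s_word, _), v in total.items():
        z[s_word] += v
    return {pair: v / z[pair[0]] for pair, v in total.items()}

part1 = [("A", "x", 0.5), ("A", "y", 0.5)]   # counts from partition 1
part2 = [("A", "x", 1.0)]                    # counts from partition 2
ttable = m_step([e_step(part1), e_step(part2)])
```

Because partitions only exchange count tables, each worker holds just its slice of the bitext, which is what keeps memory usage controlled as the bitext grows.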
46 Outline (section marker): Bitext Word Alignment, Word Alignment Results
47 Bitext Word Alignment Results
Test: NIST 2001 MT-eval set, 124 sentence pairs with manual word alignments.
- Comparable performance to Model-4 on the FBIS training bitext
- Increasing the max phrase length N improves quality in the C→E direction
- The bigram translation probability improves word-to-phrase links
- A good balance between 1-1 and 1-N link distributions can be achieved (table of hypothesized link counts and AER by η not recoverable from the transcription)
- Comparable performance when extending to large-scale bitexts
48 Outline (section marker): Bitext Phrase Alignment, Inducing from Word Alignments
49 Bitext Phrase Alignment Inducing from Word Alignments: Statistical Phrase Translation Models
Phrase-based SMT performs better than word-based SMT. The Phrase Pair Inventory (PPI) is extracted from word-aligned bitext (Och et al., 99).
50 Bitext Phrase Alignment Inducing from Word Alignments: But word alignments are imperfect...
Example: "There is no gang and money linked politics in hong kong and there will not be such politics in future either"
- Relying on the one-best word alignment may exclude some valid phrase pairs
- Goal: define a probability distribution over phrase pairs
- Allows more control over the generation of phrase pairs
53 Outline (section marker): Bitext Phrase Alignment, Model-based Phrase Pair Posterior
54 Bitext Phrase Alignment: Model-based Phrase Pair Posterior
Doesn't rely on a single alignment. For a source span [i_1, i_2] and a target span [j_1, j_2] (example sentence: "There is no gang and money linked politics in hong kong and there will not be such politics in future either"):
Define the set of alignments that align words to words within the phrases:
A(i_1, i_2; j_1, j_2) = {a_1^m : a_j ∈ [i_1, i_2] iff j ∈ [j_1, j_2]}
Calculate the likelihood of the source phrase producing the target phrase:
P(t, A(i_1, i_2; j_1, j_2) | s) = Σ_{a : a_1^m ∈ A(i_1, i_2; j_1, j_2)} P(t, a | s)
Obtain the phrase pair posterior:
P(A(i_1, i_2; j_1, j_2) | t, s) = P(t, A(i_1, i_2; j_1, j_2) | s) / P(t | s)
Efficient DP-based implementation for the WtoP HMM; difficult for Model-4.
Bitext Phrase Alignment: Augmented PPI for Better Coverage
Baseline PPI extracted from 1-best alignments using established techniques (Och et al., 99).
GOAL: add phrase pairs to improve test set coverage.
For each foreign phrase v in the test set not covered by the baseline, and for each sentence pair containing v, find the English phrase u that maximizes the phrase pair posterior:
  f(i_1, i_2) = P_{F|E}( A(i_1, i_2; j_1, j_2) | e_1^l, f_1^m )
  b(i_1, i_2) = P_{E|F}( A(i_1, i_2; j_1, j_2) | e_1^l, f_1^m )
  g(i_1, i_2) = f(i_1, i_2) · b(i_1, i_2)
  (î_1, î_2) = argmax_{1 ≤ i_1, i_2 ≤ l} g(i_1, i_2), and set u = e_{î_1}^{î_2}
Add (u, v) to the baseline PPI if the posterior exceeds a threshold value.
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 34 / 42
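The span search itself can be sketched as follows. The two directional posteriors are stubbed out as plain callables (in the thesis they come from source-to-target and target-to-source WtoP HMMs), and the threshold value is illustrative.

```python
def best_english_span(l, f_post, b_post, threshold=0.1):
    """Find the English span maximizing g = f * b for one foreign phrase.

    f_post(i1, i2): source-to-target phrase pair posterior (stub here)
    b_post(i1, i2): target-to-source phrase pair posterior (stub here)
    Returns ((i1, i2), g) if the best g clears the threshold, else None.
    """
    best, best_g = None, 0.0
    for i1 in range(l):
        for i2 in range(i1, l):
            g = f_post(i1, i2) * b_post(i1, i2)
            if g > best_g:
                best, best_g = (i1, i2), g
    return (best, best_g) if best_g >= threshold else None
```

With toy posteriors peaked at one span, the search returns that span; if no span clears the threshold, nothing is added, which is how the method balances coverage against phrase quality.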
Outline (current subsection: Bitext Phrase Alignment, Translation Results)
1 Bitext Chunk Alignment: Chunk Alignment; Chunking Algorithms; Sentence Alignment Results
2 Bitext Word Alignment: Introduction/Motivation; Word-to-Phrase HMM Model; Parameter Estimation; Word Alignment Results
3 Bitext Phrase Alignment: Inducing from Word Alignments; Model-based Phrase Pair Posterior; Translation Results
4 Conclusions
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 35 / 42
Translation Results: the Translation Template Model (TTM) (Kumar et al., 05)
TTM decoder: a WFST implementation with monotone phrase order. Component models: Source Phrase Segmentation, Phrase Insertion, Phrase Translation, Target Phrase Segmentation. The generative cascade:
  Source Language Sentence:                     GRAIN EXPORTS ARE PROJECTED TO FALL BY 25 %
  Source Phrases:                               GRAIN EXPORTS ARE_PROJECTED_TO FALL BY_25_%
  Placement of Target Phrase Insertion Markers: 1 GRAINS 1 EXPORTS ARE_PROJECTED_TO FALL BY_25_%
  Target Phrases:                               DE GRAINS LES EXPORTATIONS DOIVENT FLECHIR DE_25_%
  Target Language Sentence:                     DE GRAINS LES EXPORTATIONS DOIVENT FLECHIR DE 25 %
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 36 / 42
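The monotone cascade can be illustrated, well short of the actual WFST composition, with a greedy longest-match segmenter over a toy phrase table built from the slide's example; the phrase table entries and all names here are assumptions for illustration only.

```python
def monotone_phrase_translate(words, phrase_table, max_phrase=3):
    """Segment the source greedily into known phrases, translate each
    phrase in order, and concatenate: a toy monotone pipeline."""
    out, i = [], 0
    while i < len(words):
        # Try the longest source phrase starting at position i first.
        for k in range(min(max_phrase, len(words) - i), 0, -1):
            src = tuple(words[i:i + k])
            if src in phrase_table:
                out.extend(phrase_table[src])
                i += k
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return out

# Hypothetical phrase pairs reconstructed from the slide's example.
TABLE = {
    ("GRAIN",): ["DE", "GRAINS"],
    ("EXPORTS",): ["LES", "EXPORTATIONS"],
    ("ARE", "PROJECTED", "TO"): ["DOIVENT"],
    ("FALL",): ["FLECHIR"],
    ("BY", "25", "%"): ["DE", "25", "%"],
}
```

Running the slide's source sentence through this sketch yields the slide's target sentence; the real decoder instead composes transducers for segmentation, insertion, translation, and target segmentation, and searches over all segmentations.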
Translation Results: Automatic Machine Translation Evaluation
A hard problem! BLEU (Papineni et al., 01) is an automatic MT metric that correlates well with human judgements: the geometric mean of n-gram precisions, weighted by a brevity penalty.
  Reference:  mr. speaker , in absolutely no way .
  Hypothesis: in absolutely no way , mr. chairman .
Matching sub-strings of the hypothesis against the reference gives n-gram precisions 7/8 (1-gram), 3/7 (2-gram), 2/6 (3-gram), and 1/5 (4-gram), so
  BLEU = (7/8 · 3/7 · 2/6 · 1/5)^{1/4}
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 37 / 42
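The slide's worked example can be reproduced directly. This sketch assumes the tokenization splits punctuation off (which is what makes the 1-gram precision 7/8); the brevity penalty is 1 here since hypothesis and reference have equal length.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4):
    """Sentence BLEU as on the slide: the geometric mean of clipped
    n-gram precisions, scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        clipped = sum(min(c, r[g]) for g, c in h.items())  # clip by ref counts
        precisions.append(clipped / max(sum(h.values()), 1))
    if min(precisions) == 0:
        return 0.0, precisions  # geometric mean is zero
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    score = bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
    return score, precisions
```

On the slide's sentence pair this returns exactly the four precisions shown and BLEU = (7/8 · 3/7 · 2/6 · 1/5)^{1/4}.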
Translation Results: Small Systems
[Charts: test-set coverage and BLEU on eval02/eval03/eval04 for Chinese-English (BLEU around 23.4) and Arabic-English (BLEU around 36.5) as the augmentation threshold is varied]
Relaxing the threshold in PPI augmentation improves coverage and BLEU score.
Balance coverage against phrase translation quality.
The WtoP model can even be applied to augment a Model-4 PPI.
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 38 / 42
Translation Results: Large Systems
[Charts: BLEU on eval02/eval03/eval04(news) for three systems, Model-4 Baseline, WtoP HMM Baseline, and WtoP HMM Augmented, on Chinese-English and Arabic-English]
Used all parallel corpora available from the LDC:
C-E: 200M English words (FBIS, Xinhua, HK News, ..., all UN bitexts)
A-E: 130M English words (news, all UN bitexts)
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 39 / 42
Conclusions
A hierarchical bitext chunking approach: language independent, no linguistic knowledge required; derives short chunk pairs and retains more of the available bitext.
The word-to-phrase HMM alignment model: produces good-quality word alignments over very large bitexts; has an efficient training algorithm with a parallel implementation; a powerful framework.
Model-based phrase pair distribution: enables an improved phrase pair extraction strategy with a controlled balance of coverage vs. quality; the WtoP HMM performs better than IBM Model-4 on large systems.
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 40 / 42
Machine Translation Toolkit (MTTK)
Solutions for MT training; used for the JHU-CU 2005 NIST MT Eval systems.
[Flowchart: document alignments and length statistics feed bitext chunking, producing chunk alignments that are filtered for high-quality pairs; these feed a training cascade of Model 1 {t(f|e)}, Model 2 {t(f|e), a(i|j;l,m)}, word-to-word HMM {t(f|e), P(i|i';l)}, word-to-phrase HMM with N=2,3,... adding {n(phi;e)}, and WtP HMM with a bigram t-table adding {t2(f|f',e)}; each stage emits word or phrase alignments via the corresponding tools (AlignM1, AlignM2, AlignHmm, AlignSHmm, PPEM, PPEHmm)]
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 41 / 42
Conclusions
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 42 / 42
Constructive Decision Theory Joe Halpern Cornell University Joint work with Larry Blume and David Easley Economics Cornell Constructive Decision Theory p. 1/2 Savage s Approach Savage s approach to decision
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationNatural Language Processing
Natural Language Processing Word vectors Many slides borrowed from Richard Socher and Chris Manning Lecture plan Word representations Word vectors (embeddings) skip-gram algorithm Relation to matrix factorization
More informationCovariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data
Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data Jan Vaněk, Lukáš Machlica, Josef V. Psutka, Josef Psutka University of West Bohemia in Pilsen, Univerzitní
More information