Bitext Alignment for Statistical Machine Translation


1 Bitext Alignment for Statistical Machine Translation Yonggang Deng Advisor: Prof. William Byrne Thesis Committee: Prof. William Byrne, Prof. Trac Tran, Prof. Jerry Prince and Prof. Gerard Meyer Center for Language and Speech Processing The Johns Hopkins University Baltimore, MD December 2005 Y. Deng (Johns Hopkins) Bitext Alignment for SMT 1 / 42

2 Bitext and Bitext Alignment Bitext: a collection of text in two languages. Bitext Alignment: finding translation equivalence within bitext. Example (the English side of a Chinese/English bitext excerpt): "It is necessary to resolutely remove obstacles in rivers and lakes. 4. It is necessary to strengthen monitoring and forecast work and scientifically dispatch people and materials. It is necessary to take effective measures and try by every possible means to provide precision forecast. Before the flood season comes, it is necessary to seize the time to formulate plans for forecasting floods and to carry out work with clear ..." Y. Deng (Johns Hopkins) Bitext Alignment for SMT 2 / 42

5 Why automatic bitext alignment? Critical and beneficial in many multilingual NLP tasks; provides the basic ingredients for building a Machine Translation system. Hand alignment is expensive for large corpora. Desired properties: language independent (Chinese, Arabic, Spanish, French, ...); no linguistic knowledge (from scratch, unsupervised, statistical); huge amounts of data (effectiveness and efficiency). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 3 / 42

6 Statistical Machine Translation (SMT) Source-channel model: Source E -> Channel P(F|E) -> Target F -> Decoding: Ê = argmax_E P(E) P(F|E). The Translation Model P(F|E) needs BITEXTs. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 4 / 42
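The decoding rule above is easiest to see as rescoring: among candidate translations E, pick the one maximizing P(E)·P(F|E). A minimal sketch, with hypothetical candidates and scores (a real decoder searches this space rather than enumerating it):

```python
import math

# Source-channel decoding as candidate rescoring: the candidate list and the
# language-model / channel-model log probabilities below are hypothetical.
def rescore(candidates):
    """candidates: list of (E, log_p_lm, log_p_channel) tuples."""
    # Ê = argmax_E P(E) P(F|E)  ==  argmax_E log P(E) + log P(F|E)
    return max(candidates, key=lambda c: c[1] + c[2])[0]

cands = [("it is necessary to remove obstacles", math.log(1e-8), math.log(3e-6)),
         ("necessary remove obstacle it is", math.log(1e-11), math.log(5e-6))]
print(rescore(cands))
```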

7 Outline 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 5 / 42

8 Outline Bitext Chunk Alignment Model 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 6 / 42

9 Chunk Alignment Bitext Chunk Alignment Model Problem: sentences are not translated 1-to-1 in sequence (1-to-n, n-to-1, m-to-n, order changes), a real-data challenge. A Statistical Generative Chunk Alignment Model (Deng et al, 04): introduce a hidden chunk alignment variable; document generation as filling in the blanks; two alignment algorithms are derived in a straightforward manner. Generative story: given an English document of m sentences e = e_1^m and a foreign document of n sentences f = f_1^n (sentence boundaries marked), choose the target length n with probability α(n|m), the number of chunks K with probability β(K|m,n), the chunk alignment a_1^K (each a_k identifies the block of English and foreign sentences forming the k-th chunk pair; in the example, K = 3 chunk pairs align 5 English sentences with 4 foreign sentences) with probability P(a_1^K|m,n,K), and then translate each chunk:
P(f_1^n, K, a_1^K | e_1^m) = α(n|m) β(K|m,n) P(a_1^K|m,n,K) ∏_{k=1}^K P(f(a_k) | e(a_k))
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 7 / 42

14 Outline Bitext Chunk Alignment Algorithms 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 8 / 42

15 Bitext Chunk Alignment Algorithms Dynamic Programming (DP) [Figure: a DP grid over source sentences e1..e5 and target sentences f1..f4; each step consumes (x, y) sentences and scores p(x,y) times the chunk-pair likelihood, e.g. p(1,2)P(f1|e1,e2), p(2,2)P(f2,f3|e3,e4), p(1,1)P(f5|e5).] Monotone chunk alignment; global optimum. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 9 / 42
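To make the DP concrete, here is a minimal monotone chunk alignment sketch. The chunk_pair_logprob scoring function is a toy stand-in (in the thesis this score would come from the generative chunk model), and the max_chunk limit is an illustrative assumption:

```python
from functools import lru_cache

def chunk_pair_logprob(e_chunk, f_chunk):
    # Toy length-based score; a stand-in for the generative chunk model.
    return -abs(len(" ".join(e_chunk)) - len(" ".join(f_chunk))) / 10.0

def align_chunks(e_sents, f_sents, max_chunk=2):
    e_sents, f_sents = list(e_sents), list(f_sents)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best (log score, path) aligning e_sents[:i] with f_sents[:j] monotonically.
        if i == 0 and j == 0:
            return 0.0, ()
        cands = []
        for x in range(0, min(i, max_chunk) + 1):      # English sentences in last chunk
            for y in range(0, min(j, max_chunk) + 1):  # foreign sentences in last chunk
                if x == 0 and y == 0:
                    continue
                prev_score, prev_path = best(i - x, j - y)
                s = prev_score + chunk_pair_logprob(tuple(e_sents[i - x:i]),
                                                    tuple(f_sents[j - y:j]))
                cands.append((s, prev_path + (((i - x, i), (j - y, j)),)))
        return max(cands)

    return best(len(e_sents), len(f_sents))[1]

# Hypothetical toy documents (each item is one sentence).
e_doc = ["it is necessary to remove obstacles .", "forecasts must be precise ."]
f_doc = ["sentence one", "sentence two", "sentence three"]
print(align_chunks(e_doc, f_doc))
```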

16 Bitext Chunk Alignment Algorithms Divisive Clustering (DC) Divide and conquer: iterative binary parallel splitting, with reordering. Example (the English side of a Chinese/English document pair being recursively split): "Since the Korean Peninsula was split into two countries, the Republic of Korea has, while leaning its back on the " big tree " of the United States for security, carefully and consistently sought advanced weapons from the United States in a bid to confront the Democratic People 's Republic of Korea. An informed source in Seoul revealed to the Washington Post that the United States had secretly agreed to the request of South Korea earlier this year to " extend its existing missile range " to strike Pyongyang direct. This should have elated South Korea. But since the situation surrounding the peninsula has changed dramatically and the two heads of state of the two Koreas have met with each other and signed a joint statement, what should South Korea do now? It has no choice but spit back the " greasy meat " from its mouth and put the " missile expansion plan " on the back burner. A knowledgeable South Korean speaks the truth : " Because of the summit meeting, we have shelved our own missile plan. If we go ahead with it, it will spoil the excellent situation opened up by the summit meeting. "" Y. Deng (Johns Hopkins) Bitext Alignment for SMT 10 / 42
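A sketch of the divisive clustering idea under simplifying assumptions: pair_logprob is a toy stand-in for the chunk translation model, and the reordering step (allowing the two halves to cross) is omitted for brevity:

```python
def pair_logprob(e_sents, f_sents):
    # Toy length-based score; a stand-in for the chunk translation model.
    return -abs(len(" ".join(e_sents)) - len(" ".join(f_sents))) / 10.0

def divisive_split(e_sents, f_sents, min_size=1):
    """Recursively split a chunk pair at the best joint split point."""
    if len(e_sents) <= min_size or len(f_sents) <= min_size:
        return [(e_sents, f_sents)]
    best = None
    for i in range(1, len(e_sents)):        # split point on the English side
        for j in range(1, len(f_sents)):    # split point on the foreign side
            score = (pair_logprob(e_sents[:i], f_sents[:j]) +
                     pair_logprob(e_sents[i:], f_sents[j:]))
            if best is None or score > best[0]:
                best = (score, i, j)
    whole = pair_logprob(e_sents, f_sents)
    if best is None or best[0] <= whole:    # stop when splitting does not help
        return [(e_sents, f_sents)]
    _, i, j = best
    return (divisive_split(e_sents[:i], f_sents[:j], min_size) +
            divisive_split(e_sents[i:], f_sents[j:], min_size))
```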

20 Bitext Chunk Alignment Algorithms A hierarchical chunking scheme DP+DC: DP at the sentence level followed by DC at the sub-sentence level; from coarse to fine, deriving short chunk pairs. Advantages: significantly reduces MT training time (8 hrs vs. 21 hrs); makes most of the bitext usable for MT training (98% vs. 78%): "There is no data like more data" (Robert Mercer, 1988); improves system performance through higher coverage. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 11 / 42

21 Outline Bitext Chunk Alignment Results 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 12 / 42

22 Bitext Chunk Alignment Results Unsupervised Sentence Alignment 122 Chinese/English document pairs selected from the FBIS corpus, sentence aligned by humans (2,200 sentence pairs); unsupervised from scratch, measured by precision/recall. [Plots: precision and recall at each iteration for A: DP w/ uniform P(x,y); B: DP w/ biased P(x,y); C: DC; D: DP+DC w/ uniform P(x,y); E: DP+DC w/ biased P(x,y); F: DP+DC w/ uniform P(x,y) seeded from E Ite. 0; G: DC seeded from E.] Y. Deng (Johns Hopkins) Bitext Alignment for SMT 13 / 42

23 Outline Bitext Word Alignment Introduction/Motivation 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 14 / 42

24 Word Alignment Bitext Word Alignment Introduction/Motivation Fundamental problem in Machine Translation; basis for phrase/syntax models. Model relations from source s = s_1^I to target t = t_1^J. Word alignment a = a_1^J: s_{a_j} -> t_j, j = 1, 2, ..., J, a hidden random variable. Conditional likelihood P(t, a | s): complete data. Sentence translation P(t | s) = Σ_a P(t, a | s): incomplete data. Example: a four-word Chinese sentence s_1 ... s_4 (plus s_0 = NULL) aligned to "china s accession to the world trade organization at an early date" (t_1 ... t_12). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 15 / 42
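To illustrate the complete-data vs. incomplete-data distinction, a tiny sketch using IBM Model 1 as the simplest alignment model, where the sum over alignments factorizes per target position; the t-table entries and words below are hypothetical:

```python
# Toy t-table: t(english_word | source_word). Entries are hypothetical.
t_table = {("china", "zhongguo"): 0.9, ("china", "NULL"): 0.01,
           ("early", "zaori"): 0.5, ("early", "NULL"): 0.01}

def model1_likelihood(t_words, s_words):
    s_ext = ["NULL"] + s_words
    p = 1.0
    for t in t_words:
        # P(t_j | s) = (1/(I+1)) * sum_i t(t_j | s_i), marginalizing the hidden a_j
        p *= sum(t_table.get((t, s), 1e-12) for s in s_ext) / len(s_ext)
    return p

print(model1_likelihood(["china", "early"], ["zhongguo", "zaori"]))
```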

25 State of the Art Bitext Word Alignment Introduction/Motivation IBM Model-4 generated by the GIZA++ Toolkit (Och & Ney, 03): the state-of-the-art word alignments, especially on large bitexts. But: exact EM is problematic, so sub-optimal estimation algorithms are used; it is difficult to compute statistics under the model; applications are limited to the word alignments only. GOAL: improve word alignments of bitexts for better translation; comparable performance to Model-4; fast, efficient training with controlled memory usage; use the model, not just the alignments. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 16 / 42

26 Bitext Word Alignment Introduction/Motivation IBM Model-4 Word Alignments (Brown et al, 93) Generative story, illustrated with the source sentence s_0 (NULL) s_1 s_2 s_3 s_4: create a tablet for each source word; table lookup to decide fertility (the number of target words connected to each source word); sample target words from the translation table i.i.d.; apply the distortion model to place them, yielding "china s accession to the world trade organization at an early date" (t_1 ... t_12). What makes the model powerful also makes computation complex. Typical training procedure: Model-1, HMM, Model-4. Can we do something to HMM? Y. Deng (Johns Hopkins) Bitext Alignment for SMT 17 / 42

31 Bitext Word Alignment Introduction/Motivation HMM WtoW Model (Vogel et al, 96; Och & Ney, 03) The source words s_1 s_2 s_3 s_4 are the HMM states; the target sentence "china s accession to the world trade organization at an early date" (t_1 ... t_12) is the observation sequence. State sequences are word-to-word alignments. Words are generated one by one: each transition emits one target word. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 18 / 42

34 Bitext Word Alignment Introduction/Motivation Make HMM More Powerful in Generating Observations Target phrases rather than single words are emitted after jumping into a state (e.g. states s_1 s_2 s_3 s_4 emitting phrases of "china s accession to the world trade organization at an early date"). State sequences are word-to-phrase alignments. Word-to-Phrase (WtoP) HMM (Deng & Byrne, 05). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 19 / 42

36 Outline Bitext Word Alignment WtoP HMM Model 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 20 / 42

37 Bitext Word Alignment WtoP HMM Model Word-to-Phrase HMM Alignment Models Target sentence segmented into K phrases: phrase length sequence φ_1^K, t = v_1^K; phrase alignment sequence a_1^K. NULL: h_1^K is a Bernoulli process, d(h_k = 1) = 1 - p_0, d(h_k = 0) = p_0; h_k = 1: s_{a_k} -> v_k; h_k = 0: NULL -> v_k. Hidden random variable: word-to-phrase alignment a = (K, a_1^K, φ_1^K, h_1^K).
P(t, a | s) = P(v_1^K, K, a_1^K, h_1^K, φ_1^K | s)
            = P(K | J, s) · P(a_1^K, φ_1^K, h_1^K | K, J, s) · P(v_1^K | a_1^K, h_1^K, φ_1^K, K, J, s)
with P(K | J, I) = η^K (phrase count), P(a_1^K, φ_1^K, h_1^K | K, J, s) = ∏_{k=1}^K p(a_k | a_{k-1}, h_k; I) d(h_k) n(φ_k; h_k · s_{a_k}) (Markov transition, NULL, phrase length), and P(v_1^K | ...) = ∏_{k=1}^K P(v_k | h_k · s_{a_k}, φ_k) (word-to-phrase translation).
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 21 / 42

40 Bitext Word Alignment WtoP HMM Model Word-to-Phrase Translation Probabilities Replace weak i.i.d. word-for-word translation within a phrase: P(world trade organization | f, φ = 3) under i.i.d. = t(world|f) t(trade|f) t(organization|f); under the bigram model = t(world|f) t_2(trade|world, f) t_2(organization|trade, f). The bigram model assigns higher probability to the correct translation than i.i.d.; incorporates context without losing algorithmic efficiency (DP); uses the same estimation techniques as bigram LMs; data sparseness is handled with Witten-Bell smoothing. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 22 / 42
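A sketch of how such a bigram t-table could be smoothed with Witten-Bell interpolation against the unigram t-table; the count tables and the source word "shimao" are hypothetical placeholders for the expected counts that EM would collect, not the thesis implementation:

```python
from collections import defaultdict

c1 = defaultdict(lambda: defaultdict(float))   # c1[s][t]: unigram expected counts
c2 = defaultdict(lambda: defaultdict(float))   # c2[s][(t_prev, t)]: bigram expected counts

def t1(t, s):
    tot = sum(c1[s].values())
    return c1[s][t] / tot if tot > 0 else 1e-12

def t2_wb(t, t_prev, s):
    # Witten-Bell: interpolate the bigram estimate with the unigram t-table,
    # with backoff weight driven by the number of distinct continuations.
    hist = {w: c for (p, w), c in c2[s].items() if p == t_prev}
    n, types = sum(hist.values()), len(hist)
    lam = n / (n + types) if n + types > 0 else 0.0
    mle = hist.get(t, 0.0) / n if n > 0 else 0.0
    return lam * mle + (1 - lam) * t1(t, s)

def phrase_prob(phrase, s):
    p = t1(phrase[0], s)
    for prev, cur in zip(phrase, phrase[1:]):
        p *= t2_wb(cur, prev, s)
    return p

# Hypothetical expected counts for one source word (illustrative only).
c1["shimao"].update({"world": 3.0, "trade": 2.5, "organization": 2.5})
c2["shimao"].update({("world", "trade"): 2.0, ("trade", "organization"): 2.0})
print(phrase_prob(("world", "trade", "organization"), "shimao"))
```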

41 Bitext Word Alignment WtoP HMM Model Comparing Word-to-Phrase HMM to... Segmental Hidden Markov Models (Ostendorf et al, 96): states emit observation sequences. WtoW HMM (Vogel et al, 96; Och & Ney, 03): the N = 1 case. Extensions to WtoW HMM (Toutanova et al, 02): P(stay | s) vs. an explicit phrase length model n(φ; s) in modeling state durations. IBM Model-4: fertility vs. phrase length. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 23 / 42

42 Outline Bitext Word Alignment Parameter Estimation 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 24 / 42

43 Bitext Word Alignment Parameter Estimation Forward-Backward Algorithm State space S = {(i, φ, h) : 1 ≤ i ≤ I, 1 ≤ φ ≤ N, h = 0 or 1}; a grid of 2NI states by J target positions.
α_j(i, φ, h) = Σ_{i', φ', h'} α_{j-φ}(i', φ', h') p(i | i', h; I) η n(φ; h · s_i) P(t_{j-φ+1}^j | h · s_i, φ)
β_j(i, φ, h) = Σ_{i', φ', h'} β_{j+φ'}(i', φ', h') p(i' | i, h'; I) η n(φ'; h' · s_{i'}) P(t_{j+1}^{j+φ'} | h' · s_{i'}, φ')
γ_j(i, φ, h) = P(h · s_i -> v = t_{j-φ+1}^j | θ, s, t) = α_j(i, φ, h) β_j(i, φ, h) / Σ_{i', φ', h'} α_J(i', φ', h')
Y. Deng (Johns Hopkins) Bitext Alignment for SMT 25 / 42
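For intuition, a simplified forward-pass sketch for the word-to-word special case (N = 1, no NULL state), assuming dense transition and emission tables; the full WtoP recursion above additionally loops over phrase lengths φ and the NULL indicator h:

```python
import numpy as np

# trans[i_prev, i] = p(i | i_prev; I), emit[i, j] = t(t_j | s_i); both hypothetical.
def forward(trans, emit):
    I, J = emit.shape
    alpha = np.zeros((J, I))
    alpha[0] = emit[:, 0] / I                 # uniform initial state distribution
    for j in range(1, J):
        # alpha_j(i) = sum_{i'} alpha_{j-1}(i') p(i | i') t(t_j | s_i)
        alpha[j] = (alpha[j - 1] @ trans) * emit[:, j]
    return alpha, alpha[-1].sum()             # forward trellis and P(t | s)

# Hypothetical toy example: 3 source words, 4 target words.
I, J = 3, 4
rng = np.random.default_rng(0)
trans = rng.random((I, I)); trans /= trans.sum(axis=1, keepdims=True)
emit = rng.random((I, J))
_, likelihood = forward(trans, emit)
print(likelihood)
```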

44 Bitext Word Alignment Parameter Estimation Embedded Estimation of Word-to-Phrase HMM Unsupervised training from scratch: Model-1, 10 iterations (initial t-table); Model-2, 5 iterations (better t-table); WtoW HMM, 5 iterations (initial Markov model); WtoP HMM N=2, 3, ..., 5 iterations each (Markov model, phrase length) (experience from ASR); WtoP HMM with bigram t-table, 5 iterations (bigram t-table). Parallel Implementation: partition the training bitext; E-step: collect counts from each partition in parallel; M-step: merge counts to update model parameters. Memory efficient, virtually no limitation on training bitext size. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 26 / 42
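A minimal sketch of the parallel E-step / count-merge pattern described above, using a Model-1-style E-step for brevity; the collect_counts function and the partitioning are illustrative, not the MTTK implementation:

```python
from collections import Counter
from multiprocessing import Pool

def collect_counts(partition, model):
    """E-step on one bitext partition: expected (source, target) link counts."""
    counts = Counter()
    for s_words, t_words in partition:
        for t in t_words:
            z = sum(model.get((s, t), 1e-12) for s in s_words)
            for s in s_words:                      # fractional (posterior) counts
                counts[(s, t)] += model.get((s, t), 1e-12) / z
    return counts

def em_iteration(partitions, model, workers=4):
    with Pool(workers) as pool:                    # E-step in parallel per partition
        partial = pool.starmap(collect_counts, [(p, model) for p in partitions])
    merged = Counter()
    for c in partial:                              # M-step: merge counts, renormalize
        merged.update(c)
    totals = Counter()
    for (s, _), v in merged.items():
        totals[s] += v
    return {(s, t): v / totals[s] for (s, t), v in merged.items()}
```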

46 Outline Bitext Word Alignment Word Alignment Results 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 27 / 42

47 Bitext Word Alignment Word Alignment Results Bitext Alignment Results Test: NIST 2001 MT-eval set, 124 sentence pairs w/ manual word alignments. Comparable performance to Model-4 on the FBIS training bitext. Increasing the maximum phrase length N improves quality in the C-to-E direction. The bigram translation probability improves word-to-phrase links. A good balance between 1-1 and 1-N links can be achieved. [Table: number of hypothesized links, 1-1 links, 1-N links and overall AER as a function of η.] Comparable performance when extending to large-scale bitexts. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 28 / 42
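For reference, the Alignment Error Rate (AER) reported here can be computed from sure and possible reference links as in this small sketch (the link sets below are hypothetical):

```python
# AER given sure (S) and possible (P) reference links and hypothesized links (A),
# all as sets of (source_index, target_index) pairs, with S a subset of P.
def aer(hyp, sure, possible):
    # AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)
    return 1.0 - (len(hyp & sure) + len(hyp & possible)) / (len(hyp) + len(sure))

sure = {(1, 1), (2, 3)}
possible = sure | {(2, 4)}
hyp = {(1, 1), (2, 4), (3, 5)}
print(round(aer(hyp, sure, possible), 3))
```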

48 Outline Bitext Phrase Alignment Inducing from Word Alignments 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 29 / 42

49 Bitext Phrase Alignment Inducing from Word Alignments Statistical Phrase Translation Models Phrase-based SMT performs better than word-based SMT. The Phrase Pair Inventory (PPI) is extracted from word-aligned bitext (Och et al, 99). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 30 / 42

50 Bitext Phrase Alignment Inducing from Word Alignments But word alignments are imperfect... Example: "There is no gang and money linked politics in hong kong and there will not be such politics in future either". Relying on the one-best word alignment may exclude some valid phrase pairs. The goal is to define a probability distribution over phrase pairs, which allows more control over the generation of phrase pairs. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 31 / 42

53 Outline Bitext Phrase Alignment Model-based Phrase Pair Posterior 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 32 / 42

54 Bitext Phrase Alignment Model-based Phrase Pair Posterior Model-based Phrase Pair Posterior Doesn't rely on a single alignment. Example: a candidate source phrase span [i_1, i_2] and target phrase span [j_1, j_2] in "There is no gang and money linked politics in hong kong and there will not be such politics in future either". Define the set of alignments that align the words within the two phrases to each other: A(i_1, i_2; j_1, j_2) = {a_1^m : a_j ∈ [i_1, i_2] iff j ∈ [j_1, j_2]}. Calculate the likelihood of the source phrase producing the target phrase: P(t, A(i_1, i_2; j_1, j_2) | s) = Σ_{a ∈ A(i_1, i_2; j_1, j_2)} P(t, a | s). Obtain the phrase pair posterior: P(A(i_1, i_2; j_1, j_2) | t, s) = P(t, A(i_1, i_2; j_1, j_2) | s) / P(t | s). Efficient DP-based implementation for the WtoP HMM; difficult for Model-4. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 33 / 42
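A small sketch of the phrase-pair posterior, computed under IBM Model 1 purely as an illustration of the definition (the thesis computes it with an efficient DP under the WtoP HMM); under Model 1 the constrained sum over alignments factorizes per target position:

```python
def phrase_pair_posterior(s_words, t_words, i1, i2, j1, j2, t_table):
    """P(A(i1,i2; j1,j2) | t, s) under Model 1; spans are inclusive word indices."""
    def tprob(t, s):
        return t_table.get((t, s), 1e-12)
    inside = s_words[i1:i2 + 1]
    outside = ["NULL"] + s_words[:i1] + s_words[i2 + 1:]
    num, den = 1.0, 1.0
    for j, t in enumerate(t_words):
        allowed = inside if j1 <= j <= j2 else outside   # the A(i1,i2; j1,j2) constraint
        num *= sum(tprob(t, s) for s in allowed)
        den *= sum(tprob(t, s) for s in ["NULL"] + s_words)
    return num / den   # the uniform 1/(I+1) alignment terms cancel in the ratio
```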

59 Bitext Phrase Alignment Model-based Phrase Pair Posterior Augmented PPI for Better Coverage Baseline PPI extracted from 1-best alignments using established techniques (Och et al., 99). GOAL: add phrase pairs to improve test set coverage. For each foreign phrase v in the test set not covered by the baseline, and for each sentence pair containing v, find the English phrase u that maximizes the phrase pair posterior:
f(i_1, i_2) = P_{F|E}(A(i_1, i_2; j_1, j_2) | e_1^l, f_1^m)
b(i_1, i_2) = P_{E|F}(A(i_1, i_2; j_1, j_2) | e_1^l, f_1^m)
g(i_1, i_2) = f(i_1, i_2) · b(i_1, i_2)
(î_1, î_2) = argmax_{1 ≤ i_1, i_2 ≤ l} g(i_1, i_2), and set u = e_{î_1}^{î_2}.
Add (u, v) to the baseline PPI if the posterior exceeds a threshold value. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 34 / 42
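A sketch of this augmentation loop, assuming hypothetical posterior_fe and posterior_ef callables that return the two directional phrase-pair posteriors, and a hypothetical find_span helper:

```python
def augment_ppi(ppi, uncovered_phrases, bitext, posterior_fe, posterior_ef,
                threshold=0.1):
    """Add (u, v) pairs to the PPI set for uncovered foreign test phrases v."""
    for v in uncovered_phrases:
        for e_words, f_words in bitext:
            j1 = find_span(f_words, v)
            if j1 is None:
                continue
            j2 = j1 + len(v) - 1
            best = None
            for i1 in range(len(e_words)):          # search all English spans
                for i2 in range(i1, len(e_words)):
                    g = (posterior_fe(e_words, f_words, i1, i2, j1, j2) *
                         posterior_ef(e_words, f_words, i1, i2, j1, j2))
                    if best is None or g > best[0]:
                        best = (g, i1, i2)
            if best and best[0] > threshold:        # keep only confident pairs
                _, i1, i2 = best
                ppi.add((tuple(e_words[i1:i2 + 1]), tuple(v)))
    return ppi

def find_span(words, phrase):
    """Return the start index of phrase in words, or None."""
    for j in range(len(words) - len(phrase) + 1):
        if words[j:j + len(phrase)] == list(phrase):
            return j
    return None
```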

60 Outline Bitext Phrase Alignment Translation Results 1 Bitext Chunk Alignment Chunk Alignment Chunking Algorithms Sentence Alignment Results 2 Bitext Word Alignment Introduction/Motivation Word-to-Phrase HMM Model Parameter Estimation Word Alignment Results 3 Bitext Phrase Alignment Inducing from Word Alignments Model-based Phrase Pair Posterior Translation Results 4 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 35 / 42

61 Bitext Phrase Alignment Translation Results Translation Template Model (TTM) (Kumar et al, 05) TTM Decoder: WFST implementation with monotone order. Cascade: Source Language Sentence -> Source Phrase Segmentation -> Source Phrases -> Phrase Insertion -> Placement of Target Phrase Insertion Markers -> Phrase Translation -> Target Phrases -> Target Phrase Segmentation -> Target Language Sentence. Example: GRAIN EXPORTS ARE PROJECTED TO FALL BY 25 % -> GRAIN EXPORTS ARE_PROJECTED_TO FALL BY_25_% -> 1 GRAINS 1 EXPORTS ARE_PROJECTED_TO FALL BY_25_% -> DE GRAINS LES EXPORTATIONS DOIVENT FLECHIR DE_25_% -> DE GRAINS LES EXPORTATIONS DOIVENT FLECHIR DE 25 %. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 36 / 42

62 Bitext Phrase Alignment Translation Results Automatic Machine Translation Evaluation: a hard problem! BLEU (Papineni et al, 01): an automatic MT metric that correlates well with human judgements; the geometric mean of n-gram precisions weighted by a brevity penalty. Reference: mr. speaker, in absolutely no way. Hypothesis: in absolutely no way, mr. chairman. BLEU computation: sub-string matches(Truth, Hyp) give n-gram precisions 7/8 (1-gram), 3/7 (2-gram), 2/6 (3-gram) and 1/5 (4-gram), so BLEU = (7/8 · 3/7 · 2/6 · 1/5)^(1/4). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 37 / 42
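A compact BLEU sketch that reproduces the example's n-gram precisions (7/8, 3/7, 2/6, 1/5); smoothing of zero counts and multi-reference handling are simplified:

```python
import math
from collections import Counter

def bleu(hyp, ref, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(max(overlap, 1e-12) / total)   # avoid log(0) in the toy case
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "mr. speaker , in absolutely no way .".split()
hyp = "in absolutely no way , mr. chairman .".split()
print(round(bleu(hyp, ref), 4))
```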

63 Bitext Phrase Alignment Translation Results Translation Results: Small Systems [Plots: test set coverage and BLEU on eval02, eval03 and eval04 as the augmentation threshold is relaxed, for Chinese-English and Arabic-English.] Relaxing the threshold in PPI augmentation improves coverage and BLEU score. Balance coverage against phrase translation quality. The WtoP model can even be applied to augment the Model-4 PPI. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 38 / 42

64 Bitext Phrase Alignment Translation Results Translation Results: Large Systems [Bar charts: BLEU for Model-4 Baseline, WtoP HMM Baseline and WtoP HMM Augmented on eval02, eval03 and eval04 (news), for Chinese-English and Arabic-English.] Used all parallel corpora available from LDC: C-E: 200M English words (FBIS, Xinhua, HK News, ..., all UN bitexts); A-E: 130M English words (news, all UN bitexts). Y. Deng (Johns Hopkins) Bitext Alignment for SMT 39 / 42

65 Conclusions Conclusions A hierarchical bitext chunking approach: language independent, no linguistic knowledge required; derives short chunk pairs, retaining more of the available bitext. The word-to-phrase HMM alignment model: produces good-quality word alignments over very large bitexts; has an efficient training algorithm with a parallel implementation; a powerful framework. Model-based phrase pair distribution: enables an improved phrase pair extraction strategy; a controlled balance of coverage vs. quality; the WtoP HMM performs better than IBM Model-4 on large systems. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 40 / 42

66 Conclusions Machine Translation Toolkit (MTTK) Solutions for MT training; used for the JHU-CU 2005 NIST MT Eval systems. Training pipeline: Document Alignments -> Length Statistics and Model 1 training {t(f|e)} -> Bitext Chunking -> Chunk Alignments -> Filter High Quality Pairs -> Model 1 training (PPEM, {t(f|e)}, AlignM1) -> Model 2 training (PPEM, {t(f|e), a(i|j;l,m)}, AlignM2) -> WtW HMM training (PPEHmm, {t(f|e), P(i|i';l)}, AlignHmm) -> WtP HMM training N=2,3,... (PPEHmm, {t(f|e), P(i|i';l), n(φ;e)}, AlignSHmm) -> WtP HMM training w/ bigram t-table (PPEHmm, {t(f|e), P(i|i';l), n(φ;e), t2(f|f',e)}, AlignSHmm) -> Word Alignments and Phrase Alignments. Y. Deng (Johns Hopkins) Bitext Alignment for SMT 41 / 42

67 Conclusions Y. Deng (Johns Hopkins) Bitext Alignment for SMT 42 / 42


More information

Gappy Phrasal Alignment by Agreement

Gappy Phrasal Alignment by Agreement Gappy Phrasal Alignment by Agreement Mohit Bansal UC Berkeley, CS Division mbansal@cs.berkeley.edu Chris Quirk Microsoft Research chrisq@microsoft.com Robert C. Moore Google Research robert.carter.moore@gmail.com

More information

A Recursive Statistical Translation Model

A Recursive Statistical Translation Model A Recursive Statistical Translation Model Juan Miguel Vilar Dpto. de Lenguajes y Sistemas Informáticos Universitat Jaume I Castellón (Spain) jvilar@lsi.uji.es Enrique Vidal Dpto. de Sistemas Informáticos

More information

State-Space Methods for Inferring Spike Trains from Calcium Imaging

State-Space Methods for Inferring Spike Trains from Calcium Imaging State-Space Methods for Inferring Spike Trains from Calcium Imaging Joshua Vogelstein Johns Hopkins April 23, 2009 Joshua Vogelstein (Johns Hopkins) State-Space Calcium Imaging April 23, 2009 1 / 78 Outline

More information

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning

CS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we

More information

LECTURER: BURCU CAN Spring

LECTURER: BURCU CAN Spring LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can

More information

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions Wei Lu and Hwee Tou Ng National University of Singapore 1/26 The Task (Logical Form) λx 0.state(x 0

More information

Redoing the Foundations of Decision Theory

Redoing the Foundations of Decision Theory Redoing the Foundations of Decision Theory Joe Halpern Cornell University Joint work with Larry Blume and David Easley Economics Cornell Redoing the Foundations of Decision Theory p. 1/21 Decision Making:

More information

Decoding Revisited: Easy-Part-First & MERT. February 26, 2015

Decoding Revisited: Easy-Part-First & MERT. February 26, 2015 Decoding Revisited: Easy-Part-First & MERT February 26, 2015 Translating the Easy Part First? the tourism initiative addresses this for the first time the die tm:-0.19,lm:-0.4, d:0, all:-0.65 tourism touristische

More information

Advanced Natural Language Processing Syntactic Parsing

Advanced Natural Language Processing Syntactic Parsing Advanced Natural Language Processing Syntactic Parsing Alicia Ageno ageno@cs.upc.edu Universitat Politècnica de Catalunya NLP statistical parsing 1 Parsing Review Statistical Parsing SCFG Inside Algorithm

More information

Discriminative Training

Discriminative Training Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder

More information

speaker recognition using gmm-ubm semester project presentation

speaker recognition using gmm-ubm semester project presentation speaker recognition using gmm-ubm semester project presentation OBJECTIVES OF THE PROJECT study the GMM-UBM speaker recognition system implement this system with matlab document the code and how it interfaces

More information

Hidden Markov Model and Speech Recognition

Hidden Markov Model and Speech Recognition 1 Dec,2006 Outline Introduction 1 Introduction 2 3 4 5 Introduction What is Speech Recognition? Understanding what is being said Mapping speech data to textual information Speech Recognition is indeed

More information

Language Models. Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN

Language Models. Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN Language Models Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN Data Science: Jordan Boyd-Graber UMD Language Models 1 / 8 Language models Language models answer

More information

Hidden Markov Models in Language Processing

Hidden Markov Models in Language Processing Hidden Markov Models in Language Processing Dustin Hillard Lecture notes courtesy of Prof. Mari Ostendorf Outline Review of Markov models What is an HMM? Examples General idea of hidden variables: implications

More information

Lecture 13: Structured Prediction

Lecture 13: Structured Prediction Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

Design and Implementation of Speech Recognition Systems

Design and Implementation of Speech Recognition Systems Design and Implementation of Speech Recognition Systems Spring 2013 Class 7: Templates to HMMs 13 Feb 2013 1 Recap Thus far, we have looked at dynamic programming for string matching, And derived DTW from

More information

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill

More information

DT2118 Speech and Speaker Recognition

DT2118 Speech and Speaker Recognition DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language

More information

Lecture 12: Algorithms for HMMs

Lecture 12: Algorithms for HMMs Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a

More information

AN INTRODUCTION TO TOPIC MODELS

AN INTRODUCTION TO TOPIC MODELS AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about

More information

Sparse Models for Speech Recognition

Sparse Models for Speech Recognition Sparse Models for Speech Recognition Weibin Zhang and Pascale Fung Human Language Technology Center Hong Kong University of Science and Technology Outline Introduction to speech recognition Motivations

More information

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 ) Part A 1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds

More information

Midterm sample questions

Midterm sample questions Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

Hidden Markov Models. x 1 x 2 x 3 x N

Hidden Markov Models. x 1 x 2 x 3 x N Hidden Markov Models 1 1 1 1 K K K K x 1 x x 3 x N Example: The dishonest casino A casino has two dice: Fair die P(1) = P() = P(3) = P(4) = P(5) = P(6) = 1/6 Loaded die P(1) = P() = P(3) = P(4) = P(5)

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 8: Sequence Labeling Jimmy Lin University of Maryland Thursday, March 14, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Natural Language Processing. Statistical Inference: n-grams

Natural Language Processing. Statistical Inference: n-grams Natural Language Processing Statistical Inference: n-grams Updated 3/2009 Statistical Inference Statistical Inference consists of taking some data (generated in accordance with some unknown probability

More information

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way

More information

Chapter 2 Segment Measurement and Coordinate Graphing

Chapter 2 Segment Measurement and Coordinate Graphing Geometry Concepts Chapter 2 Segment Measurement and Coordinate Graphing 2.2 Find length segments (1.3) 2.3 Compare lengths of segments (1.3) 2.3 Find midpoints of segments (1.7) 2.5 Calculate coordinates

More information

1. Markov models. 1.1 Markov-chain

1. Markov models. 1.1 Markov-chain 1. Markov models 1.1 Markov-chain Let X be a random variable X = (X 1,..., X t ) taking values in some set S = {s 1,..., s N }. The sequence is Markov chain if it has the following properties: 1. Limited

More information

Word Alignment via Submodular Maximization over Matroids

Word Alignment via Submodular Maximization over Matroids Word Alignment via Submodular Maximization over Matroids Hui Lin, Jeff Bilmes University of Washington, Seattle Dept. of Electrical Engineering June 21, 2011 Lin and Bilmes Submodular Word Alignment June

More information

Linear Transforms in Automatic Speech Recognition: Estimation Procedures & Integration of Diverse Acoustic Data

Linear Transforms in Automatic Speech Recognition: Estimation Procedures & Integration of Diverse Acoustic Data Linear Transforms in Automatic Speech Recognition: Estimation Procedures & Stavros Tsakalidis Advisor: Prof. William Byrne Thesis Committee: Prof. William Byrne, Prof. Sanjeev Khudanpur, Prof. Paul Sotiriadis,

More information

Tuning as Linear Regression

Tuning as Linear Regression Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical

More information

Constructive Decision Theory

Constructive Decision Theory Constructive Decision Theory Joe Halpern Cornell University Joint work with Larry Blume and David Easley Economics Cornell Constructive Decision Theory p. 1/2 Savage s Approach Savage s approach to decision

More information

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels? Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Word vectors Many slides borrowed from Richard Socher and Chris Manning Lecture plan Word representations Word vectors (embeddings) skip-gram algorithm Relation to matrix factorization

More information

Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data

Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data Covariance Matrix Enhancement Approach to Train Robust Gaussian Mixture Models of Speech Data Jan Vaněk, Lukáš Machlica, Josef V. Psutka, Josef Psutka University of West Bohemia in Pilsen, Univerzitní

More information