Lexical Translation Models II. January 27, 2015

1 Lexical Translation Models II, January 27, 2015

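2 Last Time...
$$p(\text{Translation}) = \sum_{\text{Alignment}} p(\text{Alignment}, \text{Translation}) = \sum_{\text{Alignment}} p(\text{Alignment}) \times p(\text{Translation} \mid \text{Alignment})$$
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} p(a \mid f, m) \times \prod_{i=1}^{m} p(e_i \mid f_{a_i})$$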

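3 $$p(e \mid f, m) = \sum_{a \in [0,n]^m} p(a \mid f, m) \times \prod_{i=1}^{m} p(e_i \mid f_{a_i})$$
Candidate replacements for the lexical term $p(e_i \mid f_{a_i})$:
$p(e_i \mid f_{a_i}, f_{a_i - 1})$
$p(e_i \mid f_{a_i}, f_{a_{i-1}})$
$p(e_i \mid f_{a_i}, e_{i-1})$
$p(e_i, e_{i+1} \mid f_{a_i})$
What is the problem here?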

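4 $$p(e \mid f, m) = \sum_{a \in [0,n]^m} p(a \mid f, m) \times \prod_{i=1}^{m} p(e_i \mid f_{a_i})$$
$$= \sum_{a \in [0,n]^m} \underbrace{\prod_{i=1}^{m} \frac{1}{1+n}}_{p(a \mid f, m)} \times \prod_{i=1}^{m} p(e_i \mid f_{a_i}) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} \frac{1}{1+n}\, p(e_i \mid f_{a_i})$$
Can we do something better here?
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i) \times p(e_i \mid f_{a_i})$$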
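With the uniform alignment term $\frac{1}{1+n}$, the sum over all alignments can be rearranged into a product of per-word sums, which is what makes this model cheap to evaluate. The sketch below checks that identity numerically on a toy sentence pair. The lexical probabilities in `t` are made-up numbers and the function names are illustrative only.

```python
from itertools import product

def likelihood_bruteforce(e, f, t):
    """Sum over every alignment a in [0, n]^m; f[0] plays the role of NULL."""
    n = len(f) - 1  # number of real source words
    total = 0.0
    for a in product(range(n + 1), repeat=len(e)):
        p = 1.0
        for i, j in enumerate(a):
            p *= (1.0 / (1 + n)) * t[(e[i], f[j])]
        total += p
    return total

def likelihood_factored(e, f, t):
    """Same quantity, computed as a product over target words of per-word sums."""
    n = len(f) - 1
    total = 1.0
    for ei in e:
        total *= sum(t[(ei, fj)] for fj in f) / (1 + n)
    return total

# Made-up lexical probabilities t[(e_word, f_word)], for illustration only.
f = ["<null>", "das", "haus"]
e = ["the", "house"]
t = {
    ("the", "<null>"): 0.1, ("the", "das"): 0.7, ("the", "haus"): 0.1,
    ("house", "<null>"): 0.1, ("house", "das"): 0.2, ("house", "haus"): 0.8,
}
print(likelihood_bruteforce(e, f, t), likelihood_factored(e, f, t))
```

The brute-force version visits $(1+n)^m$ alignments, while the factored version does $m(1+n)$ multiplications and additions.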

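5 $$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i) \times p(e_i \mid f_{a_i})$$
Model 2:
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$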

6 Model 2
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$
Model alignment with an absolute position distribution.
$p(a_i \mid i, m, n)$: the probability of translating the foreign word at position $a_i$ to generate the word at position $i$ (with target length $m$ and source length $n$).
EM training of this model is almost the same as with Model 1 (the same conditional independencies hold).
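Because each alignment decision is still independent of the others, the E-step posterior for one target word can be computed by normalizing over source positions. The following is a minimal sketch of that per-word posterior, assuming a hypothetical callable `a_prob(j, i, m, n)` for the alignment distribution and made-up lexical probabilities `t`.

```python
def model2_posteriors(e, f, t, a_prob):
    """post[i-1][j] = p(a_i = j | e, f); f[0] plays the role of NULL."""
    m, n = len(e), len(f) - 1
    post = []
    for i, ei in enumerate(e, start=1):
        # Unnormalized score: alignment probability times lexical probability.
        scores = [a_prob(j, i, m, n) * t[(ei, f[j])] for j in range(n + 1)]
        z = sum(scores)
        post.append([s / z for s in scores])
    return post

# Made-up numbers, for illustration only.
f = ["<null>", "das", "haus"]
e = ["the", "house"]
t = {
    ("the", "<null>"): 0.1, ("the", "das"): 0.7, ("the", "haus"): 0.1,
    ("house", "<null>"): 0.1, ("house", "das"): 0.2, ("house", "haus"): 0.8,
}

def uniform(j, i, m, n):
    """With a uniform alignment distribution this reduces to Model 1."""
    return 1.0 / (n + 1)

post = model2_posteriors(e, f, t, uniform)
for row in post:
    print([round(p, 3) for p in row])
```

In a full EM loop these posteriors would be accumulated as expected counts for both the lexical table and the alignment distribution.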

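7 Model 2
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$
Example: source "natürlich ist das haus klein", target "of course the house is small".
Source word aligned to each target word, in target order: natürlich, natürlich, das, haus, ist, klein.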

8 Model 2
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$
Pros: non-uniform alignment model; fast EM training / marginal inference.
Cons: absolute position is very naive; how many parameters does it take to model $p(a_i \mid i, m, n)$?

9 Model 2
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$
How much do we know when we only know the source and target lengths and the current position? How many parameters do we need to model this?
[Figure: alignment grid linking target positions $i = 1, \dots, 6$ ($m = 6$) to source positions null and $j' = 1, \dots, 5$ ($n = 5$)]
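One way to get a feel for the parameter question is to count the entries a full table for $p(a_i \mid i, m, n)$ would need. This is a rough sketch, assuming one categorical distribution over the $n + 1$ source positions (including null) for every target position and every length pair. The helper names and the length cap `max_len` are illustrative.

```python
def table_entries(m, n):
    """Probabilities stored for one (m, n) pair: m target positions,
    each with a distribution over n + 1 source positions (null included)."""
    return m * (n + 1)

def total_entries(max_len):
    """Entries needed to cover every length pair up to max_len."""
    return sum(
        table_entries(m, n)
        for m in range(1, max_len + 1)
        for n in range(1, max_len + 1)
    )

print(table_entries(6, 5))  # the m = 6, n = 5 case on the slide: 36
print(total_entries(50))
```

The total grows roughly with the fourth power of the maximum sentence length, which is the motivation for a parameterization with far fewer free parameters.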

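10 Model 2
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$
$$h(j, i, m, n) = -\left| \frac{i}{m} - \frac{j}{n} \right|$$
where $i$ is the position in the target, $j$ the position in the source, $m$ the target length, and $n$ the source length.
$$b(j \mid i, m, n) = \frac{\exp \lambda h(j, i, m, n)}{\sum_{j'} \exp \lambda h(j', i, m, n)}$$
$$p(a_i \mid i, m, n) = \begin{cases} p_0 & \text{if } a_i = 0 \\ (1 - p_0)\, b(a_i \mid i, m, n) & \text{otherwise} \end{cases}$$
[Figure: alignment grid linking target positions $i = 1, \dots, 6$ ($m = 6$) to source positions null and $j' = 1, \dots, 5$ ($n = 5$)]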
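The position-based alignment distribution on this slide can be sketched in a few lines. The version below assumes that `h` is the negative absolute difference between the relative target and source positions and that the exponent is scaled by a single parameter, called `lam` here. The values of `lam` and `p0` are arbitrary illustrative choices, not settings from the lecture.

```python
import math

def h(j, i, m, n):
    """Closeness of source position j and target position i to the diagonal."""
    return -abs(i / m - j / n)

def b(j, i, m, n, lam):
    """Softmax over the non-null source positions 1..n."""
    z = sum(math.exp(lam * h(jp, i, m, n)) for jp in range(1, n + 1))
    return math.exp(lam * h(j, i, m, n)) / z

def align_prob(j, i, m, n, lam=4.0, p0=0.08):
    """p(a_i = j | i, m, n): fixed mass p0 on null, the rest spread by b."""
    if j == 0:
        return p0
    return (1.0 - p0) * b(j, i, m, n, lam)

m, n = 6, 5
for i in range(1, m + 1):
    row = [align_prob(j, i, m, n) for j in range(n + 1)]
    print(i, [round(p, 3) for p in row])
```

Under these assumptions the whole alignment distribution has two parameters (`lam` and `p0`) regardless of sentence length, and each row still sums to one.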

13 Words reorder in groups. Model this!

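14 $$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i) \times p(e_i \mid f_{a_i})$$
Model 2:
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid i, m, n) \times p(e_i \mid f_{a_i})$$
HMM:
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid a_{i-1}) \times p(e_i \mid f_{a_i})$$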

15 HMM
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid a_{i-1}) \times p(e_i \mid f_{a_i})$$
Insight: words translate in groups.
Condition on the previous alignment position.
$p(a_i \mid a_{i-1})$: the probability of translating the foreign word at position $a_i$ given that the previous position translated was $a_{i-1}$.
EM training of this model uses the forward-backward algorithm (dynamic programming).
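The dynamic program behind this can be illustrated with the forward pass alone, which computes the sentence likelihood. The sketch below compares it against brute-force enumeration on a toy example. It omits the null word, uses 0-based source positions, and assumes a uniform initial distribution. All numbers and names are made up for illustration.

```python
from itertools import product

def forward_likelihood(e, f, emit, trans, init):
    """p(e | f) under an HMM alignment model (null omitted for brevity)."""
    n = len(f)
    # alpha[j]: probability of the target prefix with the last word aligned to j.
    alpha = [init[j] * emit(e[0], f[j]) for j in range(n)]
    for ei in e[1:]:
        alpha = [
            sum(alpha[k] * trans[k][j] for k in range(n)) * emit(ei, f[j])
            for j in range(n)
        ]
    return sum(alpha)

def bruteforce_likelihood(e, f, emit, trans, init):
    """Same quantity, summing over every alignment explicitly."""
    n = len(f)
    total = 0.0
    for a in product(range(n), repeat=len(e)):
        p = init[a[0]] * emit(e[0], f[a[0]])
        for i in range(1, len(e)):
            p *= trans[a[i - 1]][a[i]] * emit(e[i], f[a[i]])
        total += p
    return total

# Made-up parameters, for illustration only.
T = {("the", "das"): 0.7, ("small", "kleine"): 0.6, ("house", "haus"): 0.8}

def emit(e_word, f_word):
    return T.get((e_word, f_word), 0.05)

f = ["das", "kleine", "haus"]
e = ["the", "small", "house"]
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.4, 0.3, 0.3]]
print(forward_likelihood(e, f, emit, trans, init))
print(bruteforce_likelihood(e, f, emit, trans, init))
```

The forward pass costs on the order of $m n^2$ operations, while the enumeration costs $n^m$. A matching backward pass would give the posteriors needed for EM.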

16 HMM
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid a_{i-1}) \times p(e_i \mid f_{a_i})$$
Improvement: model jumps through the source sentence.
$$p(a_i \mid a_{i-1}) = j(a_i - a_{i-1})$$
Relative position model rather than absolute position model.
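A jump model ties every transition with the same offset to one shared parameter. One common way to turn jump weights into a proper transition matrix for a sentence of length `n` is to renormalize over the positions that actually exist. The sketch below does that, assuming a hypothetical dictionary of non-negative jump weights with a small default for unseen offsets.

```python
def jump_transitions(jump_weight, n, default=0.5):
    """trans[prev][cur] proportional to jump_weight[cur - prev], per-row normalized."""
    trans = []
    for prev in range(n):
        scores = [jump_weight.get(cur - prev, default) for cur in range(n)]
        z = sum(scores)
        trans.append([s / z for s in scores])
    return trans

# Made-up weights: a forward jump of one position is favoured.
weights = {-1: 1.0, 0: 1.0, 1: 2.0}
trans = jump_transitions(weights, 3)
for row in trans:
    print([round(p, 3) for p in row])
```

The number of jump parameters depends only on the range of offsets, not on the absolute positions or sentence lengths.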

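17 HMM
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid a_{i-1}) \times p(e_i \mid f_{a_i})$$
Be careful! NULLs must be handled carefully. Here is one option (due to Och):
$$p(a_i \mid a_{i - n_i}) = \begin{cases} p_0 & \text{if } a_i = 0 \\ (1 - p_0)\, j(a_i - a_{i - n_i}) & \text{otherwise} \end{cases}$$
$n_i$ is the index of the first non-null aligned word in the alignment to the left of $i$.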
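The null-handling option can be sketched as a function that scores a full alignment: a null link costs `p0`, and a non-null link jumps from the most recent non-null link to its left. This is only an illustration. It assumes the jump probabilities in `jump` are already normalized, ignores sentence boundaries, and treats the position before the first word as source position 0, none of which is specified on the slide.

```python
def alignment_prob_with_null(a, jump, p0):
    """Probability of alignment a (0 = null, source positions are 1-based)."""
    prob = 1.0
    prev = 0  # assumption: start "at" position 0 before the first non-null link
    for ai in a:
        if ai == 0:
            prob *= p0  # null link; the remembered position does not change
        else:
            prob *= (1.0 - p0) * jump.get(ai - prev, 0.0)
            prev = ai  # becomes the reference point for the next non-null link
    return prob

# Made-up jump distribution, for illustration only.
jump = {-1: 0.1, 0: 0.2, 1: 0.6, 2: 0.1}
print(alignment_prob_with_null([1, 0, 2], jump, p0=0.1))
```

In the example, the third word jumps by one position relative to the first word, because the null link in between is skipped when looking left.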

18 HMM
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} \prod_{i=1}^{m} p(a_i \mid a_{i-1}) \times p(e_i \mid f_{a_i})$$
Other extensions: certain word types are more likely to be reordered.
$j(\Delta \mid f)$, $j(\Delta \mid C(f))$: condition the jump probability on the previous word translated.
$j(\Delta \mid f, e)$, $j(\Delta \mid A(f), B(e))$: condition the jump probability on the previous word translated, and how it was translated.

19 Fertility Models
The models we have considered so far have been efficient. This efficiency has come at a modeling cost: what is to stop the model from translating a word 0, 1, 2, or 100 times? We introduce fertility models to deal with this.

20 IBM Model 3

21 Fertility
Fertility: the number of English words generated by a foreign word.
Modeled by a categorical distribution $n(\phi \mid f)$.
Examples: Unabhaengigkeitserklaerung; zum = (zu + dem); Haus.

22 Fertility
$$p(e \mid f, m) = \sum_{a \in [0,n]^m} p(a \mid f, m) \times \prod_{i=1}^{m} p(e_i \mid f_{a_i})$$
Fertility models mean that we can no longer exploit conditional independencies to write $p(a \mid f, m)$ as a series of local alignment decisions. How do we compute the statistics required for EM training?

23 EM Recipe reminder
If alignment points were visible, training fertility models would be easy. We would count and normalize:
$$n(\phi = 3 \mid f = \text{Unabhaengigkeitserklaerung}) = \frac{\text{count}(3, \text{Unabhaengigkeitserklaerung})}{\text{count}(\text{Unabhaengigkeitserklaerung})}$$
But alignments are not visible, so we use expected counts:
$$n(\phi = 3 \mid f = \text{Unabhaengigkeitserklaerung}) = \frac{E[\text{count}(3, \text{Unabhaengigkeitserklaerung})]}{E[\text{count}(\text{Unabhaengigkeitserklaerung})]}$$
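For the visible-alignment case, the estimate is a matter of counting how often each source word generates each number of target words. Here is a minimal sketch on a made-up corpus of `(source words, alignment)` pairs, where `alignment[i]` is the 1-based source position for target word `i` and 0 means null. The corpus and the function name are illustrative only.

```python
from collections import Counter

def fertility_mle(corpus):
    """Relative-frequency estimate of n(phi | f) from observed alignments."""
    pair_counts = Counter()  # (fertility, source word) occurrences
    word_counts = Counter()  # source word occurrences
    for f_words, a in corpus:
        links = Counter(j for j in a if j != 0)  # target words per source position
        for j, fw in enumerate(f_words, start=1):
            pair_counts[(links[j], fw)] += 1
            word_counts[fw] += 1
    return {
        (phi, fw): c / word_counts[fw] for (phi, fw), c in pair_counts.items()
    }

# Made-up aligned data, for illustration only.
corpus = [
    (["Unabhaengigkeitserklaerung"], [1, 1, 1]),
    (["zum", "Haus"], [1, 1, 2]),
    (["zum", "Haus"], [1, 2]),
]
n_table = fertility_mle(corpus)
for key in sorted(n_table):
    print(key, n_table[key])
```

With hidden alignments, the integer counts would be replaced by expected counts under the alignment posterior, as the second formula on the slide indicates.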

24 Expectation & Fertility
We need to compute expected counts under $p(a \mid f, e, m)$. Unfortunately, $p(a \mid f, e, m)$ doesn't factorize nicely. :(
Can we sum exhaustively? How many different $a$'s are there? What to do?
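The size of the alignment space follows directly from the notation $a \in [0,n]^m$: each of the $m$ target words independently picks one of $n + 1$ source positions (null included). A small sketch of that count, with an illustrative function name:

```python
def num_alignments(m, n):
    """Number of alignment vectors a in [0, n]^m."""
    return (n + 1) ** m

print(num_alignments(6, 5))    # 46656 for the m = 6, n = 5 example
print(num_alignments(20, 20))  # already far too many to enumerate
```

The count grows exponentially in the target length, so exhaustive summation is only feasible for very short sentences.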

25 Sample Alignments
Monte Carlo methods: Gibbs sampling, importance sampling, particle filtering.
For historical reasons: use the Model 2 alignment to start (easy!), then take a weighted sum over all alignment configurations that are close to this alignment configuration.
Is this correct? No! Does it work? Sort of.

27 Pitfalls of Conditional Models
[Figure: two panels, "IBM Model 4 alignment" and "Our model's alignment"]

28 Lexical Translation
IBM Models 1-5 [Brown et al., 1993]
  Model 1: lexical translation, uniform alignment
  Model 2: absolute position model
  Model 3: fertility
  Model 4: relative position model (jumps in target string)
  Model 5: non-deficient model
  Widely used Giza++ toolkit
HMM translation model [Vogel et al., 1996]
  Relative position model (jumps in source string)
Latent variables are more useful these days than the translations.

29-31 A few tricks... $p(f \mid e)$, $p(e \mid f)$

32 Alignment Tool: fast_align

33 Announcements
Wang Ling will be giving the lecture on Thursday. His lectures are more entertaining than mine are, so please attend!
