Natural Language Processing (CSEP 517): Machine Translation


1 Natural Language Processing (CSEP 517): Machine Translation
Noah Smith
© 2017 University of Washington
nasmith@cs.washington.edu
May 15, 2017

2 To-Do List
Online quiz: due Sunday
Read: Jurafsky and Martin (2008, ch. 25); Collins (2011, 2013)
A5 due May 28 (Sunday)

3 Evaluation
Intuition: good translations are fluent in the target language and faithful to the original meaning.
Bleu score (Papineni et al., 2002):
Compare to a human-generated reference translation; or, better, multiple references.
Weighted average of n-gram precision (across different n).
There are some alternatives; most papers that use them report Bleu, too.
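To make the metric concrete, here is a minimal sketch of Bleu in plain Python: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. This is illustrative only, not the official scorer (real evaluations use tools like sacrebleu, with their own tokenization and smoothing conventions).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    """Brevity penalty times the geometric mean of modified n-gram
    precisions against one or more reference translations."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        overlap = sum(min(c, max_ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        log_prec_sum += math.log(max(overlap, 1e-9) / total)
    # Brevity penalty: compare to the reference length closest to |candidate|.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(log_prec_sum / max_n)

print(bleu("the cat sat on the mat".split(),
           ["the cat sat on the mat".split()]))  # 1.0 for an exact match
```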

4 Warren Weaver to Norbert Wiener, 1947
"One naturally wonders if the problem of translation could be conceivably treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"

5–8 Noisy Channel Models: Review
A pattern for modeling a pair of random variables, X and Y:

source → Y → channel → X

Y is the plaintext, the true message, the missing information, the output.
X is the ciphertext, the garbled message, the observable evidence, the input.
Decoding: select y given X = x:

\[
y^* = \operatorname*{argmax}_y p(y \mid x)
    = \operatorname*{argmax}_y \frac{p(x \mid y)\, p(y)}{p(x)}
    = \operatorname*{argmax}_y \underbrace{p(x \mid y)}_{\text{channel model}} \cdot \underbrace{p(y)}_{\text{source model}}
\]
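In code, the decoding rule is just an argmax over candidates. A toy sketch, assuming we can enumerate a candidate set and that channel(x, y) and source(y) are callables returning probabilities (real decoders search rather than enumerate):

```python
def noisy_channel_decode(x, candidates, channel, source):
    """Pick y* = argmax_y p(x | y) * p(y).  The denominator p(x) is
    constant in y, so it drops out of the argmax."""
    return max(candidates, key=lambda y: channel(x, y) * source(y))
```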

9 Bitext/Parallel Text
Let f and e be sequences over the French and English vocabularies, respectively. We're going to define p(f | e), the probability over French translations of English sentence e. In a noisy channel machine translation system, we could use this together with a source/language model p(e) to decode f into an English translation.
Where does the data to estimate this come from?

10 IBM Model 1 (Brown et al., 1993)
Let l and m be the (known) lengths of e and f. Latent variable a = ⟨a_1, ..., a_m⟩, each a_i ranging over {0, ..., l} (positions in e). a_4 = 3 means that f_4 is aligned to e_3; a_6 = 0 means that f_6 is aligned to a special null symbol, e_0.

\[
p(f \mid e, m) = \sum_{a_1=0}^{l} \sum_{a_2=0}^{l} \cdots \sum_{a_m=0}^{l} p(f, a \mid e, m)
              = \sum_{a \in \{0,\dots,l\}^m} p(f, a \mid e, m)
\]
\[
p(f, a \mid e, m) = \prod_{i=1}^{m} p(a_i \mid i, l, m) \cdot p(f_i \mid e_{a_i})
                  = \prod_{i=1}^{m} \frac{1}{l+1}\, \theta_{f_i \mid e_{a_i}}
                  = \left(\frac{1}{l+1}\right)^{m} \prod_{i=1}^{m} \theta_{f_i \mid e_{a_i}}
\]
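To make both formulas concrete, here is a small sketch (the theta table is a hypothetical stand-in for estimated parameters; e[0] plays the role of the null symbol). The marginal uses the fact that the nested sums over a_1 ... a_m factor, position by position, into a product of sums:

```python
def model1_joint(f, e, a, theta):
    """p(f, a | e, m) under IBM Model 1: a uniform 1/(l+1) alignment
    probability per position times a lexical translation probability.
    e = [null, e_1, ..., e_l], so l = len(e) - 1; theta maps
    (f_word, e_word) pairs to probabilities."""
    l = len(e) - 1
    p = 1.0
    for i, fi in enumerate(f):
        p *= (1.0 / (l + 1)) * theta.get((fi, e[a[i]]), 0.0)
    return p

def model1_marginal(f, e, theta):
    """p(f | e, m): the sum over all (l+1)^m alignments factors into
    a product over positions i of a sum over positions j in e."""
    l = len(e) - 1
    p = 1.0
    for fi in f:
        p *= sum((1.0 / (l + 1)) * theta.get((fi, ej), 0.0) for ej in e)
    return p
```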

11–17 Example: f is German
e = Mr President , Noah's ark was filled not with production factors , but with living creatures .
f = Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
Building up an alignment one position at a time, a = ⟨4, 5, 6, 8, 7, ?, ...⟩:

\[
p(f, a \mid e, m) = \tfrac{1}{17+1}\,\theta_{\text{Noahs} \mid \text{Noah's}}
  \cdot \tfrac{1}{17+1}\,\theta_{\text{Arche} \mid \text{ark}}
  \cdot \tfrac{1}{17+1}\,\theta_{\text{war} \mid \text{was}}
  \cdot \tfrac{1}{17+1}\,\theta_{\text{nicht} \mid \text{not}}
  \cdot \tfrac{1}{17+1}\,\theta_{\text{voller} \mid \text{filled}}
  \cdot \tfrac{1}{17+1}\,\theta_{\text{Produktionsfaktoren} \mid ?} \cdots
\]

Problem: the natural alignment for "Produktionsfaktoren" (to both "production" and "factors") isn't possible with IBM Model 1! Each f_i is aligned to at most one e_{a_i}!

18–24 Example: f is English
e = Noahs Arche war nicht voller Produktionsfaktoren , sondern Geschöpfe .
f = Mr President , Noah's ark was filled not with production factors , but with living creatures .
Building up an alignment one position at a time, a = ⟨0, 0, 0, 1, 2, 3, 5, 4, ...⟩:

\[
p(f, a \mid e, m) = \tfrac{1}{10+1}\,\theta_{\text{Mr} \mid \text{null}}
  \cdot \tfrac{1}{10+1}\,\theta_{\text{President} \mid \text{null}}
  \cdot \tfrac{1}{10+1}\,\theta_{,\, \mid \text{null}}
  \cdot \tfrac{1}{10+1}\,\theta_{\text{Noah's} \mid \text{Noahs}}
  \cdot \tfrac{1}{10+1}\,\theta_{\text{ark} \mid \text{Arche}}
  \cdot \tfrac{1}{10+1}\,\theta_{\text{was} \mid \text{war}}
  \cdot \tfrac{1}{10+1}\,\theta_{\text{filled} \mid \text{voller}}
  \cdot \tfrac{1}{10+1}\,\theta_{\text{not} \mid \text{nicht}} \cdots
\]

25–26 How to Estimate Translation Distributions?
This is a problem of incomplete data: at training time, we see e and f, but not a.
The classical solution is to alternate: given a parameter estimate for θ, align the words; given aligned words, re-estimate θ. The traditional approach uses "soft" alignment.

27 Complete-Data IBM Model 1
Let the training data consist of N word-aligned sentence pairs: ⟨e^(1), f^(1), a^(1)⟩, ..., ⟨e^(N), f^(N), a^(N)⟩. Define:

\[
\iota(k, i, j) = \begin{cases} 1 & \text{if } a_i^{(k)} = j \\ 0 & \text{otherwise} \end{cases}
\]

Maximum likelihood estimate for θ_{f|e}:

\[
\theta_{f \mid e} = \frac{c(e, f)}{c(e)}
  = \frac{\sum_{k=1}^{N} \sum_{i : f_i^{(k)} = f} \sum_{j : e_j^{(k)} = e} \iota(k, i, j)}
         {\sum_{k=1}^{N} \sum_{i=1}^{m^{(k)}} \sum_{j : e_j^{(k)} = e} \iota(k, i, j)}
\]
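A sketch of this estimator, assuming each training item is a triple (e, f, a) with e[0] the null symbol and a[i] the English position aligned to f[i]; with hard alignments, the indicator counts ι(k, i, j) are 0 or 1, so we can just accumulate them:

```python
from collections import defaultdict

def mle_from_alignments(pairs):
    """Maximum likelihood estimate of theta_{f|e} from word-aligned
    sentence pairs (e, f, a)."""
    count_ef = defaultdict(float)   # c(e, f)
    count_e = defaultdict(float)    # c(e)
    for e, f, a in pairs:
        for i, fi in enumerate(f):
            ej = e[a[i]]
            count_ef[(ej, fi)] += 1.0
            count_e[ej] += 1.0
    return {(fi, ej): count_ef[(ej, fi)] / count_e[ej]
            for (ej, fi) in count_ef}
```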

28 MLE with Soft Counts for IBM Model 1
Let the training data consist of N softly aligned sentence pairs: ⟨e^(1), f^(1)⟩, ..., ⟨e^(N), f^(N)⟩. Now let ι(k, i, j) be soft, interpreted as:

\[
\iota(k, i, j) = p(a_i^{(k)} = j)
\]

Maximum likelihood estimate for θ_{f|e}:

\[
\theta_{f \mid e}
  = \frac{\sum_{k=1}^{N} \sum_{i : f_i^{(k)} = f} \sum_{j : e_j^{(k)} = e} \iota(k, i, j)}
         {\sum_{k=1}^{N} \sum_{i=1}^{m^{(k)}} \sum_{j : e_j^{(k)} = e} \iota(k, i, j)}
\]

29 Expectation Maximization Algorithm for IBM Model 1
1. Initialize θ to some arbitrary values.
2. E step: use current θ to estimate expected ("soft") counts:
\[
\iota(k, i, j) \leftarrow \frac{\theta_{f_i^{(k)} \mid e_j^{(k)}}}
                               {\sum_{j'=0}^{l^{(k)}} \theta_{f_i^{(k)} \mid e_{j'}^{(k)}}}
\]
3. M step: carry out "soft" MLE:
\[
\theta_{f \mid e} \leftarrow
  \frac{\sum_{k=1}^{N} \sum_{i : f_i^{(k)} = f} \sum_{j : e_j^{(k)} = e} \iota(k, i, j)}
       {\sum_{k=1}^{N} \sum_{i=1}^{m^{(k)}} \sum_{j : e_j^{(k)} = e} \iota(k, i, j)}
\]
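Putting the E and M steps together gives a compact trainer. A minimal sketch, assuming uniform initialization, no smoothing, and dense dictionaries; pairs is a list of (e, f) token lists with e[0] the null symbol:

```python
from collections import defaultdict

def em_model1(pairs, iterations=10):
    """EM for IBM Model 1.  Returns theta[(f_word, e_word)]."""
    # Initialize theta arbitrarily (here: uniform over the f vocabulary).
    f_vocab = {fi for _, f in pairs for fi in f}
    uniform = 1.0 / len(f_vocab)
    theta = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count_ef = defaultdict(float)
        count_e = defaultdict(float)
        for e, f in pairs:
            for fi in f:
                # E step: soft count iota(k, i, j) = p(a_i = j | f, e).
                z = sum(theta[(fi, ej)] for ej in e)
                for ej in e:
                    iota = theta[(fi, ej)] / z
                    count_ef[(ej, fi)] += iota
                    count_e[ej] += iota
        # M step: maximum likelihood estimation from the soft counts.
        theta = defaultdict(float,
                            {(fi, ej): count_ef[(ej, fi)] / count_e[ej]
                             for (ej, fi) in count_ef})
    return theta
```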

30 Expectation Maximization
Originally introduced in the 1960s for estimating HMMs when the states really are hidden. Can be applied to any generative model with hidden variables. Greedily attempts to maximize the probability of the observable data, marginalizing over latent variables. For IBM Model 1, that means:

\[
\max_{\theta} \prod_{k=1}^{N} p_{\theta}(f^{(k)} \mid e^{(k)})
  = \max_{\theta} \prod_{k=1}^{N} \sum_{a} p_{\theta}(f^{(k)}, a \mid e^{(k)})
\]

EM usually converges only to a local optimum, since this objective is in general not convex. Strangely, for IBM Model 1 (and very few other models), it is convex!

31 IBM Model 2 (Brown et al., 1993)
Let l and m be the (known) lengths of e and f. Latent variable a = ⟨a_1, ..., a_m⟩, each a_i ranging over {0, ..., l} (positions in e). E.g., a_4 = 3 means that f_4 is aligned to e_3.

\[
p(f \mid e, m) = \sum_{a \in \{0,\dots,l\}^m} p(f, a \mid e, m)
\]
\[
p(f, a \mid e, m) = \prod_{i=1}^{m} p(a_i \mid i, l, m) \cdot p(f_i \mid e_{a_i})
                  = \prod_{i=1}^{m} \delta_{a_i \mid i, l, m} \cdot \theta_{f_i \mid e_{a_i}}
\]

32 IBM Models 1 and 2, Depicted
[Figure: two graphical models side by side. In a hidden Markov model, hidden states y_1, y_2, y_3, y_4 form a chain and emit observations x_1, x_2, x_3, x_4. In IBM Models 1 and 2, alignment variables a_1, a_2, a_3, a_4 each point into e, and the selected English words emit f_1, f_2, f_3, f_4.]

33 Variations
Dyer et al. (2013) introduced a new parameterization:

\[
\delta_{j \mid i, l, m} \propto \exp\left(-\lambda \left| \frac{i}{m} - \frac{j}{l} \right|\right)
\]

(This is called fast_align.)
IBM Models 3–5 (Brown et al., 1993) introduced increasingly more powerful ideas, such as fertility and distortion.
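A sketch of this parameterization: the probability of aligning f_i to e_j decays exponentially in how far j/l strays from i/m. The lam and null-probability values below are illustrative defaults, not the tuned ones from the paper:

```python
import math

def fast_align_delta(i, l, m, lam=4.0, p0=0.08):
    """Alignment distribution in the style of Dyer et al. (2013).
    Returns p(a_i = j) for j = 0 (null), 1, ..., l, with probability
    mass p0 reserved for the null alignment."""
    weights = [math.exp(-lam * abs(i / m - j / l)) for j in range(1, l + 1)]
    z = sum(weights)
    return [p0] + [(1.0 - p0) * w / z for w in weights]

# E.g., for the 3rd of 10 French words against an 8-word English
# sentence, the distribution peaks near j = round(3 * 8 / 10) = 2:
print(fast_align_delta(3, 8, 10))
```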

34 From Alignment to (Phrase-Based) Translation
Obtaining word alignments in a parallel corpus is a common first step in building a machine translation system.
1. Align the words.
2. Extract and score phrase pairs.
3. Estimate a global scoring function to optimize (a proxy for) translation quality.
4. Decode French sentences into English ones.
(We'll discuss 2–4.)
The noisy channel pattern isn't taken quite so seriously when we build real systems, but language models are really, really important nonetheless.

35 Phrases?
Phrase-based translation uses automatically induced phrases... not the ones given by a phrase-structure parser.

36 Examples of Phrases
Courtesy of Chris Dyer.

German         English      p(f̄ | ē)
das Thema      the issue    0.41
               the point    0.72
               the subject  0.47
               the thema    0.99
es gibt        there is     0.96
               there are    0.72
morgen         tomorrow     0.90
fliege ich     will I fly   0.63
               will fly     0.17
               I will fly   —

37 Phrase-Based Translation Model
Originated by Koehn et al. (2003). The random variable A captures the segmentation of the sentences into phrases, the alignment between them, and reordering.

Example: "Morgen fliege ich nach Pittsburgh zur Konferenz" ↔ "Tomorrow I will fly to the conference in Pittsburgh" (phrases aligned with reordering).

\[
p(f, a \mid e) = p(a \mid e) \prod_{i=1}^{|a|} p(\bar{f}_i \mid \bar{e}_i)
\]

38–45 Extracting Phrases
After inferring word alignments, apply heuristics. (A sequence of slides walks through a word-aligned sentence pair, enumerating the phrase pairs consistent with the alignment; the figures are not reproduced here. A sketch of the consistency heuristic follows.)
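A sketch of the standard consistency heuristic, in the spirit of Koehn et al. (2003): a pair of spans is extracted if it contains at least one alignment link and no link crosses its boundary. The full algorithm also extends phrases over unaligned boundary words, which is omitted here:

```python
def extract_phrases(e, f, alignment, max_len=7):
    """Enumerate phrase pairs consistent with a word alignment.
    alignment is a set of (i, j) links meaning f[i] aligns to e[j]."""
    phrases = []
    for i1 in range(len(f)):
        for i2 in range(i1, min(i1 + max_len, len(f))):
            # English positions linked to the span f[i1..i2].
            js = {j for (i, j) in alignment if i1 <= i <= i2}
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no link from e[j1..j2] escapes f[i1..i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                phrases.append((tuple(f[i1:i2 + 1]), tuple(e[j1:j2 + 1])))
    return phrases
```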

46–48 Scoring Whole Translations

\[
s(e, a; f) = \underbrace{\log p(e)}_{\text{language model}} + \underbrace{\log p(f, a \mid e)}_{\text{translation model}}
\]

Remarks:
Segmentation, alignment, and reordering are all predicted as well (not marginalized). This does not factor nicely.
And this is a simplification: a reverse translation model is typically included, and each log-probability is treated as a feature whose weight is optimized for Bleu performance:

\[
s(e, a; f) = \beta_{\text{l.m.}} \underbrace{\log p(e)}_{\text{language model}}
           + \beta_{\text{t.m.}} \underbrace{\log p(f, a \mid e)}_{\text{translation model}}
           + \beta_{\text{r.t.m.}} \underbrace{\log p(e, a \mid f)}_{\text{reverse t.m.}}
\]
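As code, the global score is just a weighted sum of log-probability features. The feature values and weights below are hypothetical; in practice the betas are tuned toward Bleu (e.g., by minimum error rate training), not estimated by maximum likelihood:

```python
def translation_score(features, betas):
    """Weighted linear combination of log-probability features."""
    return sum(betas[name] * value for name, value in features.items())

s = translation_score({"lm": -12.3, "tm": -8.1, "rtm": -9.4},
                      {"lm": 1.0, "tm": 0.9, "rtm": 0.6})
```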

49–51 Decoding: Example
[Figure, repeated across three slides: the translation-option lattice for the source sentence "Maria no dio una bofetada a la bruja verde". Each source span offers candidate English phrases, e.g., Maria → Mary; no → not / did not / no; no dio → did not give; dio → give; dio una bofetada → slap; una bofetada → a slap; a → to / by; la → the; a la → to the; bruja → witch / hag; verde → green / bawdy; la bruja → the witch; bruja verde → green witch. Decoding searches over ways to cover the source with these options.]

52 Decoding
Adapted from Koehn et al. (2006). Typically accomplished with beam search.
Initial state: ⟨○○⋯○, ε⟩ with score 0, where ○○⋯○ marks all |f| words as uncovered and ε is the empty output.
Goal state: ⟨●●⋯●, e⟩ with (approximately) the highest score, all of f covered.
Reaching a new state: find an uncovered span of f for which a phrasal translation ⟨f̄, ē⟩ exists in the input. The new state appends ē to the output and covers f̄; its score adds language model and translation model components to the global score.
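A heavily simplified sketch of this search, to show the coverage-state and expansion structure. The interfaces are hypothetical, not a real toolkit's API: phrase_table maps a source-phrase tuple to a list of (target_phrase_tuple, log_prob), and lm_score(tokens) returns a language model log-probability (and must accept the empty tuple). There is no distortion model and no hypothesis recombination:

```python
import heapq

def beam_decode(f, phrase_table, lm_score, beam_size=10):
    """Simplified phrase-based beam search over coverage states."""
    beam = [(0.0, frozenset(), ())]  # (score, covered positions, output)
    complete = []
    while beam:
        new_beam = []
        for score, covered, out in beam:
            if len(covered) == len(f):
                complete.append((score, out))  # goal: f fully covered
                continue
            # Expand: translate any uncovered span of f.
            for i1 in range(len(f)):
                for i2 in range(i1, len(f)):
                    span = set(range(i1, i2 + 1))
                    if covered & span:
                        continue
                    for tgt, log_p in phrase_table.get(tuple(f[i1:i2 + 1]), []):
                        new_out = out + tgt
                        delta_lm = lm_score(new_out) - lm_score(out)
                        new_beam.append((score + log_p + delta_lm,
                                         covered | span, new_out))
        # Keep only the best beam_size hypotheses.
        beam = heapq.nlargest(beam_size, new_beam, key=lambda s: s[0])
    return max(complete, key=lambda s: s[0], default=None)
```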

53–56 Decoding Example
(Over the same lattice as above, a hypothesis grows one phrase at a time; ■ marks covered source words.)
⟨□□□□□□□□□, ε⟩, score 0
⟨■□□□□□□□□, Mary⟩: log p_l.m.(Mary) + log p_t.m.(Maria | Mary)
⟨■■□□□□□□□, Mary did not⟩: log p_l.m.(Mary did not) + log p_t.m.(Maria | Mary) + log p_t.m.(no | did not)
⟨■■■■■□□□□, Mary did not slap⟩: log p_l.m.(Mary did not slap) + log p_t.m.(Maria | Mary) + log p_t.m.(no | did not) + log p_t.m.(dio una bofetada | slap)

57 Machine Translation: Remarks
Sometimes phrases are organized hierarchically (Chiang, 2007).
There is extensive research on syntax-based machine translation (Galley et al., 2004), but it requires considerable engineering to match phrase-based systems.
Recent work explores semantics-based machine translation (Jones et al., 2012); it remains to be seen!
Some good pre-neural overviews: Lopez (2008); Koehn (2009).

58 References I
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311, 1993.
David Chiang. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228, 2007.
Michael Collins. Statistical machine translation: IBM Models 1 and 2, 2011.
Michael Collins. Phrase-based translation models, 2013.
Chris Dyer, Victor Chahuneau, and Noah A. Smith. A simple, fast, and effective reparameterization of IBM Model 2. In Proc. of NAACL, 2013.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What's in a translation rule? In Proc. of NAACL, 2004.
Bevan Jones, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, and Kevin Knight. Semantics-based machine translation with hyperedge replacement grammars. In Proc. of COLING, 2012.
Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, second edition, 2008.
Philipp Koehn. Statistical Machine Translation. Cambridge University Press, 2009.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In Proc. of NAACL, 2003.

59 References II
Philipp Koehn, Marcello Federico, Wade Shen, Nicola Bertoldi, Ondrej Bojar, Chris Callison-Burch, Brooke Cowan, Chris Dyer, Hieu Hoang, and Richard Zens. Open source toolkit for statistical machine translation: Factored translation models and confusion network decoding, 2006. Final report of the 2006 JHU summer workshop.
Adam Lopez. Statistical machine translation. ACM Computing Surveys, 40(3), article 8, 2008.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proc. of ACL, 2002.
