Minimum Error Rate Training Semiring

Size: px
Start display at page:

Download "Minimum Error Rate Training Semiring"

Transcription

1 Minimum Error Rate Training Semiring Artem Sokolov & François Yvon LIMSI-CNRS & LIMSI-CNRS/Univ. Paris Sud EAMT May 2011 Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

2 Talk Plan 1 Introduction Phrase-based statistical machine translation Minimum Error Rate Training Contribution 2 Semirings Lattice MERT MERT Semiring 3 Implementation 4 Experiments Setup Results 5 Future Work Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

3 Introduction Phrase-based statistical machine translation Probability model and inference in SMT system Probability of translation e given source sentence f: p(e f) = Z(f) 1 exp( λ h(e, f)) h(e, f) feature vector (various compatibility measures of e and f) λ parameter vector, λ i regulates importance of the feature h i (e, f) Translating by MAP-inference: ẽ f ( λ) = arg max e E p(e f) = arg max λ h(e, f) e E E reachable translations (search space), can be approximated by: list of n-best hypotheses word lattice Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

4 Introduction Tuning SMT system with MERT Minimum Error Rate Training Given: development set {(f, r f )} (source f & reference r f pairs) Solve: λ = arg max BLEU({ẽ f ( λ, E( λ)), r f }) λ BLEU is non-convex and not differentiable, hence heuristics (MERT). Search space approximation depends on λ, so iterative tuning: {(f, r f )} Configuration: λ t Decoder Approximation of E Updater Scorer Updated λ t Tuning: MERT Features Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

5 Introduction Minimum Error Rate Training MERT proceeds in series of optimizations along directions r: Optimal translation: ẽ f (γ) = arg max e E λ = λ 0 + γ r λ h(e, f) = arg max e E λ 0 h(e, f) }{{} intercept +γ r h(e, f) }{{} slope each translation hypothesis is associated with a line, upper envelope: dominating lines when λ is moved along r λ h γ Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

6 Introduction Minimum Error Rate Training γ-projections of intersections give intervals of constant optimal hypothesis optimal γ found by merging intervals for f F and scoring each update λ = λ 0 + γ i r i, where i is the index of the direction yielding the highest BLEU λ h γ Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

7 Introduction Minimum Error Rate Training MERT problems very slow, because of: overall number of iterations folklore: number of iterations number of dimensions slowness of each iteration (dominated by decoding time) non-monotonicity/instability of the training process sensitivity of the resulting solutions to initial conditions Ways to tackle the problems improve optimization other target function approximations changes into optimization algorithms improve search space processing this presentation use lattices (better approximation of the complete search space) reduce search to standard operations (facilitates implementation) reduce number of iteration this presentation Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

8 Contribution Introduction Contribution Recast Lattice MERT algorithm of [Macherey et al., 2008] in a semiring framework has already been hinted to in [Dyer et al., 2010] but was never formally described lack of implementation details Reimplement MERT using this reformulation and general-purpose FST toolbox OpenFST Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

9 Semirings Lattice MERT Semirings Semiring K = K,,, 0, 1 : K,, 0 is a commutative monoid with identity element 0: a (b c) = (a b) c a b = b a a 0 = 0 a = a K,, 1 is a monoid with identity element 1 distributes over a (b c) = (a b) (a c) (b c) a = (b a) (c a) element 0 annihilates K a 0 = 0 a = 0. Examples R, +,, 0, 1 real semiring S,,,, i S i semiring of sets Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

10 Semirings Lattice MERT Lattice MERT [Macherey et al., 2008] source fr: Vénus est la jumelle infernale de la Terre target en: Venus is Earth s hellish twin Vénus:Venus h_02 2 est:is h_23 la_jumelle:twin h_35 0 Vénus_est:Venus h_01 1 NULL:- h_13 3 la:the h_34 jumelle:twin h_45 5 infernale:of_hell h_58 4 infernale:hellish h_47 infernale:infernal h_46 7 jumelle:twin h_78 jumelle:twin h_68 8 de_la_terre:of_the_earth h_ Decomposability of h(e, f) into a sum of local features h 01,h Envelopes are distributed over nodes in the lattice Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

11 MERT Semiring Semirings MERT Semiring Host set: a line: d y + d s x (hypothesis) D = D,,, 0, 1 set of lines d i : d = {d i,y + d i,s x} (set of hypotheses) set of sets d k of lines: D = { {d k i,y + d k i,s x}} Operations and : for d 1, d 2 D d 1 d 2 = env(d 1 d 2 ) d 1 d 2 = env({(di.y 1 + d j.y 2 ) + (d i.s 1 + d j.s 2 ) x d 1 i d 1, dj 2 d 2 }) Unities: 0 = 1 = {0 + 0 x} Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

12 d 1 d 2 = env(d 1 d 2 ) Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18 Semirings Semiring Operations Illustration MERT Semiring -example d 1 d 2 = env({(d 1 i.y + d 2 j.y ) + (d 1 i.s + d 2 j.s) x d 1 i d 1, d 2 j d 2 }) -example

13 Semirings MERT Semiring Shortest Paths for MERT Semiring Each arc in the FST carries: target word a vector h(a, f) of local features associated with a singleton set containing line d with slope d s = ( r h(a, f)) y-intercept d y = ( λ 0 h(a, f)) Weight of a candidate translation path e = e 1... e l : w(e) = l w(e i ) = { λ 0 i=1 l h(e i, f) + ( r i=1 Upper envelope of all the lines (hypotheses): env( e w(e)) = e w(e) = e l(e) l h(e i, f)) x} i=1 w(e i ). Generic shortest distance algorithms over acyclic graphs calculate this. Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18 i=1

14 Implementation Implementation Basics: OpenFST toolbox works with any semiring proven and well optimized ShortestPath algorithms other useful algorithms: Union, Determinize, etc. Lattice minimization: Union of lattices between decoder runs Determinize+Minimize to eliminate duplicate hypotheses won t work MERT semiring is not divisible circumvent by performing Union+Determinize over (min, +) semiring All directions simultaneously weights as arrays of envelopes random direction BLEU Random restarts help only for the first iteration Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

15 Experiments Setup Experiments Data: NewsCommentary (dev: 2051) & WMT10 (dev: 1026), common test French to English FST MERT tuning: OpenFST-based multi-threaded implementation zero restart points axes and additional random directions Baseline MERT tuning: MERT implementation included in MOSES toolkit 100-best list, 20 restart points Koehn s coordinate descend (only axis directions) Decoder: n-gram phrase-based SMT system N-code 1, 11 features 1 Demo on Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

16 Experiments Results Experiments newsco, dev newsco, test BLEU n-best MERT, dev 15.5 fst MERT (r=0), dev fst MERT (r=20), dev WMT10, dev n-best MERT, test 15.5 fst MERT (r=0), test 15.0 fst MERT (r=20), test WMT10, test BLEU n-best MERT, dev fst MERT (r=0), dev fst MERT (r=20), dev fst MERT (r=50), dev n-best MERT, test fst MERT (r=0), test fst MERT (r=20), test fst MERT (r=50), test iteration iteration Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

17 Future Work Conclusion & Future Work Conclusion Semiring formalization allows using generic FST toolkits to do MERT Convergence in less iterations Future Work Better stopping criteria to detect saturation Faster should be most helpful for speed up Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

18 Future Work Thank you for your attention! Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 18

19 Bibliography Dyer, C., Lopez, A., Ganitkevitch, J., Weese, J., Ture, F., Blunsom, P., Setiawan, H., Eidelman, V., & Resnik, P. (2010). cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proc. of the ACL (pp. 7 12). Macherey, W., Och, F. J., Thayer, I., & Uszkoreit, J. (2008). Lattice-based minimum error rate training for statistical machine translation. In Proc. of the Conf. on EMNLP (pp ). Artem Sokolov & François Yvon (LIMSI) Minimum Error Rate Training Semiring EAMT / 1

Computing Lattice BLEU Oracle Scores for Machine Translation

Computing Lattice BLEU Oracle Scores for Machine Translation Computing Lattice Oracle Scores for Machine Translation Artem Sokolov & Guillaume Wisniewski & François Yvon {firstname.lastname}@limsi.fr LIMSI, Orsay, France 1 Introduction 2 Oracle Decoding Task 3 Proposed

More information

Decoding Revisited: Easy-Part-First & MERT. February 26, 2015

Decoding Revisited: Easy-Part-First & MERT. February 26, 2015 Decoding Revisited: Easy-Part-First & MERT February 26, 2015 Translating the Easy Part First? the tourism initiative addresses this for the first time the die tm:-0.19,lm:-0.4, d:0, all:-0.65 tourism touristische

More information

Learning to translate with neural networks. Michael Auli

Learning to translate with neural networks. Michael Auli Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each

More information

Statistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks

Statistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School - University of Pisa Pisa, 7-19 May 008 Part III: Search Problem 1 Complexity issues A search: with single

More information

The Geometry of Statistical Machine Translation

The Geometry of Statistical Machine Translation The Geometry of Statistical Machine Translation Presented by Rory Waite 16th of December 2015 ntroduction Linear Models Convex Geometry The Minkowski Sum Projected MERT Conclusions ntroduction We provide

More information

Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Joern Wuebker, Hermann Ney Human Language Technology and Pattern Recognition Group Computer Science

More information

Tuning as Linear Regression

Tuning as Linear Regression Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical

More information

Discriminative Training

Discriminative Training Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder

More information

Phrase-Based Statistical Machine Translation with Pivot Languages

Phrase-Based Statistical Machine Translation with Pivot Languages Phrase-Based Statistical Machine Translation with Pivot Languages N. Bertoldi, M. Barbaiani, M. Federico, R. Cattoni FBK, Trento - Italy Rovira i Virgili University, Tarragona - Spain October 21st, 2008

More information

Stochastic Structured Prediction under Bandit Feedback

Stochastic Structured Prediction under Bandit Feedback Stochastic Structured Prediction under Bandit Feedback Artem Sokolov,, Julia Kreutzer, Christopher Lo,, Stefan Riezler, Computational Linguistics & IWR, Heidelberg University, Germany {sokolov,kreutzer,riezler}@cl.uni-heidelberg.de

More information

Minimum Bayes-risk System Combination

Minimum Bayes-risk System Combination Minimum Bayes-risk System Combination Jesús González-Rubio Instituto Tecnológico de Informática U. Politècnica de València 46022 Valencia, Spain jegonzalez@iti.upv.es Alfons Juan Francisco Casacuberta

More information

A Discriminative Model for Semantics-to-String Translation

A Discriminative Model for Semantics-to-String Translation A Discriminative Model for Semantics-to-String Translation Aleš Tamchyna 1 and Chris Quirk 2 and Michel Galley 2 1 Charles University in Prague 2 Microsoft Research July 30, 2015 Tamchyna, Quirk, Galley

More information

Consensus Training for Consensus Decoding in Machine Translation

Consensus Training for Consensus Decoding in Machine Translation Consensus Training for Consensus Decoding in Machine Translation Adam Pauls, John DeNero and Dan Klein Computer Science Division University of California at Berkeley {adpauls,denero,klein}@cs.berkeley.edu

More information

Algorithms for NLP. Machine Translation II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley

Algorithms for NLP. Machine Translation II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Algorithms for NLP Machine Translation II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Announcements Project 4: Word Alignment! Will be released soon! (~Monday) Phrase-Based System Overview

More information

Quasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti

Quasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti Quasi-Synchronous Phrase Dependency Grammars for Machine Translation Kevin Gimpel Noah A. Smith 1 Introduction MT using dependency grammars on phrases Phrases capture local reordering and idiomatic translations

More information

Improving Relative-Entropy Pruning using Statistical Significance

Improving Relative-Entropy Pruning using Statistical Significance Improving Relative-Entropy Pruning using Statistical Significance Wang Ling 1,2 N adi Tomeh 3 Guang X iang 1 Alan Black 1 Isabel Trancoso 2 (1)Language Technologies Institute, Carnegie Mellon University,

More information

TALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto

TALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto TALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto Marta R. Costa-jussà, Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, José A.R. Fonollosa, José

More information

Multi-Source Neural Translation

Multi-Source Neural Translation Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu In the neural encoder-decoder

More information

Multi-Task Word Alignment Triangulation for Low-Resource Languages

Multi-Task Word Alignment Triangulation for Low-Resource Languages Multi-Task Word Alignment Triangulation for Low-Resource Languages Tomer Levinboim and David Chiang Department of Computer Science and Engineering University of Notre Dame {levinboim.1,dchiang}@nd.edu

More information

Multi-Source Neural Translation

Multi-Source Neural Translation Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu Abstract We build a multi-source

More information

Theory of Alignment Generators and Applications to Statistical Machine Translation

Theory of Alignment Generators and Applications to Statistical Machine Translation Theory of Alignment Generators and Applications to Statistical Machine Translation Raghavendra Udupa U Hemanta K Mai IBM India Research Laboratory, New Delhi {uraghave, hemantkm}@inibmcom Abstract Viterbi

More information

Discriminative Training. March 4, 2014

Discriminative Training. March 4, 2014 Discriminative Training March 4, 2014 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder e =

More information

Natural Language Processing (CSEP 517): Machine Translation

Natural Language Processing (CSEP 517): Machine Translation Natural Language Processing (CSEP 57): Machine Translation Noah Smith c 207 University of Washington nasmith@cs.washington.edu May 5, 207 / 59 To-Do List Online quiz: due Sunday (Jurafsky and Martin, 2008,

More information

A Disambiguation Algorithm for Weighted Automata

A Disambiguation Algorithm for Weighted Automata A Disambiguation Algorithm for Weighted Automata Mehryar Mohri a,b and Michael D. Riley b a Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012. b Google Research, 76 Ninth

More information

Lattice-based System Combination for Statistical Machine Translation

Lattice-based System Combination for Statistical Machine Translation Lattice-based System Combination for Statistical Machine Translation Yang Feng, Yang Liu, Haitao Mi, Qun Liu, Yajuan Lu Key Laboratory of Intelligent Information Processing Institute of Computing Technology

More information

Overview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments

Overview (Fall 2007) Machine Translation Part III. Roadmap for the Next Few Lectures. Phrase-Based Models. Learning phrases from alignments Overview Learning phrases from alignments 6.864 (Fall 2007) Machine Translation Part III A phrase-based model Decoding in phrase-based models (Thanks to Philipp Koehn for giving me slides from his EACL

More information

Monte Carlo techniques for phrase-based translation

Monte Carlo techniques for phrase-based translation Monte Carlo techniques for phrase-based translation Abhishek Arun (a.arun@sms.ed.ac.uk) Barry Haddow (bhaddow@inf.ed.ac.uk) Philipp Koehn (pkoehn@inf.ed.ac.uk) Adam Lopez (alopez@inf.ed.ac.uk) University

More information

Speech Translation: from Singlebest to N-Best to Lattice Translation. Spoken Language Communication Laboratories

Speech Translation: from Singlebest to N-Best to Lattice Translation. Spoken Language Communication Laboratories Speech Translation: from Singlebest to N-Best to Lattice Translation Ruiqiang ZHANG Genichiro KIKUI Spoken Language Communication Laboratories 2 Speech Translation Structure Single-best only ASR Single-best

More information

Neural Hidden Markov Model for Machine Translation

Neural Hidden Markov Model for Machine Translation Neural Hidden Markov Model for Machine Translation Weiyue Wang, Derui Zhu, Tamer Alkhouli, Zixuan Gan and Hermann Ney {surname}@i6.informatik.rwth-aachen.de July 17th, 2018 Human Language Technology and

More information

Automatic Speech Recognition and Statistical Machine Translation under Uncertainty

Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Outlines Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Lambert Mathias Advisor: Prof. William Byrne Thesis Committee: Prof. Gerard Meyer, Prof. Trac Tran and Prof.

More information

Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices Graeme Blackwood, Adrià de Gispert, William Byrne Machine Intelligence Laboratory Cambridge

More information

Probabilistic Inference for Phrase-based Machine Translation: A Sampling Approach. Abhishek Arun

Probabilistic Inference for Phrase-based Machine Translation: A Sampling Approach. Abhishek Arun Probabilistic Inference for Phrase-based Machine Translation: A Sampling Approach Abhishek Arun Doctor of Philosophy Institute for Communicating and Collaborative Systems School of Informatics University

More information

CRF Word Alignment & Noisy Channel Translation

CRF Word Alignment & Noisy Channel Translation CRF Word Alignment & Noisy Channel Translation January 31, 2013 Last Time... X p( Translation)= p(, Translation) Alignment Alignment Last Time... X p( Translation)= p(, Translation) Alignment X Alignment

More information

COMS 4705, Fall Machine Translation Part III

COMS 4705, Fall Machine Translation Part III COMS 4705, Fall 2011 Machine Translation Part III 1 Roadmap for the Next Few Lectures Lecture 1 (last time): IBM Models 1 and 2 Lecture 2 (today): phrase-based models Lecture 3: Syntax in statistical machine

More information

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill

More information

Statistical Ranking Problem

Statistical Ranking Problem Statistical Ranking Problem Tong Zhang Statistics Department, Rutgers University Ranking Problems Rank a set of items and display to users in corresponding order. Two issues: performance on top and dealing

More information

Statistical NLP Spring Corpus-Based MT

Statistical NLP Spring Corpus-Based MT Statistical NLP Spring 2010 Lecture 17: Word / Phrase MT Dan Klein UC Berkeley Corpus-Based MT Modeling correspondences between languages Sentence-aligned parallel corpus: Yo lo haré mañana I will do it

More information

Corpus-Based MT. Statistical NLP Spring Unsupervised Word Alignment. Alignment Error Rate. IBM Models 1/2. Problems with Model 1

Corpus-Based MT. Statistical NLP Spring Unsupervised Word Alignment. Alignment Error Rate. IBM Models 1/2. Problems with Model 1 Statistical NLP Spring 2010 Corpus-Based MT Modeling correspondences between languages Sentence-aligned parallel corpus: Yo lo haré mañana I will do it tomorrow Hasta pronto See you soon Hasta pronto See

More information

Conditional Language Modeling. Chris Dyer

Conditional Language Modeling. Chris Dyer Conditional Language Modeling Chris Dyer Unconditional LMs A language model assigns probabilities to sequences of words,. w =(w 1,w 2,...,w`) It is convenient to decompose this probability using the chain

More information

Toponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge

Toponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge Toponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge Raivis Skadiņš Tilde SIA Vienibas gatve 75a, Riga, Latvia raiviss@tilde.lv Tatiana Gornostay Tilde SIA Vienibas gatve 75a,

More information

Decoding in Statistical Machine Translation. Mid-course Evaluation. Decoding. Christian Hardmeier

Decoding in Statistical Machine Translation. Mid-course Evaluation. Decoding. Christian Hardmeier Decoding in Statistical Machine Translation Christian Hardmeier 2016-05-04 Mid-course Evaluation http://stp.lingfil.uu.se/~sara/kurser/mt16/ mid-course-eval.html Decoding The decoder is the part of the

More information

Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation

Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation Carolin Lawrence Heidelberg University, Germany Artem Sokolov Amazon Development

More information

Triplet Lexicon Models for Statistical Machine Translation

Triplet Lexicon Models for Statistical Machine Translation Triplet Lexicon Models for Statistical Machine Translation Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer lastname@cs.rwth-aachen.de CLSP Student Seminar February 6, 2009 Human Language

More information

Multiple System Combination. Jinhua Du CNGL July 23, 2008

Multiple System Combination. Jinhua Du CNGL July 23, 2008 Multiple System Combination Jinhua Du CNGL July 23, 2008 Outline Introduction Motivation Current Achievements Combination Strategies Key Techniques System Combination Framework in IA Large-Scale Experiments

More information

Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation

Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation Breaking the Beam Search Curse: A Study of (Re-)Scoring Methods and Stopping Criteria for Neural Machine Translation Yilin Yang 1 Liang Huang 1,2 Mingbo Ma 1,2 1 Oregon State University Corvallis, OR,

More information

Lagrangian Relaxation Algorithms for Inference in Natural Language Processing

Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Alexander M. Rush and Michael Collins (based on joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Roi Reichart, David

More information

statistical machine translation

statistical machine translation statistical machine translation P A R T 3 : D E C O D I N G & E V A L U A T I O N CSC401/2511 Natural Language Computing Spring 2019 Lecture 6 Frank Rudzicz and Chloé Pou-Prom 1 University of Toronto Statistical

More information

Machine Translation: Examples. Statistical NLP Spring Levels of Transfer. Corpus-Based MT. World-Level MT: Examples

Machine Translation: Examples. Statistical NLP Spring Levels of Transfer. Corpus-Based MT. World-Level MT: Examples Statistical NLP Spring 2009 Machine Translation: Examples Lecture 17: Word Alignment Dan Klein UC Berkeley Corpus-Based MT Levels of Transfer Modeling correspondences between languages Sentence-aligned

More information

Phrase Table Pruning via Submodular Function Maximization

Phrase Table Pruning via Submodular Function Maximization Phrase Table Pruning via Submodular Function Maximization Masaaki Nishino and Jun Suzuki and Masaaki Nagata NTT Communication Science Laboratories, NTT Corporation 2-4 Hikaridai, Seika-cho, Soraku-gun,

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

n-gram-based MT : what s behind us, what s ahead

n-gram-based MT : what s behind us, what s ahead n-gram-based MT : what s behind us, what s ahead F. Yvon and the LIMSI MT crew LIMSI CNRS and Université Paris Sud MT Marathon in Prague, Sep 8th, 215 F. Yvon (LIMSI) n-gram-based MT MTM@Prague - 215-9-8

More information

Lexical Translation Models 1I

Lexical Translation Models 1I Lexical Translation Models 1I Machine Translation Lecture 5 Instructor: Chris Callison-Burch TAs: Mitchell Stern, Justin Chiu Website: mt-class.org/penn Last Time... X p( Translation)= p(, Translation)

More information

CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss

CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss Jeffrey Flanigan Chris Dyer Noah A. Smith Jaime Carbonell School of Computer Science, Carnegie Mellon University, Pittsburgh,

More information

Statistical NLP Spring HW2: PNP Classification

Statistical NLP Spring HW2: PNP Classification Statistical NLP Spring 2010 Lecture 16: Word Alignment Dan Klein UC Berkeley HW2: PNP Classification Overall: good work! Top results: 88.1: Matthew Can (word/phrase pre/suffixes) 88.1: Kurtis Heimerl (positional

More information

HW2: PNP Classification. Statistical NLP Spring Levels of Transfer. Phrasal / Syntactic MT: Examples. Lecture 16: Word Alignment

HW2: PNP Classification. Statistical NLP Spring Levels of Transfer. Phrasal / Syntactic MT: Examples. Lecture 16: Word Alignment Statistical NLP Spring 2010 Lecture 16: Word Alignment Dan Klein UC Berkeley HW2: PNP Classification Overall: good work! Top results: 88.1: Matthew Can (word/phrase pre/suffixes) 88.1: Kurtis Heimerl (positional

More information

Improved Decipherment of Homophonic Ciphers

Improved Decipherment of Homophonic Ciphers Improved Decipherment of Homophonic Ciphers Malte Nuhn and Julian Schamper and Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department, RWTH Aachen University, Aachen,

More information

Feature-Rich Translation by Quasi-Synchronous Lattice Parsing

Feature-Rich Translation by Quasi-Synchronous Lattice Parsing Feature-Rich Translation by Quasi-Synchronous Lattice Parsing Kevin Gimpel and Noah A. Smith Language Technologies Institute School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213,

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

Coactive Learning for Interactive Machine Translation

Coactive Learning for Interactive Machine Translation Coactive Learning for Interactive Machine ranslation Artem Sokolov Stefan Riezler Shay B. Cohen Computational Linguistics Heidelberg University 69120 Heidelberg, Germany sokolov@cl.uni-heidelberg.de Computational

More information

ECS171: Machine Learning

ECS171: Machine Learning ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f

More information

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses)

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 Alignment in SMT and Tutorial on Giza++ and Moses) Pushpak Bhattacharyya CSE Dept., IIT Bombay 15 th Feb, 2011 Going forward

More information

Decoding and Inference with Syntactic Translation Models

Decoding and Inference with Syntactic Translation Models Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP

More information

Lesson 3-2: Solving Linear Systems Algebraically

Lesson 3-2: Solving Linear Systems Algebraically Yesterday we took our first look at solving a linear system. We learned that a linear system is two or more linear equations taken at the same time. Their solution is the point that all the lines have

More information

Machine Translation without Words through Substring Alignment

Machine Translation without Words through Substring Alignment Machine Translation without Words through Substring Alignment Graham Neubig 1,2,3, Taro Watanabe 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 2 3 now at 1 Machine Translation Translate a source sentence F

More information

Word Alignment III: Fertility Models & CRFs. February 3, 2015

Word Alignment III: Fertility Models & CRFs. February 3, 2015 Word Alignment III: Fertility Models & CRFs February 3, 2015 Last Time... X p( Translation)= p(, Translation) Alignment = X Alignment Alignment p( p( Alignment) Translation Alignment) {z } {z } X z } {

More information

Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation

Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation Journal of Machine Learning Research 12 (2011) 1-30 Submitted 5/10; Revised 11/10; Published 1/11 Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation Yizhao

More information

MATH 8. Unit 1: Rational and Irrational Numbers (Term 1) Unit 2: Using Algebraic Properties to Simplify Expressions - Probability

MATH 8. Unit 1: Rational and Irrational Numbers (Term 1) Unit 2: Using Algebraic Properties to Simplify Expressions - Probability MATH 8 Unit 1: Rational and Irrational Numbers (Term 1) 1. I CAN write an algebraic expression for a given phrase. 2. I CAN define a variable and write an equation given a relationship. 3. I CAN use order

More information

Graphical Solutions of Linear Systems

Graphical Solutions of Linear Systems Graphical Solutions of Linear Systems Consistent System (At least one solution) Inconsistent System (No Solution) Independent (One solution) Dependent (Infinite many solutions) Parallel Lines Equations

More information

Fast Collocation-Based Bayesian HMM Word Alignment

Fast Collocation-Based Bayesian HMM Word Alignment Fast Collocation-Based Bayesian HMM Word Alignment Philip Schulz ILLC University of Amsterdam P.Schulz@uva.nl Wilker Aziz ILLC University of Amsterdam W.Aziz@uva.nl Abstract We present a new Bayesian HMM

More information

arxiv: v1 [cs.cl] 22 Jun 2015

arxiv: v1 [cs.cl] 22 Jun 2015 Neural Transformation Machine: A New Architecture for Sequence-to-Sequence Learning arxiv:1506.06442v1 [cs.cl] 22 Jun 2015 Fandong Meng 1 Zhengdong Lu 2 Zhaopeng Tu 2 Hang Li 2 and Qun Liu 1 1 Institute

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017

Machine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017 Machine Learning Regularization and Feature Selection Fabio Vandin November 14, 2017 1 Regularized Loss Minimization Assume h is defined by a vector w = (w 1,..., w d ) T R d (e.g., linear models) Regularization

More information

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology

More information

A Systematic Comparison of Training Criteria for Statistical Machine Translation

A Systematic Comparison of Training Criteria for Statistical Machine Translation A Systematic Comparison of Training Criteria for Statistical Machine Translation Richard Zens and Saša Hasan and Hermann Ney Human Language Technology and Pattern Recognition Lehrstuhl für Informatik 6

More information

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 3, 7 Sep., 2016

INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 3, 7 Sep., 2016 1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 3, 7 Sep., 2016 jtl@ifi.uio.no Machine Translation Evaluation 2 1. Automatic MT-evaluation: 1. BLEU 2. Alternatives 3. Evaluation

More information

CSC321 Lecture 10 Training RNNs

CSC321 Lecture 10 Training RNNs CSC321 Lecture 10 Training RNNs Roger Grosse and Nitish Srivastava February 23, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 10 Training RNNs February 23, 2015 1 / 18 Overview Last time, we saw

More information

Incremental HMM Alignment for MT System Combination

Incremental HMM Alignment for MT System Combination Incremental HMM Alignment for MT System Combination Chi-Ho Li Microsoft Research Asia 49 Zhichun Road, Beijing, China chl@microsoft.com Yupeng Liu Harbin Institute of Technology 92 Xidazhi Street, Harbin,

More information

Language Model Rest Costs and Space-Efficient Storage

Language Model Rest Costs and Space-Efficient Storage Language Model Rest Costs and Space-Efficient Storage Kenneth Heafield Philipp Koehn Alon Lavie Carnegie Mellon, University of Edinburgh July 14, 2012 Complaint About Language Models Make Search Expensive

More information

Utilizing Portion of Patent Families with No Parallel Sentences Extracted in Estimating Translation of Technical Terms

Utilizing Portion of Patent Families with No Parallel Sentences Extracted in Estimating Translation of Technical Terms 1 1 1 2 2 30% 70% 70% NTCIR-7 13% 90% 1,000 Utilizing Portion of Patent Families with No Parallel Sentences Extracted in Estimating Translation of Technical Terms Itsuki Toyota 1 Yusuke Takahashi 1 Kensaku

More information

Coverage Embedding Models for Neural Machine Translation

Coverage Embedding Models for Neural Machine Translation Coverage Embedding Models for Neural Machine Translation Haitao Mi Baskaran Sankaran Zhiguo Wang Abe Ittycheriah T.J. Watson Research Center IBM 1101 Kitchawan Rd, Yorktown Heights, NY 10598 {hmi, bsankara,

More information

Machine Translation. 10: Advanced Neural Machine Translation Architectures. Rico Sennrich. University of Edinburgh. R. Sennrich MT / 26

Machine Translation. 10: Advanced Neural Machine Translation Architectures. Rico Sennrich. University of Edinburgh. R. Sennrich MT / 26 Machine Translation 10: Advanced Neural Machine Translation Architectures Rico Sennrich University of Edinburgh R. Sennrich MT 2018 10 1 / 26 Today s Lecture so far today we discussed RNNs as encoder and

More information

MATH 7 HONORS. Unit 1: Rational and Irrational Numbers (Term 1) Unit 2: Using Algebraic Properties to Simplify Expressions - Probability

MATH 7 HONORS. Unit 1: Rational and Irrational Numbers (Term 1) Unit 2: Using Algebraic Properties to Simplify Expressions - Probability MATH 7 HONORS Unit 1: Rational and Irrational Numbers (Term 1) 1. I CAN write an algebraic expression for a given phrase. 2. I CAN define a variable and write an equation given a relationship. 3. I CAN

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted

More information

Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing

Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for atural Language Processing Alexander M. Rush (based on joint work with Michael Collins, Tommi Jaakkola, Terry Koo, David Sontag) Uncertainty

More information

Fast Consensus Decoding over Translation Forests

Fast Consensus Decoding over Translation Forests Fast Consensus Decoding over Translation Forests John DeNero Computer Science Division University of California, Berkeley denero@cs.berkeley.edu David Chiang and Kevin Knight Information Sciences Institute

More information

Optimal Beam Search for Machine Translation

Optimal Beam Search for Machine Translation Optimal Beam Search for Machine Translation Alexander M. Rush Yin-Wen Chang MIT CSAIL, Cambridge, MA 02139, USA {srush, yinwen}@csail.mit.edu Michael Collins Department of Computer Science, Columbia University,

More information

Today s Lecture. Dropout

Today s Lecture. Dropout Today s Lecture so far we discussed RNNs as encoder and decoder we discussed some architecture variants: RNN vs. GRU vs. LSTM attention mechanisms Machine Translation 1: Advanced Neural Machine Translation

More information

Correlation of From Patterns to Algebra to the Ontario Mathematics Curriculum (Grade 4)

Correlation of From Patterns to Algebra to the Ontario Mathematics Curriculum (Grade 4) Correlation of From Patterns to Algebra to the Ontario Mathematics Curriculum (Grade 4) Strand: Patterning and Algebra Ontario Curriculum Outcomes By the end of Grade 4, students will: describe, extend,

More information

Optimal Search for Minimum Error Rate Training

Optimal Search for Minimum Error Rate Training Optimal Search for Minimum Error Rate Training Michel Galley Microsoft Research Redmond, WA 98052, USA mgalley@microsoft.com Chris Quirk Microsoft Research Redmond, WA 98052, USA chrisq@microsoft.com Abstract

More information

IBM Model 1 for Machine Translation

IBM Model 1 for Machine Translation IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of

More information

Solving Systems of Linear Equations

Solving Systems of Linear Equations Section 2.3 Solving Systems of Linear Equations TERMINOLOGY 2.3 Previously Used: Equivalent Equations Literal Equation Properties of Equations Substitution Principle Prerequisite Terms: Coordinate Axes

More information

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz

More information

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions Wei Lu and Hwee Tou Ng National University of Singapore 1/26 The Task (Logical Form) λx 0.state(x 0

More information

Dual Decomposition for Natural Language Processing. Decoding complexity

Dual Decomposition for Natural Language Processing. Decoding complexity Dual Decomposition for atural Language Processing Alexander M. Rush and Michael Collins Decoding complexity focus: decoding problem for natural language tasks motivation: y = arg max y f (y) richer model

More information

Variational Decoding for Statistical Machine Translation

Variational Decoding for Statistical Machine Translation Variational Decoding for Statistical Machine Translation Zhifei Li, Jason Eisner, and Sanjeev Khudanpur Center for Language and Speech Processing Computer Science Department Johns Hopkins University 1

More information

Distance Preservation - Part 2

Distance Preservation - Part 2 Distance Preservation - Part 2 Graph Distances Niko Vuokko October 9th 2007 NLDR Seminar Outline Introduction Geodesic and graph distances From linearity to nonlinearity Isomap Geodesic NLM Curvilinear

More information

1 Evaluation of SMT systems: BLEU

1 Evaluation of SMT systems: BLEU 1 Evaluation of SMT systems: BLEU Idea: We want to define a repeatable evaluation method that uses: a gold standard of human generated reference translations a numerical translation closeness metric in

More information

1.5 F15 O Brien. 1.5: Linear Equations and Inequalities

1.5 F15 O Brien. 1.5: Linear Equations and Inequalities 1.5: Linear Equations and Inequalities I. Basic Terminology A. An equation is a statement that two expressions are equal. B. To solve an equation means to find all of the values of the variable that make

More information

Introduction to Kleene Algebras

Introduction to Kleene Algebras Introduction to Kleene Algebras Riccardo Pucella Basic Notions Seminar December 1, 2005 Introduction to Kleene Algebras p.1 Idempotent Semirings An idempotent semiring is a structure S = (S, +,, 1, 0)

More information

Machine Translation Evaluation

Machine Translation Evaluation Machine Translation Evaluation Sara Stymne 2017-03-29 Partly based on Philipp Koehn s slides for chapter 8 Why Evaluation? How good is a given machine translation system? Which one is the best system for

More information