Computing Lattice BLEU Oracle Scores for Machine Translation


1 Computing Lattice BLEU Oracle Scores for Machine Translation

Artem Sokolov & Guillaume Wisniewski & François Yvon
LIMSI, Orsay, France

Outline:
1 Introduction
2 Oracle Decoding Task
3 Proposed Oracle Decoders
4 Experiments
5 Conclusion & Future Work

2 Model & Inference in SMT

Usual translation of a sentence f:
best translation: $e^* = \arg\max_{e \in E(f)} \mathrm{score}(e|f)$
e.g. $\mathrm{score}(e|f) = \lambda \cdot h(e, f)$, with features $h(e, f)$ and parameters $\lambda$;
$E(f)$ is the search space -- this is what we are interested in.

Example
source: Vénus est la jumelle infernale de la Terre
search space options:
- list of n-best hypotheses:
  Venus - the twin of hell of the Earth
  Venus is the infernal binocular of the Earth
  Venus is the hellish twin of the Earth
- word lattice
  (figure: a word lattice over phrase translations such as Vénus:Venus, est:is, la_jumelle:twin, infernale:hellish/infernal/of_hell, jumelle:twin/binocular, de_la_terre:of_the_earth)
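To make the scoring rule concrete, here is a minimal sketch (not from the slides; the feature names and weights are invented for illustration) of scoring an n-best list with $\mathrm{score}(e|f) = \lambda \cdot h(e, f)$ and taking the argmax:

```python
# Toy linear-model scoring of an n-best list; features/weights are made up.
def score(features, weights):
    """Dot product of a sparse feature dict h(e, f) with weights lambda."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

nbest = [
    ("Venus - the twin of hell of the Earth",        {"tm": -1.9, "lm": -3.1}),
    ("Venus is the infernal binocular of the Earth", {"tm": -2.4, "lm": -2.7}),
    ("Venus is the hellish twin of the Earth",       {"tm": -2.0, "lm": -2.2}),
]
weights = {"tm": 1.0, "lm": 0.8}

best_hyp, _ = max(nbest, key=lambda pair: score(pair[1], weights))
print(best_hyp)  # the model's argmax, not necessarily the oracle
```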

3 Motivation

In reality, SMT systems perform poorly, and knowing why is difficult because of system complexity. Knowing the best possible hypotheses is useful: it can help failure analysis.

Usefulness of (oracle) translation
- Finding bottlenecks:
  if the oracle is good, why couldn't the system return it?
  if the oracle is bad, why doesn't E contain better ones?
- Learning: update towards the best hypothesis.

In this work we are interested in finding the best hypotheses contained in lattices: two new methods, compared with two existing methods.

4 Oracle Decoding Task

By "best hypothesis in E" we mean the one with the highest BLEU:

$n\text{-BLEU}(\{e_i\}, \{r_i\}) = \mathrm{BrevityPenalty} \cdot \Big( \prod_{m=1}^{n} p_m \Big)^{1/n}$

with hypotheses $e_i$, references $r_i$, and clipped precisions $p_m$. BLEU is a corpus-level metric, so we need some sentence-level approximation.

Exact oracle finding:
- with sentence-BLEU in an n-best list: straightforward
- in a word lattice: NP-hard even for 2-BLEU [Leusch et al., 2008]
So approximations are needed -- the focus of this presentation.

Oracle decoding on lattice L for reference $r_f$:
$\pi^*(f) = \arg\max_{\pi \in \{paths\ on\ L\}} \mathrm{BLEU}(e_\pi, r_f)$
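As a concrete reference point, here is a minimal sentence-BLEU sketch (a standard smoothed variant, assumed here for illustration rather than taken from the authors' implementation) with clipped n-gram precisions and the brevity penalty:

```python
# Minimal smoothed sentence-BLEU: clipped precisions + brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    hyp, ref = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        # clipping: each hypothesis n-gram matches at most its reference count
        clipped = sum(min(c, r[g]) for g, c in h.items())
        total = max(sum(h.values()), 1)
        log_prec += math.log(max(clipped, 1e-9) / total)  # crude smoothing
    bp = min(1.0, math.exp(1.0 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)

print(sentence_bleu("Venus is the hellish twin of the Earth",
                    "Venus is the hellish twin of the Earth"))  # -> 1.0
```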

5 Earlier Approaches

Language Model Oracle [Li & Khudanpur, 2009] (LM)
- build an n-gram language model $A_{LM}$ on the reference r
- find the most probable hypothesis: $\pi^*_{LM}(f) = \mathrm{ShortestPath}(L \circ A_{LM}(r_f))$

Partial BLEU Oracle [Dreyer et al., 2007] (PB/PBl)
- for a partial path $\pi$ let its weight be: $\log \mathrm{BLEU}(\pi) = \frac{1}{4} \sum_{m=1..4} \log p_m$
- hypothesis $h_1$ is preferred to $h_2$ if $\log \mathrm{BLEU}(h_1) > \log \mathrm{BLEU}(h_2)$ and
  - same 3-word history (PB), or
  - same 3-word history and same length (PBl)
- find the shortest path: $\pi^*_{PB}(f) = \mathrm{ShortestPath}(L)$
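As an illustration of the LM-oracle idea, here is a toy sketch (an assumption-laden stand-in, not Li & Khudanpur's hypergraph algorithm): estimate a smoothed bigram model on the reference alone and rescore an n-best list with it, so hypotheses sharing more material with the reference score higher:

```python
# Toy "reference LM" oracle over an n-best list (add-one smoothed bigrams).
import math
from collections import Counter

def bigram_lm(reference):
    tokens = ["<s>"] + reference.split() + ["</s>"]
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)
    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        # add-one smoothed bigram log-probability under the reference LM
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

ref = "Venus is the hellish twin of the Earth"
lm = bigram_lm(ref)
nbest = ["Venus - the twin of hell of the Earth",
         "Venus is the infernal binocular of the Earth",
         "Venus is the hellish twin of the Earth"]
print(max(nbest, key=lm))  # the hypothesis most probable under the reference LM
```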

6 Problems

Problems with existing approaches:
1 LM: the LM score has only a distant relation to BLEU, so quality is low
2 PB: approximate search (recombination heuristics); can be resource-greedy (keeps many hypotheses until the final state)
3 Both LM & PB ignore:
  - precision clipping
  - brevity penalty (except PBl)
  - the corpus nature of BLEU

7 Our Contribution

New oracles:
- partially take clipping, brevity and the corpus nature of BLEU into account
- produce better hypotheses, and often faster

Take-away message: although true BLEU oracles are NP-hard... we can find approximate oracles that are very easy to calculate and reach very high BLEU.

8 Linear Approximation of Corpus-BLEU (LB)

Linear approximation of BLEU [Tromble et al., 2008]:

$\mathrm{BLEU}_{lin}(\pi) = \theta_0 |e_\pi| + \sum_{n=1}^{4} \theta_n \sum_{u \in \Sigma^n} c_u(e_\pi)\, \delta_u(r)$

where $c_u(e)$ is the number of occurrences of u in e and $\delta_u(r)$ indicates the presence of u in r.
Parameters: $\theta_0 = -1$ (brevity), $\theta_n = 1/(4 p r^{n-1})$ (n-gram discounts).

de Bruijn transducers (denoted $\Psi_n$ here): each $\Psi_n$ contains all n-grams as output symbols, with arcs weighted by $\delta_u(r)$.
(figure: de Bruijn automata $\Psi_n$ for the alphabet {0, 1} and n = 2, 3)
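A minimal sketch of $\mathrm{BLEU}_{lin}$ on plain strings (assuming Tromble et al.'s parametrization with illustrative values p = 0.85 and r = 0.7):

```python
# Toy linear corpus-BLEU score for a single hypothesis/reference pair.
from collections import Counter

def bleu_lin(hyp, ref, p=0.85, r=0.7, max_n=4):
    hyp, ref = hyp.split(), ref.split()
    theta0 = -1.0                      # brevity term
    score = theta0 * len(hyp)
    for n in range(1, max_n + 1):
        theta_n = 1.0 / (4 * p * r ** (n - 1))   # n-gram discount
        ref_ngrams = {tuple(ref[i:i + n]) for i in range(len(ref) - n + 1)}
        counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        # c_u(e) * delta_u(r): count hypothesis n-grams present in the reference
        score += theta_n * sum(c for u, c in counts.items() if u in ref_ngrams)
    return score

print(bleu_lin("Venus is the hellish twin of the Earth",
               "Venus is the hellish twin of the Earth"))
```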

9 Linear Approximation of Corpus-BLEU (LB)

Composition effect: $L \circ \Psi_n$ produces an n-gram lattice, with each n-gram arc weighted by $\theta_n$ if the n-gram occurs in the reference.

Finding the LB oracle translation for 4-BLEU:
- weight the lattice L with $\theta_0$, each $\Psi_i$ with $\theta_i$
- find the shortest path: $\pi^*_{LB} = \mathrm{ShortestPath}(L \circ \Psi_1 \circ \cdots \circ \Psi_4)$

So far, all oracles transformed and/or reweighted the lattice L and called ShortestPath. This makes it difficult to account for clipping (it depends on the resulting path).
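The workhorse here is the shortest-path computation on an acyclic lattice. A minimal dynamic-programming sketch (a stand-in for OpenFST's ShortestPath, maximizing the linear score over a toy adjacency-list lattice; the states are assumed topologically numbered):

```python
# Best-scoring path in an acyclic lattice by DP in topological order.
def best_path(arcs, n_states, init=0):
    """arcs: dict state -> list of (next_state, word, weight); states are
    assumed numbered in topological order, with n_states-1 the final state."""
    best = {init: (0.0, [])}
    for q in range(n_states):
        if q not in best:
            continue  # unreachable state
        score, words = best[q]
        for nxt, word, w in arcs.get(q, []):
            cand = (score + w, words + [word])
            if nxt not in best or cand[0] > best[nxt][0]:
                best[nxt] = cand
    return best[n_states - 1]  # (score, word sequence) of the best full path

arcs = {0: [(1, "Venus", 0.0)],
        1: [(2, "is", 0.2), (2, "-", -0.3)],
        2: [(3, "the_twin", 0.5)]}
print(best_path(arcs, 4))  # -> (0.7, ['Venus', 'is', 'the_twin'])
```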

10 Integer Linear Programming Oracle (ILP)

How to take clipping into account? We need to fine-tune path weights. Reformulate the 1-BLEU shortest-path problem as an ILP problem:

$\arg\max_{\xi} \sum_{i \in \{arcs\}} \xi_i \big( \Theta_1 1_{i \in r} - \Theta_2 1_{i \notin r} \big)$

subject to:
$\sum_{\xi \in out(q_{init})} \xi = 1$ and $\sum_{\xi \in in(q_{fin})} \xi = 1$ (the solution should be a path)
$\sum_{\xi \in in(q)} \xi - \sum_{\xi \in out(q)} \xi = 0$ for all $q \in \{nodes\} \setminus \{q_{init}, q_{fin}\}$

with arc indicators $\xi_i = 1$ if arc i is active in the solution, 0 otherwise. The objective maximizes 1-BLEU: $\Theta_1$ rewards arcs whose word occurs in the reference, $\Theta_2$ penalizes the others.

11 Taking Account of Clipping in ILP

New variable: clipped word counter
$\gamma_w = \min\Big\{ \sum_{\xi \in \{edges\ with\ w\}} \xi,\ \#\ \mathrm{of}\ w\ \mathrm{in}\ r \Big\}$

ILP problem with clipping:

$\arg\max_{\xi, \gamma} \ \Theta_1 \sum_{w \in \{words\}} \gamma_w \ - \ \Theta_2 \sum_{i \in \{arcs\}} \xi_i$

subject to:
$0 \le \gamma_w \le c_w(r)$ and $\gamma_w \le \sum_{\xi \in \{edges\ carrying\ w\}} \xi$ for all $w \in \{words\}$
$\sum_{\xi \in out(q_{init})} \xi = 1$, $\sum_{\xi \in in(q_{fin})} \xi = 1$
$\sum_{\xi \in in(q)} \xi - \sum_{\xi \in out(q)} \xi = 0$ for all $q \in \{nodes\} \setminus \{q_{init}, q_{fin}\}$

Many ILP solvers are available; we used Gurobi.
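A toy sketch of this ILP (the tiny lattice, the $\Theta$ values, and the open-source PuLP/CBC solver standing in for Gurobi are all illustrative assumptions):

```python
# Toy lattice-oracle ILP with clipping, solved with PuLP's default CBC solver.
import pulp
from collections import defaultdict

# arcs: (source state, target state, word); state 0 is initial, state 3 final
arcs = [(0, 1, "Venus"), (1, 2, "is"), (1, 2, "-"), (2, 3, "twin")]
ref_counts = {"Venus": 1, "is": 1, "twin": 1}  # c_w(r)
THETA1, THETA2 = 1.0, 0.5                      # illustrative constants

prob = pulp.LpProblem("lattice_oracle", pulp.LpMaximize)
xi = [pulp.LpVariable(f"xi_{i}", cat="Binary") for i in range(len(arcs))]
gamma = {w: pulp.LpVariable(f"gamma_{w}", lowBound=0, upBound=c)
         for w, c in ref_counts.items()}       # 0 <= gamma_w <= c_w(r)

# objective: Theta1 * sum_w gamma_w - Theta2 * sum_i xi_i
prob += THETA1 * pulp.lpSum(gamma.values()) - THETA2 * pulp.lpSum(xi)

# clipping: gamma_w <= number of active arcs carrying w
arcs_with = defaultdict(list)
for i, (_, _, w) in enumerate(arcs):
    arcs_with[w].append(xi[i])
for w, var in gamma.items():
    prob += var <= pulp.lpSum(arcs_with[w])

# flow constraints: exactly one path from state 0 to state 3
out_arcs, in_arcs = defaultdict(list), defaultdict(list)
for i, (src, dst, _) in enumerate(arcs):
    out_arcs[src].append(xi[i])
    in_arcs[dst].append(xi[i])
prob += pulp.lpSum(out_arcs[0]) == 1
prob += pulp.lpSum(in_arcs[3]) == 1
for q in (1, 2):
    prob += pulp.lpSum(in_arcs[q]) == pulp.lpSum(out_arcs[q])

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([arcs[i][2] for i in range(len(arcs)) if xi[i].value() > 0.5])
# -> ['Venus', 'is', 'twin']
```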

12 Lagrangian Relaxation-based Oracle Decoder (RLX)

Can be done without 3rd-party solvers: use Lagrangian relaxation to deal with clipping.

Primal:
$\arg\min_{\xi \in \{paths\}} \sum_i \xi_i \theta_i$ subject to $\sum_{\xi\ with\ w} \xi \le c_w(r)$ for all $w \in r$

Dual:
$\arg\max_{\lambda} L(\lambda)$ with $\lambda_w \ge 0$ for all $w \in r$,
where $L(\lambda) = \min_{\xi} \sum_{i \in \{arcs\}} \xi_i \theta_i + \sum_{w \in r} \lambda_w \Big( \sum_{\xi\ with\ w} \xi - c_w(r) \Big)$

13 Lagrangian Relaxation-based Oracle Decoder (RLX)

$L(\lambda) = \min_{\xi} \sum_{i \in \{arcs\}} \xi_i \big( \theta_i + \lambda_{w(\xi_i)} \big) - \sum_{w \in r} \lambda_w c_w(r)$

Evaluating $\arg\min_\xi L(\lambda)$ is just a shortest-path computation in a reweighted lattice; $\lambda$ is then updated towards $\max_\lambda L(\lambda)$.

RLX: incremental constraint enforcing, cf. [Rush et al., 2010]
  for all w: $\lambda_w^{(0)} \leftarrow 0$
  for t = 1..T do
    $\xi^{(t)} = \arg\min_\xi \sum_i \xi_i \big( \theta_i + \lambda_{w(\xi_i)}^{(t)} \big)$
    if all clipping constraints are enforced then
      optimal solution found
    else
      for $w \in r$ do
        $n_w \leftarrow$ number of occurrences of w in $\xi^{(t)}$
        $\lambda_w^{(t)} \leftarrow \max\big( 0,\ \lambda_w^{(t)} + \alpha^{(t)} (n_w - c_w(r)) \big)$

This was for 1-BLEU; for n-BLEU, run on $L \circ \Psi_n$.
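A toy sketch of the RLX subgradient loop (the shortest_path callback, the constant step size alpha, and the iteration cap T are illustrative assumptions; a real implementation would run the FST/DP shortest path with lambda-penalized arc weights):

```python
# Toy RLX subgradient loop for the clipping constraints.
from collections import Counter

def rlx(shortest_path, ref_counts, T=100, alpha=0.1):
    lam = Counter()  # lambda_w, implicitly 0 for unseen words
    for t in range(T):
        path = shortest_path(lam)  # argmin_xi sum_i xi_i (theta_i + lam[w(i)])
        counts = Counter(path)
        # check clipping constraints: n_w <= c_w(r) for all reference words
        if all(counts[w] <= c for w, c in ref_counts.items()):
            return path  # all constraints enforced: optimal solution found
        for w, c in ref_counts.items():
            # subgradient step, projected back onto lambda_w >= 0
            lam[w] = max(0.0, lam[w] + alpha * (counts[w] - c))
    return path  # best effort after T iterations

# toy usage: two candidate "paths" with base weights; lambda penalizes
# over-using reference words beyond their clipped counts
cands = {("the", "the", "cat"): 1.0, ("the", "cat", "sat"): 1.2}
def sp(lam):
    return min(cands, key=lambda p: cands[p] + sum(lam[w] for w in p))
print(rlx(sp, {"the": 1, "cat": 1, "sat": 1}))  # -> ('the', 'cat', 'sat')
```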

14 Experiments: Performance for two decoders (N-code and Moses)

(figure: six panels plotting average oracle BLEU against average decoding time per sentence for the decoders RLX, ILP, LB4, LB2, PB, PBl, LM4, LM2 and n-best oracles:
(a) N-code fr-en (test: 27.88), (b) N-code de-en (test: 22.05), (c) N-code en-de (test: 15.83),
(d) Moses fr-en (test: 27.68), (e) Moses de-en (test: 21.85), (f) Moses en-de (test: 15.89))

15 Conclusion & Future Work

Results
- new oracle decoders find better hypotheses... and do it faster
- (clipping/brevity/corpus) optimization for 2-BLEU is sufficient to obtain good 4-BLEU

Main Conclusion
- lattices contain excellent translations compared to n-best list oracles
- +20 BLEU points above the results of state-of-the-art models
- SMT has serious scoring problems

Future Work
- linear models not flexible enough? -> non-linear score functions
- more features needed? -> rich features
- training flawed? -> discriminative training with lattice oracles

16 Thank you for your attention! Questions?

17 Bibliography

Dreyer, M., Hall, K. B., & Khudanpur, S. P. (2007). Comparing reordering constraints for SMT using efficient BLEU oracle computation. In Proc. of the Workshop on Syntax and Structure in Statistical Translation. Morristown, NJ, USA.

Leusch, G., Matusov, E., & Ney, H. (2008). Complexity of finding the BLEU-optimal hypothesis in a confusion network. In Proc. of the 2008 Conf. on EMNLP. Honolulu, Hawaii.

Li, Z. & Khudanpur, S. (2009). Efficient extraction of oracle-best translations from hypergraphs. In Proc. of Human Language Technologies: The 2009 Annual Conf. of the North American Chapter of the ACL, Companion Volume: Short Papers (pp. 9-12). Morristown, NJ, USA.

Rush, A. M., Sontag, D., Collins, M., & Jaakkola, T. (2010). On dual decomposition and linear programming relaxations for natural language processing. In Proc. of the 2010 Conf. on EMNLP (pp. 1-11). Stroudsburg, PA, USA.

Tromble, R. W., Kumar, S., Och, F., & Macherey, W. (2008). Lattice minimum Bayes-risk decoding for statistical machine translation. In Proc. of the Conf. on EMNLP. Stroudsburg, PA, USA.

18 Implementation

- OpenFST toolbox: works with any semiring (PB required a semiring of sets); proven and well-optimized ShortestPath algorithms
- Gurobi ILP solver
