Computing Lattice BLEU Oracle Scores for Machine Translation
|
|
- Luke Hill
- 5 years ago
- Views:
Transcription
1 Computing Lattice Oracle Scores for Machine Translation Artem Sokolov & Guillaume Wisniewski & François Yvon LIMSI, Orsay, France 1 Introduction 2 Oracle Decoding Task 3 Proposed Oracle Decoders 4 Experiments 5 Conclusion & Future Work
2 Model & Inference in SMT Usual translating of sentences f: best translation: e = arg max e E(f) score(e f) e.g.: score(e f) = λ h(e, f) h(e, f) features, λ params E(f) search space interested in this Example 0 source: Vénus est la jumelle infernale de la Terre search space options: list of n-best hypotheses Venus - the twin of hell of the Earth Venus is the infernal binocular of the Earth Venus is the hellish twin of the Earth word lattice Vénus : Venus Vénus_est : Venus 2 1 est : is NULL : - 3 la_jumelle : twin la : the jumelle : twin 5 4 infernale : hellish 7 infernale : infernal 6 infernale : of_hell jumelle : twin jumelle : binocular 8 de_la_terre : of_the_earth 9 Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 1 / 15
3 Motivation in reality SMT systems perform poorly knowing why is difficult because of system complexity useful to know the best possible hypotheses can help failure analysis Usefulness of (oracle) translation Finding bottlenecks if oracle is good why the system couldn t return it? if oracle is bad why doesn t E contain better ones? Learning update towards the best hypothesis In this work interested in finding best hypotheses contained in lattices two new methods, comparison with 2 existing methods Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 2 / 15
4 Oracle Decoding Task by best hypothesis in E we mean highest translation ( n ) 1/n n-({e i }, {r i }) = BrevityPenalty p m m=1 e i hypotheses, r i references, p m clipped precisions need some sentence-level approximation Exact oracle finding with sent- in n-best list straight-forward in word lattice NP-hard even for 2- [Leusch et al., 2008] so, approximations are needed focus of this presentation Oracle decoding on lattice L for reference r f π (f) = arg max (e π, r f ). π {paths on L} Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 3 / 15
5 Earlier Approaches Language Model Oracle [Li & Khudanpur, 2009] LM n-gram language model A LM built on reference r find most probable hypothesis π LM (f) = ShortestPath(L A LM(r f )) Partial Oracle [Dreyer et al., 2007] PB/PBl for partial path π let its weight be: log (π) = 1 4 m=1..4 log p m hypothesis h 1 is prefered to h 2 if log (h 1 ) > log (h 2 ) and same 3-word history or PB same 3-word history and same length PBl find shortest path πpb (f) = ShortestPath(L) Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 4 / 15
6 Problems Problems with existing approaches 1 LM distant relation to low quality 2 PB approximate search (recombination heuristics) can be resource-greedy (keeps many hypotheses until the final state) 3 Both LM & PB ignore: precision clipping brevity penalty (except PBl) corpus nature of Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 5 / 15
7 Our Contribution New oracles partially take clipping, brevity and corpus nature into account produce better hypotheses and often faster Take-away message although true oracles are NP-hard... we can find approximate oracles: very easy to calculate very high Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 6 / 15
8 Linear approximation of corpus- LB Linear approximation of [Tromble et al., 2008] 4 lin (π) = θ 0 e π + θ n c u (e π )δ u (r) n=1 u Σ n c u (e) number of occurences of u in e, δ u (r) indicator of presence of u in r. parameters: θ 0 = 1 ( brevity), θ n = 1 4p r n 1 (n-gram discounts) de Bruijn transducers n each n contains all n-grams as output symbols arcs weighted by δ u (r) 0:00/0 1:11/0 1:ε/0 1 0:ε/0 1:ε/0 10 1:101/0 0:110/0 0:100/0 1:111/0 11 1:01/0 q₀ 1:011/0 1:1/0 0:0/0 0:ε/0 0 0:10/0 1 0:ε/0 0:ε/0 0:000/0 q₀ q₀ 1:ε/0 0 1:ε/0 01 0:010/0 1:001/0 00 n automata for alphabet {0, 1} and n = Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 7 / 15
9 Linear approximation of corpus- LB Composition effect of s L n produces n-gram lattice each n-gram arc weighted by θ n, if the n-gram was in reference Find LB oracle translation for 4- weight lattice L with θ 0 weight i with θ i find shortest path π LB = ShortestPath(L ). so far all oracles: transformed and/or reweighted lattice L called ShortestPath difficult to account for clipping (it depends on the resulting path) Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 8 / 15
10 Integer Linear Programming Oracles ILP how to take clipping into account? need to fine-tune path weights reformulate 1- shortest path problem as an ILP problem: arg max ξ ξ = 1, ξ in(q fin ) ξ out(q) i {arcs} ξ ξ i (Θ 1 1 i r Θ 2 1 i r ) ξ out(q init ) ξ in(q) ξ = 1 ξ = 0, q {nodes} \ {q init, q fin } maximize 1- solution should be path arc indicator ξ i = { 1 if arc i is active in solution, 0 otherwise, Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 9 / 15
11 Taking account of clipping in ILP New variable: Clipped word counter γ w = min{ ξ, # of w in r} ξ {edges with w} ILP problem with clipping arg max Θ 1 ξ,γ w w {words} γ w Θ 2 γ w 0, γ w c w (r), γ w ξ in(q fin ) ξ out(q) ξ = 1, ξ ξ out(q init ) ξ in(q) ξ = 1 i {arcs} ξ i ξ {edges carring w} w {words} ξ = 0, q {nodes} \ {q init, q fin } ξ γ w many ILP solvers available we used Gurobi Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 10 / 15
12 Langrangian Relaxation-based Oracle Decoder RLX can be done without 3rd-party solvers Lagrangian relaxation to deal with clipping Primal arg min ξ i θ i ξ {paths} ξ with w i arcs ξ c w (r), w r arg max L(λ) λ λ w 0, w r Dual Where L(λ) = min ξ i {arcs} ξ i θ i + w r λ w ξ with w ξ c w (r) Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 11 / 15
13 Langrangian Relaxation-based Oracle Decoder L(λ) = min ξ ξ = arg min ξ L(λ) λ w = λ w ξ i θ i + i {arcs} w r L(λ) = arg min ξ ξ Ω(w) ξ ξ c w (r) i {arcs} ξ with w ξ c w (r) ξ i ( λw(ξi ) θ i ) just find shortest path update λ towards max L(λ) RLX Incremental Constraint Enforcing cf. [Rush et al., 2010] w, λ (0) w 0 for t = 1 T do ξ (t) = arg min ξ i ξ i (λ ) w(ξi) θ i if all clipping constraints are enforced then optimal solution found else for w r do n w n. of occurrences of w in ξ (t) λ (t) w λ (t) w λ (t) w + α (t) (n w c w (r)) max(0, λ (t) w ) this was for 1- for n- run on L n Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 12 / 15
14 Experiments: Performance for two decoders N-code avg. time n-best RLX ILP LB4 LB2 PB PBl LM4 LM avg. time, s avg. time n-best RLX ILP LB4 LB2 PB PBl LM4LM avg. time, s avg. time n-best RLX ILP LB4 LB2 PB PBl LM4LM avg. time, s (a) fr en (test: 27.88) (b) de en (test: 22.05) Moses (c) en de (test: 15.83) avg. time n-best RLX ILP LB4 LB2 PB PBl LM4 LM2 3 avg. time, s RLX ILP LB4 LB2 PB PBl LM4 LM2 4 3 avg. time, s RLX ILP LB4 LB2 PB PBl LM4 LM avg. time, s (d) fr en (test: 27.68) (e) de en (test: 21.85) (f) en de (test: 15.89) Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 13 / 15
15 Conclusion & Future Work Results new oracle decoders find better hypotheses...and do it faster (clipping/brevity/corpus) optimization for 2- is sufficient to obtain good 4- Main Conclusion lattices contain excellent translations points to n-best list oracles +20 points above results of state-of-the-art models SMT has serious scoring problems Future Work linear models not flexible enough? non-linear score func. more features needed? rich features training flawed? discriminative training with lattice oracles Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 14 / 15
16 Thank you for your attention! Questions? Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 15 / 15
17 Bibliography Dreyer, M., Hall, K. B., & Khudanpur, S. P. (2007). Comparing reordering constraints for SMT using efficient oracle computation. In Proc. of the Workshop on Syntax and Structure in Statistical Translation (pp ). Morristown, NJ, USA. Leusch, G., Matusov, E., & Ney, H. (2008). Complexity of finding the -optimal hypothesis in a confusion network. In Proc. of the 2008 Conf. on EMNLP (pp ). Honolulu, Hawaii. Li, Z. & Khudanpur, S. (2009). Efficient extraction of oracle-best translations from hypergraphs. In Proc. of Human Language Technologies: The 2009 Annual Conf. of the North American Chapter of the ACL, Companion Volume: Short Papers (pp. 9 12). Morristown, NJ, USA. Rush, A. M., Sontag, D., Collins, M., & Jaakkola, T. (2010). On dual decomposition and linear programming relaxations for natural language processing. In Proc. of the 2010 Conf. on EMNLP (pp. 1 11). Stroudsburg, PA, USA. Tromble, R. W., Kumar, S., Och, F., & Macherey, W. (2008). Lattice minimum bayes-risk decoding for statistical machine translation. In Proc. of the Conf. on EMNLP (pp ). Stroudsburg, PA, USA. Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 1 / 2
18 Implementation OpenFST toolbox works with any semiring (PB required semiring of sets) proven and well optimized ShortestPath algorithms Gurobi ILP solver Artem Sokolov, G. Wisniewski, F. Yvon LIMSI-CNRS Lattice Oracles for SMT 2 / 2
Minimum Error Rate Training Semiring
Minimum Error Rate Training Semiring Artem Sokolov & François Yvon LIMSI-CNRS & LIMSI-CNRS/Univ. Paris Sud {artem.sokolov,francois.yvon}@limsi.fr EAMT 2011 31 May 2011 Artem Sokolov & François Yvon (LIMSI)
More informationDecoding Revisited: Easy-Part-First & MERT. February 26, 2015
Decoding Revisited: Easy-Part-First & MERT February 26, 2015 Translating the Easy Part First? the tourism initiative addresses this for the first time the die tm:-0.19,lm:-0.4, d:0, all:-0.65 tourism touristische
More informationStatistical Machine Translation. Part III: Search Problem. Complexity issues. DP beam-search: with single and multi-stacks
Statistical Machine Translation Marcello Federico FBK-irst Trento, Italy Galileo Galilei PhD School - University of Pisa Pisa, 7-19 May 008 Part III: Search Problem 1 Complexity issues A search: with single
More informationImproved Decipherment of Homophonic Ciphers
Improved Decipherment of Homophonic Ciphers Malte Nuhn and Julian Schamper and Hermann Ney Human Language Technology and Pattern Recognition Computer Science Department, RWTH Aachen University, Aachen,
More informationEfficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices Graeme Blackwood, Adrià de Gispert, William Byrne Machine Intelligence Laboratory Cambridge
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Raghavendra Udupa U Hemanta K Mai IBM India Research Laboratory, New Delhi {uraghave, hemantkm}@inibmcom Abstract Viterbi
More informationLagrangian Relaxation Algorithms for Inference in Natural Language Processing
Lagrangian Relaxation Algorithms for Inference in Natural Language Processing Alexander M. Rush and Michael Collins (based on joint work with Yin-Wen Chang, Tommi Jaakkola, Terry Koo, Roi Reichart, David
More informationDual Decomposition for Natural Language Processing. Decoding complexity
Dual Decomposition for atural Language Processing Alexander M. Rush and Michael Collins Decoding complexity focus: decoding problem for natural language tasks motivation: y = arg max y f (y) richer model
More informationThe Geometry of Statistical Machine Translation
The Geometry of Statistical Machine Translation Presented by Rory Waite 16th of December 2015 ntroduction Linear Models Convex Geometry The Minkowski Sum Projected MERT Conclusions ntroduction We provide
More informationConsensus Training for Consensus Decoding in Machine Translation
Consensus Training for Consensus Decoding in Machine Translation Adam Pauls, John DeNero and Dan Klein Computer Science Division University of California at Berkeley {adpauls,denero,klein}@cs.berkeley.edu
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationProbabilistic Graphical Models: Lagrangian Relaxation Algorithms for Natural Language Processing
Probabilistic Graphical Models: Lagrangian Relaxation Algorithms for atural Language Processing Alexander M. Rush (based on joint work with Michael Collins, Tommi Jaakkola, Terry Koo, David Sontag) Uncertainty
More informationVariational Decoding for Statistical Machine Translation
Variational Decoding for Statistical Machine Translation Zhifei Li, Jason Eisner, and Sanjeev Khudanpur Center for Language and Speech Processing Computer Science Department Johns Hopkins University 1
More informationPhrase-Based Statistical Machine Translation with Pivot Languages
Phrase-Based Statistical Machine Translation with Pivot Languages N. Bertoldi, M. Barbaiani, M. Federico, R. Cattoni FBK, Trento - Italy Rovira i Virgili University, Tarragona - Spain October 21st, 2008
More informationDecoding in Statistical Machine Translation. Mid-course Evaluation. Decoding. Christian Hardmeier
Decoding in Statistical Machine Translation Christian Hardmeier 2016-05-04 Mid-course Evaluation http://stp.lingfil.uu.se/~sara/kurser/mt16/ mid-course-eval.html Decoding The decoder is the part of the
More informationMargin-based Decomposed Amortized Inference
Margin-based Decomposed Amortized Inference Gourab Kundu and Vivek Srikumar and Dan Roth University of Illinois, Urbana-Champaign Urbana, IL. 61801 {kundu2, vsrikum2, danr}@illinois.edu Abstract Given
More informationProbabilistic Graphical Models
Probabilistic Graphical Models David Sontag New York University Lecture 6, March 7, 2013 David Sontag (NYU) Graphical Models Lecture 6, March 7, 2013 1 / 25 Today s lecture 1 Dual decomposition 2 MAP inference
More informationMultiple System Combination. Jinhua Du CNGL July 23, 2008
Multiple System Combination Jinhua Du CNGL July 23, 2008 Outline Introduction Motivation Current Achievements Combination Strategies Key Techniques System Combination Framework in IA Large-Scale Experiments
More informationExact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation Abstract This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation.
More informationMassachusetts Institute of Technology
Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are
More informationMinimum Bayes-risk System Combination
Minimum Bayes-risk System Combination Jesús González-Rubio Instituto Tecnológico de Informática U. Politècnica de València 46022 Valencia, Spain jegonzalez@iti.upv.es Alfons Juan Francisco Casacuberta
More informationExact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation Yin-Wen Chang MIT CSAIL Cambridge, MA 02139, USA yinwen@csail.mit.edu Michael Collins Department of Computer Science, Columbia
More informationA Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister
A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model
More informationStatistical Machine Translation
Statistical Machine Translation -tree-based models (cont.)- Artem Sokolov Computerlinguistik Universität Heidelberg Sommersemester 2015 material from P. Koehn, S. Riezler, D. Altshuler Bottom-Up Decoding
More informationTuning as Linear Regression
Tuning as Linear Regression Marzieh Bazrafshan, Tagyoung Chung and Daniel Gildea Department of Computer Science University of Rochester Rochester, NY 14627 Abstract We propose a tuning method for statistical
More informationORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way
More informationA Dynamic Programming Algorithm for Computing N-gram Posteriors from Lattices
A Dynamic Programming Algorithm for Computing N-gram Posteriors from Lattices Doğan Can Department of Computer Science University of Southern California Los Angeles, CA, USA dogancan@usc.edu Shrikanth
More informationRelaxed Marginal Inference and its Application to Dependency Parsing
Relaxed Marginal Inference and its Application to Dependency Parsing Sebastian Riedel David A. Smith Department of Computer Science University of Massachusetts, Amherst {riedel,dasmith}@cs.umass.edu Abstract
More informationMachine Learning. Regularization and Feature Selection. Fabio Vandin November 14, 2017
Machine Learning Regularization and Feature Selection Fabio Vandin November 14, 2017 1 Regularized Loss Minimization Assume h is defined by a vector w = (w 1,..., w d ) T R d (e.g., linear models) Regularization
More informationDiscriminative Training
Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder
More informationDual Decomposition for Inference
Dual Decomposition for Inference Yunshu Liu ASPITRG Research Group 2014-05-06 References: [1]. D. Sontag, A. Globerson and T. Jaakkola, Introduction to Dual Decomposition for Inference, Optimization for
More informationDoctoral Course in Speech Recognition. May 2007 Kjell Elenius
Doctoral Course in Speech Recognition May 2007 Kjell Elenius CHAPTER 12 BASIC SEARCH ALGORITHMS State-based search paradigm Triplet S, O, G S, set of initial states O, set of operators applied on a state
More informationAutomatic Speech Recognition and Statistical Machine Translation under Uncertainty
Outlines Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Lambert Mathias Advisor: Prof. William Byrne Thesis Committee: Prof. Gerard Meyer, Prof. Trac Tran and Prof.
More informationImproving Relative-Entropy Pruning using Statistical Significance
Improving Relative-Entropy Pruning using Statistical Significance Wang Ling 1,2 N adi Tomeh 3 Guang X iang 1 Alan Black 1 Isabel Trancoso 2 (1)Language Technologies Institute, Carnegie Mellon University,
More informationDiscriminative Training. March 4, 2014
Discriminative Training March 4, 2014 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder e =
More informationIncremental HMM Alignment for MT System Combination
Incremental HMM Alignment for MT System Combination Chi-Ho Li Microsoft Research Asia 49 Zhichun Road, Beijing, China chl@microsoft.com Yupeng Liu Harbin Institute of Technology 92 Xidazhi Street, Harbin,
More informationLow-Dimensional Discriminative Reranking. Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park
Low-Dimensional Discriminative Reranking Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park Discriminative Reranking Useful for many NLP tasks Enables us to use arbitrary features
More informationAn Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz
More informationIBM Model 1 for Machine Translation
IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of
More informationQuasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti
Quasi-Synchronous Phrase Dependency Grammars for Machine Translation Kevin Gimpel Noah A. Smith 1 Introduction MT using dependency grammars on phrases Phrases capture local reordering and idiomatic translations
More informationLecture 9: Decoding. Andreas Maletti. Stuttgart January 20, Statistical Machine Translation. SMT VIII A. Maletti 1
Lecture 9: Decoding Andreas Maletti Statistical Machine Translation Stuttgart January 20, 2012 SMT VIII A. Maletti 1 Lecture 9 Last time Synchronous grammars (tree transducers) Rule extraction Weight training
More informationMulti-Source Neural Translation
Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu In the neural encoder-decoder
More informationStructural Learning with Amortized Inference
Structural Learning with Amortized Inference AAAI 15 Kai-Wei Chang, Shyam Upadhyay, Gourab Kundu, and Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign {kchang10,upadhya3,kundu2,danr}@illinois.edu
More information1 Evaluation of SMT systems: BLEU
1 Evaluation of SMT systems: BLEU Idea: We want to define a repeatable evaluation method that uses: a gold standard of human generated reference translations a numerical translation closeness metric in
More informationGoogle s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill
More informationLab 12: Structured Prediction
December 4, 2014 Lecture plan structured perceptron application: confused messages application: dependency parsing structured SVM Class review: from modelization to classification What does learning mean?
More informationINF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS. Jan Tore Lønning, Lecture 3, 7 Sep., 2016
1 INF5820/INF9820 LANGUAGE TECHNOLOGICAL APPLICATIONS Jan Tore Lønning, Lecture 3, 7 Sep., 2016 jtl@ifi.uio.no Machine Translation Evaluation 2 1. Automatic MT-evaluation: 1. BLEU 2. Alternatives 3. Evaluation
More informationTALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto
TALP Phrase-Based System and TALP System Combination for the IWSLT 2006 IWSLT 2006, Kyoto Marta R. Costa-jussà, Josep M. Crego, Adrià de Gispert, Patrik Lambert, Maxim Khalilov, José A.R. Fonollosa, José
More informationMachine Translation: Examples. Statistical NLP Spring Levels of Transfer. Corpus-Based MT. World-Level MT: Examples
Statistical NLP Spring 2009 Machine Translation: Examples Lecture 17: Word Alignment Dan Klein UC Berkeley Corpus-Based MT Levels of Transfer Modeling correspondences between languages Sentence-aligned
More informationUnsupervised Model Adaptation using Information-Theoretic Criterion
Unsupervised Model Adaptation using Information-Theoretic Criterion Ariya Rastrow 1, Frederick Jelinek 1, Abhinav Sethy 2 and Bhuvana Ramabhadran 2 1 Human Language Technology Center of Excellence, and
More informationFeedback Loop between High Level Semantics and Low Level Vision
Feedback Loop between High Level Semantics and Low Level Vision Varun K. Nagaraja Vlad I. Morariu Larry S. Davis {varun,morariu,lsd}@umiacs.umd.edu University of Maryland, College Park, MD, USA. Abstract.
More informationSpeech Translation: from Singlebest to N-Best to Lattice Translation. Spoken Language Communication Laboratories
Speech Translation: from Singlebest to N-Best to Lattice Translation Ruiqiang ZHANG Genichiro KIKUI Spoken Language Communication Laboratories 2 Speech Translation Structure Single-best only ASR Single-best
More informationNLP Homework: Dependency Parsing with Feed-Forward Neural Network
NLP Homework: Dependency Parsing with Feed-Forward Neural Network Submission Deadline: Monday Dec. 11th, 5 pm 1 Background on Dependency Parsing Dependency trees are one of the main representations used
More informationFast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Joern Wuebker, Hermann Ney Human Language Technology and Pattern Recognition Group Computer Science
More informationAnalysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation
Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology
More informationMulti-Source Neural Translation
Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu Abstract We build a multi-source
More informationEvaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018
Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence
More informationSeq2Seq Losses (CTC)
Seq2Seq Losses (CTC) Jerry Ding & Ryan Brigden 11-785 Recitation 6 February 23, 2018 Outline Tasks suited for recurrent networks Losses when the output is a sequence Kinds of errors Losses to use CTC Loss
More informationStatistical NLP Spring HW2: PNP Classification
Statistical NLP Spring 2010 Lecture 16: Word Alignment Dan Klein UC Berkeley HW2: PNP Classification Overall: good work! Top results: 88.1: Matthew Can (word/phrase pre/suffixes) 88.1: Kurtis Heimerl (positional
More informationHW2: PNP Classification. Statistical NLP Spring Levels of Transfer. Phrasal / Syntactic MT: Examples. Lecture 16: Word Alignment
Statistical NLP Spring 2010 Lecture 16: Word Alignment Dan Klein UC Berkeley HW2: PNP Classification Overall: good work! Top results: 88.1: Matthew Can (word/phrase pre/suffixes) 88.1: Kurtis Heimerl (positional
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationOptimal Beam Search for Machine Translation
Optimal Beam Search for Machine Translation Alexander M. Rush Yin-Wen Chang MIT CSAIL, Cambridge, MA 02139, USA {srush, yinwen}@csail.mit.edu Michael Collins Department of Computer Science, Columbia University,
More informationPolyhedral Outer Approximations with Application to Natural Language Parsing
Polyhedral Outer Approximations with Application to Natural Language Parsing André F. T. Martins 1,2 Noah A. Smith 1 Eric P. Xing 1 1 Language Technologies Institute School of Computer Science Carnegie
More informationRouting. Topics: 6.976/ESD.937 1
Routing Topics: Definition Architecture for routing data plane algorithm Current routing algorithm control plane algorithm Optimal routing algorithm known algorithms and implementation issues new solution
More informationOut of GIZA Efficient Word Alignment Models for SMT
Out of GIZA Efficient Word Alignment Models for SMT Yanjun Ma National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Series March 4, 2009 Y. Ma (DCU) Out of Giza
More informationLatent Variable Models in NLP
Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable
More informationNatural Language Processing
Natural Language Processing Global linear models Based on slides from Michael Collins Globally-normalized models Why do we decompose to a sequence of decisions? Can we directly estimate the probability
More informationNatural Language Processing. Slides from Andreas Vlachos, Chris Manning, Mihai Surdeanu
Natural Language Processing Slides from Andreas Vlachos, Chris Manning, Mihai Surdeanu Projects Project descriptions due today! Last class Sequence to sequence models Attention Pointer networks Today Weak
More informationMidterm sample questions
Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts
More informationAlgorithms for NLP. Machine Translation II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Machine Translation II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Announcements Project 4: Word Alignment! Will be released soon! (~Monday) Phrase-Based System Overview
More informationAlgorithms for NLP. Classifica(on III. Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classifica(on III Taylor Berg- Kirkpatrick CMU Slides: Dan Klein UC Berkeley The Perceptron, Again Start with zero weights Visit training instances one by one Try to classify If correct,
More informationOptimization and Sampling for NLP from a Unified Viewpoint
Optimization and Sampling for NLP from a Unified Viewpoint Marc DYMETMAN Guillaume BOUCHARD Simon CARTER 2 () Xerox Research Centre Europe, Grenoble, France (2) ISLA, University of Amsterdam, The Netherlands
More informationMachine Learning for Structured Prediction
Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for
More informationApplications of Tree Automata Theory Lecture VI: Back to Machine Translation
Applications of Tree Automata Theory Lecture VI: Back to Machine Translation Andreas Maletti Institute of Computer Science Universität Leipzig, Germany on leave from: Institute for Natural Language Processing
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationWord Alignment for Statistical Machine Translation Using Hidden Markov Models
Word Alignment for Statistical Machine Translation Using Hidden Markov Models by Anahita Mansouri Bigvand A Depth Report Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of
More informationMinimum cost transportation problem
Minimum cost transportation problem Complements of Operations Research Giovanni Righini Università degli Studi di Milano Definitions The minimum cost transportation problem is a special case of the minimum
More informationLecture 2: Learning from Evaluative Feedback. or Bandit Problems
Lecture 2: Learning from Evaluative Feedback or Bandit Problems 1 Edward L. Thorndike (1874-1949) Puzzle Box 2 Learning by Trial-and-Error Law of Effect: Of several responses to the same situation, those
More informationStatistical Phrase-Based Speech Translation
Statistical Phrase-Based Speech Translation Lambert Mathias 1 William Byrne 2 1 Center for Language and Speech Processing Department of Electrical and Computer Engineering Johns Hopkins University 2 Machine
More informationTheory of Alignment Generators and Applications to Statistical Machine Translation
Theory of Alignment Generators and Applications to Statistical Machine Translation Hemanta K Maji Raghavendra Udupa U IBM India Research Laboratory, New Delhi {hemantkm, uraghave}@inibmcom Abstract Viterbi
More informationDiscrete (and Continuous) Optimization Solutions of Exercises 2 WI4 131
Discrete (and Continuous) Optimization Solutions of Exercises 2 WI4 131 Kees Roos Technische Universiteit Delft Faculteit Electrotechniek, Wiskunde en Informatica Afdeling Informatie, Systemen en Algoritmiek
More informationMarrying Dynamic Programming with Recurrent Neural Networks
Marrying Dynamic Programming with Recurrent Neural Networks I eat sushi with tuna from Japan Liang Huang Oregon State University Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark Marrying
More informationImproving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer
Improving the Multi-Stack Decoding Algorithm in a Segment-based Speech Recognizer Gábor Gosztolya, András Kocsor Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationIntegrating Morphology in Probabilistic Translation Models
Integrating Morphology in Probabilistic Translation Models Chris Dyer joint work with Jon Clark, Alon Lavie, and Noah Smith January 24, 2011 lti das alte Haus the old house mach das do that 2 das alte
More informationFMCAD 2013 Parameter Synthesis with IC3
FMCAD 2013 Parameter Synthesis with IC3 A. Cimatti, A. Griggio, S. Mover, S. Tonetta FBK, Trento, Italy Motivations and Contributions Parametric descriptions of systems arise in many domains E.g. software,
More informationInference by Minimizing Size, Divergence, or their Sum
University of Massachusetts Amherst From the SelectedWorks of Andrew McCallum 2012 Inference by Minimizing Size, Divergence, or their Sum Sebastian Riedel David A. Smith Andrew McCallum, University of
More informationSequences and Information
Sequences and Information Rahul Siddharthan The Institute of Mathematical Sciences, Chennai, India http://www.imsc.res.in/ rsidd/ Facets 16, 04/07/2016 This box says something By looking at the symbols
More informationSupport Vector Machine
Support Vector Machine Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Linear Support Vector Machine Kernelized SVM Kernels 2 From ERM to RLM Empirical Risk Minimization in the binary
More informationExact Sampling and Decoding in High-Order Hidden Markov Models
Exact Sampling and Decoding in High-Order Hidden Markov Models Simon Carter Marc Dymetman Guillaume Bouchard ISLA, University of Amsterdam Science Park 904, 1098 XH Amsterdam, The Netherlands s.c.carter@uva.nl
More informationWord Alignment via Submodular Maximization over Matroids
Word Alignment via Submodular Maximization over Matroids Hui Lin, Jeff Bilmes University of Washington, Seattle Dept. of Electrical Engineering June 21, 2011 Lin and Bilmes Submodular Word Alignment June
More informationDecipherment of Substitution Ciphers with Neural Language Models
Decipherment of Substitution Ciphers with Neural Language Models Nishant Kambhatla, Anahita Mansouri Bigvand, Anoop Sarkar School of Computing Science Simon Fraser University Burnaby, BC, Canada {nkambhat,amansour,anoop}@sfu.ca
More informationDeep Learning Sequence to Sequence models: Attention Models. 17 March 2018
Deep Learning Sequence to Sequence models: Attention Models 17 March 2018 1 Sequence-to-sequence modelling Problem: E.g. A sequence X 1 X N goes in A different sequence Y 1 Y M comes out Speech recognition:
More informationLECTURE NOTE #8 PROF. ALAN YUILLE. Can we find a linear classifier that separates the position and negative examples?
LECTURE NOTE #8 PROF. ALAN YUILLE 1. Linear Classifiers and Perceptrons A dataset contains N samples: { (x µ, y µ ) : µ = 1 to N }, y µ {±1} Can we find a linear classifier that separates the position
More informationStatistical Ranking Problem
Statistical Ranking Problem Tong Zhang Statistics Department, Rutgers University Ranking Problems Rank a set of items and display to users in corresponding order. Two issues: performance on top and dealing
More informationTopics in Natural Language Processing
Topics in Natural Language Processing Shay Cohen Institute for Language, Cognition and Computation University of Edinburgh Lecture 5 Solving an NLP Problem When modelling a new problem in NLP, need to
More informationFeasibility Pump Heuristics for Column Generation Approaches
1 / 29 Feasibility Pump Heuristics for Column Generation Approaches Ruslan Sadykov 2 Pierre Pesneau 1,2 Francois Vanderbeck 1,2 1 University Bordeaux I 2 INRIA Bordeaux Sud-Ouest SEA 2012 Bordeaux, France,
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationComputing Optimal Alignments for the IBM-3 Translation Model
Computing Optimal Alignments for the IBM-3 Translation Model Thomas Schoenemann Centre for Mathematical Sciences Lund University, Sweden Abstract Prior work on training the IBM-3 translation model is based
More informationAdapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization
Adapting n-gram Maximum Entropy Language Models with Conditional Entropy Regularization Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur Human Language Technology Center of Excellence Center for Language
More informationAlgorithms for NLP. Classification II. Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley
Algorithms for NLP Classification II Taylor Berg-Kirkpatrick CMU Slides: Dan Klein UC Berkeley Minimize Training Error? A loss function declares how costly each mistake is E.g. 0 loss for correct label,
More information