Minimum Error Rate Training Semiring
1 Minimum Error Rate Training Semiring
Artem Sokolov & François Yvon
LIMSI-CNRS & LIMSI-CNRS/Univ. Paris Sud
EAMT, May 2011
2 Talk Plan
1. Introduction: phrase-based statistical machine translation; Minimum Error Rate Training; contribution
2. Semirings: Lattice MERT; MERT semiring
3. Implementation
4. Experiments: setup; results
5. Future work
3 Introduction: Phrase-based statistical machine translation
Probability model and inference in an SMT system. Probability of translation e given source sentence f:
p(e | f) = Z(f)⁻¹ exp(λ · h(e, f))
- h(e, f): feature vector (various compatibility measures of e and f)
- λ: parameter vector; λᵢ regulates the importance of feature hᵢ(e, f)
Translating by MAP inference:
ẽ_f(λ) = argmax_{e ∈ E} p(e | f) = argmax_{e ∈ E} λ · h(e, f)
E is the set of reachable translations (the search space); it can be approximated by a list of n-best hypotheses or by a word lattice.
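As a small illustration of MAP inference over an n-best approximation of E (the hypotheses and feature values below are made up, not from the paper), translation reduces to a dot-product argmax:

```python
# MAP inference over a hypothetical n-best list: each hypothesis carries a
# feature vector h(e, f); we pick the e maximizing the linear score lambda.h(e, f).

def map_inference(nbest, lam):
    """Return the hypothesis e with the highest score lambda . h(e, f)."""
    def score(entry):
        _, h = entry
        return sum(li * hi for li, hi in zip(lam, h))
    best_hyp, _ = max(nbest, key=score)
    return best_hyp

nbest = [
    ("venus is the hellish twin of earth",  [1.2, -0.5, 3.0]),
    ("venus is earth s hellish twin",       [1.5, -0.2, 2.8]),
    ("venus is the infernal twin of earth", [0.9, -0.7, 3.1]),
]
lam = [0.5, 1.0, 0.3]
print(map_inference(nbest, lam))  # -> venus is earth s hellish twin
```

With a lattice approximation the same argmax is taken over all paths instead of an explicit list, which is what the rest of the talk builds towards.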
4 Introduction: Tuning an SMT system with MERT
Minimum Error Rate Training. Given: a development set {(f, r_f)} of source sentences f paired with references r_f. Solve:
λ* = argmax_λ BLEU({ẽ_f(λ, E(λ)), r_f})
BLEU is non-convex and not differentiable, hence heuristics (MERT). The search-space approximation depends on λ, so tuning is iterative: under the current configuration λ_t, the decoder produces an approximation of E; the MERT updater and scorer then turn the decoder's features into an updated λ_t.
5 Introduction: Minimum Error Rate Training
MERT proceeds in a series of optimizations along directions r, with λ = λ₀ + γ·r. Optimal translation:
ẽ_f(γ) = argmax_{e ∈ E} λ · h(e, f) = argmax_{e ∈ E} [ λ₀ · h(e, f) (intercept) + γ · (r · h(e, f)) (slope) ]
Each translation hypothesis is thus associated with a line in γ; the upper envelope consists of the dominating lines as λ is moved along r.
6 Introduction: Minimum Error Rate Training (continued)
The γ-projections of the envelope intersections give intervals of constant optimal hypothesis. The optimal γ is found by merging the intervals for all f ∈ F and scoring each. Update: λ = λ₀ + γᵢ·rᵢ, where i is the index of the direction yielding the highest BLEU.
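The two slides above can be sketched in a few lines of Python (a minimal sketch with hypothetical names; the real algorithm works over lattices, not an explicit list of lines): compute the upper envelope of the hypothesis lines, then read off the γ-interval on which each hypothesis dominates.

```python
def upper_envelope(lines):
    """lines: list of (slope, intercept, hyp_id), one line per hypothesis.
    Returns [(start_gamma, line), ...]: the dominating line on each
    gamma-interval [start_gamma, next_start_gamma)."""
    # Sort by slope; for equal slopes only the highest intercept can dominate.
    dedup = {}
    for s, y, hid in sorted(lines, key=lambda l: (l[0], l[1])):
        dedup[s] = (s, y, hid)
    ordered = [dedup[s] for s in sorted(dedup)]
    env = []  # [(start_gamma, (slope, intercept, hyp_id))]
    for line in ordered:
        s, y, _ = line
        while env:
            start0, (s0, y0, _) = env[-1]
            x = (y0 - y) / (s - s0)   # intersection with last envelope line
            if x <= start0:
                env.pop()             # last line is dominated everywhere
            else:
                env.append((x, line))
                break
        if not env:
            env.append((float("-inf"), line))
    return env

# Three hypothetical hypothesis lines (slope = r.h, intercept = lambda0.h):
hyps = [(-1.0, 4.0, "e1"), (2.0, 0.0, "e3"), (0.0, 2.5, "e2")]
for start, (_, _, hid) in upper_envelope(hyps):
    print(hid, "optimal from gamma =", start)
# e1 dominates on (-inf, 4/3), e3 from 4/3 on; e2 never reaches the envelope.
```

In full MERT these interval boundaries are merged across all development sentences, each interval is BLEU-scored, and the γ of the best interval gives the update.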
7 Introduction: Minimum Error Rate Training — problems
MERT problems:
- very slow, because of the overall number of iterations (folklore: the number of iterations scales with the number of dimensions) and the slowness of each iteration (dominated by decoding time)
- non-monotonicity/instability of the training process
- sensitivity of the resulting solutions to initial conditions
Ways to tackle the problems:
- improve optimization: other target-function approximations; changes to the optimization algorithm
- improve search-space processing (this presentation): use lattices (a better approximation of the complete search space); reduce the search to standard operations (facilitates implementation)
- reduce the number of iterations (this presentation)
8 Introduction: Contribution
- Recast the lattice MERT algorithm of [Macherey et al., 2008] in a semiring framework. This has already been hinted at in [Dyer et al., 2010], but was never formally described, and implementation details are lacking.
- Reimplement MERT using this reformulation and the general-purpose FST toolbox OpenFst.
9 Semirings
A semiring K = ⟨K, ⊕, ⊗, 0̄, 1̄⟩ satisfies:
- ⟨K, ⊕, 0̄⟩ is a commutative monoid with identity element 0̄: a ⊕ (b ⊕ c) = (a ⊕ b) ⊕ c, a ⊕ b = b ⊕ a, a ⊕ 0̄ = 0̄ ⊕ a = a
- ⟨K, ⊗, 1̄⟩ is a monoid with identity element 1̄
- ⊗ distributes over ⊕: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c) and (b ⊕ c) ⊗ a = (b ⊗ a) ⊕ (c ⊗ a)
- the element 0̄ annihilates K: a ⊗ 0̄ = 0̄ ⊗ a = 0̄
Examples: ⟨R, +, ×, 0, 1⟩, the real semiring; ⟨S, ∪, ∩, ∅, ∪ᵢ Sᵢ⟩, a semiring of sets.
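A minimal sketch of the semiring interface (all names are mine, not from the talk), instantiated with the real semiring and the tropical semiring, with a spot-check of the distributivity axiom on sample values:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Semiring:
    plus: Callable[[Any, Any], Any]   # the circled-plus operation
    times: Callable[[Any, Any], Any]  # the circled-times operation
    zero: Any                         # identity of plus, annihilator of times
    one: Any                          # identity of times

# The real semiring <R, +, *, 0, 1>.
real = Semiring(plus=lambda a, b: a + b, times=lambda a, b: a * b,
                zero=0.0, one=1.0)

# The tropical semiring <R u {inf}, min, +, inf, 0>, standard for shortest paths.
trop = Semiring(plus=min, times=lambda a, b: a + b,
                zero=float("inf"), one=0.0)

def check_distributivity(K, a, b, c):
    """Spot-check a (x) (b (+) c) == (a (x) b) (+) (a (x) c)."""
    return K.times(a, K.plus(b, c)) == K.plus(K.times(a, b), K.times(a, c))

print(check_distributivity(real, 2.0, 3.0, 4.0))  # True
print(check_distributivity(trop, 2.0, 3.0, 4.0))  # True
```

Any algorithm written against this interface, such as a generic shortest-distance computation, works unchanged for every semiring, which is exactly what the MERT semiring exploits.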
10 Semirings: Lattice MERT [Macherey et al., 2008]
Example: source (fr): "Vénus est la jumelle infernale de la Terre"; target (en): "Venus is Earth's hellish twin".
[The slide shows a word lattice over this sentence pair; each arc carries a phrase pair (e.g. Vénus:Venus, infernale:hellish, infernale:infernal, de_la_Terre:of_the_Earth) together with its local feature vector h_ij.]
- Decomposability: h(e, f) decomposes into a sum of local features h₀₁, h₀₂, …
- Envelopes are distributed over nodes in the lattice.
11 Semirings: MERT Semiring
Host set D = ⟨D, ⊕, ⊗, 0̄, 1̄⟩, built from:
- a line d_y + d_s·x (a hypothesis)
- a set of lines d = {d_{i,y} + d_{i,s}·x} (a set of hypotheses)
- D, the set of such sets of lines
Operations ⊕ and ⊗, for d¹, d² ∈ D:
- d¹ ⊕ d² = env(d¹ ∪ d²)
- d¹ ⊗ d² = env({(d¹ᵢ.y + d²ⱼ.y) + (d¹ᵢ.s + d²ⱼ.s)·x | d¹ᵢ ∈ d¹, d²ⱼ ∈ d²})
Identities: 0̄ = ∅ and 1̄ = {0 + 0·x}.
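The two operations can be sketched in plain Python (a sketch under my own representation, not the paper's code: a weight is a list of (slope, intercept) pairs, pruned to its upper envelope):

```python
def env(lines):
    """Upper envelope: keep only the (slope, intercept) lines that dominate
    on some gamma-interval; result is sorted by slope."""
    best = {}
    for s, y in lines:                 # equal slopes: highest intercept wins
        if s not in best or y > best[s]:
            best[s] = y
    out, starts = [], []
    for s, y in sorted(best.items()):
        while out:
            s0, y0 = out[-1]
            x = (y0 - y) / (s - s0)    # intersection with last envelope line
            if x <= starts[-1]:
                out.pop(); starts.pop()   # last line is dominated everywhere
            else:
                out.append((s, y)); starts.append(x)
                break
        if not out:
            out.append((s, y)); starts.append(float("-inf"))
    return out

def oplus(d1, d2):
    # (+): union of the two hypothesis sets, pruned to the envelope.
    return env(d1 + d2)

def otimes(d1, d2):
    # (x): Minkowski sum, combining every pair of lines (path concatenation).
    return env([(s1 + s2, y1 + y2) for s1, y1 in d1 for s2, y2 in d2])

ZERO = []            # the empty set of hypotheses
ONE = [(0.0, 0.0)]   # the single line 0 + 0*x

d1 = [(1.0, 0.0), (-1.0, 2.0)]
d2 = [(0.0, 1.0)]
print(otimes(d1, d2))  # every line of d1 shifted up by intercept 1
```

The identity laws fall out directly: oplus(d, ZERO) leaves the envelope of d unchanged, and otimes(d, ONE) adds slope 0 and intercept 0 to every line.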
12 Semirings: MERT Semiring — operations illustration
⊕-example: d¹ ⊕ d² = env(d¹ ∪ d²)
⊗-example: d¹ ⊗ d² = env({(d¹ᵢ.y + d²ⱼ.y) + (d¹ᵢ.s + d²ⱼ.s)·x | d¹ᵢ ∈ d¹, d²ⱼ ∈ d²})
[The slide illustrates both operations graphically on small sets of lines.]
13 Semirings: Shortest Paths for the MERT Semiring
Each arc a in the FST carries: a target word; a vector h(a, f) of local features; and the associated singleton set containing the line d with slope d_s = r · h(a, f) and y-intercept d_y = λ₀ · h(a, f).
Weight of a candidate translation path e = e₁ … e_l:
w(e) = ⊗ᵢ₌₁ˡ w(eᵢ) = { λ₀ · Σᵢ₌₁ˡ h(eᵢ, f) + ( r · Σᵢ₌₁ˡ h(eᵢ, f) ) · x }
Upper envelope of all the lines (hypotheses):
env(∪ₑ w(e)) = ⊕ₑ w(e) = ⊕ₑ ⊗ᵢ₌₁^{l(e)} w(eᵢ)
Generic shortest-distance algorithms over acyclic graphs calculate this.
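The generic single-source shortest-distance recursion over an acyclic lattice, parameterized by the two semiring operations, can be sketched as follows (toy graph and names are mine). The demo uses the tropical semiring so the result is easy to check; plugging in an env-of-union ⊕ and a Minkowski-sum ⊗, with per-arc singleton line weights, yields the envelope computation described above instead.

```python
def shortest_distance(num_nodes, arcs, oplus, otimes, zero, one, topo_order):
    """Generic single-source shortest distance over an acyclic graph.
    arcs: dict src -> list of (dst, weight).
    Returns d with d[v] = oplus over all paths p from node 0 to v
    of the otimes-product of the arc weights along p."""
    d = [zero] * num_nodes
    d[0] = one
    for u in topo_order:               # relax arcs in topological order
        for v, w in arcs.get(u, []):
            d[v] = oplus(d[v], otimes(d[u], w))
    return d

# Demo with the tropical semiring (min, +): plain shortest-path costs.
arcs = {0: [(1, 1.0), (2, 4.0)],
        1: [(2, 1.0), (3, 5.0)],
        2: [(3, 1.0)]}
d = shortest_distance(4, arcs,
                      oplus=min, otimes=lambda a, b: a + b,
                      zero=float("inf"), one=0.0,
                      topo_order=[0, 1, 2, 3])
print(d[3])  # 3.0: the path 0 -> 1 -> 2 -> 3
```

This is why casting MERT as a semiring pays off: the envelope over all lattice paths comes out of a standard, already-optimized algorithm rather than a bespoke one.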
14 Implementation
Basics: the OpenFst toolbox works with any semiring; it provides proven and well-optimized ShortestPath algorithms, plus other useful algorithms (Union, Determinize, etc.).
Lattice minimization:
- take the Union of lattices between decoder runs
- Determinize+Minimize to eliminate duplicate hypotheses won't work: the MERT semiring is not divisible
- circumvent this by performing Union+Determinize over the (min, +) semiring
All directions simultaneously: weights are stored as arrays of envelopes, one per direction.
[The slide shows a plot of BLEU along random directions.] Random restarts help only for the first iteration.
15 Experiments: Setup
Data: NewsCommentary (dev: 2051) & WMT10 (dev: 1026), common test set; French to English.
FST MERT tuning: OpenFst-based multi-threaded implementation; zero restart points; axes plus additional random directions.
Baseline MERT tuning: the MERT implementation included in the MOSES toolkit; 100-best list, 20 restart points; Koehn's coordinate descent (only axis directions).
Decoder: the n-gram phrase-based SMT system N-code¹, 11 features.
¹ Demo on
16 Experiments: Results
[Plots of BLEU versus tuning iteration on the newsco (dev and test) and WMT10 (dev and test) sets, comparing n-best MERT against fst MERT with r = 0 and r = 20 random directions (also r = 50 for WMT10).]
17 Conclusion & Future Work
Conclusion:
- The semiring formalization allows generic FST toolkits to be used for MERT.
- Convergence in fewer iterations.
Future work:
- Better stopping criteria to detect saturation.
- A faster ⊕ should be most helpful for speed-up.
18 Thank you for your attention!
19 Bibliography
Dyer, C., Lopez, A., Ganitkevitch, J., Weese, J., Ture, F., Blunsom, P., Setiawan, H., Eidelman, V., & Resnik, P. (2010). cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proc. of the ACL (pp. 7–12).
Macherey, W., Och, F. J., Thayer, I., & Uszkoreit, J. (2008). Lattice-based minimum error rate training for statistical machine translation. In Proc. of the Conf. on EMNLP (pp. ).
Correlation of From Patterns to Algebra to the Ontario Mathematics Curriculum (Grade 4) Strand: Patterning and Algebra Ontario Curriculum Outcomes By the end of Grade 4, students will: describe, extend,
More informationOptimal Search for Minimum Error Rate Training
Optimal Search for Minimum Error Rate Training Michel Galley Microsoft Research Redmond, WA 98052, USA mgalley@microsoft.com Chris Quirk Microsoft Research Redmond, WA 98052, USA chrisq@microsoft.com Abstract
More informationIBM Model 1 for Machine Translation
IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of
More informationSolving Systems of Linear Equations
Section 2.3 Solving Systems of Linear Equations TERMINOLOGY 2.3 Previously Used: Equivalent Equations Literal Equation Properties of Equations Substitution Principle Prerequisite Terms: Coordinate Axes
More informationAn Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz
More informationA Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions Wei Lu and Hwee Tou Ng National University of Singapore 1/26 The Task (Logical Form) λx 0.state(x 0
More informationDual Decomposition for Natural Language Processing. Decoding complexity
Dual Decomposition for atural Language Processing Alexander M. Rush and Michael Collins Decoding complexity focus: decoding problem for natural language tasks motivation: y = arg max y f (y) richer model
More informationVariational Decoding for Statistical Machine Translation
Variational Decoding for Statistical Machine Translation Zhifei Li, Jason Eisner, and Sanjeev Khudanpur Center for Language and Speech Processing Computer Science Department Johns Hopkins University 1
More informationDistance Preservation - Part 2
Distance Preservation - Part 2 Graph Distances Niko Vuokko October 9th 2007 NLDR Seminar Outline Introduction Geodesic and graph distances From linearity to nonlinearity Isomap Geodesic NLM Curvilinear
More information1 Evaluation of SMT systems: BLEU
1 Evaluation of SMT systems: BLEU Idea: We want to define a repeatable evaluation method that uses: a gold standard of human generated reference translations a numerical translation closeness metric in
More information1.5 F15 O Brien. 1.5: Linear Equations and Inequalities
1.5: Linear Equations and Inequalities I. Basic Terminology A. An equation is a statement that two expressions are equal. B. To solve an equation means to find all of the values of the variable that make
More informationIntroduction to Kleene Algebras
Introduction to Kleene Algebras Riccardo Pucella Basic Notions Seminar December 1, 2005 Introduction to Kleene Algebras p.1 Idempotent Semirings An idempotent semiring is a structure S = (S, +,, 1, 0)
More informationMachine Translation Evaluation
Machine Translation Evaluation Sara Stymne 2017-03-29 Partly based on Philipp Koehn s slides for chapter 8 Why Evaluation? How good is a given machine translation system? Which one is the best system for
More information