Triplet Lexicon Models for Statistical Machine Translation
1 Triplet Lexicon Models for Statistical Machine Translation
Saša Hasan, Juri Ganitkevitch, Hermann Ney and Jesús Andrés Ferrer
CLSP Student Seminar, February 6, 2009
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Germany
Ganitkevitch, Triplet Lexicon Models for SMT, 1 / 18, February 6, 2009

2 Introduction - Models in SMT
Lexical translation models, e.g. IBM model 1 (Brown et al.; 93):
$$p(f_1^J \mid e_1^I) = \frac{1}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} p_{ibm1}(f_j \mid e_i)$$
Phrase-based translation (Zens, Och; 02), using a segmentation $S = (\tilde{f}_1^K, \tilde{e}_1^K, \tilde{a}_1^K)$:
$$Pr(f_1^J \mid e_1^I) = \sum_{S} Pr(S \mid e_1^I) \cdot Pr(\tilde{f}_1^K \mid \tilde{e}_1^K, \tilde{a}_1^K)$$
[Figure: word alignment between "wenn ich eine Uhrzeit vorschlagen darf ?" and "if I may suggest a time of day ?"]
Language modeling: n-gram models
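
The IBM model 1 formula on this slide can be evaluated directly. The following is a minimal sketch, assuming the lexicon is stored as a dictionary keyed by (f, e); the `NULL` token for the empty word e_0, the function name and the toy probabilities are illustrative choices, not part of the slides.

```python
from typing import Dict, Sequence, Tuple

NULL = "<NULL>"  # hypothetical token standing in for the empty word e_0


def ibm1_prob(f: Sequence[str], e: Sequence[str],
              lex: Dict[Tuple[str, str], float]) -> float:
    """p(f_1^J | e_1^I) = (I+1)^-J * prod_j sum_{i=0..I} p(f_j | e_i)."""
    targets = [NULL] + list(e)  # position 0 is the empty word
    prob = 1.0
    for fj in f:
        # Inner sum over all target positions, including the empty word.
        prob *= sum(lex.get((fj, ei), 0.0) for ei in targets)
    return prob / (len(e) + 1) ** len(f)


# Toy lexicon; the values are made up for illustration only.
toy_lex = {("factura", "bill"): 0.19, ("ley", "bill"): 0.18,
           ("la", "the"): 0.5, ("la", NULL): 0.1}
print(ibm1_prob(["la", "factura"], ["the", "bill"], toy_lex))
```

Unseen (f, e) pairs simply contribute zero here; a real system would need smoothing or a probability floor.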

3 Motivation & Related Work
Most current approaches feature local context models
[Figure: "... the bill ..." / "el proyecto de ley"; "... pay ..." / "pagar"]
Global context information could help with WSD, paraphrasing, stylistic coherence
Approaches to broaden context:
  IBM model 1
  WSD features for phrases (Carpuat, Wu; 07)
  WSD features for hierarchical rules (Chan, Ng, Chiang; 07)

4 Triplet Model
Extension of IBM model 1 by a second trigger
  Capture word splits (e.g. verb-prefix)
  Train dependencies across phrases
  Lexical WSD with broad context
Maximum likelihood training via EM algorithm
Probability for the unconstrained triplet model $p_{all}$:
$$p_{all}(f_j \mid e_1^I) = \frac{2}{I(I-1)} \sum_{i=1}^{I} \sum_{k=i+1}^{I} \alpha_{all}(f_j \mid e_i, e_k)$$
[Figure: alignment between "dann fange ich einfach mal an" and "I will just start."]
Optional distance constraint: min ≤ |k − i| ≤ max
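
The unconstrained triplet probability is an average of lexicon entries over all target position pairs i < k. A minimal sketch, assuming the lexicon is a dictionary keyed by (f, e_i, e_k); this key layout, the function name and the toy values are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Triplet = Tuple[str, str, str]  # (f, e, e'): key layout chosen for this sketch


def p_all(fj: str, e: Sequence[str], alpha: Dict[Triplet, float]) -> float:
    """2 / (I (I-1)) * sum_{i<k} alpha(f_j | e_i, e_k)."""
    # combinations() yields word pairs in positional order, i.e. i < k.
    pairs = list(combinations(e, 2))
    if not pairs:  # fewer than two target words: no trigger pair exists
        return 0.0
    total = sum(alpha.get((fj, ei, ek), 0.0) for ei, ek in pairs)
    return total / len(pairs)  # len(pairs) == I (I-1) / 2


# Toy lexicon; the values are made up for illustration only.
toy_alpha = {("pagar", "taxpayer", "bill"): 0.5,
             ("pagar", "pays", "bill"): 0.7}
print(p_all("pagar", ["taxpayer", "pays", "bill"], toy_alpha))
```

Dividing by the number of pairs is the same as multiplying by 2 / (I(I-1)), since there are I(I-1)/2 pairs with i < k.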

5 Trigger Space
For every $f_j$, sum over all $(e, e')$ in the trigger space $TS(j)$:
$$TS(j) = \{ (e_i, e_k) \mid i, k \in \{1, \ldots, I\},\ i \neq k \}$$
[Figure: trigger space over $e_1^I \times e_1^I$, axes $e$ and $e'$]

6 Trigger Space Comparison
Unconstrained triplet model $p_{all}(f \mid e, e')$:
  Context analogous to IBM1
  Large lexica (about 4.6G triplets on EPPS)
  Long training times (about 9h per iteration on EPPS)
Unconstrained triplet model with maximum distance max:
  Restricts $e'$ to local words around $e$
  Lexicon size reduced (about 2.2G for max = 10)
  Significantly shorter training time (about 3h on EPPS)
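
To see why the maximum distance constraint shrinks the trigger space, one can count the position pairs that survive it. A small sketch, assuming unordered pairs i < k as in the double sum of the unconstrained model; the function name and the sentence length of 30 are arbitrary illustrative choices.

```python
from typing import List, Optional, Tuple


def trigger_pairs(I: int, max_dist: Optional[int] = None) -> List[Tuple[int, int]]:
    """Position pairs (i, k) with 1 <= i < k <= I and, optionally, k - i <= max_dist."""
    return [(i, k)
            for i in range(1, I + 1)
            for k in range(i + 1, I + 1)
            if max_dist is None or k - i <= max_dist]


# Number of trigger pairs for a 30-word target sentence under different limits.
for limit in (None, 10, 5):
    print(limit, len(trigger_pairs(30, limit)))
```

Without a limit the count grows quadratically in I, while with a fixed limit it grows roughly linearly, which is consistent with the smaller lexica and shorter training times reported on the slide.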

7 Trigger Space Comparison
Phrase-bounded triplet model $p_{phr}(f \mid e, e')$:
  Distinguishes local and distant context via forced alignments
  Significantly smaller lexica (about 1.3G on EPPS)
  Shorter training times (about 2.5h on EPPS)
Path-aligned triplet model $p_{align}(f \mid e, e')$:
  Fixes $e$ to pre-aligned words
  Small lexicon size (about 260M on EPPS)
  Short training times (about 1h on EPPS)

8 Examples - Lexicon Entries
EPPS, TC-Star 2007, English-Spanish, p_all & p_align

model     e      e'            f               p(f | e, e')
p_all     bill   countries     paguen          0.52
                               agrario         0.36
                               países          0.08
p_all     bill   taxpayer      pagar           0.50
                               factura         0.30
                               contribuyente   0.16
p_align   bill   taxpayer      factura         1.00
                 draft         ley             0.96
                 agriculture   factura         0.60
                               ley             0.39

9 Corpora & Rescoring
Models applied in n-best rescoring framework
Corpora and n-best lists used in experiments:

Task           Lang.   Training sent.   n-best lists
IWSLT 2007     ChEn    43k              10k-best, PBT
IWSLT 2008     ChEn    38k + 120k       10k-best, SysComb
TC-Star 2007   EnEs    1.3M             10k-best, PBT

TER columns (1-best, Oracle): values missing
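
As a rough illustration of n-best rescoring, the sketch below adds a weighted lexicon score to each hypothesis' baseline score and returns the best hypothesis. The single interpolation weight, the probability floor and all names are assumptions; the slides do not describe how the feature weights were combined or tuned.

```python
import math
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[List[str], float]  # (target words, baseline log-score)


def rescore(source: Sequence[str], nbest: Sequence[Hypothesis],
            word_model: Callable[[str, Sequence[str]], float],
            weight: float = 1.0, floor: float = 1e-10) -> List[str]:
    """Return the hypothesis maximising baseline + weight * sum_j log p(f_j | e)."""
    def total(hyp: Hypothesis) -> float:
        words, base = hyp
        # Floor the probabilities so that log() stays defined for unseen events.
        lex = sum(math.log(max(word_model(fj, words), floor)) for fj in source)
        return base + weight * lex
    return max(nbest, key=total)[0]


# Hypothetical word model: prefers hypotheses that contain the word "hotel".
toy_model = lambda fj, e: 0.9 if "hotel" in e else 0.1
nbest = [(["Hyde", "Park"], -1.0), (["a", "hotel"], -1.2)]
print(rescore(["酒店"], nbest, toy_model))
```

Any function with the signature `word_model(f_j, e)` can be plugged in, for example a triplet model in the source-given-target (fe) direction.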

10 Training Iterations - IWSLT 2007 ChEn
Perplexity of training and development sets, $p_{all}$, fe, $e_0$, occ 1
IWSLT 2007 Chinese-English, test05 used as development set
[Figure: train PPL, dev PPL and dev TER over EM iterations]

11 Histogram Pruning - IWSLT 2007 ChEn
Histogram pruning & coverage: $p_{all}$, fe
[Figure: lexicon size, coverage (% of events covered) and dev TER over the occurrence threshold]

12 Effect of Maximum Distance Constraint - EPPS EnEs
TC-Star 2007, English-Spanish, PBT lists, oracle TER
$p_{align}$, ef + fe, 10 EM iterations, using IBM4 word alignments
[Table: BLEU and TER on dev07, test06 and test07 for the baseline and for p_align (occ) with three settings of max; values missing]

13 Variant Comparison - EPPS EnEs
TC-Star 2007, English-Spanish, PBT lists, oracle TER
All ef + fe, 10 EM iterations

Model                    Time
baseline
IBM1 fe
p_align, max = 5, occ    8.4h
p_phr, max = , occ       24h
p_all, max = 10, occ     28h

BLEU and TER columns (dev07, test06, test07) and Memory column: values missing

14 Evaluation Results - IWSLT 2008 ChEn
IWSLT 2008, Chinese-English, BTEC CRR
System combination lists, oracle TER 20.13, optimized on test05
Using $p_{all}$, ef + fe, no $e_0$, occ 2, 20 EM iterations, trained on additional HIT data
[Table: BLEU and TER on test05, BLEU and WER on test08; rows: baseline, IBM1, p_all, NGram + WP + LM, IBM1, p_all; values missing]

15 Examples - Translation Improvements
IWSLT 2008, Chinese-English, BTEC CRR
  source:    我要靠近海德公园的酒店
  reference: I would like a hotel near the Hyde Park.
  baseline:  I would like Hyde Park Hotel.
  p_all:     I would like a hotel close to Hyde Park.
GALE 2008, Chinese-English, Newswire
  source:    中国队下半场也换了 3 名球员, 效果则不佳
  reference: china also substituted three players in the second half...
  baseline:  the chinese team in the second half, have been changed for three...
  p_all:     the chinese team also replaced three players in the second half...

16 Summary
Introduced triplet model $p_{all}(f \mid e, e')$
Large lexicon sizes:
  Histogram pruning
  Constrained models $p_{phr}$ and $p_{align}$
Long training times:
  Maximum distance constraint max
  EM training converges quickly
Triplet model is competitive in rescoring on a wide range of tasks
  Slight improvements over IBM1 (BLEU)

17 Outlook
Incorporation into the decoder
Explicitly restrict trigger positions: $i < k$ for $(f, e_i, e_k)$
Train triplet model on word classes to reduce lexicon size: $p_{all}(f \mid e, c_{e'})$
Introduce a trigger distance prior: $p_{all}(f \mid e, e') \cdot d(f, e, e')$

18 Thank you for your attention
Juri Ganitkevitch

19 Literature
[Dempster, Laird; 77] A. P. Dempster, N. M. Laird, and D. B. Rubin: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, pp. 1-22, 1977.
[Brown, Mercer; 93] Brown, Della Pietra, Della Pietra, Mercer (IBM Research): The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, Vol. 19, No. 2, 1993.
[Rosenfeld; 96] Rosenfeld (CMU): A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language, Vol. 10, 1996.
[Tillmann, Ney; 97] Tillmann, Ney (RWTH Aachen): Word Triggers and the EM Algorithm. Proc. of SIG Workshop on Computational Natural Language Learning (ACL), 1997.
[Zens, Och; 02] Zens, Och, Ney (RWTH Aachen): Phrase-Based Statistical Machine Translation. German Conf. on Artificial Intelligence, September 2002.
[Och, Ney; 04] Och, Ney (RWTH Aachen): The alignment template approach to statistical machine translation. Computational Linguistics, Vol. 30, No. 4, 2004.
[Carpuat, Wu; 07] Carpuat, Wu (HKU): Improving Statistical Machine Translation using Word Sense Disambiguation. Proc. of 2007 Joint Conference on Empirical Methods in NLP and CoNLL, 2007.
[Chan, Ng, Chiang; 07] Chan, Ng, Chiang (University of Singapore, ISI): Word Sense Disambiguation improves Statistical Machine Translation. Proc. of the 45th Annual Meeting of the ACL, 2007.
[Hasan et al.; 08] S. Hasan, J. Ganitkevitch, H. Ney and J. Andrés Ferrer (RWTH Aachen): Triplet Lexicon Models for Statistical Machine Translation. Proc. of the Conference on Empirical Methods in Natural Language Processing, 2008.

21 Triplet Model Equations
Unconstrained triplet model:
$$p_{all}(f_j \mid e_1^I) = \frac{2}{I(I-1)} \sum_{i=1}^{I} \sum_{k=i+1}^{I} \alpha_{all}(f_j \mid e_i, e_k)$$
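
Because the unconstrained model is a uniform mixture over trigger pairs, one EM iteration can be written in the same way as for IBM model 1. The sketch below is a textbook-style derivation under that assumption, not necessarily the exact training procedure used for the experiments; the helper names, the (f, e, e') key layout and the unnormalised initialisation are illustrative choices.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[str, str, str]                  # (f, e, e')
Corpus = Sequence[Tuple[List[str], List[str]]]  # (source words, target words)


def init_alpha(corpus: Corpus) -> Dict[Triplet, float]:
    """Start with equal (unnormalised) weight for every co-occurring triplet."""
    return {(f, e, e2): 1.0
            for f_sent, e_sent in corpus
            for f in f_sent
            for e, e2 in combinations(e_sent, 2)}


def em_step(corpus: Corpus, alpha: Dict[Triplet, float]) -> Dict[Triplet, float]:
    counts: Dict[Triplet, float] = defaultdict(float)
    for f_sent, e_sent in corpus:
        pairs = list(combinations(e_sent, 2))  # (e_i, e_k) with i < k
        for fj in f_sent:
            scores = [alpha.get((fj, e, e2), 0.0) for e, e2 in pairs]
            z = sum(scores)
            if z == 0.0:
                continue
            # E-step: posterior probability that pair (e_i, e_k) triggered f_j.
            for (e, e2), s in zip(pairs, scores):
                counts[(fj, e, e2)] += s / z
    # M-step: renormalise the expected counts over f for every trigger pair.
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for (f, e, e2), c in counts.items():
        totals[(e, e2)] += c
    return {(f, e, e2): c / totals[(e, e2)]
            for (f, e, e2), c in counts.items() if c > 0.0}


toy_corpus = [(["a"], ["x", "y"]), (["a", "b"], ["x", "y"])]
alpha = em_step(toy_corpus, init_alpha(toy_corpus))
print(alpha)
```

After each step the entries for a fixed trigger pair (e, e') sum to one over f, so the table can be used directly in the equation above.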

22 Triplet Model Equations
For a given sentence pair and word alignment $(f_1^J, e_1^I, A)$:
Let $a_{ij} = 1 \Leftrightarrow f_j$ aligned to $e_i$, otherwise $a_{ij} = 0$; $A = \{a_{ij}\}$
Path-aligned triplet model:
$$p_{align}(f_j \mid e_1^I, A) = \frac{1}{Z_j} \sum_{i=1}^{I} \sum_{i'=1}^{I} a_{ij} \, (1 - \delta(i, i')) \, \alpha_{align}(f_j \mid e_i, e_{i'})$$
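
A sketch of the path-aligned sum: the first trigger is restricted to target words aligned to f_j, the second ranges over all other positions. The slide does not define Z_j; this sketch assumes it is the number of active (i, i') terms. The representation of the alignment as a set of (i, j) links with 0-based positions is also an assumption.

```python
from typing import Dict, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]  # (f, e, e')


def p_align(j: int, f: Sequence[str], e: Sequence[str],
            links: Set[Tuple[int, int]], alpha: Dict[Triplet, float]) -> float:
    """links holds (i, j) for every a_ij = 1, with 0-based positions."""
    # a_ij selects aligned positions i; (1 - delta(i, i')) excludes i' == i.
    terms = [(i, i2)
             for i in range(len(e)) if (i, j) in links
             for i2 in range(len(e)) if i2 != i]
    if not terms:
        return 0.0
    total = sum(alpha.get((f[j], e[i], e[i2]), 0.0) for i, i2 in terms)
    return total / len(terms)  # assumed normalisation Z_j = number of terms


# Toy data; the lexicon value is made up for illustration only.
toy_alpha = {("factura", "bill", "taxpayer"): 1.0}
print(p_align(0, ["factura"], ["the", "taxpayer", "bill"], {(2, 0)}, toy_alpha))
```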

23 Triplet Model Equations
For a given sentence pair and segmentation $(f_1^J, e_1^I, s_1^M)$, $s_m = (\tilde{f}_m, \tilde{e}_m)$:
Let $\pi_{ij} = 1 \Leftrightarrow \exists m : f_j \in \tilde{f}_m \wedge e_i \in \tilde{e}_m$, otherwise $\pi_{ij} = 0$; $\Pi = \{\pi_{ij}\}$
Phrase-bounded triplet model:
$$p_{phr}(f_j \mid e_1^I, \Pi) = \frac{1}{Z_j} \sum_{i=1}^{I} \sum_{i'=1}^{I} \pi_{ij} \, (1 - \pi_{i'j}) \, \alpha_{phr}(f_j \mid e_i, e_{i'})$$
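
A sketch of the phrase-bounded sum: the first trigger comes from inside the phrase pair covering f_j, the second from outside it. As with the path-aligned model, Z_j is assumed here to be the number of active terms, and the segmentation format (a list of source/target position sets, 0-based) is an illustrative choice.

```python
from typing import Dict, Sequence, Set, Tuple

Triplet = Tuple[str, str, str]           # (f, e, e')
PhrasePair = Tuple[Set[int], Set[int]]   # (source positions, target positions)


def p_phr(j: int, f: Sequence[str], e: Sequence[str],
          phrases: Sequence[PhrasePair], alpha: Dict[Triplet, float]) -> float:
    # pi_ij = 1 for target positions sharing a phrase pair with source position j.
    inside: Set[int] = set()
    for src_pos, tgt_pos in phrases:
        if j in src_pos:
            inside |= set(tgt_pos)
    outside = [i for i in range(len(e)) if i not in inside]
    # pi_ij * (1 - pi_i'j): first trigger inside the phrase, second outside.
    terms = [(i, i2) for i in sorted(inside) for i2 in outside]
    if not terms:
        return 0.0
    total = sum(alpha.get((f[j], e[i], e[i2]), 0.0) for i, i2 in terms)
    return total / len(terms)  # assumed normalisation Z_j = number of terms


# Toy data; the lexicon value is made up for illustration only.
toy_alpha = {("pagar", "bill", "taxpayer"): 0.5}
print(p_phr(0, ["pagar"], ["taxpayer", "pays", "bill"], [({0}, {1, 2})], toy_alpha))
```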

24 Evaluation Results - IWSLT 2008 ArEn
IWSLT 2008, Arabic-English, BTEC CRR
Using $p_{all}$, ef + fe, no $e_0$, occ 2, 20 EM iterations
23k training sentences, only LM uses additional data
[Table: BLEU, TER and WER on test05, BLEU and WER on test08; rows: baseline, IBM1 fe, p_all, p_all + 6-gram LM + WP; values missing]

25 Examples - Lexicon Entries
EPPS, TC-Star 2007, English-Spanish, IBM model 1

model   e      f          p(f | e)
IBM1    bill   factura    0.19
               ley        0.18
               proyecto   0.11
               pagar      0.07
More informationStatistical Machine Translation and Automatic Speech Recognition under Uncertainty
Statistical Machine Translation and Automatic Speech Recognition under Uncertainty Lambert Mathias A dissertation submitted to the Johns Hopkins University in conformity with the requirements for the degree
More informationUsing a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs
Using a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs Yasuhiro Akiba,, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, and Hiroshi G. Okuno ATR
More informationExploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation
Journal of Machine Learning Research 12 (2011) 1-30 Submitted 5/10; Revised 11/10; Published 1/11 Exploitation of Machine Learning Techniques in Modelling Phrase Movements for Machine Translation Yizhao
More informationPAPER Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation
1536 IEICE TRANS. INF. & SYST., VOL.E96 D, NO.7 JULY 2013 PAPER Bayesian Word Alignment and Phrase Table Training for Statistical Machine Translation Zezhong LI a, Member, Hideto IKEDA, Nonmember, and
More informationSampling Alignment Structure under a Bayesian Translation Model
Sampling Alignment Structure under a Bayesian Translation Model John DeNero, Alexandre Bouchard-Côté and Dan Klein Computer Science Department University of California, Berkeley {denero, bouchard, klein}@cs.berkeley.edu
More informationSpeech Recognition Lecture 7: Maximum Entropy Models. Mehryar Mohri Courant Institute and Google Research
Speech Recognition Lecture 7: Maximum Entropy Models Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.com This Lecture Information theory basics Maximum entropy models Duality theorem
More informationThe Noisy Channel Model. CS 294-5: Statistical Natural Language Processing. Speech Recognition Architecture. Digitizing Speech
CS 294-5: Statistical Natural Language Processing The Noisy Channel Model Speech Recognition II Lecture 21: 11/29/05 Search through space of all possible sentences. Pick the one that is most probable given
More informationLanguage Modeling. Hung-yi Lee 李宏毅
Language Modeling Hung-yi Lee 李宏毅 Language modeling Language model: Estimated the probability of word sequence Word sequence: w 1, w 2, w 3,., w n P(w 1, w 2, w 3,., w n ) Application: speech recognition
More informationMachine Translation. CL1: Jordan Boyd-Graber. University of Maryland. November 11, 2013
Machine Translation CL1: Jordan Boyd-Graber University of Maryland November 11, 2013 Adapted from material by Philipp Koehn CL1: Jordan Boyd-Graber (UMD) Machine Translation November 11, 2013 1 / 48 Roadmap
More informationMachine Learning for natural language processing
Machine Learning for natural language processing Classification: Maximum Entropy Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 24 Introduction Classification = supervised
More informationMIA - Master on Artificial Intelligence
MIA - Master on Artificial Intelligence 1 Introduction Unsupervised & semi-supervised approaches Supervised Algorithms Maximum Likelihood Estimation Maximum Entropy Modeling Introduction 1 Introduction
More informationSequences and Information
Sequences and Information Rahul Siddharthan The Institute of Mathematical Sciences, Chennai, India http://www.imsc.res.in/ rsidd/ Facets 16, 04/07/2016 This box says something By looking at the symbols
More informationORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation
ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way
More informationThe Noisy Channel Model. Statistical NLP Spring Mel Freq. Cepstral Coefficients. Frame Extraction ... Lecture 10: Acoustic Models
Statistical NLP Spring 2009 The Noisy Channel Model Lecture 10: Acoustic Models Dan Klein UC Berkeley Search through space of all possible sentences. Pick the one that is most probable given the waveform.
More informationStatistical NLP Spring The Noisy Channel Model
Statistical NLP Spring 2009 Lecture 10: Acoustic Models Dan Klein UC Berkeley The Noisy Channel Model Search through space of all possible sentences. Pick the one that is most probable given the waveform.
More informationThis kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.
Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one
More information