Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation


Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation
David Vilar, Daniel Stein, Hermann Ney
IWSLT 2008, Honolulu, Hawaii, 20 October 2008
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Germany

1 Introduction
Hierarchical phrase-based models:
- Generalization of phrase-based models
- Allow for gaps in the phrases
- Integration of reordering in the translation model
This talk:
- Study of the effect of extraction heuristics
- Extension with inclusion of (soft) syntactic features

Outline
1 Introduction
2 Hierarchical Phrases
3 Heuristic Features
4 Syntactical Features
5 Experimental Results
6 Conclusions

2 Hierarchical Phrases
Formalization as a synchronous CFG. Rules of the form X → ⟨γ, α, ∼⟩, where:
- X is a non-terminal
- γ and α are strings of terminals and non-terminals
- ∼ is a one-to-one correspondence between the non-terminals of α and γ
Example:
X → ⟨中 X~0 那个 X~1, It's the X~1 in the X~0⟩
X → ⟨也要 X~0 一些 X~1, like to X~0 some X~1 too⟩
Additionally, glue rules:
S → ⟨S~0 X~1, S~0 X~1⟩
S → ⟨X~0, X~0⟩
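Concretely, a hierarchical rule can be represented as a pair of token sequences with co-indexed gaps. The following is a minimal illustrative sketch (hypothetical code, not the authors' implementation), writing linked non-terminals as "X~i":

```python
# Minimal sketch of hierarchical rules as synchronous CFG productions:
# each rule pairs a source side and a target side, with co-indexed
# non-terminals ("X~0", "X~1") realizing the correspondence ~.

from dataclasses import dataclass


@dataclass
class Rule:
    lhs: str    # left-hand non-terminal, e.g. "X" or "S"
    src: list   # source side: terminals and "X~i" gaps
    tgt: list   # target side: terminals and "X~i" gaps


def substitute(side, fillers):
    """Replace each co-indexed gap "X~i" by the i-th filler token list."""
    out = []
    for tok in side:
        if tok.startswith("X~"):
            out.extend(fillers[int(tok[2:])])
        else:
            out.append(tok)
    return out


# Toy rule in the style of the slides:
rule = Rule("X", ["也要", "X~0", "一些", "X~1"],
                 ["like", "to", "X~0", "some", "X~1", "too"])

print(" ".join(substitute(rule.tgt, [["drink"], ["tea"]])))
# -> like to drink some tea too
```

Filling both gaps with sub-translations illustrates how reordering is handled inside the translation model: the gap positions on the two sides need not match.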

Illustration

[Figure: word alignment matrix between the Italian sentence "ha ordinato un piatto per bambini" and the English sentence "did you order a toddler meal"]

[Figure: the same alignment matrix with the standard (contiguous) phrase pairs marked]

[Figure: an example hierarchical rule extracted from the same alignment, with gaps X~0 and X~1 marked]

3 Heuristic Features
The following features were tested:
- Paste rule: binary feature for rules of the form X → ⟨X~0 α, X~0 β⟩ or X → ⟨α X~0, β X~0⟩
- Hierarchical penalty: binary feature for hierarchical rules
- Number of non-terminals: two binary features indicating whether the rule has one or two non-terminals
- Extended glue rule: added rule of the form X → ⟨X~0 X~1, X~0 X~1⟩
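As an illustration, the binary features above can be computed directly from the two sides of a rule. This is a hypothetical sketch (the feature names and rule encoding are assumptions, not the paper's code), again writing non-terminals as "X~i":

```python
# Sketch of the binary heuristic features, computed on a rule given as
# source/target token lists with "X~i" gaps.

def is_nt(tok):
    return tok.startswith("X~")


def heuristic_features(src, tgt):
    nts = [t for t in src if is_nt(t)]
    # paste rule: a non-terminal at the very start of both sides,
    # or at the very end of both sides
    paste = (is_nt(src[0]) and is_nt(tgt[0])) or (is_nt(src[-1]) and is_nt(tgt[-1]))
    return {
        "paste": float(paste),
        "hierarchical": float(bool(nts)),  # rule contains at least one gap
        "one_nt": float(len(nts) == 1),
        "two_nt": float(len(nts) == 2),
    }


print(heuristic_features(["X~0", "在", "哪里"], ["Where", "is", "X~0"]))
```

For this example the paste feature does not fire, since the gap starts the source side but ends the target side; the hierarchical penalty and the one-non-terminal indicator do.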

4 Syntactical Features
Goal: include linguistic information from a deep syntactic parser. Idea: introduce additional soft syntactic features.
- Computed during the extraction of the phrases
- No additional computational cost during decoding
- Can be applied on both source and target side
- Rules are not filtered out

Valid syntactical phrases

A phrase is valid when a node exists in the parse tree that completely covers all of its positions. To obtain a normalized score, we add up all the counts and divide by the number of occurrences of the phrase pair.

[Figure: parse trees of the English sentence "Where is the public toilet" and the Chinese sentence "洗手间 在 哪里"]

Extracted rule: X~0 在 哪里 # Where is X~0
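The validity check amounts to enumerating the span of every parse-tree node and testing whether some node's yield matches the phrase exactly. A minimal sketch, with a hypothetical nested-tuple tree encoding (not the authors' code):

```python
def spans(tree, start=0):
    """Return (label, begin, end) for every node of a nested-tuple tree.

    A tree is (label, child, ...) where leaves are word strings.
    Spans are half-open word intervals [begin, end).
    """
    label, *children = tree
    pos, out = start, []
    for child in children:
        if isinstance(child, str):  # leaf: one word
            pos += 1
        else:
            sub = spans(child, pos)
            out.extend(sub)
            pos = sub[0][2]         # end of the child's own span
    out.insert(0, (label, start, pos))
    return out


def is_valid_phrase(tree, i, j):
    """True iff some node's yield is exactly words i..j-1."""
    return any(b == i and e == j for _, b, e in spans(tree))


# Toy tree for "Where is the public toilet" (bracketing assumed):
t = ("S", ("WHADVP", ("WRB", "Where")),
          ("VP", ("AUX", "is"),
                 ("NP", ("DT", "the"), ("JJ", "public"), ("NN", "toilet"))))

print(is_valid_phrase(t, 2, 5))  # "the public toilet" -> True
print(is_valid_phrase(t, 1, 3))  # "is the"            -> False
```

"the public toilet" is covered exactly by the NP node, so it is valid; "is the" crosses constituent boundaries and is not.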

Scoring variants

m(i, j) = minimum number of words that have to be deleted from or added to a phrase so that it fits the yield of a node.

[Figure: partial parse tree of "Where is the public toilet"]

Example source phrases:
- "the public toilet": exactly the yield of the NP node
- "public toilet": m(i, j) = 1
- "is the": m(i, j) = 1

Four count ("smoothing") variants:

c(i, j | t) := δ(m(i, j), 0)                  (binary)
c(i, j | t) := 1 / (m(i, j) + 1)              (linear)
c(i, j | t) := 1 / exp(m(i, j))               (exponential)
c(i, j | t) := (j − i) / ((j − i) + m(i, j))  (relative)
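The four variants can be written down directly. A minimal sketch (the function and variant names are assumptions, not the authors' code), where m is m(i, j) and length is the phrase length j − i:

```python
# Sketch of the four count "smoothing" variants for soft syntactic
# scoring: m is the minimum number of words to delete or add so that
# the phrase fits a node's yield; length is the phrase length (j - i).

import math


def count(variant, m, length):
    if variant == "binary":
        return 1.0 if m == 0 else 0.0   # delta(m, 0)
    if variant == "linear":
        return 1.0 / (m + 1)
    if variant == "exponential":
        return 1.0 / math.exp(m)
    if variant == "relative":
        return length / (length + m)
    raise ValueError(variant)


# A phrase of length 2 that needs one edit to fit a node (m = 1):
for v in ("binary", "linear", "exponential", "relative"):
    print(v, round(count(v, 1, 2), 3))
```

The binary variant keeps only exactly matching phrases, while the other three degrade smoothly with the mismatch m; "relative" additionally scales the penalty by the phrase length.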

5 Experimental Results
IWSLT BTEC data (tourist and travel domain)

                        Chinese    English
Training data
  Sentences              23 940
  Running words         181 486    232 746
  Vocabulary              9 041     10 350
Test 2004 data
  Sentences                 500
  Running words           7 543     10 718
  OOVs                       96        154
Test 2005 data
  Sentences                 506
  Running words           8 052     10 828
  OOVs                      101        164
Test 2008 data
  Sentences                 507
  Running words           6 325
  OOVs                       87

Results

                test04          test05          test08
                BLEU    TER     BLEU    TER     BLEU
baseline        47.3    42.6    50.9    37.6    39.6

non-syntactic information
hierarch        48.4    41.9    51.4    38.1    39.6
paste           49.1    41.6    51.1    38.0    40.8
glue2           48.2    41.8    51.2    37.6    39.7
1NT2NT          48.4    42.2    51.8    37.2    39.8

syntactic information
binary          47.8    41.7    51.7    37.5    40.3
linear          47.6    41.9    51.2    37.6    40.6
exponential     47.9    41.7    51.6    37.4    40.3
relative        47.3    42.4    51.5    37.3    40.2

Results

                                    test04          test05          test08
                                    BLEU    TER     BLEU    TER     BLEU
baseline                            47.3    42.6    50.9    37.6    39.6

non-syntactic information
hierarch + paste                    48.5    42.0    51.9    37.6    39.6
hierarch + paste + glue2            49.2    42.5    50.8    37.5    39.5
hierarch + paste + glue2 + 1NT2NT   48.6    41.6    51.0    37.9    40.0

combination of both syntactic and non-syntactic information (all features)
binary                              46.9    42.5    50.6    38.4    39.9
linear                              48.0    42.3    51.2    38.0    40.5
exponential                         47.7    42.3    51.0    38.4    40.0
relative                            47.8    42.3    51.0    38.0    40.3

Example translations

reference:   Where is the exchange counter?
baseline:    The currency exchange office is
syntactical: Where is the currency exchange office?

reference:   Could you exchange it for a new one?
baseline:    You can buy a new one?
syntactical: Could you change it for a new one?

reference:   You can take our airport shuttle bus to pick up the car.
baseline:    You can take our airport shuttle bus with me.
syntactical: You can take our the airport shuttle bus come to pick it up.

6 Conclusions
- Analyzed heuristics for phrase extraction
- Introduced soft syntactic constraints
  - Use of source- and target-side information
  - No additional search effort
  - High variability of results
Outlook:
- Test on bigger corpora
- Bigger improvements when dealing with speech input (system talk tomorrow!)
- Applicable also to phrase-based systems
