
A Syntax-based Statistical Machine Translation Model
Alexander Friedl, Georg Teichtmeister
December 4, 2006

Outline: Introduction - The model - Experiment - Conclusion

Statistical Translation Model (STM):
- a mathematical model
- statistical modelling of human-language translation
- parameters estimated from a training corpus

First steps (IBM, 1998): string-to-string, word-based translation; words are translated largely independently of one another

Question: why not word-based translation?
- no structural or syntactic aspects are captured
- how to handle different word order?

Solution: A Syntax-based Statistical Translation Model

Input: parse tree (produced by a syntactic parser)
Output: string

Three operations applied at each node of the tree:
- reorder
- insert
- translate

Reorder: handles different word order, e.g. English vs. Japanese

Word insertion: e.g. captures linguistic differences in how syntactic cases are specified

Translation - Translate leaf words into the destination language

Output [figure omitted]

Introduction - The model - Experiment - Conclusion

The English parse tree is fed into a noisy channel; the output should be a Japanese sentence. Stochastic operations are applied at each node.

Reorder
- N! possible reorderings for N children
- probability given by the r-table

Word insertion
- to the left, to the right, or nowhere
- probability given by the n-table

Translation
- depends only on the word itself
- probability given by the t-table

Total probability: product of the individual operation probabilities. The tables are estimated from an English-Japanese training corpus.
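
As a toy illustration of this product, the sketch below scores one node's choices against the three tables. All table entries, labels, and values here are hypothetical, made up for illustration, not taken from the paper:

```python
# Toy sketch of the per-node probability: the product
# n(nu | N) * r(rho | R) * t(tau | T) from the three tables.
# All entries below are hypothetical values, not the paper's.

r_table = {("VB", ("PRP", "VB1", "VB2")): {("PRP", "VB2", "VB1"): 0.1}}
n_table = {"VB": {("right", "ha"): 0.05}}
t_table = {"adores": {"daisuki": 0.3}}

def node_prob(label, children, reorder, insertion, word, translation):
    """Probability of one node's <insert, reorder, translate> choices."""
    return (n_table[label][insertion]
            * r_table[(label, children)][reorder]
            * t_table[word][translation])

# The probability of a full transformation set Theta is the product of
# node_prob over all nodes; P(f | eps) then sums this over all Theta
# whose resulting leaf string equals f.
p = node_prob("VB", ("PRP", "VB1", "VB2"), ("PRP", "VB2", "VB1"),
              ("right", "ha"), "adores", "daisuki")
```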

Formal description
Input: English parse tree $\varepsilon$ (with nodes $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$)
Output: French sentence $f$ (with words $f_1, f_2, \ldots, f_m$)

Probability of getting $f$ from $\varepsilon$: $P(f \mid \varepsilon) = \sum_{\Theta:\, \mathrm{Str}(\Theta(\varepsilon)) = f} P(\Theta \mid \varepsilon)$, where $\mathrm{Str}(\Theta(\varepsilon))$ is the sequence of leaf words of the transformed tree.

$P(\Theta \mid \varepsilon)$: probability of a particular set of random variables (RVs) in a parse tree

The RVs $\theta_i = \langle \nu_i, \rho_i, \tau_i \rangle$ (insertion, reorder, translation) are independent of each other, but dependent on features of $\varepsilon_i$.

Probability of getting a French sentence $f$ given an English parse tree $\varepsilon$:
$P(f \mid \varepsilon) = \sum_{\Theta:\, \mathrm{Str}(\Theta(\varepsilon)) = f} \prod_{i=1}^{n} n(\nu_i \mid \mathcal{N}(\varepsilon_i))\, r(\rho_i \mid \mathcal{R}(\varepsilon_i))\, t(\tau_i \mid \mathcal{T}(\varepsilon_i))$

Automatic estimation of the model parameters: update the parameters to maximize the likelihood of the training corpus

Algorithm (EM-style):
1. Initialize the probability tables
2. Reset the counters
3. In each iteration, count the events, weighted by their probability under the current model
4. Re-estimate the parameters from the weighted counts
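
A minimal count-and-normalize sketch of steps 1-4, shown here for a flat t-table under IBM-Model-1-style simplifications rather than the paper's tree-based E-step; all names and the toy corpus are hypothetical:

```python
from collections import defaultdict

def em_t_table(corpus, iterations=10):
    """Count-and-normalize EM loop for a word-translation table
    (toy IBM-Model-1-style E-step, not the paper's tree-based one)."""
    # 1. Initialize the t-table uniformly over the foreign vocabulary.
    f_vocab = {f for _, fs in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        # 2. Reset the counters.
        count = defaultdict(float)   # expected count of (f, e) events
        total = defaultdict(float)   # expected count of e participating
        # 3. Count events, weighted by their probability under the model.
        for es, fs in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    w = t[(f, e)] / norm
                    count[(f, e)] += w
                    total[e] += w
        # 4. Re-estimate the parameters from the weighted counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy usage on two sentence pairs:
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"])]
t = em_t_table(corpus)   # t[("das", "the")] comes to dominate
```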

Introduction - The model - Experiment - Conclusion

Experiment on a small English-Japanese corpus:
- 2121 translation sentence pairs
- taggers build the English parse trees
- comparison with IBM Model 5

Evaluation:
- generate the most probable (Viterbi) alignment of the training corpus
- average score over the first 50 sentences

Scoring: alignment okay = 1 point, not sure = 0.5 points, alignment wrong = 0 points

              Average score   Perfect sentences
Our Model         0.582              10
IBM Model 5       0.431               0

Perplexity
- Our Model: 15.79
- IBM Model 5: 9.84

Introduction - The model - Experiment - Conclusion

- a syntax-based translation model
- statistical modelling of the translation process
- syntactic information for languages with different word order
- better alignment results in an experiment

Outline, 2nd part:
- Problems of Tree-to-String
- Clone operation
- Tree-to-Tree
- Phrasal translation
- Translation system

Problem of the Tree-to-String model: not all reorderings of terminal nodes are possible; the model constrains the syntactic correspondence between native and foreign language.

The Clone operation: insert a copy of a subtree at any point in the tree, then delete the original node. [figure: clone and delete steps]

When to clone?
- Should a clone be inserted as a child of $\varepsilon_j$? Probability $P_{\mathrm{ins}}(\mathrm{clone} \mid \varepsilon_j)$
- Decide which node is cloned: $P(\mathrm{clone} = \varepsilon_i \mid \mathrm{clone} = 1) = \frac{P_{\mathrm{makeclone}}(\varepsilon_i)}{\sum_k P_{\mathrm{makeclone}}(\varepsilon_k)}$
- The probability of cloning is independent of previous cloning operations
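
The second step is just a normalization of per-node weights into a choice distribution; a tiny sketch with hypothetical node names and weights:

```python
def clone_choice_probs(makeclone):
    """Normalize per-node makeclone weights into the choice distribution
    P(clone = eps_i | clone = 1) = makeclone(eps_i) / sum_k makeclone(eps_k)."""
    z = sum(makeclone.values())
    return {node: w / z for node, w in makeclone.items()}

# Usage with hypothetical weights for three candidate nodes:
probs = clone_choice_probs({"eps_1": 0.2, "eps_2": 0.5, "eps_3": 0.3})
```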

Example [figure omitted]

Tree-to-Tree:
- syntactic trees for both foreign and native language
- output tree instead of output string
- additional tree transformations: single source node to two target nodes, and two source nodes to a single target node
- new model: $P(T_b \mid T_a)$

Building the output tree. At each level of the output tree:
- choose an elementary tree
- align the children of the elementary tree
- translate the leaves

Elementary tree: $P_{\mathrm{elem}}(t_a \mid \varepsilon_a, \mathrm{children}(\varepsilon_a))$; this means that two nodes can be grouped together. Example: nodes A and B are considered one elementary tree.

Alignment of children: all children of an elementary tree are aligned at once, according to $P_{\mathrm{align}}(\alpha \mid \varepsilon_a, \mathrm{children}(t_a))$. Insertions and deletions are also handled in this step.

Tree-to-Tree Clone: same reordering problems as Tree-to-String; cloning is now added to the alignment step:
$P_{\mathrm{ins}}(\mathrm{clone} \in \alpha \mid \varepsilon_a, \mathrm{children}(t_a))$
$P(\mathrm{clone} = \varepsilon_i \mid \mathrm{clone} \in \alpha) = \frac{P_{\mathrm{makeclone}}(\varepsilon_i)}{\sum_k P_{\mathrm{makeclone}}(\varepsilon_k)}$

Parameter comparison [table omitted]

Experiments:
- data from a Korean-English corpus (military domain)
- Korean suffixes often carry meaning; these suffixes became leaves in the syntax tree
- vocabulary was reduced from 10059 to 3279
- average Korean sentence: 13 words and 21 tokens; average English sentence: 16 words

Results: $\mathrm{AER} = 1 - \frac{2\,|A \cap G|}{|A| + |G|}$, where $A$ is the set of word pairings produced by the system and $G$ the gold standard.
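
This is straightforward to compute over sets of alignment links; a minimal sketch (the toy links are hypothetical):

```python
def aer(A, G):
    """Alignment error rate per the slide's formula:
    AER = 1 - 2|A & G| / (|A| + |G|), with A the system's word
    pairings and G the gold standard, both as sets of (i, j) links."""
    return 1.0 - 2.0 * len(A & G) / (len(A) + len(G))

# Toy usage: three system links, two of which match the gold standard.
A = {(0, 0), (1, 2), (2, 1)}
G = {(0, 0), (1, 2)}
print(aer(A, G))  # 1 - 2*2/(3+2) = 0.2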

Phrasal translation: 1-to-1 word translation is not perfect
- compound nouns: German vs. English
- idiomatic phrases: "to kick the bucket" vs. "den Löffel abgeben" (literally, "to hand over the spoon")

Model:
- 1-to-1: $t(f \mid e)$
- 1-to-N, with fertility $\mu$: $t_\tau(f_1 f_2 \ldots f_l \mid e) = \mu_\tau(l \mid e) \prod_{i=1}^{l} t(f_i \mid e)$
- N-to-N: $t_{ph}(f_1 f_2 \ldots f_l \mid e_1 e_2 \ldots e_m) = \mu_\phi(l \mid e_1 \ldots e_m) \prod_{i=1}^{l} t(f_i \mid e_1 e_2 \ldots e_m)$
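
Under the reconstruction above, the N-to-N score factors into a fertility/length term and per-word terms; a hypothetical sketch, not the paper's implementation:

```python
def t_phrase(f_words, e_words, mu_phi, t):
    """Sketch of the N-to-N phrase probability as reconstructed above:
    a fertility/length term mu_phi times per-word translation terms.
    mu_phi and t are hypothetical callables standing in for the tables."""
    e_phrase = tuple(e_words)
    p = mu_phi(len(f_words), e_phrase)   # length of the foreign phrase
    for f in f_words:
        p *= t(f, e_phrase)              # each foreign word given the phrase
    return p
```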

Incorporation into the TM. $R$: feature for reordering; $N$: feature for insertion; $\rho$: reorder operation; $\nu$: insert operation.
$P(\theta_i \mid \varepsilon_i) = \big(\lambda_\Phi\, ph(\phi_i \mid \Phi_i) + (1 - \lambda_\Phi)\big)\, r(\rho_i \mid R_i)\, n(\nu_i \mid N_i)$

From the models to the translation task: translate from foreign to English, i.e. find the $E$ maximizing $P(E \mid F)$.
Reformulation (Bayes' rule, dropping the constant $P(F)$): $\hat{E} = \arg\max_E P(E \mid F) = \arg\max_E P(F \mid E)\, P(E)$
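
A toy sketch of this argmax over a hypothetical candidate set; the actual decoder (described below) searches over tree structures rather than enumerating candidates:

```python
def decode(candidates, F, tm_prob, lm_prob):
    """Noisy-channel selection: pick the English hypothesis E that
    maximizes P(F | E) * P(E). candidates, tm_prob, and lm_prob are
    hypothetical stand-ins, not the real system's search."""
    return max(candidates, key=lambda E: tm_prob(F, E) * lm_prob(E))
```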

Language model: probability of an English sentence, $P(E)$
- n-gram LM in the original implementation; no syntactic information used
- improvement: immediate-head parsing for language models
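
For reference, a minimal bigram LM of the kind the slide alludes to (maximum-likelihood estimates, no smoothing); a sketch, not the original implementation:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Minimal bigram LM with MLE estimates and no smoothing
    (so unseen contexts raise ZeroDivisionError; a sketch only)."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    def p(sentence):
        toks = ["<s>"] + sentence + ["</s>"]
        prob = 1.0
        for a, b in zip(toks[:-1], toks[1:]):
            prob *= bigrams[(a, b)] / unigrams[a]  # MLE estimate
        return prob
    return p

# Usage: score a sentence under a tiny training set.
p = train_bigram_lm([["he", "adores", "music"], ["he", "likes", "music"]])
print(p(["he", "adores", "music"]))
```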

Immediate-head parsing pipeline: the English sentence is parsed with a non-lexical PCFG into a large parse forest; the forest is pruned (which edges have a high probability of being correct?); the pruned parse is then evaluated with a lexical PCFG.

Decoder: works in the reverse direction to the TM; find the most probable syntactic tree $E$ for a sentence $F$. Basic idea: the TM translates a parse tree into the foreign language, so the decoder parses the foreign sentence, translates back to English, and checks the LM.

Conclusion: improvements to the syntactic translation model
- Tree-to-String Clone
- Tree-to-Tree
- Tree-to-Tree Clone
- Phrasal translation
Plus a brief overview of the translation process.

References
- Kenji Yamada and Kevin Knight. A Syntax-based Statistical Translation Model. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL-01), 2001.
- Daniel Gildea. Loosely Tree-based Alignment for Machine Translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), Sapporo, Japan, 2003.
- Eugene Charniak, Kevin Knight and Kenji Yamada. Syntax-based Language Models for Statistical Machine Translation. In MT Summit IX, International Association for Machine Translation, 2003.
- Kenji Yamada and Kevin Knight. A Decoder for Syntax-based Statistical MT. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02), 2002.

Thank you for your attention!