A Discriminative Model for Semantics-to-String Translation


A Discriminative Model for Semantics-to-String Translation

Aleš Tamchyna (1), Chris Quirk (2) and Michel Galley (2)
(1) Charles University in Prague
(2) Microsoft Research

July 30, 2015

Introduction

State-of-the-art MT models still use a simplistic view of the data: words are typically treated as independent, unrelated units, and relations between words are only captured through linear context. Unified semantic representations, such as Abstract Meaning Representation (AMR; Banarescu et al., 2013), are (re)gaining popularity: they abstract away from surface words, make semantic relations explicit, and bring together related words (possibly distant in the surface realization).

Possible uses:
- Richer models of source context (our work)
- Target-side (or joint) models to capture semantic coherence
- Semantic transfer followed by target-side generation

Semantic Representation

- Logical Form transformed into an AMR-style representation (Vanderwende et al., 2015)
- Labeled directed graph, not necessarily acyclic (e.g. coreference)
- Nodes: content words; edges: semantic relations
- Function words (mostly) not represented as nodes
- "Bits" capture various linguistic properties

Figure 1: Logical Form (computed tree) for the sentence: "I would like to give you a sandwich taken from the fridge."

Graph-to-String Translation

Translation = generation of target-side surface words in order, conditioned on source semantic nodes and previously generated words:
- Start in the (virtual) root
- At each step, transition to a semantic node and emit a target word
- A single node can be visited multiple times
- One transition can move anywhere in the LF

Source-side semantic graph: G = (V, E), V = {n_1, ..., n_S}, E ⊆ V × V
Target string E = (e_1, ..., e_T), alignment A = (a_1, ..., a_T), a_i ∈ 0...S

P(A, E | G) = ∏_{i=1}^{T} P(a_i | a_1^{i-1}, e_1^{i-1}, G) · P(e_i | a_1^{i}, e_1^{i-1}, G)
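The factorization above can be sketched as a simple scoring routine: walk the alignment and accumulate one transition term and one emission term per target word. The two lookup functions stand in for the model's actual conditional distributions and are purely illustrative.

```python
import math

def score_derivation(transition_p, emission_p, alignment, target_words):
    """Log-probability of one (alignment, translation) pair under the
    factorization P(A, E | G) = prod_i P(a_i | history) * P(e_i | history).

    transition_p(prev_node, node) and emission_p(node, word) are hypothetical
    stand-ins for the model's conditional distributions; node 0 is the
    virtual root the derivation starts in.
    """
    log_p = 0.0
    prev_node = 0  # virtual root
    for node, word in zip(alignment, target_words):
        log_p += math.log(transition_p(prev_node, node))  # move to a node
        log_p += math.log(emission_p(node, word))         # emit a target word
        prev_node = node
    return log_p
```

With uniform toy distributions this just sums log-probabilities, but the same loop shape applies when the terms come from the trained model.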

Translation Example

Figure 2: An example of the translation process, illustrating the first several steps of translating the sentence into German ("Ich möchte dir einen Sandwich..."): the node sequence I, like, you, sandwich, sandwich emits the target words in order. Labels in italics (e.g. "Dsub^-1", "Dobj->Dind", "Dind^-1->Dobj") correspond to the shortest undirected paths between the nodes.

Alignment of Graph Nodes

How do we align source-side semantic nodes to target-side words? Evaluated approaches:
1. Gibbs sampling
2. Direct GIZA++
3. Alignment composition

Alignment of Graph Nodes: Gibbs Sampling

Alignment ("transition") distribution modeled as a categorical distribution:

P(a_i | a_{i-1}, G) ∝ c(label(a_{i-1}, a_i))

Translation ("emission") distribution modeled as a set of categorical distributions, one for each source semantic node:

P(e_i | n_{a_i}) ∝ c(lemma(n_{a_i}), e_i)

Sample from the following distribution:

P(t, n_i) ∝ (c(lemma(n_i), t) + α) / (c(lemma(n_i)) + αL) · (c(label(n_i, n_{i-1})) + β) / (T + βP) · (c(label(n_{i+1}, n_i)) + β) / (T + βP)
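A minimal sketch of the unnormalized sampling weight, assuming the reading of the slide's product: one smoothed emission factor and two smoothed transition factors. All count tables (c_*) and the constants L, T, P are toy stand-ins, not values from the paper.

```python
def gibbs_weight(t, lemma, label_prev, label_next,
                 c_lemma_word, c_lemma, c_label,
                 alpha, beta, L, T, P):
    """Unnormalized Gibbs sampling weight for re-aligning target word t,
    mirroring the smoothed product P(t, n_i) on the slide.

    c_lemma_word / c_lemma / c_label are count dictionaries; alpha and beta
    are smoothing hyperparameters; L, T, P are normalizer constants (e.g.
    lemma vocabulary size). All are illustrative stand-ins.
    """
    emit = (c_lemma_word.get((lemma, t), 0) + alpha) / (c_lemma.get(lemma, 0) + alpha * L)
    p_in = (c_label.get(label_prev, 0) + beta) / (T + beta * P)
    p_out = (c_label.get(label_next, 0) + beta) / (T + beta * P)
    return emit * p_in * p_out
```

In a full sampler these weights would be computed for every candidate node, normalized, and a new alignment drawn from the resulting categorical distribution.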

Alignment of Graph Nodes: Evaluation

2. Direct GIZA++
- Linearize the LF, then run GIZA++ (standard word alignment)
- Heuristic linearization, trying to preserve source surface word order

3. Alignment composition
- Source-side nodes to source-side tokens: parser-provided alignment or GIZA++
- Source-target word alignment: GIZA++

Manual inspection of alignments:
- Alignment composition clearly superior
- Not much difference between GIZA++ and parser alignments
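The composition step itself is mechanical: join node-to-source-token links with source-to-target word-alignment links on the shared source token. A minimal sketch over sets of index pairs:

```python
def compose_alignments(node_to_src, src_to_tgt):
    """Compose a (semantic node -> source token) alignment with a
    (source token -> target token) word alignment, yielding
    (node -> target token) links as in the alignment-composition approach.

    Both inputs are iterables of index pairs; the output is sorted for
    deterministic inspection.
    """
    node_to_tgt = set()
    for node, src in node_to_src:
        for s, tgt in src_to_tgt:
            if s == src:  # join on the shared source token
                node_to_tgt.add((node, tgt))
    return sorted(node_to_tgt)
```

One node can end up linked to several target tokens (and vice versa) whenever the underlying word alignment is many-to-many.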

Discriminative Translation Model

A maximum-entropy classifier:

P(e_i | n_{a_i}, n_{a_{i-1}}, G, e_{i-k+1}^{i-1}) = exp(w · f(e_i, n_{a_i}, n_{a_{i-1}}, G, e_{i-k+1}^{i-1})) / Z

Z = Σ_{e ∈ GEN(n_{a_i})} exp(w · f(e, n_{a_i}, n_{a_{i-1}}, G, e_{i-k+1}^{i-1}))

- Possible classes: top 50 translations observed with the given lemma
- Online learning with stochastic gradient descent
- Learning rate 0.05, cumulative L1 regularization with weight 1, batch size 1, 22 hash bits
- Early stopping when held-out perplexity increases
- Parallelized (multi-threading) and distributed learning for tractability
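A toy version of this classifier: a softmax over the candidate set GEN(n) with sparse features, trained by plain SGD on the log-loss. Feature hashing and the cumulative L1 regularization used in the paper are omitted here; feature names and the feature function are illustrative.

```python
import math

def softmax_scores(weights, candidates, feature_fn):
    """P(e | context) over the candidate set, linear model w . f."""
    scores = [sum(weights.get(f, 0.0) for f in feature_fn(e)) for e in candidates]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]

def sgd_step(weights, candidates, feature_fn, gold_index, lr=0.05):
    """One stochastic gradient step on the log-loss (batch size 1, as on the
    slide; regularization omitted for brevity)."""
    probs = softmax_scores(weights, candidates, feature_fn)
    for i, e in enumerate(candidates):
        grad = probs[i] - (1.0 if i == gold_index else 0.0)
        for f in feature_fn(e):
            weights[f] = weights.get(f, 0.0) - lr * grad
    return weights
```

After one update on a gold candidate, its probability under the model rises above the other candidates', which is all the sketch is meant to show.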

Feature Set

- Current node, previous node, parent node: lemma, POS, bits
- Path from previous node: path length, path description
- Bag of lemmas: captures the overall topic of the sentence
- Graph context: features from nodes close in the graph (limited by the length of the shortest undirected path)
- Generated tokens: "fertility"; some nodes should generate a function word first (e.g. an article) and then the content word
- Previous tokens: target-side context
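These templates can be instantiated as plain string features. The template names and the node representation (dicts with lemma, POS and bits) below are illustrative, not the paper's actual implementation:

```python
def extract_features(cur, prev, path_labels, recent_tokens, sentence_lemmas):
    """Instantiate feature templates in the spirit of the list above.

    cur/prev are hypothetical node dicts with 'lemma', 'pos' and 'bits';
    path_labels is the edge-label path from the previous node;
    recent_tokens are the last generated target words.
    """
    feats = []
    feats.append("cur_lemma=" + cur["lemma"])                 # node identity
    feats.append("cur_pos=" + cur["pos"])
    feats.extend("cur_bit=" + b for b in cur.get("bits", []))  # linguistic bits
    feats.append("prev_lemma=" + prev["lemma"])
    feats.append("path=" + "->".join(path_labels))             # path description
    feats.append("path_len=%d" % len(path_labels))             # path length
    feats.extend("bag=" + l for l in sentence_lemmas)          # sentence topic
    feats.extend("prev_tok_%d=%s" % (i, t)                     # target context
                 for i, t in enumerate(recent_tokens))
    return feats
```

Each string would then be hashed into the weight vector (the slide mentions 22 hash bits).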

Experiments

Evaluated in an n-best re-ranking experiment:
- Generate 1000-best translations of devset sentences
- Add scores from our model
- Re-run MERT on the enriched n-best lists

Setup: a basic phrase-based system, French-English, 1 million parallel training sentences. We obtained small but consistent improvements; differences would most likely be larger after integration in decoding.

Table 1: BLEU scores of n-best reranking in French-English translation.

Dataset            Baseline  +Semantics
WMT 2009 (devset)  17.44     17.55
WMT 2010           17.59     17.64
WMT 2013           17.41     17.55
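Stripped of the MERT retuning (which jointly reoptimizes all feature weights), the reranking step reduces to: add a weighted semantic-model score to each hypothesis's base score and re-sort. A toy sketch, with a hypothetical scoring function:

```python
def rerank_nbest(nbest, sem_score, weight):
    """Re-rank an n-best list after adding a semantic-model score.

    nbest: list of (hypothesis, base_score) pairs;
    sem_score: hypothetical function mapping a hypothesis to the new score;
    weight: its interpolation weight (in the paper, tuned by MERT rather
    than fixed by hand).
    """
    rescored = [(base + weight * sem_score(hyp), hyp) for hyp, base in nbest]
    rescored.sort(reverse=True)  # best combined score first
    return [hyp for _, hyp in rescored]
```

With a large enough weight, a hypothesis favored by the semantic model can overtake the baseline's 1-best, which is exactly the effect the reranking experiment measures.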

Conclusion

- Initial attempt at including semantic features in statistical MT
- Feature set comprising morphological, syntactic and semantic properties
- Small but consistent improvement in BLEU

Future work:
- Integrate directly in the decoder
- Parser accuracy is limited: use multiple analyses
- Explore other ways of integration: target-side models of semantic plausibility, semantic transfer and generation

Thank You! Questions?

References

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178-186, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/w13-2322.

Lucy Vanderwende, Arul Menezes, and Chris Quirk. An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus. In Proceedings of the 2015 NAACL HLT Demonstration Session, Denver, Colorado, June 2015. Association for Computational Linguistics.