National Centre for Language Technology, School of Computing, Dublin City University


Parallel treebanks

A parallel treebank comprises sentence pairs that are:
- parsed
- word-aligned
- tree-aligned
(Volk & Samuelsson, 2004)

The role of alignments: Santos (1996), paraphrasing Lab (1990): "Having a linguistic description of two languages is not the same as having a linguistic description of the translation between them."

Parallel treebanks

Our work involves automatically obtaining a parallel treebank from a parallel corpus via parsing and tree alignment. Our overall objective is to use the parallel treebank to induce a variety of syntax-aware and syntax-driven models of translation for use in data-driven MT.

In this presentation, the focus is on capturing translational divergences by applying a tree aligner to gold-standard tree pairs.

Translational divergences

We aim to:
- make explicit the syntactic divergences between source and target sentence pairs
- align so as to express as precisely as possible the translational equivalences between the tree pair
(constraining phrase-alignments in the data set is a consequence of aligning trees, not an objective)

We remain agnostic with regard to:
- which linguistic formalism is most appropriate for the expression of monolingual syntax
- how best to exploit parallel treebanks for syntax-aware data-driven MT

Outline

Links indicate translational equivalence:
- a link between root nodes indicates equivalence between the sentence pair
- a link between any given pair of source and target nodes indicates:
  - equivalence between the substrings they dominate
  - equivalence between the substrings they do not dominate

[Figure: isomorphic tree pair for "John sees Mary" / "John voit Mary" with node-to-node links]

In the simplest case:
- the sentence lengths are identical
- the word order is identical
- the tree structures are isomorphic

[Figure: tree pair for "John sees Mary" / "John voit Mary": S → NP VP, VP → V NP in both languages]

Slightly more complex:
- not every node in each tree needs to be linked
- each node is linked at most once
- terminal nodes are not linked

[Figure: tree pair for "click the Save As button" (VP → V NP) / "cliquez sur le bouton Enregistrer Sous" (VP → V PP)]

Tree- vs. word-alignment

- Word-alignment: unaligned words are problematic and to be avoided
- Tree-alignment: unaligned nodes are informative

Example: "... Jacob's ladder ..." / "... l'échelle de Jacob ..."

[Figure: word alignment of "Jacob 's ladder" / "la échelle de Jacob" alongside the corresponding tree alignment (English NP → NP POS N, French NP → D N PP)]

Hierarchical alignments

On the relationship between 's and de in "... Jacob's ladder ..." / "... l'échelle de Jacob ...":

- 's ↔ de
- X 's Y ↔ Y de X
- NP₁ 's NP₂ ↔ NP₂ de NP₁
- NP → NP₁ 's NP₂ : NP → NP₂ de NP₁

[Figure: the corresponding aligned tree fragments, with 's under POS in the English NP and de heading a PP in the French NP]

Outline

Nominalisation

[Tree pair: English VP "removing the print head" (V NP) ↔ French NP "retraite de la tête d'impression" (N PP): the English gerund corresponds to a French noun]

Lexical

[Tree pair: English CONJP "as ..." (CONJ S) ↔ French CONJP "au fur et à mesure que ..." (CONJ S): a single English conjunction corresponds to a multi-word French conjunction]

Context-Dependent Lexical Selection

[Three tree pairs for English "need":
(1) "you need to ..." ↔ "vous devez ..."
(2) "you need to ..." ↔ "il faut ..."
(3) "if you need ..." ↔ "pour ..."]

Embedded Complexities

[Tree pair: English "while the scanner is being calibrated" (CONJP → CONJ S) ↔ French "pendant toute la durée de le étalonnage de le scanner" (PP → P NP): an English clause corresponds to a French noun phrase, with further divergences embedded inside]

Structural Dissimilarity

[Tree pair: English "if the unauthorised repair is performed, the remainder of the warranty period is null and void" ↔ French "toute intervention non autorisée invaliderait la garantie" (lit. "any unauthorised action would invalidate the guarantee"): a conditional construction is translated as a simple declarative]

Outline

Tree-alignment algorithm

Algorithm:
- hypothesise initial alignments: each source node can link to any target node and vice versa;
- assign a score to each hypothesised alignment;
- select a set of links meeting the well-formedness criteria according to a greedy search (see the sketch below).

Well-formedness criteria:
- each node can only be linked once;
- descendants of a linked source node may only link to descendants of its linked target counterpart;
- ancestors of a linked source node may only link to ancestors of its linked target counterpart.
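The greedy search might look like the following minimal sketch, assuming a bare-bones Node type and an externally supplied scoring function (scoring is defined on a later slide); all names and data structures here are illustrative, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    """A bare-bones tree node; the real trees carry category labels."""
    label: str
    children: list = field(default_factory=list)

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

def dominates(a, b):
    """True if node b appears below node a in the tree."""
    return any(d is b for d in a.descendants())

def well_formed(s, t, selected):
    """Check candidate link (s, t) against the already-selected links."""
    for s2, t2 in selected:
        if s is s2 or t is t2:
            return False  # each node can only be linked once
        # descendants (resp. ancestors) of a linked node may only link to
        # descendants (resp. ancestors) of its linked counterpart
        if dominates(s2, s) != dominates(t2, t):
            return False
        if dominates(s, s2) != dominates(t, t2):
            return False
    return True

def greedy_align(source_nodes, target_nodes, score):
    """Hypothesise every source-target node pair, then greedily keep the
    highest-scoring links that remain well-formed."""
    hypotheses = sorted(((score(s, t), s, t)
                         for s in source_nodes for t in target_nodes),
                        key=lambda h: h[0], reverse=True)
    selected = []
    for _, s, t in hypotheses:
        if well_formed(s, t, selected):
            selected.append((s, t))
    return selected
```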

Tree-alignment algorithm

[Figure: example tree pair "from a Windows application :" (PP) / "à partir de une application Windows :" (PP), with numbered nodes and the matrix of hypothesis scores between source and target nodes]

Tree-alignment algorithm

Computing hypothesis scores: assume a tree pair $\langle S, T \rangle$, a hypothesis $\langle s, t \rangle$, the following strings, and GIZA++ / Moses word-alignment probabilities:

$$s_l = s_i \ldots s_{i_x} \qquad t_l = t_j \ldots t_{j_x}$$
$$\bar{s}_l = s_1 \ldots s_{i-1}\, s_{i_x+1} \ldots s_m \qquad \bar{t}_l = t_1 \ldots t_{j-1}\, t_{j_x+1} \ldots t_n$$

[Figure: example tree pair with inside strings $s_l$ = "b c", $t_l$ = "x y" and outside strings $\bar{s}_l$ = "a", $\bar{t}_l$ = "w z"]

Hypothesis score:

$$\gamma(\langle s, t \rangle) = \alpha(s_l \mid t_l)\; \alpha(t_l \mid s_l)\; \alpha(\bar{s}_l \mid \bar{t}_l)\; \alpha(\bar{t}_l \mid \bar{s}_l)$$

String correspondence score:

$$\alpha(x \mid y) = \prod_{j=1}^{|x|} \frac{\sum_{i=1}^{|y|} P(x_j \mid y_i)}{|y|}$$
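A minimal sketch of this scoring, assuming the word-translation tables are plain dicts mapping (word, conditioning word) pairs to GIZA++ probabilities; the smoothing floor for unseen pairs and all names are illustrative assumptions:

```python
def alpha(x, y, p, floor=1e-9):
    """String correspondence score alpha(x | y):
    product over words x_j of (sum over words y_i of P(x_j | y_i)) / |y|."""
    if not x or not y:
        return 1.0  # assumption: empty strings (e.g. at the root) are neutral
    score = 1.0
    for xj in x:
        score *= sum(p.get((xj, yi), floor) for yi in y) / len(y)
    return score

def gamma(s_in, t_in, s_out, t_out, p_s_given_t, p_t_given_s):
    """Hypothesis score gamma(<s, t>): s_in/t_in are the inside strings
    dominated by the candidate nodes, s_out/t_out the outside remainders
    of the two sentences."""
    return (alpha(s_in, t_in, p_s_given_t) *
            alpha(t_in, s_in, p_t_given_s) *
            alpha(s_out, t_out, p_s_given_t) *
            alpha(t_out, s_out, p_t_given_s))
```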

Outline

Methodology

- Dataset: HomeCentre English-French corpus, parsed and aligned, 810 sentence pairs
- Alignment evaluation: precision and recall of automatic alignments vs. manual alignments
- Translation evaluation: split the data into training and test sets, 6 splits, averaged results
- MT system used: DOT (Hearne & Way, EAMT-06)
- Train the system on manual vs. automatic alignments
- Manual analysis of translational divergences
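The precision/recall part of this evaluation is straightforward; a minimal sketch, assuming links are recorded as (source node id, target node id) pairs (an illustrative representation):

```python
def precision_recall(auto_links, gold_links):
    """Precision and recall of automatic links against the gold standard."""
    auto, gold = set(auto_links), set(gold_links)
    correct = len(auto & gold)
    precision = correct / len(auto) if auto else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Example: two of three automatic links appear in the gold standard.
auto = [(1, 1), (3, 2), (4, 6)]
gold = [(1, 1), (3, 2), (5, 5), (9, 5)]
print(precision_recall(auto, gold))  # (0.666..., 0.5)
```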

Alignment vs. Gold Standard

                 all links           lexical links       non-lexical links
Configs      Precision  Recall    Precision  Recall    Precision  Recall
scr1           0.6162   0.7783     0.5057    0.7441     0.8394    0.7486
scr2           0.6215   0.7876     0.5131    0.7431     0.8107    0.7756
scr1 sp1       0.6256   0.8100     0.5163    0.7626     0.8139    0.8002
scr2 sp1       0.6245   0.7962     0.5184    0.7517     0.8031    0.7871

Translation vs. Gold Standard (all links)

Configs      BLEU     NIST     METEOR    Coverage (%)
manual       0.5222   6.8931   71.8531   68.5417
scr1         0.5091   6.9145   71.7764   71.8750
scr2         0.5333   6.8855   72.9614   72.5000
scr1 sp1     0.5273   6.9384   72.7157   72.5000
scr2 sp1     0.5290   6.8762   72.8765   72.5000

Divergence

Simple, isomorphic alignments:

[Tree pairs: "the scanner" (NP → D N) ↔ "le scanner" (NP → D N); "to the HomeCentre" (PP → P NP) ↔ "à le HomeCentre" (PP → P NP)]

Divergence

Nominalisation:

[Tree pair: English VP "removing ..." (V NP) ↔ French NP "retraite ... de ..." (N PP)]

Divergence

Lexical selection:

[Tree pairs: "you need to ..." (S → PRON VPv) ↔ "vous devez ..." and ↔ "il faut ..." (S → PRON VPverb): the same English structure maps to different French lexical choices]

Outline

Conclusions

- Aligner performance is better at the phrase level than at the lexical level.
- There is an imbalance between precision and recall at the lexical level:
  - the aligner uses GIZA++ word-alignment probabilities;
  - GIZA++ prioritises broad coverage over high precision;
  - for capturing translational divergences between tree pairs, the preference is the opposite.
- It is therefore appropriate for tree-alignment to prioritise precision over recall.
- MT systems should use high-precision tree alignments in conjunction with broad-coverage models to preserve robustness.

Future work

- Investigate alternative word-alignment methods to further improve the accuracy of the tree-alignment algorithm.
- Investigate the impact of imperfect parse quality on tree-alignment.
- Investigate the extraction of translation models from automatically-annotated parallel treebanks.

The End.