Part-of-Speech Driven Cross-Lingual Pronoun Prediction with Feed-Forward Neural Networks

Jimmy Callin, Christian Hardmeier, Jörg Tiedemann
Department of Linguistics and Philology
Uppsala University, Sweden
jimmy.callin.49@student.uu.se
{christian.hardmeier, jorg.tiedemann}@lingfil.uu.se

Proceedings of the Second Workshop on Discourse in Machine Translation (DiscoMT), pages 59–64, Lisbon, Portugal, 17 September 2015. © 2015 Association for Computational Linguistics.

Abstract

For some language pairs, pronoun translation is a discourse-driven task which requires information that lies beyond its local context. This motivates the task of predicting the correct pronoun given a source sentence and a target translation, where the translated pronouns have been replaced with placeholders. For cross-lingual pronoun prediction, we suggest a neural network-based model using preceding nouns and determiners as features for suggesting antecedent candidates. Our model scores on par with similar models while having a simpler architecture.

1 Introduction

Most modern statistical machine translation (SMT) systems use context for translation; the meaning of a word is more often than not ambiguous, and can only be decoded through its usage. That said, context use in modern SMT still mostly assumes that sentences are independent of one another, and dependencies between sentences are simply ignored. While today's popular SMT systems could use features from previous sentences in the source text, translated sentences within a document have up to this point rarely been included. Hardmeier and Federico (2010) argue that SMT research has become mature enough to stop assuming sentence independence, and start to incorporate features beyond the sentence boundary.

Languages with gender-marked pronouns introduce certain difficulties, since the choice of pronoun is determined by the gender of its antecedent. Picking the wrong third-person pronoun might seem like a relatively minor error, especially if present in an otherwise comprehensible translation, but could potentially produce misunderstandings. Take the following English sentences:

The monkey ate the banana because it was hungry.
The monkey ate the banana because it was ripe.
The monkey ate the banana because it was tea-time.

In each of these three cases, it refers to something different: the monkey, the banana, or the abstract notion of time. If we were to translate these sentences to German, we would have to consciously decide whether it should be masculine (er, referring to the monkey), feminine (sie, referring to the banana), or neuter (es, referring to the time) (Mitkov et al., 1995). While these examples use a local dependency, the antecedent of it could just as easily have been one or several sentences away, which would have put the necessary translation features out of reach for sentence-based SMT decoders.

2 Related work

Most of the work in anaphora resolution for machine translation has been done in the paradigm of rule-based MT, while the topic has gained little interest within SMT (Hardmeier and Federico, 2010; Mitkov, 1999). One of the first examples of using discourse analysis for pronoun translation in SMT was done by Le Nagard and Koehn (2010), who use coreference resolution to predict the antecedents in the source language as features in a standard SMT system. While they saw score improvements in pronoun prediction, they claim the bad performance of the coreference resolution seriously impacted the results negatively. They performed this as a post-processing step, which seems to be primarily for practical reasons since most popular SMT frameworks such as Moses (Koehn et al., 2007) do not provide previous target translations for use as features. Guillou (2012) tried a similar approach for English-Czech translation with little improvement even after factoring out major sources of error. One possible reason singled out for this is how a reasonable translation alternative of a pronoun's antecedent could affect the predicted pronoun, including the possibility of simply canceling out pronouns. E.g., "the u.s., claiming some success in its trade" could be paraphrased as "the u.s., claiming some success in trade diplomacy" without any loss in translation quality, while still affecting the score negatively. This demonstrates there is necessary linguistic information in the target translation that is not available in the source. Hardmeier and Federico (2010) extended the phrase-based Moses decoder with a word dependency model based on existing coreference resolution systems, by parsing the output of the decoder and catching its previous translations. Unfortunately they only produced minor improvements for English-German.

In light of this, there have been attempts at considering pronoun translation a classification task separate from traditional machine translation. This could potentially lead to further insights into the nature of anaphora resolution. In this fashion a pronoun translation module could be treated as just another part of translation by discourse-oriented machine translation systems, or as a post-processing step similarly to Guillou (2012). Hardmeier et al. (2013b) introduced this task and presented a feed-forward neural network model using features from an external anaphora resolution system, BART (Broscheit et al., 2010), to infer the pronoun's antecedent candidates and use the aligned words in the target translation as input. This model was later integrated into their document-level decoder Docent (Hardmeier et al., 2013a; Hardmeier, 2014, chapter 9).

3 Task setup

The goal of cross-lingual pronoun prediction is to accurately predict the correct missing pronoun in translated text. The pronouns in focus are it and they, where the word-aligned phrases in the translation have been replaced by placeholders. The word alignment is included, and was automatically produced by GIZA++ (Och, 2003). We are also aware of document boundaries within the corpus.
The corpus is a set of three different English-French parallel corpora gathered from three separate domains: transcribed TED talks, Europarl (Koehn, 2005) with transcribed proceedings from the European parliament, and a set of news texts. Test data is a collection of transcribed TED talks, in total 12 documents containing 2,093 sentences with a total of 1,105 classification problems, with a similar development set. Further details of the task setup, including final performance results, are available in Hardmeier et al. (2015).

4 Method

Inspired by the neural network architecture set up in Hardmeier et al. (2013b), we similarly propose a feed-forward neural network with a layer of word embeddings as well as an additional hidden layer for learning abstract feature representations. The final architecture as shown in fig. 1 uses both source context and translation context around the missing pronoun, by encoding a number of word embeddings n words to the left and m words to the right (hereby referred to as having a context window size of n+m). The main difference in our model lies in avoiding using an external anaphora resolution system to collect antecedent features. Rather, to simplify the model we simply look at the four closest previous nouns and determiners in English, and use the corresponding aligned French nouns and articles in the model, as illustrated in fig. 2. Wherever the alignments map to more than one word, only the leftmost word in the phrase is used. We encode these nouns and articles as embeddings in the first input layer. This way, the order of each word is embedded, which should approximate the distance from the missing pronoun.

Additionally, we allow ourselves to look at the French context of the missing pronoun. While the automatically translated context might be too unreliable, French usage should be a better indicator for some of the classes, e.g. ce, which is highly dependent on preceding est. See fig. 3 for an example of context in source and translation as features.

Similarly to the original model in Hardmeier et al. (2013b), the neural network is trained using stochastic gradient descent with mini-batches and L2 regularization. Cross-entropy is used as a cost function, with a softmax output layer.
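The feature extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the tag labels `NOUN` and `DET`, the dictionary-based alignment format, and the use of `<S>` as a padding token for missing positions are all hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence

PAD = "<S>"  # assumed padding token for positions outside the available text


def context_window(tokens: Sequence[str], position: int, n: int, m: int) -> List[str]:
    """Return the n tokens left and m tokens right of `position` (an n+m window)."""
    left = [tokens[i] if i >= 0 else PAD for i in range(position - n, position)]
    right = [tokens[i] if i < len(tokens) else PAD
             for i in range(position + 1, position + 1 + m)]
    return left + right


def closest_aligned(src_tags: Sequence[str],
                    alignment: Dict[int, List[int]],
                    tgt_tokens: Sequence[str],
                    position: int,
                    tag: str,
                    k: int) -> List[str]:
    """Collect the target words aligned to the k closest source tokens with
    the given POS tag that precede `position`, nearest first.

    If a source token aligns to several target words, only the leftmost
    target word is used, as described in the text.
    """
    found: List[str] = []
    for i in range(position - 1, -1, -1):  # walk backwards from the pronoun
        if src_tags[i] == tag and alignment.get(i):
            found.append(tgt_tokens[min(alignment[i])])
            if len(found) == k:
                break
    return found + [PAD] * (k - len(found))
```

Because the candidates are returned nearest first, the position of each word in the feature vector stands in for its distance from the pronoun, which is the property the paragraph above relies on.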
Furthermore, the dimensionality of the embeddings is increased from 20 to 50, since we saw minor improvements of the scores on the development set with the increase. To reduce training time and speed up convergence, we use tanh as activation function between the hidden layers (LeCun et al., 2012), in contrast to the sigmoid function used in Hardmeier's model. To avoid overfitting, early stopping is introduced where the training stops if no improvements have been found within a certain number of iterations. This usually results in a training time of […]0 epochs, when run on TED data.

Figure 1: Neural network architecture. Blue embeddings (E) signify source context, red target context, and yellow the preceding POS tags. The shown number of features is not equivalent to what is used in the final model.

The model uses a layer-wise uniform random weight initialization as proposed by Glorot and Bengio (2010), where they show that neural network models using tanh as activation function generally perform better with a uniformly distributed random initialization within the interval $\left[-\frac{\sqrt{6}}{\sqrt{\mathit{fan}_{in}+\mathit{fan}_{out}}},\ \frac{\sqrt{6}}{\sqrt{\mathit{fan}_{in}+\mathit{fan}_{out}}}\right]$, where $\mathit{fan}_{in}$ and $\mathit{fan}_{out}$ are the number of inputs and the number of hidden units respectively.

Since the model uses a fixed context window size for English and French, as well as a fixed number of preceding nouns and articles, we need to find out optimal parameter settings. We observe that a parameter setting of a 4+4 context window for English and French, with 3 preceding nouns and articles each, performs well. Figure 4 showcases how window size and number of preceding POS tags affect the performance outcome on the development set. We also look into asymmetric window sizes, but notice no improvements (fig. 5).

We have this banner in our offices in Palo Alto
Nous avons cette bannière dans nos bureaux à Palo Alto

Figure 2: An English POS tagger is used to find nouns and articles in preceding utterances, while the word alignments determine which French words are to be used as features.

Feature ablation as presented in table 1 shows that while all feature classes are required for retrieving the top score, POS features are generally the feature class that contributes the least to improved results. It is curious to notice that elle even performs better without the POS features, while elles receives a noticeable boost with them. Furthermore, the results indicate that target features are the most informative of the tested feature classes.
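The initialization interval attributed to Glorot and Bengio above can be illustrated with a short sketch. It assumes plain Python lists rather than Theano tensors, and the function name `glorot_uniform` is a label chosen for this example rather than anything taken from the paper's code.

```python
import math
import random
from typing import List


def glorot_uniform(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Sample a fan_in x fan_out weight matrix uniformly from
    [-sqrt(6)/sqrt(fan_in + fan_out), +sqrt(6)/sqrt(fan_in + fan_out)]."""
    limit = math.sqrt(6.0) / math.sqrt(fan_in + fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


# Example: weights between a 50-dimensional input and 20 hidden units
# (the sizes are arbitrary and only serve the illustration).
weights = glorot_uniform(50, 20, random.Random(0))
```

The bound shrinks as layers get wider, which keeps the pre-activation values in the range where tanh is not saturated at the start of training.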
The neural network is implemented in Theano (Bergstra et al., 2010), and is publicly available on GitHub (http://github.com/jimmycallin/whatelles).
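Independently of the Theano implementation, the forward computation described in the Method section (embedding lookup, one tanh hidden layer, softmax output, cross-entropy cost) can be sketched in plain Python. This is a minimal sketch under assumptions: the function names and the list-of-rows weight layout are hypothetical, and mini-batching, the regularization term, and the gradient updates are left out.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def affine(W: Matrix, b: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute W x + b, where each row of W holds the weights of one output unit."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(feature_ids: Sequence[int], E: Matrix,
            W_h: Matrix, b_h: Sequence[float],
            W_o: Matrix, b_o: Sequence[float]) -> List[float]:
    """Look up and concatenate the embeddings of all input features, apply a
    tanh hidden layer, and return class probabilities from a softmax layer."""
    x = [v for i in feature_ids for v in E[i]]
    h = [math.tanh(v) for v in affine(W_h, b_h, x)]
    return softmax(affine(W_o, b_o, h))


def cross_entropy(probs: Sequence[float], gold: int) -> float:
    """Negative log-probability assigned to the gold class."""
    return -math.log(probs[gold])
```

Concatenating the embeddings in a fixed order is what lets the position of a feature (window slot or antecedent rank) carry information, since each slot is multiplied by its own block of hidden-layer weights.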

<S> <S> <S> it expresses our view of how we
<S> <S> <S> exprime notre manière d' aborder

Figure 3: Example of context used in the classification model, color coded according to their position in the neural network as illustrated in fig. 1.

Figure 4: Parameter variation of window size and number of preceding POS tags, plotted against macro F1. Window size is varied in a symmetrical fashion of n+n. When varying window size, 3 preceding POS tags are used. When varying number of POS tags, a window size of 4+4 is used.

Figure 5: Parameter variation of window size asymmetry, plotted against macro F1, where each label corresponds to n+m, with n and m the context sizes to the left and right.

5 Results

The results from the shared task are presented in table 2 and table 3. The best performing classes are ce, ils, and other, all reaching F1 scores over 80 percent. The less commonly occurring classes elle and elles perform significantly worse, especially recall-wise. The overall macro F1 score ends up being roughly 55%.

6 Discussion

Results indicate that the model performs on par with previously suggested models (Hardmeier et al., 2013b), while having a simpler architecture. Classes highly dependent on local context, such as ce, perform especially well, which is likely due to est being a good indicator of its presence. This is supported by the large performance gains from 4+0 to 4+1 in fig. 5, since est usually follows ce. Singular and plural classes rarely get confused, due to them being predicated on the English pronoun, which marks it or they. The classes of feminine gender do not perform as well, especially recall-wise, but this was to be expected since the only information from which to infer the antecedent is ordered distance from the pronoun in focus. It is apparent that the model has a bias towards making majority class predictions, especially given the low number of wrong predictions on the elle and elles classes relative to il and ils. The high recall of ils is explained by this phenomenon as well.
An additional hypothesis is that there is simply too little data to realistically create usable embeddings, except for a few reoccurring circumstances. A somewhat interesting example of what POS tags might cause is:

... which is the history of who invented games ... and they would be so immersed in playing the dice games ...
... l'histoire de qui a inventé le jeu et pourquoi ... seraient si concentrés sur leur jeu de dés ...

This is one of the few instances where ils has been misclassified as elles. Since this classification only happens when using at least three preceding POS tags, it is likely there is something happening with the antecedent candidates. The third determiner is the (history), and points to histoire, which is a noun of feminine gender. It is likely the classifier has learned this connection and has put too much weight into it.

The extra number of features as well as the increase in embedding dimensionality makes the training and prediction slightly slower, but since the training still is done in less than an hour, and testing does not take longer than a few seconds, it is still good enough for general usage. Furthermore, the implementation is made in such a way that further performance increases are to be expected if you run it on a CUDA compatible GPU with minor changes.

          POS      Source   Target   None
ce        0.96     0.869    0.6405   0.88
cela      0.679    0.64     0.456    0.660
elle      0.96     0.09     0.090    0.57
elles     0.500    0.069    0.667    0.
il        0.566    0.446    0.65     0.560
ils       0.864    0.845    0.7050   0.8754
OTHER     0.8976   0.8769   0.6969   0.8847
Macro     0.556    0.58     0.569    0.699
Micro     0.787    0.750    0.5797   0.809

Table 1: F-score for each label in a feature ablation test, where the specified feature classes were removed in training and testing on the development set. The None column has no removed features. Micro score is the overall classification score, while macro is the average over each class.

          Precision  Recall   F1
ce        0.89       0.8967   0.866
cela      0.74       0.60     0.669
elle      0.5000     0.65     0.465
elles     0.696      0.       0.459
il        0.56       0.654    0.564
ils       0.7487     0.9      0.80
other     0.8450     0.8579   0.854
Macro     0.586      0.5495   0.550
Micro     0.7        0.7      0.7

Table 2: Precision, recall, and F-score for all classes. Micro score is the overall classification score, while macro is the average over each class. The latter scoring method is used for increasing the importance of classes with fewer instances.

While three separate training data collections were available, we only found interesting results when using data from the same domain as the test data, i.e. transcribed TED talks. To overcome the skewed class distribution, attempts were made at oversampling the less frequent classes from Europarl, but unfortunately this only led to performance loss on the development set. The model does not seem to generalize well from other types of training data such as Europarl or news text, despite Europarl being transcribed speech as well. This is an obvious shortcoming of the model.

        ce cela elle elles il ils other sum
ce      65 0 8 6 84
cela    5 80 4 0 8 9
elle    7 0 8 8
elles   0 0 0 8 0 5
il      7 9 0 64 04
ils     0 0 5 0 49 5 60
other   0 9 9 5 8 94
sum     99 44 7 4 99 400

Table 3: Confusion matrix of class predictions. Row signifies actual class according to gold standard, while column represents predicted class according to the classifier.
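The relation between the confusion matrix and the micro and macro scores reported in the tables can be made concrete with a small sketch. The assumptions here are that macro F1 is the unweighted mean of the per-class F1 scores and that the micro score equals overall accuracy, which holds when every instance receives exactly one label. The function name and the toy matrix are invented for illustration and are not taken from the paper's results.

```python
from typing import Dict, List, Sequence, Tuple


def scores_from_confusion(matrix: Sequence[Sequence[int]],
                          labels: Sequence[str]
                          ) -> Tuple[Dict[str, Tuple[float, float, float]], float, float]:
    """Compute per-class (precision, recall, F1), macro F1 and micro score.

    Rows are gold classes and columns are predicted classes.
    """
    n = len(labels)
    per_class: Dict[str, Tuple[float, float, float]] = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        gold = sum(matrix[k])                              # row sum
        predicted = sum(matrix[r][k] for r in range(n))    # column sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / gold if gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = (precision, recall, f1)
    macro_f1 = sum(f for _, _, f in per_class.values()) / n
    total = sum(sum(row) for row in matrix)
    micro = sum(matrix[k][k] for k in range(n)) / total if total else 0.0
    return per_class, macro_f1, micro


# Toy two-class example with made-up counts (not data from the paper).
toy_labels = ["ils", "elles"]
toy_matrix = [[8, 2],
              [3, 1]]
per_class, macro_f1, micro = scores_from_confusion(toy_matrix, toy_labels)
```

In the toy example the frequent class dominates the micro score, while the rare class pulls the macro average down, which is why the macro score is the stricter measure for skewed class distributions like the one discussed above.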
We tried several alterations in parameter settings for context window and POS tags, and found no significant improvements beyond the final parameter settings when run on the development set, as seen in fig. 4. Figure 5 makes it clear that a symmetric window size is beneficial, while we are not as sure of why this is the case. Right context seems to be more important than left context, which could be due to the fact that pronouns in their role as subjects largely appear early in sentences, making left context nothing but sentence start markers.

In future work, it would be interesting to look into how much source context actually contributes to the classification, given a target context. Preliminary results of the feature ablation test in table 1 indicate that we indeed capture information for at least some of the classes with the use of source features, while it is not quite clear why this is the case. While the English context is nice to have, since you cannot be entirely certain of the translation quality in the target language, intuitively all necessary linguistic information for inferring the correct pronoun should be available in the target translation. After all, the gender of a pronoun is not dependent on whatever source language you translate from, as long as you have found its antecedent. If the source text were still found useful, all English word embeddings could be pretrained on a large number of translation examples and through this process learn the most probable cross-linguistic gender. In the same manner, gender-aware French word embeddings would hypothetically increase the score as well.

7 Conclusion

In this work, we develop a cross-lingual pronoun prediction classifier based on a feed-forward neural network. The model is heavily inspired by Hardmeier et al. (2013b), while trying to simplify the architecture by using preceding nouns and determiners for coreference resolution rather than using features from an anaphora extractor such as BART, as in the original paper. We find that the model indeed performs on par with similar models, while being easier to train. There are some expected drops in performance for the less common classes heavily dependent on finding their antecedent. We discuss probable causes for this, as well as possible solutions using pretrained embeddings on larger amounts of data.

References

James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. 2010. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy).

Samuel Broscheit, Massimo Poesio, Simone Paolo Ponzetto, Kepa Joseba Rodriguez, Lorenza Romano, Olga Uryupina, Yannick Versley, and Roberto Zanoli. 2010. BART: A multilingual anaphora resolution system. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 104–107. Association for Computational Linguistics.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In International conference on artificial intelligence and statistics, pages 249–256.

Liane Guillou. 2012. Improving pronoun translation for statistical machine translation. In Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL '12, pages 1–10. Association for Computational Linguistics.

Christian Hardmeier and Marcello Federico. 2010. Modelling pronominal anaphora in statistical machine translation. In Proceedings of the 7th International Workshop on Spoken Language Translation, pages 283–289.

Christian Hardmeier, Sara Stymne, Jörg Tiedemann, and Joakim Nivre. 2013a. Docent: A document-level decoder for phrase-based statistical machine translation. In ACL 2013 (51st Annual Meeting of the Association for Computational Linguistics), pages 193–198. Association for Computational Linguistics.

Christian Hardmeier, Jörg Tiedemann, and Joakim Nivre. 2013b. Latent anaphora resolution for cross-lingual pronoun prediction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 380–391.

Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, and Mauro Cettolo. 2015. Pronoun-focused MT and cross-lingual pronoun prediction: Findings of the 2015 DiscoMT shared task on pronoun translation. In Proceedings of the Second Workshop on Discourse in Machine Translation, Lisbon, Portugal.

Christian Hardmeier. 2014. Discourse in Statistical Machine Translation. PhD thesis, Uppsala University, Department of Linguistics and Philology.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, pages 177–180. Association for Computational Linguistics.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT summit, volume 5, pages 79–86.

Ronan Le Nagard and Philipp Koehn. 2010. Aiding pronoun translation with co-reference resolution. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, WMT '10, pages 252–261. Association for Computational Linguistics.

Yann A. LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. 2012. Efficient backprop. In Neural networks: Tricks of the trade, pages 9–48. Springer.

Ruslan Mitkov, Sung-Kwon Choi, and Randall Sharp. 1995. Anaphora resolution in machine translation. In Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation, pages 5 7.

Ruslan Mitkov. 1999. Introduction: Special issue on anaphora resolution in machine translation and multilingual NLP. Machine translation, 14(3-4):159–161.

Franz Josef Och. 2003. GIZA++ software. Internal report, RWTH Aachen University.