Integrating Morphology in Probabilistic Translation Models

Integrating Morphology in Probabilistic Translation Models Chris Dyer joint work with Jon Clark, Alon Lavie, and Noah Smith January 24, 2011 lti

das alte Haus the old house mach das do that 2

das alte Haus the old house mach das do that guten Tag hello 3

das alte Haus the old house mach das do that guten Tag hello 5

das alte Haus the old house Haus mach das do that guten Tag hello 6

das alte Haus the old house Haus mach das do that house guten Tag hello 7

das alte Haus the old house das mach das do that guten Tag hello 8

das alte Haus the old house das mach das do that the guten Tag hello 9

das alte Haus the old house das mach das do that that guten Tag hello 10

das alte Haus the old house markant mach das do that guten Tag hello 11

das alte Haus the old house markant mach das do that??? guten Tag hello 12

So far so good, but... 13

das alte Haus the old house alten mach das do that guten Tag hello 14

das alte Haus the old house alten mach das do that??? guten Tag hello 15

das alte Haus the old house alten mach das do that old? guten Tag hello 16

Problems 1. Source language inflectional richness. 17

das alte Haus the old house mach das do that guten Tag hello 18

das alte Haus the old house mach das do that old guten Tag hello 20

das alte Haus the old house alte mach das do that old guten Tag hello 21

das alte Haus the old house alten? mach das do that old guten Tag hello 22

Problems 1. Source language inflectional richness. 2. Target language inflectional richness. 23

Kopfschmerzen head ache Bauchschmerzen Rückenschmerzen abdominal pain Rücken back Kopf head 24

Kopfschmerzen head ache Bauchschmerzen Rückenschmerzen abdominal pain Rücken??? back Kopf head 25

Kopfschmerzen head ache Bauchschmerzen Rückenschmerzen abdominal pain Rücken back pain back Kopf head 26

Kopfschmerzen head ache Bauchschmerzen Rückenschmerzen abdominal pain Rücken back ache back Kopf head 27

Problems 1. Source language inflectional richness. 2. Target language inflectional richness. 3. Source language sublexical semantic compositionality. 28

General Solution MORPHOLOGY 29

f → Analysis → f′ → Translation → e′ → Synthesis → e 30

f f′ e 31

f AlAbAmA f′ e 31

f AlAbAmA f′ Al# Abama (looks like Al + OOV) e 31

f AlAbAmA f′ Al# Abama (looks like Al + OOV) e the Ibama 31

But...Ambiguity! Morphology is an inherently ambiguous problem:
- Competing linguistic theories
- Lexicalization
- Morphological analyzers (tools) make mistakes
- Are minimal linguistic morphemes the optimal morphemes for MT? 32

Problems 1. Source language inflectional richness. 2. Target language inflectional richness. 3. Source language sublexical semantic compositionality. 4. Ambiguity everywhere! 33

General Solution MORPHOLOGY + PROBABILITY 34

Why probability? Probabilistic models formalize uncertainty. E.g., words can be formed via a morphological derivation according to a joint distribution p(word, derivation). The probability of a word is then naturally defined as the marginal probability p(word) = Σ_derivation p(word, derivation). Such a model can even be trained observing just words (EM!) 35

p(derived) = p(derived, de+rive+d) + p(derived, derived+∅) + p(derived, derive+d) + p(derived, deriv+ed) + ... 36
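To make the marginalization concrete, here is a minimal Python sketch (not the model from the talk; the stem and suffix probabilities are invented) that sums a toy joint distribution p(word, derivation) over two-way stem+suffix splits:

```python
# Toy marginalization over latent derivations: p(word) = sum_d p(word, d).
# The stem/suffix probabilities below are invented for illustration only;
# a real model would learn them, e.g. with EM from unsegmented words.

def derivations(word):
    """All stem+suffix splits of a word, including the empty suffix."""
    return [(word[:i], word[i:]) for i in range(1, len(word) + 1)]

p_stem = {"derive": 0.10, "deriv": 0.05, "derived": 0.02, "de": 0.01}
p_suffix = {"": 0.40, "d": 0.20, "ed": 0.25}

def p_word_and_derivation(stem, suffix):
    # assume the joint factorizes as p(stem) * p(suffix) for this sketch
    return p_stem.get(stem, 0.0) * p_suffix.get(suffix, 0.0)

def p_word(word):
    # marginal probability: sum the joint over all latent derivations
    return sum(p_word_and_derivation(s, x) for s, x in derivations(word))

print(p_word("derived"))  # adds up p(derived, derive+d), p(derived, deriv+ed), ...
```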

Outline
- Introduction: 4 problems
- Three probabilistic modeling solutions:
  - Embracing uncertainty: multi-segmentations for decoding and learning
  - Rich morphology via sparse lexical features
  - Hierarchical Bayesian translation: infinite translation lexicons
- Conclusion 37

f 39

f AlAbAmA 39

f AlAbAmA f′ Al# Abama f′ AlAbama 39

f AlAbAmA f′ Al# Abama f′ AlAbama e the Ibama e Alabama 39

Two problems:
1. We need to decode lots of similar source candidates efficiently. Lattice / confusion network decoding: Kumar & Byrne (EMNLP, 2005); Bertoldi, Zens, Federico (ICASSP, 2007); Dyer et al. (ACL, 2008), inter alia.
2. We need a model to generate a set of candidate sources. What are the right candidates? 40

Uncertainty is everywhere. Requirement: a probabilistic model p(f′ | f) that transforms f → f′. Possible solution: a discriminatively trained model, e.g., a CRF. Required data: example (f, f′) pairs from a linguistic expert or other source 42

Uncertainty is everywhere What is the best/right analysis... for MT? AlAntxAbAt (DEF+election+PL) 43

Uncertainty is everywhere What is the best/right analysis... for MT? AlAntxAbAt (DEF+election+PL) Some possibilities: Sadat & Habash (NAACL, 2007) AlAntxAb +At Al+ AntxAb +At Al+ AntxAbAt AlAntxAbAt 43

Uncertainty is everywhere What is the best/right analysis... for MT? AlAntxAbAt (DEF+election+PL) Some possibilities: Sadat & Habash (NAACL, 2007) AlAntxAb +At Al+ AntxAb +At Al+ AntxAbAt AlAntxAbAt Let's use them all! 43

Wait...multiple references?!? Train with an EM variant: lattices can encode very large sets of references and support efficient inference. Bonus: the annotation task is much simpler. Don't know whether to label an example with A or B? Label it with both! Dyer (NAACL, 2009), Dyer (thesis, 2010) 44

Reference Segmentations (f → f′): freitag → freitag; tonbandaufnahme → ton tonband band aufnahme (a lattice over reference segments) 46

Phonotactic features! Rückenschmerzen → Rücken + schmerzen: good phonotactics! Rückenschmerzen → Rückensc + hmerzen or Rü + cke + nschme + rzen: bad phonotactics! 47

Just 20 features:
- Phonotactic probability
- Lexical features (in vocab, OOV)
- Lexical frequencies
- Is high frequency?
- Segment length
- ...
https://github.com/redpony/cdec/tree/master/compound-split 48
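As a rough illustration of this kind of feature set (these are stand-ins, not the actual 20 features in cdec's compound splitter; the toy lexicon and bigram scores are invented), a candidate segmentation might be scored like this:

```python
# Illustrative segmentation features: phonotactics, lexicon membership,
# segment length. Toy lexicon and character-bigram scores, for sketching only.

VOCAB = {"ton", "band", "tonband", "aufnahme", "freitag"}       # toy lexicon
BIGRAM_LOGPROB = {("n", "b"): -2.0, ("d", "a"): -1.5}           # toy phonotactic model

def char_bigram_logprob(segment, default=-4.0):
    """Phonotactic score: sum of character-bigram log-probabilities."""
    return sum(BIGRAM_LOGPROB.get((a, b), default)
               for a, b in zip(segment, segment[1:]))

def features(segmentation):
    """Feature map for one candidate segmentation (a list of segments)."""
    return {
        "phonotactic_logprob": sum(char_bigram_logprob(s) for s in segmentation),
        "in_vocab_count": sum(s in VOCAB for s in segmentation),
        "oov_count": sum(s not in VOCAB for s in segmentation),
        "num_segments": len(segmentation),
        "short_segment_count": sum(len(s) < 4 for s in segmentation),
    }

print(features(["tonband", "aufnahme"]))
print(features(["tonbandau", "fnahme"]))   # OOV segments, worse phonotactics
```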

Input: tonbandaufnahme 49

[Figure: full segmentation lattice for the input tonbandaufnahme, with scored arcs over candidate segments] 49

[Figure: pruned segmentation lattices for the input tonbandaufnahme at different lattice density settings (a=0.2, a=0.4, ...)] 50
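A minimal sketch of what such a segmentation lattice looks like as a data structure: nodes are character positions, arcs are candidate segments with costs (invented here), and the best segmentation is the cheapest path, recovered with a simple dynamic program:

```python
# Segmentation lattice sketch for "tonbandaufnahme"; arc costs are made up.
word = "tonbandaufnahme"
arcs = [(0, 3, 0.9), (0, 7, 0.4), (3, 7, 0.6), (7, 15, 0.3), (0, 15, 1.2)]

def best_path(n, arcs):
    """Cheapest path from position 0 to n over the lattice arcs."""
    INF = float("inf")
    best = [INF] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for (i, j, cost) in arcs:
            if j == end and best[i] + cost < best[end]:
                best[end], back[end] = best[i] + cost, i
    segments, pos = [], n          # follow backpointers to read the segmentation
    while pos > 0:
        segments.append(word[back[pos]:pos])
        pos = back[pos]
    return best[n], segments[::-1]

print(best_path(len(word), arcs))  # (0.7, ['tonband', 'aufnahme'])
```

The same forward recursion, run with (log-)probabilities and summation instead of minimization, gives the total probability of all segmentations encoded in the lattice, which is what lattice decoding and EM-style training exploit.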

[Figure: segmentation precision-recall curve; recall roughly 0.955-1.0 plotted against precision 0.6-1.0]

Translation Evaluation
Input                BLEU  TER
Unsegmented          20.8  61.0
1-best segmentation  20.3  60.2
Lattice (a=0.2)      21.5  59.8
Example (unsegmented input): in police raids found illegal guns, ammunition stahlkern, laserzielfernrohr and a machine gun.
Example (lattice input): in police raids found with illegal guns and ammunition steel core, a laser objective telescope and a machine gun.
REF: police raids found illegal guns, steel core ammunition, a laser scope and a machine gun. 52

Outline
- Introduction: 4 problems
- Three probabilistic modeling solutions:
  - Embracing uncertainty: multi-segmentations for decoding and learning
  - Rich morphology via sparse lexical features
  - Hierarchical Bayesian translation: infinite translation lexicons
- Conclusion 53

What do we see when we look inside the IBM models? (or any multinomial-based generative model...like parsing models!) 54

What do we see when we look inside the IBM models? (or any multinomial-based generative model...like parsing models!)
old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1
car → Wagen 0.2, Auto 0.6, PKW 0.2 55

DLVM for Translation
Addresses problems:
1. Source language inflectional richness.
2. Target language inflectional richness.
How?
1. Replace the locally normalized multinomial parameterization p(e | f) in the translation model with a globally normalized log-linear model.
2. Add lexical association features sensitive to sublexical units.
C. Dyer, J. Clark, A. Lavie, and N. Smith (in review) 57
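In symbols (notation assumed here, not copied from the paper), the change is from a locally normalized multinomial to a globally normalized log-linear model over target words and alignments:

```latex
% locally normalized (multinomial) lexical translation parameter
p(e_i \mid f_{a_i}) = \theta_{e_i \mid f_{a_i}}

% globally normalized log-linear alternative (assumed notation)
p(e, a \mid f) = \frac{\exp\big(w^\top h(e, a, f)\big)}
                      {\sum_{e', a'} \exp\big(w^\top h(e', a', f)\big)}
```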

[Figure: plate diagrams contrasting the fully directed generative alignment model (Brown et al., 1993; Vogel et al., 1996; Berg-Kirkpatrick et al., 2010), in which alignment variables a_1...a_n and the source words s generate target words t_1...t_n, with our globally normalized model over the same variables] 58

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1
car → Wagen 0.2, Auto 0.6, PKW 0.2 60

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1
car → Wagen 0.2, Auto 0.6, PKW 0.2
New model: score(e,f) = 0.2·h1(e,f) + 0.9·h2(e,f) + 1.3·h3(e,f) + ..., with features such as old ∧ alt+ ∈ [0,2], old ∧ gammelig+ ∈ [0,2] 61

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1
car → Wagen 0.2, Auto 0.6, PKW 0.2
New model: score(e,f) = 0.2·h1(e,f) + 0.9·h2(e,f) + 1.3·h3(e,f) + ..., with features such as old ∧ alt+ ∈ [0,2], old ∧ gammelig+ ∈ [0,2] (~ Incremental vs. realizational) 62

Sublexical Features každoroční annual IDkaždoroční_annual PREFIXkaž_ann PREFIXkažd_annu PREFIXkaždo_annua SUFFIXí_l SUFFIXní_al 63

Sublexical Features každoroční annually IDkaždoroční_annually PREFIXkaž_ann PREFIXkažd_annu PREFIXkaždo_annua SUFFIXí_y SUFFIXní_ly 64

Sublexical Features každoročního annually IDkaždoročního_annually PREFIXkaž_ann PREFIXkažd_annu PREFIXkaždo_annua SUFFIXo_y SUFFIXho_ly 65

Sublexical Features každoročního annually IDkaždoročního_annually PREFIXkaž_ann PREFIXkažd_annu PREFIXkaždo_annua SUFFIXo_y SUFFIXho_ly (the PREFIX features abstract away from inflectional variation!) 66
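A small Python sketch of extracting ID/PREFIX/SUFFIX features like those shown above; the particular prefix and suffix lengths are assumptions for illustration and may not match the model's actual templates:

```python
# Sketch of sublexical feature extraction for a (source, target) word pair.
# Feature naming follows the slides; lengths (3-5 char prefixes, 1-2 char
# suffixes) are assumed here for illustration.
def sublexical_features(f_word, e_word):
    feats = {f"ID{f_word}_{e_word}"}
    for k in range(3, 6):                        # PREFIX features of growing length
        feats.add(f"PREFIX{f_word[:k]}_{e_word[:k]}")
    for k in range(1, 3):                        # SUFFIX features of growing length
        feats.add(f"SUFFIX{f_word[-k:]}_{e_word[-k:]}")
    return feats

print(sorted(sublexical_features("každoroční", "annually")))
# ['IDkaždoroční_annually', 'PREFIXkaž_ann', 'PREFIXkažd_annu',
#  'PREFIXkaždo_annua', 'SUFFIXní_ly', 'SUFFIXí_y']
```

Because the PREFIX features are shared across každoroční, každoročního, etc., they abstract away from the inflectional variation that makes fully lexicalized ID features sparse.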

Evaluation: given a parallel corpus (no supervised alignments!), we can infer
- The weights in the log-linear translation model
- The MAP alignment
The model is a translation model, but we evaluate it as applied to alignment. 67

Alignment Evaluation (AER)
Model 4: e→f 24.8, f→e 33.6, sym. 23.4
DLVM:    e→f 21.9, f→e 29.3, sym. 20.5
Czech-English, 3.1M words of training data, 525 sentences with gold alignments. 68

Translation Evaluation
Alignment   BLEU          METEOR        TER
Model 4     16.3 (σ=0.2)  46.1 (σ=0.1)  67.4 (σ=0.3)
Our model   16.5 (σ=0.1)  46.8 (σ=0.1)  67.0 (σ=0.2)
Both        17.4 (σ=0.1)  47.7 (σ=0.1)  66.3 (σ=0.5)
Czech-English, WMT 2010 test set, 1 reference 69

Outline
- Introduction: 4 problems
- Three probabilistic modeling solutions:
  - Embracing uncertainty: multi-segmentations for decoding and learning
  - Rich morphology via sparse lexical features
  - Hierarchical Bayesian translation: infinite translation lexicons
- Conclusion 70

Bayesian Translation
Addresses problems:
2. Target language inflectional richness.
How?
1. Replace multinomials in a lexical translation model with a process that generates target language lexical items by combining stems and suffixes.
2. Fully inflected forms can be generated, but a hierarchical prior backs off to component-wise generation. 71

Chinese Restaurant Process a b a c x... 72

Chinese Restaurant Process New customer a b a c x... 73

Chinese Restaurant Process: existing tables serve dishes a, b, a, c with 1, 3, 1, 2 customers; the next customer joins them with probability 1/(7+α), 3/(7+α), 1/(7+α), 2/(7+α) respectively, or starts a new table serving x with probability αP0(x)/(7+α). 74

Chinese Restaurant Process: existing tables serve dishes a, b, a, c with 1, 3, 1, 2 customers; the next customer joins them with probability 1/(7+α), 3/(7+α), 1/(7+α), 2/(7+α) respectively, or starts a new table serving x with probability αP0(x)/(7+α). α: concentration parameter; P0(x): base distribution. 75
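A minimal Python sketch of the CRP predictive rule illustrated above, using the seating configuration from the slide (7 customers) and an assumed uniform base distribution:

```python
# Chinese Restaurant Process sketch: an existing value x is reused in
# proportion to its count; a new value is drawn from the base distribution P0
# with probability alpha / (n + alpha).
import random

def crp_predictive(counts, alpha, p0, x):
    """P(next observation = x | current counts)."""
    n = sum(counts.values())
    return (counts.get(x, 0) + alpha * p0(x)) / (n + alpha)

def crp_sample(counts, alpha, base_sampler):
    """Sample the next observation and update the counts."""
    n = sum(counts.values())
    if random.random() < alpha / (n + alpha):
        x = base_sampler()                       # new table: draw from the base
    else:
        x = random.choices(list(counts), weights=list(counts.values()))[0]
    counts[x] = counts.get(x, 0) + 1
    return x

# the slide's configuration: dishes a, b, a, c with 1 + 3 + 1 + 2 = 7 customers
counts = {"a": 2, "b": 3, "c": 2}
alpha = 1.0
uniform_p0 = lambda x: 1.0 / 26                  # assumed uniform base over letters
print(crp_predictive(counts, alpha, uniform_p0, "b"))   # (3 + alpha/26) / (7 + alpha)
```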

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1
car → Wagen 0.2, Auto 0.6, PKW 0.2 76

old → altes 0.3, alte 0.1, alt 0.2, alter 0.1, gammelig 0.1, gammeliges 0.1
car → Wagen 0.2, Auto 0.6, PKW 0.2
New model: target forms are built as stem + suffix from a suffix inventory (+es, +en, +e, +er, +∅, ...), e.g. old → alt+e, alt+es, alt+∅ 77

Modeling assumptions:
- Observed words are formed by an unobserved process that concatenates a stem α and a suffix β, yielding αβ
- A source word should have only a few translations αβ, and translate into only a few stems α
- The suffix β occurs many times, with many different stems
- β may be null
- β will have a maximum length of r
- Once a word has been translated into some inflected form, that inflected form, its stem, and its suffix should be more likely ("rich get richer") 78

f → Translation → e′ → Synthesis → e   [X: observed during training; Z: latent variable] 79

f → Translation → e′ (stem + suffix) → Synthesis → e   [X: observed during training; Z: latent variable] 80

Task: Translate the word old 81

[Figure (worked example, slides 82-86): the restaurant for inflected translations of old contains tables for alt+e, alt+, and gammelig+ (the bare + marking a null suffix). Backing off, the stem restaurant for old contains alt and gammelig, and the suffix restaurant contains +en, +e, +s, +er, and +. Drawing stem alt and suffix +en composes the new inflected translation alt+en.]
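A simplified Python sketch of this backoff story (structure assumed, counts invented; the actual model uses Pitman-Yor processes rather than plain CRPs): to translate old, either reuse a previously generated inflected form, or back off and compose a new one from the stem and suffix restaurants.

```python
# Hierarchical backoff sketch with plain CRPs; restaurants/counts are invented.
import random

def crp_draw(counts, alpha, base_sampler):
    """Draw from a CRP: reuse an existing value or fall back to the base."""
    n = sum(counts.values())
    if random.random() < alpha / (n + alpha):
        x = base_sampler()                       # new table: draw from the base
    else:
        x = random.choices(list(counts), weights=list(counts.values()))[0]
    counts[x] = counts.get(x, 0) + 1
    return x

# restaurants for translating "old" (forms written stem+suffix; "+" = null suffix)
inflected_old = {"alt+e": 2, "alt+": 1, "gammelig+": 1}
stem_old = {"alt": 2, "gammelig": 1}
suffixes_old = {"+en": 3, "+e": 2, "+s": 1, "+": 5, "+er": 1}

def translate_old():
    def compose():
        # back off: pick a stem and a suffix from their own restaurants
        stem = crp_draw(stem_old, alpha=0.5, base_sampler=lambda: "alt")
        suffix = crp_draw(suffixes_old, alpha=0.5, base_sampler=lambda: "+")
        return stem + suffix                     # e.g. "alt" + "+en" -> "alt+en"
    return crp_draw(inflected_old, alpha=1.0, base_sampler=compose)

random.seed(0)
print([translate_old() for _ in range(5)])       # mixes reused and newly composed forms
```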

Evaluation: given a parallel corpus, we can infer
- The MAP alignment
- The MAP segmentation of each target word into <stem+suffix> 87

Alignment Evaluation (AER)
Model 1 - EM:    f→e 43.3, e→f 38.4
Model 1 - HPYP:  f→e 37.5, e→f 36.6
English-French, 115k words, 447 sentences with gold alignments. 88

Frequent suffixes
Suffix  Count
+       20,837
+s      334
+d      217
+e      156
+n      156
+y      130
+ed     121
+ing    119 89

Assessment
- Breaking the lexical independence assumption is computationally costly: the search space is much, much larger!
- Dealing only with inflectional morphology simplifies the problems
- Sparse priors are crucial for avoiding degenerate solutions 90

In conclusion... 91

Why don't we have integrated morphology? 92

Why don't we have integrated morphology? Because we spend all our time working on English, which doesn't have much morphology! 93

Why don't we have integrated morphology? Translation with words is already hard: an n-word sentence has n! permutations. But if you're looking at a sentence with m letters, there are m! permutations. Search is... considerably harder: m > n, so m! >>>>> n!. Modeling is harder too: it must also support all these permutations! 94
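For concreteness, a quick check of those factorials for a 6-word, roughly 30-letter sentence:

```python
# Word-level vs. letter-level search space for a 6-word, 30-letter sentence.
from math import factorial
print(factorial(6))    # 720 word permutations
print(factorial(30))   # about 2.65e32 letter permutations
```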

Take-away messages
- Morphology matters for MT
- Probabilistic models are a great fit for the uncertainty involved
- Breaking the lexical independence assumption is hard 95

Thank you! Toda! $kraf! https://github.com/redpony/cdec/

n ~ Poisson(λ)
a_i ~ Uniform(1/|f|)
e_i | f_{a_i} ~ T_{f_{a_i}}
T_{f_{a_i}} | a, b, M ~ PYP(a, b, M(· | f_{a_i}))
M(e = α+β | f) = G_f(α) · H_f(β)
G_f | a, b, f, P_0 ~ PYP(a, b, P_0(·))
H_f | a, b, f, H_0 ~ PYP(a, b, H_0(·))
H_0 | a, b, Q_0 ~ PYP(a, b, Q_0(·))
P_0(α; p) = p^{|α|} (1 − p) V^{−|α|}
Q_0(β; r): uniform over suffixes β of length at most r (including the null suffix) 97