Decoding Revisited: Easy-Part-First & MERT. February 26, 2015


2 Translating the Easy Part First?
Input: the tourism initiative addresses this for the first time
- First hypothesis, built one word at a time:
  - the -> die: tm:-0.19, lm:-0.4, d:0, all:-0.65
  - tourism -> touristische: tm:-1.16, lm:-2.93, d:0, all:-4.09
  - initiative -> initiative: tm:-1.21, lm:-4.67, d:0, all:-5.88
- Second hypothesis, built from one phrase:
  - the first time -> das erste mal: tm:-0.56, lm:-2.81, d: [missing], all:-4.11
- Both hypotheses translate 3 words
- The worse hypothesis has the better score
Chapter 6: Decoding

3 Estimating Future Cost
- Future cost estimate: how expensive is translation of the rest of the sentence?
- Optimistic: choose the cheapest translation options
- Cost for each translation option:
  - translation model: cost known
  - language model: output words known, but not context! Estimate without context
  - reordering model: unknown, ignored for future cost estimation

4 Cost Estimates from Translation Options
Input: the tourism initiative addresses this for the first time
[Figure: cost of the cheapest translation option for each input span (log-probabilities)]

5 Cost Estimates for all Spans
- Compute cost estimate for all contiguous spans by combining cheapest options
- [Table: future cost estimate for n words, indexed by first word, for the input "the tourism initiative addresses this for the first time"]
- Function words cheaper (the: -1.0) than content words (tourism: -2.0)
- Common phrases cheaper (for the first time: -2.3) than unusual ones (tourism initiative addresses: -5.9)
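The span-combination step can be sketched as a small dynamic program. This is only an illustration under stated assumptions: the function name `future_cost_table`, the `option_cost` dictionary layout, and every cost except the: -1.0 and tourism: -2.0 are hypothetical and not taken from the slides.

```python
def future_cost_table(option_cost, n):
    """Best (highest) log-probability estimate for every contiguous span.

    option_cost maps (start, end) to the cost of the cheapest single
    translation option for source words start..end-1, if one exists.
    """
    cost = {}
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            end = start + length
            # Either translate the whole span with one phrase ...
            best = option_cost.get((start, end), float("-inf"))
            # ... or combine the estimates of two shorter adjacent spans.
            for mid in range(start + 1, end):
                best = max(best, cost[(start, mid)] + cost[(mid, end)])
            cost[(start, end)] = best
    return cost


# Hypothetical option costs for the three words "the tourism initiative".
option_cost = {(0, 1): -1.0, (1, 2): -2.0, (2, 3): -1.5, (1, 3): -3.0}
table = future_cost_table(option_cost, 3)
print(table[(0, 3)])
```

Because shorter spans are filled in before longer ones, each lookup of a sub-span estimate is already final when it is used.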

6 Combining Score and Future Cost
Input: the tourism initiative addresses this for the first time
- the tourism initiative -> die touristische initiative: tm:-1.21, lm:-4.67, d:0, all:-5.88
- the first time -> das erste mal: tm:-0.56, lm:-2.81, d: [missing], all:-4.11
- this for ... time -> für diese zeit: tm:-0.82, lm: [missing], d: [missing], all:-4.86
- Hypothesis score and future cost estimate are combined for pruning
- Left hypothesis starts with the hard part: the tourism initiative. Score: -5.88, future cost: -6.1, total cost: -11.98
- Middle hypothesis starts with the easiest part: the first time. Score: -4.11, future cost: -9.3, total cost: -13.41
- Right hypothesis picks easy parts: this for ... time. Score: -4.86, future cost: -9.1, total cost: -13.96
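A quick check of this arithmetic, using only the scores and future costs quoted on the slide. The dictionary layout and variable names are illustrative assumptions.

```python
# (hypothesis score, future cost estimate), both log-probabilities.
hypotheses = {
    "the tourism initiative": (-5.88, -6.1),
    "the first time": (-4.11, -9.3),
    "this for ... time": (-4.86, -9.1),
}

# Ranking by score alone favours the hypothesis that did the easy part.
by_score = sorted(hypotheses, key=lambda k: hypotheses[k][0], reverse=True)

# Ranking by score + future cost favours the one that did the hard part.
totals = {k: score + fc for k, (score, fc) in hypotheses.items()}
by_total = sorted(totals, key=totals.get, reverse=True)

print(by_score[0], "|", by_total[0])
```

With score alone, "the first time" looks best; once the future cost is added, "the tourism initiative" ranks first, which is the point of the slide.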

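7 f: Maria no dio una bofetada a la bruja verde
[Figure: hypothesis queues Q[0], Q[1], Q[2], ... Each hypothesis records e (output so far), c (coverage), p (probability) and fc (future cost). Q[0] holds the start hypothesis e: <s>, p: 1.0, fc: 1.5e-9. Q[1] holds e: <s> Mary, p: 0.9, fc: 8.6e-9; e: <s> Maria, p: 0.3, fc: 8.6e-9; and e: <s> Not, p: 0.4, fc: 1.0e-9.]
Future costs make these hypotheses comparable.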

8 Other Decoding Algorithms
- A* search
- Greedy hill-climbing
- Using finite state transducers (standard toolkits)

9 A* Search
[Figure labels: probability + heuristic estimate; cheapest score; depth-first expansion to completed path; number of words covered]
- Uses admissible future cost heuristic: never overestimates cost
- Translation agenda: create hypothesis with lowest score + heuristic cost
- Done when a complete hypothesis is created
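A minimal sketch of the agenda loop, under simplifying assumptions that are not in the slides: translation is monotone and word by word, costs are non-negative (think negative log-probabilities, so lower is better), and the heuristic sums the cheapest translation cost of each remaining word while ignoring the language model. The names `astar_decode`, `options`, and `lm_cost` are hypothetical.

```python
import heapq


def astar_decode(options, lm_cost):
    """options[i] lists (target_word, cost) pairs for source word i."""
    n = len(options)
    # heuristic[i]: cheapest conceivable cost of translating words i..n-1.
    # It ignores the (non-negative) LM cost, so it never overestimates.
    heuristic = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        heuristic[i] = heuristic[i + 1] + min(c for _, c in options[i])

    # Agenda entries: (cost so far + heuristic, cost so far, position, output).
    agenda = [(heuristic[0], 0.0, 0, ("<s>",))]
    while agenda:
        _, cost, i, output = heapq.heappop(agenda)
        if i == n:  # first complete hypothesis popped is optimal
            return list(output[1:]), cost
        for word, tm in options[i]:
            new_cost = cost + tm + lm_cost(output[-1], word)
            heapq.heappush(
                agenda,
                (new_cost + heuristic[i + 1], new_cost, i + 1, output + (word,)),
            )
    return None


# Toy data, invented for illustration.
penalties = {("this", "house"): 0.6, ("this", "home"): 1.0}
options = [[("the", 0.2), ("this", 0.1)], [("house", 0.5), ("home", 0.4)]]
best_words, best_cost = astar_decode(
    options, lambda prev, word: penalties.get((prev, word), 0.0)
)
print(best_words, best_cost)
```

The locally cheapest first word ("this") is expanded first, but the admissible estimate lets the search recover and return the globally cheapest output.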

10 Greedy Hill-Climbing
- Create one complete hypothesis with depth-first search (or other means)
- Search for better hypotheses by applying change operators:
  - change the translation of a word or phrase
  - combine the translation of two words into a phrase
  - split up the translation of a phrase into two smaller phrase translations
  - move parts of the output into a different position
  - swap parts of the output with the output at a different part of the sentence
- Terminates if no operator application produces a better translation
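The loop above can be sketched with a single change operator. Everything here is an assumption made for illustration: only the "change the translation of a word" operator is implemented, and the scoring tables `WORD_SCORE` and `PAIR_PENALTY` are invented toy values.

```python
def hill_climb(options, score, start):
    """Greedily replace one word at a time while the score improves."""
    current, best = list(start), score(start)
    improved = True
    while improved:
        improved = False
        for i, choices in enumerate(options):
            for word in choices:
                # Operator: change the translation of position i.
                candidate = current[:i] + [word] + current[i + 1:]
                s = score(candidate)
                if s > best:
                    current, best, improved = candidate, s, True
    return current, best


# Toy log-probability model (hypothetical values).
WORD_SCORE = {"the": -0.2, "this": -0.1, "house": -0.5, "home": -0.4}
PAIR_PENALTY = {("this", "house"): -0.6, ("this", "home"): -1.0}


def score(words):
    total = sum(WORD_SCORE[w] for w in words)
    total += sum(PAIR_PENALTY.get(p, 0.0) for p in zip(words, words[1:]))
    return total


options = [["the", "this"], ["house", "home"]]
result, result_score = hill_climb(options, score, ["this", "house"])
print(result, result_score)
```

The search stops when a full pass over all positions produces no better translation, so it can end in a local optimum; the other operators on the slide enlarge the neighbourhood to make that less likely.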

11 Decoding algorithm
- Translation as a search problem
- Partial hypothesis keeps track of:
  - which source words have been translated (coverage vector)
  - n-1 most recent words of English (for LM!)
  - a back pointer list to the previous hypothesis + (e,f) phrase pair used
  - the (partial) translation probability
  - the estimated probability of translating the remaining words (precomputed, a function of the coverage vector)
- Start state: no translated words, E=<s>, bp=nil
- Goal state: all translated words

12 Decoding algorithm
Q[0] ← Start state
for i = 0 to |f|-1
    Keep b best hypotheses at Q[i]
    for each hypothesis h in Q[i]
        for each untranslated span in h.c for which there is a translation <e,f> in the phrase table
            h' = h extended by <e,f>
            Is there an item in Q[|h'.c|] with the same LM state?
                yes: update the item's bp list and probability
                no: Q[|h'.c|] ← h'
Find the best hypothesis in Q[|f|], reconstruct the translation by following back pointers
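A compact Python sketch of this queue-based search, under several assumptions that go beyond the slide: the language model is a bigram (so the LM state is one word), recombination keys on coverage as well as LM state and keeps only the better hypothesis, and pruning uses the hypothesis score alone, without the future cost estimate or a distortion model. The names `Hyp`, `decode`, `phrase_table`, and `lm_logprob`, and all toy probabilities, are hypothetical.

```python
from collections import namedtuple

Hyp = namedtuple("Hyp", "logprob coverage lm_state backptr phrase")


def decode(source, phrase_table, lm_logprob, beam=10):
    n = len(source)
    start = Hyp(0.0, frozenset(), "<s>", None, None)
    # queues[i] maps (coverage, lm_state) -> best hypothesis covering i words.
    queues = [dict() for _ in range(n + 1)]
    queues[0][(start.coverage, start.lm_state)] = start
    for i in range(n):
        # Keep the b best hypotheses at Q[i].
        kept = sorted(queues[i].values(), key=lambda h: h.logprob, reverse=True)
        for h in kept[:beam]:
            for s in range(n):
                for t in range(s + 1, n + 1):
                    span = frozenset(range(s, t))
                    if span & h.coverage:
                        continue  # span overlaps already translated words
                    for target, tm in phrase_table.get(tuple(source[s:t]), []):
                        lp, prev = h.logprob + tm, h.lm_state
                        for word in target.split():
                            lp += lm_logprob(prev, word)
                            prev = word
                        new = Hyp(lp, h.coverage | span, prev, h, target)
                        key = (new.coverage, new.lm_state)
                        queue = queues[len(new.coverage)]
                        # Recombination: keep the better of two equivalent items.
                        if key not in queue or queue[key].logprob < lp:
                            queue[key] = new
    if not queues[n]:
        return None
    best = max(queues[n].values(), key=lambda h: h.logprob)
    phrases = []
    while best.backptr is not None:  # follow back pointers to the start
        phrases.append(best.phrase)
        best = best.backptr
    return " ".join(reversed(phrases))


# Toy model, invented for illustration.
phrase_table = {("Maria",): [("Mary", -0.1)],
                ("no",): [("not", -0.5), ("did not", -0.7)]}
bigrams = {("<s>", "Mary"): -0.2, ("Mary", "did"): -0.3,
           ("did", "not"): -0.1, ("Mary", "not"): -1.5}
translation = decode(["Maria", "no"], phrase_table,
                     lambda prev, w: bigrams.get((prev, w), -2.0))
print(translation)
```

In this toy run "Mary not" and "Mary did not" end in the same coverage and LM state, so they are recombined and only the higher-scoring one survives.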

13 Parameter Learning: Review

14 K-Best List Example
[Figure: k-best hypotheses #1 to #10 plotted against feature axes h1 and h2, with weight vector w and groups bounded at 0.2, 0.6 and 0.8]

15 Fit a linear model
[Figure: axes h1 and h2]

16 Fit a linear model
[Figure: axes h1 and h2, with weight vector w]

17 K-Best List Example
[Figure: the same hypotheses #1 to #10 with a new weight vector w and groups bounded at 0.2, 0.6, 0.8 and 1.0]

18 Limitations
- We can't optimize corpus-level metrics, like BLEU, on a test set
  - These don't decompose by sentence!
- We turn now to a kind of direct cost minimization

19 MERT
- Minimum Error Rate Training
- Directly target an automatic evaluation metric
  - BLEU is defined at the corpus level
  - MERT optimizes at the corpus level
- Downsides
  - Does not deal well with > ~20 features

20-24 MERT
Given weight vector $w$, any hypothesis $\langle e, a \rangle$ will have a (scalar) score $m = w^\top h(g, e, a)$.
Now pick a search vector $v$, and consider how the score of this hypothesis will change as we move a step of size $\gamma$ along it:
$$w_{\text{new}} = w + \gamma v$$
$$m = (w + \gamma v)^\top h(g, e, a) = \underbrace{w^\top h(g, e, a)}_{b} + \gamma \underbrace{v^\top h(g, e, a)}_{a}$$
$$m = a\gamma + b$$
Linear function in 2D!

25-35 MERT
Recall our k-best set $\{\langle e_i, a_i \rangle\}_{i=1}^{K}$
[Figure sequence: the score m of each hypothesis in the k-best set drawn as a line along the search direction; the hypotheses $\langle e_{162}, a_{162} \rangle$, $\langle e_{28}, a_{28} \rangle$ and $\langle e_{73}, a_{73} \rangle$ are labelled; a second panel plots errors along the same axis]
Let $w_{\text{new}} = \gamma^{*} v + w$, where $\gamma^{*}$ is the step size along $v$ with the fewest errors

36 MERT
- In practice "errors" are sufficient statistics for evaluation metrics (e.g., BLEU)
- Can maximize or minimize
- How do you pick the search direction?
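A brute-force sketch of the line search along one direction. Each hypothesis becomes a line m = a * gamma + b with a = v . h and b = w . h, where gamma is the step size along v; the 1-best hypothesis of a sentence can only change where two of its lines cross, so it is enough to test one gamma inside each interval between crossings. The assumptions are loud ones: the error here is a plain per-sentence count that is summed, whereas a metric such as BLEU would sum sufficient statistics and compute the metric afterwards; a real implementation would sweep the upper envelope instead of testing all pairs; and the function name, data layout, and toy numbers are hypothetical.

```python
def line_search(kbest, w, v):
    """kbest: one list per sentence of (feature_vector, error) pairs."""
    def dot(x, y):
        return sum(p * q for p, q in zip(x, y))

    # Each hypothesis becomes (slope a, intercept b, error).
    sentences = [[(dot(v, h), dot(w, h), err) for h, err in hyps]
                 for hyps in kbest]

    # Collect every gamma at which two lines of one sentence intersect.
    crossings = set()
    for lines in sentences:
        for i, (a1, b1, _) in enumerate(lines):
            for a2, b2, _ in lines[i + 1:]:
                if a1 != a2:
                    crossings.add((b2 - b1) / (a1 - a2))
    points = sorted(crossings)

    # One candidate gamma strictly inside each interval.
    if points:
        candidates = [points[0] - 1.0]
        candidates += [(p + q) / 2 for p, q in zip(points, points[1:])]
        candidates.append(points[-1] + 1.0)
    else:
        candidates = [0.0]

    def total_error(gamma):
        # Error of the highest-scoring hypothesis of each sentence.
        return sum(max(lines, key=lambda l: l[0] * gamma + l[1])[2]
                   for lines in sentences)

    best_gamma = min(candidates, key=total_error)
    return best_gamma, total_error(best_gamma)


w, v = [1.0, 0.0], [0.0, 1.0]
kbest = [[([1.0, 0.0], 2), ([0.0, 1.0], 0), ([0.5, -1.0], 1)]]
gamma, err = line_search(kbest, w, v)
w_new = [wi + gamma * vi for wi, vi in zip(w, v)]
print(gamma, err, w_new)
```

In the toy k-best list the zero-error hypothesis only becomes the top-scoring one for gamma greater than 1, so the search moves the weights to that region.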

37 Dynamic Programming MERT

38 Other Algorithms
- Given a hypergraph translation space
- In the Viterbi (Inside) algorithm, there are two operations:
  - Multiplication (extend path)
  - Maximization (choose between paths)
- Semirings generalize these to compute other quantities
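To make the generalization concrete, here is a sketch of an inside computation parameterized by a semiring. The hypergraph encoding (`nodes` in topological order, `incoming` edges as `(weight, tail_nodes)` pairs), the `Semiring` tuple, and the toy weights are assumptions made for illustration.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "plus times zero one")

# Viterbi: choose between paths with max, extend paths with multiplication.
viterbi = Semiring(max, lambda x, y: x * y, 0.0, 1.0)
# Swapping max for + gives the total probability of all derivations.
inside_sum = Semiring(lambda x, y: x + y, lambda x, y: x * y, 0.0, 1.0)


def inside(nodes, incoming, sr):
    """nodes must be topologically ordered (tails before heads)."""
    value = {}
    for node in nodes:
        edges = incoming.get(node, [])
        if not edges:
            value[node] = sr.one  # leaf node
            continue
        total = sr.zero
        for weight, tails in edges:
            score = weight
            for t in tails:
                score = sr.times(score, value[t])  # extend path
            total = sr.plus(total, score)          # choose between paths
        value[node] = total
    return value


nodes = ["A", "B", "goal"]
incoming = {
    "B": [(0.5, ["A"]), (0.25, [])],
    "goal": [(0.5, ["A", "B"]), (0.125, ["A"])],
}
best = inside(nodes, incoming, viterbi)["goal"]
total = inside(nodes, incoming, inside_sum)["goal"]
print(best, total)
```

The algorithm is untouched between the two calls; only the pair of operations changes, which is what lets the same code compute other quantities.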

39 Semirings

40 Inside Algorithm

41 Point-Line Duality
- Represent a set of lines as a set of points (and vice-versa): y = mx + b => (m, -b)
- The slope between dual points is the x-coordinate of the intersection of the pair of lines
- An upper envelope is dual to a lower convex hull
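The slope claim can be checked numerically. The helper names and the two example lines are made up for illustration.

```python
def dual_point(m, b):
    """Map the line y = m*x + b to its dual point (m, -b)."""
    return (m, -b)


def intersection_x(line1, line2):
    """x-coordinate where two non-parallel lines (m, b) meet."""
    (m1, b1), (m2, b2) = line1, line2
    return (b2 - b1) / (m1 - m2)


def slope(p, q):
    return (q[1] - p[1]) / (q[0] - p[0])


line1, line2 = (2.0, 1.0), (-1.0, 4.0)  # y = 2x + 1 and y = -x + 4
x_cross = intersection_x(line1, line2)
dual_slope = slope(dual_point(*line1), dual_point(*line2))
print(x_cross, dual_slope)
```

Setting m1*x + b1 = m2*x + b2 gives x = (b2 - b1) / (m1 - m2), which is the same expression as the slope between (m1, -b1) and (m2, -b2).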

42 Primal Dual

43 Convex Hull Semiring

44 Theorem 2
The Inside algorithm with the convex hull semiring computes the convex hull dual to the MERT upper envelope generated from the K-best list of derivations

45 Summary
- Evaluation metrics
  - Figure out how well we're doing
  - Figure out if a feature helps
  - Train your system
- What's a great way to improve translation?
  - Improve evaluation!


More information

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems Wolfgang Macherey Google Inc. 1600 Amphitheatre Parkway Mountain View, CA 94043, USA wmach@google.com Franz

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 7: Testing (Feb 12, 2019) David Bamman, UC Berkeley Significance in NLP You develop a new method for text classification; is it better than what comes

More information

Language Model Rest Costs and Space-Efficient Storage

Language Model Rest Costs and Space-Efficient Storage Language Model Rest Costs and Space-Efficient Storage Kenneth Heafield Philipp Koehn Alon Lavie Carnegie Mellon, University of Edinburgh July 14, 2012 Complaint About Language Models Make Search Expensive

More information

Topics in Natural Language Processing

Topics in Natural Language Processing Topics in Natural Language Processing Shay Cohen Institute for Language, Cognition and Computation University of Edinburgh Lecture 5 Solving an NLP Problem When modelling a new problem in NLP, need to

More information

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.

This kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this. Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one

More information

Large-scale Information Processing, Summer Recommender Systems (part 2)

Large-scale Information Processing, Summer Recommender Systems (part 2) Large-scale Information Processing, Summer 2015 5 th Exercise Recommender Systems (part 2) Emmanouil Tzouridis tzouridis@kma.informatik.tu-darmstadt.de Knowledge Mining & Assessment SVM question When a

More information

Automatic Speech Recognition and Statistical Machine Translation under Uncertainty

Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Outlines Automatic Speech Recognition and Statistical Machine Translation under Uncertainty Lambert Mathias Advisor: Prof. William Byrne Thesis Committee: Prof. Gerard Meyer, Prof. Trac Tran and Prof.

More information

Learning to translate with neural networks. Michael Auli

Learning to translate with neural networks. Michael Auli Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each

More information

Machine Translation without Words through Substring Alignment

Machine Translation without Words through Substring Alignment Machine Translation without Words through Substring Alignment Graham Neubig 1,2,3, Taro Watanabe 2, Shinsuke Mori 1, Tatsuya Kawahara 1 1 2 3 now at 1 Machine Translation Translate a source sentence F

More information

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018 Evaluation Brian Thompson slides by Philipp Koehn 25 September 2018 Evaluation 1 How good is a given machine translation system? Hard problem, since many different translations acceptable semantic equivalence

More information

BMI/CS 576 Fall 2016 Final Exam

BMI/CS 576 Fall 2016 Final Exam BMI/CS 576 all 2016 inal Exam Prof. Colin Dewey Saturday, December 17th, 2016 10:05am-12:05pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.

More information

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module - 5 Lecture - 22 SVM: The Dual Formulation Good morning.

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining 13. Meta-Algorithms for Classification Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13.

More information

Dual fitting approximation for Set Cover, and Primal Dual approximation for Set Cover

Dual fitting approximation for Set Cover, and Primal Dual approximation for Set Cover duality 1 Dual fitting approximation for Set Cover, and Primal Dual approximation for Set Cover Guy Kortsarz duality 2 The set cover problem with uniform costs Input: A universe U and a collection of subsets

More information

Spring 2017 CO 250 Course Notes TABLE OF CONTENTS. richardwu.ca. CO 250 Course Notes. Introduction to Optimization

Spring 2017 CO 250 Course Notes TABLE OF CONTENTS. richardwu.ca. CO 250 Course Notes. Introduction to Optimization Spring 2017 CO 250 Course Notes TABLE OF CONTENTS richardwu.ca CO 250 Course Notes Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4, 2018 Table

More information

AN ABSTRACT OF THE DISSERTATION OF

AN ABSTRACT OF THE DISSERTATION OF AN ABSTRACT OF THE DISSERTATION OF Kai Zhao for the degree of Doctor of Philosophy in Computer Science presented on May 30, 2017. Title: Structured Learning with Latent Variables: Theory and Algorithms

More information

CS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018

CS 301. Lecture 18 Decidable languages. Stephen Checkoway. April 2, 2018 CS 301 Lecture 18 Decidable languages Stephen Checkoway April 2, 2018 1 / 26 Decidable language Recall, a language A is decidable if there is some TM M that 1 recognizes A (i.e., L(M) = A), and 2 halts

More information

N-gram Language Modeling Tutorial

N-gram Language Modeling Tutorial N-gram Language Modeling Tutorial Dustin Hillard and Sarah Petersen Lecture notes courtesy of Prof. Mari Ostendorf Outline: Statistical Language Model (LM) Basics n-gram models Class LMs Cache LMs Mixtures

More information

Input: System of inequalities or equalities over the reals R. Output: Value for variables that minimizes cost function

Input: System of inequalities or equalities over the reals R. Output: Value for variables that minimizes cost function Linear programming Input: System of inequalities or equalities over the reals R A linear cost function Output: Value for variables that minimizes cost function Example: Minimize 6x+4y Subject to 3x + 2y

More information

CS 4100 // artificial intelligence. Recap/midterm review!

CS 4100 // artificial intelligence. Recap/midterm review! CS 4100 // artificial intelligence instructor: byron wallace Recap/midterm review! Attribution: many of these slides are modified versions of those distributed with the UC Berkeley CS188 materials Thanks

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra

LP Duality: outline. Duality theory for Linear Programming. alternatives. optimization I Idea: polyhedra LP Duality: outline I Motivation and definition of a dual LP I Weak duality I Separating hyperplane theorem and theorems of the alternatives I Strong duality and complementary slackness I Using duality

More information

Algorithms for Syntax-Aware Statistical Machine Translation

Algorithms for Syntax-Aware Statistical Machine Translation Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial

More information

Chapter 14 (Partially) Unsupervised Parsing

Chapter 14 (Partially) Unsupervised Parsing Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with

More information

Theory of Alignment Generators and Applications to Statistical Machine Translation

Theory of Alignment Generators and Applications to Statistical Machine Translation Theory of Alignment Generators and Applications to Statistical Machine Translation Hemanta K Maji Raghavendra Udupa U IBM India Research Laboratory, New Delhi {hemantkm, uraghave}@inibmcom Abstract Viterbi

More information

The Noisy Channel Model and Markov Models

The Noisy Channel Model and Markov Models 1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle

More information