Integrating Morphology in Probabilistic Translation Models

1 Integrating Morphology in Probabilistic Translation Models. Chris Dyer, joint work with Jon Clark, Alon Lavie, and Noah Smith. January 24, 2011.

2 das alte Haus → the old house; mach das → do that

3 das alte Haus → the old house; mach das → do that; guten Tag → hello

6 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: Haus → ?

7 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: Haus → house

8 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: das → ?

9 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: das → the

10 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: das → that

11 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: markant → ?

12 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: markant → ???

13 So far so good, but...

14 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: alten → ?

15 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: alten → ???

16 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: alten → old?

17 Problems: 1. Source language inflectional richness.

18 das alte Haus → the old house; mach das → do that; guten Tag → hello

20 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: old → ?

21 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: old → alte

22 das alte Haus → the old house; mach das → do that; guten Tag → hello. Query: old → alten?

23 Problems: 1. Source language inflectional richness. 2. Target language inflectional richness.

24 Kopfschmerzen → head ache; Bauchschmerzen → abdominal pain; Rücken → back; Kopf → head. Query: Rückenschmerzen

25 Kopfschmerzen → head ache; Bauchschmerzen → abdominal pain; Rücken → back; Kopf → head. Query: Rückenschmerzen → ???

26 Kopfschmerzen → head ache; Bauchschmerzen → abdominal pain; Rücken → back; Kopf → head. Query: Rückenschmerzen → back pain

27 Kopfschmerzen → head ache; Bauchschmerzen → abdominal pain; Rücken → back; Kopf → head. Query: Rückenschmerzen → back ache

28 Problems: 1. Source language inflectional richness. 2. Target language inflectional richness. 3. Source language sublexical semantic compositionality.

29 General Solution: MORPHOLOGY

30 f → Analysis → f′ → Translation → e′ → Synthesis → e

31 f → f′ → e

32 f = AlAbAmA

33 f = AlAbAmA; f′ = Al# Abama (looks like Al + OOV)

34 f = AlAbAmA; f′ = Al# Abama (looks like Al + OOV); e = the Ibama

35 But... ambiguity! Morphology is an inherently ambiguous problem: there are competing linguistic theories, lexicalization, and morphological analyzers (tools) that make mistakes. Are minimal linguistic morphemes the optimal morphemes for MT?

36 Problems: 1. Source language inflectional richness. 2. Target language inflectional richness. 3. Source language sublexical semantic compositionality. 4. Ambiguity everywhere!

37 General Solution: MORPHOLOGY + PROBABILITY

38 Why probability? Probabilistic models formalize uncertainty. E.g., words can be formed via a morphological derivation according to a joint distribution p(word, derivation). The probability of a word is then naturally defined as the marginal probability: p(word) = Σ_derivation p(word, derivation). Such a model can even be trained observing just words (EM!)

39 p(derived) = p(derived, de+rive+d) + p(derived, derived+∅) + p(derived, derive+d) + p(derived, deriv+ed)
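To make the marginalization concrete, here is a minimal sketch of summing a word's probability over its candidate derivations, mirroring the slide above. The joint probabilities are invented for illustration; they are not from the talk.

```python
# Toy joint model p(word, derivation): a derivation is a tuple of morphemes.
# All probabilities below are made up for illustration.
joint = {
    ("de", "rive", "d"): 0.01,
    ("derived",): 0.20,
    ("derive", "d"): 0.50,
    ("deriv", "ed"): 0.29,
}

def marginal(word, joint):
    """p(word) = sum over derivations whose morphemes concatenate to word."""
    return sum(p for deriv, p in joint.items() if "".join(deriv) == word)

print(marginal("derived", joint))  # 1.0 in this toy example
```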

40 Outline: Introduction: 4 problems. Three probabilistic modeling solutions: (1) embracing uncertainty: multi-segmentations for decoding and learning; (2) rich morphology via sparse lexical features; (3) hierarchical Bayesian translation: infinite translation lexicons. Conclusion.

42 f

43 f = AlAbAmA

44 f = AlAbAmA; f′ ∈ { Al# Abama, AlAbama }

45 f = AlAbAmA; f′ = Al# Abama → e = the Ibama; f′ = AlAbama → e = Alabama

46 Two problems: (1) We need to decode lots of similar source candidates efficiently → lattice / confusion network decoding (Kumar & Byrne, EMNLP 2005; Bertoldi, Zens & Federico, ICASSP 2007; Dyer et al., ACL 2008, inter alia). (2) We need a model to generate a set of candidate sources. What are the right candidates?

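To make the lattice idea concrete, here is a minimal sketch (invented for illustration, not the talk's implementation) of a segmentation lattice as a weighted DAG whose paths are the candidate segmentations of one input:

```python
# A lattice over "tonbandaufnahme": nodes are character positions,
# edges are (start, end, segment, weight). The weights are made up.
edges = [
    (0, 3, "ton", 0.6), (0, 7, "tonband", 0.4),
    (3, 7, "band", 1.0), (7, 15, "aufnahme", 1.0),
]

def paths(node, goal, edges):
    """Enumerate all segmentations (paths) from node to goal."""
    if node == goal:
        yield [], 1.0
    for start, end, seg, w in edges:
        if start == node:
            for rest, p in paths(end, goal, edges):
                yield [seg] + rest, w * p

for segs, p in paths(0, 15, edges):
    print(" ".join(segs), p)
# ton band aufnahme 0.6
# tonband aufnahme 0.4
```

A decoder that consumes the lattice explores all of these candidates at once instead of committing to a single 1-best segmentation.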

48 Uncertainty is everywhere. Requirement: a probabilistic model p(f′ | f) that transforms f → f′. Possible solution: a discriminatively trained model, e.g., a CRF. Required data: example (f, f′) pairs from a linguistic expert or other source.

49 Uncertainty is everywhere. What is the best/right analysis... for MT? AlAntxAbAt (DEF+election+PL)

50 Uncertainty is everywhere. What is the best/right analysis... for MT? AlAntxAbAt (DEF+election+PL). Some possibilities (Sadat & Habash, NAACL 2007): AlAntxAb +At; Al+ AntxAb +At; Al+ AntxAbAt; AlAntxAbAt

51 Uncertainty is everywhere. What is the best/right analysis... for MT? AlAntxAbAt (DEF+election+PL). Some possibilities (Sadat & Habash, NAACL 2007): AlAntxAb +At; Al+ AntxAb +At; Al+ AntxAbAt; AlAntxAbAt. Let's use them all!

52 Wait... multiple references?!? Train with an EM variant (Dyer, NAACL 2009; Dyer, thesis 2010). Lattices can encode very large sets of references and support efficient inference. Bonus: the annotation task is much simpler. Don't know whether to label an example with A or B? Label it with both!

54 Reference segmentations (f → f′): freitag → freitag; tonbandaufnahme → ton band aufnahme or tonband aufnahme (a lattice encodes both)

55 Rückenschmerzen → Rücken + schmerzen: good phonotactics! Rückensc + hmerzen or Rü + cke + nschme + rzen: bad phonotactics! So, phonotactic features!

56 Just 20 features: phonotactic probability; lexical features (in vocab, OOV); lexical frequencies; is high frequency?; segment length
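A minimal sketch of what such a feature function might look like. The concrete feature names, the lexicon, and the stand-in phonotactic model are all invented for illustration; the talk specifies only the feature classes.

```python
import math

# Toy resources (illustrative only).
lexicon = {"ton": 500, "band": 800, "aufnahme": 300, "tonband": 50}

def bigram_logp(a, b):
    return math.log(0.1)  # stand-in for a real character bigram model

def segment_features(segment):
    """Extract a handful of dense features for one candidate segment."""
    # Phonotactic log-probability of the segment's character sequence.
    phon = sum(bigram_logp(a, b) for a, b in zip(segment, segment[1:]))
    freq = lexicon.get(segment, 0)
    return {
        "phonotactic_logp": phon,
        "in_vocab": float(segment in lexicon),
        "oov": float(segment not in lexicon),
        "log_freq": math.log(freq + 1),
        "is_high_freq": float(freq > 100),
        "length": len(segment),
    }

print(segment_features("band"))
```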

57 Input: tonbandaufnahme

58 [Figure: scored segmentation lattice for the input tonbandaufnahme; the chart did not survive transcription.]

59 [Figure: segmentation lattices for the input tonbandaufnahme pruned at different density settings, e.g. a = 0.4; the chart did not survive transcription.]

60 [Figure: segmentation recall vs. precision.]

61 Translation Evaluation. Inputs compared by BLEU and TER: unsegmented, best segmentation, lattice (a = 0.2); the scores did not survive transcription. Unsegmented output: in police raids found illegal guns, ammunition stahlkern, laserzielfernrohr and a machine gun. Lattice output: in police raids found with illegal guns and ammunition steel core, a laser objective telescope and a machine gun. REF: police raids found illegal guns, steel core ammunition, a laser scope and a machine gun.

62 Outline: Introduction: 4 problems. Three probabilistic modeling solutions: (1) embracing uncertainty: multi-segmentations for decoding and learning; (2) rich morphology via sparse lexical features; (3) hierarchical Bayesian translation: infinite translation lexicons. Conclusion.

63 What do we see when we look inside the IBM models? (or any multinomial-based generative model... like parsing models!)

64 What do we see when we look inside the IBM models? (or any multinomial-based generative model... like parsing models!) old → altes 0.3, alte …, alt …, alter …, gammelig …, gammeliges 0.1; car → Wagen 0.2, Auto …, PKW …

66 DLVM for Translation. Addresses problems: 1. Source language inflectional richness. 2. Target language inflectional richness. How? 1. Replace the locally normalized multinomial parameterization p(e | f) in the translation model with a globally normalized log-linear model. 2. Add lexical association features sensitive to sublexical units. C. Dyer, J. Clark, A. Lavie, and N. Smith (in review)

67 [Figure: a fully directed alignment model, with alignment variables a_1 ... a_n linking source words s to target words t_1 ... t_n (Brown et al., 1993; Vogel et al., 1996; Berg-Kirkpatrick et al., 2010), contrasted with our model.]

69 old → altes 0.3, alte …, alt …, alter …, gammelig …, gammeliges 0.1; car → Wagen 0.2, Auto …, PKW …

70 old → altes 0.3, alte …, alt …, alter …, gammelig …, gammeliges 0.1; car → Wagen 0.2, Auto …, PKW … New model: score(e, f) = 0.2 h1(e, f) + 0.9 h2(e, f) + 1.3 h3(e, f) + ... Example sublexical features: old ∧ alt+ and old ∧ gammelig+, each with weight in [0, 2].

71 old → altes 0.3, alte …, alt …, alter …, gammelig …, gammeliges 0.1; car → Wagen 0.2, Auto …, PKW … New model: score(e, f) = 0.2 h1(e, f) + 0.9 h2(e, f) + 1.3 h3(e, f) + ... Example sublexical features: old ∧ alt+ and old ∧ gammelig+, each with weight in [0, 2]. (~ incremental vs. realizational)
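A minimal sketch of the globally normalized scoring idea. Everything concrete here (feature names, weights, the toy candidate list) is invented for illustration; the real model normalizes over a structured space, not a small list.

```python
import math

weights = {"ID:old_altes": 0.2, "STEM:old_alt+": 1.3, "STEM:old_gammelig+": 0.9}

def features(e, f):
    """Toy feature map: a word-identity feature plus stem features."""
    feats = {f"ID:{f}_{e}": 1.0}
    for stem in ("alt", "gammelig"):
        if e.startswith(stem):
            feats[f"STEM:{f}_{stem}+"] = 1.0
    return feats

def score(e, f):
    return sum(weights.get(k, 0.0) * v for k, v in features(e, f).items())

def p(e, f, candidates):
    """Globally normalized: a softmax of scores over the candidate set."""
    z = sum(math.exp(score(c, f)) for c in candidates)
    return math.exp(score(e, f)) / z

cands = ["altes", "alte", "alten", "gammeliges"]
print({c: round(p(c, "old", cands), 3) for c in cands})
```

Because the STEM feature fires for every inflected form sharing a stem, probability mass is shared across the whole paradigm rather than split over unrelated multinomial cells.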

72 Sublexical Features: každoroční → annual: IDkaždoroční_annual, PREFIXkaž_ann, PREFIXkažd_annu, PREFIXkaždo_annua, SUFFIXí_l, SUFFIXní_al

73 Sublexical Features: každoroční → annually: IDkaždoroční_annually, PREFIXkaž_ann, PREFIXkažd_annu, PREFIXkaždo_annua, SUFFIXí_y, SUFFIXní_ly

74 Sublexical Features: každoročního → annually: IDkaždoročního_annually, PREFIXkaž_ann, PREFIXkažd_annu, PREFIXkaždo_annua, SUFFIXo_y, SUFFIXho_ly

75 Sublexical Features: každoročního → annually: IDkaždoročního_annually, PREFIXkaž_ann, PREFIXkažd_annu, PREFIXkaždo_annua (the shared PREFIX features abstract away from inflectional variation!), SUFFIXo_y, SUFFIXho_ly
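The feature strings above follow a simple template; here is a sketch that reproduces them (a reconstruction from the slide examples, not the authors' code; the prefix/suffix length cutoffs are guesses):

```python
def sublexical_features(f, e, prefix_lens=(3, 4, 5), suffix_lens=(1, 2)):
    """Emit ID, PREFIX, and SUFFIX association features for a word pair."""
    feats = [f"ID{f}_{e}"]
    # Prefix pairs abstract away from inflectional endings.
    for n in prefix_lens:
        feats.append(f"PREFIX{f[:n]}_{e[:n]}")
    # Suffix pairs capture regular inflectional correspondences.
    for n in suffix_lens:
        feats.append(f"SUFFIX{f[-n:]}_{e[-n:]}")
    return feats

print(sublexical_features("každoroční", "annual"))
# ['IDkaždoroční_annual', 'PREFIXkaž_ann', 'PREFIXkažd_annu',
#  'PREFIXkaždo_annua', 'SUFFIXí_l', 'SUFFIXní_al']
```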

76 Evaluation: Given a parallel corpus (no supervised alignments!), we can infer the weights in the log-linear translation model and the MAP alignment. The model is a translation model, but we evaluate it as applied to alignment.

77 Alignment Evaluation (AER, lower is better): Model 4: e→f 24.8, f→e 33.6, sym …; DLVM: e→f 21.9, f→e 29.3, sym …. Czech-English, 3.1M words training, 525 sentences with gold alignments.

78 Translation Evaluation (BLEU / METEOR / TER; Czech-English, WMT 2010 test set, 1 reference): Model 4: … (σ = 0.3); our model: BLEU 16.5, … (σ = 0.2); both: BLEU 17.4, … (σ = 0.5). Most cell values did not survive transcription.

79 Outline: Introduction: 4 problems. Three probabilistic modeling solutions: (1) embracing uncertainty: multi-segmentations for decoding and learning; (2) rich morphology via sparse lexical features; (3) hierarchical Bayesian translation: infinite translation lexicons. Conclusion.

80 Bayesian Translation. Addresses problem: 2. Target language inflectional richness. How? 1. Replace the multinomials in a lexical translation model with a process that generates target language lexical items by combining stems and suffixes. 2. Fully inflected forms can be generated, but a hierarchical prior backs off to component-wise generation.

81 Chinese Restaurant Process [figure: customers seated at tables serving dishes a, b, a, c, and a prospective new dish x]

82 Chinese Restaurant Process: a new customer arrives [same figure]

83 Chinese Restaurant Process: with 7 customers already seated, the new customer joins an existing table with probability count(table) / (7 + α), or starts a new table serving dish x with probability α P0(x) / (7 + α)

84 Chinese Restaurant Process: as above, where α is the concentration parameter and P0(x) is the base distribution
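A minimal CRP sampler, to make the rich-get-richer dynamics concrete (illustrative only: the base distribution and α below are arbitrary placeholders):

```python
import random
from collections import Counter

def crp_sample(counts, alpha, base_sample):
    """Next draw: an existing dish w.p. proportional to its count,
    or a fresh draw from the base distribution w.p. proportional to alpha."""
    r = random.uniform(0, sum(counts.values()) + alpha)
    for dish, c in counts.items():
        if r < c:
            return dish
        r -= c
    return base_sample()  # "new table": sample from the base distribution

counts = Counter()
for _ in range(1000):
    dish = crp_sample(counts, alpha=1.0, base_sample=lambda: random.choice("abcdefg"))
    counts[dish] += 1
print(counts.most_common(3))  # a few dishes dominate: the rich get richer
```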

85 old → altes 0.3, alte …, alt …, alter …, gammelig …, gammeliges 0.1; car → Wagen 0.2, Auto …, PKW …

86 old → altes 0.3, alte …, alt …, alter …, gammelig …, gammeliges 0.1; car → Wagen 0.2, Auto …, PKW … New model: inflected forms decompose into stem and suffix: old → alt+e, alt+en, alt+es, alt+er, alt+∅, …; with a shared suffix distribution over +es, +e, +∅, …

87 Modeling assumptions: Observed words are formed by an unobserved process that concatenates a stem α and a suffix β, yielding αβ. A source word should have only a few translations αβ, and should translate into only a few stems α. The suffix β occurs many times, with many different stems. β may be null. β has a maximum length of r. Once a word has been translated into some inflected form, that inflected form, its stem, and its suffix should all become more likely ("rich get richer").
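Here is a sketch of how these assumptions can be wired up with nested CRPs: a simplification, for illustration, of the Pitman-Yor model summarized on the final slide; all concrete choices (uniform bases, a single α, the toy stem/suffix lists) are invented.

```python
import random
from collections import Counter, defaultdict

alpha = 1.0
form_counts = defaultdict(Counter)  # per source word: inflected-form counts
stem_counts = defaultdict(Counter)  # per source word: stem counts
suffix_counts = Counter()           # shared suffix distribution (backoff)

def draw(counts, base):
    """CRP predictive draw: existing item w.p. prop. to its count, else base()."""
    r = random.uniform(0, sum(counts.values()) + alpha)
    for item, c in counts.items():
        if r < c:
            return item
        r -= c
    return base()

def translate(f, stems, suffixes):
    """Generate e = stem + suffix for source word f, caching at every level."""
    def new_form():
        stem = draw(stem_counts[f], lambda: random.choice(stems))
        suffix = draw(suffix_counts, lambda: random.choice(suffixes))
        stem_counts[f][stem] += 1
        suffix_counts[suffix] += 1
        return stem + suffix
    e = draw(form_counts[f], new_form)
    form_counts[f][e] += 1
    return e

print([translate("old", ["alt", "gammelig"], ["", "e", "en", "es"]) for _ in range(8)])
```

Every generated form makes itself, its stem, and its suffix more likely on the next draw, which is exactly the rich-get-richer behavior listed above.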

88 f → Translation → e′ → Synthesis → e (legend: X = observed during training, Z = latent variable)

89 f → Translation → e′ (= stem + suffix) → Synthesis → e (legend: X = observed during training, Z = latent variable)

90 Task: Translate the word old

91 Task: Translate the word old. [Figure: restaurant of inflected translations of old, with tables alt+e, alt+∅, gammelig+∅]

92 [Figure: three linked restaurants: inflected translations of old (alt+e, alt+∅, gammelig+∅), stems for old (alt, gammelig), and suffixes (+e, +∅)]

93 [Figure: the new customer opens a new table in the inflected-translation restaurant; its stem alt is drawn from the stem restaurant, and its suffix is still to be drawn]

94 [Figure: the suffix is drawn from the shared suffix restaurant: +en, +e, +s, +∅, +er, ...]

95 [Figure: the suffix +en is drawn, yielding the new inflected translation alt+en for old]

96 Evaluation: Given a parallel corpus, we can infer the MAP alignment and the MAP segmentation of each target word into stem+suffix

97 Alignment Evaluation (AER, lower is better): Model 1 with EM: f→e 43.3, e→f 38.4; Model 1 with HPYP: f→e 37.5, e→f 36.6. English-French, 115k words, 447 sentences with gold alignments.

98 Frequent suffixes (suffix: count): +∅: 20,837; +s: 334; +d: 217; +e: 156; +n: 156; +y: 130; +ed: 121; +ing: …

99 Assessment: Breaking the lexical independence assumption is computationally costly: the search space is much, much larger! Dealing only with inflectional morphology simplifies the problem. Sparse priors are crucial for avoiding degenerate solutions.

100 In conclusion...

101 Why don't we have integrated morphology?

102 Why don't we have integrated morphology? Because we spend all our time working on English, which doesn't have much morphology!

103 Why don't we have integrated morphology? Translation with words is already hard: an n-word sentence has n! permutations. But if you're looking at a sentence with m letters, there are m! permutations. Since m > n, we have m! >>>>> n!, so search is... considerably harder. Modeling is harder too: the model must also support all these permutations!

104 Take-away messages: Morphology matters for MT. Probabilistic models are a great fit for the uncertainty involved. Breaking the lexical independence assumption is hard.

105 Thank you! Toda! (Hebrew: thank you) $kraf! (Arabic, in Buckwalter transliteration: thank you)

106 n ∼ Poisson(λ); a_i ∼ Uniform(1/|f|); e_i | f_{a_i}, T_{f_{a_i}} ∼ T_{f_{a_i}}; T_{f_{a_i}} | a, b, M ∼ PYP(a, b, M(· | f_{a_i})); M(e = αβ | f) = G_f(α) H_f(β); G_f | a, b, f, P_0 ∼ PYP(a, b, P_0(·)); H_f | a, b, f, H_0 ∼ PYP(a, b, H_0(·)); H_0 | a, b, Q_0 ∼ PYP(a, b, Q_0(·)); P_0(α; p) = p^|α| (1 − p) |V|^(−|α|); Q_0(β; r) ∝ 1 for |β| ≤ r (uniform over suffixes of length at most r)
