Latent Models: Sequence Models Beyond HMMs and Machine Translation Alignment CMSC 473/673 UMBC


1 Latent Models: Sequence Models Beyond HMMs and Machine Translation Alignment CMSC 473/673 UMBC

2 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

3 Why Do We Need Both the Forward and Backward Algorithms? Compute posteriors:
α(i, s) · β(i, s) = total probability of paths through state s at step i
p(z_i = s | w_1, ..., w_N) = α(i, s) β(i, s) / α(N+1, END)
α(i, s) · p(s' | s) · p(obs at i+1 | s') · β(i+1, s') = total probability of paths through the s → s' arc (at time i)
p(z_i = s, z_{i+1} = s' | w_1, ..., w_N) = α(i, s) p(s' | s) p(obs_{i+1} | s') β(i+1, s') / α(N+1, END)

4 EM for HMMs 0. Assume some value for your parameters. Two step, iterative algorithm.
1. E-step: count under uncertainty (estimated counts), assuming these parameters:
p(z_i = s | w_1, ..., w_N) = α(i, s) β(i, s) / α(N+1, END)   (expected counts for p_obs(w | s))
p(z_i = s, z_{i+1} = s' | w_1, ..., w_N) = α(i, s) p(s' | s) p(obs_{i+1} | s') β(i+1, s') / α(N+1, END)   (expected counts for p_trans(s' | s))
2. M-step: maximize log-likelihood, assuming these uncertain counts.

5 EM for HMMs (Baum-Welch Algorithm)
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i ≥ 0; --i) {
  for (next = 0; next < K*; ++next) {
    c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for (state = 0; state < K*; ++state) {
      u = p_obs(obs_{i+1} | next) * p_trans(next | state)
      c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
update p_obs, p_trans using c_obs, c_trans
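
The pseudocode above translates almost directly into NumPy. Below is a minimal sketch of one E-step (the expected counts), assuming the forward and backward tables have already been computed; the array shapes, 0-based indexing, and the END index are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def baum_welch_estep(obs, alpha, beta, p_obs, p_trans):
    """One E-step of Baum-Welch, mirroring the slide's pseudocode (a sketch).

    Assumed shapes (not from the slides):
      obs:     length-N list of observation indices
      alpha:   (N+2, K) forward probabilities
      beta:    (N+2, K) backward probabilities
      p_obs:   (V, K) emission probabilities p(word | state)
      p_trans: (K, K) transition probabilities p(next | state)
    """
    N = len(obs)
    K = p_trans.shape[0]
    END = K - 1                      # assume the last state index is END
    L = alpha[N + 1, END]            # total likelihood of the sentence

    c_obs = np.zeros_like(p_obs)
    c_trans = np.zeros_like(p_trans)
    for i in range(N):               # positions of the N observations
        for nxt in range(K):
            # expected count of emitting obs[i] from state `nxt`
            c_obs[obs[i], nxt] += alpha[i + 1, nxt] * beta[i + 1, nxt] / L
            for state in range(K):
                u = p_obs[obs[i], nxt] * p_trans[nxt, state]
                # expected count of the state -> nxt transition at position i
                c_trans[nxt, state] += alpha[i, state] * u * beta[i + 1, nxt] / L
    # transitions into END are omitted for brevity; the M-step renormalizes
    # c_obs and c_trans into updated p_obs and p_trans
    return c_obs, c_trans
```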

6 Semi-Supervised Learning labeled data: human annotated; relatively small / few examples. Unlabeled data: raw, not annotated; plentiful. EM lets us learn from both.

7 Semi-Supervised Parameter Estimation for HMMs (figure: transition-count and emission-count tables over tags N, V, end and words w_1–w_4, shown as observed counts from labeled data, mixed counts, and expected counts from EM)

8 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

9 Warren Weaver's Note "When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." (Warren Weaver, 1947) Slides courtesy Rebecca Knowles

10 Noisy Channel Model (diagram): text written in (clean) English passes through a noisy channel and is observed as (noisy) Russian text (язык = language); to translate, decode with a translation/decode model and rerank with a (clean) English language model. Slides courtesy Rebecca Knowles

11 Noisy Channel Model (same diagram, continued). Slides courtesy Rebecca Knowles

12 Noisy Channel Model (same diagram, continued). Slides courtesy Rebecca Knowles

13 Translation Translate French (observed) into English: Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

14 Translation Translate French (observed) into English: Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

15 Translation Translate French (observed) into English: Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

16 Alignment Le chat est sur la chaise. The cat is on the chair.? Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

17 Parallel Texts Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world, Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people, Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law, Whereas it is essential to promote the development of friendly relations between nations, Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj. Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli. Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan. Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan. Slides courtesy Rebecca Knowles

18 Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world, Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people, Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law, Whereas it is essential to promote the development of friendly relations between nations, Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj. Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli. Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan. Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan. Sentence align Clean corpus Tokenize Handle case Word segmentation (morphological, BPE, etc.) Language-specific preprocessing (example: pre-reordering)... Slides courtesy Rebecca Knowles

19 Alignments If we had word-aligned text, we could easily estimate P(f | e). But we don't usually have word alignments, and they are expensive to produce by hand. If we had P(f | e), we could produce alignments automatically. Slides courtesy Rebecca Knowles

20 IBM Model 1 (1993) Lexical Translation Model Word Alignment Model The simplest of the original IBM models For all IBM models, see the original paper (Brown et al, 1993): Slides courtesy Rebecca Knowles

21 Simplified IBM 1 We'll work through an example with a simplified version of IBM Model 1. Figures and examples are drawn from "A Statistical MT Tutorial Workbook", Section 27 (Knight, 1999). Simplifying assumption: each source word must translate to exactly one target word and vice versa. Slides courtesy Rebecca Knowles

22 IBM Model 1 (1993) f: vector of French words (visualization of alignment) e: vector of English words Le chat est sur la chaise verte The cat is on the green chair a: vector of alignment indices Slides courtesy Rebecca Knowles

23 IBM Model 1 (1993) f: vector of French words (visualization of alignment) e: vector of English words Le chat est sur la chaise verte The cat is on the green chair a: vector of alignment indices t(f_j | e_i): translation probability of the word f_j given the word e_i Slides courtesy Rebecca Knowles

24 Model and Parameters Want: P(f | e) But we don't know how to train this directly Solution: Use P(a, f | e), where a is an alignment Remember: Slides courtesy Rebecca Knowles

25 Model and Parameters: Intuition Translation prob.: Example: Interpretation: How probable is it that we see f j given e i Slides courtesy Rebecca Knowles

26 Model and Parameters: Intuition Alignment/translation prob.: P(a, f | e) Example (visual representation of two different alignments a of le chat to the cat): P(le chat, a_1 | the cat) < P(le chat, a_2 | the cat) Interpretation: How probable are the alignment a and the translation f (given e) Slides courtesy Rebecca Knowles

27 Model and Parameters: Intuition Alignment prob.: P(a | e, f) Example (for two different alignments a): P(a_1 | le chat, the cat) < P(a_2 | le chat, the cat) Interpretation: How probable is alignment a (given e and f) Slides courtesy Rebecca Knowles

28 Model and Parameters How to compute: Slides courtesy Rebecca Knowles

29 Parameters For IBM model 1, we can compute all parameters given translation parameters: How many of these are there? Slides courtesy Rebecca Knowles

30 Parameters For IBM model 1, we can compute all parameters given translation parameters: How many of these are there? French vocabulary x English vocabulary Slides courtesy Rebecca Knowles

31 Data Two sentence pairs: (English: b c, French: x y) and (English: b, French: y) Slides courtesy Rebecca Knowles

32 All Possible Alignments For (English: b c, French: x y) there are two alignments: (b–x, c–y) and (b–y, c–x); for (English: b, French: y) there is one: (b–y). Remember: simplifying assumption that each word must be aligned exactly once Slides courtesy Rebecca Knowles

33 Expectation Maximization (EM) 0. Assume some value for the translation parameters t(f | e) and compute other parameter values. Two step, iterative algorithm. 1. E-step: count alignments and translations under uncertainty (e.g., the alignment probabilities of le chat given the cat), assuming these parameters. 2. M-step: maximize log-likelihood (update parameters), using these uncertain, estimated counts. Slides courtesy Rebecca Knowles
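
For reference, here is a compact Python sketch of these two steps for the full IBM Model 1 translation table t(f | e) — the standard fractional-count updates, not the simplified one-word-to-one-word model used in the worked example. The toy corpus at the bottom is the workbook's two sentence pairs.

```python
from collections import defaultdict

def ibm1_em(pairs, iterations=10):
    """EM for the IBM Model 1 translation table t(f | e) (a sketch).

    `pairs` is a list of (english_words, french_words) tuples.
    """
    # uniform initialization of t(f | e)
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected count(f, e)
        total = defaultdict(float)   # expected count(e)
        # E-step: fractional alignment counts under the current t
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)       # normalize over English words
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize counts into probabilities
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# toy corpus from the workbook example: ("b c", "x y") and ("b", "y")
t = ibm1_em([(["b", "c"], ["x", "y"]), (["b"], ["y"])])
```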

34 Review of IBM Model 1 & EM Iteratively learned an alignment/translation model from sentence-aligned text (without gold standard alignments) Model can now be used for alignment and/or word-level translation We explored a simplified version of this; IBM Model 1 allows more types of alignments Slides courtesy Rebecca Knowles

35 Why is Model 1 insufficient? Why won't this produce great translations? Indifferent to order (language model may help?) Translates one word at a time Translates each word in isolation... Slides courtesy Rebecca Knowles

36 Uses for Alignments Component of machine translation systems Produce a translation lexicon automatically Cross-lingual projection/extraction of information Supervision for training other models (for example, neural MT systems) Slides courtesy Rebecca Knowles

37 Evaluating Machine Translation Human evaluations: Test set (source, human reference translations, MT output) Humans judge the quality of MT output (in one of several possible ways) Koehn (2017), Slides courtesy Rebecca Knowles

38 Evaluating Machine Translation Automatic evaluations: Test set (source, human reference translations, MT output) Aim to mimic (correlate with) human evaluations Many metrics: TER (Translation Error/Edit Rate) HTER (Human-Targeted Translation Edit Rate) BLEU (Bilingual Evaluation Understudy) METEOR (Metric for Evaluation of Translation with Explicit Ordering) Slides courtesy Rebecca Knowles

39 Machine Translation Alignment Now Explicitly with fancier IBM models Implicitly/learned jointly with attention in recurrent neural networks (RNNs)

40 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

41 Recall: N-gram to Maxent to Neural Language Models given some context w_{i-3} w_{i-2} w_{i-1}, compute beliefs about what is likely: p(w_i | w_{i-3}, w_{i-2}, w_{i-1}) ∝ count(w_{i-3}, w_{i-2}, w_{i-1}, w_i); predict the next word w_i

42 Recall: N-gram to Maxent to Neural Language Models given some context w_{i-3} w_{i-2} w_{i-1}, compute beliefs about what is likely: p(w_i | w_{i-3}, w_{i-2}, w_{i-1}) = softmax(θ · f(w_{i-3}, w_{i-2}, w_{i-1}, w_i)); predict the next word w_i

43 Hidden Markov Model Representation
p(z_1, w_1, z_2, w_2, ..., z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ··· p(z_N | z_{N-1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i-1})
emission probabilities/parameters: p(w_i | z_i); transition probabilities/parameters: p(z_i | z_{i-1})
z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

44 A Different Model's Representation z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

45 A Different Model's Representation p(z_1, z_2, ..., z_N | w_1, w_2, ..., w_N) = p(z_1 | z_0, w_1) ··· p(z_N | z_{N-1}, w_N) = ∏_i p(z_i | z_{i-1}, w_i) z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

46 A Different Model's Representation p(z_1, z_2, ..., z_N | w_1, w_2, ..., w_N) = p(z_1 | z_0, w_1) ··· p(z_N | z_{N-1}, w_N) = ∏_i p(z_i | z_{i-1}, w_i), with p(z_i | z_{i-1}, w_i) ∝ exp(θᵀ f(w_i, z_{i-1}, z_i)) z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

47 Maximum Entropy Markov Model (MEMM) A Different Model's Representation p(z_1, z_2, ..., z_N | w_1, w_2, ..., w_N) = p(z_1 | z_0, w_1) ··· p(z_N | z_{N-1}, w_N) = ∏_i p(z_i | z_{i-1}, w_i), with p(z_i | z_{i-1}, w_i) ∝ exp(θᵀ f(w_i, z_{i-1}, z_i)) z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

48 MEMMs z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 Discriminative: don't care about generating the observed sequence at all Maxent: use features Problem: the label-bias problem
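
Concretely, each local factor p(z_i | z_{i-1}, w_i) is just a maxent classifier over the next tag. A minimal NumPy sketch, where the weight vector theta and the feature function feats are hypothetical stand-ins (not defined on the slides):

```python
import numpy as np

def memm_local_probs(theta, feats, prev_state, word, states):
    """p(z_i | z_{i-1}, w_i) for an MEMM: a softmax over candidate next tags.

    `feats(word, prev_state, state)` is an assumed helper returning a
    feature vector; `theta` is the learned weight vector.
    """
    scores = np.array([theta @ feats(word, prev_state, z) for z in states])
    scores -= scores.max()                 # numerical stability
    probs = np.exp(scores)
    # locally normalized per step -- this is exactly what causes label bias
    return probs / probs.sum()
```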

49 Label-Bias Problem z i w i

50 Label-Bias Problem 1 z i incoming mass must sum to 1 w i

51 Label-Bias Problem 1 1 z i incoming mass must sum to 1 outgoing mass must sum to 1 w i

52 Label-Bias Problem 1 1 z i incoming mass must sum to 1 outgoing mass must sum to 1 w i observe, but do not generate (explain) the observation Take-aways: the model can learn to ignore observations the model can get itself stuck on bad paths

53 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

54 (Linear Chain) Conditional Random Fields z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 Discriminative: don't care about generating the observed sequence at all Condition on the entire observed word sequence w_1 ... w_N Maxent: use features Solves the label-bias problem

55 (Linear Chain) Conditional Random Fields z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 p(z_1, ..., z_N | w_1, ..., w_N) ∝ ∏_i exp(θᵀ f(z_{i-1}, z_i, w_1, ..., w_N))

56 (Linear Chain) Conditional Random Fields z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 p(z_1, ..., z_N | w_1, ..., w_N) ∝ ∏_i exp(θᵀ f(z_{i-1}, z_i, w_1, ..., w_N)) condition on the entire sequence
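
As a sketch of what this formula computes, the unnormalized log-score of one tag sequence is a sum of feature scores over adjacent tag pairs, each allowed to look at the whole word sequence. Here theta and feats are hypothetical stand-ins; the global partition function Z(w), computed with a forward pass over tag sequences, is what distinguishes the CRF from the locally normalized MEMM.

```python
import numpy as np

def crf_unnormalized_logscore(theta, feats, tags, words):
    """Unnormalized log-score of a tag sequence under a linear-chain CRF:
    sum_i theta . f(z_{i-1}, z_i, w_1..w_N).  `feats` is an assumed helper
    returning one transition's feature vector given the whole sentence."""
    score = 0.0
    prev = "<s>"                              # assumed start symbol
    for i, z in enumerate(tags):
        score += theta @ feats(prev, z, words, i)
        prev = z
    # p(tags | words) = exp(score) / Z(words); Z sums over all tag
    # sequences (global normalization, computed by a forward algorithm)
    return score
```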

57 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

58 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

59 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

60 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

61 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

62 Recall: N-gram to Maxent to Neural Language Models given some context w_{i-3} w_{i-2} w_{i-1}: create/use distributed representations e_{i-3}, e_{i-2}, e_{i-1}; combine these representations with a matrix-vector product (C) into f; compute beliefs about what is likely, p(w_i | w_{i-3}, w_{i-2}, w_{i-1}) = softmax(θ_{w_i} · f(w_{i-3}, w_{i-2}, w_{i-1})); predict the next word w_i
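
A hedged PyTorch sketch of this fixed-window architecture; the layer sizes and class name are illustrative assumptions, not taken from the slides.

```python
import torch
import torch.nn as nn

class FeedforwardLM(nn.Module):
    """Minimal sketch of a fixed-window neural LM: embed the context words,
    combine them, and softmax over the vocabulary."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, context=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)              # e_{i-3..i-1}
        self.combine = nn.Linear(context * emb_dim, hidden_dim)   # C, producing f
        self.out = nn.Linear(hidden_dim, vocab_size)              # theta_{w_i}

    def forward(self, context_ids):                 # (batch, context)
        e = self.emb(context_ids).flatten(1)        # concatenate embeddings
        f = torch.tanh(self.combine(e))
        return torch.log_softmax(self.out(f), dim=-1)   # log p(w_i | context)
```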

63 A More Typical View of Recurrent Neural Language Modeling w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i w i-3 w i-2 w i-1 w i

64 A More Typical View of Recurrent Neural Language Modeling w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i w i-3 w i-2 w i-1 w i observe these words one at a time

65 A More Typical View of Recurrent Neural Language Modeling predict the next word w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i w i-3 w i-2 w i-1 w i observe these words one at a time

66 A More Typical View of Recurrent Neural Language Modeling predict the next word w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i from these hidden states w i-3 w i-2 w i-1 w i observe these words one at a time

67 A More Typical View of Recurrent Neural Language Modeling predict the next word cell w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i from these hidden states w i-3 w i-2 w i-1 w i observe these words one at a time

68 A Recurrent Neural Network Cell w i w i+1 h i-1 h i w i-1 w i

69 A Recurrent Neural Network Cell w i w i+1 h i-1 W h i W w i-1 w i

70 A Recurrent Neural Network Cell w i w i+1 h i-1 W h i W U encoding U w i-1 w i

71 A Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i

72 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i)

73 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) σ(x) = 1 / (1 + exp(−x))

74 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) σ(x) = 1 / (1 + exp(−x))

75 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) σ(x) = 1 / (1 + exp(−x))

76 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) σ(x) = 1 / (1 + exp(−x))

77 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W

78 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W suggested solution: gradient descent on prediction ability

79 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W suggested solution: gradient descent on prediction ability problem: they're tied across inputs/timesteps

80 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W suggested solution: gradient descent on prediction ability problem: they're tied across inputs/timesteps good news for you: many toolkits do this automatically
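
Written out in NumPy, one step of this cell is just the two equations above; the matrix shapes and one-hot input encoding are assumptions for illustration.

```python
import numpy as np

def rnn_step(h_prev, w_onehot, W, U, S):
    """One step of the simple RNN cell from the slides:
    h_i = sigmoid(W h_{i-1} + U w_i),  p(w_{i+1} | ...) = softmax(S h_i)."""
    h = 1.0 / (1.0 + np.exp(-(W @ h_prev + U @ w_onehot)))   # new hidden state
    logits = S @ h
    logits -= logits.max()                                   # numerical stability
    p_next = np.exp(logits) / np.exp(logits).sum()           # next-word distribution
    return h, p_next
```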

81 Why Is Training RNNs Hard? Conceptually, it can get strange But really getting the gradient just requires many applications of the chain rule for derivatives

82 Why Is Training RNNs Hard? Conceptually, it can get strange. But really, getting the gradient just requires many applications of the chain rule for derivatives. Vanishing gradients: multiplying the same matrices at each timestep means multiplying many matrices in the gradients.

83 Why Is Training RNNs Hard? Conceptually, it can get strange. But really, getting the gradient just requires many applications of the chain rule for derivatives. Vanishing gradients: multiplying the same matrices at each timestep means multiplying many matrices in the gradients. One solution: clip the gradients to a max value.
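
In PyTorch, that clipping solution is one call between the backward pass and the optimizer step; the tiny model and random data below are placeholders just to make the snippet runnable.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)                                 # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).pow(2).mean()            # placeholder loss
loss.backward()
# rescale gradients so their total norm is at most 5.0 before stepping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
opt.step()
```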

84 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

85 Natural Language Processing from keras import * from torch import *

86 Pick Your Toolkit PyTorch Deeplearning4j TensorFlow DyNet Caffe Keras MxNet Gluon CNTK Comparisons: (older )

87 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i

88 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i

89 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i

90 Defining A Simple RNN in Python (Modified Very Slightly)

91 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i encode

92 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i decode
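
The code screenshots from these slides are not reproduced in this transcript; the sketch below is a comparable (not identical) PyTorch module with the same encode/decode structure, with all sizes and names chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

class SimpleRNNLM(nn.Module):
    """A comparable sketch, not the code shown on the slides: encode words
    into a hidden state, decode the hidden state into next-word scores."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)               # U (encoding)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)   # W (recurrence)
        self.out = nn.Linear(hidden_dim, vocab_size)               # S (decoding)

    def forward(self, word_ids, h0=None):      # word_ids: (batch, seq_len)
        e = self.emb(word_ids)                 # encode the observed words
        h, h_last = self.rnn(e, h0)            # hidden states h_1 .. h_N
        return self.out(h), h_last             # decode: logits over next word
```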

93 Training A Simple RNN in Python (Modified Very Slightly)

94 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood

95 Training A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 Negative loglikelihood h i-2 h i-1 h i w i-2 w i-1 w i get predictions

96 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood get predictions eval predictions

97 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood get predictions eval predictions compute gradient

98 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood get predictions eval predictions compute gradient perform SGD
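
Again, the screenshots are not reproduced here; a comparable training loop, following the annotations on these slides (negative log-likelihood, get predictions, evaluate them, compute the gradient, take an SGD step), might look like the following. SimpleRNNLM is the sketch module above, and `batches` is an assumed iterator of (inputs, targets) word-index tensors.

```python
import torch
import torch.nn as nn

model = SimpleRNNLM(vocab_size=10000)                     # size is an assumption
loss_fn = nn.NLLLoss()                                    # negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # plain SGD

for inputs, targets in batches:                           # `batches` is assumed
    optimizer.zero_grad()
    logits, _ = model(inputs)                             # get predictions
    log_probs = torch.log_softmax(logits, dim=-1)
    loss = loss_fn(log_probs.view(-1, log_probs.size(-1)),
                   targets.view(-1))                      # eval predictions
    loss.backward()                                       # compute gradient
    optimizer.step()                                      # perform SGD
```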

99 Another Solution: LSTMs/GRUs LSTM: Long Short-Term Memory (Hochreiter & Schmidhuber, 1997) GRU: Gated Recurrent Unit (Cho et al., 2014) Basic idea: learn to forget (figure legend: forget line, representation line)
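
In PyTorch these gated cells are drop-in replacements for the simple recurrence sketched earlier; the sizes below are illustrative.

```python
import torch.nn as nn

# gated recurrent layers provided by PyTorch
rnn = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
# or the GRU variant:
# rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
```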

100 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch
