Latent Models: Sequence Models Beyond HMMs and Machine Translation Alignment CMSC 473/673 UMBC


1 Latent Models: Sequence Models Beyond HMMs and Machine Translation Alignment CMSC 473/673 UMBC

2 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

3 Why Do We Need Both the Forward and Backward Algorithms? Compute posteriors:
α(i, s) · β(i, s) = total probability of paths through state s at step i
p(z_i = s | w_1, ..., w_N) = α(i, s) β(i, s) / α(N+1, END)
α(i, s) · p(s' | s) · p(obs at i+1 | s') · β(i+1, s') = total probability of paths through the s → s' arc (at time i)
p(z_i = s, z_{i+1} = s' | w_1, ..., w_N) = α(i, s) p(s' | s) p(obs_{i+1} | s') β(i+1, s') / α(N+1, END)

4 EM for HMMs 0. Assume some value for your parameters. Two step, iterative algorithm.
1. E-step: count under uncertainty (estimated counts), assuming these parameters:
p(z_i = s | w_1, ..., w_N) = α(i, s) β(i, s) / α(N+1, END)   (expected counts for p_obs(w | s))
p(z_i = s, z_{i+1} = s' | w_1, ..., w_N) = α(i, s) p(s' | s) p(obs_{i+1} | s') β(i+1, s') / α(N+1, END)   (expected counts for p_trans(s' | s))
2. M-step: maximize log-likelihood, assuming these uncertain counts.

5 EM for HMMs (Baum-Welch Algorithm)
α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for (i = N; i ≥ 0; --i) {
  for (next = 0; next < K*; ++next) {
    c_obs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for (state = 0; state < K*; ++state) {
      u = p_obs(obs_{i+1} | next) * p_trans(next | state)
      c_trans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
update p_obs, p_trans using c_obs, c_trans
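
The pseudocode above translates almost directly into NumPy. Below is a minimal sketch of one E-step (the expected counts), assuming the forward and backward tables have already been computed; the array shapes, 0-based indexing, and the END index are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def baum_welch_estep(obs, alpha, beta, p_obs, p_trans):
    """One E-step of Baum-Welch, mirroring the slide's pseudocode (a sketch).

    Assumed shapes (not from the slides):
      obs:     length-N list of observation indices
      alpha:   (N+2, K) forward probabilities
      beta:    (N+2, K) backward probabilities
      p_obs:   (V, K) emission probabilities p(word | state)
      p_trans: (K, K) transition probabilities p(next | state)
    """
    N = len(obs)
    K = p_trans.shape[0]
    END = K - 1                      # assume the last state index is END
    L = alpha[N + 1, END]            # total likelihood of the sentence

    c_obs = np.zeros_like(p_obs)
    c_trans = np.zeros_like(p_trans)
    for i in range(N):               # positions of the N observations
        for nxt in range(K):
            # expected count of emitting obs[i] from state `nxt`
            c_obs[obs[i], nxt] += alpha[i + 1, nxt] * beta[i + 1, nxt] / L
            for state in range(K):
                u = p_obs[obs[i], nxt] * p_trans[nxt, state]
                # expected count of the state -> nxt transition at position i
                c_trans[nxt, state] += alpha[i, state] * u * beta[i + 1, nxt] / L
    # transitions into END are omitted for brevity; the M-step renormalizes
    # c_obs and c_trans into updated p_obs and p_trans
    return c_obs, c_trans
```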

6 Semi-Supervised Learning labeled data: human annotated; relatively small / few examples. Unlabeled data: raw, not annotated; plentiful. EM lets us learn from both.

7 Semi-Supervised Parameter Estimation for HMMs (figure: transition-count and emission-count tables over tags N, V, end and words w_1–w_4, shown as observed counts from labeled data, mixed counts, and expected counts from EM)

8 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

9 Warren Weaver's Note "When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." (Warren Weaver, 1947) Slides courtesy Rebecca Knowles

10 Noisy Channel Model (diagram): text written in (clean) English passes through a noisy channel and is observed as (noisy) Russian text (язык = language); to translate, decode with a translation/decode model and rerank with a (clean) English language model. Slides courtesy Rebecca Knowles

11 Noisy Channel Model (same diagram, continued). Slides courtesy Rebecca Knowles

12 Noisy Channel Model (same diagram, continued). Slides courtesy Rebecca Knowles

13 Translation Translate French (observed) into English: Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

14 Translation Translate French (observed) into English: Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

15 Translation Translate French (observed) into English: Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

16 Alignment Le chat est sur la chaise. The cat is on the chair.? Le chat est sur la chaise. The cat is on the chair. Slides courtesy Rebecca Knowles

17 Parallel Texts Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world, Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people, Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law, Whereas it is essential to promote the development of friendly relations between nations, Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj. Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli. Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan. Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan. Slides courtesy Rebecca Knowles

18 Preprocessing Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world, Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people, Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law, Whereas it is essential to promote the development of friendly relations between nations, Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kuali nemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj. Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipan ni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatok majmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kuali timouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uan teixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli. Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uan ma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekis techchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipan tonemilis ni tlalpan. Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuak tiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan. Sentence align Clean corpus Tokenize Handle case Word segmentation (morphological, BPE, etc.) Language-specific preprocessing (example: pre-reordering)... Slides courtesy Rebecca Knowles

19 Alignments If we had word-aligned text, we could easily estimate P(f | e). But we don't usually have word alignments, and they are expensive to produce by hand. If we had P(f | e), we could produce alignments automatically. Slides courtesy Rebecca Knowles

20 IBM Model 1 (1993) Lexical Translation Model Word Alignment Model The simplest of the original IBM models For all IBM models, see the original paper (Brown et al, 1993): Slides courtesy Rebecca Knowles

21 Simplified IBM 1 We'll work through an example with a simplified version of IBM Model 1. Figures and examples are drawn from "A Statistical MT Tutorial Workbook", Section 27 (Knight, 1999). Simplifying assumption: each source word must translate to exactly one target word and vice versa. Slides courtesy Rebecca Knowles

22 IBM Model 1 (1993) f: vector of French words (visualization of alignment) e: vector of English words Le chat est sur la chaise verte The cat is on the green chair a: vector of alignment indices Slides courtesy Rebecca Knowles

23 IBM Model 1 (1993) f: vector of French words (visualization of alignment) e: vector of English words Le chat est sur la chaise verte The cat is on the green chair a: vector of alignment indices t(f_j | e_i): translation probability of the word f_j given the word e_i Slides courtesy Rebecca Knowles

24 Model and Parameters Want: P(f | e) But we don't know how to train this directly Solution: Use P(a, f | e), where a is an alignment Remember: Slides courtesy Rebecca Knowles

25 Model and Parameters: Intuition Translation prob.: Example: Interpretation: How probable is it that we see f j given e i Slides courtesy Rebecca Knowles

26 Model and Parameters: Intuition Alignment/translation prob.: P(a, f | e) Example (visual representation of two different alignments a of le chat to the cat): P(le chat, a_1 | the cat) < P(le chat, a_2 | the cat) Interpretation: How probable are the alignment a and the translation f (given e) Slides courtesy Rebecca Knowles

27 Model and Parameters: Intuition Alignment prob.: P(a | e, f) Example (for two different alignments a): P(a_1 | le chat, the cat) < P(a_2 | le chat, the cat) Interpretation: How probable is alignment a (given e and f) Slides courtesy Rebecca Knowles

28 Model and Parameters How to compute: Slides courtesy Rebecca Knowles

29 Parameters For IBM model 1, we can compute all parameters given translation parameters: How many of these are there? Slides courtesy Rebecca Knowles

30 Parameters For IBM model 1, we can compute all parameters given translation parameters: How many of these are there? French vocabulary x English vocabulary Slides courtesy Rebecca Knowles

31 Data Two sentence pairs: (English: b c, French: x y) and (English: b, French: y) Slides courtesy Rebecca Knowles

32 All Possible Alignments For (English: b c, French: x y) there are two alignments: (b–x, c–y) and (b–y, c–x); for (English: b, French: y) there is one: (b–y). Remember: simplifying assumption that each word must be aligned exactly once Slides courtesy Rebecca Knowles

33 Expectation Maximization (EM) 0. Assume some value for the translation parameters t(f | e) and compute other parameter values. Two step, iterative algorithm. 1. E-step: count alignments and translations under uncertainty (e.g., the alignment probabilities of le chat given the cat), assuming these parameters. 2. M-step: maximize log-likelihood (update parameters), using these uncertain, estimated counts. Slides courtesy Rebecca Knowles
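
For reference, here is a compact Python sketch of these two steps for the full IBM Model 1 translation table t(f | e) — the standard fractional-count updates, not the simplified one-word-to-one-word model used in the worked example. The toy corpus at the bottom is the workbook's two sentence pairs.

```python
from collections import defaultdict

def ibm1_em(pairs, iterations=10):
    """EM for the IBM Model 1 translation table t(f | e) (a sketch).

    `pairs` is a list of (english_words, french_words) tuples.
    """
    # uniform initialization of t(f | e)
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)   # expected count(f, e)
        total = defaultdict(float)   # expected count(e)
        # E-step: fractional alignment counts under the current t
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)       # normalize over English words
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize counts into probabilities
        for (f, e) in count:
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# toy corpus from the workbook example: ("b c", "x y") and ("b", "y")
t = ibm1_em([(["b", "c"], ["x", "y"]), (["b"], ["y"])])
```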

34 Review of IBM Model 1 & EM Iteratively learned an alignment/translation model from sentence-aligned text (without gold standard alignments) Model can now be used for alignment and/or word-level translation We explored a simplified version of this; IBM Model 1 allows more types of alignments Slides courtesy Rebecca Knowles

35 Why is Model 1 insufficient? Why won't this produce great translations? Indifferent to order (language model may help?) Translates one word at a time Translates each word in isolation... Slides courtesy Rebecca Knowles

36 Uses for Alignments Component of machine translation systems Produce a translation lexicon automatically Cross-lingual projection/extraction of information Supervision for training other models (for example, neural MT systems) Slides courtesy Rebecca Knowles

37 Evaluating Machine Translation Human evaluations: Test set (source, human reference translations, MT output) Humans judge the quality of MT output (in one of several possible ways) Koehn (2017), Slides courtesy Rebecca Knowles

38 Evaluating Machine Translation Automatic evaluations: Test set (source, human reference translations, MT output) Aim to mimic (correlate with) human evaluations Many metrics: TER (Translation Error/Edit Rate) HTER (Human-Targeted Translation Edit Rate) BLEU (Bilingual Evaluation Understudy) METEOR (Metric for Evaluation of Translation with Explicit Ordering) Slides courtesy Rebecca Knowles

39 Machine Translation Alignment Now Explicitly with fancier IBM models Implicitly/learned jointly with attention in recurrent neural networks (RNNs)

40 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

41 Recall: N-gram to Maxent to Neural Language Models given some context w_{i-3} w_{i-2} w_{i-1}, compute beliefs about what is likely: p(w_i | w_{i-3}, w_{i-2}, w_{i-1}) ∝ count(w_{i-3}, w_{i-2}, w_{i-1}, w_i); predict the next word w_i

42 Recall: N-gram to Maxent to Neural Language Models given some context w_{i-3} w_{i-2} w_{i-1}, compute beliefs about what is likely: p(w_i | w_{i-3}, w_{i-2}, w_{i-1}) = softmax(θ · f(w_{i-3}, w_{i-2}, w_{i-1}, w_i)); predict the next word w_i

43 Hidden Markov Model Representation
p(z_1, w_1, z_2, w_2, ..., z_N, w_N) = p(z_1 | z_0) p(w_1 | z_1) ··· p(z_N | z_{N-1}) p(w_N | z_N) = ∏_i p(w_i | z_i) p(z_i | z_{i-1})
emission probabilities/parameters: p(w_i | z_i); transition probabilities/parameters: p(z_i | z_{i-1})
z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

44 A Different Model's Representation z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

45 A Different Model's Representation p(z_1, z_2, ..., z_N | w_1, w_2, ..., w_N) = p(z_1 | z_0, w_1) ··· p(z_N | z_{N-1}, w_N) = ∏_i p(z_i | z_{i-1}, w_i) z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

46 A Different Model's Representation p(z_1, z_2, ..., z_N | w_1, w_2, ..., w_N) = p(z_1 | z_0, w_1) ··· p(z_N | z_{N-1}, w_N) = ∏_i p(z_i | z_{i-1}, w_i), with p(z_i | z_{i-1}, w_i) ∝ exp(θᵀ f(w_i, z_{i-1}, z_i)) z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

47 Maximum Entropy Markov Model (MEMM) A Different Model's Representation p(z_1, z_2, ..., z_N | w_1, w_2, ..., w_N) = p(z_1 | z_0, w_1) ··· p(z_N | z_{N-1}, w_N) = ∏_i p(z_i | z_{i-1}, w_i), with p(z_i | z_{i-1}, w_i) ∝ exp(θᵀ f(w_i, z_{i-1}, z_i)) z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 represent the probabilities and independence assumptions in a graph

48 MEMMs z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 Discriminative: don't care about generating the observed sequence at all Maxent: use features Problem: the label-bias problem
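
Concretely, each local factor p(z_i | z_{i-1}, w_i) is just a maxent classifier over the next tag. A minimal NumPy sketch, where the weight vector theta and the feature function feats are hypothetical stand-ins (not defined on the slides):

```python
import numpy as np

def memm_local_probs(theta, feats, prev_state, word, states):
    """p(z_i | z_{i-1}, w_i) for an MEMM: a softmax over candidate next tags.

    `feats(word, prev_state, state)` is an assumed helper returning a
    feature vector; `theta` is the learned weight vector.
    """
    scores = np.array([theta @ feats(word, prev_state, z) for z in states])
    scores -= scores.max()                 # numerical stability
    probs = np.exp(scores)
    # locally normalized per step -- this is exactly what causes label bias
    return probs / probs.sum()
```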

49 Label-Bias Problem z i w i

50 Label-Bias Problem 1 z i incoming mass must sum to 1 w i

51 Label-Bias Problem 1 1 z i incoming mass must sum to 1 outgoing mass must sum to 1 w i

52 Label-Bias Problem 1 1 z i incoming mass must sum to 1 outgoing mass must sum to 1 w i observe, but do not generate (explain) the observation Take-aways: the model can learn to ignore observations the model can get itself stuck on bad paths

53 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

54 (Linear Chain) Conditional Random Fields z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 Discriminative: don't care about generating the observed sequence at all Condition on the entire observed word sequence w_1 ... w_N Maxent: use features Solves the label-bias problem

55 (Linear Chain) Conditional Random Fields z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 p(z_1, ..., z_N | w_1, ..., w_N) ∝ ∏_i exp(θᵀ f(z_{i-1}, z_i, w_1, ..., w_N))

56 (Linear Chain) Conditional Random Fields z 1 z 2 z 3 z 4 w 1 w 2 w 3 w 4 p(z_1, ..., z_N | w_1, ..., w_N) ∝ ∏_i exp(θᵀ f(z_{i-1}, z_i, w_1, ..., w_N)) condition on the entire sequence
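
As a sketch of what this formula computes, the unnormalized log-score of one tag sequence is a sum of feature scores over adjacent tag pairs, each allowed to look at the whole word sequence. Here theta and feats are hypothetical stand-ins; the global partition function Z(w), computed with a forward pass over tag sequences, is what distinguishes the CRF from the locally normalized MEMM.

```python
import numpy as np

def crf_unnormalized_logscore(theta, feats, tags, words):
    """Unnormalized log-score of a tag sequence under a linear-chain CRF:
    sum_i theta . f(z_{i-1}, z_i, w_1..w_N).  `feats` is an assumed helper
    returning one transition's feature vector given the whole sentence."""
    score = 0.0
    prev = "<s>"                              # assumed start symbol
    for i, z in enumerate(tags):
        score += theta @ feats(prev, z, words, i)
        prev = z
    # p(tags | words) = exp(score) / Z(words); Z sums over all tag
    # sequences (global normalization, computed by a forward algorithm)
    return score
```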

57 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

58 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

59 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

60 CRF Tutorial, Fig 1.2, Sutton & McCallum (2012) Conditional vs. Sequence

61 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

62 Recall: N-gram to Maxent to Neural Language Models given some context w_{i-3} w_{i-2} w_{i-1}: create/use distributed representations e_{i-3}, e_{i-2}, e_{i-1}; combine these representations with a matrix-vector product (C) into f; compute beliefs about what is likely, p(w_i | w_{i-3}, w_{i-2}, w_{i-1}) = softmax(θ_{w_i} · f(w_{i-3}, w_{i-2}, w_{i-1})); predict the next word w_i
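
A hedged PyTorch sketch of this fixed-window architecture; the layer sizes and class name are illustrative assumptions, not taken from the slides.

```python
import torch
import torch.nn as nn

class FeedforwardLM(nn.Module):
    """Minimal sketch of a fixed-window neural LM: embed the context words,
    combine them, and softmax over the vocabulary."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128, context=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)              # e_{i-3..i-1}
        self.combine = nn.Linear(context * emb_dim, hidden_dim)   # C, producing f
        self.out = nn.Linear(hidden_dim, vocab_size)              # theta_{w_i}

    def forward(self, context_ids):                 # (batch, context)
        e = self.emb(context_ids).flatten(1)        # concatenate embeddings
        f = torch.tanh(self.combine(e))
        return torch.log_softmax(self.out(f), dim=-1)   # log p(w_i | context)
```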

63 A More Typical View of Recurrent Neural Language Modeling w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i w i-3 w i-2 w i-1 w i

64 A More Typical View of Recurrent Neural Language Modeling w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i w i-3 w i-2 w i-1 w i observe these words one at a time

65 A More Typical View of Recurrent Neural Language Modeling predict the next word w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i w i-3 w i-2 w i-1 w i observe these words one at a time

66 A More Typical View of Recurrent Neural Language Modeling predict the next word w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i from these hidden states w i-3 w i-2 w i-1 w i observe these words one at a time

67 A More Typical View of Recurrent Neural Language Modeling predict the next word cell w i-2 w i-1 w i w i+1 h i-3 h i-2 h i-1 h i from these hidden states w i-3 w i-2 w i-1 w i observe these words one at a time

68 A Recurrent Neural Network Cell w i w i+1 h i-1 h i w i-1 w i

69 A Recurrent Neural Network Cell w i w i+1 h i-1 W h i W w i-1 w i

70 A Recurrent Neural Network Cell w i w i+1 h i-1 W h i W U encoding U w i-1 w i

71 A Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i

72 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i)

73 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) σ(x) = 1 / (1 + exp(−x))

74 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) σ(x) = 1 / (1 + exp(−x))

75 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) σ(x) = 1 / (1 + exp(−x))

76 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) σ(x) = 1 / (1 + exp(−x))

77 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W

78 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W suggested solution: gradient descent on prediction ability

79 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W suggested solution: gradient descent on prediction ability problem: they're tied across inputs/timesteps

80 A Simple Recurrent Neural Network Cell w i w i+1 S decoding S h i-1 W h i W U encoding U w i-1 w i h_i = σ(W h_{i-1} + U w_i) w_{i+1} = softmax(S h_i) must learn matrices U, S, W suggested solution: gradient descent on prediction ability problem: they're tied across inputs/timesteps good news for you: many toolkits do this automatically
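
Written out in NumPy, one step of this cell is just the two equations above; the matrix shapes and one-hot input encoding are assumptions for illustration.

```python
import numpy as np

def rnn_step(h_prev, w_onehot, W, U, S):
    """One step of the simple RNN cell from the slides:
    h_i = sigmoid(W h_{i-1} + U w_i),  p(w_{i+1} | ...) = softmax(S h_i)."""
    h = 1.0 / (1.0 + np.exp(-(W @ h_prev + U @ w_onehot)))   # new hidden state
    logits = S @ h
    logits -= logits.max()                                   # numerical stability
    p_next = np.exp(logits) / np.exp(logits).sum()           # next-word distribution
    return h, p_next
```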

81 Why Is Training RNNs Hard? Conceptually, it can get strange But really getting the gradient just requires many applications of the chain rule for derivatives

82 Why Is Training RNNs Hard? Conceptually, it can get strange. But really, getting the gradient just requires many applications of the chain rule for derivatives. Vanishing gradients: multiplying the same matrices at each timestep means multiplying many matrices in the gradients.

83 Why Is Training RNNs Hard? Conceptually, it can get strange. But really, getting the gradient just requires many applications of the chain rule for derivatives. Vanishing gradients: multiplying the same matrices at each timestep means multiplying many matrices in the gradients. One solution: clip the gradients to a max value.
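
In PyTorch, that clipping solution is one call between the backward pass and the optimizer step; the tiny model and random data below are placeholders just to make the snippet runnable.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)                                 # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).pow(2).mean()            # placeholder loss
loss.backward()
# rescale gradients so their total norm is at most 5.0 before stepping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
opt.step()
```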

84 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch

85 Natural Language Processing from keras import * from torch import *

86 Pick Your Toolkit PyTorch Deeplearning4j TensorFlow DyNet Caffe Keras MxNet Gluon CNTK Comparisons: (older )

87 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i

88 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i

89 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i

90 Defining A Simple RNN in Python (Modified Very Slightly)

91 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i encode

92 Defining A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 h i-2 h i-1 h i w i-2 w i-1 w i decode
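
The code screenshots from these slides are not reproduced in this transcript; the sketch below is a comparable (not identical) PyTorch module with the same encode/decode structure, with all sizes and names chosen arbitrarily for illustration.

```python
import torch
import torch.nn as nn

class SimpleRNNLM(nn.Module):
    """A comparable sketch, not the code shown on the slides: encode words
    into a hidden state, decode the hidden state into next-word scores."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)               # U (encoding)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)   # W (recurrence)
        self.out = nn.Linear(hidden_dim, vocab_size)               # S (decoding)

    def forward(self, word_ids, h0=None):      # word_ids: (batch, seq_len)
        e = self.emb(word_ids)                 # encode the observed words
        h, h_last = self.rnn(e, h0)            # hidden states h_1 .. h_N
        return self.out(h), h_last             # decode: logits over next word
```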

93 Training A Simple RNN in Python (Modified Very Slightly)

94 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood

95 Training A Simple RNN in Python (Modified Very Slightly) w i-1 w i w i+1 Negative loglikelihood h i-2 h i-1 h i w i-2 w i-1 w i get predictions

96 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood get predictions eval predictions

97 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood get predictions eval predictions compute gradient

98 Training A Simple RNN in Python (Modified Very Slightly) Negative loglikelihood get predictions eval predictions compute gradient perform SGD
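
Again, the screenshots are not reproduced here; a comparable training loop, following the annotations on these slides (negative log-likelihood, get predictions, evaluate them, compute the gradient, take an SGD step), might look like the following. SimpleRNNLM is the sketch module above, and `batches` is an assumed iterator of (inputs, targets) word-index tensors.

```python
import torch
import torch.nn as nn

model = SimpleRNNLM(vocab_size=10000)                     # size is an assumption
loss_fn = nn.NLLLoss()                                    # negative log-likelihood
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # plain SGD

for inputs, targets in batches:                           # `batches` is assumed
    optimizer.zero_grad()
    logits, _ = model(inputs)                             # get predictions
    log_probs = torch.log_softmax(logits, dim=-1)
    loss = loss_fn(log_probs.view(-1, log_probs.size(-1)),
                   targets.view(-1))                      # eval predictions
    loss.backward()                                       # compute gradient
    optimizer.step()                                      # perform SGD
```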

99 Another Solution: LSTMs/GRUs LSTM: Long Short-Term Memory (Hochreiter & Schmidhuber, 1997) GRU: Gated Recurrent Unit (Cho et al., 2014) Basic idea: learn to forget (figure legend: forget line, representation line)
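
In PyTorch these gated cells are drop-in replacements for the simple recurrence sketched earlier; the sizes below are illustrative.

```python
import torch.nn as nn

# gated recurrent layers provided by PyTorch
rnn = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
# or the GRU variant:
# rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
```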

100 Outline Review: EM for HMMs Machine Translation Alignment Limited Sequence Models Maximum Entropy Markov Models Conditional Random Fields Recurrent Neural Networks Basic Definitions Example in PyTorch
