lecture 6: modeling sequences (final part)
1 Natural Language Processing 1 lecture 6: modeling sequences (final part) Ivan Titov Institute for Logic, Language and Computation
2 Outline
After a recap:
- A few more words about unsupervised estimation of HMMs (forward-backward)
- More on discriminative estimation (CRFs / MEMMs)
- Recurrent neural networks / encoder-decoder
- Syntactic parsing (PCFGs)
3 Examples of structured prediction problems (each maps an input x to a structured output y)
- Syntactic parsing
- Protein structure prediction
- Visual scene parsing
4 Examples of structured prediction problems (each maps an input x to a structured output y)
- Syntactic parsing
- Protein structure prediction
- Visual scene parsing
We cannot estimate a distinct set of parameters for each y, so we need to understand: (1) how to break y into parts; (2) how to predict these parts; and (3) how these parts interact with each other.
5 Structured Prediction
- Given a training dataset $D = \{(x_i, y_i)\}_{i=1}^{l}$ with $x_i \in X$, $y_i \in Y$ (the output is now a graph).
- Represent examples in some feature space: $f : X \times Y \to R^n$ (now input-output pairs are mapped to the feature space).
- Assume the classification rule: $\hat{y} = \arg\max_{y \in Y} w^T f(x, y)$, with $w \in R^n$, $y \in Y$. Classification is computationally challenging as we search through the space Y.
- Estimate the parameters by optimizing some objective on the training data. For example, the hinge loss (max-margin classification / SVMs):
  $\arg\min_{w} \frac{1}{2} \|w\|^2$ s.t. $w^T f(x_i, y_i) - \max_{y \in Y \setminus y_i} w^T f(x_i, y) \geq 1$
  Here $w^T f(x_i, y_i)$ is the score for the "gold" structure and the max term is the highest score among the remaining structures.
6 Consider the sequence labeling example
The feature space: each component of f counts one fragment of the labeled sequence (pairs of adjacent tags over N, M, V, and word-tag pairs over birds, dogs, can, fly, ...).
$f$(dogs:N can:M fly:V) = (0, 1, 0, 0, ..., 1, 1, ..., 0, 1, 1, ...)^T   (counts of the corresponding fragments)
$f$(dogs:N can:N fly:N) = (0, 1, 1, 1, ..., 0, 0, ..., 2, 0, 0, ...)^T   (features of another sequence)
$w$ = (5, 4, 2, 2, ..., 10, 5, ..., -1, 3, 3, ...)^T
We want to find weights which score all the wrong sequences below the correct ones:
$w^T f$(dogs:N can:M fly:V) > $w^T f$(dogs:N can:N fly:N)
and likewise for all $(M^3 - 1)$ other 'wrong' sequences for this sentence.
7 Structured Perceptron
- Return to structured prediction: $\hat{y} = \arg\max_{y \in Y} w^T f(x, y)$
- Perceptron algorithm, given a training set $D = \{(x_i, y_i)\}_{i=1}^{l}$:
    w = 0                                          // initialize
    do
      err = 0
      for i = 1 to l                               // over the training examples
        ŷ = arg max_y w^T f(x_i, y)                // model prediction (runs Viterbi during training)
        if ( w^T f(x_i, ŷ) > w^T f(x_i, y_i) )     // if mistake
          w += f(x_i, y_i) - f(x_i, ŷ)             // update: pushes the correct sequence up and the incorrectly predicted one down
          err++                                    // # errors
        endif
      endfor
    while ( err > 0 )                              // repeat until no errors
    return w
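The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the lecture's implementation: the feature map (transition and emission counts), the tag set, the two-sentence toy corpus and the `max_epochs` cap are all made up for the example. One deliberate deviation: a mistake is detected with `pred != gold` rather than by comparing scores, because with `w = 0` all sequences tie and the strict score comparison would never trigger an update.

```python
from collections import defaultdict

TAGS = ["N", "M", "V"]   # hypothetical tag set
START = "$"

def features(words, tags):
    """Count transition (prev_tag, tag) and emission (tag, word) fragments."""
    f = defaultdict(int)
    prev = START
    for word, tag in zip(words, tags):
        f[("T", prev, tag)] += 1
        f[("E", tag, word)] += 1
        prev = tag
    return f

def viterbi(words, w):
    """Return arg max_y w^T f(x, y) for the first-order features above."""
    # best[t][tag] = (score of the best prefix ending in tag, backpointer)
    best = [{t: (w[("T", START, t)] + w[("E", t, words[0])], None) for t in TAGS}]
    for word in words[1:]:
        col = {}
        for t in TAGS:
            col[t] = max(
                (best[-1][p][0] + w[("T", p, t)] + w[("E", t, word)], p)
                for p in TAGS
            )
        best.append(col)
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):   # follow backpointers right to left
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]

def train(data, max_epochs=100):
    w = defaultdict(float)                       # initialize w = 0
    for _ in range(max_epochs):
        err = 0
        for words, gold in data:                 # over the training examples
            pred = viterbi(words, w)             # model prediction
            if pred != gold:                     # mistake
                for k, v in features(words, gold).items():
                    w[k] += v                    # push the correct sequence up
                for k, v in features(words, pred).items():
                    w[k] -= v                    # push the predicted one down
                err += 1
        if err == 0:                             # repeat until no errors
            break
    return w

DATA = [
    (["dogs", "can", "fly"], ["N", "M", "V"]),
    (["birds", "fly"], ["N", "V"]),
]
```

Calling `train(DATA)` and then `viterbi(["dogs", "can", "fly"], w)` should reproduce the gold tags on this separable toy corpus.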
8 Averaged Structured Perceptron
- Return to structured prediction: $\hat{y} = \arg\max_{y \in Y} w^T f(x, y)$
- Perceptron algorithm, given a training set $D = \{(x_i, y_i)\}_{i=1}^{l}$:
    w = 0; k = 0                                   // initialize
    for iter = 1 to T                              // do not run until convergence (just T iterations)
      for i = 1 to l                               // over the training examples
        ŷ = arg max_y w^T f(x_i, y)                // model prediction
        if ( w^T f(x_i, ŷ) > w^T f(x_i, y_i) )     // if mistake
          w += f(x_i, y_i) - f(x_i, ŷ)             // update
        endif
        k++; w_k = w                               // remember the weights after each example
      endfor
    endfor
    return (1/k) \sum_{j=1}^{k} w_j
- How can we compute the average (much more) memory efficiently?
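One common answer to the memory question, sketched here as an assumption rather than as the lecture's intended solution: never store the snapshots $w_1, \ldots, w_k$. Keep the current weights `w` plus a second vector `u` that accumulates each update weighted by the number of examples seen before it; the average is then `w - u / k`. The function names and the sparse-dict representation of updates are hypothetical.

```python
from collections import defaultdict

def average_naive(updates):
    """Average of the snapshots w_1..w_K via a running sum over all features."""
    w, total = defaultdict(float), defaultdict(float)
    for delta in updates:                 # delta is a sparse update or None
        for key, val in (delta or {}).items():
            w[key] += val
        for key, val in w.items():        # touches every feature per example
            total[key] += val
    k = len(updates)
    return {key: val / k for key, val in total.items()}

def average_lazy(updates):
    """Same average, touching only the features changed by each update."""
    w, u = defaultdict(float), defaultdict(float)
    c = 0                                 # number of examples processed so far
    for delta in updates:
        for key, val in (delta or {}).items():
            w[key] += val
            u[key] += c * val             # this update is missing from c snapshots
        c += 1
    return {key: w[key] - u[key] / c for key in w}

# One entry per training example: a sparse update on a mistake, None otherwise.
UPDATES = [{"a": 1.0}, None, {"a": -1.0, "b": 2.0}, None]
```

Both functions return the same averaged weights; the lazy version does work proportional to the size of each update instead of the full feature space.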
9 Generative vs discriminative on smaller datasets
- For smaller training sets:
  - Theoretical results: generative classifiers converge faster to their optimal error [Ng & Jordan, NIPS 01]
  - Empirical: [Figure: error rates of a discriminative classifier and a generative model on predicting housing price trends in the Boston area, plotted against the number of training examples]
10 Hidden Markov Models: Unsupervised Estimation
- N is the number of tags, M is the vocabulary size
- Parameters (to be estimated from the training set); note the change in notation from y to s:
  - Transition probabilities $a_{ji} = P(s_t = i \mid s_{t-1} = j)$, A is an [N x M] -> [N x N] matrix
  - Emission probabilities $b_{ik} = P(x_t = k \mid s_t = i)$, B is an [N x M] matrix
- Training corpus:
  - x^(1) = (In, an, Oct., 19, review, of, ...), y^(1) = (IN, DT, NNP, CD, NN, IN, ...)
  - x^(2) = (Ms., Haag, plays, Elianti, .), y^(2) = (NNP, NNP, VBZ, NNP, .)
  - ...
  - x^(L) = (The, company, said, ...), y^(L) = (DT, NN, VBD, NNP, .)
- For notational convenience, let's assume that all the sentences have length n
- How do we estimate the parameters using maximum likelihood estimation? You might have guessed what these estimates are.
- How do we estimate the models in the unsupervised set-up?
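11 EM (intuition)
State posteriors: $\gamma_t^{(l)}(i) = P(s_t = i \mid x_1^{(l)}, \ldots, x_n^{(l)})$
[Figure: posterior distributions over the tags N and V for each word of the example sentences "John loves Mary", "Mary loves books", "John hates books"]
How do we estimate, e.g., $P(x_t = \text{books} \mid s_t = N)$? From expected counts:
$\hat{P}(x \mid s) = \hat{C}_E(x, s) / \sum_{x'} \hat{C}_E(x', s)$, where $\hat{C}_E(x, s) = \sum_{l=1}^{L} \sum_{t=1}^{n} \gamma_t^{(l)}(s) \, I(x_t^{(l)} = x)$
$I(\cdot)$ is an indicator function: it equals 1 if the condition is true and 0 otherwise.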
12 EM (intuition)
Pair posteriors: $\xi_t^{(l)}(i, j) = P(s_t = i, s_{t+1} = j \mid x_1^{(l)}, \ldots, x_n^{(l)})$
Example values of $\xi_t^{(l)}(i, j)$ per sentence and time step ($ marks the sentence boundary):
- "John loves Mary": t=0: $-N 0.8, $-V 0.2 | t=1: V-V 0.1, V-N 0.1, N-N 0.2, N-V 0.6 | t=2: V-V 0.2, V-N 0.1, N-N 0.3, N-V 0.4 | t=3: N-$ 0.7, V-$ 0.3
- "Mary loves books": t=0: $-N 0.7, $-V 0.3 | t=1: V-V 0.2, V-N 0.2, N-N 0.2, N-V 0.4 | t=2: V-V 0.1, V-N 0.1, N-N 0.1, N-V 0.7 | t=3: N-$ 0.8, V-$ 0.2
- "John hates books": t=0: $-N 0.6, $-V 0.4 | t=1: V-V 0.1, V-N 0.2, N-N 0.2, N-V 0.5 | t=2: V-V 0.1, V-N 0.1, N-N 0.3, N-V 0.5 | t=3: N-$ 0.6, V-$ 0.4
How do we estimate, e.g., the transition probability $P(s_t = N \mid s_{t-1} = V)$? From expected counts:
$\hat{P}(j \mid i) = \hat{C}_T(i, j) / \sum_{j'} \hat{C}_T(i, j')$, where $\hat{C}_T(i, j) = \sum_{l=1}^{L} \sum_{t=0}^{n-1} \xi_t^{(l)}(i, j)$
Disclaimer: posterior distributions in this example may not satisfy natural consistency conditions; this is just an example.
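As a worked illustration of the expected-count estimate, the following sketch plugs in the pair posteriors from the slide's three example sentences. The grouping of the values by sentence and time step is my reading of the flattened figure and should be treated as an assumption; the variable and function names are hypothetical.

```python
from collections import defaultdict

# xi[l][t][(i, j)]: posterior of being in tag i at time t and tag j at time t+1.
XI = [
    [  # "John loves Mary"
        {("$", "N"): 0.8, ("$", "V"): 0.2},
        {("V", "V"): 0.1, ("V", "N"): 0.1, ("N", "N"): 0.2, ("N", "V"): 0.6},
        {("V", "V"): 0.2, ("V", "N"): 0.1, ("N", "N"): 0.3, ("N", "V"): 0.4},
        {("N", "$"): 0.7, ("V", "$"): 0.3},
    ],
    [  # "Mary loves books"
        {("$", "N"): 0.7, ("$", "V"): 0.3},
        {("V", "V"): 0.2, ("V", "N"): 0.2, ("N", "N"): 0.2, ("N", "V"): 0.4},
        {("V", "V"): 0.1, ("V", "N"): 0.1, ("N", "N"): 0.1, ("N", "V"): 0.7},
        {("N", "$"): 0.8, ("V", "$"): 0.2},
    ],
    [  # "John hates books"
        {("$", "N"): 0.6, ("$", "V"): 0.4},
        {("V", "V"): 0.1, ("V", "N"): 0.2, ("N", "N"): 0.2, ("N", "V"): 0.5},
        {("V", "V"): 0.1, ("V", "N"): 0.1, ("N", "N"): 0.3, ("N", "V"): 0.5},
        {("N", "$"): 0.6, ("V", "$"): 0.4},
    ],
]

def expected_transition_counts(xi):
    """C_T(i, j) = sum over sentences l and time steps t of xi_t^(l)(i, j)."""
    counts = defaultdict(float)
    for sentence in xi:
        for step in sentence:
            for pair, prob in step.items():
                counts[pair] += prob
    return counts

def transition_prob(counts, i, j):
    """P(j | i) = C_T(i, j) / sum_j' C_T(i, j')."""
    total = sum(c for (src, _), c in counts.items() if src == i)
    return counts[(i, j)] / total

COUNTS = expected_transition_counts(XI)
P_N_GIVEN_V = transition_prob(COUNTS, "V", "N")   # 0.8 / 2.5, i.e. about 0.32
```

Here the expected counts out of V are 0.8 (to V), 0.8 (to N) and 0.9 (to $), so the re-estimated probability of moving from V to N is 0.8 / 2.5.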
13 Intuitive conclusion:
- We need to figure out how to compute:
  (1) probabilistic predictions about states: $\gamma_t(i) = P(s_t = i \mid x_1, \ldots, x_n)$
  (2) probabilistic predictions about pairs of states: $\xi_t(i, j) = P(s_t = i, s_{t+1} = j \mid x_1, \ldots, x_n)$
14 Forward-backward probabilities (picture based on one from Tommi Jaakkola)
- Forward probabilities: $\alpha_t(i) = P(x_1, \ldots, x_t, s_t = i)$
- Backward probabilities: $\beta_t(i) = P(x_{t+1}, \ldots, x_n \mid s_t = i)$. We can think of this as evidence about the current state from future observations.
15 Forward-backward probabilities
- Recursion for calculating forward probabilities $\alpha_t(i) = P(x_1, \ldots, x_t, s_t = i)$:
  $\alpha_1(i) = P(i \mid \$) \, P(x_1 \mid i)$
  $\alpha_t(i) = \left( \sum_j \alpha_{t-1}(j) \, P(i \mid j) \right) P(x_t \mid i)$
16 Forward-backward probabilities
- Analogously, recursion for calculating backward probabilities $\beta_t(i) = P(x_{t+1}, \ldots, x_n \mid s_t = i)$:
  $\beta_n(i) = 1$ (we assume here that the n-th symbol is $ (or </s>))
  $\beta_t(i) = \sum_j P(j \mid i) \, P(x_{t+1} \mid j) \, \beta_{t+1}(j)$
17 Forward-backward probabilities
- The forward and backward probabilities, $\alpha_t(i) = P(x_1, \ldots, x_t, s_t = i)$ and $\beta_t(i) = P(x_{t+1}, \ldots, x_n \mid s_t = i)$, are complementary and permit us to evaluate various probabilities:
  - $P(x_1, x_2, \ldots, x_n)$
  - $\gamma_t(i) = P(s_t = i \mid x_1, \ldots, x_n)$
  - $\xi_t(i, j) = P(s_t = i, s_{t+1} = j \mid x_1, \ldots, x_n)$
  The last two are the ones we need for EM.
18 Forward-backward probabilities
The probability of an observed sequence:
$P(x_1, \ldots, x_n) = \sum_i P(x_1, \ldots, x_n, s_t = i)$
$= \sum_i P(x_1, \ldots, x_t, s_t = i) \, P(x_{t+1}, \ldots, x_n \mid s_t = i)$
$= \sum_i \alpha_t(i) \, \beta_t(i)$
19 Forward-backward probabilities
The posterior probability that the HMM was in state i at time t:
$\gamma_t(i) = P(s_t = i \mid x_1, \ldots, x_n) = \frac{P(x_1, \ldots, x_n, s_t = i)}{P(x_1, \ldots, x_n)} = \frac{\alpha_t(i) \, \beta_t(i)}{\sum_j \alpha_t(j) \, \beta_t(j)}$
20 Forward-backward probabilities
The posterior probability of being in states i and j at times t and t+1, respectively:
$\xi_t(i, j) = P(s_t = i, s_{t+1} = j \mid x_1, \ldots, x_n) = \frac{\alpha_t(i) \, P(s_{t+1} = j \mid s_t = i) \, P(x_{t+1} \mid s_{t+1} = j) \, \beta_{t+1}(j)}{\sum_k \alpha_t(k) \, \beta_t(k)}$
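The forward and backward recursions and the two posteriors can be sketched in plain Python as below. This is an illustrative sketch, not the lecture's code: the two-state HMM parameters are invented, `trans[i][j]` stands for $P(j \mid i)$, `init[i]` for $P(i \mid \$)$, and the stop symbol is ignored so that $\beta_n(i) = 1$.

```python
def forward(x, init, trans, emit):
    """alpha[t][i] = P(x_1..x_t, s_t = i) (0-based t in the code)."""
    states = range(len(init))
    alpha = [[init[i] * emit[i][x[0]] for i in states]]
    for t in range(1, len(x)):
        alpha.append([
            sum(alpha[t - 1][j] * trans[j][i] for j in states) * emit[i][x[t]]
            for i in states
        ])
    return alpha

def backward(x, trans, emit):
    """beta[t][i] = P(x_{t+1}..x_n | s_t = i), with beta at the last step = 1."""
    states = range(len(trans))
    beta = [[1.0] * len(trans) for _ in x]
    for t in range(len(x) - 2, -1, -1):
        beta[t] = [
            sum(trans[i][j] * emit[j][x[t + 1]] * beta[t + 1][j] for j in states)
            for i in states
        ]
    return beta

def posteriors(x, init, trans, emit):
    """Return gamma_t(i), xi_t(i, j) and the sequence probability P(x)."""
    alpha, beta = forward(x, init, trans, emit), backward(x, trans, emit)
    states = range(len(init))
    z = sum(alpha[-1])                      # P(x_1..x_n), since beta_n = 1
    gamma = [[alpha[t][i] * beta[t][i] / z for i in states]
             for t in range(len(x))]
    xi = [[[alpha[t][i] * trans[i][j] * emit[j][x[t + 1]] * beta[t + 1][j] / z
            for j in states] for i in states]
          for t in range(len(x) - 1)]
    return gamma, xi, z

# Hypothetical 2-state, 2-symbol HMM and an observation sequence.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
X = [0, 1, 1, 0]
```

Two sanity checks follow directly from the slides: $\sum_i \alpha_t(i) \beta_t(i)$ is the same for every t, and summing $\xi_t(i, j)$ over j recovers $\gamma_t(i)$.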
21 Putting everything together:
- We need to figure out how to compute:
  (1) probabilistic predictions about states: $\gamma_t(i) = P(s_t = i \mid x_1, \ldots, x_n)$
  (2) probabilistic predictions about pairs of states: $\xi_t(i, j) = P(s_t = i, s_{t+1} = j \mid x_1, \ldots, x_n)$
Now we know how: via forward and backward probabilities. See also chapter 6.5 of J&M.
Now we have all the components in place for EM:
$\hat{C}_T(i, j) = \sum_{l=1}^{L} \sum_{t=1}^{n-1} \xi_t^{(l)}(i, j)$, which gives $\hat{P}(j \mid i) = \hat{C}_T(i, j) / \sum_{j'} \hat{C}_T(i, j')$
$\hat{C}_E(x, s) = \sum_{l=1}^{L} \sum_{t=1}^{n} \gamma_t^{(l)}(s) \, I(x_t^{(l)} = x)$, which gives $\hat{P}(x \mid s) = \hat{C}_E(x, s) / \sum_{x'} \hat{C}_E(x', s)$
22 Viterbi / forward-backward / Baum-Welch
- Have you noticed a relation between the forward algorithm and Viterbi?
- Roughly speaking, Viterbi selects the best previous states, whereas the forward algorithm sums over all possibilities:
  Forward: $\alpha_t(i) = P(x_1, \ldots, x_t, s_t = i) = \sum_{s_1, \ldots, s_{t-1}} P(x_1, \ldots, x_t, s_1, \ldots, s_{t-1}, s_t = i)$
  Viterbi: $v_t(i) = \max_{s_1, \ldots, s_{t-1}} P(x_1, \ldots, x_t, s_1, \ldots, s_{t-1}, s_t = i)$
  Forward: $\alpha_t(i) = P(x_t \mid i) \sum_j \alpha_{t-1}(j) \, P(i \mid j)$   (a sum-product algorithm)
  Viterbi: $v_t(i) = P(x_t \mid i) \max_j v_{t-1}(j) \, P(i \mid j)$   (a max-product algorithm)
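The sum-product / max-product relation can be made concrete with a single recursion that takes the combining operator as an argument. This is a sketch under assumptions: the function names and the small HMM are hypothetical, `trans[j][i]` stands for $P(i \mid j)$, and the brute-force enumeration is only there to check the recursion on a tiny example.

```python
from itertools import product

def chain_recursion(x, init, trans, emit, combine):
    """combine=sum gives the forward values, combine=max the Viterbi values."""
    states = range(len(init))
    q = [init[i] * emit[i][x[0]] for i in states]
    for t in range(1, len(x)):
        q = [emit[i][x[t]] * combine(q[j] * trans[j][i] for j in states)
             for i in states]
    return q

def brute_force(x, init, trans, emit):
    """Enumerate every state sequence; return (sum, max) of the joint P(x, s)."""
    joints = []
    for s in product(range(len(init)), repeat=len(x)):
        p = init[s[0]] * emit[s[0]][x[0]]
        for t in range(1, len(x)):
            p *= trans[s[t - 1]][s[t]] * emit[s[t]][x[t]]
        joints.append(p)
    return sum(joints), max(joints)

# Hypothetical 2-state, 2-symbol HMM.
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
X = [0, 1, 1, 0]

ALPHA_N = chain_recursion(X, INIT, TRANS, EMIT, sum)   # forward probabilities
V_N = chain_recursion(X, INIT, TRANS, EMIT, max)       # Viterbi scores
```

Summing `ALPHA_N` gives the probability of the observations, while the maximum of `V_N` is the probability of the single best state sequence.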
23 Viterbi / forward-backward / Baum-Welch
- The forward-backward algorithm for HMMs computes forward probabilities (beliefs) and backward probabilities (beliefs)
- It can be generalized to more general graphs ("belief propagation")
24 Summary so far
- Supervised estimation: generative (ML) and (feature-rich) discriminative modeling (structured perceptron, conditional random fields, MEMM)
- Unsupervised modeling: generative (EM). Recently, there has been some work on estimating feature-rich models for unsupervised modeling.
25 Outline
After a recap:
- A few more words about unsupervised estimation of HMMs (forward-backward)
- More on discriminative estimation (CRFs / MEMMs)
- Recurrent neural networks / encoder-decoder
- Syntactic parsing (PCFGs)
26 Discriminative estimation
- Generative models not only consider the labeling but also score how likely an input sequence $x_1, \ldots, x_n$ is.
- Discriminative models instead concentrate only on modeling how likely an output sequence $y_1, \ldots, y_n$ is for a given input $x_1, \ldots, x_n$.
(I switched back to using y instead of s.)
27 Discriminative estimation
- We have learnt one discriminative method: the structured perceptron
- But what if we want to get probabilities $P(y \mid x)$?
28 Recap: logistic regression
- Logistic ("softmax") regression (aka max-entropy classifier):
  $P(y \mid x, w) = \frac{\exp(w^T f(x, y))}{Z(x)}$
  $Z(x)$ is a partition function (notation abuse: we drop its dependence on w): $Z(x) = \sum_{y' \in Y} \exp(w^T f(x, y'))$
- "Conditional" log-likelihood: $L(w) = \sum_{l=1}^{L} \log P(y^{(l)} \mid x^{(l)}, w)$, plus an L2 regularization term in $\|w\|_2^2$
  We will forget the regularizer in the future discussion.
29 Recap: stochastic gradient descent
- We compute the gradient $\nabla L(w)$ of the conditional log-likelihood function $L(w)$
- We can update our parameter vector based on the gradient: $w := w + \eta \nabla L(w)$, where $\eta$ is the learning rate
In practice, slightly "smarter" gradient methods are normally used.
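30 Recap: gradient for multi-class
- Let's derive the gradient:
  $\frac{\partial L}{\partial w_i} = \frac{\partial}{\partial w_i} \sum_{l=1}^{L} \log P(y^{(l)} \mid x^{(l)}, w)$
  $= \frac{\partial}{\partial w_i} \sum_{l=1}^{L} \left( w^T f(x^{(l)}, y^{(l)}) - \log \sum_{y' \in Y} \exp(w^T f(x^{(l)}, y')) \right)$
  $= \sum_{l=1}^{L} \left( f_i(x^{(l)}, y^{(l)}) - \sum_{\tilde{y}} f_i(x^{(l)}, \tilde{y}) \frac{\exp(w^T f(x^{(l)}, \tilde{y}))}{\sum_{y' \in Y} \exp(w^T f(x^{(l)}, y'))} \right)$
  $= \sum_{l=1}^{L} \left( f_i(x^{(l)}, y^{(l)}) - \sum_{\tilde{y}} f_i(x^{(l)}, \tilde{y}) \, P(\tilde{y} \mid x^{(l)}, w) \right)$
- Intuition: we are trying to find a model such that the feature expectations computed according to the model are similar to their estimates from the data.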
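The "observed minus expected features" form of the gradient can be checked numerically. The sketch below is illustrative only: the block-structured feature map `f`, the toy data and the starting weights are all assumptions, and the regularizer is left out as in the derivation.

```python
import math

def f(x, y, num_classes):
    """Hypothetical joint feature map: copy x into the block of class y."""
    vec = [0.0] * (len(x) * num_classes)
    vec[y * len(x):(y + 1) * len(x)] = x
    return vec

def class_scores(w, x, num_classes):
    return [sum(wi * fi for wi, fi in zip(w, f(x, c, num_classes)))
            for c in range(num_classes)]

def log_likelihood(w, data, num_classes):
    """L(w) = sum_l log P(y_l | x_l, w), without the regularizer."""
    total = 0.0
    for x, y in data:
        s = class_scores(w, x, num_classes)
        total += s[y] - math.log(sum(math.exp(v) for v in s))
    return total

def gradient(w, data, num_classes):
    """dL/dw_i = sum_l (f_i(x_l, y_l) - sum_y f_i(x_l, y) P(y | x_l, w))."""
    grad = [0.0] * len(w)
    for x, y in data:
        s = class_scores(w, x, num_classes)
        z = sum(math.exp(v) for v in s)
        probs = [math.exp(v) / z for v in s]
        observed = f(x, y, num_classes)
        feats = [f(x, c, num_classes) for c in range(num_classes)]
        for i in range(len(w)):
            expected = sum(probs[c] * feats[c][i] for c in range(num_classes))
            grad[i] += observed[i] - expected
    return grad

K = 3
DATA = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 2), ([0.5, -1.0], 1)]
W = [0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
GRAD = gradient(W, DATA, K)
```

A gradient-ascent step as on the previous slide would then be `W[i] + eta * GRAD[i]` for some learning rate `eta`.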
31 Structured case
- The soft-max classifier: $P(y \mid x, w) = \frac{\exp(w^T f(x, y))}{Z(x)}$
How do we generalize it to sequences?
32 Idea 1: Max-Ent Markov Models (MEMMs)
[Figure: chain of labels $y_1, y_2, y_3, \ldots, y_n$ over inputs $x_1, x_2, x_3, \ldots, x_n$]
$P(y \mid x, w) = P(y_1 \mid x, w) \prod_{t=2}^{n} P(y_t \mid y_{t-1}, x, w)$
$P(y_t \mid y_{t-1}, x, w) = \frac{\exp(w^T f(y_{t-1}, y_t, x, t))}{Z(y_{t-1}, x, t)}$ (notation abuse: we will drop t)
- Essentially, prediction of each next label is just a classification decision
34 Idea 1: Max-Ent Markov Models (MEMMs)
[Figure: chain of labels $y_1, y_2, y_3, \ldots, y_n$, each conditioned on the whole input x]
$P(y \mid x, w) = P(y_1 \mid x, w) \prod_{t=2}^{n} P(y_t \mid y_{t-1}, x, w)$
$P(y_t \mid y_{t-1}, x, w) = \frac{\exp(w^T f(y_{t-1}, y_t, x, t))}{Z(y_{t-1}, x, t)}$ (notation abuse: we will drop t)
- Essentially, prediction of each next label is just a classification decision
- What are the training examples?
- How do we search?
35 MEMMs: Problem 1
$P(y_t \mid y_{t-1}, x, w) = \frac{\exp(w^T f(y_{t-1}, y_t, x, t))}{Z(y_{t-1}, x, t)}$
- Since the model is trained with the correct label as $y_{t-1}$, it will end up over-relying on it
- At test time, $y_{t-1}$ (unlike x) will be predicted (and is often incorrect)
- Formally, the distribution of features differs between training and testing, and the examples are interdependent (the i.i.d. assumption standard in machine learning is broken)
This is not only a theoretical problem.
36 MEMMs: Problem 1
- Consider a sequence labeling problem $\{0, 1\}^3 \to \{0, 1\}^3$
- Input vectors are uniformly distributed
- If the input is (1, 1, 1), the output is also (1, 1, 1)
- Otherwise, the output is (0, 0, 0)
Model (let's forget for now that the probabilities are computed with softmax): $P(y \mid x, w) = P(y_1 \mid x, w) \prod_{t=2}^{n} P(y_t \mid y_{t-1}, x, w)$
$P(y_1 = 1 \mid x_1 = 1) = 1/4$, $P(y_1 = 0 \mid x_1 = 1) = 3/4$
$P(y_2 = 0 \mid y_1 = 0, x_2) = 1, \ \forall x_2$ (ignores input!!)
$P(y_2 = 1 \mid y_1 = 1, x_2 = 1) = 1$
$P((1, 1, 1) \mid (1, 1, 1)) = 1/4$, $P((0, 0, 0) \mid (1, 1, 1)) = 3/4$
So for the input (1, 1, 1) the model prefers the wrong output (0, 0, 0).
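The numbers in this toy example can be reproduced by estimating the local distributions by counting over the eight equally likely inputs. This is a sketch under assumptions: each local model sees only the current input bit $x_t$ and the previous label, the step distribution is shared across positions, and all names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def target(x):
    """Gold labeling: all ones only for the all-ones input."""
    return (1, 1, 1) if x == (1, 1, 1) else (0, 0, 0)

# Counts for P(y_1 | x_1) and P(y_t | y_{t-1}, x_t), gathered with the
# correct previous label, exactly as an MEMM is trained.
first, step = Counter(), Counter()
for x in product((0, 1), repeat=3):       # uniformly distributed inputs
    y = target(x)
    first[(x[0], y[0])] += 1
    for t in (1, 2):
        step[(y[t - 1], x[t], y[t])] += 1

def p_first(y1, x1):
    return Fraction(first[(x1, y1)], first[(x1, 0)] + first[(x1, 1)])

def p_step(yt, yprev, xt):
    # Undefined (division by zero) for contexts never seen in training.
    denom = step[(yprev, xt, 0)] + step[(yprev, xt, 1)]
    return Fraction(step[(yprev, xt, yt)], denom)

def p_sequence(y, x):
    """P(y | x) as a product of locally normalized decisions."""
    p = p_first(y[0], x[0])
    for t in (1, 2):
        p *= p_step(y[t], y[t - 1], x[t])
    return p

P_RIGHT = p_sequence((1, 1, 1), (1, 1, 1))   # Fraction(1, 4)
P_WRONG = p_sequence((0, 0, 0), (1, 1, 1))   # Fraction(3, 4)
```

Once the first decision goes to 0, the later steps assign probability 1 to staying at 0 regardless of the input, which is why the wrong labeling wins.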
37 MEMMs: Problem 1
- Consider a sequence labeling problem $\{0, 1\}^3 \to \{0, 1\}^3$
- Input vectors are uniformly distributed
- If the input is (1, 1, 1), the output is also (1, 1, 1)
- Otherwise, the output is (0, 0, 0)
Model (let's forget for now that the probabilities are computed with softmax): $P(y \mid x, w) = P(y_1 \mid x, w) \prod_{t=2}^{n} P(y_t \mid y_{t-1}, x, w)$
$P(y_1 = 1 \mid x_1 = 1) = 1/4$, $P(y_1 = 0 \mid x_1 = 1) = 3/4$
$P(y_2 = 0 \mid y_1 = 0, x_2) = 1, \ \forall x_2$ (ignores input!!)
$P(y_2 = 1 \mid y_1 = 1, x_2 = 1) = 1$
$P((1, 1, 1) \mid (1, 1, 1)) = 1/4$, $P((0, 0, 0) \mid (1, 1, 1)) = 3/4$
One can say: this all has happened because we factorized the model badly (i.e. the features are not appropriate). Yes, but:
1. We do not know if we will do better with (complex) real problems
2. Structured perceptron (and CRF) would learn a perfect classifier with this factorization
38 MEMMs: Problem 2
- The "label bias problem"
For example, our set of labels is small (= 2). Imagine we perform Viterbi:
[Figure: Viterbi lattice over time steps t-2, t-1, t with the path probabilities of Hypothesis 1 and Hypothesis 2]
(It is often true.)
At time t comes an input completely inconsistent with Hypothesis 1.
Can the model (in principle) ensure that the red state is not included in the winning path?
39 MEMMs: Problem 2
- The "label bias problem"
For example, our set of labels is small (= 2). Imagine we perform Viterbi:
[Figure: Viterbi lattice over time steps t-2, t-1, t with the path probabilities of Hypothesis 1 and Hypothesis 2 at two consecutive steps; one hypothesis is marked as the winner]
(It is often true.)
At time t comes an input completely inconsistent with Hypothesis 1.
Can the model (in principle) ensure that the red state is not included in the winning path?
40 MEMMs: Problem 2
- The "label bias problem"
For example, our set of labels is small (= 2). Imagine we perform Viterbi:
[Figure: Viterbi lattice over time steps t-2, t-1, t; at time t the path probability of each hypothesis is multiplied by 0.5]
(It is often true.)
At time t comes an input completely inconsistent with Hypothesis 1.
Can the model (in principle) ensure that the red state is not included in the winning path? No, because we need to preserve the probability mass (the probabilities sum to 1).
Why is it not a serious problem for HMMs?
41 MEMMs: Conclusions
- MEMMs are easy to estimate and use
- In practice, they are very often used in NLP (we will see them again in the context of parsing)
- But they have serious problems and, consequently, are brittle
- These problems are not unique to MEMMs: they affect any "piecewise" estimation approach (our neural / deep models as well)
- Problems very similar to "Problem 1" come up when "pipelines" are used:
  - For example, text -> PoS tagging -> syntactic parsing -> semantic analysis -> dialog state prediction -> ...
  - Errors propagate across stages, and we should be 'careful' when training models for individual stages
42 Idea 2: conditional random fields (CRF)
$P(y \mid x, w) = \frac{\exp(w^T \sum_{t=1}^{n} f(y_{t-1}, y_t, x))}{Z(x)}$
The partition function: $Z(x) = \sum_{y' \in Y^n} \exp(w^T \sum_{t=1}^{n} f(y'_{t-1}, y'_t, x))$
The summation is over the set of all potential labelings of the entire sentence (exponential size). I am not careful with START and STOP, to simplify notation.
[Figure: the factor graph over $y_1, y_2, y_3, \ldots, y_n$ with factors $w^T f(y_1, y_2, x)$, $w^T f(y_2, y_3, x)$, ...]
- We score the entire sequence in one shot
- What are the training examples?
- How do we search?
43 CRF (Lafferty, McCallum and Pereira, 2001)
44 Idea 2: chain conditional random field (CRF)
$P(y \mid x, w) = \frac{\exp(w^T \sum_{t=1}^{n} f(y_{t-1}, y_t, x))}{Z(x)}$
- We score the entire sequence in one shot
- Do we have the i.i.d. problem (problem 1)?
- Do we have the label bias problem (problem 2)?
45 How do we estimate the model?
$P(y \mid x, w) = \frac{\exp(w^T \sum_{t=1}^{n} f(y_{t-1}, y_t, x))}{Z(x)}$, with $Z(x) = \sum_{y' \in Y^n} \exp(w^T \sum_{t=1}^{n} f(y'_{t-1}, y'_t, x))$
- Do we have any hope of computing the gradient?
$\nabla L(w) = \nabla \sum_{l=1}^{L} \left( w^T \sum_{t=1}^{n} f(y^{(l)}_{t-1}, y^{(l)}_t, x^{(l)}) - \log Z(x^{(l)}, w) \right)$
$= \sum_{l=1}^{L} \sum_{t=1}^{n} \left( f(y^{(l)}_{t-1}, y^{(l)}_t, x^{(l)}) - \sum_{y, y' \in Y} P(y_{t-1} = y, y_t = y' \mid x^{(l)}, w) \, f(y_{t-1} = y, y_t = y', x^{(l)}) \right)$
Again: matching data and model expectations. How do we compute the probabilities?
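One standard way to obtain both $Z(x)$ and the pairwise probabilities $P(y_{t-1} = y, y_t = y' \mid x, w)$ is a forward-backward pass over unnormalized potentials. The sketch below assumes the scores $w^T f(\cdot)$ have already been computed: `start[a]` is a hypothetical score for beginning in label a, and `pair[t][a][b]` a hypothetical score for moving from label a to label b. The brute-force enumeration over $Y^n$ is only there to check the dynamic program.

```python
import math
from itertools import product

def forward_backward(start, pair):
    """Return Z(x) and marg[t][a][b] = P(y_t = a, y_{t+1} = b | x)."""
    k, n = len(start), len(pair) + 1
    alpha = [[math.exp(s) for s in start]]
    for t in range(1, n):
        alpha.append([
            sum(alpha[t - 1][a] * math.exp(pair[t - 1][a][b]) for a in range(k))
            for b in range(k)
        ])
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [
            sum(math.exp(pair[t][a][b]) * beta[t + 1][b] for b in range(k))
            for a in range(k)
        ]
    z = sum(alpha[-1])
    marg = [[[alpha[t][a] * math.exp(pair[t][a][b]) * beta[t + 1][b] / z
              for b in range(k)] for a in range(k)]
            for t in range(n - 1)]
    return z, marg

def brute_force(start, pair):
    """Same quantities by summing over all |Y|^n labelings."""
    k, n = len(start), len(pair) + 1
    z = 0.0
    marg = [[[0.0] * k for _ in range(k)] for _ in range(n - 1)]
    for y in product(range(k), repeat=n):
        score = start[y[0]] + sum(pair[t][y[t]][y[t + 1]] for t in range(n - 1))
        weight = math.exp(score)
        z += weight
        for t in range(n - 1):
            marg[t][y[t]][y[t + 1]] += weight
    return z, [[[v / z for v in row] for row in table] for table in marg]

# Hypothetical scores for a length-3 sequence over 2 labels.
START = [0.5, -0.2]
PAIR = [
    [[1.0, -1.0], [0.3, 0.0]],
    [[0.2, 0.7], [-0.5, 1.5]],
]
Z, MARG = forward_backward(START, PAIR)
```

The dynamic program costs on the order of n times |Y| squared operations instead of |Y| to the power n; for long sequences one would normally work with log-scores to avoid overflow.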
46 Summary: linear models for (supervised) sequence labeling
- HMMs. (+) Very easy to estimate; easy to generalize to un- and semi-supervised learning. (-) Low asymptotic performance.
- Structured perceptron. (+) Simple to implement. (-) Does not yield probabilities; does not optimize an objective (kind of).
- MEMMs. (+) Fast to estimate (no decoding at training time). (-) They can be brittle (recall the problems mentioned).
- CRFs. (+) Give out probabilities, motivated by a clear objective, stable performance. (-) Harder to implement (forward-backward), expensive training (esp. when not a 1st-order Markov model).
47 Outline
After a recap:
- A few more words about unsupervised estimation of HMMs (forward-backward)
- More on discriminative estimation (CRFs / MEMMs)
- Recurrent neural networks / encoder-decoder
- Syntactic parsing (PCFGs)
48 Recurrent neural networks (RNNs)
[Figure: recurrent network reading inputs $x_1, x_2, \ldots, x_n$ and predicting labels $y_1, y_2, \ldots, y_n$]
- Lots of similarities to MEMMs
- What's nice about them?
  - They can perform fairly well with minimal feature engineering
- What's problematic about them? (vs. MEMMs)
  - "Vanishing gradients": gradient information is not propagated from observations ($y_t$) to states far away in the past. The vanishing gradients problem is mitigated by Long Short-Term Memory networks (LSTMs) but is still a major issue.
  - Decoding?
49 Encoder-Decoder
[Figure: an encoder reading $x_1, x_2, \ldots, x_n$ and a decoder producing $y_1, y_2, \ldots, y_n$]
- Sounds like a crazy idea: compressing entire sentences down to a single vector
- More general than sequence labeling (why?)
- Decoding is approximate
- Brittle (but some "better" ideas are around: attention models)
50 NN models vs linear models
Linear models
- (+) Often, exact decoding is possible
- (+) Some degree of interpretability
- (+) Easier to encode prior knowledge (?)
- (+) Convex optimization (for supervised learning)
- (-) Feature engineering is crucial
- (-) They can be expensive (!) in practice
Representation learning models (incl. deep / neural)
- (+) Feature induction
- (+) Parameter sharing across multiple tasks / features (!!)
- (+) (Related) modeling compositionality of language
- (+) Can be efficient (e.g., on GPUs)
- (-) Non-convex optimization
- (-) Not very interpretable
- (-) Exact decoding is not possible
- (-) They can be more expensive (for some tasks)
51 Overview
- NLP problems: doc. classification, topic analysis, shallow synt. parsing / tagging, syntactic parsing, relation extraction, semantic parsing, models of inference, machine translation, question answering, opinion analysis, summarization, dialogue systems
- Types of structures: bags, sequences / chains, spanning trees, hierarchical trees or DAGs, bipartite graphs
- Models / views: Naive Bayes, topic models, HMMs, history- / transition-based models, PCFGs, DOP, global scoring (e.g., MST), "IBM" models
- Set-ups: supervised, unsupervised, partially / semi-supervised
- Modeling frameworks: generative ML, generative Bayes, discriminative, discriminative Bayes, representation learning (factorizations / NNs)
Many problems in NLP can be cast as, or approximated with, a sequence model.
What about dialog systems? (e.g., Siri)
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationWhat s an HMM? Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) Hidden Markov Models (HMMs) for Information Extraction
Hidden Markov Models (HMMs) for Information Extraction Daniel S. Weld CSE 454 Extraction with Finite State Machines e.g. Hidden Markov Models (HMMs) standard sequence model in genomics, speech, NLP, What
More informationSoft Inference and Posterior Marginals. September 19, 2013
Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The Expectation Maximization (EM) algorithm is one approach to unsupervised, semi-supervised, or lightly supervised learning. In this kind of learning either no labels are
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationNLP Programming Tutorial 8 - Recurrent Neural Nets
NLP Programming Tutorial 8 - Recurrent Neural Nets Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Feed Forward Neural Nets All connections point forward ϕ( x) y It is a directed acyclic
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationNatural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay
Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay Lecture - 21 HMM, Forward and Backward Algorithms, Baum Welch
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationSequence Modeling with Neural Networks
Sequence Modeling with Neural Networks Harini Suresh y 0 y 1 y 2 s 0 s 1 s 2... x 0 x 1 x 2 hat is a sequence? This morning I took the dog for a walk. sentence medical signals speech waveform Successes
More informationNeural networks CMSC 723 / LING 723 / INST 725 MARINE CARPUAT. Slides credit: Graham Neubig
Neural networks CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Slides credit: Graham Neubig Outline Perceptron: recap and limitations Neural networks Multi-layer perceptron Forward propagation
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationExpectation Maximization (EM)
Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationCMSC 723: Computational Linguistics I Session #5 Hidden Markov Models. The ischool University of Maryland. Wednesday, September 30, 2009
CMSC 723: Computational Linguistics I Session #5 Hidden Markov Models Jimmy Lin The ischool University of Maryland Wednesday, September 30, 2009 Today s Agenda The great leap forward in NLP Hidden Markov
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationRecurrent Neural Networks. Jian Tang
Recurrent Neural Networks Jian Tang tangjianpku@gmail.com 1 RNN: Recurrent neural networks Neural networks for sequence modeling Summarize a sequence with fix-sized vector through recursively updating
More informationData-Intensive Computing with MapReduce
Data-Intensive Computing with MapReduce Session 8: Sequence Labeling Jimmy Lin University of Maryland Thursday, March 14, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More informationPredicting Sequences: Structured Perceptron. CS 6355: Structured Prediction
Predicting Sequences: Structured Perceptron CS 6355: Structured Prediction 1 Conditional Random Fields summary An undirected graphical model Decompose the score over the structure into a collection of
More informationLoss Functions, Decision Theory, and Linear Models
Loss Functions, Decision Theory, and Linear Models CMSC 678 UMBC January 31 st, 2018 Some slides adapted from Hamed Pirsiavash Logistics Recap Piazza (ask & answer questions): https://piazza.com/umbc/spring2018/cmsc678
More informationConditional Random Fields
Conditional Random Fields Micha Elsner February 14, 2013 2 Sums of logs Issue: computing α forward probabilities can undeflow Normally we d fix this using logs But α requires a sum of probabilities Not
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationApplied Natural Language Processing
Applied Natural Language Processing Info 256 Lecture 20: Sequence labeling (April 9, 2019) David Bamman, UC Berkeley POS tagging NNP Labeling the tag that s correct for the context. IN JJ FW SYM IN JJ
More informationGaussian and Linear Discriminant Analysis; Multiclass Classification
Gaussian and Linear Discriminant Analysis; Multiclass Classification Professor Ameet Talwalkar Slide Credit: Professor Fei Sha Professor Ameet Talwalkar CS260 Machine Learning Algorithms October 13, 2015
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationStructured Prediction
Structured Prediction Classification Algorithms Classify objects x X into labels y Y First there was binary: Y = {0, 1} Then multiclass: Y = {1,...,6} The next generation: Structured Labels Structured
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationMachine Learning (CS 567) Lecture 2
Machine Learning (CS 567) Lecture 2 Time: T-Th 5:00pm - 6:20pm Location: GFS118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationMachine Learning, Fall 2012 Homework 2
0-60 Machine Learning, Fall 202 Homework 2 Instructors: Tom Mitchell, Ziv Bar-Joseph TA in charge: Selen Uguroglu email: sugurogl@cs.cmu.edu SOLUTIONS Naive Bayes, 20 points Problem. Basic concepts, 0
More informationLinear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt)
Linear Models for Classification: Discriminative Learning (Perceptron, SVMs, MaxEnt) Nathan Schneider (some slides borrowed from Chris Dyer) ENLP 12 February 2018 23 Outline Words, probabilities Features,
More informationSegmental Recurrent Neural Networks for End-to-end Speech Recognition
Segmental Recurrent Neural Networks for End-to-end Speech Recognition Liang Lu, Lingpeng Kong, Chris Dyer, Noah Smith and Steve Renals TTI-Chicago, UoE, CMU and UW 9 September 2016 Background A new wave
More informationINTRODUCTION TO DATA SCIENCE
INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do
More informationHidden Markov Models
CS 2750: Machine Learning Hidden Markov Models Prof. Adriana Kovashka University of Pittsburgh March 21, 2016 All slides are from Ray Mooney Motivating Example: Part Of Speech Tagging Annotate each word
More informationStatistical NLP: Hidden Markov Models. Updated 12/15
Statistical NLP: Hidden Markov Models Updated 12/15 Markov Models Markov models are statistical tools that are useful for NLP because they can be used for part-of-speech-tagging applications Their first
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive
More informationNaïve Bayes, Maxent and Neural Models
Naïve Bayes, Maxent and Neural Models CMSC 473/673 UMBC Some slides adapted from 3SLP Outline Recap: classification (MAP vs. noisy channel) & evaluation Naïve Bayes (NB) classification Terminology: bag-of-words
More information