6.891: Lecture 8 (October 1st, 2003) Log-Linear Models for Parsing, and the EM Algorithm Part I


1 6.891: Lecture 8 (October 1st, 2003) Log-Linear Models for Parsing, and the EM Algorithm, Part I

2 Overview
Ratnaparkhi's Maximum-Entropy Parser
The EM Algorithm, Part I

3 Log-Linear Taggers: Independence Assumptions
The input sentence S, with length n = |S|, has |𝒯|^n possible tag sequences.
Each tag sequence T has a conditional probability
P(T | S) = ∏_{j=1}^n P(T(j) | S, j, T(1) ... T(j−1))   (chain rule)
         = ∏_{j=1}^n P(T(j) | S, j, T(j−2), T(j−1))    (independence assumptions)
Estimate P(T(j) | S, j, T(j−2), T(j−1)) using log-linear models
Use the Viterbi algorithm to compute argmax_{T ∈ 𝒯^n} log P(T | S)

4 A General Approach: (Conditional) History-Based Models
We've shown how to define P(T | S) where T is a tag sequence
How do we define P(T | S) if T is a parse tree (or another structure)?

5 A General Approach: (Conditional) History-Based Models
Step 1: represent a tree as a sequence of decisions d_1 ... d_m
T = ⟨d_1, d_2, ..., d_m⟩
m is not necessarily the length of the sentence
Step 2: the probability of a tree is
P(T | S) = ∏_{i=1}^m P(d_i | d_1 ... d_{i−1}, S)
Step 3: Use a log-linear model to estimate P(d_i | d_1 ... d_{i−1}, S)
Step 4: Search?? (answer we'll get to later: beam or heuristic search)
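A minimal sketch of Step 2 (Python written for these notes, not code from the lecture): `decision_prob` stands in for whatever conditional model Step 3 supplies, and a tree scored as a decision sequence accumulates one log probability per decision.

```python
import math

def log_prob_of_tree(decisions, sentence, decision_prob):
    """log P(T | S) for a tree T encoded as decisions d_1 ... d_m."""
    logp = 0.0
    for i, d in enumerate(decisions):
        history = decisions[:i]                      # d_1 ... d_{i-1}
        logp += math.log(decision_prob(history, d, sentence))
    return logp
```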

6 An Example Tree
S(questioned)
  NP(lawyer)
    DT  the
    NN  lawyer
  VP(questioned)
    Vt  questioned
    NP(witness)
      DT  the
      NN  witness
    PP(about)
      IN  about
      NP(revolver)
        DT  the
        NN  revolver

7 Ratnaparkhi's Parser: Three Layers of Structure
1. Part-of-speech tags
2. Chunks
3. Remaining structure

8 Step 1: represent a tree as a sequence of decisions d_1 ... d_m
Layer 1: Part-of-Speech Tags
the/DT lawyer/NN questioned/Vt the/DT witness/NN about/IN the/DT revolver/NN
T = ⟨d_1, d_2, ..., d_m⟩
The first n decisions are tagging decisions:
⟨d_1 ... d_n⟩ = ⟨DT, NN, Vt, DT, NN, IN, DT, NN⟩

9 Layer 2: Chunks
NP [the/DT lawyer/NN]  questioned/Vt  NP [the/DT witness/NN]  about/IN  NP [the/DT revolver/NN]
Chunks are defined as any phrase where all children are part-of-speech tags
(Other common chunks are ADJP, QP)

10 Step 1: represent a tree as a sequence of decisions d_1 ... d_m
Layer 2: Chunks
the/DT/Start(NP) lawyer/NN/Join(NP) questioned/Vt/Other the/DT/Start(NP) witness/NN/Join(NP) about/IN/Other the/DT/Start(NP) revolver/NN/Join(NP)
T = ⟨d_1, d_2, ..., d_m⟩
The first n decisions are tagging decisions; the next n decisions are chunk tagging decisions:
⟨d_1 ... d_2n⟩ = ⟨DT, NN, Vt, DT, NN, IN, DT, NN, Start(NP), Join(NP), Other, Start(NP), Join(NP), Other, Start(NP), Join(NP)⟩

11 Layer 3: Remaining Structure
Alternate between two classes of actions:
Join(X) or Start(X), where X is a label (NP, S, VP, etc.)
Check=YES or Check=NO
Meaning of these actions:
Start(X) starts a new constituent with label X (always acts on the leftmost constituent with no start or join label above it)
Join(X) continues a constituent with label X (always acts on the leftmost constituent with no start or join label above it)
Check=NO does nothing
Check=YES takes the previous Join or Start action, and converts it into a completed constituent

12 [Tree diagram: the chunked sentence before any layer-3 decisions: NP (the lawyer), Vt (questioned), NP (the witness), IN (about), NP (the revolver)]

13 Start(S) [NP (the lawyer) is labelled Start(S)]

14 Start(S) Check=NO

15 Start(S) Start(VP) [Vt (questioned) is labelled Start(VP)]

16 Start(S) Start(VP) Check=NO

17 Start(S) Start(VP) Join(VP) [NP (the witness) is labelled Join(VP)]

18 Start(S) Start(VP) Join(VP) Check=NO

19 Start(S) Start(VP) Join(VP) Start(PP) [IN (about) is labelled Start(PP)]

20 Start(S) Start(VP) Join(VP) Start(PP) Check=NO

21 Start(S) Start(VP) Join(VP) Start(PP) Join(PP) [NP (the revolver) is labelled Join(PP)]

22 Start(S) Start(VP) Join(VP) PP Check=YES [the Start(PP), Join(PP) sequence becomes a completed PP constituent]

23 Start(S) Start(VP) Join(VP) Join(VP) [the completed PP is labelled Join(VP)]

24 Start(S) VP Check=YES [the Start(VP), Join(VP), Join(VP) sequence becomes a completed VP constituent]

25 Start(S) Join(S) [the completed VP is labelled Join(S)]

26 S Check=YES [the Start(S), Join(S) sequence becomes a completed S: the parse is complete]

27 The Final Sequence of Decisions
⟨d_1 ... d_m⟩ = ⟨DT, NN, Vt, DT, NN, IN, DT, NN,
Start(NP), Join(NP), Other, Start(NP), Join(NP), Other, Start(NP), Join(NP),
Start(S), Check=NO, Start(VP), Check=NO, Join(VP), Check=NO, Start(PP), Check=NO, Join(PP), Check=YES, Join(VP), Check=YES, Join(S), Check=YES⟩
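As an illustration (not Ratnaparkhi's code), the layer-3 suffix of such a sequence can be replayed deterministically to rebuild the tree, following the action semantics from slide 11. The chunk and decision encodings below are assumptions made for the sketch.

```python
def replay_layer3(chunks, decisions):
    """Rebuild a tree from chunks plus alternating Start/Join and Check decisions.

    chunks:    e.g. [("NP", ["the", "lawyer"]), ("Vt", ["questioned"]), ...]
    decisions: e.g. [("Start", "S"), "Check=NO", ("Start", "VP"), "Check=NO", ...]
    """
    forest = [(None, chunk) for chunk in chunks]     # (annotation, subtree) pairs
    stream = iter(decisions)
    for kind, label in stream:                       # a Start(X) or Join(X) action
        i = next(j for j, (a, _) in enumerate(forest) if a is None)
        forest[i] = ((kind, label), forest[i][1])    # annotate leftmost unlabelled tree
        if next(stream) == "Check=YES":              # the Check decision that follows
            # reduce the most recent Start(X) Join(X)* run into one constituent
            start = max(j for j, (a, _) in enumerate(forest)
                        if a is not None and a[0] == "Start")
            children = [t for _, t in forest[start:i + 1]]
            forest[start:i + 1] = [(None, (label, children))]
    return forest[0][1]                              # the single completed tree
```

Replaying the Start/Join/Check portion of the sequence above over the five chunks from slide 10 yields the tree of slide 6.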

28 A General Approach: (Conditional) History-Based Models
Step 1: represent a tree as a sequence of decisions d_1 ... d_m
T = ⟨d_1, d_2, ..., d_m⟩
m is not necessarily the length of the sentence
Step 2: the probability of a tree is
P(T | S) = ∏_{i=1}^m P(d_i | d_1 ... d_{i−1}, S)
Step 3: Use a log-linear model to estimate P(d_i | d_1 ... d_{i−1}, S)
Step 4: Search?? (answer we'll get to later: beam or heuristic search)

29 Applying a Log-Linear Model
Step 3: Use a log-linear model to estimate P(d_i | d_1 ... d_{i−1}, S)
A reminder:
P(d_i | d_1 ... d_{i−1}, S) = e^{φ(⟨d_1 ... d_{i−1}, S⟩, d_i) · W} / Σ_{d ∈ A} e^{φ(⟨d_1 ... d_{i−1}, S⟩, d) · W}
where:
⟨d_1 ... d_{i−1}, S⟩ is the history
d_i is the outcome
φ maps a history/outcome pair to a feature vector
W is a parameter vector
A is the set of possible actions (may be context dependent)
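A direct transcription of this formula as a sketch (illustrative names only: `phi` is assumed to return the indices of the active features and `W` is the weight vector; in practice these would be fixed once and closed over):

```python
import math

def decision_prob(history, d, actions, phi, W):
    """P(d | history) under the log-linear model above."""
    def score(a):
        return sum(W[k] for k in phi(history, a))    # phi(history, a) . W for sparse phi
    scores = {a: score(a) for a in actions}
    m = max(scores.values())                         # subtract the max for stability
    Z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[d] - m) / Z
```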

30 Reminder: Implementing FEATURE_VECTOR
Intermediate step: map the history/tag pair to a set of feature strings
Hispaniola/NNP quickly/RB became/VB an/DT important/JJ base/Vt from which Spain expanded its empire into the rest of the Western Hemisphere.
e.g., Ratnaparkhi's features:
TAG=Vt;Word=base
TAG=Vt;TAG-1=JJ
TAG=Vt;TAG-1=JJ;TAG-2=DT
TAG=Vt;SUFF1=e
TAG=Vt;SUFF2=se
TAG=Vt;SUFF3=ase
TAG=Vt;WORD-1=important
TAG=Vt;WORD+1=from

31 Reminder: Implementing FEATURE_VECTOR
Next step: match the strings to integers through a hash table
Hispaniola/NNP quickly/RB became/VB an/DT important/JJ base/Vt from which Spain expanded its empire into the rest of the Western Hemisphere.
e.g., Ratnaparkhi's features:
TAG=Vt;Word=base → 1315
TAG=Vt;TAG-1=JJ → 17
TAG=Vt;TAG-1=JJ;TAG-2=DT → 32908
TAG=Vt;SUFF1=e → 459
TAG=Vt;SUFF2=se → 1000
TAG=Vt;SUFF3=ase → 1509
TAG=Vt;WORD-1=important → 1806
TAG=Vt;WORD+1=from → 300
In this case, the sparse array is:
A.length = 8; A(1...8) = {1315, 17, 32908, 459, 1000, 1509, 1806, 300}
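A minimal sketch of this step (illustrative code, not Ratnaparkhi's): feature strings are interned in a dictionary the first time they are seen, and each history/outcome pair is then represented by the sparse array of its feature indices.

```python
feature_index = {}                       # e.g. "TAG=Vt;Word=base" -> 1315

def index_of(feature_string, frozen=False):
    """Look up (or assign) the integer index of a feature string."""
    if feature_string not in feature_index:
        if frozen:
            return None                  # unseen feature at test time
        feature_index[feature_string] = len(feature_index)
    return feature_index[feature_string]

def sparse_array(feature_strings, frozen=False):
    """Map the feature strings of one history/outcome pair to a sparse index array."""
    indices = (index_of(f, frozen) for f in feature_strings)
    return [i for i in indices if i is not None]
```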

32 Applying a Log-Linear Model
Step 3: Use a log-linear model to estimate
P(d_i | d_1 ... d_{i−1}, S) = e^{φ(⟨d_1 ... d_{i−1}, S⟩, d_i) · W} / Σ_{d ∈ A} e^{φ(⟨d_1 ... d_{i−1}, S⟩, d) · W}
The big question: how do we define φ?
Ratnaparkhi's method defines φ differently depending on whether the next decision is:
A tagging decision (same features as before for POS tagging!)
A chunking decision
A start/join decision after chunking
A Check=NO/Check=YES decision

33 Layer 2: Chunks
the/DT/Start(NP) lawyer/NN/Join(NP) questioned/Vt/Other the/DT/Start(NP) witness/NN/Join(NP) about/IN the/DT revolver/NN
Features for the chunk tagging decision at "witness" (proposed tag Join(NP)):
TAG=Join(NP);Word0=witness;POS0=NN
TAG=Join(NP);POS0=NN
TAG=Join(NP);Word+1=about;POS+1=IN
TAG=Join(NP);POS+1=IN
TAG=Join(NP);Word+2=the;POS+2=DT
TAG=Join(NP);POS+2=DT
TAG=Join(NP);Word-1=the;POS-1=DT;TAG-1=Start(NP)
TAG=Join(NP);POS-1=DT;TAG-1=Start(NP)
TAG=Join(NP);TAG-1=Start(NP)
TAG=Join(NP);Word-2=questioned;POS-2=Vt;TAG-2=Other
...

34 [Figure-only slide in the original; no text was transcribed]

35 Layer 3: Join or Start
Looks at the head word, constituent (or POS) label, and start/join annotation of the nth tree relative to the decision, where n = -2, -1
Looks at the head word and constituent (or POS) label of the nth tree relative to the decision, where n = 0, 1, 2
Looks at bigram features of the above for (-1, 0) and (0, 1)
Looks at trigram features of the above for (-2, -1, 0), (-1, 0, 1) and (0, 1, 2)
The above features with all combinations of head words excluded
Various punctuation features

36 Layer 3: Check=NO or Check=YES
A variety of questions concerning the proposed constituent

37 The Search Problem
In POS tagging, we could use the Viterbi algorithm because
P(T(j) | S, j, T(1) ... T(j−1)) = P(T(j) | S, j, T(j−2), T(j−1))
Now: decision d_i could depend on arbitrary decisions in the past ⇒ no chance for dynamic programming
Instead, Ratnaparkhi uses a beam search method
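A generic beam-search sketch for this setting (the real parser adds further pruning; `next_decisions` and `decision_prob(history, d, sentence)` are assumed interfaces, e.g. the log-linear model above with its feature map and weights closed over):

```python
import math
from heapq import nlargest

def beam_search(sentence, next_decisions, decision_prob, beam_size=20):
    """Keep the beam_size highest-probability decision histories at each step."""
    beam = [((), 0.0)]                               # (history, log probability)
    while any(next_decisions(h, sentence) for h, _ in beam):
        candidates = []
        for history, logp in beam:
            actions = next_decisions(history, sentence)
            if not actions:                          # already a complete parse
                candidates.append((history, logp))
                continue
            for d in actions:
                p = decision_prob(history, d, sentence)
                candidates.append((history + (d,), logp + math.log(p)))
        beam = nlargest(beam_size, candidates, key=lambda c: c[1])
    return max(beam, key=lambda c: c[1])             # best complete decision sequence
```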

38 Overview
Ratnaparkhi's Maximum-Entropy Parser
The EM Algorithm, Part I

39 An Experiment/Some Intuition
I have one coin in my pocket. Coin 0 has probability λ of heads.
I toss the coin 10 times, and see the following sequence: HHTTHHHTHH (7 heads out of 10)
What would you guess λ to be?

40 An Experiment/Some Intuition
I have three coins in my pocket. Coin 0 has probability λ of heads; Coin 1 has probability p_1 of heads; Coin 2 has probability p_2 of heads.
For each trial I do the following:
First I toss Coin 0
If Coin 0 turns up heads, I toss Coin 1 three times
If Coin 0 turns up tails, I toss Coin 2 three times
I don't tell you whether Coin 0 came up heads or tails, or whether Coin 1 or 2 was tossed three times, but I do tell you how many heads/tails are seen at each trial
You see the following sequence: ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩
What would you estimate as the values for λ, p_1 and p_2?

41 Maximum Likelihood Estimation
We have a parameter vector Θ
We have a parameter space Ω
We have data points X_1, X_2, ..., X_n drawn from some (finite or countable) set 𝒳
We have a distribution P(X | Θ) for any Θ ∈ Ω, such that Σ_{X ∈ 𝒳} P(X | Θ) = 1 and P(X | Θ) ≥ 0 for all X
We assume that our data points X_1, X_2, ..., X_n are drawn at random (independently, identically distributed) from a distribution P(X | Θ*) for some Θ* ∈ Ω

42 A First Example: Coin Tossing
𝒳 = {H, T}. Our data points X_1, X_2, ..., X_n are a sequence of heads and tails, e.g. HHTTHHHTHH
The parameter vector Θ is a single parameter, i.e., the probability of the coin coming up heads
The parameter space is Ω = [0, 1]
The distribution P(X | Θ) is defined as
P(X | Θ) = Θ if X = H; 1 − Θ if X = T

43 Log-Likelihood
We have a parameter vector Θ, and a parameter space Ω
We have data points X_1, X_2, ..., X_n drawn from some (finite or countable) set 𝒳
We have a distribution P(X | Θ) for any Θ ∈ Ω
The likelihood is Likelihood(Θ) = P(X_1, X_2, ..., X_n | Θ) = ∏_{i=1}^n P(X_i | Θ)
The log-likelihood is L(Θ) = log Likelihood(Θ) = Σ_{i=1}^n log P(X_i | Θ)

44 Maximum Likelihood Estimation
Given a sample X_1, X_2, ..., X_n, choose
Θ_ML = argmax_{Θ ∈ Ω} L(Θ) = argmax_{Θ ∈ Ω} Σ_i log P(X_i | Θ)
For example, take the coin example: say X_1 ... X_n has Count(H) heads, and (n − Count(H)) tails. Then
L(Θ) = log ( Θ^{Count(H)} (1 − Θ)^{n − Count(H)} ) = Count(H) log Θ + (n − Count(H)) log(1 − Θ)
We then have Θ_ML = Count(H) / n
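This closed form follows by setting the derivative of L(Θ) to zero; writing c for Count(H), a one-line check:

```latex
\frac{dL(\Theta)}{d\Theta} = \frac{c}{\Theta} - \frac{n-c}{1-\Theta} = 0
\quad\Longrightarrow\quad c\,(1-\Theta) = (n-c)\,\Theta
\quad\Longrightarrow\quad \Theta_{ML} = \frac{c}{n}
```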

45 A Second Example: Probabilistic Context-Free Grammars
𝒳 is the set of all parse trees generated by the underlying context-free grammar. Our sample is n trees T_1 ... T_n such that each T_i ∈ 𝒳.
R is the set of rules in the context-free grammar
N is the set of non-terminals in the grammar
Θ_r for r ∈ R is the parameter for rule r
Let R(α) ⊆ R be the rules of the form α → β for some β
The parameter space Ω is the set of Θ ∈ [0, 1]^{|R|} such that for all α ∈ N, Σ_{r ∈ R(α)} Θ_r = 1

46 We have
P(T | Θ) = ∏_{r ∈ R} Θ_r^{Count(T, r)}
where Count(T, r) is the number of times rule r is seen in tree T
⇒ log P(T | Θ) = Σ_{r ∈ R} Count(T, r) log Θ_r

47 Maximum Likelihood Estimation for PCFGs
We have log P(T | Θ) = Σ_{r ∈ R} Count(T, r) log Θ_r, where Count(T, r) is the number of times rule r is seen in tree T
And, L(Θ) = Σ_i log P(T_i | Θ) = Σ_i Σ_{r ∈ R} Count(T_i, r) log Θ_r
Solving Θ_ML = argmax_{Θ ∈ Ω} L(Θ) gives
Θ_r = Σ_i Count(T_i, r) / Σ_i Σ_{s ∈ R(α)} Count(T_i, s)
where r is of the form α → β for some β
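A sketch of this estimate (illustrative code; rules are assumed to be (lhs, rhs) tuples and count_rules is an assumed helper returning the Count(T, r) values for one tree). The estimate is just a relative frequency, normalised over rules with the same left-hand side:

```python
from collections import Counter, defaultdict

def mle_rule_probs(trees, count_rules):
    """Theta_r = sum_i Count(T_i, r) / sum_i sum_{s in R(alpha)} Count(T_i, s)."""
    rule_counts = Counter()
    for T in trees:
        rule_counts.update(count_rules(T))           # Count(T, r) for every rule in T
    lhs_totals = defaultdict(float)
    for (lhs, _rhs), c in rule_counts.items():
        lhs_totals[lhs] += c
    return {r: c / lhs_totals[r[0]] for r, c in rule_counts.items()}
```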

48 Models with Hidden Variables
Now say we have two sets 𝒳 and 𝒴, and a joint distribution P(X, Y | Θ)
If we had fully observed data, (X_i, Y_i) pairs, then
L(Θ) = Σ_i log P(X_i, Y_i | Θ)
If we have partially observed data, X_i examples only, then
L(Θ) = Σ_i log P(X_i | Θ) = Σ_i log Σ_{Y ∈ 𝒴} P(X_i, Y | Θ)

49 The EM (Expectation-Maximization) algorithm is a method for finding
Θ_ML = argmax_Θ Σ_i log Σ_{Y ∈ 𝒴} P(X_i, Y | Θ)

50 The Three Coins Example
e.g., in the three coins example:
𝒴 = {H, T}
𝒳 = {HHH, TTT, HTT, THH, HHT, TTH, HTH, THT}
Θ = {λ, p_1, p_2}
and
P(X, Y | Θ) = P(Y | Θ) P(X | Y, Θ)
where
P(Y | Θ) = λ if Y = H; 1 − λ if Y = T
and
P(X | Y, Θ) = p_1^h (1 − p_1)^t if Y = H; p_2^h (1 − p_2)^t if Y = T
where h = number of heads in X, t = number of tails in X

51 The Three Coins Example
Fully observed data might look like:
(⟨HHH⟩, H), (⟨TTT⟩, T), (⟨HHH⟩, H), (⟨TTT⟩, T), (⟨HHH⟩, H)
In this case the maximum likelihood estimates are:
λ = 3/5, p_1 = 9/9 = 1, p_2 = 0/6 = 0

52 The Three Coins Example
Partially observed data might look like:
⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩
How do we find the maximum likelihood parameters?

53 The EM Algorithm
Θ^t is the parameter vector at the t-th iteration
Choose Θ^0 (at random, or using various heuristics)
The iterative procedure is defined as Θ^t = argmax_Θ Q(Θ, Θ^{t−1}), where
Q(Θ, Θ^{t−1}) = Σ_i Σ_{Y ∈ 𝒴} P(Y | X_i, Θ^{t−1}) log P(X_i, Y | Θ)

54 The EM Algorithm
The iterative procedure is defined as Θ^t = argmax_Θ Q(Θ, Θ^{t−1}), where
Q(Θ, Θ^{t−1}) = Σ_i Σ_{Y ∈ 𝒴} P(Y | X_i, Θ^{t−1}) log P(X_i, Y | Θ)
Key points:
Intuition: fill in the hidden Y variables according to P(Y | X_i, Θ)
EM is guaranteed to converge to a local maximum, or saddle-point, of the likelihood function
In general, if argmax_Θ Σ_i log P(X_i, Y_i | Θ) has a simple (analytic) solution, then argmax_Θ Σ_i Σ_Y P(Y | X_i, Θ) log P(X_i, Y | Θ) also has a simple (analytic) solution

55 The Three Coins Example
Partially observed data might look like:
⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩, ⟨TTT⟩, ⟨HHH⟩
Say X = ⟨HHH⟩, and the current parameters are λ, p_1, p_2. Then
P(⟨HHH⟩) = P(⟨HHH⟩, H) + P(⟨HHH⟩, T) = λ p_1^3 + (1 − λ) p_2^3
and
P(Y = H | ⟨HHH⟩) = P(⟨HHH⟩, H) / ( P(⟨HHH⟩, H) + P(⟨HHH⟩, T) ) = λ p_1^3 / ( λ p_1^3 + (1 − λ) p_2^3 )

56 The Three Coins Example
After filling in the hidden variables for each example, the partially observed data might look like:
(⟨HHH⟩, H)  P(Y = H | HHH) = 0.6
(⟨HHH⟩, T)  P(Y = T | HHH) = 0.4
(⟨TTT⟩, H)  P(Y = H | TTT) = 0.3
(⟨TTT⟩, T)  P(Y = T | TTT) = 0.7
(⟨HHH⟩, H)  P(Y = H | HHH) = 0.6
(⟨HHH⟩, T)  P(Y = T | HHH) = 0.4
(⟨TTT⟩, H)  P(Y = H | TTT) = 0.3
(⟨TTT⟩, T)  P(Y = T | TTT) = 0.7
(⟨HHH⟩, H)  P(Y = H | HHH) = 0.6
(⟨HHH⟩, T)  P(Y = T | HHH) = 0.4
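One full EM iteration for this model can be written in a few lines; the sketch below (illustrative, not from the lecture) performs the E-step with the posterior formula from the previous slide and the M-step with the resulting weighted counts.

```python
def em_step(trials, lam, p1, p2):
    """One EM update for the three-coins model; trials are strings like "HHH"."""
    def lik(x, p):                                    # P(x | coin with heads prob p)
        h = x.count("H")
        return (p ** h) * ((1 - p) ** (len(x) - h))

    sum_q = 0.0
    heads1 = tosses1 = heads2 = tosses2 = 0.0
    for x in trials:
        a = lam * lik(x, p1)                          # P(x, Y = H)
        b = (1 - lam) * lik(x, p2)                    # P(x, Y = T)
        q = a / (a + b)                               # E-step: P(Y = H | x)
        h, n = x.count("H"), len(x)
        sum_q += q
        heads1 += q * h                               # expected counts for Coin 1
        tosses1 += q * n
        heads2 += (1 - q) * h                         # expected counts for Coin 2
        tosses2 += (1 - q) * n
    # M-step: re-estimate the parameters from the expected counts
    return sum_q / len(trials), heads1 / tosses1, heads2 / tosses2

# e.g. em_step(["HHH", "TTT", "HHH", "TTT", "HHH"], 0.3, 0.3, 0.6)
```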

57 EM for Probabilistic Context-Free Grammars
A PCFG defines a distribution P(S, T | Θ) over tree/sentence pairs (S, T)
If we had tree/sentence pairs (fully observed data), then
L(Θ) = Σ_i log P(S_i, T_i | Θ)
Say we have sentences only, S_1 ... S_n, and the trees are hidden variables. Then
L(Θ) = Σ_i log Σ_T P(S_i, T | Θ)

58 EM for Probabilistic Context-Free Grammars
Say we have sentences only, S_1 ... S_n, and the trees are hidden variables. Then
L(Θ) = Σ_i log Σ_T P(S_i, T | Θ)
The EM algorithm is then Θ^t = argmax_Θ Q(Θ, Θ^{t−1}), where
Q(Θ, Θ^{t−1}) = Σ_i Σ_T P(T | S_i, Θ^{t−1}) log P(S_i, T | Θ)

59 Remember:
log P(S_i, T | Θ) = Σ_{r ∈ R} Count(S_i, T, r) log Θ_r
where Count(S, T, r) is the number of times rule r is seen in the sentence/tree pair (S, T)
⇒ Q(Θ, Θ^{t−1}) = Σ_i Σ_T P(T | S_i, Θ^{t−1}) log P(S_i, T | Θ)
= Σ_i Σ_T P(T | S_i, Θ^{t−1}) Σ_{r ∈ R} Count(S_i, T, r) log Θ_r
= Σ_i Σ_{r ∈ R} Count(S_i, r) log Θ_r
where Count(S_i, r) = Σ_T P(T | S_i, Θ^{t−1}) Count(S_i, T, r) are expected counts

60 Solving Θ^t = argmax_{Θ ∈ Ω} Q(Θ, Θ^{t−1}) gives
Θ_r = Σ_i Count(S_i, r) / Σ_i Σ_{s ∈ R(α)} Count(S_i, s)
where r is of the form α → β for some β
We'll see next week that there are efficient (dynamic programming) algorithms for the computation of
Count(S_i, r) = Σ_T P(T | S_i, Θ^{t−1}) Count(S_i, T, r)
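Given a routine for these expected counts, the update above mirrors the fully observed MLE from slide 47. A sketch, assuming an expected_counts(S, theta) helper that stands in for the dynamic-programming computation mentioned above:

```python
from collections import Counter, defaultdict

def pcfg_em_update(sentences, theta, expected_counts):
    """One PCFG EM update: relative frequencies of expected rule counts."""
    counts = Counter()
    for S in sentences:
        counts.update(expected_counts(S, theta))     # {r: E[Count(S, T, r)]} under theta
    lhs_totals = defaultdict(float)
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {r: c / lhs_totals[r[0]] for r, c in counts.items()}
```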
