CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 8: POS tagset) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 17th Jan, 2012

1 CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 8: POS tagset) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 17th Jan, 2012

2 HMM: Three Problems
Problem 1: Likelihood of a sequence (Forward Procedure, Backward Procedure)
Problem 2: Best state sequence (Viterbi Algorithm)
Problem 3: Re-estimation (Baum-Welch, i.e. the Forward-Backward Algorithm)
[Figure: NLP Trinity. Problem axis: Morph Analysis, Part of Speech Tagging, Parsing, Semantics. Language axis: Hindi, Marathi, English, French. Algorithm axis: HMM, MEMM, CRF.]

3 Tagged Corpora
^_^ ``_`` The_DT guys_NNS that_WDT make_VBP traditional_JJ hardware_NN are_VBP really_RB being_VBG obsoleted_VBN by_IN microprocessor-based_JJ machines_NNS ,_, ''_'' said_VBD Mr._NNP Benton_NNP ._. $_$
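A tagged corpus in this word_TAG format can be read into (word, tag) pairs with a few lines of code. The following is a minimal sketch; the function name `parse_tagged` and the shortened sample sentence are assumptions made for illustration, not part of the lecture.

```python
def parse_tagged(line):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST underscore, so a word containing '_' stays intact.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


# Shortened, hypothetical sample in the same format as the slide.
sample = "^_^ The_DT guys_NNS make_VBP hardware_NN ._. $_$"
pairs = parse_tagged(sample)
print(pairs)
```

Splitting on the last underscore is a design choice: it keeps sentence markers such as ^_^ and punctuation such as ._. well-formed.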

4 For Hindi
Rama achhaa gaata hai. (hai is VAUX: auxiliary verb); Ram sings well
Rama achhaa ladakaa hai. (hai is VCOP: copula verb); Ram is a good boy

5 Example of difficulty in POS tagging

6 Tags
Content Word
  Noun
    Proper Noun: NNP (for NER)
    Common Noun: NN, NNS
  Adjective
  Verb: VBP, VBD, VBG, VBN
Function Word
  Preposition
  Pronoun
  Conjunction
  Interjection

7 Difficulty in POS Tagging
Consider the following sentences:
राम अच्छा गाता है_VAUX (auxiliary verb)
Ram good sing is: Ram sings well
GNPTAM (Gender, Number, Person, Tense, Aspect, Mood) for गाता only: Male, Singular, ??, ??, ??, -
GNPTAM for गाता है: Male, Singular, 2nd or 3rd, Present, Default, Declarative
राम अच्छा लड़का है_VCOP (copular verb)
Ram good boy is: Ram is a good boy
In general, VAUX, VM (main verb) and VCOP cannot be separated easily.

8 Difficulty in POS Tagging
To POS tag based on rules, one simple rule could be:
है preceded by a verb: VAUX
है preceded by a nominal: VCOP (facilitates co-reference, समानाधिकरण)
This is a high-precision, low-recall rule: when it says Yes, the answer is indeed Yes, but a No may not actually be a No.

9 Exceptions to the previous rule
False negative for VAUX
Particle injection (particles: भी -Bhi, तो -To, ही -Hi, नहीं -Nahi)
Consider the following sentences:
राम गाता तो अच्छा है, पर...
राम अच्छा है_VCOP
राम तो गाता अच्छा है_VAUX
The POS tags of है vary here despite the preceding word being an adjective.
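The rule from slide 8 and the failure case from slide 9 can be sketched in code. This is only an illustration: the function `tag_hai`, the tag sets, the romanised sentences and the tag assigned to the preceding word in each example are assumptions, not the lecture's implementation.

```python
# Assumed tag names for this sketch; a real tagset is much larger.
VERB_TAGS = {"VM"}
NOMINAL_TAGS = {"N", "JJ"}  # adjectives are treated as nominals here


def tag_hai(prev_tag):
    """Guess the tag of 'hai' from the tag of the word just before it."""
    if prev_tag in VERB_TAGS:
        return "VAUX"
    if prev_tag in NOMINAL_TAGS:
        return "VCOP"
    return None  # the rule does not apply


# (sentence, assumed tag of the word before 'hai', tag given on the slides)
examples = [
    ("Rama achhaa gaata hai", "VM", "VAUX"),
    ("Rama achhaa ladakaa hai", "N", "VCOP"),
    ("Rama to gaata achhaa hai", "JJ", "VAUX"),  # particle injection case
]

results = [(s, tag_hai(prev), gold) for s, prev, gold in examples]
misses = [s for s, guess, gold in results if guess != gold]
print(misses)
```

The third example is where the rule breaks: the word before hai is an adjective, so the rule answers VCOP, while the slide tags it VAUX. That is the false negative for VAUX.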

10 Evaluation of POS Tag Accuracy
Precision, Recall and F-Score
Given G (the tags our system returns) and I (the ideal, i.e. actual, tags):
Agreement: $G \cap I$; False Positives: $G \setminus I$; False Negatives: $I \setminus G$
Precision $P = |G \cap I| / |G|$
Recall $R = |G \cap I| / |I|$
F-Score $F = 2PR / (P + R)$
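As a small worked example of these measures, the sketch below treats G and I as sets of token positions tagged with one particular tag (say VAUX). The function name and the position sets are hypothetical, chosen only to make the arithmetic visible.

```python
def precision_recall_f(system, gold):
    """Return (precision, recall, F-score) for two sets of items."""
    agree = len(system & gold)  # |G ∩ I|
    p = agree / len(system) if system else 0.0
    r = agree / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f


# Hypothetical token positions tagged VAUX.
G = {1, 4, 5}     # what the system returns
I = {1, 4, 7, 9}  # the actual tags

p, r, f = precision_recall_f(G, I)
print(p, r, f)
```

Here two of the three returned positions are correct (P = 2/3), and two of the four actual positions are found (R = 1/2), which gives F = 4/7.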

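11 POS tag computation (1/2)
Best tag sequence:
$T^* = \arg\max_T P(T \mid W) = \arg\max_T P(T)\,P(W \mid T)$ (by Bayes' theorem)
$P(T) = P(t_0 = \text{\^{}},\ t_1,\ t_2,\ \ldots,\ t_{n+1} = \text{.})$
$= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1, t_0)\,P(t_3 \mid t_2, t_1, t_0) \cdots P(t_n \mid t_{n-1}, t_{n-2}, \ldots, t_0)\,P(t_{n+1} \mid t_n, t_{n-1}, \ldots, t_0)$
$= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n)$
$= \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})$ (Bigram Assumption), where the $i = 0$ factor $P(t_0 \mid t_{-1})$ stands for $P(t_0)$.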

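12 POS tag computation (2/2)
$P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0, t_0 \ldots t_{n+1})\,P(w_2 \mid w_1, w_0, t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1}, t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n, t_0 \ldots t_{n+1})$
Assumption: a word is determined completely by its tag. This is inspired by speech recognition.
$= P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1})$
$= \prod_{i=0}^{n+1} P(w_i \mid t_i) = \prod_{i=1}^{n+1} P(w_i \mid t_i)$ (Lexical Probability Assumption)
The product can start at $i = 1$ because $P(w_0 \mid t_0) = P(\text{\^{}} \mid \text{\^{}}) = 1$.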
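Under the two assumptions, the score of one candidate tag sequence is a product of bigram tag probabilities and word-given-tag probabilities. The sketch below shows that product for a single sequence. All probability values, the table names and the function name are hypothetical and serve only to illustrate the formula.

```python
# Hypothetical probabilities, invented for illustration only.
TRANSITION = {("^", "N"): 0.6, ("N", "V"): 0.5, ("V", "A"): 0.3, ("A", "."): 0.4}
EMISSION = {("people", "N"): 0.01, ("jump", "V"): 0.02,
            ("high", "A"): 0.05, (".", "."): 1.0}


def sequence_score(words, tags, start="^"):
    """Product over i of P(t_i | t_{i-1}) * P(w_i | t_i), starting from '^'."""
    score = 1.0
    prev = start
    for word, tag in zip(words, tags):
        # Unseen pairs get probability 0 in this unsmoothed sketch.
        score *= TRANSITION.get((prev, tag), 0.0) * EMISSION.get((word, tag), 0.0)
        prev = tag
    return score


score = sequence_score(["people", "jump", "high", "."], ["N", "V", "A", "."])
print(score)
```

The start symbol contributes no factor of its own here, which matches dropping the $i = 0$ term of the lexical product.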

13 Example
People jump high.
People: Noun/Verb
jump: Noun/Verb
high: Noun/Adjective
We can start with probabilities.

14 [Trellis diagram: ^ → People (N, VM) → Jump (N, VM) → High (N, JJ) → $]
8 POS tag sequences are possible, given these valid tags for each word taken from the dictionary.
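The count of 8 comes from 2 x 2 x 2 choices. A minimal sketch that enumerates every path through the trellis, assuming the candidate tags read off the diagram (N or VM for People and Jump, N or JJ for High):

```python
from itertools import product

# Candidate tags per word, as read from the trellis diagram.
LEXICON = {"People": ["N", "VM"], "Jump": ["N", "VM"], "High": ["N", "JJ"]}
words = ["People", "Jump", "High"]

# Every path through the trellis, wrapped in the ^ and $ boundary tags.
sequences = [("^",) + combo + ("$",)
             for combo in product(*(LEXICON[w] for w in words))]

for seq in sequences:
    print(" ".join(seq))
print(len(sequences))
```

Enumeration grows exponentially with sentence length, which is why the best-sequence problem is solved with the Viterbi algorithm instead.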

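15 Bigram Assumption
Best tag sequence:
$T^* = \arg\max_T P(T \mid W) = \arg\max_T P(T)\,P(W \mid T)$ (by Bayes' theorem)
$P(T) = P(t_0 = \text{\^{}},\ t_1,\ t_2,\ \ldots,\ t_{n+1} = \text{.})$
$= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1, t_0)\,P(t_3 \mid t_2, t_1, t_0) \cdots P(t_n \mid t_{n-1}, t_{n-2}, \ldots, t_0)\,P(t_{n+1} \mid t_n, t_{n-1}, \ldots, t_0)$
$= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n)$
$= \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})$ (Bigram Assumption), where the $i = 0$ factor $P(t_0 \mid t_{-1})$ stands for $P(t_0)$.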

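16 Lexical Probability Assumption
$P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0, t_0 \ldots t_{n+1})\,P(w_2 \mid w_1, w_0, t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1}, t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n, t_0 \ldots t_{n+1})$
Assumption: a word is determined completely by its tag. This is inspired by speech recognition.
$= P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1})$
$= \prod_{i=0}^{n+1} P(w_i \mid t_i) = \prod_{i=1}^{n+1} P(w_i \mid t_i)$ (Lexical Probability Assumption)
The product can start at $i = 1$ because $P(w_0 \mid t_0) = P(\text{\^{}} \mid \text{\^{}}) = 1$.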
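With both assumptions in place, the best tag sequence can be found by the Viterbi algorithm listed under Problem 2 on slide 2. The sketch below runs it on the "People jump high" trellis. Every probability value is hypothetical, and the function and table names are assumptions made for this illustration.

```python
def viterbi(words, lexicon, trans, emit, start="^", end="$"):
    """Return (probability, tag path) of the best path through the trellis."""
    # best[tag] = (probability, path) of the best partial path ending in tag
    best = {start: (1.0, [start])}
    for word in words:
        nxt = {}
        for tag in lexicon[word]:
            e = emit.get((word, tag), 0.0)
            candidates = [(p * trans.get((prev, tag), 0.0) * e, path + [tag])
                          for prev, (p, path) in best.items()]
            nxt[tag] = max(candidates, key=lambda c: c[0])
        best = nxt
    finals = [(p * trans.get((prev, end), 0.0), path + [end])
              for prev, (p, path) in best.items()]
    return max(finals, key=lambda c: c[0])


# Hypothetical model, invented for illustration only.
LEXICON = {"people": ["N", "VM"], "jump": ["N", "VM"], "high": ["N", "JJ"]}
TRANS = {("^", "N"): 0.7, ("^", "VM"): 0.3,
         ("N", "N"): 0.2, ("N", "VM"): 0.6, ("N", "JJ"): 0.2,
         ("VM", "N"): 0.5, ("VM", "VM"): 0.1, ("VM", "JJ"): 0.4,
         ("N", "$"): 0.5, ("JJ", "$"): 0.5}
EMIT = {("people", "N"): 0.8, ("people", "VM"): 0.2,
        ("jump", "N"): 0.3, ("jump", "VM"): 0.7,
        ("high", "N"): 0.1, ("high", "JJ"): 0.9}

prob, path = viterbi(["people", "jump", "high"], LEXICON, TRANS, EMIT)
print(path, prob)
```

Only one best partial path per tag is kept at each word, so the work grows linearly with sentence length instead of exponentially as in full enumeration.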

17 Calculation from actual data
Corpus: ^ Ram got many NLP books. He found them all very interesting.
POS tagged: ^ N V A N N . N V N A R A .

18 Recording numbers
[Table: tag bigram counts. Rows (previous tag): ^, N, V, A, R. Columns (next tag): ^, N, V, A, R, .]

19 Probabilities
      ^     N     V     A     R     .
^     0     1     0     0     0     0
N     0    1/5   2/5   1/5    0    1/5
V     0    1/2    0    1/2    0     0
A     0    1/3    0     0    1/3   1/3
R     0     0     0     1     0     0
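These probabilities are relative frequencies of tag bigrams in the tagged corpus of slide 17. A minimal sketch of that computation follows. It assumes the corpus is split into two sentences, each starting with ^, and that no bigram is counted across the sentence boundary; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Tag sequences of the two corpus sentences (sentence split is an assumption).
sentences = [
    ["^", "N", "V", "A", "N", "N", "."],       # Ram got many NLP books.
    ["^", "N", "V", "N", "A", "R", "A", "."],  # He found them all very interesting.
]

# Count each (previous tag, next tag) bigram.
counts = Counter()
for tags in sentences:
    for prev, nxt in zip(tags, tags[1:]):
        counts[(prev, nxt)] += 1

# Total number of bigrams leaving each previous tag.
totals = Counter()
for (prev, _), c in counts.items():
    totals[prev] += c

# P(next | prev) = count(prev, next) / count(prev, anything)
prob = {pair: Fraction(c, totals[pair[0]]) for pair, c in counts.items()}

for (prev, nxt), p in sorted(prob.items()):
    print(f"P({nxt} | {prev}) = {p}")
```

Exact fractions are used so that the output can be compared directly with the table, for example P(V | N) = 2/5.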

20 Penn tagset (1/2)

21 Penn tagset (2/2)

22 Indian Language Tagset: Noun

23 Indian Language Tagset: Pronoun

24 Indian Language Tagset: Quantifier

25 Indian Language Tagset: Demonstrative
3    Demonstrative   DM    DM       Vaha, jo, yaha,
3.1  Deictic         DMD   DM DMD   Vaha, yaha
3.2  Relative        DMR   DM DMR   jo, jis
3.3  Wh-word         DMQ   DM DMQ   kis, kaun
     Indefinite      DMI   DM DMI   KoI, kis

26 Indian Language Tagset: Verb, Adjective, Adverb

27 Indian Language Tagset: Postposition, conjunction

28 Indian Language Tagset: Particle

29 Indian Language Tagset: Residuals
