Language Processing with Perl and Prolog

Pierre Nugues, Lund University
Pierre.Nugues@cs.lth.se
http://cs.lth.se/pierre_nugues/

Training Set

Part-of-speech taggers use a training set where every word is hand-annotated (Penn Treebank and CoNLL 2008).

Index  Word        Hand annotation    Index  Word        Hand annotation
1      Battle      JJ                 19     of          IN
2      -           HYPH               20     their       PRP$
3      tested      JJ                 21     countrymen  NNS
4      Japanese    JJ                 22     to          TO
5      industrial  JJ                 23     visit       VB
6      managers    NNS                24     Mexico      NNP
7      here        RB                 25     ,           ,
8      always      RB                 26     a           DT
9      buck        VBP                27     boatload    NN
10     up          RP                 28     of          IN
11     nervous     JJ                 29     samurai     FW
12     newcomers   NNS                30     warriors    NNS
13     with        IN                 31     blown       VBN
14     the         DT                 32     ashore      RB
15     tale        NN                 33     375         CD
16     of          IN                 34     years       NNS
17     the         DT                 35     ago         RB
18     first       JJ                 36     .           .
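As a minimal sketch of how such a training set could be loaded, the following Perl reads a hypothetical two-column file, one token per line with the word and its hand-annotated tag separated by a tab; the file name and layout are assumptions, not the actual Penn Treebank or CoNLL 2008 formats:

  # Load a word/tag training file into parallel arrays (assumed format: word<TAB>tag)
  use strict;
  use warnings;

  my (@words, @tags);
  open(my $fh, '<', 'training.tsv') or die "Cannot open training.tsv: $!";
  while (my $line = <$fh>) {
      chomp $line;
      next if $line eq '';                     # skip blank lines between sentences
      my ($word, $tag) = split /\t/, $line;
      push @words, $word;
      push @tags,  $tag;
  }
  close $fh;
  print scalar(@words), " annotated tokens read\n";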

Part-of-Speech Tagging with Linear Classifiers

Linear classifiers are efficient devices to carry out part-of-speech tagging:
1. The lexical values are the input data to the tagger.
2. The parts of speech are assigned from left to right by the tagger.

Given the feature vector (w_{i-2}, w_{i-1}, w_i, w_{i+1}, w_{i+2}, t_{i-2}, t_{i-1}), the classifier predicts the part-of-speech tag t_i at index i.

In the state shown below, the tags up to index 23 have already been assigned (PPOS column, the predicted parts of speech). The input features for index 24 are the words at indices 22-26 and the tags at indices 22-23; the predicted tag is that of Mexico at index 24.

ID   FORM        PPOS
BOS  BOS         (padding)
BOS  BOS         (padding)
1    Battle      NN
2    -           HYPH
3    tested      NN
...  ...         ...
17   the         DT
18   first       JJ
19   of          IN
20   their       PRP$
21   countrymen  NNS
22   to          TO
23   visit       VB
24   Mexico
25   ,
26   a
27   boatload
...  ...
34   years
35   ago
36   .
EOS  EOS         (padding)
EOS  EOS         (padding)
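A minimal Perl sketch of this feature extraction, assuming the @words array from the training set and an @tags array filled from left to right as the tagger proceeds; the subroutine name and the BOS/EOS padding symbols are illustrative choices:

  # Build the feature vector (w_{i-2}, w_{i-1}, w_i, w_{i+1}, w_{i+2}, t_{i-2}, t_{i-1})
  sub feature_vector {
      my ($i, $words, $tags) = @_;
      my @features;
      for my $k (-2 .. 2) {                    # the five-word window
          my $j = $i + $k;
          push @features, ($j < 0)        ? 'BOS'
                        : ($j > $#$words) ? 'EOS'
                        :                   $words->[$j];
      }
      for my $k (-2 .. -1) {                   # the two previously assigned tags
          my $j = $i + $k;
          push @features, ($j < 0) ? 'BOS' : $tags->[$j];
      }
      return @features;
  }

Applied at the position of Mexico in the table above, it would return the word window (to, visit, Mexico, ',', a) followed by the tags (TO, VB).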

Feature Vectors

ID   w_{i-2}     w_{i-1}     w_i         w_{i+1}     w_{i+2}      t_{i-2}  t_{i-1}  PPOS
1    BOS         BOS         Battle      -           tested       BOS      BOS      NN
2    BOS         Battle      -           tested      Japanese     BOS      NN       HYPH
3    Battle      -           tested      Japanese    industrial   NN       HYPH     JJ
...  ...         ...         ...         ...         ...          ...      ...      ...
19   the         first       of          their       countrymen   DT       JJ       IN
20   first       of          their       countrymen  to           JJ       IN       PRP$
21   of          their       countrymen  to          visit        IN       PRP$     NNS
22   their       countrymen  to          visit       Mexico       PRP$     NNS      TO
23   countrymen  to          visit       Mexico      ,            NNS      TO       VB
24   to          visit       Mexico      ,           a            TO       VB       NNP
25   visit       Mexico      ,           a           boatload     VB       NNP      ,
...  ...         ...         ...         ...         ...          ...      ...      ...
34   ashore      375         years       ago         .            RB       CD       NNS
35   375         years       ago         .           EOS          CD       NNS      RB
36   years       ago         .           EOS         EOS          NNS      RB       .

POS Annotation with the Noisy Channel Model

Modeling the problem: t_1, t_2, t_3, ..., t_n → noisy channel → w_1, w_2, w_3, ..., w_n.

The optimal part-of-speech sequence is

\hat{T} = \arg\max_{t_1, t_2, t_3, ..., t_n} P(t_1, t_2, t_3, ..., t_n | w_1, w_2, w_3, ..., w_n).

The Bayes rule on conditional probabilities, P(A|B) P(B) = P(B|A) P(A), yields

\hat{T} = \arg\max_T P(T) P(W|T).

P(T) and P(W|T) are simplified and estimated on hand-annotated corpora, the gold standard.
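Spelled out, the step from the conditional probability to the product is the standard Bayes rewriting, made explicit here since the slide leaves it implicit:

\hat{T} = \arg\max_T P(T | W) = \arg\max_T \frac{P(W | T) P(T)}{P(W)} = \arg\max_T P(T) P(W | T),

because P(W) is the same for every candidate tag sequence T and can be dropped from the maximization.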

The First Term: N-Gram Approximation

P(T) = P(t_1, t_2, t_3, ..., t_n) \approx P(t_1) P(t_2 | t_1) \prod_{i=3}^{n} P(t_i | t_{i-2}, t_{i-1}).

If we use a start-of-sentence delimiter <s>, the first two terms of the product, P(t_1) P(t_2 | t_1), are rewritten as P(<s>) P(t_1 | <s>) P(t_2 | <s>, t_1), where P(<s>) = 1.

We estimate the probabilities with the maximum likelihood estimate, P_{MLE}:

P_{MLE}(t_i | t_{i-2}, t_{i-1}) = \frac{C(t_{i-2}, t_{i-1}, t_i)}{C(t_{i-2}, t_{i-1})}.
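A minimal Perl sketch of this maximum-likelihood estimation, assuming the @tags array loaded from the training set above; the hash and subroutine names are illustrative, not the book's code:

  # Count tag trigrams and their bigram contexts, then compute P_MLE(t_i | t_{i-2}, t_{i-1})
  my (%trigram_count, %bigram_count);
  for my $i (2 .. $#tags) {
      $trigram_count{"$tags[$i-2] $tags[$i-1] $tags[$i]"}++;
      $bigram_count{"$tags[$i-2] $tags[$i-1]"}++;
  }

  sub p_mle_trigram {
      my ($t2, $t1, $t) = @_;                  # $t2 = t_{i-2}, $t1 = t_{i-1}, $t = t_i
      my $context = $bigram_count{"$t2 $t1"} // 0;
      return 0 unless $context;                # unseen context
      return ($trigram_count{"$t2 $t1 $t"} // 0) / $context;
  }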

Sparse Data

If N_p is the number of different part-of-speech tags, there are N_p × N_p × N_p values to estimate.

If data is missing, we can back off to bigrams:

P(T) = P(t_1, t_2, t_3, ..., t_n) \approx P(t_1) \prod_{i=2}^{n} P(t_i | t_{i-1}),

or to unigrams:

P(T) = P(t_1, t_2, t_3, ..., t_n) \approx \prod_{i=1}^{n} P(t_i).

Finally, we can combine these approximations linearly:

P_{LinearInter}(t_i | t_{i-2}, t_{i-1}) = \lambda_1 P(t_i | t_{i-2}, t_{i-1}) + \lambda_2 P(t_i | t_{i-1}) + \lambda_3 P(t_i),

with \lambda_1 + \lambda_2 + \lambda_3 = 1, for example, \lambda_1 = 0.6, \lambda_2 = 0.3, \lambda_3 = 0.1.
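A minimal sketch of the linear interpolation, assuming p_mle_trigram from above and hypothetical bigram and unigram counterparts p_mle_bigram and p_mle_unigram estimated the same way:

  # Linearly interpolated trigram probability, with lambda1 + lambda2 + lambda3 = 1
  my ($lambda1, $lambda2, $lambda3) = (0.6, 0.3, 0.1);

  sub p_interpolated {
      my ($t2, $t1, $t) = @_;
      return $lambda1 * p_mle_trigram($t2, $t1, $t)
           + $lambda2 * p_mle_bigram($t1, $t)
           + $lambda3 * p_mle_unigram($t);
  }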

The Second Term

The probability of the word sequence knowing the part-of-speech sequence is usually approximated as:

P(W|T) = P(w_1, w_2, w_3, ..., w_n | t_1, t_2, t_3, ..., t_n) \approx \prod_{i=1}^{n} P(w_i | t_i).

Like the previous probabilities, P(w_i | t_i) is estimated from hand-annotated corpora using the maximum likelihood estimate:

P_{MLE}(w_i | t_i) = \frac{C(w_i, t_i)}{C(t_i)}.

For N_w different words, there are N_p × N_w values to obtain, but in this case, many of the estimates will be 0.
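The emission estimates can be sketched in the same style, assuming the parallel @words and @tags arrays from the training set; the names are again illustrative:

  # Count (word, tag) pairs and tag occurrences, then compute P_MLE(w_i | t_i)
  my (%word_tag_count, %tag_count);
  for my $i (0 .. $#words) {
      $word_tag_count{"$words[$i] $tags[$i]"}++;
      $tag_count{$tags[$i]}++;
  }

  sub p_mle_emission {
      my ($word, $tag) = @_;
      my $total = $tag_count{$tag} // 0;
      return 0 unless $total;
      return ($word_tag_count{"$word $tag"} // 0) / $total;   # 0 for unseen (word, tag) pairs
  }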

An Example

Je le donne ('I give it'). The possible analyses form a lattice:

  je/pro
  le/art or le/pro
  donne/verb or donne/noun

The four candidate tag sequences and their scores:

1. P(pro|∅) P(art|∅, pro) P(verb|pro, art) × P(je|pro) P(le|art) P(donne|verb)
2. P(pro|∅) P(art|∅, pro) P(noun|pro, art) × P(je|pro) P(le|art) P(donne|noun)
3. P(pro|∅) P(pro|∅, pro) P(verb|pro, pro) × P(je|pro) P(le|pro) P(donne|verb)
4. P(pro|∅) P(pro|∅, pro) P(noun|pro, pro) × P(je|pro) P(le|pro) P(donne|noun)
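A minimal brute-force sketch that enumerates and scores these four sequences with the estimates sketched above (p_interpolated could replace p_mle_trigram); the hand-written lattice and the use of <s> padding in place of the slide's ∅ start context are assumptions:

  # Enumerate every tag sequence of "je le donne" and keep the best-scoring one
  my @lattice = (['pro'], ['art', 'pro'], ['verb', 'noun']);   # candidate tags per word
  my ($best_score, @best_tags) = (0);
  for my $t1 (@{$lattice[0]}) {
      for my $t2 (@{$lattice[1]}) {
          for my $t3 (@{$lattice[2]}) {
              my $score = p_mle_trigram('<s>', '<s>', $t1)     # start-of-sentence context
                        * p_mle_trigram('<s>', $t1, $t2)
                        * p_mle_trigram($t1, $t2, $t3)
                        * p_mle_emission('je', $t1)
                        * p_mle_emission('le', $t2)
                        * p_mle_emission('donne', $t3);
              ($best_score, @best_tags) = ($score, $t1, $t2, $t3)
                  if $score > $best_score;
          }
      }
  }
  print "Best sequence: @best_tags\n";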

Viterbi (Informal)

Je le donne demain dans la matinée ('I give it tomorrow in the morning'). The lattice of possible analyses:

  je/pro
  le/art or le/pro
  donne/verb or donne/noun
  demain/adv
  dans/prep
  la/art or la/pro
  matinée/noun

Viterbi (Informal)

The term brought by the word demain still retains the memory of the ambiguity of donne: P(adv|verb) P(demain|adv) and P(adv|noun) P(demain|adv).

This is no longer the case with dans. According to the noisy channel model and the bigram assumption, the term brought by the word dans is P(dans|prep) P(prep|adv). It does not reflect the ambiguity of le and donne, and the subsequent terms will ignore it as well. We can therefore discard the corresponding nonoptimal paths: the optimal path does not contain nonoptimal subpaths.
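A minimal Viterbi sketch under the bigram assumption used on this slide, reusing p_mle_emission from above and the hypothetical p_mle_bigram transition estimate; it keeps, for each position and candidate tag, only the best-scoring path:

  # Viterbi under the bigram assumption: one best path per (position, tag)
  sub viterbi {
      my ($words, $lattice) = @_;    # $lattice->[$i]: arrayref of candidate tags for word $i
      my @delta;                     # $delta[$i]{$tag} = [best score, best previous tag]
      for my $tag (@{$lattice->[0]}) {
          $delta[0]{$tag} = [ p_mle_bigram('<s>', $tag) * p_mle_emission($words->[0], $tag), '' ];
      }
      for my $i (1 .. $#$words) {
          for my $tag (@{$lattice->[$i]}) {
              my ($best, $argmax) = (0, '');
              for my $prev (keys %{$delta[$i-1]}) {
                  my $score = $delta[$i-1]{$prev}[0] * p_mle_bigram($prev, $tag);
                  ($best, $argmax) = ($score, $prev) if $score > $best;
              }
              $delta[$i]{$tag} = [ $best * p_mle_emission($words->[$i], $tag), $argmax ];
          }
      }
      # Backtrack from the best final tag
      my ($last) = sort { $delta[-1]{$b}[0] <=> $delta[-1]{$a}[0] } keys %{$delta[-1]};
      my @path = ($last);
      for (my $i = $#$words; $i > 0; $i--) {
          unshift @path, $delta[$i]{$path[0]}[1];
      }
      return @path;
  }

Applied to the lattice on the previous slide, it returns one tag per word of the sentence.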

Supervised Learning: A Summary

- Needs a manually annotated corpus called the Gold Standard.
- The Gold Standard may contain errors (errare humanum est) that we ignore.
- A classifier is trained on one part of the corpus, the training set, and evaluated on another part, the test set, where the automatic annotation is compared with the Gold Standard.
- N-fold cross-validation is used to avoid the influence of a particular division.
- Some algorithms may require additional optimization on a development set.
- Classifiers can use statistical or symbolic methods.
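As a minimal sketch of the N-fold division mentioned above, assuming an @sentences array of annotated sentences; the fold count and the sentence-level split are assumptions:

  # Split the annotated sentences into N folds for cross-validation
  my $n_folds = 10;
  my @folds;
  for my $i (0 .. $#sentences) {
      push @{$folds[$i % $n_folds]}, $sentences[$i];
  }
  # Each fold serves once as the test set; the remaining folds form the training set
  for my $k (0 .. $n_folds - 1) {
      my @test  = @{$folds[$k]};
      my @train = map { @{$folds[$_]} } grep { $_ != $k } 0 .. $n_folds - 1;
      # train the tagger on @train and evaluate it on @test here
  }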