Statistical methods in NLP, lecture 7: Tagging and parsing. Richard Johansson, February 25, 2014

overview of today's lecture:
- HMM tagging recap
- assignment 3
- PCFG recap
- dependency parsing
- VG assignment 1

overview:
- HMM tagging recap
- PCFGs, dependency parsing, and VG assignment 1
- the next few weeks

statistical taggers: the typical probabilistic formulation of a tagger starts from Bayes' rule:

argmax_T P(T | W) = argmax_T P(W | T) · P(T) / P(W) = argmax_T P(W | T) · P(T)

P(T) is like a language model, but for tag sequences instead of word sequences

making the probabilities practical: we need to make assumptions about P(T) and P(W | T). In a bigram tagger, the probability of the next tag depends only on the previous tag (Markov assumption):

P(t_n | t_1, ..., t_{n-1}) ≈ P(t_n | t_{n-1})

this is called the transition probability. The probability of a word depends only on its tag:

P(w_n | all tags, other words) ≈ P(w_n | t_n)

this is called the emission probability
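For reference (this product form is implied by the two assumptions above rather than written out on the slide), the joint probability of a sentence and a tag sequence then factorizes into transition and emission probabilities:

P(W, T) ≈ ∏_{i=1..n} P(t_i | t_{i-1}) · P(w_i | t_i)

where t_0 is a dummy start tag.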

hidden Markov models: a model defined by transition probabilities P(t_n | t_{n-1}) and emission probabilities P(w_n | t_n), in which the underlying sequence (the tags) is unknown, is called a hidden Markov model (HMM)

generative story in hidden Markov models (a sequence of figure slides stepping through the generative story; illustrations not included in the transcript)

how can we estimate the probabilities? to estimate P(t_n | t_{n-1}) and P(w_n | t_n), we need a corpus where the part-of-speech tags have been annotated (by humans):

The/DT rifles/NNS were/VBD n't/RB loaded/VBN ./.
As/IN interest/NN rates/NNS rose/VBD ,/, ...

estimating the probabilities: just like we did with the Naive Bayes classifier, we estimate the probabilities by counting frequencies (MLE):

P_MLE(noun | verb) = count(verb, noun) / count(verb)

P_MLE(cat | noun) = count(noun: cat) / count(noun)
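A minimal Python sketch of this counting step (not from the original slides; the data format, a list of sentences given as (word, tag) pairs, and the "<s>"/"</s>" boundary tags are assumptions made for the example):

    from collections import Counter

    def estimate_hmm(tagged_sentences):
        # tagged_sentences: a list of sentences, each a list of (word, tag) pairs
        transition_counts = Counter()   # (previous tag, tag) -> frequency
        emission_counts = Counter()     # (tag, word) -> frequency
        tag_counts = Counter()          # tag -> frequency (the denominators)

        for sentence in tagged_sentences:
            previous = "<s>"                    # dummy start tag
            tag_counts[previous] += 1
            for word, tag in sentence:
                transition_counts[(previous, tag)] += 1
                emission_counts[(tag, word)] += 1
                tag_counts[tag] += 1
                previous = tag
            transition_counts[(previous, "</s>")] += 1   # dummy end tag, used by Viterbi later

        # maximum likelihood estimates: relative frequencies
        transition = {(prev, tag): n / tag_counts[prev]
                      for (prev, tag), n in transition_counts.items()}
        emission = {(tag, word): n / tag_counts[tag]
                    for (tag, word), n in emission_counts.items()}
        return transition, emission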

smoothing... the tag transition model may need to be smoothed, just like the language models, in particular if the training corpus is small or the tagset contains many tags. Here is how we do linear interpolation smoothing of the tag probabilities:

P_LI(t_n | t_{n-1}) = λ2 · count(t_{n-1}, t_n) / count(t_{n-1}) + λ1 · count(t_n) / count(any tag)

usually there is some special treatment for the emission probability P(w_n | t_n) if w_n is unseen in the training corpus, taking for instance punctuation, capitalization, numbers, and suffixes into account
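A possible implementation of this interpolation, reusing the raw counters built in the sketch above (the λ values are placeholders; in practice they are tuned, for instance on held-out data, and should sum to one):

    def interpolated_transition(prev_tag, tag, transition_counts, tag_counts, total_tag_tokens,
                                lambda2=0.9, lambda1=0.1):
        # bigram relative frequency, or 0 if the previous tag was never observed
        bigram = (transition_counts.get((prev_tag, tag), 0) / tag_counts[prev_tag]
                  if tag_counts.get(prev_tag) else 0.0)
        # unigram relative frequency over all tag tokens in the corpus
        unigram = tag_counts.get(tag, 0) / total_tag_tokens
        return lambda2 * bigram + lambda1 * unigram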

tagging: how do we use our probability model to tag? conceptually: enumerate all possible tag sequences and use the probabilities to find the best one. However, in long sentences the number of possible tag sequences is very large. The Viterbi algorithm finds the most probable underlying tag sequence; because of the Markov assumption, this algorithm is very efficient

the Viterbi algorithm: for each possible tag t_i of a word w_i, we compute the best tag sequence leading to t_i. For instance, for the word saw, we find the best sequence ending with saw as a verb, and the best sequence ending with saw as a noun

the trick: to compute the best path ending with saw as a verb, consider the best paths for the previous word and the transition probabilities. Assume the previous word is e.g. man, which can be a noun or a verb. Select the highest of:
- the log probability (LP) of the best path ending in man as a verb + the LP of the transition verb → verb
- the LP of the best path ending in man as a noun + the LP of the transition noun → verb

tagging a sentence: apply the Viterbi algorithm step by step. After the last token of the sentence, add a special dummy end token; this token will emit a dummy end tag with probability 1. The best tag sequence for the whole sentence is the best path ending in the dummy tag. Finally, retrace your steps from the dummy item to get the tags, so you need backpointers
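Putting the pieces together, a minimal Viterbi sketch (not from the original slides), assuming a fixed tag set `tags`, the probability tables from the estimation sketch earlier, log probabilities for numerical stability, and a small floor probability for unseen events in place of the emission treatment discussed above:

    import math

    def viterbi(words, tags, transition, emission):
        def logp(table, key, floor=1e-10):
            return math.log(table.get(key, floor))

        # best[i][t]: log probability of the best path for words[:i+1] ending in tag t
        # back[i][t]: the tag at position i-1 on that best path (the backpointer)
        best = [{} for _ in words]
        back = [{} for _ in words]

        for t in tags:
            best[0][t] = logp(transition, ("<s>", t)) + logp(emission, (t, words[0]))
            back[0][t] = "<s>"

        for i in range(1, len(words)):
            for t in tags:
                scores = {prev: best[i - 1][prev] + logp(transition, (prev, t))
                          for prev in tags}
                best_prev = max(scores, key=scores.get)
                best[i][t] = scores[best_prev] + logp(emission, (t, words[i]))
                back[i][t] = best_prev

        # dummy end token: the best full path is the one with the best transition to "</s>"
        final = {t: best[-1][t] + logp(transition, (t, "</s>")) for t in tags}
        last = max(final, key=final.get)

        # retrace the backpointers to recover the tag sequence
        sequence = [last]
        for i in range(len(words) - 1, 0, -1):
            sequence.append(back[i][sequence[-1]])
        return list(reversed(sequence))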

Viterbi example (a sequence of figure slides stepping through the algorithm on a trellis; illustrations not included in the transcript)

using more context: tagging accuracy can possibly be improved by using more contextual information. In a trigram tagger, we use transition probabilities such as P(t_n | t_{n-1}, t_{n-2}). Smoothing becomes more important as you use more context
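For illustration (not spelled out on the slide), the trigram transition probability is usually smoothed in the same spirit as the bigram interpolation above, combining trigram, bigram, and unigram relative frequencies; a sketch with placeholder λ values and hypothetical count tables:

    def interpolated_trigram(t2, t1, t, trigram_counts, bigram_counts, tag_counts, total_tag_tokens,
                             lambdas=(0.6, 0.3, 0.1)):
        # P(t | t2, t1) as a weighted mix of trigram, bigram, and unigram estimates
        l3, l2, l1 = lambdas
        tri = (trigram_counts.get((t2, t1, t), 0) / bigram_counts[(t2, t1)]
               if bigram_counts.get((t2, t1)) else 0.0)
        bi = (bigram_counts.get((t1, t), 0) / tag_counts[t1]
              if tag_counts.get(t1) else 0.0)
        uni = tag_counts.get(t, 0) / total_tag_tokens
        return l3 * tri + l2 * bi + l1 * uni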

search spaces... example: an example trellis for the sentence "Will plays golf", with candidate tags NN, MD, NNP for Will, VBZ, NNS for plays, and NN, VB for golf, plus dummy begin (<B>) and end (<E>) markers (trellis figure not included in the transcript)

assignment 3:
- write a bigram part-of-speech tagger in Python
- estimate the emission and transition probabilities
- implement the Viterbi algorithm
- evaluate the tagger on a test set

overview:
- HMM tagging recap
- PCFGs, dependency parsing, and VG assignment 1
- the next few weeks


the computer assignments:
- assignment 3: tagger implementation, February 27 and March 4
- report deadline: March 18

next lectures:
- March 6: unsupervised and semi-supervised methods
- March 11: machine translation (with Prasanth)
- March 18 and 20: VG assignment lab sessions (and catch-up)