TnT Part of Speech Tagger

TnT Part of Speech Tagger, by Thorsten Brants
Presented by Arghya Roy Chaudhuri, Kevin Patel, and Satyam
July 29, 2014

Outline
1. Why Then? Why Now?
2. Underlying Model; Other Technicalities
3. Evaluation by the Authors; Evaluation by Others
4. Summary

Why Then?
Published in 2000 [Bra00].
One of the first to show that a tagger based on Markov models can yield state-of-the-art results.

Why Now?
Citation count: 305.
Tested across different languages, different domains, and so on.

Trigrams'n'Tags (TnT)
A second-order Hidden Markov Model, with careful decisions regarding:
- Handling of start- and end-of-sequence
- Smoothing
- Capitalization
- Handling of unknown words
- Improving speed of tagging

Second-Order Hidden Markov Model

Tri-gram model
Given a word sequence $w_1, w_2, \ldots, w_T$, find the tag sequence $t_1, t_2, \ldots, t_T$ with each $t_i$ in the tag set. Specifically, we need

$$\operatorname*{argmax}_{t_1,\ldots,t_T} \left[ \prod_{i=1}^{T} P(t_i \mid t_{i-1}, t_{i-2})\, P(w_i \mid t_i) \right] P(t_{T+1} \mid t_T)$$

where $t_{-1}$, $t_0$, and $t_{T+1}$ denote the beginning-of-sequence and end-of-sequence markers.

NB: If sentence boundaries are not marked in the input, TnT inserts these markers whenever it encounters one of [.!?;] as a token.
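To make the objective concrete, the following minimal Python sketch scores one candidate tag sequence under this model (an illustration of the formula above, not code from the paper; the function and probability-table names are assumptions), using log probabilities to avoid underflow:

import math

BOS, EOS = "<s>", "</s>"  # assumed symbols for the boundary tags t_{-1}, t_0, t_{T+1}

def score_tag_sequence(words, tags, p_trans, p_emit, p_eos):
    # p_trans(t3, t1, t2) -> P(t3 | t1, t2), the (smoothed) transition probability
    # p_emit(w, t)        -> P(w | t), the lexical probability
    # p_eos(t)            -> P(t_{T+1} | t_T), the end-of-sequence factor
    assert len(words) == len(tags) and len(words) > 0
    context = [BOS, BOS] + list(tags)
    logp = 0.0
    for i, w in enumerate(words):
        t1, t2, t3 = context[i], context[i + 1], context[i + 2]
        logp += math.log(p_trans(t3, t1, t2)) + math.log(p_emit(w, t3))
    return logp + math.log(p_eos(tags[-1]))

The tagger itself searches for the sequence that maximizes this score with the Viterbi algorithm, pruned by beam search (see below).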

Tri-gram model (continued)
Define $\hat P$ as a maximum likelihood probability and $N$ as the total number of tokens in the training corpus.

Unigrams: $\hat P(t_3) = f(t_3) / N$
Bigrams: $\hat P(t_3 \mid t_2) = f(t_2, t_3) / f(t_2)$
Trigrams: $\hat P(t_3 \mid t_1, t_2) = f(t_1, t_2, t_3) / f(t_1, t_2)$
Lexical: $\hat P(w_3 \mid t_3) = f(w_3, t_3) / f(t_3)$

where $t_1, t_2, t_3$ are in the tag set and $w_3$ is in the lexicon.
Note: $\hat P$ is defined to be 0 if the corresponding numerator and denominator are 0.
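As an illustration of how these counts and estimates could be collected, here is a small Python sketch (my own simplification, which ignores sentence-boundary padding; none of the names come from the paper):

from collections import Counter

def ml_estimates(tagged_corpus):
    # tagged_corpus: a flat list of (word, tag) pairs from the training data
    uni, bi, tri, lex = Counter(), Counter(), Counter(), Counter()
    tags = [t for _, t in tagged_corpus]
    N = len(tags)
    for w, t in tagged_corpus:
        lex[(w, t)] += 1
    for i in range(N):
        uni[tags[i]] += 1
        if i >= 1:
            bi[(tags[i - 1], tags[i])] += 1
        if i >= 2:
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1

    # each estimate is defined to be 0 when its denominator is 0
    def p_uni(t3):
        return uni[t3] / N if N else 0.0
    def p_bi(t3, t2):
        return bi[(t2, t3)] / uni[t2] if uni[t2] else 0.0
    def p_tri(t3, t1, t2):
        return tri[(t1, t2, t3)] / bi[(t1, t2)] if bi[(t1, t2)] else 0.0
    def p_lex(w3, t3):
        return lex[(w3, t3)] / uni[t3] if uni[t3] else 0.0
    return p_uni, p_bi, p_tri, p_lex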

Other Intricate Technicalities

Smoothing
$$P(t_3 \mid t_1, t_2) = \lambda_1 \hat P(t_3) + \lambda_2 \hat P(t_3 \mid t_2) + \lambda_3 \hat P(t_3 \mid t_1, t_2)$$
where $0 \le \lambda_i \le 1$ for $i \in \{1, 2, 3\}$ and $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
The values of the $\lambda_i$ are estimated by deleted interpolation.

Procedure to calculate the $\lambda_i$
1: set $\lambda_1 = \lambda_2 = \lambda_3 = 0$
2: for each trigram $t_1, t_2, t_3$ with $f(t_1, t_2, t_3) > 0$ do
3:   depending on which of the following values is largest:
       case $\frac{f(t_1, t_2, t_3) - 1}{f(t_1, t_2) - 1}$: increment $\lambda_3$ by $f(t_1, t_2, t_3)$
       case $\frac{f(t_2, t_3) - 1}{f(t_2) - 1}$: increment $\lambda_2$ by $f(t_1, t_2, t_3)$
       case $\frac{f(t_3) - 1}{N - 1}$: increment $\lambda_1$ by $f(t_1, t_2, t_3)$
4:   end case
5: end for
6: normalize $\lambda_1, \lambda_2, \lambda_3$
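A compact Python rendering of this procedure, assuming the counts are available as collections.Counter objects (a sketch based on the pseudocode above, not the paper's implementation):

def deleted_interpolation(tri, bi, uni, N):
    # tri[(t1, t2, t3)], bi[(ta, tb)], uni[t]: trigram/bigram/unigram tag counts
    # N: total number of tokens in the training corpus
    lam = [0.0, 0.0, 0.0]  # lambda_1, lambda_2, lambda_3
    for (t1, t2, t3), f123 in tri.items():
        # "deleted" estimates: each count is reduced by one occurrence
        c3 = (f123 - 1) / (bi[(t1, t2)] - 1) if bi[(t1, t2)] > 1 else 0.0
        c2 = (bi[(t2, t3)] - 1) / (uni[t2] - 1) if uni[t2] > 1 else 0.0
        c1 = (uni[t3] - 1) / (N - 1) if N > 1 else 0.0
        # the lambda of the winning estimator receives the full trigram count
        best = max(range(3), key=lambda i: (c1, c2, c3)[i])
        lam[best] += f123
    total = sum(lam)
    return [x / total for x in lam]

Ties (including the case where a denominator is zero) are broken here in favour of the lower-order estimate, a convention the pseudocode above does not specify.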

Capitalization
Capitalization plays a vital role:
English: proper nouns
German: all nouns
The probability distribution of tags around capitalized words differs from that around the rest.
Define $c_i = 1$ if $w_i$ is capitalized and $c_i = 0$ otherwise.
So use $P(t_3, c_3 \mid t_1, c_1, t_2, c_2)$ instead of $P(t_3 \mid t_1, t_2)$; the tri-gram model equations need to be changed accordingly.
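One simple way to realize this, shown below as a sketch of my own rather than the paper's code, is to fold the capitalization flag into each tag so that the existing trigram machinery runs over the augmented (tag, flag) states:

def augment(word, tag):
    # pair each tag with its capitalization flag c_i
    c = 1 if word[:1].isupper() else 0
    return (tag, c)

tagged = [("The", "DT"), ("White", "NNP"), ("House", "NNP")]
print([augment(w, t) for w, t in tagged])
# -> [('DT', 1), ('NNP', 1), ('NNP', 1)]

Transition probabilities are then estimated over these pairs, i.e. $P((t_3, c_3) \mid (t_1, c_1), (t_2, c_2))$ replaces $P(t_3 \mid t_1, t_2)$, as on the slide.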

Handling of Unknown Words
Best handled by suffix analysis (proposed by Samuelsson in 1993), especially for inflected languages.
What is meant by a suffix here? A final sequence of characters of a word, not necessarily a linguistically meaningful suffix.
E.g., for "smoothing": g, ng, ing, hing, thing, othing, oothing, moothing, smoothing.

Handling of Unknown Words (contd.)
Consider the last $m$ letters $l_{n-m+1}, \ldots, l_n$ of an $n$-letter unknown word (here $m = 10$). The model needs $P(l_{n-i+1}, \ldots, l_n \mid t)$, which is obtained from $P(t \mid l_{n-i+1}, \ldots, l_n)$ and $P(t)$ by Bayesian inversion.

Define $\hat P$ as the maximum likelihood estimate obtained from frequencies in the lexicon, i.e. $\hat P(t \mid l_{n-i+1}, \ldots, l_n) = f(t, l_{n-i+1}, \ldots, l_n) / f(l_{n-i+1}, \ldots, l_n)$, with initialization $P(t) = \hat P(t)$. For suffix lengths $i = m, \ldots, 0$, each estimate is smoothed with the estimate for the next shorter suffix:

$$P(t \mid l_{n-i+1}, \ldots, l_n) = \frac{\hat P(t \mid l_{n-i+1}, \ldots, l_n) + \theta_i\, P(t \mid l_{n-i+2}, \ldots, l_n)}{1 + \theta_i}$$

with weights

$$\theta_i = \frac{1}{s-1} \sum_{j=1}^{s} \left(\hat P(t_j) - \bar P\right)^2, \qquad \bar P = \frac{1}{s} \sum_{j=1}^{s} \hat P(t_j),$$

where $s$ is the number of tags in the tag set.
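A rough Python sketch of this successive smoothing over longer and longer suffixes (an illustration under my own naming; suf_tag_counts and suf_counts are assumed to be Counter objects built from the training lexicon):

import statistics

def suffix_tag_probs(word, tags, suf_tag_counts, suf_counts, p_tag, m=10):
    # p_tag[t]: unconditioned ML tag probability P_hat(t)
    # theta: the slide's (1 / (s - 1)) * sum_j (P_hat(t_j) - P_bar)^2
    theta = statistics.variance(p_tag[t] for t in tags)
    # initialization for the empty suffix: P(t) = P_hat(t)
    p = {t: p_tag[t] for t in tags}
    for i in range(1, min(m, len(word)) + 1):
        suffix = word[-i:]
        p_next = {}
        for t in tags:
            ml = (suf_tag_counts[(suffix, t)] / suf_counts[suffix]
                  if suf_counts[suffix] else 0.0)
            p_next[t] = (ml + theta * p[t]) / (1.0 + theta)
        p = p_next
    return p  # P(t | longest suffix used); Bayesian inversion then gives P(suffix | t)

This sketch builds the estimate from the shortest suffix outwards, which matches the initialization $P(t) = \hat P(t)$.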

Beam Search
A faster, approximate version of the Viterbi algorithm.
Only states scoring above a certain threshold are explored.
Does not guarantee finding the best path, but performs well in practice.
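The pruning step itself can be illustrated with a few lines of Python (a sketch of the general idea only; TnT's actual thresholding constant and data structures are not documented here):

def prune(column, beam=1000.0):
    # column: dict mapping a state (t_prev, t) to the probability of the best
    # partial path ending in that state at the current position
    best = max(column.values())
    # keep only states within a factor `beam` of the best one
    return {state: p for state, p in column.items() if p * beam >= best}

Applied at each position of the Viterbi lattice, this trades a (usually negligible) amount of accuracy for speed, consistent with the remark above.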

Evaluation Setting
Datasets:
NEGRA corpus: German newspaper corpus
Penn Treebank: the Wall Street Journal portion of the Penn Treebank corpus
Dataset splits: contiguous, and round-robin
Performance metrics:
Tagging accuracy for known and, more importantly, unknown words
Effect of the amount of training data on accuracy
Accuracy of reliable tag assignments

Handling of Unknown Words (evaluation results)

Learning with Respect to Dataset Size

Accuracy of Reliable Assignments

Evaluation by Others
Different people evaluating on different axes.

Different Languages
Does not work well for morphologically complex languages (e.g., Icelandic).
Solution: fill gaps in the lexicon using language-specific morphological analyzers [Lof07].
Worked well for German, though.
What forms of morphological complexity create trouble?

Different Domains
Works well for domain-specific POS tagging:
if trained on large domain-specific corpora [HW04], or
if trained on large generic corpora together with an additional small domain-specific corpus [CPA+05].

The Thing about Accuracy
Accuracies of over 97%... are per-token accuracies.
What about sentence-level accuracy?

Figure: Tagging accuracies on the WSJ development set [Man11]

Different POS Tagging Error Types
Figure: Frequency of different POS tagging error types [Man11]

Summary
A significant milestone in the history of part-of-speech tagging.
A good point of entry into statistical NLP.

References I
[Bra00] Thorsten Brants, TnT: A statistical part-of-speech tagger, Proceedings of the Sixth Conference on Applied Natural Language Processing (ANLC '00), Association for Computational Linguistics, Stroudsburg, PA, USA, 2000, pp. 224-231.
[CPA+05] Anni R. Coden, Serguei V. Pakhomov, Rie K. Ando, Patrick H. Duffy, and Christopher G. Chute, Domain-specific language models and lexicons for tagging, Journal of Biomedical Informatics 38 (2005), no. 6, 422-430.
[HW04] Udo Hahn and Joachim Wermter, High-performance tagging on medical texts, Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), Association for Computational Linguistics, Stroudsburg, PA, USA, 2004.

References II
[Lof07] Hrafn Loftsson, Tagging Icelandic text using a linguistic and a statistical tagger, Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume, Short Papers (NAACL-Short '07), Association for Computational Linguistics, Stroudsburg, PA, USA, 2007, pp. 105-108.
[Man11] Christopher D. Manning, Part-of-speech tagging from 97% to 100%: Is it time for some linguistics?, Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing '11), Part I, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 171-189.

Thank you!