Prenominal Modifier Ordering via Multiple Sequence Alignment


Prenominal Modifier Ordering via Multiple Sequence Alignment
Aaron Dunlop (1), Margaret Mitchell (2), Brian Roark (1)
(1) Oregon Health & Science University, Portland, OR
(2) University of Aberdeen, Aberdeen, Scotland, U.K.
NAACL, June 3, 2010

Outline
1. Introduction: Noun Phrase Ordering; Multiple Sequence Alignment (MSA)
2. MSA Training: Biological MSA; Linguistic MSA
3. Results
4. Conclusion

Noun-phrase Ordering
A natural language generation task, with applications in summarization and machine translation. We want to generate natural-sounding text: "big clumsy brown bear" vs. ?? "brown clumsy big bear".

Previous Work
Shaw and Hatzivassiloglou (1999): Medical, adjectives: 94.9%; Medical, with noun modifiers: 90.7%; WSJ, adjectives: 80.8%; WSJ, with noun modifiers: 71.0%
Malouf (2000): BNC, adjectives: 91.9%
Mitchell (2009): Multi-genre, with noun modifiers: 77.1%
Nouns as modifiers: "executive vice president", "state teacher cadet program"

Multiple Sequence Alignment (MSA): DNA
An alignment of five DNA sequences, with gaps inserted so that corresponding bases line up in columns:

G A C T C - A T
- A G T G T A T
- C G T - T A T
- A G T G T A T
- A C T - T - T

A new sequence (G C C T - - A T) can then be aligned against the existing MSA.

Bases: A = Adenine, C = Cytosine, G = Guanine, T = Thymine, - = gap

Multiple Sequence Alignment (MSA): Noun Phrases
The same idea applied to prenominal modifiers, one NP per row:

small      clumsy  black  bear
big        -       black  cow
two-story  -       brown  house
big        clumsy  -      bull
valuable   14k     gold   watch

To order an unseen modifier set (e.g. "big clumsy brown bear"), align each permutation of the test sequence (n! of them) against the MSA and choose the highest-scoring alignment, as sketched below.
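
A minimal sketch of the permutation-and-score step, assuming a toy hand-filled column-scoring table; the names (TOY_PSSM, score_ordering, best_ordering) are illustrative, and gap insertion and the full alignment search are omitted:

```python
from itertools import permutations

# Toy position-specific scores: for each column, a map word -> score.
# In the real system these come from the trained PSSM over linguistic
# features; the values below are made up for illustration.
TOY_PSSM = [
    {"big": 3, "small": 3, "valuable": 2, "two-story": 2},  # column 1
    {"clumsy": 3, "14k": 1},                                 # column 2
    {"black": 3, "brown": 3, "gold": 2},                     # column 3
]

def score_ordering(modifiers, pssm):
    """Score one left-to-right ordering of prenominal modifiers;
    unseen (word, column) pairs get a small penalty."""
    return sum(pssm[i].get(w, -1) for i, w in enumerate(modifiers))

def best_ordering(modifiers, pssm):
    """Try every permutation (n! of them) and keep the highest-scoring one."""
    return max(permutations(modifiers), key=lambda p: score_ordering(p, pssm))

print(best_ordering(["brown", "clumsy", "big"], TOY_PSSM))
# -> ('big', 'clumsy', 'brown')
```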

MSA Training: Biological MSA
Begin with a substitution matrix: a table of substitution costs between each pair of symbols in {A, C, G, T, -}.
Calculate a distance matrix between all pairs of training sequences, e.g.:

     s1  s2  s3  s4
s1    0
s2    3   0
s3    3   4   0
s4    5   2   4   0

Align the 2 closest sequences:

C G T - T A
A G T G T A

Repeatedly align and incorporate the closest sequence not already in the MSA:

- C G T - T A
- A G T G T A
- A C T - T -
G A C T C - A

Induce a Position Specific Score Matrix (PSSM): per-column counts (or costs) for each base and the gap symbol.
Align unseen sequences (e.g. G C C T - - A) against the PSSM with a Viterbi search.
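
A minimal sketch of the PSSM-induction step over aligned rows like those above, using smoothed per-column relative frequencies (the slides do not specify the exact scoring scheme, so the add-one smoothing and the names here are assumptions):

```python
from collections import Counter

# Rows of a small MSA, including gap symbols, as on the slides.
ALIGNED = [
    "-CGT-TA",
    "-AGTGTA",
    "-ACT-T-",
    "GACTC-A",
]
ALPHABET = "ACGT-"

def induce_pssm(aligned, alphabet=ALPHABET, smoothing=1):
    """One dict per column, mapping each symbol to a smoothed relative frequency."""
    pssm = []
    for col in range(len(aligned[0])):
        counts = Counter(row[col] for row in aligned)
        total = len(aligned) + smoothing * len(alphabet)
        pssm.append({sym: (counts[sym] + smoothing) / total for sym in alphabet})
    return pssm

pssm = induce_pssm(ALIGNED)
print(pssm[0]["-"])  # probability of a gap in the first column
```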

MSA Training: Linguistic MSA
What is the distance between "ambling black bear" and "big hungry grizzly bear"? What is the cost of substituting "executive" for "two-story"? Or of a gap in another sequence? We don't want to assume that knowledge a priori, so we look for linguistic features that might influence the probability of orderings like "ambling big" or "executive two-story".

Feature Set (Linguistic MSA)

Identity features:
- Word
- Stem, derived by the Porter stemmer
- Binned length indicator (word length in letters): 1, 2, 3, 4, 5-6, 7-8, 9-12, 13-18, >18

Indicator features:
- Word begins with a capital letter
- Entire word is capitalized
- Hyphenated
- Numeric (e.g. 1234)
- Begins with a numeral (e.g. 2-sided)
- Ends with -al, -ble, -ed, -er, -est, -ic, -ing, -ive, -ly
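
A rough sketch of extracting these feature classes for a single word (Porter stemming is stubbed out to keep the example dependency-free, so the stem defaults to the word itself; the function and key names are illustrative):

```python
import re

SUFFIXES = ("al", "ble", "ed", "er", "est", "ic", "ing", "ive", "ly")
LENGTH_BINS = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 6), (7, 8), (9, 12), (13, 18)]

def length_bin(word):
    n = len(word)
    for lo, hi in LENGTH_BINS:
        if lo <= n <= hi:
            return f"len_{lo}_{hi}"
    return "len_gt18"

def extract_features(word, stem=None):
    """Identity and indicator features for one prenominal modifier."""
    feats = {
        "word": word.lower(),
        "stem": (stem or word).lower(),   # a Porter stemmer would supply `stem`
        "length": length_bin(word),
        "initial_cap": word[:1].isupper(),
        "all_caps": word.isalpha() and word.isupper(),
        "hyphenated": "-" in word,
        "numeric": bool(re.fullmatch(r"\d+", word)),
        "initial_numeral": bool(re.match(r"\d", word)),
    }
    for suf in SUFFIXES:
        feats["ends_" + suf] = word.lower().endswith(suf)
    return feats

print(extract_features("two-story"))
```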

Maximum Likelihood (Generative Model)
Treat features as classes: words, stems, lengths, and each indicator feature in its own class. Make the (clearly false) assumption that the feature classes are independent, similar to the independence assumption in Naïve Bayes.
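
Under this assumption, the score of a word in an alignment column is just the product of its per-class probabilities (computed in log space below). A hedged sketch with made-up probabilities and illustrative names:

```python
import math

def column_log_prob(word_feats, column_model, unseen=1e-3):
    """Log-probability of a word in one column under the feature-class
    independence assumption; unseen values back off to a small constant."""
    logp = 0.0
    for cls, value in word_feats.items():
        logp += math.log(column_model.get(cls, {}).get(value, unseen))
    return logp

# Toy column model with two feature classes (made-up probabilities).
toy_column = {
    "word": {"big": 0.4, "small": 0.3},
    "hyphenated": {True: 0.1, False: 0.9},
}
print(column_log_prob({"word": "big", "hyphenated": False}, toy_column))
```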

ML Training
Incorporate sequences in order of occurrence. Re-induce a PSSM after each sequence is incorporated. Iterate, re-incorporating sequences into the MSA.

Worked example: a vocabulary of 9 words, plus the Hyphenated and Ends-with -ble feature classes, over the training sequences

small      clumsy  black
big        -       black
two-story  -       brown
big        clumsy  -
valuable   14k     gold

As each sequence is incorporated, the first column's counts, relative frequencies, and smoothed probabilities are updated for every feature class (word identity, Hyphenated vs. not hyphenated, -ble vs. not -ble).
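
The update that the example walks through can be sketched as incremental per-column counting with smoothing; the add-one scheme below is an assumption (the slides do not give the exact smoothing formula), and the class and method names are illustrative:

```python
from collections import defaultdict

class ColumnModel:
    """Per-column feature counts, re-estimated as each sequence is incorporated."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # class -> value -> count
        self.total = 0

    def incorporate(self, word_feats):
        for cls, value in word_feats.items():
            self.counts[cls][value] += 1
        self.total += 1

    def smoothed_prob(self, cls, value):
        n_values = len(self.counts[cls]) + 1   # reserve mass for unseen values
        return (self.counts[cls][value] + 1) / (self.total + n_values)

col1 = ColumnModel()
for word in ["small", "big", "two-story", "big", "valuable"]:
    col1.incorporate({"word": word, "hyphenated": "-" in word})

print(col1.smoothed_prob("word", "big"))        # 2 of the 5 column-1 words
print(col1.smoothed_prob("hyphenated", True))   # only "two-story" is hyphenated
```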

Discriminative Model: Averaged Perceptron
Uses the same features as the generative model, but does not require the independence assumption. With each training sequence: align each permutation of the sequence and compute its alignment cost; if the correct ordering does not score highest, perform a perceptron update using the correct ordering and the highest-scoring incorrect ordering.
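
A minimal structured-perceptron sketch of that update rule, with simple (column, word) indicator features standing in for the full feature set and the weight-averaging step omitted for brevity (all names are illustrative):

```python
from itertools import permutations
from collections import defaultdict

def phi(ordering):
    """Indicator features for an ordering: one count per (column, word) pair.
    The real system uses the richer feature classes listed earlier."""
    feats = defaultdict(float)
    for i, w in enumerate(ordering):
        feats[(i, w)] += 1.0
    return feats

def score(weights, feats):
    return sum(weights[f] * v for f, v in feats.items())

def perceptron_epoch(sequences, weights, lr=1.0):
    """If the correct ordering does not strictly outscore every other
    permutation, update toward it and away from the best wrong permutation."""
    for gold in sequences:
        wrong = [p for p in permutations(gold) if p != gold]
        best_wrong = max(wrong, key=lambda p: score(weights, phi(p)))
        if score(weights, phi(best_wrong)) >= score(weights, phi(gold)):
            for f, v in phi(gold).items():
                weights[f] += lr * v
            for f, v in phi(best_wrong).items():
                weights[f] -= lr * v
    return weights

weights = defaultdict(float)
train = [("valuable", "14k", "gold"), ("big", "clumsy", "black")]
for _ in range(3):
    weights = perceptron_epoch(train, weights)

print(max(permutations(("gold", "14k", "valuable")),
          key=lambda p: score(weights, phi(p))))
# -> ('valuable', '14k', 'gold')
```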

Discriminative Training Example
Per-column alignment costs are computed for the correct ordering "valuable 14k gold" and for an incorrect ordering such as "gold 14k valuable". When the incorrect ordering's total cost beats the correct one, the weights of the responsible features (word identity, -ble vs. not -ble, etc.) are updated.

Corpus
From Mitchell (2009), including 10-fold splits. Composition: a combination of the Penn Treebank, the Brown Corpus, and Switchboard; all corpora have hand-annotated trees. Extracted NPs include nouns and adjectives.
- 74% Penn Treebank (financial text)
- 13% Brown (literary text)
- 13% Switchboard (conversational)

Evaluation
Token accuracy: rewards correct prediction of common sequences, but penalizes sets of modifiers which occur in multiple orders.
Precision / recall: does not require predictions for all modifier sets, and is applicable to types as well as tokens.
Worked example: occurrence and prediction counts for "brown two-story" vs. "two-story brown" and "fuzzy brown" vs. "brown fuzzy", showing how the two measures differ when a modifier set occurs in more than one order.
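
A hedged sketch of the two measures over a list of gold orderings, where `predict` maps a modifier set to an ordering or abstains with None; treating the majority gold order as correct at the type level is a simplification for illustration, not necessarily the paper's exact definition:

```python
from collections import Counter

def token_accuracy(gold_orderings, predict):
    """Fraction of NP tokens whose ordering is predicted exactly; sets that
    occur in multiple orders necessarily cost some tokens."""
    correct = sum(1 for g in gold_orderings if predict(frozenset(g)) == g)
    return correct / len(gold_orderings)

def type_precision_recall(gold_orderings, predict):
    """Type-level precision/recall: each distinct modifier set counts once,
    and the model may abstain by returning None."""
    by_set = {}
    for g in gold_orderings:
        by_set.setdefault(frozenset(g), Counter())[g] += 1
    tp = n_predicted = 0
    for mods, counts in by_set.items():
        pred = predict(mods)
        if pred is None:
            continue
        n_predicted += 1
        if pred == counts.most_common(1)[0][0]:
            tp += 1
    precision = tp / n_predicted if n_predicted else 0.0
    recall = tp / len(by_set)
    return precision, recall

gold = [("brown", "two-story"), ("brown", "two-story"), ("two-story", "brown")]
predict = lambda mods: ("brown", "two-story") if "brown" in mods else None
print(token_accuracy(gold, predict))         # 2/3
print(type_precision_recall(gold, predict))  # (1.0, 1.0)
```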

Pairwise Ordering Results
Token accuracy and type-based precision and recall:

               Accuracy  Precision  Recall  F1
Mitchell 2009  N/A       90.3%      67.2%   77.1%
ML             85.5%     84.6%      84.7%   84.7%
Perceptron     88.9%     88.2%      88.1%   88.2%

Best previously reported results: 71.0% (WSJ with noun modifiers), 91.9% (BNC adjectives).

Full Noun Phrase Results
Token accuracy and token-based precision and recall:

               Accuracy  Precision  Recall  F1
Mitchell 2009  N/A       94.4%      78.6%   85.7%
ML             76.9%     76.5%      76.5%   76.5%
Perceptron     86.7%     86.7%      86.7%   86.7%

Cross-domain Generalization
Type-based precision and recall:

Training / Testing:  Brown+WSJ / Swbd   Swbd+WSJ / Brown   Swbd+Brown / WSJ
Mitchell 2009        72.0%              64.5%              40.9%
ML                   75.0%              74.8%              71.7%
Perceptron           77.9%              76.5%              77.4%


Summary
Applied MSA techniques to NP ordering. Introduced 2 novel methods of MSA training which require neither gold-standard alignments nor a hand-tuned substitution matrix. Accuracy is competitive with or superior to the best previously reported results.

Future Work
Train on a larger, automatically parsed corpus. Explore other learning methods. Add features: richer morphological features; semantic class information derived from WordNet, OntoNotes, etc.

Questions?

Full NP Accuracies by Modifier Count

Modifiers  Frequency  Token Accuracy  Pairwise Accuracy
2          89.1%      89.7%           89.7%
3          10.0%      64.5%           84.4%
4          0.9%       37.2%           80.7%

Ablation Tests
Feature(s) and associated gain/loss:
Word 0.0; Stem 0.0; Capitalization -0.1; All-Caps 0.0; Numeric -0.2; Initial-numeral 0.0; Length -0.1; Hyphen 0.0
-al 0.0; -ble -0.4; -ed -0.4; -er 0.0; -est -0.1; -ic +0.1; -ing 0.0; -ive -0.1; -ly 0.0
Word and stem: -22.9; Word, stem, and endings: -24.2

Example Sequences
few quaint old characters; instrument-jammed bomber cockpits; American nuclear strike; Italian state-owned holding company; executive vice president; monthly mortgage payments; great Japanese investment machine