Determining Word Sense Dominance Using a Thesaurus

Similar documents
Word Sense Disambiguation (WSD) Based on Foundations of Statistical NLP by C. Manning & H. Schütze, ch. 7 MIT Press, 2002

Recap: HMM. ANLP Lecture 9: Algorithms for HMMs. More general notation. Recap: HMM. Elements of HMM: Sharon Goldwater 4 Oct 2018.

Statistical NLP Spring 2010

Learning Features from Co-occurrences: A Theoretical Analysis

Toponym Disambiguation using Ontology-based Semantic Similarity

Topic Models and Applications to Short Documents

Categorization ANLP Lecture 10 Text Categorization with Naive Bayes

ANLP Lecture 10 Text Categorization with Naive Bayes

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Semantic Similarity from Corpora - Latent Semantic Analysis

HMM Expanded to Multiple Interleaved Chains as a Model for Word Sense Disambiguation

Bowl Maximum Entropy #4 By Ejay Weiss. Maxent Models: Maximum Entropy Foundations. By Yanju Chen. A Basic Comprehension with Derivations

Probabilistic Context-free Grammars

Lecture 12: Algorithms for HMMs

Spatial Role Labeling CS365 Course Project

Cross-Lingual Language Modeling for Automatic Speech Recogntion

Phrase-Based Statistical Machine Translation with Pivot Languages

Multi-Layer Boosting for Pattern Recognition

Classification & Information Theory Lecture #8

Chapter 2 The Naïve Bayes Model in the Context of Word Sense Disambiguation

Lecture 12: Algorithms for HMMs

Word Alignment by Thresholded Two-Dimensional Normalization

A Tutorial on Learning with Bayesian Networks

Classification by Selecting Plausible Formal Concepts in a Concept Lattice

The Noisy Channel Model and Markov Models

Improving Topic Models with Latent Feature Word Representations

Naïve Bayes Classifiers

6.036 midterm review. Wednesday, March 18, 15

DISTRIBUTIONAL SEMANTICS

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Maxent Models and Discriminative Estimation

Introduction to Logistic Regression and Support Vector Machine

Quasi-Synchronous Phrase Dependency Grammars for Machine Translation. lti

Low-Dimensional Discriminative Reranking. Jagadeesh Jagarlamudi and Hal Daume III University of Maryland, College Park

TEXT SIMILARITY MEASUREMENT WITH SEMANTIC ANALYSIS. Xu Li, Na Liu, Chunlong Yao and Fenglong Fan. Received February 2017; revised June 2017

Topics. Bayesian Learning. What is Bayesian Learning? Objectives for Bayesian Learning

Behavioral Data Mining. Lecture 2

The OntoNL Semantic Relatedness Measure for OWL Ontologies

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

Semantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing

Prenominal Modifier Ordering via MSA. Alignment

MIA - Master on Artificial Intelligence

An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems

Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015

Classification, Linear Models, Naïve Bayes

Latent Semantic Analysis. Hongning Wang

Expectation Maximization, and Learning from Partly Unobserved Data (part 2)

Generative Clustering, Topic Modeling, & Bayesian Inference

Latent Semantic Analysis. Hongning Wang

Hidden Markov Models

On the Problem of Error Propagation in Classifier Chains for Multi-Label Classification

A Practical Algorithm for Topic Modeling with Provable Guarantees

N-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Hidden Markov Models (HMMs)

CS4705. Probability Review and Naïve Bayes. Slides from Dragomir Radev

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

Laconic: Label Consistency for Image Categorization

Linear Models for Classification

B u i l d i n g a n d E x p l o r i n g

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

Maschinelle Sprachverarbeitung

RETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

Maschinelle Sprachverarbeitung

SYNTHER A NEW M-GRAM POS TAGGER

Classifying Documents with Poisson Mixtures

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling

Machine Learning for natural language processing

Structured Output Prediction: Generative Models

Statistical Models of the Annotation Process

Language as a Stochastic Process

arxiv: v1 [cs.lg] 5 Jul 2010

CS 188: Artificial Intelligence. Outline

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

ICDM Pisa, December 16, Motivation

Bootstrapping. 1 Overview. 2 Problem Setting and Notation. Steven Abney AT&T Laboratories Research 180 Park Avenue Florham Park, NJ, USA, 07932

Semantic Term Matching in Axiomatic Approaches to Information Retrieval

Internet Engineering Jacek Mazurkiewicz, PhD

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Analysis and visualization of protein-protein interactions. Olga Vitek Assistant Professor Statistics and Computer Science

Hidden Markov Models

IBM Model 1 for Machine Translation

Latent Variable Models in NLP

Joint Emotion Analysis via Multi-task Gaussian Processes

Semantic Similarity and Relatedness

Multiword Expression Identification with Tree Substitution Grammars

Midterm sample questions

Homework 1 Solutions Probability, Maximum Likelihood Estimation (MLE), Bayes Rule, knn

CS 188: Artificial Intelligence Spring Announcements

Word2Vec Embedding. Embedding. Word Embedding 1.1 BEDORE. Word Embedding. 1.2 Embedding. Word Embedding. Embedding.

10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification

CS540 ANSWER SHEET

Construction of thesaurus in the field of car patent Tingting Mao 1, Xueqiang Lv 1, Kehui Liu 2

Testing the Efficacy of Part-of-Speech Information in Word Completion

Lecture 5 Neural models for NLP

Parts 3-6 are EXAMPLES for cse634

Global Machine Learning for Spatial Ontology Population

Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model

Ensemble Methods for Machine Learning

Deep Learning Sequence to Sequence models: Attention Models. 17 March 2018

Transcription:

Determining Word Sense Dominance Using a Thesaurus Saif Mohammad and Graeme Hirst Department of Computer Science University of Toronto EACL, Trento, Italy (5th April, 2006) Copyright cfl2006, Saif Mohammad and Graeme Hirst Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 1

Word Sense Dominance star (CELESTIAL BODY) star (CELEBRITY) The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text. Applications: Ξ Sense disambiguation, document clustering,... Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 2

McCarthy et al. s Method SIMILARLY SENSE DISTRIBUTED TARGET TEXT D = dominance method Requires WordNet. WORDNET D DOMINANCE VALUES LIN S THESAURUS A U X I L I A R Y C O R P U S Needs auxiliary text with similar sense distribution. Requires retraining (Lin s thesaurus). Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 3

Our Method TARGET TEXT PUBLISHED THESAURUS D DOMINANCE VALUES WCCM WCCM = word category co-occurrence matrix A U X I L I A R Y C O R P U S We use a published thesaurus. Auxiliary text need not have similar sense distribution. No retraining is needed (WCCM created just once). Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 4

Published Thesauri E.g., Roget s (English), Macquarie (English), Cilin (Chinese), Bunrui Goi Hyou (Japanese) Vocabulary divided into about 1000 categories Words in a category (category terms or c-terms) are Ξ closely related. A category very roughly corresponds to a sense Ξ (Yarowsky, 1992). One word, more than one category Ξ bark in ANIMAL NOISES and MEMBRANE. Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 5

Why a Thesaurus? Coarse senses: WordNet is much too fine grained. Computational ease: With just 1000 categories, the word category co-occurrence matrix is of manageable size. Availability: Thesauri are available in many languages. Words for a sense: Each sense can be represented unambiguously with a set of (possibly ambiguous) words. Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 6

Word Category Matrix c 1 c 2 ::: c j ::: w 1 m 11 m 12 ::: m 1 j ::: 2 ::: m 2 j :::.. ::: :::.... w m 21 m 22 i ::: :::........ w m i1 m i2 m ij WCCM: categories (thesaurus) vs. words (vocabulary) Cell m ij : number of times word w i co-occurs with a c-term listed in category c j Text: most of the BNC Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 7

Example CELESTIAL BODY space star CELEBRITY cell (space, CELESTIAL BODY) incremented by 1 cell (space, CELEBRITY) incremented by 1 Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 8

Example (continued) CELESTIAL BODY space X... X: star, nova, constellation, sun, empty, web Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 9

Word Category Matrix CELESTIAL c 1 c 2 ::: ::: BODY w 1 m 11 m 12 ::: m 1 j ::: w 2 m 21 m 22 ::: m 2 j :::..... :::. ::: space m i1 m i2 "" ::: m :::........ Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 10

Contingency Table for w and c w: :: c :c w n wc n :w n :c n Applying a statistic gives the strength of association (SoA) cosine Dice odds ratio pointwise mutual information Yule s coefficient of colligation Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 11

Evidence for the Senses SoA CELESTIAL BODY space SoA star CELEBRITY Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 12

Base WCCM Matrix created after the first pass of unannotated text noisy Ξ Ξ captures strong associations Words that occur close to a target word Ξ Good indicators of intended sense Ξ Co-occurrence frequency used as evidence Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 13

Bootstrapping the WCCM Second pass of the auxiliary corpus Word sense disambiguation: using co-occurring words Ξ and evidence from base WCCM New, more accurate, WCCM Cell m Ξ ij : number of times word used in sense c j co-occurs with w i Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 14

Four Methods Implicit sense disambiguation Explicit sense disambiguation Weighted voting D I,W D E,W Unweighted voting D I,U D E,U The stronger the association of a sense with its co-occurring words, the higher is its dominance. Weighted vote (SoA) to each sense or unweighted vote to sense with the highest SoA Explicit word sense disambiguation or not Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 15

Method: D I;W Each word that co-occurs with the target word t gives a weighted vote (SoA) to each sense. Dominance of a sense c is the proportion of votes it gets. D I;W(t ;c) = w2t SoA(w;c) 2senses(t) c0 w2t SoA(w;c ) 0 T is the set of all words that co-occur with t. Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 16

Method: D I;U Each word that co-occurs with the target word gives an unweighted, equal vote to a winner sense. Sense with highest strength of association with cooccurring Ξ words Dominance of a sense is the proportion of votes it gets. D jfw T : argmax c ) 2senses(t) = cgj I;U(t 0 2 SoA(w;c0 = ;c) j jt Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 17

Methods: D E ;W and D E ;U Explicit sense disambiguation Votes from co-occurring words Ξ Votes can be weighted or unweighted Ξ Dominance of a sense Ξ Proportion of occurrences pertaining to that sense Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 18

Experimental Setup Naïve sense disambiguation system Ξ Gives predominant sense as output Test datasets Different sense distributions of the two most dominant Ξ senses of each target word Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 19

Sense-tagged Data We created pseudo-thesaurus-sense-tagged data for the 27 head words in SENSEVAL-1 English Sample Space using the held out subset of BNC. Non-monosemous target word: brilliant Category: INTELLIGENCE Monosemous c-term: clever Sentence from auxiliary text: Hermione had a clever plan. Sense annotated sentence: Hermione had a brilliant//intelligence plan. Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 20

Best Results Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 21

Best Results Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Upper bound 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 22

Best Results Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Upper bound Random baseline 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 23

Best Results Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 Upper bound Random baseline Lower bound 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 24

Best Results Accuracy 1 0.9 0.8 D E,W Mean distance below upper bound 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 25

Best Results Accuracy 1 0.9 0.8 D E,W Mean distance below upper bound 0.7 0.6 0.5 D E ;W :.02 pmi, odds, Yule 0.4 0.3 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 26

Best Results Accuracy 1 0.9 0.8 D E,W D I,W Mean distance below upper bound 0.7 0.6 0.5 D E ;W :.02 pmi, odds, Yule D I;W :.03 0.4 0.3 pmi 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 27

Best Results Accuracy 1 0.9 0.8 0.7 0.6 0.5 D E,W D I,W D I,U D E,U Mean distance below upper bound D E ;W :.02 pmi, odds, Yule D I;W :.03 0.4 0.3 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) pmi D I;U:.11 phi, pmi, odds, Yule D E ;U:.16 phi, pmi, odds, Yule Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 28

Observations Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 D E,W D I,W D I,U D E,U Weighted methods are better Explicit or implicit disambiguation does not matter 0.3 0.2 0.1 Odds, pmi, and Yule are better 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 29

Effect of Bootstrapping Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 DE,W (odds) bootstrapped 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 30

Effect of Bootstrapping Accuracy 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 DE,W (odds) bootstrapped D E,W (odds) 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 31

Effect of Bootstrapping Accuracy 1 0.9 0.8 DE,W (odds) bootstrapped D E,W (odds) Most of the gain given by the first iteration. 0.7 0.6 0.5 0.4 0.3 Relative behavior of measures more or less the same. 0.2 0.1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Sense distribution (alpha) Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 32

In Summary New methods for determining sense dominance Raw text and a published thesaurus Ξ No similarly-sense-distributed text or re-training Ξ Extensive experiments Ξ Synthetically created thesaurus-sense-tagged data Results are close to the upper bound Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 33

Future Work WCCM has applications beyond sense dominance. Linguistic distances Distributional distance of concepts Word sense disambiguation Unsupervised naïve Bayes classifier Machine translation Domain-specific translational dominance Document clustering Represent document in concept space Determining Word Sense Dominance using a Thesaurus. Saif Mohammad and Graeme Hirst. 34