WEST: WEIGHTED-EDGE BASED SIMILARITY MEASUREMENT TOOLS FOR WORD SEMANTICS

Similar documents
Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

The OntoNL Semantic Relatedness Measure for OWL Ontologies

Exploiting WordNet as Background Knowledge

Multi-Component Word Sense Disambiguation

Notes on Latent Semantic Analysis

Calculating Semantic Relatedness with GermaNet

Algebra 1: Spring Semester Review

Coarse to Fine Grained Sense Disambiguation in Wikipedia

Concepts & Categorization. Measurement of Similarity

A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness

Toponym Disambiguation using Ontology-based Semantic Similarity

Latent Semantic Analysis. Hongning Wang

Copyright Notice. Copyright 2014 Have Fun Teaching, LLC

Algebra Midyear Test What to Know

Algebra Midyear Test What to Know

3. (1.2.13, 19, 31) Find the given limit. If necessary, state that the limit does not exist.

MATHEMATICAL AND EXPERIMENTAL INVESTIGATION OF ONTOLOGICAL SIMILARITY MEASURES AND THEIR USE IN BIOMEDICAL DOMAINS

2 GENE FUNCTIONAL SIMILARITY. 2.1 Semantic values of GO terms

Francisco M. Couto Mário J. Silva Pedro Coutinho

Potential & Kinetic Energy Web Quest!

4-5 Scatter Plots Plots and and Trend Lines

NUL System at NTCIR RITE-VAL tasks

Grade 7 Mathematics Test Booklet

CS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya

Language Supportive Teaching and Textbooks in Tanzania. Course for textbook writers, editors and illustrators John Clegg, July 2013

John Pavlopoulos and Ion Androutsopoulos NLP Group, Department of Informatics Athens University of Economics and Business, Greece

INF 141 IR METRICS LATENT SEMANTIC ANALYSIS AND INDEXING. Crista Lopes

Grades 3-5 Graphic Organizers

7-8 Using Exponential and Logarithmic Functions

Grade 4 Dear Parents Please revise these worksheets with your children for the final exam on

Fundamentals of Similarity Search

Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015

Towards an Efficient Combination of Similarity Measures for Semantic Relation Extraction

Geospatial Semantics. Yingjie Hu. Geospatial Semantics

Neural Networks Language Models

A number line shows the order of numbers based on their value. Locate the number on the number line that represents 40 positive units from zero.

Course 1 Benchmark Test End of Year

Theoretical approach to urban ontology: a contribution from urban system analysis

WordNet Lesk Algorithm Preprocessing. WordNet. Marina Sedinkina - Folien von Desislava Zhekova - CIS, LMU.

III.6 Advanced Query Types

The Laws of Motion. Gravity and Friction

NAME: DATE: MATHS: Statistics. Maths Statistics

Memory-Augmented Attention Model for Scene Text Recognition

Annotation tasks and solutions in CLARIN-PL

Lesson title: Scavenger Hunt for Lengths. Grade level: 6-8. Subject area: Mathematics. Duration: Two class periods

Ch 1 The Language of Algebra. 1.1 Writing Expressions and Equations. Variable: Algebraic Expression: Numerical Expression: Multiplication: Division:

George J. Klir Radim Belohlavek, Martin Trnecka. State University of New York (SUNY) Binghamton, New York 13902, USA

Solution of Nonlinear Equations: Graphical and Incremental Sea

Internet Engineering Jacek Mazurkiewicz, PhD

download instant at

Michael H. Levin, M.A., N.B.C.T. Author

3. Joyce needs to gather data that can be modeled with a linear function. Which situation would give Joyce the data she needs?

LSI vs. Wordnet Ontology in Dimension Reduction for Information Retrieval

Multiplying Rational Numbers. ESSENTIAL QUESTION How do you multiply rational numbers?

Topic Models and Applications to Short Documents

Extraction of Opposite Sentiments in Classified Free Format Text Reviews

Computational Cognitive Science

27. THESE SENTENCES CERTAINLY LOOK DIFFERENT

arxiv: v2 [cs.cl] 20 Aug 2016

Math 9: Review for final

CSCI 315: Artificial Intelligence through Deep Learning

Lesson 2 Changes in State

Meaning as Sense (Frege 1892)

Lecture 6: Neural Networks for Representing Word Meaning

Physics 11 Chapter 2: Kinematics in One Dimension

1. The type of energy described by Energy C is which type of energy?

Physics: Momentum, Work, Energy, Power

Math 125 Fall 2015 Exam 1

Latent Semantic Analysis. Hongning Wang

Printable Activity book

Chapter 7. Lesson Lesson 7.1.2

NARNU FARM CAMP BOOKLET

STRAND E: STATISTICS E2 Data Presentation

LC OL - Statistics. Types of Data

UCSD CSE 21, Spring 2014 [Section B00] Mathematics for Algorithm and System Analysis

The Benefits of a Model of Annotation

HAVE GOT WAS WERE CAN. Koalatext.com TO BE GRAMMAR TO BE + JOBS

Name Class Date. Determine whether each number is a solution of the given inequality.

A Tutorial on Learning with Bayesian Networks

Learning to Query, Reason, and Answer Questions On Ambiguous Texts

Stable and Efficient Representation Learning with Nonnegativity Constraints. Tsung-Han Lin and H.T. Kung

STUDENT JOURNAL SAMPLE

Get Up & Move! Series 1 Calendars for

Knowledge Representation

Grundlagenmodul Semantik All Exercises

Jessie s Bus Ride (minutes) Robert s Bus Ride (minutes) Robert s Bus Ride (minutes) Answer choices C and D are on page 3.

Construction of thesaurus in the field of car patent Tingting Mao 1, Xueqiang Lv 1, Kehui Liu 2

Geographic Analysis of Linguistically Encoded Movement Patterns A Contextualized Perspective

Robotics Engineering DoDEA Career and Technical Education Simple and Compound Machines

Physics 30S Unit 2 Motion Graphs. Mrs. Kornelsen Teulon Collegiate Institute

Pre-Algebra Semester 2 Practice Exam DRAFT

Sharyland High School. Lightning and Inclement Weather Protocol

Final exam of ECE 457 Applied Artificial Intelligence for the Spring term 2007.

OBJ: SWBAT review on converting radicals to exponential form and learn to identify, graph, and model power functions Homework Requests: Questions

On the Concept of Enumerability Of Infinite Sets Charlie Obimbo Dept. of Computing and Information Science University of Guelph

Heat: the transfer of thermal energy from one substance to another. Electrons: the small particles inside an atom that have negative charge.

Neural Networks for Web Page Classification Based on Augmented PCA

Chapter 15 Evolution

CHAPTER 5 - DISJUNCTIONS

Text Analysis. Week 5

Transcription:

WEST: WEIGHTED-EDGE BASED SIMILARITY MEASUREMENT TOOLS FOR WORD SEMANTICS Liang Dong, Pradip K. Srimani, James Z. Wang School of Computing, Clemson University Web Intelligence 2010, September 1, 2010

Outline Word Semantic Similarity and WordNet Concept Specification vs. Categorization Weighted-Edge Based Similarity Measure Experimental Studies WEST Conclusion and Future Studies

Word Semantic Similarity (1) Word Sense Relation Synonymy lumber and timber Polysemy bank Financial institution Sloping mound

Word Semantic Similarity (2) Words Relations Hypernym/Hyponym is-a relation Fruit-apple Meronym mammal-dog has-a relation A wheel is a part of a car A leg is part of a chair

WordNet 3.0 WordNet lexical database http://wordnet.princeton.edu/ WordNet 3.0 release has 117,097 nouns, 11,488 verbs, 22,141 adjectives and 4,601 adverbs. The average noun has 1.23 senses and the average verb has 2.16 senses.

WordNet 3.0 Synset: (Synonym set) the set of near-synonyms for a WordNet sense E.g. {chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug}

Semantic Similarity of Words One approach to determining the semantic similarity of words is by measuring their graph distance within the WordNet (function of the distance between the words) Hatch-back is more similar to motorcar, than to motor vehicle What about word pairs with the same graph distance

Outline Word Semantic Similarity and WordNet Concept Specification vs. Categorization Weighted-Edge Based Similarity Measure Experimental Studies WEST Conclusion and Future Studies

Specification & Categorization (1) Our recent study discovers that humans are more sensitive to the semantic difference caused by categorization than specification. Inheritance Word-Pair Categorization Word-Pair baked-foods :: cookie bread :: cake

Specification vs. Categorization (2) Inheritance Pair Categorization Pair Inheritance Word-Pair Categorization Word-Pair baked goods :: cookie 30 bread :: cake 19 apple pie :: food 44 cake :: beef 3 beef :: food 48 meat :: chocolate 2 brownie :: cake 44 cookie :: fruitcake 5 ground beef :: meat 24 pork :: mutton 25 apple pie :: pastry 42 pie :: puff 8 stove :: device 41 comb :: fan 8 engine :: machine 18 computer :: calculator 33 hunting dog:: canine 27 wolf :: fox 22 minicab :: car 29 jeep :: sedan 21 gold :: metal 37 aluminum :: zinc 14 Total 340 157 clementine :: fruit 36 apple :: almond 15 chicken :: food 47 octopus :: pastry 0 dynamo :: machine 45 engine :: abacus 4 abbey :: building 26 hostel :: mansion 23 tabloid :: medium 8 laptop :: computer American football :: athletic game cliff diving :: sports 51 broadcasting :: journalism workstation :: chatroom 43 36 golf :: basketball 14 44 hunting :: swimming collegiate dictionary :: book 41 atlas :: bestseller 7 Total 378 115 0 6 Groups with graph distance 2 Groups with graph distance 4

Specification vs. Categorization (3) we randomly stop people in Clemson University campus and ask them to judge which pair in each comparison group is more similar semantically. 51 individuals finished the questionnaire anonymously. The survey results in Table 1 show that in 68.41% of cases, people think the inheritance pairs are more similar, and in 31.59% of cases, people think the categorization pairs are more similar. The results in Table 2 demonstrate that in 76.67% of cases, people think that the inheritance pairs are more similar, and in 23.33% vice versa.

Outline Word Semantic Similarity and WordNet Concept Specification vs. Categorization Weighted-Edge Based Similarity Measure Experimental Studies WEST Conclusion and Future Studies

Weighted Edge Method Specification Level (SpecLev): is the number of hops on the shortest path from the synset to its root, or the depth of synset in WordNet.

Weighted Edge Distance Model Using our weighted edge model, we define weighted edge distance between a sense pair, as a function of the three SpecLev values. l ( ws, ws ) l ( ws, ws ) l ( ws, ws ) 2 l ( ws, ws ) w i j w i root w j root w lca root

Specification Level Difference Specification Level Difference (SLD) : absolute difference of the SpecLev of two word senses. Increasing SLD from 0 to 2 reduces weighted edge distance slev 1 slev j slevlca 1 i slevlca slev m n lca l ( ws, ws ) ( ) w i j m 0 n 0

New Transfer Function g We define the semantic similarity between sense pair be a function of its weighted edge distance: sim( ws, ws ) g( l ) i j w We use different non-linear functions and found that the similarity values obtained by hyperbolic functions best match human judgments.

Key Features of WEST Approach Our weighted edge distance model uses the SLD count to implicitly differentiate the inheriting and categorizing relationships, matching the human perception. The new hyperbolic transfer function matches the human perception more accurately in transferring weighted edge distance into similarity value.

Outline Word Semantic Similarity and WordNet Concept Specification vs. Categorization Weighted-Edge Based Similarity Measure Experimental Studies WEST Conclusion and Future Studies

Benchmark Datasets Miller and Charles (MC) set as the comparison baseline. Since the earlier version of WordNet missed word woodland from the MC set, only 28 word pairs were used in these studies. We utilized all 65 pairs of the original Rubenstein- Goodenough set. Since MC set is a subset of RG set, we applied the 28 pairs of MC set as testing set D0, and the rest 37 pairs of words as training set D1.

Strategy 1,3,5,6,7 lgd sim ( ws, ws ) g ( l ) e 1 i j 1 gd w sim ( ws, ws ) g ( l ) e l 3 i j 1 w 2 sim ( ws, ws ) g ( l ) sec h( x) e e 5 i j 5 w x x sim ( ws, ws ) g ( l ) 6 i j 6 w x x e e, x 0 x x tanh cx ( ) ( e e ) x 1, x 0 sim ( ws, ws ) g ( l ) g ( l ) 7 i j 5 w 6 w

Strategy 2 sim ( ws, ws ) g ( l ) g ( slev ) e 2 i j 1 gd 2 lca l gd e e slev slev lca slev slev lca e e lca lca

Strategy 4 sim ( ws, ws ) g ( l ) g ( slev ) e 4 i j 1 w 2 lca l w e e slev slev lca slev slev lca e e lca lca

Li s method with WordNet versions The correlations between the computed similarities and human judgments using Li s method under different WordNet versions. Li, 2003 Varelas, 2005 Us, 2009 WordNet 1.6 2.0/2.1 3.0 Correlation 0.8914 0.82 0.8078

The Impact of WordNet Evolution

Comparison with IC-approaches Method Type G.Varelas, 2005 Us, 2009 Resnik IC 0.79 0.8124 Lin Normalized IC 0.82 0.7517 Jiang Hybrid 0.83 0.6900 Li Graph Dist & IC 0.82 0.8078 WEST Weighted Edge - 0.8350

Outline Word Semantic Similarity and WordNet Concept Specification vs. Categorization Weighted-Edge Based Similarity Measure Experimental Studies WEST Conclusion and Future Studies

WEST Architecture LCA Selection: For each sense-pair of a word-pair, we use PathFinder from WordNet::Similarity module to retrieve its LCA SpecLev Retrieval: retrieve the SpecLevs of the three target senses. Weighted Edge Distance: We optimize the calculation by pre-calculating the Weighted Edge Distance from all SpecLev to its root (SpecLev 0) and store them into a 2-dimentional array for every Weighted Decreasing Rate. Web Service: SOAP::Lite Module is used to wrap the Weighted Edge interface into SOAP web service.

WEST Website http://bioinformatics.clemson.edu/west/

WEST Website (2)

Outline Word Semantic Similarity and WordNet Concept Specification vs. Categorization Weighted-Edge Based Similarity Measure Experimental Studies WEST Conclusion and Future Studies

Conclusion Humans are more sensitive to the semantic difference caused by categorization than specification. A novel weighted edge approach embedding the specification levels of both words and their Least Common Ancestor (LCA) into a weighted graph distance by exponentially decreasing the weight along its specification level. Experimental studies show that the similarity value obtained by this new method closely matches human perception. Based on this new method, online tools for word similarity measure are implemented as Web services.

Future Studies Measure the semantic similarity of sentences and that of text, based on WEST. Adapt the WEST method to measure the semantic similarity of terms in other ontologies. Combination of WEST and other approaches.

Thank you!