Exploiting WordNet as Background Knowledge


Chantal Reynaud, Brigitte Safar
LRI, Université Paris-Sud, Bât. G, INRIA Futurs, Parc Club Orsay-Université, 2-4 rue Jacques Monod, F-91893 Orsay, France
{chantal.reynaud, brigitte.safar}@lri.fr

Abstract. Many alignment systems that provide mappings between the concepts of two ontologies rely on an additional source, called background knowledge, most often represented by a third ontology. The objective is to complement other current matching techniques. In this paper, we present the difficulties encountered when using WordNet as background knowledge and we show how the TaxoMap system we implemented can avoid those difficulties.

1 Introduction

In order to identify mappings between the concepts of one ontology, called the source ontology (O_Src), and the concepts of another one, called the target ontology (O_Tar), many recent works use additional descriptions, called background knowledge, represented by a third ontology O_BK [1,2,9,4,7,8,10]. The common objective is to complement current matching techniques, which may fail in some cases. Some works, such as [1,2,10], assume that ontology alignment can rely on a unique and predefined ontology that covers a priori all the concepts of the ontologies to be matched. Conversely, other works [9] suppose that no suitable ontology exists a priori. Hence, their idea is to dynamically select ontologies available online.

In this paper, we present the difficulties encountered when using WordNet as background knowledge, in particular the misinterpretation problem caused by the different senses of a term, and how the TaxoMap system we implemented avoids these difficulties. The solution that we propose aims at limiting the meanings of the terms involved in a match. Experimental results are given, showing the increase in precision obtained by limiting the senses of the terms.
2 Use of WordNet

WordNet is an online lexical resource for the English language that groups synonymous terms into synsets, each expressing a distinct concept. The term associated with a concept is represented in a lexicalized form without any mark of gender or plural. Synsets are related to each other by terminological relations such as hypernym relations. WordNet can be used for ontology matching in several ways. A first way is to extend the label of a concept with the WordNet synonyms belonging to the synset of each term contained in the label [3]. Another way is to exploit WordNet restricted to a concept hierarchy composed only of hypernym relations. Given two nodes in this hierarchy, an equivalence relation can be inferred if their distance is lower than a given threshold [4]. Other works compute similarity measures [5,6,8].

This last approach leads to relevant results when the application domains of the ontologies to be mapped are very close and targeted. Conversely, results are much less satisfactory when the application domains are larger. Indeed, a term can belong to several synsets, which leads to misinterpretations and false positive mappings. We illustrate this problem with results from experiments performed with TaxoMap [8] on the taxonomies Russia-A (O_Tar) and Russia-B (O_Src) loaded from the Ontology Matching site [11]. Both taxonomies describe Russia from different viewpoints, but Russia-B contains extra information on means of transport. Fig. 1 represents the WordNet sub-graph that is exploited when searching for the terms of O_Tar (grey circles in Fig. 1) that are the most similar to the terms in O_Src (white circles in Fig. 1) that denote vehicles. As no term in O_Tar corresponds to a means of transport, all terms in O_Src that refer to vehicles will be related to Berlin, a term belonging to three synsets corresponding respectively to a city in Germany, a musician and a kind of car.

[Figure 1. WordNet sub-graph. Node labels: physical_object, location, region, district, city, national_capital, berlin, unit, artifact, building_material, transport, vehicle, wheeled_vehicle, car, limousine, bus, craft, aircraft, plane, watercraft, ship, steamer, battleship, boat, cruiser. Legend: X_Tar (O_Tar), X_Src (O_Src).]

To avoid this problem, the TaxoMap system relies on a limitation of the senses of the terms in WordNet. It performs a two-step process. First, a sub-tree is extracted from WordNet, which corresponds only to the senses assumed to be relevant to the domain of the involved ontologies. Second, mappings are identified in this sub-tree.

2.1 Extraction of a sub-tree relevant to the domain from WordNet

The extraction of a sub-tree starts with a manual phase. If the application domains of the ontologies to be mapped are close and targeted, an expert has to identify the concept, denoted root_A, that is the most specialized concept in WordNet which generalizes all the concepts of both ontologies. If the target ontology relates to several distinct application domains, the expert has to identify several root nodes in order to cover all the topics. Then, extracting the relevant sub-tree requires searching for relations between root_A and all the concepts in O_Tar and in O_Src that are not yet mapped. Hypernyms of each concept are looked for in the WordNet hierarchy until root_A, or one of the WordNet roots, is reached. For example, a search on cantaloupe results in the following two derivation paths:

Path 1: cantaloupe → sweet melon → melon → gourd → plant → organism → Living
Path 2: cantaloupe → sweet melon → melon → edible fruit → green goods → food
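The hypernym search and the restriction to root_A can be illustrated with a minimal sketch. It assumes a small hand-made hypernym table copied from the two paths quoted above (plus the watermelon and asparagus paths of the example) rather than the real WordNet graph, and the names HYPERNYMS, derivation_paths, paths_to_root and build_subgraph are hypothetical, chosen here for illustration only. It is not the TaxoMap implementation.

```python
from typing import Dict, List, Set, Tuple

# Toy hypernym links (child -> parents), hand-copied from the example paths.
# Assumption: this is NOT the real WordNet graph, only an illustrative fragment.
HYPERNYMS: Dict[str, List[str]] = {
    "cantaloupe": ["sweet melon"],
    "sweet melon": ["melon"],
    "watermelon": ["melon"],
    "melon": ["gourd", "edible fruit"],  # two senses, hence two paths
    "gourd": ["plant"],
    "plant": ["organism"],
    "organism": ["Living"],
    "edible fruit": ["green goods"],
    "asparagus": ["vegetable"],
    "vegetable": ["green goods"],
    "green goods": ["food"],
}


def derivation_paths(term: str, hypernyms: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every hypernym path from `term` up to a node without hypernyms."""
    parents = hypernyms.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + rest
            for parent in parents
            for rest in derivation_paths(parent, hypernyms)]


def paths_to_root(term: str, root: str,
                  hypernyms: Dict[str, List[str]]) -> List[List[str]]:
    """Keep only the paths that pass through `root`, truncated at `root`."""
    kept = []
    for path in derivation_paths(term, hypernyms):
        if root in path:
            kept.append(path[: path.index(root) + 1])
    return kept


def build_subgraph(terms: List[str], root: str,
                   hypernyms: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Union of the (child, parent) edges of all selected paths."""
    edges: Set[Tuple[str, str]] = set()
    for term in terms:
        for path in paths_to_root(term, root, hypernyms):
            edges.update(zip(path, path[1:]))
    return edges


if __name__ == "__main__":
    for p in derivation_paths("cantaloupe", HYPERNYMS):
        print(" -> ".join(p))
    print(paths_to_root("cantaloupe", "food", HYPERNYMS))
    print(sorted(build_subgraph(["cantaloupe", "watermelon", "asparagus"],
                                "food", HYPERNYMS)))
```

With food as the root, only the second cantaloupe path survives, and a term with no path to the root (a vehicle term, for instance) contributes nothing to the sub-graph.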

Only the paths from the invoked terms to root_A (food in the example) are selected, because they represent the only accurate senses for the application. In this way a sub-graph, denoted T_WN, is obtained. It is composed of the union of all the concepts and relations of the selected paths (cf. Fig. 2). The root of T_WN is the most general concept in the application, root_A; leaf nodes correspond to the concepts of the ontologies to be mapped (circles in Fig. 2); middle nodes have been extracted from WordNet but may also belong to one of the two ontologies.

[Figure 2. An example of sub-graph T_WN where the root is food. Selected paths: X_Src: Cantaloupe → sweet melon → melon → edible fruit → green goods → food; Y_Tar: Water melon → melon → edible fruit → green goods → food; X_Src: Asparagus → vegetable → green goods → food. Nodes of the resulting tree: food, green goods, edible fruit, melon, sweet melon, Cantaloupe, Water melon, vegetable, asparagus, drink.]

In the Russia experiment, the chosen roots (Location, Living Thing, Structure and Body of Water), which cover all the topics of O_Tar, are not hypernyms of the terms in O_Src relative to vehicles, so no derivation path computed from these terms is retained. Missing terms are preferred over misinterpretations: the recall of the matching process will be lower but its precision higher.

2.2 Mappings identification

Identifying relevant mappings consists of discovering, for each concept in O_Src, the closest concept in O_Tar that is its ancestor and that belongs to its derivation path rooted in root_A. So, from the sub-graph in Fig. 2, the mapping (asparagus isa vegetable) can be derived. This process does not allow the discovery of mappings for cantaloupe because none of its ancestor nodes is a concept in O_Tar: all are middle nodes coming from WordNet. However, it would be very interesting to map cantaloupe to Watermelon because they are both specializations of melon and are therefore semantically very close.
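The search for the closest ancestor belonging to O_Tar can be sketched as follows. This is a minimal illustration under stated assumptions: the derivation paths are those of the food example, the content of O_TAR (vegetable, watermelon, drink) is read off the example and is not a complete ontology, and the function name closest_target_ancestor is hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Derivation paths rooted in root_A = food, taken from the example.
PATHS: Dict[str, List[List[str]]] = {
    "asparagus": [["asparagus", "vegetable", "green goods", "food"]],
    "cantaloupe": [["cantaloupe", "sweet melon", "melon",
                    "edible fruit", "green goods", "food"]],
}

# Assumption: concepts of the target ontology as suggested by the example.
O_TAR: Set[str] = {"vegetable", "watermelon", "drink"}


def closest_target_ancestor(paths: List[List[str]],
                            target: Set[str]) -> Optional[str]:
    """Return the nearest proper ancestor, over all paths, that belongs to `target`."""
    best: Optional[Tuple[int, str]] = None
    for path in paths:
        # path[0] is the source concept itself, so ancestors start at index 1.
        for depth, ancestor in enumerate(path[1:], start=1):
            if ancestor in target:
                if best is None or depth < best[0]:
                    best = (depth, ancestor)
                break  # deeper nodes on this path are farther away
    return best[1] if best else None


if __name__ == "__main__":
    for source, paths in PATHS.items():
        ancestor = closest_target_ancestor(paths, O_TAR)
        if ancestor is None:
            print(f"{source}: no isa mapping")
        else:
            print(f"({source} isa {ancestor})")
```

On this toy input the sketch yields (asparagus isa vegetable) and no isa mapping for cantaloupe, in agreement with the example.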
A mapping such as the one between cantaloupe and Watermelon can be derived using a similarity measure between nodes of T_WN. There is evidence that correspondences discovered thanks to such measures cannot be used to derive semantic mappings (such as isa or Eq relations), which have clear semantics and could then be automatically exploited [8]. However, it would be a pity not to exploit the discovered information. So, we propose to retain such relations, labelled isclose, as potential mappings for which an expert evaluation will be necessary. Consequently, the choice in TaxoMap is to discover, for each concept X_Src in O_Src not yet mapped, the concept Y_Sim in O_Tar that is the most similar according to a similarity score. From that correspondence we derive the potential mapping (X_Src isclose Y_Sim). Then we extract, as mentioned before, the set of semantic mappings in T_WN. If a concept Y_Sim is linked to the same concept X_Src both in a semantic and in a potential mapping, only the semantic mapping is retained. For example, since vegetable in O_Tar is the concept most similar to asparagus in O_Src, we derive the potential mapping (asparagus isclose vegetable). However, the semantic mapping (asparagus isa vegetable) can also be derived, so only this semantic mapping is retained. Conversely, as no semantic mapping has been derived for the concept cantaloupe, the potential mapping (cantaloupe isclose Watermelon) is retained.

3 Experiments

Different experiments have been performed in the micro-biology domain and on taxonomies used for tests in the Ontology Matching community [11]. All these experiments showed that if the application domain is too large, a single root cannot be used. Indeed, in that case, the concept in WordNet which is a hypernym of all the concepts to be mapped is too general (entity) and T_WN is too big: it is composed of all the nodes in the WordNet hierarchy without any restriction. Several meanings are mixed, which leads to the derivation of irrelevant mappings.

We give results obtained with the Russia taxonomies. As shown in Table 1, with a single root (Entity) our technique derived 61 isa and 15 isclose mappings among the 162 terms in Russia-B not yet mapped by other techniques (370 terms were to be mapped at the beginning). As no reference mappings were delivered, the results have been evaluated manually. Only 29 out of 61 isa mappings and 8 out of 15 isclose mappings were correct. In particular, all mappings relative to vehicles are false (cf. Fig. 1). A significant increase in the precision of the mappings found was obtained when several roots were specified. In that case, several distinct sub-trees are built at the same time, one per sub-domain. Four roots have been identified: Location, Living Thing, Structure and Body of Water. Then 35 isa and 11 isclose mappings were derived, of which 29 out of 35 isa mappings and 9 out of 11 isclose mappings were correct. In particular, all geographic mappings relative to towns, countries, regions and rivers were relevant. Even though the same number of correct isa mappings (29) appears in the results of the two experiments, these mappings are not all the same. For example, the (alcohol isa drink) mapping is not identified in the second experiment because the concept drink of O_Tar is not covered by the chosen roots. Conversely, the (pine isa plant) mapping is identified, whereas without sense limitation the incorrect (pine isa material) mapping was found.

                             # isa mappings     # isclose mappings   Total number of       Recall
                             found (relevant)   found (relevant)     mappings (relevant)   (Precision)
With a single root (Entity)  61 (29)            15 (8)               76 (37)               0.23 (0.49)
With several roots           35 (29)            11 (9)               46 (38)               0.23 (0.83)

Table 1. Number of mappings found among the 162 terms in Russia-B not yet mapped
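The recall and precision figures of Table 1 can be recomputed from the counts with a few lines of arithmetic. The sketch below assumes that recall is measured against the 162 terms of Russia-B not yet mapped; the text does not state the denominator explicitly, but this choice reproduces the reported 0.23. The function names are hypothetical.

```python
# Counts taken from Table 1: (mappings found, relevant mappings).
RESULTS = {
    "single root (Entity)": (76, 37),
    "several roots": (46, 38),
}

# Assumption: recall is computed against the 162 terms still to be mapped.
TERMS_TO_MAP = 162


def precision(relevant: int, found: int) -> float:
    """Share of the mappings found that were judged relevant."""
    return relevant / found


def recall(relevant: int, to_map: int) -> float:
    """Share of the terms to map that received a relevant mapping."""
    return relevant / to_map


if __name__ == "__main__":
    for setting, (found, relevant) in RESULTS.items():
        p = precision(relevant, found)
        r = recall(relevant, TERMS_TO_MAP)
        print(f"{setting}: recall={r:.2f} precision={p:.2f}")
```

Under this assumption both settings give a recall that rounds to 0.23, while precision moves from 37/76 to 38/46, which is the gain discussed in the text.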

A more precise choice of the roots would very probably increase recall. In our application context, the roots in WordNet can be identified with reference to O_Tar alone, so this task is performed only once and the identified roots are exploited whatever the taxonomies of the information sources to be aligned with O_Tar might be. Hence it is worthwhile to pay attention to this identification step. Our first results are already promising, and we think that a more careful choice of roots could improve them further.

4 Conclusion

The use of external background knowledge can be very valuable when the context of interpretation of the involved concepts is precisely known. It makes it possible to obtain semantic relations and so overcomes the major limitations of syntactic approaches. WordNet is often used as a background resource. Its drawback, however, is that it is difficult to get relevant information if the meaning of the searched terms is not known. The results of our experiments using WordNet indicate that our approach, based on the definition of multiple roots, is a promising solution when the domain of the background knowledge is too large. Whatever the domain, the sub-tree grouping the terms relevant to the application can be extracted from WordNet with our system. We showed how semantic mappings, when they exist, can be found in this sub-tree and how, when they do not exist, meaningful proximity relationships can be found instead.

References

1. Z. Aleksovski, M. Klein, W. Ten Kate, F. Van Harmelen. Matching Unstructured Vocabularies using a Background Ontology. In Proc. of EKAW'06, Springer-Verlag, 2006.
2. Z. Aleksovski, M. Klein, W. Ten Kate, F. Van Harmelen. Exploiting the Structure of Background Knowledge used in Ontology Matching. In Proc. of ISWC'06 Workshop on Ontology Matching (OM-2006), Athens, Georgia, USA, November 2006.
3. T. L. Bach, R. Dieng-Kuntz, F. Gandon. On Ontology Matching Problems for Building a Corporate Semantic Web in a Multi-Communities Organization. In Proc. of ICEIS (4), 2004.
4. F. Giunchiglia, P. Shvaiko, M. Yatskevich. Discovering Missing Background Knowledge in Ontology Matching. In Proc. of ECAI'06.
5. Y. Kalfoglou, B. Hu. Crosi Mapping System (CMS) Result of the 2005 Ontology Alignment Contest. In Proc. of K-Cap'05 Integrating Ontologies WS, pp. 77-85, Banff, Canada, 2005.
6. T. Pedersen, S. Patwardhan, J. Michelizzi. WordNet::Similarity - Measuring the Relatedness of Concepts. In Proc. of AAAI-04, July 25-29, San Jose, CA, 2004.
7. C. Reynaud, B. Safar. When usual structural alignment techniques don't apply. In Proc. of ISWC'06 Ontology Matching WS (OM-2006), Poster, 2006.
8. C. Reynaud, B. Safar. Structural Techniques for Alignment of Taxonomies: Experiments and Evaluation. TR 1453, LRI, Université Paris-Sud, June 2006.
9. M. Sabou, M. d'Aquin, E. Motta. Using the Semantic Web as Background Knowledge for Ontology Mapping. In Proc. of ISWC'06 Ontology Matching WS (OM-2006), 2006.
10. S. Zhang, O. Bodenreider. NLM Anatomical Ontology Alignment System. Results of the 2006 Ontology Alignment Contest. In Proc. of ISWC'06 Ontology Matching, 2006.
11. http://www.atl.external.lmco.com/projects/ontology/ontologies/russia/