Collaborative NLP-aided ontology modelling

Similar documents
The OntoNL Semantic Relatedness Measure for OWL Ontologies

A Case Study for Semantic Translation of the Water Framework Directive and a Topographic Database

Formal Ontology Construction from Texts

Annotation tasks and solutions in CLARIN-PL

ToxiCat: Hybrid Named Entity Recognition services to support curation of the Comparative Toxicogenomic Database

Topic Models and Applications to Short Documents

Polionto: Ontology reuse with Automatic Text Extraction from Political Documents

SemaDrift: A Protégé Plugin for Measuring Semantic Drift in Ontologies

Information Extraction from Text

Geographic Analysis of Linguistically Encoded Movement Patterns A Contextualized Perspective

Week 4. COMP62342 Sean Bechhofer, Uli Sattler

Citation for published version (APA): Andogah, G. (2010). Geographically constrained information retrieval Groningen: s.n.

Geosciences Data Digitize and Materialize, Standardization Based on Logical Inter- Domain Relationships GeoDMS

UML. Design Principles.

Canadian Board of Examiners for Professional Surveyors Core Syllabus Item C 5: GEOSPATIAL INFORMATION SYSTEMS

Portal for ArcGIS: An Introduction. Catherine Hynes and Derek Law

A SEMANTIC SCHEMA FOR GEONAMES. Vincenzo Maltese and Feroz Farazi

2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes

CLRG Biocreative V

Key Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context.

A To Z GIS: An Illustrated Dictionary Of Geographic Information Systems PDF

ISO Series Standards in a Model Driven Architecture for Landmanagement. Jürgen Ebbinghaus, AED-SICAD

TEMPLATE FOR CMaP PROJECT

Leveraging Web GIS: An Introduction to the ArcGIS portal

Designing and Evaluating Generic Ontologies

Topical Sequence Profiling

Using C-OWL for the Alignment and Merging of Medical Ontologies

The Alignment of Formal, Structured and Unstructured Process Descriptions. Josep Carmona

B u i l d i n g a n d E x p l o r i n g

Semantic Geospatial Data Integration and Mining for National Security

GIS data models: how to georeference a farm? Nicola Ferrè

Reducing Complex Spatial Data Management Needs in and Upstream Oil and Gas Operations Team. Jason Wilson GIS Business Analyst SM Energy

A Data Repository for Named Places and Their Standardised Names Integrated With the Production of National Map Series

The World Bank and the Open Geospatial Web. Chris Holmes

Ontology Summit Framing the Conversation: Ontologies within Semantic Interoperability Ecosystems

19.2 Geographic Names Register General The Geographic Names Register of the National Land Survey is the authoritative geographic names data

Topics in Natural Language Processing

Urban Spatial Scenario Design Modelling (USSDM) in Dar es Salaam: Background Information

Schema Free Querying of Semantic Data. by Lushan Han

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Exploiting WordNet as Background Knowledge

Desktop GIS for Geotechnical Engineering

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

A Convolutional Neural Network-based

Information Retrieval and Organisation

Outline. Structure-Based Partitioning of Large Concept Hierarchies. Ontologies and the Semantic Web. The Case for Partitioning

Introduction to Portal for ArcGIS

Natural Language Processing. Topics in Information Retrieval. Updated 5/10

GEO-INFORMATION (LAKE DATA) SERVICE BASED ON ONTOLOGY

Collaborative topic models: motivations cont

Unit: Inside the Earth Inquiry Task Topography of the Oceans

Extracting Modules from Ontologies: A Logic-based Approach

Enabling ENVI. ArcGIS for Server

Multi-theme Sentiment Analysis using Quantified Contextual

WS/FCS Unit Planning Organizer

Seamless Model Driven Development and Tool Support for Embedded Software-Intensive Systems

Lecture 5 Neural models for NLP

A TOOLKIT FOR MARINE SPATIAL PLANNING Version: 17 July, 2009

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning

An OWL Ontology for Quantum Mechanics

Enterprise Integration of Autodesk MapGuide at the City of Vancouver

The Five Themes of Geography

Building a Timeline Action Network for Evacuation in Disaster

DBpedia Ontology Enrichment for Inconsistency Detection and Statistical Schema Induction

An Effective Approach to Fuzzy Ontologies Alignment

Ontologies and Spatial Decision Support

Probability and Statistics

Esri Defense Mapping: Cartographic Production. Bo King

Beating Social Pulse: Understanding Information Propagation via Online Social Tagging Systems 1

A Comparative Study of Current Clinical NLP Systems on Handling Abbreviations

OWL Semantics COMP Sean Bechhofer Uli Sattler

GEIR: a full-fledged Geographically Enhanced Information Retrieval solution

An Introduction to GLIF

TALP at GeoQuery 2007: Linguistic and Geographical Analysis for Query Parsing

1. Background: The SVD and the best basis (questions selected from Ch. 6- Can you fill in the exercises?)

BASIC TECHNOLOGY Pre K starts and shuts down computer, monitor, and printer E E D D P P P P P P P P P P

A few words on white space control

Office of Geographic Information Systems

ArchaeoKM: Managing Archaeological data through Archaeological Knowledge

GENERALIZATION IN THE NEW GENERATION OF GIS. Dan Lee ESRI, Inc. 380 New York Street Redlands, CA USA Fax:

Internet Engineering Jacek Mazurkiewicz, PhD

Stigmergy: a fundamental paradigm for digital ecosystems?

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )

GIS for spatial decision making

The Noisy Channel Model and Markov Models

Latent Dirichlet Allocation Introduction/Overview

Science Skills Station

Evaluation Strategies

YEAR 8 GEOGRAPHY. Landscapes & Landforms

ARGUS.net IS THREE SOLUTIONS IN ONE

pursues interdisciplinary long-term research in Spatial Cognition. Particular emphasis is given to:

Click Prediction and Preference Ranking of RSS Feeds

Introduction to ArcGIS Maps for Office. Greg Ponto Scott Ball

Machine Translation Evaluation

Detecting Heavy Rain Disaster from Social and Physical Sensors

The Relevance of Spatial Relation Terms and Geographical Feature Types

Can Vector Space Bases Model Context?

MOD Ontology. Ian Bailey, Model Futures Michael Warner, MOD ICAD

Portal for ArcGIS: An Introduction

gvsig a real tool for GIS technicians

Transcription:

Collaborative NLP-aided ontology modelling Chiara Ghidini ghidini@fbk.eu Marco Rospocher rospocher@fbk.eu International Winter School on Language and Data/Knowledge Technologies TrentoRISE Trento, 24 th February 2012 1

Part I ONTOLOGIES & ONTOLOGY MODELLING 2

What is an ontology? Many definitions of an ontology in literature; Here we refer to an ontology as a formal specifications of the terms in the domain and relations among them (*) Ontologies contain a formal explicit description of: Concepts (aka classes) Relations (aka roles) Individuals (aka instances) Classes (and relations) can be ordered in taxonomies using the subclass relation (*) [Gruber, T.R. (1993). A Translation Approach to Portable Ontology Specification. Knowledge Acquisition 5: 199-220.] 3

Andrew Charles Patty Rome Milan London Paris People Town haswife hasbrother livesin livesin Andrew Charles Patty Rome Milan London Paris People Town haswife hasbrother livesin livesin Andrew Charles Patty Rome Milan London Paris People Town haswife hasbrother livesin livesin Andrew Charles Patty Rome Milan London Paris People Town haswife hasbrother livesin livesin In a picture 4

Taxonomies Classes (and relations) can be ordered in taxonomies using the subclass relation Example: biological classification of species Same for roles 5

Axioms Concepts can be formally described through axioms A Pizza Margherita is a pizza which has both tomato topping and mozzarella topping P izzamargherita v P izza P izzam argherita v9hast opping.t omatot opping P izzamargherita v9hast opping.mozzarellat opping 6

Different types of Ontologies Slide taken from Ontology-Driven Conceptual Modelling A tutorial by Nicola Guarino. 7

Why to develop an ontology? To share common understanding of the structure of information among people or software agents To enable reuse of domain knowledge To make domain assumptions explicit To separate domain knowledge from the operational knowledge To analyze domain knowledge 8

Examples of ontologies Large taxonomies categorizing Web sites (such as on Yahoo!) Medical Ontologies (such as SNOMED) to annotate documents and share information Categorizations of products for sale and their features (such as on Amazon.com, but also smaller enterprises). Therefore The development of ontologies is moving from the realm of research labs to the desktop of domain experts 9

Problems in ontology modeling 1. Modelling is a collaborative activity How to write an ontology? How to change this axiom? Is this information relevant? Domain expert What is the meaning of this description? Knowledge engineer 10

Problems in ontology modeling 2. Modelling is a time-consuming and error-prone activity, and often needs parsing of a large quantity of material. Do I really need to read all this? 11

Our contribution Our Contribution to solve those problems 1. Framework for the collaborative modeling of ontologies using wikis 2. Automatic extraction of key-phrases for ontology modelling 12

Part II COLLABORATIVE FRAMEWORK FOR ONTOLOGY MODELING 13

Why a wiki-based conceptual modeling tool? Wikis support collaborative editing; Users are quite familiar with viewing/editing wiki content (e.g. Wikipedia); Only a web-browser is required on the client side; Wikis provide a shared knowledge repository accessible by users spread all over the world; Wikis can provide a uniform tool/interface for the specification of different model types (e.g. ontologies, processes, ); 14 14

An architecture for collaborative conceptual modeling in wikis 1. One element One page each element of the model is represented by a page in the wiki; Mountain Concept Mountain A mountain is a large landform that stretches above the surrounding land in a limited area usually in the form of a peak. A mountain is generally steeper than a hill. The highest mountain on earth is the Mount Everest 15 15

An architecture for collaborative conceptual modeling in wikis 2. Unstructured and structured descriptions each page contains both structured and unstructured content; Mountain A mountain is a large landform that stretches above the surrounding land in a limited area usually in the form of a peak. A mountain is generally steeper than a hill. The highest mountain on earth is the Mount Everest (unstructured content) v Landform v Hill u P lain v8madeof(earth t Rock) v9height. 2500 Mountain(Mt.Everest) Mountain(M t.kilimanjaro) (structured content) 16 16

An architecture for collaborative conceptual modeling in wikis 3. Different views to access the model: different views to support different modeling actors; Mountain Mountain is a different from landform hill, plain A mountain is a large landform that made of stretches above the surrounding land in made of a limited area usually in the form of a peak. A mountain is generally steeper height earth rock at least 2,500m than a hill. samples Mt. Everest The highest mountain on earth is Mt. Kilimanjaro the Mount Everest (semi - structured view) (unstructured view) Mountain v Landform v Hill u P lain v8madeof(earth t Rock) v9height. 2500 Mountain(Mt.Everest) Mountain(M t.kilimanjaro) 17 (fully - structured view) 17

An architecture for collaborative conceptual modeling Alignment between the different views Mountain A mountain is a large landform that stretches above the surrounding land in a limited area usually in the form of a peak. A mountain is generally steeper than a hill. The highest mountain on earth is the Mount Everest (unstructured view) is a different from made of made of landform earth height at least 2,500m samples hill, plain rock Mt. Everest Mt. Kilimanjaro (semi-structured view) v Landform v Hill u P lain v8madeof(earth t Rock) v9height. 2500 Mountain(Mt. Everest) Mountain(Mt. Kilimanjaro) (fully structured view) 18 18

MoKi: The modeling wiki Web 2.0 tool Collaborative editing between knowledge experts and knowledge engineers Term extraction features Automatic translation from and to OWL and BPMN Graphical and textual editing Integrated ontology and process modeling Support for validation and feedback Available as open source tool. Demo at moki.fbk.eu 19

Part III MOKI DEMO 20

Definition of the collaborative framework Hints on the applicability of the tool also for other conceptual modelling languages (BPMN) Showcase of results and usages 21

Part IV AUTOMATIC EXTRACTION OF KEY- PHRASES FOR ONTOLOGY MODELLING 22

NLP-aided ontology engineering Support ontology modeling by extracting concepts characterizing a domain from a reference text corpus actually, by automatically extracting key-phrases Key-phrases are the terms characterizing a document or a corpus of documents => candidate relevant concepts of the domain described by the corpus Automatic concepts extraction plays an important role in ontology modeling: To boost the ontology construction/extension phase To validate an ontology against a domain corpus

An NLP-aided ontology engineering framework A framework for supporting ontology building/evaluation by automatic concept extraction from a reference text corpus A fully-working and publicly available implementation of the proposed framework in MoKi

NLP-aided ontology engineering Domain corpus Corpus collection Validation / Evaluation Ontology metrics Extended ontology Key-concepts extraction Alignment with additional resources Enriched key-concepts list Candidate key-concepts list External resources (e.g Wordnet) Current ontology 25

Corpus Selection Corpus collection! Key-concepts extraction! Manual validation! Alignment with external resources! The corpus can be manually or automatically selected (e.g. crawling web pages). Corpus could consist of: (large) collection of documents e.g. pollen bulletins crawled on-line A single big document e.g. the BPMN specification.

Key-concept extraction Corpus collection! Key-concepts extraction! Manual validation! Alignment with external resources! Performed by KX (Keyphrase extraction) tool. exploits linguistic information and statistical measures to select a list of weighted keywords from documents; handles multi-words; flexible parameters configuration; easily adaptable to new languages; ranked 2 nd (out of 20) at SemEval2010, task on Automatic Keyphrase Extraction from Scientific Articles.

Alignment with additional resources Corpus collection! Key-concepts extraction! Manual validation! Alignment with external resources! Extracted key-concepts aligned and enriched with additional resources: WordNet (& WN domains): synonyms, definitions, SUMO labels; Wikipedia: link to the Wikipedia page corresponding to the term (exploiting BabelNet); Other external resources (e.g. dictionary). Enriched key-concepts list matched against the ontology, to detect already defined key-concepts.

Ontology Extension / Evaluation Corpus collection! Key-concepts extraction! Manual validation! Alignment with external resources! Ontology Extension: The user decides which of the extracted key-concepts to add to the ontology; The additional details provided in the enriched list may guide the formalization; e.g. is-a related synsets, definitions, Ontology Terminological Evaluation: Automatically computed metrics (variants of IR precision and recall) support users in determining the terminological coverage of the ontology wrt to the corpus used;

Application Scenarios The proposed approach can support several different ontology engineering tasks: Ontology construction boosting: building an ontology from scratch; Ontology extension: adding new concepts to an existing ontology; Ontology evaluation: evaluating terminologically an ontology against a domain corpus; Ontology ranking: ranking candidate ontologies wrt a given domain corpus; Ranking of ontology concepts: determining which are the domain-wise most relevant concepts defined in an ontology.

Framework fully-implemented in MoKi Publicly available @ moki.fbk.eu Accepts a collection of digital documents in any popular formats Let s see it in action!

Part V MOKI DEMO (CONTINUED) 32

Part VI PHD CALL ON INFORMATION EXTRACTION FOR ONTOLOGY ENGINEERING 33

Building Quality Ontologies Starting Point: a collaborative ontology modeling framework supported by NLP techniques Goal: to support building rich and high quality ontologies Issue: current state of the art NLP techniques for information extraction have some limitations wrt ontology modeling: mainly focused on the extraction of terms; more suitable to support the construction of light-weight medium-quality ontologies; Challenge: how to appropriately exploit NLP techniques to support the construction of rich and high quality ontologies?

PhD call on Information Extraction for Ontology Engineering Objective: Investigate how to combine work in automatic ontology learning and work in methodologies and tools for manual knowledge engineering to produce (semi)-automatic services for ontology learning better supporting the construction of rich and good quality ontologies. Address key research challenges in NLP and ontology engineering. Strong algorithmic and methodological aspects, together with implementation-oriented tasks.

team at FBK Collaborative modeling of ontologies and processes Ontological description of processes Guided domain expert modeling via template Multi-linguality and egovernment application

Thank You! Questions? Chiara Ghidini http://dkm.fbk.eu/ghidini ghidini@fbk.eu Marco Rospocher http://dkm.fbk.eu/rospocher rospocher@fbk.eu