Comparison between ontology distances (preliminary results)

Jérôme David, Jérôme Euzenat
ISWC

Context
- A distributed and heterogeneous environment: the semantic Web
- Several ontologies on the same domain
- Problem: how to facilitate communication between programs using different ontologies?
- An alignment acts as a mediator between the two parties (software programs, peers, etc.)

Context
How to deal with many ontologies at Web scale?
A solution: distances between ontologies
- In peer-to-peer systems: find a peer using a close ontology
- In ontology engineering: find related ontologies to facilitate ontology reuse
- etc.
Objectives:
- Review some distances between ontologies, including distances derived from ontology matching
- Make a preliminary evaluation of these measures: accuracy vs. speed

Various kinds of ontology distances
Two kinds of ontology distances are evaluated:
- Vectorial distances: ontologies are viewed as bags of terms
  1. Which weighting scheme to use?
  2. Which vector distance to use?
- Entity-based distances: the distance between ontologies is a function of the distance between their entities
  1. Entity distances: structural and/or lexical?
  2. Aggregating entity distance values: which collection distance to use?

Vectorial ontology distances
[Figure: the RedWine ontology (the class Médoc, commented "Médoc wines are French red wines produced in the Bordeaux wine region", and the individual MyChateauMargauxBottle with properties such as ProducedIn, GrapeVariety, Producer, Vintage and Appellation) is turned into an ontology vector over the terms wine, red, médoc, french, bordeaux, cabernet, sauvignon, merlot, margaux, chateau, cantenac, 1998, grape, appellation, producer.]
Vector measures:
- Cosine
- Jaccard (Tanimoto index)
Possible weights:
- Boolean
- Frequency (TF)
- TF.IDF
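To make the bag-of-terms view concrete, here is a minimal sketch of a Jaccard distance (Boolean weights) and a cosine distance (TF weights). The function names and the two small term lists are illustrative assumptions, not the implementation evaluated in the talk.

```python
import math
from collections import Counter


def term_vector(terms):
    """Build a term-frequency (TF) vector from a list of terms."""
    return Counter(terms)


def jaccard_distance(terms_a, terms_b):
    """1 - |A ∩ B| / |A ∪ B| on the sets of terms (Boolean weights)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # two empty bags are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(vec_a, vec_b):
    """1 - cosine similarity between two sparse weight vectors."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # an empty vector shares nothing with any other
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical term bags, loosely inspired by the RedWine example.
red_wine = ["wine", "red", "médoc", "french", "bordeaux", "merlot", "margaux"]
white_wine = ["wine", "white", "french", "bordeaux", "sauvignon"]

jaccard = jaccard_distance(red_wine, white_wine)
cosine = cosine_distance(term_vector(red_wine), term_vector(white_wine))
```

Swapping the TF weights for TF.IDF weights only changes how the vectors are built; the cosine computation stays the same.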

Entity distances
Label-based distance:
- String distance between entity annotations: Jaro-Winkler
- Minimum weight pairing + arithmetic mean
Structural distances:
- OLA similarity: structural iterative similarity for OWL ontologies
- Triple-based iterative similarity: structural iterative similarity for RDF graphs
Problem: how to aggregate entity measure values?
[Figure: the entity distance values between the concepts of o and the concepts of o' have to be combined into a single ontology distance value.]
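As an illustration of the string distance named above, the following is a sketch of the textbook Jaro-Winkler similarity turned into a distance. It uses the usual defaults (prefix scale 0.1, prefix length capped at 4), which are assumptions here; the talk does not state its parameters, and the pairing of several annotations per entity is not shown.

```python
def jaro_similarity(s, t):
    """Jaro similarity in [0, 1] between two strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler_similarity(s, t, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for strings sharing a common prefix."""
    jaro = jaro_similarity(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_scale * (1.0 - jaro)


def jaro_winkler_distance(s, t):
    """Distance form: 0 for identical strings, up to 1 for unrelated ones."""
    return 1.0 - jaro_winkler_similarity(s, t)


# Hypothetical pair of labels for the same entity in two ontologies.
label_distance = jaro_winkler_distance("ChateauMargaux", "Château Margaux")
```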

Collection measures
Goal: compute an ontology measure from concept measure values.
Collection measures used:
- Average linkage: the arithmetic mean of concept measure values.
- Hausdorff:
  $$\mathrm{Hausdorff}(o, o') = \max\left(\max_{e \in o} \min_{e' \in o'} \delta_K(e, e'),\ \max_{e' \in o'} \min_{e \in o} \delta_K(e, e')\right)$$
- Minimum Weight Maximum Graph Matching (MWMGM) distance.
[Figure: distance values between the concepts of o and the concepts of o', with the values retained by each collection measure.]
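The three collection measures can be sketched on a small matrix of entity distances, with rows for the concepts of o and columns for the concepts of o'. The MWMGM normalisation used below (unmatched entities cost 1, total divided by the size of the larger ontology) is an assumption, since the slide does not give the formula, and the brute-force search over assignments is only meant for tiny examples.

```python
from itertools import permutations


def average_linkage(dist):
    """Arithmetic mean of all pairwise entity distances."""
    values = [d for row in dist for d in row]
    return sum(values) / len(values)


def hausdorff(dist):
    """Largest distance from any entity to its closest counterpart."""
    rows_to_cols = max(min(row) for row in dist)
    cols_to_rows = max(min(col) for col in zip(*dist))
    return max(rows_to_cols, cols_to_rows)


def mwmgm(dist):
    """Minimum weight maximum matching distance (assumed normalisation).

    Every entity of the smaller ontology is matched to a distinct entity
    of the larger one so that the summed distance is minimal; each
    unmatched entity adds a penalty of 1.
    """
    n_rows, n_cols = len(dist), len(dist[0])
    if n_rows > n_cols:  # transpose so that rows are the smaller side
        dist = [list(col) for col in zip(*dist)]
        n_rows, n_cols = n_cols, n_rows
    best = min(
        sum(dist[i][j] for i, j in enumerate(cols))
        for cols in permutations(range(n_cols), n_rows)
    )
    return (best + (n_cols - n_rows)) / n_cols


# Hypothetical distances: 2 concepts in o (rows), 3 concepts in o' (columns).
dist = [
    [0.1, 0.8, 0.9],
    [0.7, 0.2, 0.6],
]

avg = average_linkage(dist)   # mean of the six values
haus = hausdorff(dist)        # driven by the third column, whose best match is 0.6
match = mwmgm(dist)           # pairs (0, 0) and (1, 1), one unmatched concept
```

For ontologies of realistic size, the assignment step would normally be solved with a polynomial algorithm such as the Hungarian method rather than by enumeration.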

Selected measures
12 measures evaluated:
- 3 vectorial measures:
  - Jaccard: the most basic measure (proportion of common terms)
  - Cosine + TF weights
  - Cosine + TF.IDF weights
- 9 entity-based measures, obtained by combining 3 entity measures:
  - EntityLexicalMeasure
  - TripleBasedEntitySim
  - OLAEntitySim
  with 3 collection measures:
  - AverageLinkage
  - Hausdorff
  - MWMGM

Benchmark suite
OAEI benchmark test set: 1 reference ontology (101) and altered ontologies
6 kinds of alterations:
- Name: suppressed, randomized, synonyms, convention, other language
- Comment: suppressed, other language
- Specialization hierarchy: suppressed, expanded, flattened
- Instances: suppressed
- Properties: suppressed, discarded restrictions
- Classes: split, flattened
Order between alterations:
- Example on name: {suppressed, randomized} < {synonym, convention, other language}
- An ontology with synonym names is closer to the reference one than an ontology with no names

Benchmark suite
Induced proximity order: 101 is closer to 225 than ...
[Figure: proximity order induced by the alterations, by distance from 101.]

Benchmark suite
[Figure: induced proximity order (with partial alterations).]

Tests
We performed 3 different tests:
1. Order comparison on the benchmark test set
   - Check whether the similarity orders are compatible with the order induced by alterations
   - For each pair o, o', check that 101 < o < o' implies δ(101, o) < δ(101, o')
   - Caveat: the test set is biased towards 1-1 matching, so some measures are favored
2. Different cardinality matching
   - Compare with related but different ontologies
   - Compare with unrelated ontologies
3. Time consumption test
   - Compare the time consumption of the evaluated measures
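The order comparison test can be sketched as a ratio of satisfied pairs. The helper name, the ontology identifiers and the distance values below are hypothetical; only the check itself (o closer than o' in the induced order should give a smaller distance to 101) comes from the slide.

```python
def order_agreement(distance_to_ref, closer_pairs):
    """Fraction of (closer, farther) pairs ranked consistently.

    distance_to_ref: dict mapping an ontology id to its distance to 101.
    closer_pairs: iterable of (o, o2) pairs where the alterations imply
                  that o is closer to the reference than o2.
    """
    pairs = list(closer_pairs)
    if not pairs:
        return float("nan")  # no pair to test
    passed = sum(
        1 for o, o2 in pairs if distance_to_ref[o] < distance_to_ref[o2]
    )
    return passed / len(pairs)


# Made-up distances to the reference ontology for three altered ontologies.
distances = {"A": 0.1, "B": 0.3, "C": 0.2}
# Made-up induced order: A is closer than B and C, and B is closer than C.
induced = [("A", "B"), ("A", "C"), ("B", "C")]

ratio = order_agreement(distances, induced)  # the pair (B, C) is misordered
```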

Order comparison
Proportion of pairs o, o' satisfying the induced order.
[Table: Measure; Tests passed (ratio), Original and Enhanced. Apart from the NaN entries, the values are not preserved. Measures listed:]
- MWMGM (EntityLexicalMeasure)
- Hausdorff (EntityLexicalMeasure): NaN (Original), NaN (Enhanced)
- AverageLinkage (EntityLexicalMeasure)
- MWMGM (OLAEntitySim)
- Hausdorff (OLAEntitySim)
- AverageLinkage (OLAEntitySim)
- MWMGM (TripleBasedEntitySim)
- Hausdorff (TripleBasedEntitySim)
- AverageLinkage (TripleBasedEntitySim)
- CosineVM (TF)
- CosineVM (TFIDF)
- JaccardVM (TF)
Findings:
- Best entity measure: TripleBasedEntitySim
- Best collection measure: MWMGM
- Best vectorial measure: Cosine with TF weights
Best results: TripleBasedEntitySim with MWMGM
- Cosine with TF weights: a basic measure, but it works well
- Hausdorff works only with TripleBasedEntitySim
- Is MWMGM favored by the 1-1 nature of this test?

Different cardinality matching
Objective: check that MWMGM is still the best measure when the ontologies have different cardinalities
Test set:
- A reference ontology: 101
- Highly altered ontologies: 2xx
- Different but similar ontologies: 3xx
- Irrelevant ontologies: the conference test set (could share some vocabulary with benchmark ontologies)
Expected results: δ(101, 2xx) < δ(101, 3xx) < δ(101, conference)
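The expected result above amounts to checking that every ontology in one group is strictly closer to 101 than every ontology in the next group. A minimal sketch of that check follows; the group contents and distance values are invented for illustration and are not results from the talk.

```python
def misorderings(closer_group, farther_group):
    """Pairs (near, far) where the ontology expected to be closer is not.

    Both arguments map an ontology id to its distance to the reference.
    """
    return [
        (near, far)
        for near, d_near in closer_group.items()
        for far, d_far in farther_group.items()
        if d_near >= d_far
    ]


# Invented distances to the reference ontology 101.
altered = {"2xx-a": 0.30, "2xx-b": 0.55}   # highly altered ontologies
similar = {"3xx-a": 0.50, "3xx-b": 0.60}   # different but similar ontologies
conference = {"conf-a": 0.90}              # irrelevant ontologies

altered_vs_similar = misorderings(altered, similar)
similar_vs_conference = misorderings(similar, conference)
```

With these made-up numbers, only the altered ontology "2xx-b" is ranked farther than a similar one, which is the kind of misordering a result list would report.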

Different cardinality matching
Results: MWMGM still performs best; vectorial measures fail when ontologies are lexically altered
Main misorderings:
- MWMGM:
  - only one altered ontology (250) with similar (3xx) and conference ontologies
  - some similar (3xx) with conference ontologies
- Hausdorff:
  - with TripleBasedEntitySim: only 250 with similar (3xx) and a conference ontology
  - with OLAEntitySim: 228, 248, 250, 251, 252 with 304 and conference ontologies
- AverageLinkage:
  - with LexicalEntitySim: does not work
  - highly altered (2xx) and similar (3xx) ontologies
- Vectorial measures:
  - many altered ontologies (248, 250, 251, 252) with similar (3xx) ontologies

Time consumption
Are the measures usable in real-time applications?
- Vectorial measures are more time-efficient
- No significant differences between MWMGM (N^3), Hausdorff and AverageLinkage (N^2)
- Structural measures are not really usable in real-time applications
Time consumption on the benchmark test
[Table: Measure; Total time (s); Average time per similarity value (s). The values are not preserved. Measures timed:]
- MWMGM (EntityLexicalMeasure)
- MWMGM (OLAEntitySim)
- MWMGM (TripleBasedEntitySim)
- Hausdorff (EntityLexicalMeasure)
- Hausdorff (OLAEntitySim)
- Hausdorff (TripleBasedEntitySim)
- AverageLinkage (EntityLexicalMeasure)
- AverageLinkage (OLAEntitySim)
- AverageLinkage (TripleBasedEntitySim)
- CosineVM (TF)
- CosineVM (TFIDF)
- JaccardVM (TF)
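The two columns of the timing table (total time and average time per similarity value) could be collected with a harness along these lines. This is a sketch under assumptions: the helper name, the toy measure and the toy inputs are hypothetical, and the talk does not describe how its timings were taken.

```python
import time


def time_measure(measure, pairs):
    """Return (total seconds, average seconds per computed value)."""
    pairs = list(pairs)
    start = time.perf_counter()
    for a, b in pairs:
        measure(a, b)
    total = time.perf_counter() - start
    return total, total / len(pairs)


def toy_jaccard(terms_a, terms_b):
    """Stand-in measure: Jaccard distance on two sets of terms."""
    union = terms_a | terms_b
    return 1.0 - len(terms_a & terms_b) / len(union) if union else 0.0


# Hypothetical workload: the same pair of term sets compared 1000 times.
workload = [({"wine", "red"}, {"wine", "white"})] * 1000
total_s, average_s = time_measure(toy_jaccard, workload)
```

Comparing measures fairly would also require running each of them on the same ontology pairs and repeating the runs, since single timings are noisy.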

Conclusion
Why use complex measures when basic measures perform well?
- For discriminating highly related ontologies
A more precise test set is needed
Characterize confidence intervals
- Is there a threshold beyond which returned values are always correct?
Other ontology distance measures: alignment-based


More information

Information Retrieval

Information Retrieval Introduction to Information Retrieval CS276: Information Retrieval and Web Search Pandu Nayak and Prabhakar Raghavan Lecture 6: Scoring, Term Weighting and the Vector Space Model This lecture; IIR Sections

More information

SPARQL Formalization

SPARQL Formalization SPARQL Formalization Marcelo Arenas, Claudio Gutierrez, Jorge Pérez Department of Computer Science Pontificia Universidad Católica de Chile Universidad de Chile Center for Web Research http://www.cwr.cl

More information

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy

Ranked IR. Lecture Objectives. Text Technologies for Data Science INFR Learn about Ranked IR. Implement: 10/10/2017. Instructor: Walid Magdy Text Technologies for Data Science INFR11145 Ranked IR Instructor: Walid Magdy 10-Oct-017 Lecture Objectives Learn about Ranked IR TFIDF VSM SMART notation Implement: TFIDF 1 Boolean Retrieval Thus far,

More information

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan

Ensemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan Ensemble Methods NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan How do you make a decision? What do you want for lunch today?! What did you have last night?! What are your favorite

More information

Research Article Anatomy Ontology Matching Using Markov Logic Networks

Research Article Anatomy Ontology Matching Using Markov Logic Networks Scientifica Volume 2016, Article ID 1010946, 7 pages http://dx.doi.org/10.1155/2016/1010946 Research Article Anatomy Ontology Matching Using Markov Logic Networks Chunhua Li, Pengpeng Zhao, Jian Wu, and

More information

DESIGNING A CARTOGRAPHIC ONTOLOGY FOR USE WITH EXPERT SYSTEMS

DESIGNING A CARTOGRAPHIC ONTOLOGY FOR USE WITH EXPERT SYSTEMS DESIGNING A CARTOGRAPHIC ONTOLOGY FOR USE WITH EXPERT SYSTEMS Richard A. Smith University of Georgia, Geography Department, Athens, GA 30602 rasmith@uga.edu KEY WORDS: Ontology, Cartography, Expert Systems,

More information

Resource Description Framework (RDF) A basis for knowledge representation on the Web

Resource Description Framework (RDF) A basis for knowledge representation on the Web Resource Description Framework (RDF) A basis for knowledge representation on the Web Simple language to capture assertions (as statements) Captures elements of knowledge about a resource Facilitates incremental

More information

Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017

Deep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017 Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion

More information

Key Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context.

Key Words: geospatial ontologies, formal concept analysis, semantic integration, multi-scale, multi-context. Marinos Kavouras & Margarita Kokla Department of Rural and Surveying Engineering National Technical University of Athens 9, H. Polytechniou Str., 157 80 Zografos Campus, Athens - Greece Tel: 30+1+772-2731/2637,

More information

Estimating query rewriting quality over LOD

Estimating query rewriting quality over LOD Semantic Web 1 (2016) 1 5 1 IOS Press Estimating query rewriting quality over LOD Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name

More information

DISTRIBUTIONAL SEMANTICS

DISTRIBUTIONAL SEMANTICS COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.

More information

Generating Sentences by Editing Prototypes

Generating Sentences by Editing Prototypes Generating Sentences by Editing Prototypes K. Guu 2, T.B. Hashimoto 1,2, Y. Oren 1, P. Liang 1,2 1 Department of Computer Science Stanford University 2 Department of Statistics Stanford University arxiv

More information

Lexical Analysis. DFA Minimization & Equivalence to Regular Expressions

Lexical Analysis. DFA Minimization & Equivalence to Regular Expressions Lexical Analysis DFA Minimization & Equivalence to Regular Expressions Copyright 26, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California

More information

Translating Ontologies from Predicate-based to Frame-based Languages

Translating Ontologies from Predicate-based to Frame-based Languages 1/18 Translating Ontologies from Predicate-based to Frame-based Languages Jos de Bruijn and Stijn Heymans Digital Enterprise Research Institute (DERI) University of Innsbruck, Austria {jos.debruijn,stijn.heymans}@deri.org

More information

A HYBRID SEMANTIC SIMILARITY MEASURING APPROACH FOR ANNOTATING WSDL DOCUMENTS WITH ONTOLOGY CONCEPTS. Received February 2017; revised May 2017

A HYBRID SEMANTIC SIMILARITY MEASURING APPROACH FOR ANNOTATING WSDL DOCUMENTS WITH ONTOLOGY CONCEPTS. Received February 2017; revised May 2017 International Journal of Innovative Computing, Information and Control ICIC International c 2017 ISSN 1349-4198 Volume 13, Number 4, August 2017 pp. 1221 1242 A HYBRID SEMANTIC SIMILARITY MEASURING APPROACH

More information

Term Weighting and Vector Space Model. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze

Term Weighting and Vector Space Model. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze Term Weighting and Vector Space Model Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Ranked retrieval Thus far, our queries have all been Boolean. Documents either

More information

vector space retrieval many slides courtesy James Amherst

vector space retrieval many slides courtesy James Amherst vector space retrieval many slides courtesy James Allan@umass Amherst 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the

More information

Minimal Coverage for Ontology Signatures

Minimal Coverage for Ontology Signatures Minimal Coverage for Ontology Signatures David Geleta, Terry R. Payne, and Valentina Tamma Department of Computer Science, University of Liverpool, UK {d.geleta t.r.payne v.tamma}@liverpool.ac.uk Abstract.

More information

What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured.

What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured. What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured. Text mining What can be used for text mining?? Classification/categorization

More information

Lecture 2: Axiomatic semantics

Lecture 2: Axiomatic semantics Chair of Software Engineering Trusted Components Prof. Dr. Bertrand Meyer Lecture 2: Axiomatic semantics Reading assignment for next week Ariane paper and response (see course page) Axiomatic semantics

More information

Ontological Modelling in Wikidata

Ontological Modelling in Wikidata Ontological Modelling in Wikidata Markus Krötzsch Knowledge-Based Systems, TU Dresden Workshop on Ontology Design and Patterns 2018 at ISWC 2018 Markus Krötzsch: Wikidata Toolkit Kickoff All slides CC-BY

More information

Towards Linkset Quality for Complementing SKOS Thesauri

Towards Linkset Quality for Complementing SKOS Thesauri Towards Linkset Quality for Complementing SKOS Thesauri Riccardo Albertoni, Monica De Martino, and Paola Podestà Istituto di Matematica Applicata e Tecnologie Informatiche Consiglio Nazionale delle Ricerche,

More information

Lecture 2 August 31, 2007

Lecture 2 August 31, 2007 CS 674: Advanced Language Technologies Fall 2007 Lecture 2 August 31, 2007 Prof. Lillian Lee Scribes: Cristian Danescu Niculescu-Mizil Vasumathi Raman 1 Overview We have already introduced the classic

More information

Exploring Large Digital Library Collections using a Map-based Visualisation

Exploring Large Digital Library Collections using a Map-based Visualisation Exploring Large Digital Library Collections using a Map-based Visualisation Dr Mark Hall Research Seminar, Department of Computing, Edge Hill University 7.11.2013 http://www.flickr.com/photos/carlcollins/199792939/

More information

Spatial Data on the Web Working Group. Geospatial RDA P9 Barcelona, 5 April 2017

Spatial Data on the Web Working Group. Geospatial RDA P9 Barcelona, 5 April 2017 Spatial Data on the Web Working Group Geospatial IG @ RDA P9 Barcelona, 5 April 2017 Spatial Data on the Web Joint effort of the World Wide Web Consortium (W3C) and the Open Geospatial Consortium (OGC)

More information

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from

INFO 4300 / CS4300 Information Retrieval. slides adapted from Hinrich Schütze s, linked from INFO 4300 / CS4300 Information Retrieval slides adapted from Hinrich Schütze s, linked from http://informationretrieval.org/ IR 26/26: Feature Selection and Exam Overview Paul Ginsparg Cornell University,

More information

CSE 158 Lecture 10. Web Mining and Recommender Systems. Text mining Part 2

CSE 158 Lecture 10. Web Mining and Recommender Systems. Text mining Part 2 CSE 158 Lecture 10 Web Mining and Recommender Systems Text mining Part 2 Midterm Midterm is this time next week! (Nov 7) I ll spend next Monday s lecture on prep See also seven previous midterms on the

More information

Recap of the last lecture. CS276A Information Retrieval. This lecture. Documents as vectors. Intuition. Why turn docs into vectors?

Recap of the last lecture. CS276A Information Retrieval. This lecture. Documents as vectors. Intuition. Why turn docs into vectors? CS276A Information Retrieval Recap of the last lecture Parametric and field searches Zones in documents Scoring documents: zone weighting Index support for scoring tf idf and vector spaces Lecture 7 This

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Francisco M. Couto Mário J. Silva Pedro Coutinho

Francisco M. Couto Mário J. Silva Pedro Coutinho Francisco M. Couto Mário J. Silva Pedro Coutinho DI FCUL TR 03 29 Departamento de Informática Faculdade de Ciências da Universidade de Lisboa Campo Grande, 1749 016 Lisboa Portugal Technical reports are

More information

Part I: Web Structure Mining Chapter 1: Information Retrieval and Web Search

Part I: Web Structure Mining Chapter 1: Information Retrieval and Web Search Part I: Web Structure Mining Chapter : Information Retrieval an Web Search The Web Challenges Crawling the Web Inexing an Keywor Search Evaluating Search Quality Similarity Search The Web Challenges Tim

More information

On Politics and Argumentation

On Politics and Argumentation On Politics and Argumentation Maria Boritchev To cite this version: Maria Boritchev. On Politics and Argumentation. MALOTEC, Mar 2017, Nancy, France. 2017. HAL Id: hal-01666416 https://hal.archives-ouvertes.fr/hal-01666416

More information

Toponym Disambiguation using Ontology-based Semantic Similarity

Toponym Disambiguation using Ontology-based Semantic Similarity Toponym Disambiguation using Ontology-based Semantic Similarity David S Batista 1, João D Ferreira 2, Francisco M Couto 2, and Mário J Silva 1 1 IST/INESC-ID Lisbon, Portugal {dsbatista,msilva}@inesc-id.pt

More information

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology

Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology Scoring (Vector Space Model) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2017 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

University of Florida CISE department Gator Engineering. Clustering Part 1

University of Florida CISE department Gator Engineering. Clustering Part 1 Clustering Part 1 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville What is Cluster Analysis? Finding groups of objects such that the objects

More information

Semantics, ontologies and escience for the Geosciences

Semantics, ontologies and escience for the Geosciences Semantics, ontologies and escience for the Geosciences Femke Reitsma¹, John Laxton², Stuart Ballard³, Werner Kuhn, Alia Abdelmoty ¹femke.reitsma@ed.ac.uk: School of Geosciences, Edinburgh University ²

More information

Machine Learning. Ludovic Samper. September 1st, Antidot. Ludovic Samper (Antidot) Machine Learning September 1st, / 77

Machine Learning. Ludovic Samper. September 1st, Antidot. Ludovic Samper (Antidot) Machine Learning September 1st, / 77 Machine Learning Ludovic Samper Antidot September 1st, 2015 Ludovic Samper (Antidot) Machine Learning September 1st, 2015 1 / 77 Antidot Software vendor since 1999 Paris, Lyon, Aix-en-Provence 45 employees

More information

Outline Introduction Background Related Rl dw Works Proposed Approach Experiments and Results Conclusion

Outline Introduction Background Related Rl dw Works Proposed Approach Experiments and Results Conclusion A Semantic Approach to Detecting Maritime Anomalous Situations ti José M Parente de Oliveira Paulo Augusto Elias Emilia Colonese Carrard Computer Science Department Aeronautics Institute of Technology,

More information

Annotation algebras for RDFS

Annotation algebras for RDFS University of Edinburgh Semantic Web in Provenance Management, 2010 RDF definition U a set of RDF URI references (s, p, o) U U U an RDF triple s subject p predicate o object A finite set of triples an

More information