Incomplete Information in RDF

Similar documents
SPARQL Formalization

First-Order Logic First-Order Theories. Roopsha Samanta. Partly based on slides by Aaron Bradley and Isil Dillig

Designing a Query Language for RDF: Marrying Open and Closed Worlds

CSE 1400 Applied Discrete Mathematics Definitions

GAV-sound with conjunctive queries

ECE473 Lecture 15: Propositional Logic

Comp487/587 - Boolean Formulas

On the Complexity of the Reflected Logic of Proofs

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

Learning Goals of CS245 Logic and Computation

RDF and Logic: Reasoning and Extension

Query answering using views

Representing and Querying Validity Time in RDF and OWL: A Logic-Based Approach

First-Order Theorem Proving and Vampire. Laura Kovács (Chalmers University of Technology) Andrei Voronkov (The University of Manchester)

Syntax. Notation Throughout, and when not otherwise said, we assume a vocabulary V = C F P.

Applied Logic. Lecture 1 - Propositional logic. Marcin Szczuka. Institute of Informatics, The University of Warsaw

Pattern Logics and Auxiliary Relations

Approximate Rewriting of Queries Using Views

Handbook of Logic and Proof Techniques for Computer Science

Propositional Logic: Models and Proofs


Lecture 2: Syntax. January 24, 2018

arxiv:cs/ v1 [cs.db] 26 May 2006

Propositional and Predicate Logic - II

First-Order Theorem Proving and Vampire

A brief introduction to Logic. (slides from

Answering Pattern Queries Using Views

Part 2: First-Order Logic

Parameterized Regular Expressions and Their Languages

03 Review of First-Order Logic

Motivation. CS389L: Automated Logical Reasoning. Lecture 10: Overview of First-Order Theories. Signature and Axioms of First-Order Theory

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events

Outline. Formale Methoden der Informatik First-Order Logic for Forgetters. Why PL1? Why PL1? Cont d. Motivation

Introduction to Artificial Intelligence Propositional Logic & SAT Solving. UIUC CS 440 / ECE 448 Professor: Eyal Amir Spring Semester 2010

Monodic fragments of first-order temporal logics

The Complexity of Computing the Behaviour of Lattice Automata on Infinite Trees

Trichotomy Results on the Complexity of Reasoning with Disjunctive Logic Programs

ACLT: Algebra, Categories, Logic in Topology - Grothendieck's generalized topological spaces (toposes)

On the satisfiability problem for a 4-level quantified syllogistic and some applications to modal logic

Exchanging More than Complete Data

Introduction to first-order logic:

Automata, Logic and Games: Theory and Application

The non-logical symbols determine a specific F OL language and consists of the following sets. Σ = {Σ n } n<ω

Exercises 1 - Solutions

Finite Model Theory: First-Order Logic on the Class of Finite Models

Propositional and Predicate Logic. jean/gbooks/logic.html

Predicate Logic: Sematics Part 1

Introduction. Foundations of Computing Science. Pallab Dasgupta Professor, Dept. of Computer Sc & Engg INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler

Loop Formulas for Description Logic Programs

Discovering Spatial and Temporal Links among RDF Data Panayiotis Smeros and Manolis Koubarakis

An Independence Relation for Sets of Secrets

First-Order Logic (FOL)

Querying for Provenance, Trust, Uncertainty and other Meta Knowledge in RDF 1

Computability and Complexity Theory: An Introduction

Overview of Topics. Finite Model Theory. Finite Model Theory. Connections to Database Theory. Qing Wang

Introduction to Complexity Theory. Bernhard Häupler. May 2, 2006

Artificial Intelligence Chapter 7: Logical Agents

Peter Wood. Department of Computer Science and Information Systems Birkbeck, University of London Automata and Formal Languages

LTCS Report. Decidability and Complexity of Threshold Description Logics Induced by Concept Similarity Measures. LTCS-Report 16-07

CS256/Spring 2008 Lecture #11 Zohar Manna. Beyond Temporal Logics

Classical First-Order Logic

A generalization of modal definability

Packet #2: Set Theory & Predicate Calculus. Applied Discrete Mathematics

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas.

CS 486: Lecture 2, Thursday, Jan 22, 2009

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

2.2 Lowenheim-Skolem-Tarski theorems

Part 1: Propositional Logic

The Expressive Power of SPARQL

First-order resolution for CTL

Intelligent Agents. First Order Logic. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University. last change: 19.

Chapter 4: Computation tree logic

DESCRIPTION LOGICS. Paula Severi. October 12, University of Leicester

First-Order Logic. Chapter Overview Syntax

Introduction to Advanced Results

Linear Temporal Logic and Büchi Automata

Lifted Inference: Exact Search Based Algorithms

Model Checking for Modal Intuitionistic Dependence Logic

Propositional and Predicate Logic

Undecidable Problems. Z. Sawa (TU Ostrava) Introd. to Theoretical Computer Science May 12, / 65

Propositional Logic Language

CM10196 Topic 2: Sets, Predicates, Boolean algebras

SPARQL Algebra Sebastian Rudolph

PROPOSITIONAL LOGIC. VL Logik: WS 2018/19

Notes. Corneliu Popeea. May 3, 2013

An Introduction to Description Logics

Informal Statement Calculus

Logic: Propositional Logic (Part I)

Non-Deterministic Time

Classical First-Order Logic

Modal logics and their semantics

Handling Inconsistency in Knowledge Bases

SPATIAL LOGICS WITH CONNECTEDNESS PREDICATES

Notation Index. gcd(a, b) (greatest common divisor) NT-16

Modal Dependence Logic

Inverting Proof Systems for Secrecy under OWA

Preliminaries. Introduction to EF-games. Inexpressivity results for first-order logic. Normal forms for first-order logic

Foundations of Query Languages

The Expressive Power of SPARQL

Transcription:

Incomplete Information in RDF Charalampos Nikolaou and Manolis Koubarakis charnik@di.uoa.gr koubarak@di.uoa.gr Department of Informatics and Telecommunications National and Kapodistrian University of Athens Web Reasoning and Rule Systems (RR) 2013 July 27, 2013

Outline Motivation Previous work The RDF i framework SPARQL query evaluation over RDF i databases An algorithm for certain answer computation Preliminary complexity results Conclusions and future work C. Nikolaou and M. Koubarakis Incomplete Information in RDF 2/36

Motivation Incomplete information is an important issue in many research areas: relational databases, knowledge representation and the semantic web. Incomplete information arises in many practical settings (e.g., sensor data). RDF is often used to represent such data. Even if initial information is complete, incomplete information arises later on (e.g., relational view updates, data integration, data exchange). Although there is much work recently on incomplete information in XML, not much has been done for incomplete information in RDF. C. Nikolaou and M. Koubarakis Incomplete Information in RDF 3/36

Previous work Relational Relations extended to tables with various models of incompleteness [Imielinski/Lipski 84] Complexity results for the associated decision problems [Abiteboul/Kanellakis/Grahne 91] Dependencies and updates [Grahne 91] C. Nikolaou and M. Koubarakis Incomplete Information in RDF 4/36

Previous work Relational Relations extended to tables with various models of incompleteness [Imielinski/Lipski 84] Complexity results for the associated decision problems [Abiteboul/Kanellakis/Grahne 91] Dependencies and updates [Grahne 91] XML Dynamic enrichment of incomplete information [Abiteboul/Segoufin/Vianu 01, 06] General models of incompleteness, query answering, and computational complexity [Barceló/Libkin/Poggi/Sirangelo 09, 10] C. Nikolaou and M. Koubarakis Incomplete Information in RDF 4/36

Previous work (cont d) RDF Blank nodes as existential variables in the RDF standard SPARQL query evaluation under certain answer semantics (Open World Assumption) [Arenas/Pérez 11] Anonymous timestamps in general temporal RDF graphs [Gutierrez/Hurtado/Vaisman 05] General temporal RDF graphs with temporal constraints [Hurtado/Vaisman 06] C. Nikolaou and M. Koubarakis Incomplete Information in RDF 5/36

Previous work (cont d) RDF Blank nodes as existential variables in the RDF standard SPARQL query evaluation under certain answer semantics (Open World Assumption) [Arenas/Pérez 11] Anonymous timestamps in general temporal RDF graphs [Gutierrez/Hurtado/Vaisman 05] General temporal RDF graphs with temporal constraints [Hurtado/Vaisman 06] RDF i : It captures incomplete information for property values using constraints. It is for RDF what the c-tables model is for the relational model. C. Nikolaou and M. Koubarakis Incomplete Information in RDF 5/36

RDF i by example Example hotspot1 type Hotspot. fire1 type Fire. hotspot1 correspondsto fire1. fire1 occuredin _R1. y 19 8 6 P 23 x C. Nikolaou and M. Koubarakis Incomplete Information in RDF 6/36

RDF i by example Example y hotspot1 type Hotspot. fire1 type Fire. hotspot1 correspondsto fire1. fire1 occuredin _R1. 19 8 P R1 NTPP "x 6 x 23 y 8 y 19" 6 23 x C. Nikolaou and M. Koubarakis Incomplete Information in RDF 6/36

RDF i in a nutshell Extension of RDF for capturing incomplete information for property values that exist but are unknown or partially known Partial knowledge captured by constraints using an appropriate constraint language L interpreted over a fixed structure M L Syntax RDF graphs extended to RDF i databases: pair (G, φ) G: RDF graph with a new kind of literals, called e-literals φ: quantifier-free formula of L Semantics Possible world semantics as in [Imielinski/Lipski 84] and [Grahne 91] C. Nikolaou and M. Koubarakis Incomplete Information in RDF 7/36

Constraint languages L Examples ECL Equality constraints interpreted over an infinite domain: x EQ y, x EQ c Blank nodes as existential variables C. Nikolaou and M. Koubarakis Incomplete Information in RDF 8/36

Constraint languages L Examples ECL Equality constraints interpreted over an infinite domain: x EQ y, x EQ c Blank nodes as existential variables dipcl/depcl Difference constraints of the form x y c interpreted over the integers or rationals Incomplete temporal information [Koubarakis 94] C. Nikolaou and M. Koubarakis Incomplete Information in RDF 8/36

Constraint languages L Examples ECL Equality constraints interpreted over an infinite domain: x EQ y, x EQ c Blank nodes as existential variables dipcl/depcl Difference constraints of the form x y c interpreted over the integers or rationals Incomplete temporal information [Koubarakis 94] TCL Topological constraints of non-empty, regular closed subsets of topological space Six binary predicates: DC, EC, PO, EQ, TPP, NTPP x DC y x EC y x y x y x EQ y x TPP y x y y x x PO y x y x NTPP y y x C. Nikolaou and M. Koubarakis Incomplete Information in RDF 8/36

Constraint languages L Examples ECL Equality constraints interpreted over an infinite domain: x EQ y, x EQ c Blank nodes as existential variables TCL Topological constraints of non-empty, regular closed subsets of topological space Six binary predicates: DC, EC, PO, EQ, TPP, NTPP dipcl/depcl Difference constraints of the form x y c interpreted over the integers or rationals Incomplete temporal information [Koubarakis 94] PCL TCL plus constant symbols representing polygons in Q 2 e.g., r NTPP "x y 0 x 1 y 0" C. Nikolaou and M. Koubarakis Incomplete Information in RDF 8/36

RDF i : Vocabulary RDF RDF i L I (IRIs) B (blank nodes) L (literals) M (datatype map) I B L C (literals) U (e-literals) M A (datatypes) constants variables set of sorts C. Nikolaou and M. Koubarakis Incomplete Information in RDF 9/36

RDF i : Vocabulary RDF RDF i L I (IRIs) B (blank nodes) L (literals) M (datatype map) I B L C (literals) U (e-literals) M A (datatypes) constants variables set of sorts M L interprets the constants of L in agreement with function L2V of M C. Nikolaou and M. Koubarakis Incomplete Information in RDF 9/36

RDF i : Syntax I subject predicate object I B I B L C U θ I : IRIs B: blank nodes L : literals C : constants of L U: e-literals Definition (s, p, o) (I B) I (I B L C U) is called an e-triple C. Nikolaou and M. Koubarakis Incomplete Information in RDF 10/36

RDF i : Syntax I subject predicate object I B I B L C U θ I : IRIs B: blank nodes L : literals C : constants of L U: e-literals Definition (s, p, o) (I B) I (I B L C U) is called an e-triple If t is an e-triple and θ a conjunction of L-constraints, then the pair (t, θ) is called a conditional triple C. Nikolaou and M. Koubarakis Incomplete Information in RDF 10/36

RDF i : Syntax I subject predicate object I B I B L C U θ I : IRIs B: blank nodes L : literals C : constants of L U: e-literals Definition (s, p, o) (I B) I (I B L C U) is called an e-triple If t is an e-triple and θ a conjunction of L-constraints, then the pair (t, θ) is called a conditional triple A set of conditional triples is called a conditional graph C. Nikolaou and M. Koubarakis Incomplete Information in RDF 10/36

RDF i : Syntax (cont d) Definition An RDF i database D is a pair D = (G, φ) where G is a conditional graph and φ a Boolean combination of L-constraints (global constraint) Example hotspot1 type Hotspot. fire1 type Fire. hotspot1 correspondsto fire1. fire1 occuredin _R1. y 19 8 P R1 NTPP "x 6 x 23 y 8 y 19" 6 23 x C. Nikolaou and M. Koubarakis Incomplete Information in RDF 11/36

RDF i : Semantics RDF i database D Rep set of RDF graphs G 1, G 2,. C. Nikolaou and M. Koubarakis Incomplete Information in RDF 12/36

RDF i : Semantics RDF i database D Rep set of RDF graphs G 1, G 2,. Definition A valuation v is a function from U to C assigning to each e-literal from U a constant from C Definition Let G be a conditional graph and v a valuation. Then v(g) denotes the RDF graph {v(t) (t, θ) G and M L = v(θ)} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 12/36

RDF i : Semantics (cont d) From RDF i databases to sets of RDF graphs An RDF i database D = (G, φ) corresponds to the following set of RDF graphs: { Rep(D) = H there exists valuation v and RDF graph H } such that M L = v(φ) and H v(g) Relation captures the OWA semantics An RDF i database corresponds to an infinite number of RDF graphs C. Nikolaou and M. Koubarakis Incomplete Information in RDF 13/36

Question How can we evaluate a query q over an RDF i database D (compute q D )? C. Nikolaou and M. Koubarakis Incomplete Information in RDF 14/36

Question How can we evaluate a query q over an RDF i database D (compute q D )? Semantic definition q Rep(D) = { q G G Rep(D)} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 14/36

Question How can we evaluate a query q over an RDF i database D (compute q D )? Semantic definition q Rep(D) = { q G G Rep(D)} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 14/36

Question How can we evaluate a query q over an RDF i database D (compute q D )? Semantic definition q Rep(D) = { q G G Rep(D)} In practice? Start with SPARQL algebra of [Pérez/Arenas/Gutierrez 06] with set semantics Define SPARQL query evaluation for RDF i databases C. Nikolaou and M. Koubarakis Incomplete Information in RDF 14/36

From mappings to e-mappings... {?F fire1,?s x 1 x 2 y 1 y 2 } C. Nikolaou and M. Koubarakis Incomplete Information in RDF 15/36

From mappings to e-mappings... {?F fire1,?s x 1 x 2 y 1 y 2 } {?F fire1,?s R1} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 15/36

... to conditional mappings {?F fire1,?s x 1 x 2 y 1 y 2 } C. Nikolaou and M. Koubarakis Incomplete Information in RDF 16/36

... to conditional mappings ( ) {?F fire1,?s x 1 x 2 y 1 y 2 }, true C. Nikolaou and M. Koubarakis Incomplete Information in RDF 16/36

... to conditional mappings ( ) {?F fire1,?s R1}, R1 EQ x 1 x 2 y 1 y 2 C. Nikolaou and M. Koubarakis Incomplete Information in RDF 16/36

From compatible mappings to possibly compatible mappings Join of conditional mappings ( ) {?F fire1,?s R1}, R1 EQ x 1 x 2 y 1 y 2 ( ) {?S R2}, true C. Nikolaou and M. Koubarakis Incomplete Information in RDF 17/36

From compatible mappings to possibly compatible mappings Join of conditional mappings ( ) {?F fire1,?s R1}, R1 EQ x 1 x 2 y 1 y 2 ( ) {?S R2}, true C. Nikolaou and M. Koubarakis Incomplete Information in RDF 17/36

From compatible mappings to possibly compatible mappings Join of conditional mappings ( ) {?F fire1,?s R1}, R1 EQ x 1 x 2 y 1 y 2 ( ) {?S R2}, true = C. Nikolaou and M. Koubarakis Incomplete Information in RDF 17/36

From compatible mappings to possibly compatible mappings Join of conditional mappings ( ) {?F fire1,?s R1}, R1 EQ x 1 x 2 y 1 y 2 ( ) {?S R2}, true = ( {?F fire1,?s R1}, true R1 EQ R2 ) R1 EQ x 1 x 2 y 1 y 2 C. Nikolaou and M. Koubarakis Incomplete Information in RDF 17/36

Operations on conditional mappings Let Ω 1 and Ω 2 be sets of conditional mappings. We can define the operation of: Join (Ω 1 Ω 2 ) Union (Ω 1 Ω 2 ) Difference (Ω 1 \ Ω 2 ) Left-outer join (Ω 1 1Ω 2 ) C. Nikolaou and M. Koubarakis Incomplete Information in RDF 18/36

Graph pattern evaluation If D is an RDF i database and P a graph pattern, the evaluation of P over D is defined recursively: C. Nikolaou and M. Koubarakis Incomplete Information in RDF 19/36

Graph pattern evaluation If D is an RDF i database and P a graph pattern, the evaluation of P over D is defined recursively: base case: P is the triple pattern t recursion: P is (P 1 AND P 2 ) P 1 D P 2 D P is (P 1 UNION P 2 ) P 1 D P 2 D P is (P 1 OPT P 2 ) P 1 D 1 P 2 D C. Nikolaou and M. Koubarakis Incomplete Information in RDF 19/36

Graph pattern evaluation If D is an RDF i database and P a graph pattern, the evaluation of P over D is defined recursively: base case: P is the triple pattern t recursion: P is (P 1 AND P 2 ) P 1 D P 2 D P is (P 1 UNION P 2 ) P 1 D P 2 D P is (P 1 OPT P 2 ) P 1 D 1 P 2 D P is (P 1 FILTER R) where R is a conjunction of L-constraints C. Nikolaou and M. Koubarakis Incomplete Information in RDF 19/36

Graph pattern evaluation If D is an RDF i database and P a graph pattern, the evaluation of P over D is defined recursively: base case: P is the triple pattern t recursion: P is (P 1 AND P 2 ) P 1 D P 2 D P is (P 1 UNION P 2 ) P 1 D P 2 D P is (P 1 OPT P 2 ) P 1 D 1 P 2 D P is (P 1 FILTER R) where R is a conjunction of L-constraints C. Nikolaou and M. Koubarakis Incomplete Information in RDF 19/36

Triple pattern evaluation (case 1) Example Database D fire1 occuredin _R1. Query q?f occuredin?r R1 NTPP "x 6 x 23 y 8 y 19" C. Nikolaou and M. Koubarakis Incomplete Information in RDF 20/36

Triple pattern evaluation (case 1) Example Database D fire1 occuredin _R1. Query q?f occuredin?r R1 NTPP "x 6 x 23 y 8 y 19" Answer (set of conditional mappings) { ({?F ) } q D = fire1,?r R1}, true C. Nikolaou and M. Koubarakis Incomplete Information in RDF 20/36

Triple pattern evaluation (case 2) Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q?f occuredin "x 1 x 2 y 1 y 2" C. Nikolaou and M. Koubarakis Incomplete Information in RDF 21/36

Triple pattern evaluation (case 2) Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q?f occuredin "x 1 x 2 y 1 y 2" Answer (set of conditional mappings) { ({?F ) } q D = fire1}, R1 EQ "x 1 x 2 y 1 y 2" C. Nikolaou and M. Koubarakis Incomplete Information in RDF 21/36

Evaluation of FILTER graph patterns Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q?f occuredin?r. FILTER (?R NTPP "x 1 x 2 y 1 y 2") C. Nikolaou and M. Koubarakis Incomplete Information in RDF 22/36

Evaluation of FILTER graph patterns Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q?f occuredin?r. FILTER (?R NTPP "x 1 x 2 y 1 y 2") Answer q D = { ({?F fire1,?r R1}, R1 NTPP "x 1 x 2 y 1 y 2" )} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 22/36

SELECT queries Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q SELECT?F WHERE {?F occuredin?r. FILTER (?R NTPP "x 1 x 2 y 1 y 2")} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 23/36

SELECT queries Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q SELECT?F WHERE {?F occuredin?r. FILTER (?R NTPP "x 1 x 2 y 1 y 2")} Answer (set of conditional mappings) q D = { ({?F fire1}, R1 NTPP "x 1 x 2 y 1 y 2" )} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 23/36

CONSTRUCT queries Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Query q CONSTRUCT {?F type Fire } WHERE {?F occuredin?r } C. Nikolaou and M. Koubarakis Incomplete Information in RDF 24/36

CONSTRUCT queries Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Answer (RDF i database) Query q CONSTRUCT {?F type Fire } WHERE {?F occuredin?r } D = (G, φ) fire1 type Fire. R1 NTPP "x 6 x 23 y 8 y 19" C. Nikolaou and M. Koubarakis Incomplete Information in RDF 24/36

CONSTRUCT queries Example Database D fire1 occuredin _R1. R1 NTPP "x 6 x 23 y 8 y 19" Answer (RDF i database) Query q CONSTRUCT {?F type Fire } WHERE {?F occuredin?r } D = (G, φ) fire1 type Fire. R1 NTPP "x 6 x 23 y 8 y 19" Closure property C. Nikolaou and M. Koubarakis Incomplete Information in RDF 24/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute Rep D G q q D Rep G C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute Rep D G q q D Rep G C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute Rep D G q q D Rep G C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute Rep D G q q OR Rep( q D ) = q Rep(D) D Rep G C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute. Does it? Rep D G q q OR Rep( q D ) = q Rep(D) D Rep G C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Correctness of SPARQL query evaluation for RDF i Does query evaluation compute the correct answer (the answer agrees with the semantic definition)? The following diagram should commute. Does it? Rep D G q q OR Rep( q D ) = q Rep(D) D Rep G C. Nikolaou and M. Koubarakis Incomplete Information in RDF 25/36

Certain answer to the rescue Definition The certain answer to query q over a set of RDF graphs G is set { q G G G} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 26/36

Certain answer to the rescue Definition The certain answer to query q over a set of RDF graphs G is set { q G G G} Using the notion of certain answer we can relax the earlier equality requirement to one that uses Q-equivalence. C. Nikolaou and M. Koubarakis Incomplete Information in RDF 26/36

Certain answer to the rescue Definition The certain answer to query q over a set of RDF graphs G is set { q G G G} Using the notion of certain answer we can relax the earlier equality requirement to one that uses Q-equivalence. Definition Let Q be a fragment of SPARQL. Two sets of RDF graphs G, H will be Q-equivalent (denoted by G Q H) if they give the same certain answer to every query q Q { q G G G} = { q H H H} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 26/36

Representation system Let D be the set of all RDF i databases G be the set of all RDF graphs Rep : D G be a function determining the set of possible RDF graphs corresponding to an RDF i database, and Q be a fragment of SPARQL D, Rep, Q is a representation system if for all D D and all q Q, there exists an RDF i database q D such that Rep( q D ) Q q Rep(D) C. Nikolaou and M. Koubarakis Incomplete Information in RDF 27/36

Representation system Let D be the set of all RDF i databases G be the set of all RDF graphs Rep : D G be a function determining the set of possible RDF graphs corresponding to an RDF i database, and Q be a fragment of SPARQL D, Rep, Q is a representation system if for all D D and all q Q, there exists an RDF i database q D such that Rep( q D ) Q q Rep(D) Are there interesting fragments Q of SPARQL that lead to a representation system? C. Nikolaou and M. Koubarakis Incomplete Information in RDF 27/36

Representation systems for RDF i Theorem The following fragments of SPARQL can give us representation systems for RDF i (with D and Rep as defined): Q C AUF : CONSTRUCT queries using only AND, UNION, and FILTER graph patterns, and without blank nodes in their templates Q C WD : CONSTRUCT queries using only well-designed graph patterns, and without blank nodes in their templates Well-designed graph patterns [Pérez/Arenas/Gutierrez 06] AND, FILTER, OPT fragment P FILTER R: safe P 1 OPT P 2 : variables in P 2 are properly scoped C. Nikolaou and M. Koubarakis Incomplete Information in RDF 28/36

Representation systems for RDF i (cont d) Monotonicity Definition A fragment Q of SPARQL is monotone if for every q Q and RDF graphs G and H such that G H, it is q G q H. Proposition [Arenas/Pérez 11] The fragment of SPARQL corresponding to AND, UNION, and FILTER graph patterns is monotone. The fragment of SPARQL corresponding to well-designed graph patterns is weakly-monotone ( ). Proposition Fragments Q C AUF and QC WD are monotone. C. Nikolaou and M. Koubarakis Incomplete Information in RDF 29/36

Computing certain answers Representation systems guarantee correctness of query evaluation for RDF i and SPARQL Query evaluation computes an RDF i database q D = D = (G, φ) How could we compute the certain answer? Rep( q D ) Rep( q D ) is infinite! C. Nikolaou and M. Koubarakis Incomplete Information in RDF 30/36

Computing certain answers (cont d) Theorem For D = (G, φ) and q from Q C AUF or QC WD, the certain answer of q over D can be computed as follows: i) compute q D = D q = (G q, φ), ii) compute the RDF i database (H q, φ) = ((D q ) EQ ), and iii) return the set of RDF triples {(s, p, o) ((s, p, o), θ) H q such that φ = θ and o / U} C. Nikolaou and M. Koubarakis Incomplete Information in RDF 31/36

The certainty problem CERT (q, H, D) Input An RDF graph H, a CONSTRUCT query q, and an RDF i database D Question Does H belong to the certain answer of q over D? H q Rep(D)? C. Nikolaou and M. Koubarakis Incomplete Information in RDF 32/36

The certainty problem CERT (q, H, D) Input An RDF graph H, a CONSTRUCT query q, and an RDF i database D Question Does H belong to the certain answer of q over D? H q Rep(D)? We study the data complexity of CERT (q, H, D) H and D are part of the input q is fixed C. Nikolaou and M. Koubarakis Incomplete Information in RDF 32/36

Deciding the certainty problem Theorem CERT (q, H, D) is equivalent to deciding whether formula is true ( l)(φ( l) Θ(t, q, D, l)) t H l is the vector of all e-literals in D Θ(t, q, D, l) is of the form θ 1 θ k, where θ i is a conjunction of L-constraints C. Nikolaou and M. Koubarakis Incomplete Information in RDF 33/36

Computational complexity Problem L data complexity CERT (q, H, D) ECL/diPCL/dePCL/RCL conp-complete TCL/PCL (RCC-5) EXPTIME C. Nikolaou and M. Koubarakis Incomplete Information in RDF 34/36

Computational complexity Problem L data complexity CERT (q, H, D) ECL/diPCL/dePCL/RCL conp-complete TCL/PCL (RCC-5) EXPTIME Problem combined complexity data complexity SPARQL PSPACE-complete SPARQL AUF NP-complete SPARQL WD conp-complete LOGSPACE C. Nikolaou and M. Koubarakis Incomplete Information in RDF 34/36

Conclusions RDF i framework Modeling of incomplete information for property values Formal semantics through possible worlds semantics SPARQL query evaluation and certain answer semantics Two representation systems for RDF i and SPARQL Algorithm for certain answer computation Preliminary complexity analysis C. Nikolaou and M. Koubarakis Incomplete Information in RDF 35/36

Future work More general models of incomplete information (subject, predicate) More refined complexity results Scalable implementation when L expresses topological constraints with/without constants (TCL/PCL) Connection with query processing for the topology vocabulary extension of GeoSPARQL Probabilistic extension to RDF i Data integration theory for linked data (only practice exists so far) Connection to geospatial OBDA using DL logics C. Nikolaou and M. Koubarakis Incomplete Information in RDF 36/36

Thank you

Constraint languages L Properties of L Many-sorted first-order language Interpreted over a fixed (intended) structure M L EQ: distinguished equality predicate L-constraints: quantifier-free formulae of L Weakly closed under negation: the negation of every atomic L-constraint is equivalent to a disjunction of L-constraints

Correctness of SPARQL query evaluation for RDF i An easy negative example Example (classical RDF - OWA) s p o. D CONSTRUCT { s?p?o } WHERE { s?p?o } q

Correctness of SPARQL query evaluation for RDF i An easy negative example Example (classical RDF - OWA) s p o. D CONSTRUCT { s?p?o } WHERE { s?p?o } q Then, q D = D

Correctness of SPARQL query evaluation for RDF i (cont d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D)

Correctness of SPARQL query evaluation for RDF i (cont d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D) Rep( q D ) = { { } { (s, p, o) (s, p, o), (c, d, e) } { (s, p, o), (s, b, c) } },

Correctness of SPARQL query evaluation for RDF i (cont d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D) Rep( q D ) = { { } { (s, p, o) (s, p, o), (c, d, e) q Rep(D) = { { } { (s, p, o) (s, p, o), (s, b, c) } { (s, p, o), (s, b, c) } }, } },

Correctness of SPARQL query evaluation for RDF i (cont d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D) Rep( q D ) = { { } { (s, p, o) (s, p, o), (c, d, e) q Rep(D) = { { } { (s, p, o) (s, p, o), (s, b, c) } { (s, p, o), (s, b, c) } }, } }, There is no g q Rep(D) containing the triple (c, d, e)!

Correctness of SPARQL query evaluation for RDF i (cont d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D) Rep( q D ) = { { } { (s, p, o) (s, p, o), (c, d, e) q Rep(D) = { { } { (s, p, o) (s, p, o), (s, b, c) } { (s, p, o), (s, b, c) } }, } }, There is no g q Rep(D) containing the triple (c, d, e)! This would work if RDF made the CWA

Correctness of SPARQL query evaluation for RDF i (cont d) An easy negative example Example Let us compare the the set of graphs represented by q D with q Rep(D) Rep( q D ) = { { } { (s, p, o) (s, p, o), (c, d, e) q Rep(D) = { { } { (s, p, o) (s, p, o), (s, b, c) } { (s, p, o), (s, b, c) } }, } }, There is no g q Rep(D) containing the triple (c, d, e)! This would work if RDF made the CWA We know this already from the relational case [Imielinski/Lipski 84]

Computing certain answers Definitions Definition (EQ-completion) The EQ-completed form of D = (G, φ), denoted by D EQ = (G EQ, φ), is taken from D by replacing all e-literals l U appearing in G by the constant c C such that φ = l EQ c

Computing certain answers Definitions Definition (EQ-completion) The EQ-completed form of D = (G, φ), denoted by D EQ = (G EQ, φ), is taken from D by replacing all e-literals l U appearing in G by the constant c C such that φ = l EQ c Definition (Normalization) The normalized form of D is the RDF i database D = (G, φ) where G is the set {(t, θ) (t, θ i ) G for all i = 1... n, and θ is i θ i } G = {(t, θ 1 ), (t, θ 2 ), (t, θ )} G = {(t, θ 1 θ 2 ), (t, θ )}