SPARQL Formalization

Similar documents
arxiv:cs/ v1 [cs.db] 26 May 2006

The Expressive Power of SPARQL

SPARQL Extensions and Outlook

Incomplete Information in RDF

The Expressive Power of SPARQL

Designing a Query Language for RDF: Marrying Open and Closed Worlds

nsparql: A Navigational Language for RDF 1

Nested Constructs vs. Sub-Selects in SPARQL

On the Semantics of SPARQL Queries with Optional Matching under Entailment Regimes

The Expressive Power of SPARQL

Designing a Query Language for RDF: Marrying Open and Closed Worlds

Querying for Provenance, Trust, Uncertainty and other Meta Knowledge in RDF 1

Semantic Reasoning with SPARQL in Heterogeneous Multi-Context Systems

Incremental SPARQL Evaluation for Query Answering on Linked Data

Representing and Querying Validity Time in RDF and OWL: A Logic-Based Approach

SPARQL Algebra Sebastian Rudolph

Minimal Deductive Systems for RDF

Exchanging More than Complete Data

Preserving Information Content in RDF using Bounded Homomorphisms

Foundations of SPARQL Query Optimization

Unit 4: SPARQL1.0 in Detail & SPARQL s formal semantics. Axel Polleres. Siemens AG Österreich All rights reserved.

RDFLog: It s like Datalog for RDF

On the Satisfiability Problem of Patterns in SPARQL 1.1

Provenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley

Query-based Linked Data Anonymization

Preliminaries. Introduction to EF-games. Inexpressivity results for first-order logic. Normal forms for first-order logic

Data Dependencies in the Presence of Difference

OWL Basics. Technologies for the Semantic Web. Building a Semantic Web. Ontology

Ordering, Indexing, and Searching Semantic Data: A Terminology Aware Index Structure

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

Semantic Web SPARQL. Gerd Gröner, Matthias Thimm. July 16,

Online Appendix to The Recovery of a Schema Mapping: Bringing Exchanged Data Back

Question Answering on Statistical Linked Data

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and

SPARQL Rewriting for Query Mediation over Mapped Ontologies

Relational Algebra & Calculus

Efficient OLAP Operations For RDF Analytics

I N F S Y S R E S E A R C H R E P O R T A QUERY MODEL TO CAPTURE EVENT PATTERN MATCHING IN RDF STREAM PROCESSING QUERY LANGUAGES

First-order logic Syntax and semantics

INTRODUCTION TO RELATIONAL DATABASE SYSTEMS

Lecture Notes: Axiomatic Semantics and Hoare-style Verification

Abstracting real-valued parameters in parameterised boolean equation systems

RDF and Logic: Reasoning and Extension

Annotation algebras for RDFS

Advanced DB CHAPTER 5 DATALOG

Enhancing the Updatability of Projective Views

Equational Logic. Chapter Syntax Terms and Term Algebras

arxiv: v1 [cs.db] 22 Mar 2016 ABSTRACT

Declarative Programming for Agent Applications

Modal Logics. Most applications of modal logic require a refined version of basic modal logic.

Distributed Query Processing in the Presence of Blank Nodes

Flexible Querying for SPARQL

03 Review of First-Order Logic

First-Order Logic. 1 Syntax. Domain of Discourse. FO Vocabulary. Terms

Relational Algebra and Calculus

First Order Logic (FOL) 1 znj/dm2017

Approximation and Relaxation of Semantic Web Path Queries

Equivalence of SQL Queries In Presence of Embedded Dependencies

Translating XML Web Data into Ontologies

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

OWL Semantics. COMP60421 Sean Bechhofer University of Manchester

The Lambek-Grishin calculus for unary connectives

Distributed query processing in the presence of blank nodes.

Existential Second-Order Logic and Modal Logic with Quantified Accessibility Relations

SPARQL Query Containment under RDFS Entailment Regime

A Logic Primer. Stavros Tripakis University of California, Berkeley. Stavros Tripakis (UC Berkeley) EE 144/244, Fall 2014 A Logic Primer 1 / 35

SEMANTICS AND COMPLEXITY OF SPARQL 1.1 PROPERTY PATHS

GeoQuery Tool to Pose Spatial Queries over the Web

First-Order Logic. Chapter Overview Syntax

Beyond First-Order Logic

Databases 2011 The Relational Algebra

Representability in DL-Lite R Knowledge Base Exchange

A Logic Primer. Stavros Tripakis University of California, Berkeley

INTEGRITY CONSTRAINTS FOR THE SEMANTIC WEB: AN OWL 2 DL EXTENSION

Classical First-Order Logic

On Compensation Primitives as Adaptable Processes

Structured Descriptions & Tradeoff Between Expressiveness and Tractability

Lecture 3: Semantics of Propositional Logic

Semantics and Validation of Recursive SHACL [Extended Version]

Plan of the lecture. G53RDB: Theory of Relational Databases Lecture 2. More operations: renaming. Previous lecture. Renaming.

CS632 Notes on Relational Query Languages I

Adding ternary complex roles to ALCRP(D)

Outline. A recursive function follows the structure of inductively-defined data.

Expressive number restrictions in Description Logics

A NOTE ON COMPOSITIONALITY IN THE FIRST ORDER LANGUAGE

On the satisfiability problem for a 4-level quantified syllogistic and some applications to modal logic

Relations to first order logic

Outline. Overview. Syntax Semantics. Introduction Hilbert Calculus Natural Deduction. 1 Introduction. 2 Language: Syntax and Semantics

Lecture 2: Syntax. January 24, 2018

COMP 409: Logic Homework 5

Lightweight Description Logics: DL-Lite A and EL ++

EgGS - The Energy Game Strategizer

Categories, Proofs and Programs

Finite Universes. L is a fixed-length language if it has length n for some

Algebraic Trace Theory

Syntax. Notation Throughout, and when not otherwise said, we assume a vocabulary V = C F P.

Topic-based Selectivity Estimation for Hybrid Queries over RDF Graphs Technical Report

Knowledge Base Exchange: The Case of OWL 2 QL

Handbook of Logic and Proof Techniques for Computer Science

CS2742 midterm test 2 study sheet. Boolean circuits: Predicate logic:

Transcription:

SPARQL Formalization Marcelo Arenas, Claudio Gutierrez, Jorge Pérez Department of Computer Science Pontificia Universidad Católica de Chile Universidad de Chile Center for Web Research http://www.cwr.cl M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 1 / 34

SPARQL: A simple RDF query language SELECT?Name?Email WHERE {?X :name?name?x :email?email } The semantics of simple SPARQL queries is easy to understand, at least intuitively. Give me the name and email of the resources in the datasource M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 2 / 34

But things can become more complex... Interesting features of pattern matching on graphs Grouping Optional parts Nesting Union of patterns Filtering... { { P1 P2 } OPTIONAL { P5 } } { P3 P4 } OPTIONAL { P7 } } OPTIONAL { P8 } } } } UNION { P9 } FILTER ( R ) } M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 3 / 34

A formal semantics for SPARQL is needed. A formal approach would be beneficial Clarifying corner cases Helping in the implementation process Providing sound foundations We will see: A formal compositional semantics based on [PAG06: Semantics and Complexity of SPARQL] This formalization is the starting point of the official semantics of the SPARQL language by the W3C. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 4 / 34

Outline Motivation Basic Syntax Semantics Datasets Query result forms Dealing with bnodes Dealing with duplicates M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 5 / 34

First of all, a simplified algebraic syntax Triple patterns: RDF triple + variables (no bnodes for now) (?X, name,?name) The base case for the algebra is a set of triple patterns {t 1,t 2,...,t k }. This is called basic graph pattern (BGP). Example { (?X, name,?name), (?X, email,?email) } M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 6 / 34

First of all, a simplified algebraic syntax (cont.) We consider initially three basic operators: AND, UNION, OPT. We will use them to construct graph pattern expressions from basic graph patterns. A SPARQL graph pattern: ((({t 1,t 2 } AND t 3 ) OPT {t 4,t 5 }) AND (t 6 UNION {t 7,t 8 })) it is a full parenthesized expression Full parenthesized expressions give us explicit precedence/association. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 7 / 34

Mappings: building block for the semantics Definition A mapping is a partial function from variables to RDF terms. Given a mapping µ and a basic graph pattern P: dom(µ): the domain of µ. µ(p): the set obtained from P replacing the variables according to µ Example µ = {?X R 1,?Y R 2,?Name john,?email J@ed.ex} P = {(?X, name,?name), (?X, email,?email)} µ(p) = {(R 1, name, john), (R 1, email, J@ed.ex)} M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 8 / 34

The semantics of basic graph pattern Definition The evaluation of the BGP P over a graph G, denoted by [[P]] G, is the set of all mappings µ such that: dom(µ) is exactly the set of variables occurring in P µ(p) G M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 9 / 34

Example G (R 1, name, john) (R 1, email, J@ed.ex) (R 2, name, paul) [[{(?X, name,?y )}]] G { µ1 = {?X R 1,?Y john} µ 2 = {?X R 2,?Y paul} }?X?Y µ 1 R 1 john µ 2 R 2 paul [[{(?X, name,?y ), (?X, email,?z)}]] G { µ = {?X R1,?Y john,?z J@ed.ex} }?X?Y?Z µ R 1 john J@ed.ex M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 10 / 34

Example G (R 1, name, john) (R 1, email, J@ed.ex) (R 2, name, paul) [[{(R 1, webpage,?w)}]] G { } [[{(R 3, name, ringo)}]] G { } [[{(R 2, name, paul)}]] G { µ = { } } [[{ }]] G { µ = { } } M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 11 / 34

Compatible mappings: mappings that can be merged. Definition The mappings µ 1, µ 2 are compatibles iff they agree in their shared variables: µ 1 (?X) = µ 2 (?X) for every?x dom(µ 1 ) dom(µ 2 ). µ 1 µ 2 is also a mapping. Example?X?Y?U?V µ 1 R 1 john µ 2 R 1 J@edu.ex µ 3 P@edu.ex R 2 µ 1 µ 2 R 1 john J@edu.ex µ 1 µ 3 R 1 john P@edu.ex R 2 µ = { } is compatible with every mapping. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 12 / 34

Sets of mappings and operations Let M 1 and M 2 be sets of mappings: Definition Join: M 1 M 2 {µ 1 µ 2 µ 1 M 1,µ 2 M 2, and µ 1,µ 2 are compatibles} extending mappings in M 1 with compatible mappings in M 2 will be used to define AND Definition Union: M 1 M 2 {µ µ M 1 or µ M 2 } mappings in M 1 plus mappings in M 2 (the usual set union) will be used to define UNION M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 13 / 34

Sets of mappings and operations Definition Difference: M 1 M 2 {µ M 1 for all µ M 2, µ and µ are not compatibles} mappings in M 1 that cannot be extended with mappings in M 2 Definition Left outer join: M 1 M 2 = (M 1 M 2 ) (M 1 M 2 ) extension of mappings in M 1 with compatible mappings in M 2 plus the mappings in M 1 that cannot be extended. will be used to define OPT M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 14 / 34

Semantics of general graph patterns Definition Given a graph G the evaluation of a pattern is recursively defined [[(P 1 AND P 2 )]] G = [[P 1 ]] G [[P 2 ]] G [[(P 1 UNION P 2 )]] G = [[P 1 ]] G [[P 2 ]] G [[(P 1 OPT P 2 )]] G = [[P 1 ]] G [[P 2 ]] G the base case is the evaluation of a BGP. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 15 / 34

Example (AND) G : (R 1, name, john) (R 2, name, paul) (R 3, name, ringo) (R 1, email, J@ed.ex) (R 3, email, R@ed.ex) (R 3, webpage, www.ringo.com) [[{(?X, name,?n)} AND {(?X, email,?e)}]] G [[{(?X, name,?n)}]] G?X?N µ 1 R 1 john µ 2 R 2 paul µ 3 R 3 ringo [[{(?Y, email,?e)}]] G?X?E µ 4 R 1 J@ed.ex µ 5 R 3 R@ed.ex?X?N?E µ 1 µ 4 R 1 john J@ed.ex µ 3 µ 5 R 3 ringo R@ed.ex M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 16 / 34

Example (OPT) G : (R 1, name, john) (R 2, name, paul) (R 3, name, ringo) (R 1, email, J@ed.ex) (R 3, email, R@ed.ex) (R 3, webpage, www.ringo.com) [[{(?X, name,?n)} OPT {(?X, email,?e)}]] G [[{(?X, name,?n)}]] G?X?N µ 1 R 1 john µ 2 R 2 paul µ 3 R 3 ringo [[{(?X, email,?e)}]] G?X?E µ 4 R 1 J@ed.ex µ 5 R 3 R@ed.ex?X?N?E µ 1 µ 4 R 1 john J@ed.ex µ 3 µ 5 R 3 ringo R@ed.ex µ 2 R 2 paul M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 17 / 34

Example (UNION) G : (R 1, name, john) (R 2, name, paul) (R 3, name, ringo) (R 1, email, J@ed.ex) (R 3, email, R@ed.ex) (R 3, webpage, www.ringo.com) [[{(?X, email,?info)} UNION {(?X, webpage,?info)}]] G [[{(?X, email,?info)}]] G [[{(?X, webpage,?info)}]] G?X?Info µ 1 R 1 J@ed.ex µ 2 R 3 R@ed.ex?X?Info µ 3 R 3 www.ringo.com?x?info µ 1 R 1 J@ed.ex µ 2 R 3 R@ed.ex µ 3 R 3 www.ringo.com M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 18 / 34

Boolean filter expressions (value constraints) In filter expressions we consider the equality = among variables and RDF terms a unary predicate bound boolean combinations (,, ) A mapping µ satisfies?x = c if µ(?x) = c?x =?Y if µ(?x) = µ(?y ) bound(?x) if µ is defined in?x, i.e.?x dom(µ) M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 19 / 34

Satisfaction of value constraints If P is a graph pattern and R is a value constraint then (P FILTER R) is also a graph pattern. Definition Given a graph G [[(P FILTER R)]] G = {µ [[P]] G µ satisfies R} i.e. mappings in the evaluation of P that satisfy R. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 20 / 34

Example (FILTER) G : (R 1, name, john) (R 2, name, paul) (R 3, name, ringo) (R 1, email, J@ed.ex) (R 3, email, R@ed.ex) (R 3, webpage, www.ringo.com) [[({(?X, name,?n)} FILTER (?N = ringo?n = paul))]] G?X?N µ 1 R 1 john µ 2 R 2 paul µ 3 R 3 ringo?n = ringo?n = paul?x?n µ 2 R 2 paul µ 3 R 3 ringo M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 21 / 34

Example (FILTER) G : (R 1, name, john) (R 2, name, paul) (R 3, name, ringo) (R 1, email, J@ed.ex) (R 3, email, R@ed.ex) (R 3, webpage, www.ringo.com) [[(({(?X, name,?n)} OPT {(?X, email,?e)}) FILTER bound(?e))]] G?X?N?E µ 1 µ 4 R 1 john J@ed.ex µ 3 µ 5 R 3 ringo R@ed.ex µ 2 R 2 paul bound(?e)?x?n µ 2 R 2 paul M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 22 / 34

FILTER: differences with the official specification We restrict to the case in which all variables in R are mentioned in P. This restriction is not imposed in the official specification by W3C. The semantics without the restriction does not modify the expressive power of the language. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 23 / 34

SPARQL Datasets One of the interesting features of SPARQL is that a query may retrieve data from different sources. Definition A SPARQL dataset is a set D = {G 0, u 1,G 1, u 2,G 2,..., u n,g n } G 0 is the default graph, u i,g i are named graphs name(d) = {u 1,u 2,...,u n } d D is a function such d D (u i ) = G i. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 24 / 34

The GRAPH operator if u is an IRI,?X is a variable and P is a graph pattern, then (u GRAPH P) is a graph pattern (?X GRAPH P) is a graph pattern GRAPH will permit us to dynamically change the graph against which our pattern is evaluated. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 25 / 34

Semantics of GRAPH Definition Given a dataset D and a graph pattern P [[(u GRAPH P)]] G = [[P]] dd (u) ( [[(?X GRAPH P)]] G = [[P]] dd (u) u name(d) ) {{?X u}} Definition The evaluation of a general pattern P against a dataset D, denoted by [[P]] D, is the set [[P]] G0 where G 0 is the default graph in D. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 26 / 34

Example (GRAPH) G 0 : tb, G 1 : trs, G 2 : D (R 1, name, john) (R 1, email, J@ed.ex) (R 2, name, paul) (R 4, name, mick) (R 5, name, keith) (R 4, email, M@ed.ex) (R 5, email, K@ed.ex) [[( trs GRAPH {(?X, name,?n)})]] D [[( trs GRAPH {(?X, name,?n)})]] G0 [[{(?X, name,?n)}]] G2?X?N µ 1 R 4 mick µ 2 R 5 keith M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 27 / 34

Example (GRAPH) G 0 : tb, G 1 : trs, G 2 : D (R 1, name, john) (R 1, email, J@ed.ex) (R 2, name, paul) (R 4, name, mick) (R 5, name, keith) (R 4, email, M@ed.ex) (R 5, email, K@ed.ex)?X?N µ 1 R 1 john µ 2 R 2 paul [[(?G GRAPH {(?X, name,?n)})]] D [[{(?X, name,?n)}]] G1 [[{(?X, name,?n)}]] G2 {{?G tb}}?g?x?n tb R 1 john tb R 2 paul trs R 4 mick trs R 5 keith {{?G tb}} {{?G trs}}?x?n µ 3 R 4 mick µ 4 R 5 keith {{?G trs}} M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 28 / 34

SELECT Up to this point we have concentrated in the body of a SPARQL query, i.e. in the graph pattern matching expression. A query can also process the values of the variables. The most simple processing operation is the selection of some variables appearing in the query. Definition A SELECT query is a tuple (W,P) where P is a graph pattern and W is a set of variable. The answer of a SELECT query against a dataset D is {µ W µ [[P]] D } where µ W is the restriction of µ to domain W. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 29 / 34

CONSTRUCT A query can also output an RDF graph. The construction of the output graph is based on a template. A template is a set of triple patterns possibly with bnodes. Example T 1 = {(?X, name,?y ), (?X, info,?i), (?X, addr, B)} with B a bnode Definition A CONSTRUCT query is a tuple (T,P) where P is a graph pattern and T is a template. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 30 / 34

CONSTRUCT: Semantics Definition The answer of a CONSTRUCT query (T,P) against a dataset D is obtained by for every µ [[P]] D create a template T µ with fresh bnodes take the union of µ(t µ ) for every µ [[P]] D discard the not valid RDF triples some variables have not been instantiated. bnodes in predicate positions M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 31 / 34

Blank nodes in graph patterns We allow now bnodes in triple patterns. Bnodes act as existentials scoped to the basic graph pattern. Definition The evaluation of the BGP P with bnodes over the graph G denoted [[P]] G, is the set of all mappings µ such that: dom(µ) is exactly the set of variables occurring in P, there exists a function θ from bnodes of P to G such that µ(θ(p)) G. A natural extension of BGPs without bnodes. The algebra remains the same. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 32 / 34

Bag/Multiset semantics In a bag, a mapping can have cardinality greater than one. Every mapping µ in a bag M is annotated with an integer c M (µ) that represents its cardinality (c M (µ) = 0 if µ / M). Operations between sets of mappings can be extended to bags maintaining duplicates: Definition µ M = M 1 M 2, c M (µ) = c M1 (µ 1 ) c M2 (µ 2 ), µ=µ 1 µ 2 µ M = M 1 M 2, c M (µ) = c M1 (µ) + c M2 (µ), µ M = M 1 M 2, c M (µ) = c M1 (µ). Intuition: we simply do not discard duplicates. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 33 / 34

References R. Cyganiak, A Relational Algebra for SPARQL. Tech Report HP Laboratories, HPL-2005-170. E. Prud hommeaux, A. Seaborne, SPARQL Query Language for RDF. W3C Working Draft, 2007. J. Pérez, M. Arenas, C. Gutierrez, Semantics and Complexity of SPARQL. In Int. Semantic Web Conference 2006. J. Pérez, M. Arenas, C. Gutierrez, Semantics of SPARQL. Tech Report Universidad de Chile 2006, TR/DCC-2006-17. M. Arenas, C. Gutierrez, J. Pérez SPARQL Formalization 34 / 34