A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

Similar documents
Probabilistic Databases

Brief Tutorial on Probabilistic Databases

Query answering using views

Knowledge Compilation Meets Database Theory : Compiling Queries to Decision Diagrams

GAV-sound with conjunctive queries

Query Evaluation on Probabilistic Databases. CSE 544: Wednesday, May 24, 2006

Parallel-Correctness and Transferability for Conjunctive Queries

Relational Calculus. Dr Paolo Guagliardo. University of Edinburgh. Fall 2016

arxiv: v1 [cs.ai] 21 Sep 2014

Tractable Lineages on Treelike Instances: Limits and Extensions

Factorised Representations of Query Results: Size Bounds and Readability

Towards Deterministic Decomposable Circuits for Safe Queries

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler

Propositional and Predicate Logic - V

Cylindrical Algebraic Decomposition in Coq

Scalable Uncertainty Management

Factorized Relational Databases Olteanu and Závodný, University of Oxford

A Toolbox of Query Evaluation Techniques for Probabilistic Databases

Queries and Materialized Views on Probabilistic Databases

Computational Logic. Relational Query Languages with Negation. Free University of Bozen-Bolzano, Werner Nutt

Bounded Query Rewriting Using Views

On the Optimal Approximation of Queries Using Tractable Propositional Languages

Lifted Inference: Exact Search Based Algorithms

Provenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley

Outline. Part 1: Motivation Part 2: Probabilistic Databases Part 3: Weighted Model Counting Part 4: Lifted Inference for WFOMC

Query Processing. 3 steps: Parsing & Translation Optimization Evaluation

Quantified Boolean Formulas: Complexity and Expressiveness

First-Order Knowledge Compilation for Probabilistic Reasoning

What is the decimal (base 10) representation of the binary number ? Show your work and place your final answer in the box.

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

Comp487/587 - Boolean Formulas

Computing Query Probability with Incidence Algebras Technical Report UW-CSE University of Washington

Product rule. Chain rule

Database Design and Implementation

Fundamentos lógicos de bases de datos (Logical foundations of databases)

INTRODUCTION TO RELATIONAL DATABASE SYSTEMS

INSTITUTE OF MATHEMATICS THE CZECH ACADEMY OF SCIENCES. A trade-off between length and width in resolution. Neil Thapen

3. Only sequences that were formed by using finitely many applications of rules 1 and 2, are propositional formulas.

Algebraic Model Counting

arxiv: v1 [cs.db] 21 Feb 2017

CS 512, Spring 2017, Handout 10 Propositional Logic: Conjunctive Normal Forms, Disjunctive Normal Forms, Horn Formulas, and other special forms

Structural Tractability of Counting of Solutions to Conjunctive Queries

7 RC Simulates RA. Lemma: For every RA expression E(A 1... A k ) there exists a DRC formula F with F V (F ) = {A 1,..., A k } and

A brief introduction to Logic. (slides from

Introduction to Model Theory

2.5.2 Basic CNF/DNF Transformation

Tractable Reasoning in First-Order Knowledge Bases with Disjunctive Information

Propositional Logic Not Enough

Model Counting of Query Expressions: Limitations of Propositional Methods

CS2742 midterm test 2 study sheet. Boolean circuits: Predicate logic:

CS632 Notes on Relational Query Languages I

Sum-Product Networks: A New Deep Architecture

The complexity of enumeration and counting for acyclic conjunctive queries

Regular n-ary Queries in Trees and Variable Independence

Incomplete Information in RDF

FIRST-ORDER QUERY EVALUATION ON STRUCTURES OF BOUNDED DEGREE

Propositional Logic: Models and Proofs

15-855: Intensive Intro to Complexity Theory Spring Lecture 7: The Permanent, Toda s Theorem, XXX

Single-Round Multi-Join Evaluation. Bas Ketsman

CSE 544: Principles of Database Systems

On Factorisation of Provenance Polynomials

Finite Model Theory: First-Order Logic on the Class of Finite Models

Modal Dependence Logic

Simulation of quantum computers with probabilistic models

Classes of Boolean Functions

Probabilistic Theorem Proving. Vibhav Gogate (Joint work with Pedro Domingos)

With Question/Answer Animations. Chapter 2

Proofs. Chapter 2 P P Q Q

Equivalence of SQL Queries In Presence of Embedded Dependencies

Inconsistency-Tolerant Conjunctive Query Answering for Simple Ontologies

Non-Deterministic Time

Schema Mapping Discovery from Data Instances

Introduction to Temporal Logic. The purpose of temporal logics is to specify properties of dynamic systems. These can be either

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1

The Complexity of Computing the Behaviour of Lattice Automata on Infinite Trees

Model Checking for Modal Intuitionistic Dependence Logic

PROPOSITIONAL LOGIC. VL Logik: WS 2018/19

1 PSPACE-Completeness

Composing Schema Mappings: Second-Order Dependencies to the Rescue

ACO Comprehensive Exam March 20 and 21, Computability, Complexity and Algorithms

On Datalog vs. LFP. Anuj Dawar and Stephan Kreutzer


Bayesian Networks Factor Graphs the Case-Factor Algorithm and the Junction Tree Algorithm

Tractable Counting of the Answers to Conjunctive Queries

Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog

Part 1: Propositional Logic

Computing Query Probability with Incidence Algebras

A Unit Resolution Approach to Knowledge Compilation. 2. Preliminaries

On the satisfiability problem for a 4-level quantified syllogistic and some applications to modal logic

Any Wizard of Oz fans? Discrete Math Basics. Outline. Sets. Set Operations. Sets. Dorothy: How does one get to the Emerald City?

Lower Bounds for Exact Model Counting and Applications in Probabilistic Databases

LTCS Report. Decidability and Complexity of Threshold Description Logics Induced by Concept Similarity Measures. LTCS-Report 16-07

Datalog : A Family of Languages for Ontology Querying

Relational completeness of query languages for annotated databases

Notes. Corneliu Popeea. May 3, 2013

Dichotomies in Ontology-Mediated Querying with the Guarded Fragment

Exercises 1 - Solutions

A New 3-CNF Transformation by Parallel-Serial Graphs 1

Transcription:

Dichotomy for Non-Repeating Queries with Negation Queries with Negation in in Probabilistic Databases Robert Dan Olteanu Fink and Dan Olteanu Joint work with Robert Fink Uncertainty in Computation Simons Institute for the Theory of Computing PODS, erkeley June 24, Oct 2014 5, 2016 1 / 20 28

Outline Probabilistic Databases 101 The Dichotomy The Interesting but Hard Queries The Easy Queries Leftovers 2 / 28

Tuple-Independent Probabilistic Databases Tuple-independent database of n tuples (t i ) i [n] : Each tuple t i associated with an independent oolean random variable x i. P(x i = true) gives the probability that t i exists in the database. Possible-worlds semantics: Each possible world defined by an assignment θ of the variables (x i ) i [n] : It consists of all tuples t i for which θ(x i ) = true. It has probability P(θ) = Π i [n] P(x i = θ(x i )). tuple-independent database with n tuples has 2 n possible worlds. 3 / 28

Relational lgebra Popular database query language since Codd times. lgebra carrier: set of all finite relations lgebra operations: π (projection), (Cartesian product), (set difference), (join), σ (selection), (set union), δ (renaming) s expressive as domain relational calculus (RC) In this talk: Relational algebra fragment 1R Included: Equality joins, selections, projections, difference Excluded: Repeating relation symbols, unions 4 / 28

Relational lgebra Popular database query language since Codd times. lgebra carrier: set of all finite relations lgebra operations: π (projection), (Cartesian product), (set difference), (join), σ (selection), (set union), δ (renaming) s expressive as domain relational calculus (RC) In this talk: Relational algebra fragment 1R Included: Equality joins, selections, projections, difference Excluded: Repeating relation symbols, unions Examples of (oolean) 1R queries: re there combinations of tuples in (R, T ) that are not in (U, V )? [( ) ( )] π R() T () U() V () [( ) ( )] R() T () U() V () (in RC) Does relation S hold hands with both R and T? π [ R() S(, ) T () ] [ R() S(, ) T () ] (in RC) 4 / 28

The Query Evaluation Problem For any oolean 1R query Q and tuple-independent database D: Compute the probability that Q is true in a random world of D. The case of non-oolean queries can be reduced to the oolean case. We are interested in the data complexity of this problem. Fix the query Q and take the database D as input. 5 / 28

Outline Probabilistic Databases 101 The Dichotomy The Interesting but Hard Queries The Easy Queries Leftovers 6 / 28

Data complexity of any 1R query Q in tuple-independent databases: Polynomial time if Q is hierarchical and #P-hard otherwise. 7 / 28

Hierarchical 1R Queries (oolean) 1R query Q is hierarchical if For every pair of distinct query variables and in Q, there is no triple of relation symbols R, S, and T in Q such that: R has but not, S has both and, and T has but not. 8 / 28

Hierarchical 1R Queries (oolean) 1R query Q is hierarchical if For every pair of distinct query variables and in Q, there is no triple of relation symbols R, S, and T in Q such that: R has but not, S has both and, and T has but not. R() S(, ) T () 8 / 28

Hierarchical 1R Queries (oolean) 1R query Q is hierarchical if For every pair of distinct query variables and in Q, there is no triple of relation symbols R, S, and T in Q such that: R has but not, S has both and, and T has but not. 9 / 28

Hierarchical 1R Queries (oolean) 1R query Q is hierarchical if For every pair of distinct query variables and in Q, there is no triple of relation symbols R, S, and T in Q such that: R has but not, S has both and, and T has but not. 9 / 28

Examples Hierarchical queries: π [( R() S(, ) ) T (, ) ] π [( R() T () ) ( U() V () )] π [ (M() N() ) [( R() T () ) ( U() V () )] ] π [ π [ M() N() ] π [( R() T () ) ( U() V () )] ] 10 / 28

Examples Hierarchical queries: π [( R() S(, ) ) T (, ) ] π [( R() T () ) ( U() V () )] π [ (M() N() ) [( R() T () ) ( U() V () )] ] π [ π [ M() N() ] π [( R() T () ) ( U() V () )] ] Non-hierarchical queries: [ ] π R() S(, ) T () [ ( ) ] π π R() S(, ) T () [ ( ) ] π T () π R() S(, ) [ π X () [ ( )] ] R() π T () S(, ) 10 / 28

Outline Probabilistic Databases 101 The Dichotomy The Interesting but Hard Queries The Easy Queries Leftovers 11 / 28

Hardness Proof Idea Reduction from #P-hard model counting problem for positive bipartite DNF: Given a non-hierarchical 1R query Q and ny positive bipartite DNF formula Ψ over disjoint sets X and Y of random variables. #Ψ can be computed using linearly many calls to an oracle for P(Q), where Q is evaluated on tuple-independent databases of sizes linear in the size of Ψ. 12 / 28

Simple Case Input formula and query: Ψ = x 1 y 1 x 1 y 2 x 2 y 1 over sets X = {x 1, x 2}, Y = {y 1, y 2} ] Q = π [R() S(, ) T () Construct a database D such that Ψ becomes the grounding of Q wrt D: Column holds formulas over random variables. We use for true and for false. Variables also used as constants for and. S(x i, y j, ): x i y j is a clause in Ψ. R(x i, x i ) and T (y j, y j ): x i is a variable in X and y j is a variable in Y. R x 1 x 1 x 2 x 2 T y 1 y 1 y 2 y 2 S x 1 y 1 x 1 y 2 x 2 y 1 R S T x 1 y 1 x 1 y 1 x 1 y 2 x 1 y 2 x 2 y 1 x 2 y 1 π [ R S T ] () Ψ 13 / 28

Simple Case Input formula and query: Ψ = x 1 y 1 x 1 y 2 x 2 y 1 over sets X = {x 1, x 2}, Y = {y 1, y 2} ] Q = π [R() S(, ) T () Construct a database D such that Ψ becomes the grounding of Q wrt D: Column holds formulas over random variables. We use for true and for false. Variables also used as constants for and. S(x i, y j, ): x i y j is a clause in Ψ. R(x i, x i ) and T (y j, y j ): x i is a variable in X and y j is a variable in Y. R x 1 x 1 x 2 x 2 T y 1 y 1 y 2 y 2 S x 1 y 1 x 1 y 2 x 2 y 1 R S T x 1 y 1 x 1 y 1 x 1 y 2 x 1 y 2 x 2 y 1 x 2 y 1 π [ R S T ] () Ψ This is the only minimal hard pattern for positive 1R queries! 13 / 28

Surprising Case Input formula and query: Ψ = x 1 y 1 x 1 y 2 over sets X = {x 1}, Y = {y 1, y 2} [ ( ) ] Q = π R() π T () S(, ) Construct a database D such that Ψ becomes the grounding of Q wrt D: S(a, b, ): Clause a has variable b in Ψ. R(a, ) and T (b, b): a is a clause and b is a variable in Ψ. R T S T S π (T S) R π (T S) 1 2 x 1 x 1 y 1 y 1 y 2 y 2 1 x 1 1 y 1 2 x 1 2 y 2 1 x 1 x 1 1 y 1 y 1 2 x 1 x 1 2 y 2 y 2 1 x 1 y 1 2 x 1 y 2 1 x 1 y 1 2 x 1 y 2 14 / 28

Surprising Case Input formula and query: Ψ = x 1 y 1 x 1 y 2 over sets X = {x 1}, Y = {y 1, y 2} [ ( ) ] Q = π R() π T () S(, ) Construct a database D such that Ψ becomes the grounding of Q wrt D: S(a, b, ): Clause a has variable b in Ψ. R(a, ) and T (b, b): a is a clause and b is a variable in Ψ. R T S T S π (T S) R π (T S) 1 2 x 1 x 1 y 1 y 1 y 2 y 2 1 x 1 1 y 1 2 x 1 2 y 2 1 x 1 x 1 1 y 1 y 1 2 x 1 x 1 2 y 2 y 2 1 x 1 y 1 2 x 1 y 2 1 x 1 y 1 2 x 1 y 2 This query is already hard when T is the only probabilistic input relation! 14 / 28

More Involved Case Input formula and query: Ψ = x 1 y 1 x 1 y 2 x 2 y 1 over sets X = {x 1, x 2}, Y = {y 1, y 2} ] Q = π [S(, ) R() T () We need a different reduction gadget: Use additional random variables Z = {z 1,..., z E }, one per clause in Ψ = ψ 1 ψ E. Construct a database D such that the grounding of Q wrt D is Υ = [ E i=1 z ] E i ψ i = i=1 (z i ψ i ). R x 1 x 1 x 2 x 2 T y 1 x 1 y 2 y 2 S x 1 y 1 z 1 x 1 y 2 z 2 x 2 y 1 z 3 S R T x 1 y 1 z 1 (x 1 y 1 ) x 1 y 2 z 2 (x 1 y 2 ) x 2 y 1 z 3 (x 2 y 1 ) π [ S R T ] () E i=1 z i ψ i Compute #Ψ using linearly many calls to the oracle for P Q = 1 P(Υ). 15 / 28

The Small Print (1/2) Ψ = (i,j) E x iy j = ψ 1 ψ E over disjoint variable sets X and Y Let Θ be the set of assignments of variables X Y that satisfy Ψ: #Ψ = 1. θ Θ:θ =Ψ Partition Θ into disjoint sets Θ 0 Θ E, such that θ Θ i if and only if θ satisfies exactly i clauses of Ψ: #Ψ = θ Θ 1 :θ =Ψ 1 + + θ Θ E :θ =Ψ 1 = Θ 1 + + Θ E. Θ 1,..., Θ E can be computed using an oracle for P Υ : E E Υ = z i ψ i or, equivalently Υ = (z i ψ i ) i=1 i=1 16 / 28

The Small Print (2/2) Express the probability of Υ = E i=1 (z i ψ i ) as a function of Θ 1,..., Θ E : Fix the probabilities of variables in X Y to 1/2 and of variables in Z to p z. Then: ( ) ( ) E exactly k clauses exactly k clauses P Υ = P Υ P k=0 of Ψ are satisfied of Ψ are satisfied }{{} p E k z = 1 X + Y E p E k z Θ k 2 k=0 } {{ } 1 X + Y Θ k 2 This is a polynomial in p z of degree E, with coefficients Θ 0,..., Θ E. The coefficients can be derived from E + 1 pairs (p z, P Υ ) using Lagrange s polynomial interpolation formula. E + 1 oracle calls to P Υ suffice to determine #Ψ = E i=0 Θ i. 17 / 28

Hard Query Patterns There are 48 (!) minimal non-hierarchical query patterns. inary trees with leaves,, and and inner nodes or. Some are symmetric and need not be considered separately: and can be exchanged, joins are commutative and associative. Still, many cases left to consider due to the difference operator. P 1.1 P 1.2 P 1.3 P 1.4 P 5.1 P 5.2 P 5.3 P 5.4............ There is a database construction scheme for each pattern. Each non-hierarchical query Q matches a pattern P x.y. 18 / 28

Hard Query Patterns There are 48 (!) minimal non-hierarchical query patterns. inary trees with leaves,, and and inner nodes or. Some are symmetric and need not be considered separately: and can be exchanged, joins are commutative and associative. Still, many cases left to consider due to the difference operator. P 1.1 P 1.2 P 1.3 P 1.4 P 5.1 P 5.2 P 5.3 P 5.4............ There is a database construction scheme for each pattern. Each non-hierarchical query Q matches a pattern P x.y. In the absence of negation, P 1.1 is the only hard pattern to consider! 18 / 28

Non-hierarchical Queries Match Minimal Hard Patterns Each non-hierarchical query Q matches a pattern P x.y: There is a total mapping from P x.y to Q s parse tree that is identity on inner nodes and, preserves ancestor-descendant relationships, maps leaves to relations: to R(); to S(, ); and to T (). π Pattern P 5.3 Query Q X () R() π T () S(, ) The match preserves the grounding of the query pattern: Q and P x.y have the same grounding for any database using our construction scheme. 19 / 28

Outline Probabilistic Databases 101 The Dichotomy The Interesting but Hard Queries The Easy Queries Leftovers 20 / 28

Evaluation of Hierarchical 1R Queries pproach based on knowledge compilation For any database D, the probability P Q(D) of a 1R query Q is the probability P Ψ of Q s grounding Ψ. Compile Ψ into ODD(Ψ) in polynomial time. Compute probability of ODD(Ψ) in time linear in its size. 21 / 28

Evaluation of Hierarchical 1R Queries pproach based on knowledge compilation For any database D, the probability P Q(D) of a 1R query Q is the probability P Ψ of Q s grounding Ψ. Compile Ψ into ODD(Ψ) in polynomial time. Compute probability of ODD(Ψ) in time linear in its size. Distinction from existing tractability results [O. & Huang 2008]: 1R without negation: Grounding formulas are read-once. Read-once formulas admit linear-size ODs. 1R : Grounding formulas are not read-once. They admit ODs of sizes linear in the database size but exponential in the query size. 21 / 28

The Inner Workings From hierarchical 1R to RC-hierarchical -consistent RC : Translate query Q into an equivalent disjunction of disjunction-free existential relational calculus queries Q 1 Q k. k can be very large for queries with projection under difference! RC-hierarchical: For each X (Q ), every relation symbol in Q has variable X. Each of the disjuncts yields a poly-size ODD. -consistent: The nesting order of the quantifiers is the same in Q 1,, Q k. ll ODDs have compatible variable orders and their disjunction is a poly-size ODD. The ODD width grows exponentially with k, its height stays linear in the size of the database. Width = maximum number of edges crossing the section between any two consecutive levels. 22 / 28

Query Evaluation Example (1/3) Consider the following query and tuple-independent database: Q = π [ (R() T () ) ( U() V () ) ] R T U V R T R T U V 1 r 1 2 r 2 1 t 1 2 t 2 1 u 1 2 u 2 1 v 1 2 v 2 1 1 r 1 t 1 1 2 r 1 t 2 2 1 r 2 t 1 2 2 r 2 t 2 1 1 r 1 t 1 (u 1 v 1 ) 1 2 r 1 t 2 (u 1 v 2 ) 2 1 r 2 t 1 (u 2 v 1 ) 2 2 r 2 t 2 (u 2 v 2 ) 23 / 28

Query Evaluation Example (1/3) Consider the following query and tuple-independent database: Q = π [ (R() T () ) ( U() V () ) ] R T U V R T R T U V 1 r 1 2 r 2 1 t 1 2 t 2 1 u 1 2 u 2 1 v 1 2 v 2 1 1 r 1 t 1 1 2 r 1 t 2 2 1 r 2 t 1 2 2 r 2 t 2 1 1 r 1 t 1 (u 1 v 1 ) 1 2 r 1 t 2 (u 1 v 2 ) 2 1 r 2 t 1 (u 2 v 1 ) 2 2 r 2 t 2 (u 2 v 2 ) The grounding of Q is: Ψ = r 1 [ t1 ( u 1 v 1 ) t 2 ( u 1 v 2 ) ] r 2 [ t1 ( u 2 v 1 ) t 2 ( u 2 v 2 ) ]. Variables entangle in Ψ beyond read-once factorization. This is the pivotal intricacy introduced by the difference operator. 23 / 28

Query Evaluation Example (2/3) Translate Q = π [ (R() T () ) ( U() V () ) ] into RC : ( ) ( ) Q RC = R() U() T () R() T () V () }{{}}{{} Q 1 oth Q 1 and Q 2 are RC-hierarchical. Q 1 Q 2 is -consistent: Same order for Q 1 and Q 2. Query grounding: Q 2. Ψ = (r 1 u 1 r 2 u 2 ) (t 1 t 2 ) (r 1 r 2 ) (t 1 v 1 t 2 v 2 ) }{{}}{{} Ψ 1 Ψ 2. oth Ψ 1 and Ψ 2 admit linear-size ODDs. The two ODDs have compatible orders and their disjunction is an ODD whose width is the product of the widths of the two ODDs. 24 / 28

Query Evaluation Example (3/3) Compile grounding formula into ODD: Ψ = (r 1 u 1 r 2 u 2 ) (t 1 t 2 ) (r 1 r 2 ) (t 1 v 1 t 2 v 2 ) }{{}}{{} Ψ 1 Ψ 2. r 1 r 1 r 1 u 1 u 1 r 2 r 2 r 2 r 2 u 2 u 2 u 2 t 1 t 1 = t 1 t 1 v 1 v 1 t 2 t 2 t 2 t 2 v 2 v 2 Ψ 1 Ψ 2 = Ψ 25 / 28

Outline Probabilistic Databases 101 The Dichotomy The Dichotomy The Interesting but Hard Queries The Interesting but Hard Queries The Easy Queries The Easy Queries Leftovers Leftovers 18 / 20 26 / 28

Dichotomies eyond 1R Some known dichotomies Non-repeating CQ, UCQ [Dalvi & Suciu 2004, 2010] Quantified queries, ranking queries [O.& team 2011, 2012] Non-repeating relational algebra = 1R + union. Hierarchical property not enough, consistency also needed. π [(R() S 1(, ) T () S 2(, )) S(, )] is hard, though it is equivalent to a union of two hierarchical 1R queries. Non-repeating relational calculus S(x, y) R(x) is tractable, S(x, y) (R(x) T (y)) is hard. oth are non-repeating, yet not expressible in 1R. Possible (though expensive) approach: Translate to RC and check RC-hierarchical and -consistency. Full relational algebra (or full relational calculus) It is undecidable whether the union of two equivalent relational algebra queries, one hard and one tractable, is tractable. 27 / 28

Thank you! 28 / 28