Database design and implementation CMPSCI 645. Lectures 11-12: Database Theory

Similar documents
CSE 544: Principles of Database Systems

CS 161: Design and Analysis of Algorithms

GAV-sound with conjunctive queries

Query answering using views

Database Theory VU , SS Complexity of Query Evaluation. Reinhard Pichler

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

CSCI 1010 Models of Computa3on. Lecture 02 Func3ons and Circuits

Undecidable Problems. Z. Sawa (TU Ostrava) Introd. to Theoretical Computer Science May 12, / 65

Definition: Alternating time and space Game Semantics: State of machine determines who

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1

Fundamentos lógicos de bases de datos (Logical foundations of databases)

n CS 160 or CS122 n Sets and Functions n Propositions and Predicates n Inference Rules n Proof Techniques n Program Verification n CS 161

CMPS 277 Principles of Database Systems. Lecture #9

Provenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley

Predicate Calculus. Formal Methods in Verification of Computer Systems Jeremy Johnson

Query Languages for Data Exchange: Beyond Unions of Conjunctive Queries

Bounded Query Rewriting Using Views

Outline. Logic. Knowledge bases. Wumpus world characteriza/on. Wumpus World PEAS descrip/on. A simple knowledge- based agent

Parallel-Correctness and Transferability for Conjunctive Queries

The Query Containment Problem: Set Semantics vs. Bag Semantics. Phokion G. Kolaitis University of California Santa Cruz & IBM Research - Almaden

Circuits. Lecture 11 Uniform Circuit Complexity

Foundations of Query Languages

Predicate Calculus. CS 270 Math Foundations of Computer Science Jeremy Johnson

W3203 Discrete Mathema1cs. Logic and Proofs. Spring 2015 Instructor: Ilia Vovsha. hcp://

Computability and Complexity Theory: An Introduction

Essential facts about NP-completeness:

On the Succinctness of Query Rewriting for Datalog

Complete problems for classes in PH, The Polynomial-Time Hierarchy (PH) oracle is like a subroutine, or function in

Pushing Extrema Aggregates to Optimize Logic Queries. Sergio Greco

Logarithmic space. Evgenij Thorstensen V18. Evgenij Thorstensen Logarithmic space V18 1 / 18

Algorithmic Model Theory SS 2016

Lecture 21: Algebraic Computation Models

INAPPROX APPROX PTAS. FPTAS Knapsack P

Some algorithmic improvements for the containment problem of conjunctive queries with negation

Datalog and Constraint Satisfaction with Infinite Templates

DRAFT. Algebraic computation models. Chapter 14

Constraint satisfaction problems and dualities

Rela%ons and Their Proper%es. Slides by A. Bloomfield

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University

Complexity Theory VU , SS The Polynomial Hierarchy. Reinhard Pichler

Outline. Complexity Theory EXACT TSP. The Class DP. Definition. Problem EXACT TSP. Complexity of EXACT TSP. Proposition VU 181.

Finite Model Theory: First-Order Logic on the Class of Finite Models

Limits to Approximability: When Algorithms Won't Help You. Note: Contents of today s lecture won t be on the exam

Definition: Alternating time and space Game Semantics: State of machine determines who

Sums of Products. Pasi Rastas November 15, 2005

CS 320, Fall Dr. Geri Georg, Instructor 320 NP 1

Overview of Topics. Finite Model Theory. Finite Model Theory. Connections to Database Theory. Qing Wang

Notes on Satisfiability-Based Problem Solving. First Order Logic. David Mitchell October 23, 2013

Brief Tutorial on Probabilistic Databases

Tractable Lineages on Treelike Instances: Limits and Extensions

Dualities for Constraint Satisfaction Problems

Lecture #14: NP-Completeness (Chapter 34 Old Edition Chapter 36) Discussion here is from the old edition.

Locally Consistent Transformations and Query Answering in Data Exchange

Notes for Lecture 2. Statement of the PCP Theorem and Constraint Satisfaction

1 PSPACE-Completeness

First&Order&Logics& Logics&and&Databases&

Definite Logic Programs

Querying Big Data by Accessing Small Data

A Novel Stable Model Computation Approach for General Dedcutive Databases

α-acyclic Joins Jef Wijsen May 4, 2017

PROPOSITIONAL LOGIC. VL Logik: WS 2018/19

9. PSPACE 9. PSPACE. PSPACE complexity class quantified satisfiability planning problem PSPACE-complete

CSCI 1010 Models of Computa3on. Lecture 11 Proving Languages NP-Complete

9. PSPACE. PSPACE complexity class quantified satisfiability planning problem PSPACE-complete

Lecture 8: Complete Problems for Other Complexity Classes

Course information. Winter ATFD

De Morgan s a Laws. De Morgan s laws say that. (φ 1 φ 2 ) = φ 1 φ 2, (φ 1 φ 2 ) = φ 1 φ 2.

CS6840: Advanced Complexity Theory Mar 29, Lecturer: Jayalal Sarma M.N. Scribe: Dinesh K.

1 Circuit Complexity. CS 6743 Lecture 15 1 Fall Definitions

Knowledge Base Exchange: The Case of OWL 2 QL

Automata Theory CS Complexity Theory I: Polynomial Time

if t 1,...,t k Terms and P k is a k-ary predicate, then P k (t 1,...,t k ) Formulas (atomic formulas)

Properties of Relational Logic

Notes for Lecture 3... x 4

Advanced DB CHAPTER 5 DATALOG

The DL-Lite Family of Languages A FO Perspective

Chapter 9. PSPACE: A Class of Problems Beyond NP. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Chapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013

an efficient procedure for the decision problem. We illustrate this phenomenon for the Satisfiability problem.

Transitive Closure and Recursive Datalog Implemented on Clusters

On Datalog vs. LFP. Anuj Dawar and Stephan Kreutzer

On the Relationship between Consistent Query Answering and Constraint Satisfaction Problems

1.1 P, NP, and NP-complete

Monadic Predicate Logic is Decidable. Boolos et al, Computability and Logic (textbook, 4 th Ed.)

About the relationship between formal logic and complexity classes

Data Exchange with Arithmetic Comparisons

Halting and Equivalence of Program Schemes in Models of Arbitrary Theories

NP and Computational Intractability

Abstract model theory for extensions of modal logic

Logical reasoning - Can we formalise our thought processes?

NP-Complete problems

Introduction. Foundations of Computing Science. Pallab Dasgupta Professor, Dept. of Computer Sc & Engg INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

JASS 06 Report Summary. Circuit Complexity. Konstantin S. Ushakov. May 14, 2006

CS2742 midterm test 2 study sheet. Boolean circuits: Predicate logic:

Introduction to Complexity Theory. Bernhard Häupler. May 2, 2006

Lecture Notes Each circuit agrees with M on inputs of length equal to its index, i.e. n, x {0, 1} n, C n (x) = M(x).

Non-Approximability Results (2 nd part) 1/19

Umans Complexity Theory Lectures

-bit integers are all in ThC. Th The following problems are complete for PSPACE NPSPACE ATIME QSAT, GEOGRAPHY, SUCCINCT REACH.

Transcription:

Database design and implementation CMPSCI 645 Lectures 11-12: Database Theory 1

Theory problems in databases } Expressiveness of languages } Any query in L1 can be expressed in L2 } Complexity of languages } Bounds on resources required to evaluate any query in language L } Sta?c analysis of queries (for op?miza?on) } Given q in L: is it minimal? } Given q1 and q2 in L: are they equivalent? } Views 2

Crash review of complexity classes } AC 0 } Circuits of O(1) depth and polynomial size } L (LOGSPACE) } Solvable in logarithmic (small) space } NL (NLOGSPACE) } "YES" answers checkable in logarithmic space } NC } Solvable efficiently (in polylogarithmic?me) on parallel computers } P (PTIME) } Solvable in polynomial?me } NP } "YES" answers checkable in polynomial?me } PSPACE } Solvable with polynomial memory 3

Rules of thumb } Step 1: check if you can solve the problem in that class } Step 2: if not, check if your problem looks like (is reducible from) the complete problem from the next class. } Of interest: PTIME-complete are not efficiently parallelizable. 4

Query complexity Given a query Q and a database D, what is the complexity of compu?ng Q(D)? } The answer depends on the query language } Rela?onal algebra, calculus, datalog } Design tradeoff: } High complexity à rich queries } Low complexity à implemented efficiently 5

Complexity of query languages Query Q, database D } Data complexity } Fix Q, complexity f(d) } Query complexity } Fix D, complexity f(q) } Combined complexity } Complexity f(d,q) Moshe Vardi 6

Conventions } Complexity is usually defined for a decision problem } We study the complexity of Boolean queries } Complexity usually assumes some encoding of the input } We encode instances using binary representa?on 7

Boolean queries Defini&on: A Boolean query is a query that returns either true or false. Non-Boolean SELECT DISTINCT R.x, S.y FROM R, S WHERE R.z = S.z Q(x,y) :- R(x,z), S(z,y) T(x,y) :- R(x,y) T(x,y) :- T(x,z), R(z,y) Boolean SELECT DISTINCT yes FROM R, S WHERE R.x = a and R.z = S.z and S.y = b Q :- R(x,z), S(z,y) T(x,y) :- R(x,y) T(x,y) :- T(x,z), R(z,y) Answer() :- T( a, b ) 8

Database encoding } Encode D = (D, R 1D,, R kd ) as follows: } Let n = ADom(D) } If R i has arity k, encode it as a string of n k bits: 2 } 0 means element (a 1, a k ) R i D } 1 means element (a 1, a k ) 2 R D i a a b b c b c b c a 0 1 1 0 1 1 1 0 0 9

Data complexity } Fix a Boolean query Q in the query language. Determine the complexity of the following problem: } Given an input database instance D = (D, R 1D,, R kd ) check if Q(D) = true } This is also known as model checking problem: check if D is a model for Q. 10

What is the complexity of relational queries? AC 0 L NL NC PTIME PSPACE all computable 11

Example Q = 9z. R( a,z) S(z, b ) Prove that Q is in AC 0 R: a b c a b c 0 1 1 0 1 1 1 0 0 S: a b c a b c 0 0 1 1 1 0 1 0 1 12

Example Q = 9z. R( a,z) S(z, b ) Prove that Q is in AC 0 Circuit of depth 2 OR has n inputs Each AND has 2 inputs R: a b c a b c 0 1 1 0 1 1 1 0 0 S: a b c a b c 0 0 1 1 1 0 1 0 1 13

What is the complexity of relational queries? All rela?onal queries (expressible in RC) are in AC 0 AC 0 L NL NC PTIME PSPACE all computable 14

What is the complexity of datalog queries? AC 0 L NL NC PTIME T(x,y) :- R(x,y) T(x,y) :- T(x,z), R(z,y) Answer() :- T( a, b ) PSPACE all computable 15

Datalog is not in AC 0 } Parity is not in AC 0 } We will reduce parity to the reachability problem } Given input (x 1, x 2, x 3, x 4, x 5 ) = (0,1,1,0,1) construct the graph: a 1 a 2 a 3 a 4 a 5 a 6 T(x,y) :- R(x,y) T(x,y) :- T(x,z), R(z,y) Answer() :- T( a 1, b 6 ) b 1 b 2 b 3 b 4 b 5 b 6 The # of 1s is odd iff Answer is true 16

Datalog is in PTIME } Fix any Boolean datalog program P. } Given D, check if P(D) = true is in PTIME } Proof argument: } If an IDB has arity k, then it will reach its fixpoint in at most n k itera?ons. 17

Conjunctive Queries (CQ) } A subset of FO (first order) } Less expressive } Many queries in prac?ce are conjunc?ve } Some op?mizers only handle CQs } Break larger queries into many CQs } CQs have beler theore?cal proper?es than arbitrary queries 18

Conjunctive Queries if variables subgoals P(x,z) :- R(x,y) & R(y,z) head body implicit 9 conjunc?on } R: Extensional database (EDB) stored } P: Inten?onal database (IDB) computed 19

Conjunctive Queries if variables subgoals P(x,z) :- R(x,y) & R(y,z) head body implicit 9 conjunc?on } When facts in the body are true, we infer the head } Consider all possible assignments of variables in the body 20

Conjunctive queries } A single datalog rule } Equivalent to SELECT-DISTINCT-FROM-WHERE } Select/project/join in RA } Existen?al/conjunc?ve frac?on of RC Strictly speaking, we are not allowed to have non-equality selec?on predicates 21

Example } Find all employees having the same manager as Smith A(x) :- ManagedBy( Smith,y) & ManagedBy(x,y) SELECT DISTINCT m2.name FROM ManagedBy m1, ManagedBy m2 WHERE m1.name = Smith AND m1.manager = m2.manager 22

Properties of CQ } Sa?sfiability } A query is sa?sfiable if there exists some input rela?on R such that q(r) is non-empty } Every CQ is sa?sfiable } Monotonicity } A query is monotonic if for each instance I, J over the schema, I J implies q(i) q(j) } Every CQ is monotonic 23

Satisfiability We can always generate sa?sfying EDB rela?ons from the body of the rule S(x,y,z) :- P(x,w) & R(w,y,v) & P(v,z) a b b c d d e S a c e P a b R b c d d e 24

Monotonicity ans(u):- R 1 (u 1 ) & & R n (u n ) } Consider 2 databases I, J, s.t. I J } Let t q(i) 2 } For some subs?tu?on v: } v(u i ) 2 I(R i ) for each i. } t = v(u) } Since I J, v(u i ) 2J(R i ) for each i. } So t q(j) 2 25

Consequence of monotonicity Product (pname, price, cid) Company (cid, cname, city) Q: Find all companies that make only products with price < 100! SELECT!DISTINCT C.cname! FROM!Company C! WHERE!100 > ALL!(SELECT price!!! FROM Product P!!! WHERE P.cid = C.cid)! } This query is not monotone } Therefore, not CQ } It cannot be expressed as a simple SFW query 26

Equivalence and containment } Needed for a variety of sta?c analysis tasks } Query op?miza?on } Query rewri?ng using views } Tes?ng for semijoin reduc?ons 27

Query equivalence Defini&on: Queries q1 and q2 are equivalent if for every database D, q1(d) = q2(d) Nota?on: q1 q2 28

Query containment Defini&on: Query q1 is contained in q2 if for every database D, q1(d) q2(d) Nota?on: q1 q2 Fact: q1 q2 and q2 q1 iff q1 q2 For the case of Boolean queries, containment is logical implica?on 29

Examples Is q1 q2? Yes q1(x) :- R(x,u), R(u, Smith ) q2(x) :- R(x,u), R(u,v) 30

Examples Is q1 q2? Yes q1(x) :- R(x,u), R(u,v), R(v,w) q2(x) :- R(x,u), R(u,v) x u v w x u v 31

Examples Is q1 q2? No q1(x) :- R(x,u), R(u,v), R(v,x) q2(x) :- R(x,u), R(u,x) x u v x u 32

Examples Is q1 q2? Yes q1(x) :- R(x,u), R(u,y) q2(x) :- R(x,u), R(v,u), R(u,y) x u y x u v y 33

Examples Is q1 q2? Yes q1(x) :- R(x,u), R(u,v) q2(x) :- R(x,u), R(x,y), R(u,v), R(u,w) x u v x u v y w 34

Examples Is q1 q2? Yes q1(x) :- R(x,u), R(u,u) q2(x) :- R(x,u), R(u,v), R(v,w) x x u v w u 35

Query containment Theorem: The query containment and query equivalence problems for CQ are NP-complete. Theorem: The query containment and query equivalence problems for Rela?onal Calculus are undecidable. 36

Query containment for CQ } Two ways to test } Check if q2 holds on the canonical database of q1 } Check if there exists a homomorphism q2 à q1 37

Canonical database } Canonical database for q1 is D = (D, R 1D,, R kd ) } D: all variables and constants in q1 } R 1D,, R kd : the body of q1 } Canonical tuple t q1 is the head of q1 38

Example } Canonical database D = (D, R D ) } D = {x,y,u,v} } R D = x v v q1(x,y) :- R(x,u), R(v,u), R(v,y) u u y } Canonical tuple t q1 = (x,y) 39

Example } Canonical database D = (D, R) } D = {x,u, Smith, Fred } } R= q1(x) :- R(x,u), R(u, Smith ), R(u, Fred ), R(u,u) x u u u } Canonical tuple t q1 = (x) u Smith Fred u 40

Checking containment using the canonical database } D={x,y,u,v} } R = x q1(x,y) :- R(x,u), R(v,u), R(v,y) q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y) v v u u y q1 is contained in q2 41

Query homomorphisms } A homomorphism f: q2 à q1 is a func?on f:var(q2) à var(q1) U const(q1), such that: } f(body(q2)) body(q1) } f(t q1 ) = t q2 42

Example var(q1) = {x,u,v,y} var(q2) = {x,u,v,w,t,y} q1(x,y) :- R(x,u), R(v,u), R(v,y) q2(x,y) :- R(x,u), R(v,u), R(v,w), R(t,w), R(t,y) q1 is contained in q2 43

Example var(q1) U const(q1) = {x,u, Smith } var(q2) = {x,u,v,w} q1(x) :- R(x,u), R(u, Smith ), R(u, Fred ), R(u,u) q2(x) :- R(x,u), R(u,v), R(u, Smith ), R(w,u) q1 is contained in q2 44

Complexity Theorem: Checking containment of two CQs is NP-complete. =( X 3 _ X 1 _ X 4 ) ^ (X 1 _ X 2 _ X 3 ) ^ ( X 2 _ X 3 _ X 1 ) Proof: Reduc?on from 3-SAT Given a 3CNF Φ Step 1: construct q1 independently of Φ Step 2: construct q2 from Φ Prove: there exists a homomorphism q2àq1 iff Φ is sa?sfiable 45

Proof: Step 1 Construc?ng q1 There are 4 types of clauses in every 3SAT: Type 1: X _ Y _ Z Type 2: X _ Y _ Z Type 3: X _ Y _ Z Type 4: X _ Y _ Z For each type, q1 contains one rela?on with all 7 sa?sfying assignments, where u=0, v=1 R1 (misses v,v,v) R2 (misses v,v,u) R3 (misses v,u,u) R4 (misses u,u,u) u u u u u u u u u u u v u u v u u v u u v u v u u v u u v u u v u u v v u v v u v v u v v v u u v u u v u u v u v v u v v u v v u v v v u v v u v v u v v v v v v v v v 46

Proof: Step 2 Construc?ng q2 q2 has one atom for each clause is Φ: Rela?on name is R1, or R2, or R3, or R4 The variables are the same as those in the clause Example: =( X 3 _ X 1 _ X 4 ) ^ (X 1 _ X 2 _ X 3 ) ^ ( X 2 _ X 3 _ X 1 ) q2 = R2(x 3,x 1,x 4 ), R4(x 1,x 2,x 3 ), R2(x 2,x 3,x 1 ) 47

Proof } Suppose there is a sa?sfying assignment for Φ: it maps each X i to either 0 or 1 } Define func?on f: Vars(q2)àVars(q1): } If X i = 0 then f(x i ) = u } If X i = 1 then f(x i ) = v } Then f is a homomorphism f: q2àq1 } Suppose there exists a homomorphism f: q2àq1 } Define the assignment: } If f(x i ) = u then X i = 0 } If f(xi) = v then X i = 1 } This is a sa?sfying assignment for Φ 48

Beyond CQ } Containment for arbitrary rela?onal queries is undecidable } Any sta?c analysis on rela?onal queries is undecidable } All these results follow from Trakhtenbrot s theorem 49

Trakhtenbrot s theorem Defini&on: A sentence φ is called finitely sa?sfiable if there exists a finite database instance D s.t. D = φ Theorem: The following problem is undecidable: Given FO sentence φ, check if φ is finitely sa?sfiable. 50

Query containment for UCQ q 1 [ q 2 [ q 3... q 0 1 [ q 0 2 [ q 0 3... Note: q 1 [ q 2 [ q 3... q q 1 q q 2 q iff and and Theorem: q q1 0 [ q2 0 [ q3 0... that q qk 0 iff there exists some k such 51

Query minimization Defini&on: A conjunc?ve query q is minimal, if for every other query q such that q q, q has at least as many predicates (subgoals) as q Are these queries minimal? q(x) :- R(x,y), R(y,z), R(x,x) q(x) :- R(x,y), R(y,z), R(x, Alice ) 52

Query minimization } Algorithm: } Choose a subgoal g of q } Remove g: let q be the new query } q q Why? } If q q, then permanently remove g } The order in which we inspect subgoals doesn t maler 53

In practice } No database system performs minimiza?on } It s hard } Users usually write minimal queries } Non-minimal queries arise when using views intensely 54

Remember Semijoins? R S = Π A1,,An (R S) R S = (R S) S Prove that the following two datalog queries are equivalent: q1(x,y,z) :- R(x,y), S(x,z) R1(x,y) :- R(x,y), S(x,z) q2(x,y,z) :- R1(x,y), S(x,z) q1(x,y,z) :- R(x,y), S(x,z) q1(x,y,z) :- R(x,y), S(x,z) q2(x,y,z) :- R(x,y), S(x,u), S(x,z) q2(x,y,z) :- R(x,y), S(x,u), S(x,z) 55

Semijoins } Important in distributed databases } Owen combined with Bloom filters } See 22.10.2 in the textbook 56

Semijoin Reducer } Given a query: Q = R1 R2 Rn } A semijoin reducer for q is: } R i1 = R i1 R j1 } R i2 = R i2 R j2 } } R ip = R ip R jp } Such that the query is equivalent to } Q = R k1 R k2 R kn } In a full reducer, no dangling tuples remain 57

Example } Q = R(A,B) S(B,C) } Semijoin reducer: R1(A,B) = R(A,B) S(B,C) } Re-wrilen query: Q = R1(A,B) S(B,C) Are there any dangling tuples? 58

Example } Q = R(A,B) S(B,C) } Full semijoin reducer: R1(A,B) = R(A,B) S(B,C) S1(B,C) = S(B,C) R1(A,B) } Re-wrilen query: Q = R1(A,B) S1(B,C) No more dangling tuples 59

Example } More complex: Q = R(A,B) S(B,C) T(C,D,E) } Full reducer: S (B,C) = S(B,C) R(A,B) T (C,D,E) = T(C,D,E) S (B,C) S (B,C) = S (B,C) T (C,D,E) R (A,B) = R(A,B) S (B,C) } Q = R (A,B) S (B,C) T (C,D,E) 60

Semijoin Reducer } Example: Q = R(A,B) S(B,C) T(C,A) } No full reducer Theorem: A query has a full reducer iff it is acyclic. 61

Expressive power of FO } Let R(x,y) represent a graph } Query path(x,y) = } All x,y such that there is a path from x to y } Theorem: path(x,y) cannot be expressed in FO 62

Non-recursive rules } Graph R(x,y) P(x,y) :- R(x,u), R(u,v), R(v,y) A(x,y) :- P(x,u), P(u,y) } Can unfold into: A(x,y) :- R(x,u), R(u,v), R(v,w), R(w,m), R(m,n), R(n,y) 63

Non-recursive datalog with negation } Expresses FO queries } Negated subgoals } Implicit union } Can evaluate in an order such that all body predicates have been evaluated. 64

Recursion Two forms of transi?ve closure Path(x,y) :- R(x,y) Path(x,y) :- Path(x,u), R(u,y) Path(x,y) :- R(x,y) Path(x,y) :- Path(x,u), Path(u,y) 65

Recursion example } EDB Par(c,p) = p is parent of c } Generalized cousins: people with common ancestors one or more genera?ons back Sib(x,y) :- Par(x,p), Par(y,p), x y Cousin(x,y) :- Sib(x,y) Cousin(x,y) :- Par(x,xp), Par(y,yp), Cousin(xp,yp) 66

Definition of recursion } Form a dependency graph whose nodes are IDB predicates } Connect XàY iff there is a rule with X in the head and Y in the body } Cycle = recursion; no cycle = no recursion 67

Meaning of datalog rules } Model-theore?c } Rules define a set of sa?sfying rela?ons } Whenever body is true, head is true } Proof-theore?c } Set of facts derivable from EDB rela?ons by applying the rules. 68

Evaluating recursive rules } This works if there is no nega?on } Start with all IDB rela?ons empty } Repeatedly evaluate the rules using the EDB and the previous IDB to get the new IDB } End when there is no change in the IDB rela?ons 69

Naïve evaluation algorithm Start: IDB = ; Apply rules to IDB, EDB yes Change to IDB? no done 70

Semi-naïve evaluation } Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple obtained in the previous round } Saves work; lets us avoid re-discovering most known facts } Though a fact can s?ll be derived in more than one way 71

Par data: parent above child Round 1 a d Round 2 b c e Round 3 f g h j k i 72

Recursion + negation } Naïve evalua?on doesn t work when there are negated subgoals } Nega?on wrapped in recursion makes no sense in general } Even when they are separate, we can have ambiguity about the correct IDB rela?ons 73

Stratified negation } Stra?fica?on is a constraint usually placed on datalog with recursion and nega?on } It rules out nega?on wrapped inside recursion 74

Example P(x) :- R(x) & Q(x) :- R(x) & Q(x) P(x) } Suppose R = {(1)} } Two models sa?sfy the rules: } P = { }, Q={1} } P={1}, Q={ } 75

Stratum } Intui?vely, the stratum of an IDB predicate P is the maximum number of nega?ons that can be applied to an IDB predicate used in evalua?ng P } Stra?fied nega?on = finite strata 76

Stratum graph } Nodes = IDB predicates } Connect Aà B if predicate A depends on B } Label the edge - if the B subgoal is negated } The stratum is the maximum number of - edges on a path leading from that node } A datalog program is stra?fied if all its IDB predicates have finite strata 77

Example P(x) :- R(x) & Q(x) :- R(x) & Q(x) P(x) } Not stra?fied P - - Q 78

The stratified model } When a datalog program is stra?fied, we can evaluate IDB predicates lowest-stratum-first } Once evaluated, treat it as EDB for higher strata 79

Summary } Query complexity } Conjunc?ve queries } Containment, equivalence, minimality } Semijoin reduc?ons } Recursive datalog 80