Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog

Size: px
Start display at page:

Download "Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog"

Transcription

1 Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Hung Q. Ngo (Stealth Mode) With Mahmoud Abo Khamis and Dan PODS 2017

2 Table of Contents Connecting the Dots Output Size Bounds and Information Theory Shannon-flow Inequalities and the PANDA Algorithm Wrapping it up Appendix

3 Table of Contents Connecting the Dots Output Size Bounds and Information Theory Shannon-flow Inequalities and the PANDA Algorithm Wrapping it up Appendix

4 Exciting Recent Results on Query Evaluation

5 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n]

6 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n] Attribute A i, i [n], A J := {A j j J}, J [n]

7 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n] Attribute A i, i [n], A J := {A j j J}, J [n] Relation R F (A F ) for each F E, R F for short e.g. R 123 for R 123 (A 1, A 2, A 3 ) e.g. T 25 for T 25 (A 2, A 5 )

8 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n] Attribute A i, i [n], A J := {A j j J}, J [n] Relation R F (A F ) for each F E, R F for short e.g. R 123 for R 123 (A 1, A 2, A 3 ) e.g. T 25 for T 25 (A 2, A 5 )

9 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n] Attribute A i, i [n], A J := {A j j J}, J [n] Relation R F (A F ) for each F E, R F for short e.g. R 123 for R 123 (A 1, A 2, A 3 ) e.g. T 25 for T 25 (A 2, A 5 ) Problem (Boolean Conjunctive Query (BCQ)) Q : S() F E R F, // can all R F be satisfied at once?

10 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n] Attribute A i, i [n], A J := {A j j J}, J [n] Relation R F (A F ) for each F E, R F for short e.g. R 123 for R 123 (A 1, A 2, A 3 ) e.g. T 25 for T 25 (A 2, A 5 ) Problem (Boolean Conjunctive Query (BCQ)) Q : S() F E R F, // can all R F be satisfied at once? Question How do we evaluate Q efficiently?

11 A Query Evaluation Problem Hypergraph H = ([n], E), E 2 [n] Attribute A i, i [n], A J := {A j j J}, J [n] Relation R F (A F ) for each F E, R F for short e.g. R 123 for R 123 (A 1, A 2, A 3 ) e.g. T 25 for T 25 (A 2, A 5 ) Problem (Boolean Conjunctive Query (BCQ)) Q : S() F E R F, // can all R F be satisfied at once? Question How do we evaluate Q efficiently? Conjunctive, count, aggregate queries are fine too.

12 What do we mean by efficiency? Database centric complexity framework Assumption 1: query size data size Data complexity Fixed parameter tractability (e.g. parameter = some function of query size) Õ(something) = O (f( query ) polylog( data ) something)

13 What do we mean by efficiency? Database centric complexity framework Assumption 1: query size data size Data complexity Fixed parameter tractability (e.g. parameter = some function of query size) Õ(something) = O (f( query ) polylog( data ) something) Assumption 2: known constraints on input relations Cardinalities of materialized relations, or upper bounds cardinality constraints (CC) Functional dependencies, the more the merrier FD constraints (FDC) Degree bounds, the more the merrier Degree constraints (DC)

14 Example Q : S() R 12 R 23 R 34 R 41 A 1 + A 2 = A 3. H = ([4], {12, 23, 34, 14, 123}) A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 e 6 e 7 A 4 A Cardinalities: R 12 = 4, R 23 = 3,...

15 Example Q : S() R 12 R 23 R 34 R 41 A 1 + A 2 = A 3. H = ([4], {12, 23, 34, 14, 123}) A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 e 6 e 7 A 4 A Cardinalities: R 12 = 4, R 23 = 3,... FD: {A 1, A 2 } A 3, {A 3, A 2 } A 1,...

16 Example Q : S() R 12 R 23 R 34 R 41 A 1 + A 2 = A 3. H = ([4], {12, 23, 34, 14, 123}) A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 e 6 e 7 A 4 A Cardinalities: R 12 = 4, R 23 = 3,... FD: {A 1, A 2 } A 3, {A 3, A 2 } A 1,... Degree Bounds: deg 34 (A 4 A 3 = x) def = σ A3=x(R 34 ) 2, x,...

17 Degree- generalize cardinality- and FD-constraints DC CC FDC Degree Constraints (DC): deg F (A Y A X ) N Y X, X Y F E

18 Degree- generalize cardinality- and FD-constraints DC CC FDC Degree Constraints (DC): deg F (A Y A X ) N Y X, X Y F E Cardinality Constraints (CC): R F N deg F (A F A ) N F def = N.

19 Degree- generalize cardinality- and FD-constraints DC CC FDC Degree Constraints (DC): deg F (A Y A X ) N Y X, X Y F E Cardinality Constraints (CC): R F N deg F (A F A ) N F def = N. Functional Dependencies (FDC): A X A Y deg F (A Y A X ) N Y X def = 1.

20 Islands of tractability (redrawn from D. Marx s slides) Prior results with cardinality constraints not FPT FPT PTIME Bounded submodular width Bounded fractional hypertree width Bounded (generalized) Hypertree Width Bounded fractional edge cover number Bounded Treewidth

21 Islands of tractability (redrawn from D. Marx s slides) Prior results with cardinality constraints not FPT FPT PTIME Bounded submodular width Bounded fractional hypertree width Bounded (generalized) Hypertree Width Bounded fractional edge cover number Bounded Treewidth We want the same map with degree constraints.

22 A Meta Algorithm (i.e. Meta Query Plan) Database D F E R F itime Datalog rule Q i Q i (D) A propositional formula on relations T j, j J Q o otime overall time = Õ( input + itime + otime ) Output Answer

23 A Meta Algorithm (i.e. Meta Query Plan) Database D F E R F itime Datalog rule Q i Q i (D) A propositional formula on relations T j, j J Q o otime overall time = Õ( input + itime + otime ) Output Answer (a) otime = Q i (D) + answer

24 A Meta Algorithm (i.e. Meta Query Plan) Database D F E R F itime Datalog rule Q i Q i (D) A propositional formula on relations T j, j J Q o otime overall time = Õ( input + itime + otime ) Output Answer (a) otime = Q i (D) + answer i.e. Q o evaluatable in linear time

25 A Meta Algorithm (i.e. Meta Query Plan) Database D F E R F itime Datalog rule Q i Q i (D) A propositional formula on relations T j, j J Q o otime overall time = Õ( input + itime + otime ) Output Answer (a) otime = Q i (D) + answer i.e. Q o evaluatable in linear time (b) itime = max D =DC Q i(d)

26 A Meta Algorithm (i.e. Meta Query Plan) Database D F E R F itime Datalog rule Q i Q i (D) A propositional formula on relations T j, j J Q o otime overall time = Õ( input + itime + otime ) Output Answer (a) otime = Q i (D) + answer i.e. Q o evaluatable in linear time (b) itime = max D =DC Q i(d) i.e. Qi evaluatable within its worst-case output size

27 A Meta Algorithm (i.e. Meta Query Plan) Database D F E R F itime Datalog rule Q i Q i (D) A propositional formula on relations T j, j J Q o otime overall time = Õ( input + itime + otime ) Output Answer (a) otime = Q i (D) + answer i.e. Q o evaluatable in linear time (b) itime = max D =DC Q i(d) i.e. Qi evaluatable within its worst-case output size Design Q i s.t. (a) holds and (b) as small as possible

28 Option 1: Full Conjunctive Query Full conjunctive query Database D Q i Q i (D) Q o Output F E R F itime T [n] otime Answer

29 Option 1: Full Conjunctive Query Full conjunctive query Database D Q i Q i (D) Q o Output F E R F itime T [n] otime Answer otime = Q i (D) + answer trivially true

30 Option 1: Full Conjunctive Query Database D F E R F Full conjunctive query itime Q i Q i (D) T [n] Q o otime Output Answer otime = Q i (D) + answer itime = max Q i(d) D =DC trivially true worst-case optimal algorithm Known if all DC are cardinality constraints NPRR [Ngo, Porat, Ré, Rudra PODS 12] Leapfrog-Triejoin [Veldhuizen ICDT 14] Generic Join [Ngo, Ré, Rudra SIGMOD Records 2013] Unknown for general DC until our work

31 Detour: Tree Decompositions, Informally

32 Detour: Tree Decompositions, Informally S() R(a, b, d) c < d T (c, b, d) U(b, e) V (c, e) b + e = f W (b, e, g) X(i, j, h) e b = k.

33 Detour: Tree Decompositions, Informally S() R(a, b, d) c < d T (c, b, d) U(b, e) V (c, e) b + e = f W (b, e, g) X(i, j, h) e b = k. b, c, e bag a, b, c, d bag b, e, f, g, h bag h, i, j bag e, b, k bag

34 Detour: Tree Decompositions, Informally S() R(a, b, d) c < d T (c, b, d) U(b, e) V (c, e) b + e = f W (b, e, g) X(i, j, h) e b = k. b, c, e bag a, b, c, d bag b, e, f, g, h bag h, i, j bag e, b, k bag Every relation is covered by some bag

35 Detour: Tree Decompositions, Informally S() R(a, b, d) c < d T (c, b, d) U(b, e) V (c, e) b + e = f W (b, e, g) g/f = h X(i, j, h) e b = k. b, c, e bag a, b, c, d bag b, e, f, g, h bag h, i, j bag e, b, k bag Every relation is covered by some bag

36 Detour: Tree Decompositions, Informally S() R(a, b, d) c < d T (c, b, d) U(b, e) V (c, e) b + e = f W (b, e, g) X(i, j, h) e b = k. b, c, e bag a, b, c, d bag b, e, f, g, h bag h, i, j bag e, b, k bag Every relation is covered by some bag Bags conntaining a given variable are connected

37 Detour: Tree Decompositions, Formally Hypergraph H = ([n], E) A Tree Decomposition of H is a pair (T, χ) where T = (V (T ), E(T )) is a tree See [Gottlob et al 2016], Gems of PODS.

38 Detour: Tree Decompositions, Formally Hypergraph H = ([n], E) A Tree Decomposition of H is a pair (T, χ) where T = (V (T ), E(T )) is a tree χ : V (T ) 2 [n] assigns a bag χ(v) to each tree-node v Every hyperedge F E is covered by some bag (F χ(v)) Bags containing i [n] forms a subtree See [Gottlob et al 2016], Gems of PODS.

39 Option 2: A Single Tree Decomposition Fix (T, χ) Database D Multiple Conjunctive Rules itime R F T χ(v) F E v V (T ) Q i Q i (D) Q o otime Output Answer

40 Option 2: A Single Tree Decomposition Fix (T, χ) Database D Multiple Conjunctive Rules itime R F T χ(v) F E v V (T ) Q i Q i (D) Q o otime Output Answer Q i (D) is Olteanu s factorized database (FDB)

41 Option 2: A Single Tree Decomposition Fix (T, χ) Database D Multiple Conjunctive Rules itime R F T χ(v) F E v V (T ) Q i Q i (D) Q o otime Output Answer Q i (D) is Olteanu s factorized database (FDB) otime = Q i (D) + answer Yannakakis for join, FDB/InsideOut for aggregates

42 Option 2: A Single Tree Decomposition Fix (T, χ) Database D Multiple Conjunctive Rules itime R F T χ(v) F E v V (T ) Q i Q i (D) Q o otime Output Answer Q i (D) is Olteanu s factorized database (FDB) otime = Q i (D) + answer Yannakakis for join, FDB/InsideOut for aggregates itime = max Q i(d) max max P v(d) D =DC D =DC v V (T ) Pv : T χ(v) R F v V (T ) F E

43 Option 2: A Single Tree Decomposition Fix (T, χ) Database D Multiple Conjunctive Rules itime R F T χ(v) F E v V (T ) Q i Q i (D) Q o otime Output Answer Q i (D) is Olteanu s factorized database (FDB) otime = Q i (D) + answer Yannakakis for join, FDB/InsideOut for aggregates itime = max Q i(d) max max P v(d) D =DC D =DC v V (T ) Pv : T χ(v) R F v V (T ) F E

44 Option 2: A Single Tree Decomposition Fix (T, χ) Database D Multiple Conjunctive Rules itime R F T χ(v) F E v V (T ) Q i Q i (D) Q o otime Output Answer Q i (D) is Olteanu s factorized database (FDB) otime = Q i (D) + answer Yannakakis for join, FDB/InsideOut for aggregates itime = max Q i(d) max max P v(d) D =DC D =DC v V (T ) Pv : T χ(v) R F v V (T ) F E min max max P v(d) N fhtw(h) N ghtw(h) N tw(h)+1 (T,χ) D =CC v V (T )

45 Option 3: Multiple Tree Decompositions Database D Q i Q i (D) itime R F T χ(v) F E (T,χ) v V (T ) Q o otime Output Answer ranges over non-redundant TDs (T, χ)

46 Option 3: Multiple Tree Decompositions Database D Q i Q i (D) itime R F T χ(v) F E (T,χ) v V (T ) Q o otime Output Answer ranges over non-redundant TDs (T, χ) otime = Q i (D) + answer Union of Yannakakis on all TDs

47 Option 3: Multiple Tree Decompositions Database D How to evaluate this? Q i Q i (D) itime R F T χ(v) F E (T,χ) v V (T ) Q o otime Output Answer ranges over non-redundant TDs (T, χ) otime = Q i (D) + answer Union of Yannakakis on all TDs itime = max D =DC Q i(d)?

48 Example: Q : S() R 12 R 23 R 34 R 41 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34

49 Example: Q : S() R 12 R 23 R 34 R 41 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 A 3 A 1 A 4 A 3 A 1 A 2 A 4 A 2 A 4 A 3

50 Example: Q : S() R 12 R 23 R 34 R 41 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 A 3 A 1 A 4 A 3 A 1 A 2 A 4 A 2 A 4 A 3 (T 123 T 134 ) (T 124 T 234 ) R 12 R 23 R 34 R 41

51 Example: Q : S() R 12 R 23 R 34 R 41 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 A 3 A 1 A 4 A 3 A 1 A 2 A 4 A 2 A 4 A 3 (T 123 T 134 ) (T 124 T 234 ) R 12 R 23 R 34 R 41 By distributivity, rewrite an equivalent the head: (T 123 T 124 ) (T 123 T 234 ) (T 134 T 124 ) (T 134 T 234 ) R 12 R 23 R 34 R 41 (Each clause has one bag per TD)

52 Option 3: Multiple Tree Decompositions Database D Multiple Disjunctive Datalog Rules! itime R F T B F E B B B Q i Q i (D) Q o otime Output Answer otime = Q i (D) + answer Union of Yannakakis on all TDs itime = max Q i(d) max D =DC B PB : T B B B F E R F D =DC max P B(D) disjunctive datalog rule

53 Option 3: Multiple Tree Decompositions Database D Multiple Disjunctive Datalog Rules! itime R F T B F E B B B Q i Q i (D) Q o otime Output Answer otime = Q i (D) + answer Union of Yannakakis on all TDs itime = max Q i(d) max D =DC B PB : T B B B F E R F D =DC max P B(D) disjunctive datalog rule max max P B(D) N subw(h) N fhtw(h) D =CC B subw = submodular width (Daniel Marx, JACM 2013)

54 Example: Q : S() R 12 R 23 R 34 R 41 Option 1: Q i : T 1234 R 12 R 23 R 34 R 41

55 Example: Q : S() R 12 R 23 R 34 R 41 Option 1: Option 2: Q i : T 1234 R 12 R 23 R 34 R 41 either Q i : T 123 T 134 R 12 R 23 R 34 R 41 or Q i : T 124 T 234 R 12 R 23 R 34 R 41

56 Example: Q : S() R 12 R 23 R 34 R 41 Option 1: Option 2: Q i : T 1234 R 12 R 23 R 34 R 41 Option 3: either Q i : T 123 T 134 R 12 R 23 R 34 R 41 or Q i : T 124 T 234 R 12 R 23 R 34 R 41 Q i : (T 123 T 134 ) (T 124 T 234 ) R 12 R 23 R 34 R 41 Equivalent to: P 123,124 : T 123 T 124 R 12 R 23 R 34 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41 P 134,124 : T 134 T 124 R 12 R 23 R 34 R 41 P 134,234 : T 134 T 234 R 12 R 23 R 34 R 41

57 Example: Q : S() R 12 R 23 R 34 R 41 Option 1: itime = max D =DC Q i(d) = N 2 Q i : T 1234 R 12 R 23 R 34 R 41 Option 2: Option 3: either Q i : T 123 T 134 R 12 R 23 R 34 R 41 or Q i : T 124 T 234 R 12 R 23 R 34 R 41 Q i : (T 123 T 134 ) (T 124 T 234 ) R 12 R 23 R 34 R 41 Equivalent to: P 123,124 : T 123 T 124 R 12 R 23 R 34 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41 P 134,124 : T 134 T 124 R 12 R 23 R 34 R 41 P 134,234 : T 134 T 234 R 12 R 23 R 34 R 41

58 Example: Q : S() R 12 R 23 R 34 R 41 Option 1: itime = max D =DC Q i(d) = N 2 Q i : T 1234 R 12 R 23 R 34 R 41 Option 2: itime = max D =DC Q i(d) = N 2 Option 3: either Q i : T 123 T 134 R 12 R 23 R 34 R 41 or Q i : T 124 T 234 R 12 R 23 R 34 R 41 Q i : (T 123 T 134 ) (T 124 T 234 ) R 12 R 23 R 34 R 41 Equivalent to: P 123,124 : T 123 T 124 R 12 R 23 R 34 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41 P 134,124 : T 134 T 124 R 12 R 23 R 34 R 41 P 134,234 : T 134 T 234 R 12 R 23 R 34 R 41

59 Example: Q : S() R 12 R 23 R 34 R 41 Option 1: itime = max D =DC Q i(d) = N 2 Q i : T 1234 R 12 R 23 R 34 R 41 Option 2: itime = max D =DC Q i(d) = N 2 either Q i : T 123 T 134 R 12 R 23 R 34 R 41 or Q i : T 124 T 234 R 12 R 23 R 34 R 41 Option 3: itime = max D =DC Q i(d) = N 3/2 Q i : (T 123 T 134 ) (T 124 T 234 ) R 12 R 23 R 34 R 41 Equivalent to: P 123,124 : T 123 T 124 R 12 R 23 R 34 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41 P 134,124 : T 134 T 124 R 12 R 23 R 34 R 41 P 134,234 : T 134 T 234 R 12 R 23 R 34 R 41

60 Roadmap Given degree constraints DC, and a disjunctive datalog rule P : B B T B F E R F

61 Roadmap Given degree constraints DC, and a disjunctive datalog rule P : B B T B F E R F Question (Worst-case Output Size Bound) Find a good upper-bound for max P (D) D =DC

62 Roadmap Given degree constraints DC, and a disjunctive datalog rule P : B B T B F E R F Question (Worst-case Output Size Bound) Find a good upper-bound for max P (D) D =DC Question (Algorithm) Design an algorithm evaluating P within the bound.

63 Roadmap Given degree constraints DC, and a disjunctive datalog rule P : B B T B F E R F Question (Worst-case Output Size Bound) Find a good upper-bound for max P (D) D =DC Question (Algorithm) Design an algorithm evaluating P within the bound. Question (Gathering fruits) Plug bound/algorithm into Meta Algorithm, what do we get?

64 Table of Contents Connecting the Dots Output Size Bounds and Information Theory Shannon-flow Inequalities and the PANDA Algorithm Wrapping it up Appendix

65 High-level View of the Bound Given degree constraints DC, a disjunctive datalog rule P : B B T B F E We shall prove bounds of the form R F max log P (D) some function of h D =DC s.t. h is (approximately) entropic and h satisfies degree constraints

66 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34

67 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b

68 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b

69 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b

70 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 1/4 b 1 2/4 b 2 1/4 A 2 A 3 1 c 1/4 1 d 2/4 2 c 1/4 A 3 A 4 c 3 2/4 d 4 2/4 d 5 0 A 4 A 1 3 b 2/4 4 a 1/4 4 b 1/4

71 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 H unif (A 1 A 2 A 3 A 4 ) = log Q A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 1/4 b 1 2/4 b 2 1/4 A 2 A 3 1 c 1/4 1 d 2/4 2 c 1/4 A 3 A 4 c 3 2/4 d 4 2/4 d 5 0 A 4 A 1 3 b 2/4 4 a 1/4 4 b 1/4

72 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 H unif (A 1 A 2 A 3 A 4 ) = log Q A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 1/4 b 1 2/4 b 2 1/4 A 2 A 3 1 c 1/4 1 d 2/4 2 c 1/4 A 3 A 4 c 3 2/4 d 4 2/4 d 5 0 A 4 A 1 3 b 2/4 4 a 1/4 4 b 1/4 H unif (A 1 A 2 ) log R 12, H unif (A 2 A 3 ) log R 23, H unif (A 3 A 4 ) log R 34,...

73 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 H unif (A 1 A 2 A 3 A 4 ) = log Q A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 1/4 b 1 2/4 b 2 1/4 A 2 A 3 1 c 1/4 1 d 2/4 2 c 1/4 A 3 A 4 c 3 2/4 d 4 2/4 d 5 0 A 4 A 1 3 b 2/4 4 a 1/4 4 b 1/4 H unif (A 1 A 2 ) log R 12, H unif (A 2 A 3 ) log R 23, H unif (A 3 A 4 ) log R 34,... H unif (A 2 A 1 = a ) log σa1 = a R 12, Hunif (A 2 A 1 = b ) log σa1 = b R 12,...

74 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 H unif (A 1 A 2 A 3 A 4 ) = log Q A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 1/4 b 1 2/4 b 2 1/4 A 2 A 3 1 c 1/4 1 d 2/4 2 c 1/4 A 3 A 4 c 3 2/4 d 4 2/4 d 5 0 A 4 A 1 3 b 2/4 4 a 1/4 4 b 1/4 H unif (A 1 A 2 ) log R 12, H unif (A 2 A 3 ) log R 23, H unif (A 3 A 4 ) log R 34,... H unif (A 2 A 1 = a ) log σa1 = a R 12, Hunif (A 2 A 1 = b ) log σa1 = b R 12,... H unif (A 2 A 1 ) log max σa1 =xr 12 x }{{}

75 An Idea From Gottlob-Lee-Valiant-Valiant, JACM 12 Q(A 1, A 2, A 3, A 4 ) R 12 R 23 R 34 R 41. A 1 A 2 A 3 A 4 a 1 d 4 1/4 b 1 c 3 1/4 b 1 d 4 1/4 b 2 c 3 1/4 H unif (A 1 A 2 A 3 A 4 ) = log Q A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 A 1 A 2 a 1 1/4 b 1 2/4 b 2 1/4 A 2 A 3 1 c 1/4 1 d 2/4 2 c 1/4 A 3 A 4 c 3 2/4 d 4 2/4 d 5 0 A 4 A 1 3 b 2/4 4 a 1/4 4 b 1/4 H unif (A 1 A 2 ) log R 12, H unif (A 2 A 3 ) log R 23, H unif (A 3 A 4 ) log R 34,... H unif (A 2 A 1 = a ) log σa1 = a R 12, Hunif (A 2 A 1 = b ) log σa1 = b R 12,... H unif (A 2 A 1 ) log max σa1 =xr 12 x }{{} deg R12 (A 2 A 1 )

76 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC

77 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC

78 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC subject to (whatever H unif satisfies):

79 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC subject to (whatever H unif satisfies): h is Entropic

80 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC subject to (whatever H unif satisfies): h is Entropic There is some distribution on A [n] such that h(x) is the marginal entropy on A X, for all X

81 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC subject to (whatever H unif satisfies): h is Entropic There is some distribution on A [n] such that h(x) is the marginal entropy on A X, for all X h satisfies DC

82 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC subject to (whatever H unif satisfies): h is Entropic There is some distribution on A [n] such that h(x) is the marginal entropy on A X, for all X h satisfies DC h(y X) def = h(y ) h(x) log N Y X, X Y F E

83 Entropic Bound for Full Conjunctive Queries Q : T [n] F E R F, and degree constraints DC max log Q(D) sup h([n]) D =DC subject to (whatever H unif satisfies): h is Entropic There is some distribution on A [n] such that h(x) is the marginal entropy on A X, for all X h satisfies DC h(y X) def = h(y ) h(x) log N Y X, X Y F E Good Bound, but not computable!

84 Hierarchy of Set Functions h : 2 [n] R +, non-negative, monotone, h( ) = 0 h(x) h(y ) if X Y SA n := {h h is sub-additive} h(x Y ) h(x) + h(y ) Γ n := {h h is submodular} h(x Y ) + h(x Y ) h(x) + h(y ) Γ n: topological closure of Γ n Γ n = {h : h is entropic} M n : Modular h(x) = x x h(x)

85 Bounds for Full Conjunctive Query HDC def = { h h(y X) log N Y X, (X, Y, N Y X ) }

86 Bounds for Full Conjunctive Query HDC def = { h h(y X) log N Y X, (X, Y, N Y X ) } Then, max log Q(D) max h([n]) D =DC h Γ n HDC max h([n]) h Γ n HDC max h([n]) h SA n HDC entropic bound polymatroid bound sub-additive bound.

87 Size Bounds for Full Conjunctive Queries Bound Entropic Bound Polymatroid Bound Definition log Q max h Γ n HDC h([n]) log Q max h Γ n HDC h([n])

88 Size Bounds for Full Conjunctive Queries Bound Entropic Bound Polymatroid Bound Definition log Q max h Γ n HDC h([n]) log Q max h Γ n HDC h([n]) CC only AGM bound (Tight) [Atserias et al. FOCS 08] AGM bound (Tight) [Atserias et al. FOCS 08]

89 Size Bounds for Full Conjunctive Queries Bound Entropic Bound Polymatroid Bound Definition log Q max h Γ n HDC h([n]) log Q max h Γ n HDC h([n]) CC only CC + FD only AGM bound (Tight) AGM bound (Tight) [Atserias et al. FOCS 08] [Atserias et al. FOCS 08] Entropic Bound for FD Polymatroid Bound for FD [Gottlob et al. JACM 12] [Gottlob et al. JACM 12] (Tight [Gogacz et al. ICDT 17]) (Not tight [our work] )

90 Size Bounds for Full Conjunctive Queries Bound Entropic Bound Polymatroid Bound Definition log Q max h Γ n HDC h([n]) log Q max h Γ n HDC h([n]) CC only CC + FD only DC AGM bound (Tight) [Atserias et al. FOCS 08] Entropic Bound for FD [Gottlob et al. JACM 12] AGM bound (Tight) [Atserias et al. FOCS 08] Polymatroid Bound for FD [Gottlob et al. JACM 12] (Tight [Gogacz et al. ICDT 17]) (Not tight [our work] ) Entropic Bound for DC Polymatroid Bound for DC (Tight [our work] ) (Not tight [our work] )

91 Disjunctive Datalog: Size Bounds P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B Theorem ( our work ) max log P (D) max D =DC min h(b) h Γ B B n HDC }{{} Entropic bound max min h(b) h Γ n HDC B B }{{} Polymatroid bound Tight Not Tight Imply all known bounds for (Full) Conjunctive Queries!

92 Earlier Example P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34

93 Earlier Example P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 max log P 123,234(D) max min{h(a 1A 2 A 3 ), h(a 2 A 3 A 4 )} D =CC h Γ n CC 1 max h Γ n CC max h Γ n CC 3 log N. 2 2 [h(a 1A 2 A 3 ), h(a 2 A 3 A 4 )] 1 2 [h(a 1A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 )]

94 Table of Contents Connecting the Dots Output Size Bounds and Information Theory Shannon-flow Inequalities and the PANDA Algorithm Wrapping it up Appendix

95 Roadmap Given degree constraints DC and a disjunctive datalog rule P : B B T B F E R F Answer (Worst-case Output Size Bound) max log P (D) max D =DC min h Γ n HDC B B h(b) = polymatroid bound. Question (Algorithm) Compute a model for P within Õ(2polymatroid bound ) Question (Gathering fruits) Plug bound/algorithm into Meta Algorithm, what do we get?

96 Connection to Shannon-flow Inequalities Lemma (Linearize it) There exists non-negative λ = (λ B ) B B, with λ 1 = 1, s.t. max min h(b) = h Γ n HDC B B max h Γ n HDC B B λ B h(b) (1)

97 Connection to Shannon-flow Inequalities Lemma (Linearize it) There exists non-negative λ = (λ B ) B B, with λ 1 = 1, s.t. max min h(b) = h Γ n HDC B B max h Γ n HDC B B Lemma (Shannon-flow inequality) There exists δ 0 s.t. 2 polymatroid bound = λ B h(b) B B (X,Y,N Y X ) λ B h(b) (1) (X,Y,N Y X ) N δ Y X Y X, and δ Y X h(y X), h Γ n (2)

98 Connection to Shannon-flow Inequalities Lemma (Linearize it) There exists non-negative λ = (λ B ) B B, with λ 1 = 1, s.t. max min h(b) = h Γ n HDC B B max h Γ n HDC B B Lemma (Shannon-flow inequality) There exists δ 0 s.t. 2 polymatroid bound = λ B h(b) B B (X,Y,N Y X ) λ B h(b) (1) (X,Y,N Y X ) N δ Y X Y X, and δ Y X h(y X), h Γ n (2) (2) is a (vast) generalization of Shearer s lemma

99 PANDA (Proof-Assisted entropic Degree-Aware) What? Compute a model for our disjunctive datalog rule ( Run within Õ 2 polymatroid bound) = Õ (X,Y,N Y X ) N δ Y X Y X :

100 PANDA (Proof-Assisted entropic Degree-Aware) What? Compute a model for our disjunctive datalog rule ( Run within Õ 2 polymatroid bound) = Õ (X,Y,N Y X ) How? Proof as symbolic instructions Construct a Proof Sequence for the corresponding Shannon-flow inequality Proof steps relational operators. N δ Y X Y X :

101 Proof sequence Shannon-flow inequality: λ B h(b) B B h(y X) def = h(y ) h(x), X Y (X,Y,N Y X ) δ Y X h(y X) Proof sequence, convert RHS to LHS using following steps (In)equality Steps (X Y ) h(x) + h(y X) = h(y ) h(x) + h(y X) h(y ) h(y ) = h(x) + h(y X) h(y ) h(x) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) h(y X) h(y Z X Z)

102 Proof sequence Shannon-flow inequality: λ B h(b) B B h(y X) def = h(y ) h(x), X Y (X,Y,N Y X ) δ Y X h(y X) Proof sequence, convert RHS to LHS using following steps (In)equality Steps (X Y ) h(x) + h(y X) = h(y ) h(x) + h(y X) h(y ) h(y ) = h(x) + h(y X) h(y ) h(x) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) h(y X) h(y Z X Z) Theorem There is a proof sequence for every Shannon-flow inequality. The length is data-independent.

103 Proof Steps As Relational Operators Shannon-flow inequality: λ B h(b) δ Y X h(y X) B B (X,Y,N Y X ) Proof sequence, Steps (X Y ) h(x) + h(y X) h(y ) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) Relational Operator

104 Proof Steps As Relational Operators Shannon-flow inequality: λ B h(b) δ Y X h(y X) B B (X,Y,N Y X ) Proof sequence, Steps (X Y ) h(x) + h(y X) h(y ) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) Relational Operator (join)

105 Proof Steps As Relational Operators Shannon-flow inequality: λ B h(b) δ Y X h(y X) B B (X,Y,N Y X ) Proof sequence, Steps (X Y ) h(x) + h(y X) h(y ) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) Relational Operator (join) (data partition)

106 Proof Steps As Relational Operators Shannon-flow inequality: λ B h(b) δ Y X h(y X) B B (X,Y,N Y X ) Proof sequence, Steps (X Y ) h(x) + h(y X) h(y ) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) Relational Operator (join) (data partition) (projection)

107 Proof Steps As Relational Operators Shannon-flow inequality: λ B h(b) δ Y X h(y X) B B (X,Y,N Y X ) Proof sequence, Steps (X Y ) h(x) + h(y X) h(y ) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) Relational Operator (join) (data partition) (projection) (NOP)

108 Proof Steps As Relational Operators Shannon-flow inequality: λ B h(b) δ Y X h(y X) B B (X,Y,N Y X ) Proof sequence, Steps (X Y ) h(x) + h(y X) h(y ) h(y ) h(x) + h(y X) h(y ) h(x) h(y X) h(y Z X Z) Relational Operator (join) (data partition) (projection) (NOP) Theorem PANDA solves any disjunctive datalog rule P in time Õ(N + poly(log N) 2 polymatroid bound for P )

109 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N

110 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2

111 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints)

112 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) Proof Step

113 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) )

114 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) )

115 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) )

116 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) )

117 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) ) ( h(a2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) )

118 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 A 4 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) ) ( h(a2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) )

119 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 A 4 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) ) ( h(a2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) ) ( h(a1 A 2 ) h(a 1 A 2 A 3 ) )

120 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 A 4 ) + h(a 3 ) h(a 1 A 2 A 3 ) + h(a 2 A 3 A 4 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) ) ( h(a2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) ) ( h(a1 A 2 ) h(a 1 A 2 A 3 ) )

121 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 A 4 ) + h(a 3 ) h(a 1 A 2 A 3 ) + h(a 2 A 3 A 4 ) + h(a 3 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) ) ( h(a2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) ) ( h(a1 A 2 ) h(a 1 A 2 A 3 ) ) ( h(a1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 ) )

122 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 log P min(h(a 1 A 2 A 3 ), h(a 2 A 3 A 4 )) (polymatroid bound) 1 2( h(a1 A 2 A 3 ) + h(a 2 A 3 A 4 ) ) (linearize) 1 2( h(a1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) ) (Shannon-flow) 3 2 log N (Cardinality constraints) Proof sequence h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 3 A 4 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 ) + h(a 4 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 ) + h(a 2 A 3 A 4 ) + h(a 3 ) h(a 1 A 2 A 3 ) + h(a 2 A 3 A 4 ) + h(a 3 ) h(a 1 A 2 A 3 ) + h(a 2 A 3 A 4 ) Proof Step ( h(a3 A 4 ) h(a 4 A 3 ) + h(a 3 ) ) ( h(a4 A 3 ) h(a 4 A 2 A 3 ) ) ( h(a2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) ) ( h(a1 A 2 ) h(a 1 A 2 A 3 ) ) ( h(a1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 ) )

123 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 h(a 3 A 4 ) h(a 4 A 3 ) + h(a 3 ) h(a 4 A 3 ) h(a 4 A 2 A 3 ) h(a 2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) h(a 1 A 2 ) h(a 1 A 2 A 3 ) h(a 1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 )

124 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 h(a 3 A 4 ) h(a 4 A 3 ) + h(a 3 ) R 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ), R (h) 3 (A 3) h(a 4 A 3 ) h(a 4 A 2 A 3 ) h(a 2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) h(a 1 A 2 ) h(a 1 A 2 A 3 ) h(a 1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 )

125 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 h(a 3 A 4 ) h(a 4 A 3 ) + h(a 3 ) h(a 4 A 3 ) h(a 4 A 2 A 3 ) R 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ), R (h) 3 (A 3) R (l) 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ) h(a 2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) h(a 1 A 2 ) h(a 1 A 2 A 3 ) h(a 1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 )

126 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 h(a 3 A 4 ) h(a 4 A 3 ) + h(a 3 ) h(a 4 A 3 ) h(a 4 A 2 A 3 ) h(a 2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) R 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ), R (h) 3 (A 3) R (l) 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ) R 23 (A 2, A 3 ) R (l) 34 (A 3, A 4 ) T 234 (A 2, A 3, A 4 ) h(a 1 A 2 ) h(a 1 A 2 A 3 ) h(a 1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 )

127 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 h(a 3 A 4 ) h(a 4 A 3 ) + h(a 3 ) h(a 4 A 3 ) h(a 4 A 2 A 3 ) h(a 2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) h(a 1 A 2 ) h(a 1 A 2 A 3 ) R 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ), R (h) 3 (A 3) R (l) 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ) R 23 (A 2, A 3 ) R (l) 34 (A 3, A 4 ) T 234 (A 2, A 3, A 4 ) R 12 (A 1, A 2 ) R 12 (A 1, A 2 ) h(a 1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 )

128 Example: P : T 123 T 234 R 12 R 23 R 34 R 41. R 12, R 23, R 34, R 41 N P N 3/2 h(a 3 A 4 ) h(a 4 A 3 ) + h(a 3 ) h(a 4 A 3 ) h(a 4 A 2 A 3 ) h(a 2 A 3 ) + h(a 4 A 2 A 3 ) h(a 2 A 3 A 4 ) h(a 1 A 2 ) h(a 1 A 2 A 3 ) h(a 1 A 2 A 3 ) + h(a 3 ) h(a 1 A 2 A 3 ) R 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ), R (h) 3 (A 3) R (l) 34 (A 3, A 4 ) R (l) 34 (A 3, A 4 ) R 23 (A 2, A 3 ) R (l) 34 (A 3, A 4 ) T 234 (A 2, A 3, A 4 ) R 12 (A 1, A 2 ) R 12 (A 1, A 2 ) R 12 (A 1, A 2 ) R (h) 3 (A 3) T 123 (A 1, A 2, A 3 )

129 Table of Contents Connecting the Dots Output Size Bounds and Information Theory Shannon-flow Inequalities and the PANDA Algorithm Wrapping it up Appendix

130 Roadmap Given degree constraints DC and a disjunctive datalog rule P : B B T B F E R F Answer (Worst-case Output Size Bound) Polymatroid bound max log P (D) max min h(b) D =DC h Γ n HDC B B Answer (Algorithm) PANDA computes a model for P within Õ(2polymatroid bound ) Question (Gathering fruits) Plug bound/algorithm into Meta Algorithm, what do we get?

131 Option 1: Full Conjunctive Query Full conjunctive query Database D Q i Q i (D) Q o Output F E R F itime T [n] otime Answer itime = max Q i(d) max D =DC h Γ 2h([n]) = PANDA s runtime n HDC PANDA is worst-case optimal whenever the polymatroid bound is tight!

132 Option 2: A Single Tree Decomposition Let (T, χ) be a tree decomposition of H to be chosen later Database D F E R F Multiple Conjunctive Rules itime Q i Q i (D) v V (T ) T χ(v) itime = max Q i(d) max max P v(d) D =DC D =DC v V (T ) Q o otime 2 max v V (T ) max h Γn DC h(χ(v)) = PANDA s runtime Pick the best (T, χ) before running PANDA: min max max (T,χ) v V (T ) h Γ n CC h(χ(v)) log N fhtw(q) PANDA evaluates Q within Õ(N fhtw(q) )-time. Output Answer

133 Option 3: Multiple Tree Decompositions Database D Multiple Disjunctive Datalog Rules! itime R F T B F E B B B Q i Q i (D) Q o otime Output Answer log itime = max D =DC log Q i(d) max max B = max max D =DC B max min h(b) = h Γ n HDC B B max min h Γ n HDC (T,χ) v V (T ) max max h Γ n HDC min (T,χ) v V (T ) log P B(D) max max min h(b) h Γ n HDC B B B h(χ(v)) = log(panda s runtime) h(χ(v)) log N subw(q) PANDA evaluates Q within Õ(N subw(q) )-time.

134 Quantities of Interests X {Γ n, Γ n, SA n } Y {HDC, HFD, HCC, log N ED, log N VD} Define LogSizeBound X Y (P ) def = max h X Y min B B h(b) Minimaxwidth X Y (Q) def = min max (T,χ) v V (T ) max h(χ(v)), h X Y Maximinwidth X Y (Q) def = max min max h(χ(v)). h X Y (T,χ) v V (T )

135 Summary of Bounds Z SAn Γn HDC HCC ED log N VD log N Γ n DAEB(Q) log ρ 2 AGM(Q) (Q) log 2 N log 2 VB(Q) DAPB(Q) log ρ 2 AGM(Q) (Q) log 2 N log 2 VB(Q) ρ(q, (NF )F E) ρ(q) log 2 N log 2 VB(Q) eda-fhtw(q) LogSizeBound X Y (Q) da-fhtw(q) eda-subw(q) fhtw(q) tw(q) + 1 fhtw(q) tw(q) + 1 ghtw(q) tw(q) + 1 Minimaxwidth X Y (Q) X da-subw(q) ghtw(q) subw(q) tw(q) + 1 tw(q) + 1 tw(q) + 1 Maximinwidth X Y (Q) Y

136 Many Open Questions Is the entropic bound computable under CC HDC or DC?

137 Many Open Questions Is the entropic bound computable under CC HDC or DC? Worst-case optimal algorithm for full conjunctive queries under CC HDC or DC

138 Many Open Questions Is the entropic bound computable under CC HDC or DC? Worst-case optimal algorithm for full conjunctive queries under CC HDC or DC Worst-case optimal algorithm for disjunctive datalog rules under CC HDC or DC

139 Many Open Questions Is the entropic bound computable under CC HDC or DC? Worst-case optimal algorithm for full conjunctive queries under CC HDC or DC Worst-case optimal algorithm for disjunctive datalog rules under CC HDC or DC Remove the annoying poly-log factor from PANDA

140 Many Open Questions Is the entropic bound computable under CC HDC or DC? Worst-case optimal algorithm for full conjunctive queries under CC HDC or DC Worst-case optimal algorithm for disjunctive datalog rules under CC HDC or DC Remove the annoying poly-log factor from PANDA Other choices for the propositional formula in the Meta Algorithm, perhaps tradding off itime and?

141 Many Open Questions Is the entropic bound computable under CC HDC or DC? Worst-case optimal algorithm for full conjunctive queries under CC HDC or DC Worst-case optimal algorithm for disjunctive datalog rules under CC HDC or DC Remove the annoying poly-log factor from PANDA Other choices for the propositional formula in the Meta Algorithm, perhaps tradding off itime and? What about negations?

142 Many Thanks! Questions?

143 Table of Contents Connecting the Dots Output Size Bounds and Information Theory Shannon-flow Inequalities and the PANDA Algorithm Wrapping it up Appendix

144 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E

145 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34

146 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34

147 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34

148 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b

149 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3

150 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 1 c 3 2 d 4

151 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 1 c 3 2 d 4

152 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c Inclusive Disjunction A 2 A 3 A 4 1 d 4 1 c 3 2 d 4

153 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 1 c 3 2 d 4 No minimal model requirement

154 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 1 c 3 2 d 4 Model size is max( T 123, T 234 ) = 3

155 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 1 c 3 2 d 4 Output size is the minimum over all models

156 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 A minimum-sized model of size 2

157 Semantics of Our Disjunctive Datalog Rule P : T B (A B ) R F (A F ) B B F E A 2 R 12 R 23 A 1 A 3 R 41 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. A 4 R 34 A 1 A 2 a 1 b 1 b 2 A 2 A 3 1 c 1 d 2 c A 3 A 4 c 3 d 4 d 5 A 4 A 1 3 b 4 a 4 b A 1 A 2 A 3 A 4 a 1 d 4 b 1 c 3 b 1 d 4 b 2 c 3 A 1 A 2 A 3 b 1 c b 2 c A 2 A 3 A 4 1 d 4 A minimum-sized model of size 2 Hence, the output size is 2

158 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B

159 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34

160 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. A 4 R 34

161 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. A 4 R 34 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41.

162 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. A 4 R 34 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. max P 123,234(D) N 3/2 D =CC

163 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. A 4 R 34 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. max P 123,234(D) N 3/2 D =CC P : T 123 R 12 R 23 R 34 R 41.

164 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. A 4 R 34 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. max P 123,234(D) N 3/2 D =CC P : T 123 R 12 R 23 R 34 R 41. P (D) = N 2, for some D

165 Output Size of a Disjunctive Datalog Rule P : B B T B (A B ) F E R F (A F ) P (D) def = min T:T =P max B B T B A 2 R 12 R 23 A 1 A 3 R 41 CC : R 12 N, R 23 N, R 34 N, R 41 N. A 4 R 34 P 123,234 : T 123 T 234 R 12 R 23 R 34 R 41. max P 123,234(D) N 3/2 D =CC P : T 123 R 12 R 23 R 34 R 41. P (D) = N 2, for some D Using Option 3, 4-cycle query answerable in Õ(N 3/2 )-time, matching [Alon, Yuster, Zwick 97]

166 Polymatroid Bound: Examples Q(A 1, A 2, A 3, A 4 ) R 12 (A 1, A 2 ), R 23 (A 2, A 3 ), R 34 (A 3, A 4 ), R 41 (A 4, A 1 ). A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 R 12, R 23, R 34, R 41 N Q N 2

167 Polymatroid Bound: Examples Q(A 1, A 2, A 3, A 4 ) R 12 (A 1, A 2 ), R 23 (A 2, A 3 ), R 34 (A 3, A 4 ), R 41 (A 4, A 1 ). A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 R 12, R 23, R 34, R 41 N Q N 2 log Q = h(a 1 A 2 A 3 A 4 ) h(a 1 A 2 ) + h(a 3 A 4 ) 2 log N

168 Polymatroid Bound: Examples Q(A 1, A 2, A 3, A 4 ) R 12 (A 1, A 2 ), R 23 (A 2, A 3 ), R 34 (A 3, A 4 ), R 41 (A 4, A 1 ). A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 R 12, R 23, R 34, R 41 N Q N 2 log Q = h(a 1 A 2 A 3 A 4 ) h(a 1 A 2 ) + h(a 3 A 4 ) 2 log N deg 12 (A 1 A 2 A 1 ), deg 12 (A 1 A 2 A 2 ) D Q D N 3/2

169 Polymatroid Bound: Examples Q(A 1, A 2, A 3, A 4 ) R 12 (A 1, A 2 ), R 23 (A 2, A 3 ), R 34 (A 3, A 4 ), R 41 (A 4, A 1 ). A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 R 12, R 23, R 34, R 41 N Q N 2 log Q = h(a 1 A 2 A 3 A 4 ) h(a 1 A 2 ) + h(a 3 A 4 ) 2 log N deg 12 (A 1 A 2 A 1 ), deg 12 (A 1 A 2 A 2 ) D Q D N 3/2 2 log Q = 2h(A 1 A 2 A 3 A 4 ) h(a 2 A 3 ) + h(a 3 A 4 ) + h(a 4 A 1 ) + h(a 2 A 1 ) + h(a 1 A 2 ) 3 log N + 2 log D

170 Polymatroid Bound: Examples Q(A 1, A 2, A 3, A 4 ) R 12 (A 1, A 2 ), R 23 (A 2, A 3 ), R 34 (A 3, A 4 ), R 41 (A 4, A 1 ). A 2 R 12 R 23 A 1 A 3 R 41 A 4 R 34 R 12, R 23, R 34, R 41 N Q N 2 log Q = h(a 1 A 2 A 3 A 4 ) h(a 1 A 2 ) + h(a 3 A 4 ) 2 log N deg 12 (A 1 A 2 A 1 ), deg 12 (A 1 A 2 A 2 ) D Q D N 3/2 2 log Q = 2h(A 1 A 2 A 3 A 4 ) h(a 2 A 3 ) + h(a 3 A 4 ) + h(a 4 A 1 ) + h(a 2 A 1 ) + h(a 1 A 2 ) 3 log N + 2 log D A 1 A 2, A 2 A 1 Q N 3/2

171 Polymatroid Bound is not tight!

172 Polymatroid Bound is not tight! Proof Strategy

173 Polymatroid Bound is not tight! Proof Strategy Take a non-shannon inequality (modified from [Zhang-Yeung 98]) 11h(ABXY C) 3h(XY ) + 3h(AX) + 3h(AY ) + h(bx) + h(by ) + 5h(C) + h(abxy C AB) + h(abxy C AC) + 4h(ABXY C AXY ) + h(abxy C BXY ) + 2h(ABXY C XC) + 2h(ABXY C Y C) (3)

174 Polymatroid Bound is not tight! Proof Strategy Take a non-shannon inequality (modified from [Zhang-Yeung 98]) 11h(ABXY C) 3h(XY ) + 3h(AX) + 3h(AY ) + h(bx) + h(by ) + 5h(C) + h(abxy C AB) + h(abxy C AC) + 4h(ABXY C AXY ) + h(abxy C BXY ) + 2h(ABXY C XC) + 2h(ABXY C Y C) (3) Construct a query Q where (3) gives the (entropic) bound 11 log Q = 11h(ABXY C) 11 log N log N 2 = 43 log N

Entropy Bounds for Conjunctive Queries with Functional Dependencies

Entropy Bounds for Conjunctive Queries with Functional Dependencies Entropy Bounds for Conjunctive Queries with Functional Dependencies Tomasz Gogacz 1 and Szymon Toruńczyk 2 1 University of Edinburgh, Edinburgh, UK 2 University of Warsaw, Warsaw, Poland Abstract We study

More information

A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries

A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries A Worst-Case Optimal Multi-Round Algorithm for Parallel Computation of Conjunctive Queries Bas Ketsman Dan Suciu Abstract We study the optimal communication cost for computing a full conjunctive query

More information

arxiv: v6 [cs.db] 6 Feb 2017

arxiv: v6 [cs.db] 6 Feb 2017 FAQ: Questions Asked Frequently Mahmoud Abo Khamis Hung Q. Ngo Atri Rudra arxiv:1504.04044v6 [cs.db] 6 Feb 2017 LogicBlox Inc. {mahmoud.abokhamis,hung.ngo}@logicblox.com Department of Computer Science

More information

Approximating fractional hypertree width

Approximating fractional hypertree width Approximating fractional hypertree width Dániel Marx Abstract Fractional hypertree width is a hypergraph measure similar to tree width and hypertree width. Its algorithmic importance comes from the fact

More information

arxiv: v1 [cs.db] 8 Mar 2012

arxiv: v1 [cs.db] 8 Mar 2012 Worst-case Optimal Join Algorithms arxiv:12031952v1 [csdb] 8 Mar 2012 Hung Q Ngo Computer Science and Engineering, SUNY Buffalo, USA Christopher Ré Computer Science, University of Wisconsin Madison USA

More information

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation

A Dichotomy. in in Probabilistic Databases. Joint work with Robert Fink. for Non-Repeating Queries with Negation Queries with Negation Dichotomy for Non-Repeating Queries with Negation Queries with Negation in in Probabilistic Databases Robert Dan Olteanu Fink and Dan Olteanu Joint work with Robert Fink Uncertainty in Computation Simons

More information

Structural Tractability of Counting of Solutions to Conjunctive Queries

Structural Tractability of Counting of Solutions to Conjunctive Queries Structural Tractability of Counting of Solutions to Conjunctive Queries Arnaud Durand 1 1 University Paris Diderot Journées DAG, june 2013 Joint work with Stefan Mengel (JCSS (to appear) + ICDT 2013) 1

More information

Tractable Counting of the Answers to Conjunctive Queries

Tractable Counting of the Answers to Conjunctive Queries Tractable Counting of the Answers to Conjunctive Queries Reinhard Pichler and Sebastian Skritek Technische Universität Wien, {pichler, skritek}@dbai.tuwien.ac.at Abstract. Conjunctive queries (CQs) are

More information

Parallel-Correctness and Transferability for Conjunctive Queries

Parallel-Correctness and Transferability for Conjunctive Queries Parallel-Correctness and Transferability for Conjunctive Queries Tom J. Ameloot1 Gaetano Geck2 Bas Ketsman1 Frank Neven1 Thomas Schwentick2 1 Hasselt University 2 Dortmund University Big Data Too large

More information

Symmetries in the Entropy Space

Symmetries in the Entropy Space Symmetries in the Entropy Space Jayant Apte, Qi Chen, John MacLaren Walsh Abstract This paper investigates when Shannon-type inequalities completely characterize the part of the closure of the entropy

More information

On Factorisation of Provenance Polynomials

On Factorisation of Provenance Polynomials On Factorisation of Provenance Polynomials Dan Olteanu and Jakub Závodný Oxford University Computing Laboratory Wolfson Building, Parks Road, OX1 3QD, Oxford, UK 1 Introduction Tracking and managing provenance

More information

Tractable structures for constraint satisfaction with truth tables

Tractable structures for constraint satisfaction with truth tables Tractable structures for constraint satisfaction with truth tables Dániel Marx September 22, 2009 Abstract The way the graph structure of the constraints influences the complexity of constraint satisfaction

More information

Single-Round Multi-Join Evaluation. Bas Ketsman

Single-Round Multi-Join Evaluation. Bas Ketsman Single-Round Multi-Join Evaluation Bas Ketsman Outline 1. Introduction 2. Parallel-Correctness 3. Transferability 4. Special Cases 2 Motivation Single-round Multi-joins Less rounds / barriers Formal framework

More information

R E P O R T. Generalized Hypertree Decompositions: NP-Hardness and Tractable Variants INSTITUT FÜR INFORMATIONSSYSTEME DBAI-TR

R E P O R T. Generalized Hypertree Decompositions: NP-Hardness and Tractable Variants INSTITUT FÜR INFORMATIONSSYSTEME DBAI-TR TECHNICAL R E P O R T INSTITUT FÜR INFORMATIONSSYSTEME ABTEILUNG DATENBANKEN UND ARTIFICIAL INTELLIGENCE Generalized Hypertree Decompositions: NP-Hardness and Tractable Variants DBAI-TR-2007-55 Georg Gottlob

More information

Factorised Representations of Query Results: Size Bounds and Readability

Factorised Representations of Query Results: Size Bounds and Readability Factorised Representations of Query Results: Size Bounds and Readability Dan Olteanu and Jakub Závodný Department of Computer Science University of Oxford {dan.olteanu,jakub.zavodny}@cs.ox.ac.uk ABSTRACT

More information

Size and Treewidth Bounds for Conjunctive Queries

Size and Treewidth Bounds for Conjunctive Queries Size and Treewidth Bounds for Conjunctive Queries GEORG GOTTLOB University of Oxford, UK STEPHANIE TIEN LEE University of Oxford, UK GREGORY VALIANT University of California, Berkeley and PAUL VALIANT

More information

arxiv: v1 [cs.db] 9 Mar 2017

arxiv: v1 [cs.db] 9 Mar 2017 Juggling Functions Inside a Database Mahmoud Abo Khamis LogicBlox Inc. mahmoud.abokhamis@logicblox.com Hung Q. Ngo LogicBlox Inc. hung.ngo@logicblox.com Atri Rudra University at Buffalo, SUNY atri@buffalo.edu

More information

arxiv: v7 [cs.db] 22 Dec 2015

arxiv: v7 [cs.db] 22 Dec 2015 It s all a matter of degree: Using degree information to optimize multiway joins Manas R. Joglekar 1 and Christopher M. Ré 2 1 Department of Computer Science, Stanford University 450 Serra Mall, Stanford,

More information

arxiv: v2 [cs.db] 16 Oct 2013

arxiv: v2 [cs.db] 16 Oct 2013 Skew Strikes Back: New Developments in the Theory of Join Algorithms Hung Q. Ngo University at Buffalo, SUNY hungngo@buffalo.edu Christopher Ré Stanford University chrismre@cs.stanford.edu Atri Rudra University

More information

The complexity of enumeration and counting for acyclic conjunctive queries

The complexity of enumeration and counting for acyclic conjunctive queries The complexity of enumeration and counting for acyclic conjunctive queries Cambridge, march 2012 Counting and enumeration Counting Output the number of solutions Example : sat, Perfect Matching, Permanent,

More information

Skew Strikes Back: New Developments in the Theory of Join Algorithms

Skew Strikes Back: New Developments in the Theory of Join Algorithms Skew Strikes Back: New Developments in the Theory of Join Algorithms Hung Q. Ngo University at Buffalo, SUN hungngo@buffalo.edu Christopher Ré Stanford University chrismre@cs.stanford.edu Atri Rudra University

More information

Consistent Query Answering under Primary Keys: A Characterization of Tractable Queries

Consistent Query Answering under Primary Keys: A Characterization of Tractable Queries Consistent Query Answering under Primary Keys: A Characterization of Tractable Queries Jef Wijsen University of Mons-Hainaut Mons, Belgium jef.wijsen@umh.ac.be ABSTRACT This article deals with consistent

More information

Datalog and Constraint Satisfaction with Infinite Templates

Datalog and Constraint Satisfaction with Infinite Templates Datalog and Constraint Satisfaction with Infinite Templates Manuel Bodirsky 1 and Víctor Dalmau 2 1 CNRS/LIX, École Polytechnique, bodirsky@lix.polytechnique.fr 2 Universitat Pompeu Fabra, victor.dalmau@upf.edu

More information

arxiv: v1 [cs.ai] 21 Sep 2014

arxiv: v1 [cs.ai] 21 Sep 2014 Oblivious Bounds on the Probability of Boolean Functions WOLFGANG GATTERBAUER, Carnegie Mellon University DAN SUCIU, University of Washington arxiv:409.6052v [cs.ai] 2 Sep 204 This paper develops upper

More information

A Toolbox of Query Evaluation Techniques for Probabilistic Databases

A Toolbox of Query Evaluation Techniques for Probabilistic Databases 2nd Workshop on Management and mining Of UNcertain Data (MOUND) Long Beach, March 1st, 2010 A Toolbox of Query Evaluation Techniques for Probabilistic Databases Dan Olteanu, Oxford University Computing

More information

P Q1 Q2 Q3 Q4 Q5 Tot (60) (20) (20) (20) (60) (20) (200) You are allotted a maximum of 4 hours to complete this exam.

P Q1 Q2 Q3 Q4 Q5 Tot (60) (20) (20) (20) (60) (20) (200) You are allotted a maximum of 4 hours to complete this exam. Exam INFO-H-417 Database System Architecture 13 January 2014 Name: ULB Student ID: P Q1 Q2 Q3 Q4 Q5 Tot (60 (20 (20 (20 (60 (20 (200 Exam modalities You are allotted a maximum of 4 hours to complete this

More information

Bounded Query Rewriting Using Views

Bounded Query Rewriting Using Views Bounded Query Rewriting Using Views Yang Cao 1,3 Wenfei Fan 1,3 Floris Geerts 2 Ping Lu 3 1 University of Edinburgh 2 University of Antwerp 3 Beihang University {wenfei@inf, Y.Cao-17@sms}.ed.ac.uk, floris.geerts@uantwerpen.be,

More information

Provenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley

Provenance Semirings. Todd Green Grigoris Karvounarakis Val Tannen. presented by Clemens Ley Provenance Semirings Todd Green Grigoris Karvounarakis Val Tannen presented by Clemens Ley place of origin Provenance Semirings Todd Green Grigoris Karvounarakis Val Tannen presented by Clemens Ley place

More information

Factorized Relational Databases Olteanu and Závodný, University of Oxford

Factorized Relational Databases   Olteanu and Závodný, University of Oxford November 8, 2013 Database Seminar, U Washington Factorized Relational Databases http://www.cs.ox.ac.uk/projects/fd/ Olteanu and Závodný, University of Oxford Factorized Representations of Relations Cust

More information

Brief Tutorial on Probabilistic Databases

Brief Tutorial on Probabilistic Databases Brief Tutorial on Probabilistic Databases Dan Suciu University of Washington Simons 206 About This Talk Probabilistic databases Tuple-independent Query evaluation Statistical relational models Representation,

More information

Received: 1 September 2018; Accepted: 10 October 2018; Published: 12 October 2018

Received: 1 September 2018; Accepted: 10 October 2018; Published: 12 October 2018 entropy Article Entropy Inequalities for Lattices Peter Harremoës Copenhagen Business College, Nørre Voldgade 34, 1358 Copenhagen K, Denmark; harremoes@ieee.org; Tel.: +45-39-56-41-71 Current address:

More information

arxiv: v1 [cs.db] 20 Dec 2017

arxiv: v1 [cs.db] 20 Dec 2017 Boolean Tensor Decomposition for Conjunctive Queries with Negation Mahmoud Abo Khamis & Hung Q. Ngo RelationalAI Dan Olteanu University of Oxford Dan Suciu University of Washington arxiv:1712.07445v1 [cs.db]

More information

Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries

Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries Tractable Hypergraph Properties for Constraint Satisfaction and Conjunctive Queries Dániel Marx School of Computer Science Tel Aviv University Tel Aviv, Israel dmarx@cs.bme.hu ABSTRACT An important question

More information

Entropy Vectors and Network Information Theory

Entropy Vectors and Network Information Theory Entropy Vectors and Network Information Theory Sormeh Shadbakht and Babak Hassibi Department of Electrical Engineering Caltech Lee Center Workshop May 25, 2007 1 Sormeh Shadbakht and Babak Hassibi Entropy

More information

The complexity of acyclic subhypergraph problems

The complexity of acyclic subhypergraph problems The complexity of acyclic subhypergraph problems David Duris and Yann Strozecki Équipe de Logique Mathématique (FRE 3233) - Université Paris Diderot-Paris 7 {duris,strozecki}@logique.jussieu.fr Abstract.

More information

Propositional Logic: Models and Proofs

Propositional Logic: Models and Proofs Propositional Logic: Models and Proofs C. R. Ramakrishnan CSE 505 1 Syntax 2 Model Theory 3 Proof Theory and Resolution Compiled at 11:51 on 2016/11/02 Computing with Logic Propositional Logic CSE 505

More information

GAV-sound with conjunctive queries

GAV-sound with conjunctive queries GAV-sound with conjunctive queries Source and global schema as before: source R 1 (A, B),R 2 (B,C) Global schema: T 1 (A, C), T 2 (B,C) GAV mappings become sound: T 1 {x, y, z R 1 (x,y) R 2 (y,z)} T 2

More information

Hypertree-Width and Related Hypergraph Invariants

Hypertree-Width and Related Hypergraph Invariants Hypertree-Width and Related Hypergraph Invariants Isolde Adler, Georg Gottlob, Martin Grohe To cite this version: Isolde Adler, Georg Gottlob, Martin Grohe. Hypertree-Width and Related Hypergraph Invariants.

More information

Tractable Lineages on Treelike Instances: Limits and Extensions

Tractable Lineages on Treelike Instances: Limits and Extensions Tractable Lineages on Treelike Instances: Limits and Extensions Antoine Amarilli 1, Pierre Bourhis 2, Pierre enellart 1,3 June 29th, 2016 1 Télécom ParisTech 2 CNR CRItAL 3 National University of ingapore

More information

Functional Dependencies and Normalization

Functional Dependencies and Normalization Functional Dependencies and Normalization There are many forms of constraints on relational database schemata other than key dependencies. Undoubtedly most important is the functional dependency. A functional

More information

SLD-Resolution And Logic Programming (PROLOG)

SLD-Resolution And Logic Programming (PROLOG) Chapter 9 SLD-Resolution And Logic Programming (PROLOG) 9.1 Introduction We have seen in Chapter 8 that the resolution method is a complete procedure for showing unsatisfiability. However, finding refutations

More information

α-acyclic Joins Jef Wijsen May 4, 2017

α-acyclic Joins Jef Wijsen May 4, 2017 α-acyclic Joins Jef Wijsen May 4, 2017 1 Motivation Joins in a Distributed Environment Assume the following relations. 1 M[NN, Field of Study, Year] stores data about students of UMONS. For example, (19950423158,

More information

PART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015

PART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015 Outline Codes and Cryptography 1 Information Sources and Optimal Codes 2 Building Optimal Codes: Huffman Codes MAMME, Fall 2015 3 Shannon Entropy and Mutual Information PART III Sources Information source:

More information

Query answering using views

Query answering using views Query answering using views General setting: database relations R 1,...,R n. Several views V 1,...,V k are defined as results of queries over the R i s. We have a query Q over R 1,...,R n. Question: Can

More information

The Logic of Partitions with an application to Information Theory

The Logic of Partitions with an application to Information Theory 1 The Logic of Partitions with an application to Information Theory David Ellerman University of California at Riverside and WWW.Ellerman.org 2 Outline of Talk Logic of partitions dual to ordinary logic

More information

On Datalog vs. LFP. Anuj Dawar and Stephan Kreutzer

On Datalog vs. LFP. Anuj Dawar and Stephan Kreutzer On Datalog vs. LFP Anuj Dawar and Stephan Kreutzer 1 University of Cambridge Computer Lab, anuj.dawar@cl.cam.ac.uk 2 Oxford University Computing Laboratory, kreutzer@comlab.ox.ac.uk Abstract. We show that

More information

Hypertree Decompositions: Structure, Algorithms, and Applications

Hypertree Decompositions: Structure, Algorithms, and Applications Hypertree Decompositions: Structure, Algorithms, and Applications Georg Gottlob 1, Martin Grohe 2, Nysret Musliu 1, Marko Samer 1, and Francesco Scarcello 3 1 Institut für Informationssysteme, TU Wien,

More information

A Relational Knowledge System

A Relational Knowledge System Relational Knowledge System by Shik Kam Michael Wong, Cory James utz, and Dan Wu Technical Report CS-01-04 January 2001 Department of Computer Science University of Regina Regina, Saskatchewan Canada S4S

More information

The Inflation Technique for Causal Inference with Latent Variables

The Inflation Technique for Causal Inference with Latent Variables The Inflation Technique for Causal Inference with Latent Variables arxiv:1609.00672 (Elie Wolfe, Robert W. Spekkens, Tobias Fritz) September 2016 Introduction Given some correlations between the vocabulary

More information

Dynamic Programming on Trees. Example: Independent Set on T = (V, E) rooted at r V.

Dynamic Programming on Trees. Example: Independent Set on T = (V, E) rooted at r V. Dynamic Programming on Trees Example: Independent Set on T = (V, E) rooted at r V. For v V let T v denote the subtree rooted at v. Let f + (v) be the size of a maximum independent set for T v that contains

More information

Agnostic Learning of Disjunctions on Symmetric Distributions

Agnostic Learning of Disjunctions on Symmetric Distributions Agnostic Learning of Disjunctions on Symmetric Distributions Vitaly Feldman vitaly@post.harvard.edu Pravesh Kothari kothari@cs.utexas.edu May 26, 2014 Abstract We consider the problem of approximating

More information

The Complexity of Causality and Responsibility for Query Answers and non-answers

The Complexity of Causality and Responsibility for Query Answers and non-answers The Complexity of Causality and Responsibility for Query Answers and non-answers Alexandra Meliou Wolfgang Gatterbauer Katherine F. Moore Dan Suciu Department of Computer Science and Engineering, University

More information

Notes. Corneliu Popeea. May 3, 2013

Notes. Corneliu Popeea. May 3, 2013 Notes Corneliu Popeea May 3, 2013 1 Propositional logic Syntax We rely on a set of atomic propositions, AP, containing atoms like p, q. A propositional logic formula φ Formula is then defined by the following

More information

Lecture 19 October 28, 2015

Lecture 19 October 28, 2015 PHYS 7895: Quantum Information Theory Fall 2015 Prof. Mark M. Wilde Lecture 19 October 28, 2015 Scribe: Mark M. Wilde This document is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike

More information

Knowledge Compilation Meets Database Theory : Compiling Queries to Decision Diagrams

Knowledge Compilation Meets Database Theory : Compiling Queries to Decision Diagrams Knowledge Compilation Meets Database Theory : Compiling Queries to Decision Diagrams Abhay Jha Computer Science and Engineering University of Washington, WA, USA abhaykj@cs.washington.edu Dan Suciu Computer

More information

arxiv: v1 [cs.db] 10 Nov 2017

arxiv: v1 [cs.db] 10 Nov 2017 Size bounds and query plans for relational joins arxiv:1711.03860v1 [cs.db] 10 Nov 2017 Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain Martin Grohe Humboldt Universität zu Berlin

More information

Lifted Inference: Exact Search Based Algorithms

Lifted Inference: Exact Search Based Algorithms Lifted Inference: Exact Search Based Algorithms Vibhav Gogate The University of Texas at Dallas Overview Background and Notation Probabilistic Knowledge Bases Exact Inference in Propositional Models First-order

More information

Games and Isomorphism in Finite Model Theory. Part 1

Games and Isomorphism in Finite Model Theory. Part 1 1 Games and Isomorphism in Finite Model Theory Part 1 Anuj Dawar University of Cambridge Games Winter School, Champéry, 6 February 2013 2 Model Comparison Games Games in Finite Model Theory are generally

More information

arxiv: v1 [cs.db] 21 Feb 2017

arxiv: v1 [cs.db] 21 Feb 2017 Answering Conjunctive Queries under Updates Christoph Berkholz, Jens Keppeler, Nicole Schweikardt Humboldt-Universität zu Berlin {berkholz,keppelej,schweika}@informatik.hu-berlin.de February 22, 207 arxiv:702.06370v

More information

P is the class of problems for which there are algorithms that solve the problem in time O(n k ) for some constant k.

P is the class of problems for which there are algorithms that solve the problem in time O(n k ) for some constant k. Complexity Theory Problems are divided into complexity classes. Informally: So far in this course, almost all algorithms had polynomial running time, i.e., on inputs of size n, worst-case running time

More information

Propositional Logic: Part II - Syntax & Proofs 0-0

Propositional Logic: Part II - Syntax & Proofs 0-0 Propositional Logic: Part II - Syntax & Proofs 0-0 Outline Syntax of Propositional Formulas Motivating Proofs Syntactic Entailment and Proofs Proof Rules for Natural Deduction Axioms, theories and theorems

More information

Multi-join Query Evaluation on Big Data Lecture 2

Multi-join Query Evaluation on Big Data Lecture 2 Multi-join Query Evaluation on Big Data Lecture 2 Dan Suciu March, 2015 Dan Suciu Multi-Joins Lecture 2 March, 2015 1 / 34 Multi-join Query Evaluation Outline Part 1 Optimal Sequential Algorithms. Thursday

More information

Bounded-width QBF is PSPACE-complete

Bounded-width QBF is PSPACE-complete Bounded-width QBF is PSPACE-complete Albert Atserias Universitat Politècnica de Catalunya Barcelona, Spain atserias@lsi.upc.edu Sergi Oliva Universitat Politècnica de Catalunya Barcelona, Spain oliva@lsi.upc.edu

More information

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1

Logic and Databases. Phokion G. Kolaitis. UC Santa Cruz & IBM Research Almaden. Lecture 4 Part 1 Logic and Databases Phokion G. Kolaitis UC Santa Cruz & IBM Research Almaden Lecture 4 Part 1 1 Thematic Roadmap Logic and Database Query Languages Relational Algebra and Relational Calculus Conjunctive

More information

Lecture 23: Alternation vs. Counting

Lecture 23: Alternation vs. Counting CS 710: Complexity Theory 4/13/010 Lecture 3: Alternation vs. Counting Instructor: Dieter van Melkebeek Scribe: Jeff Kinne & Mushfeq Khan We introduced counting complexity classes in the previous lecture

More information

Enumeration Complexity of Conjunctive Queries with Functional Dependencies

Enumeration Complexity of Conjunctive Queries with Functional Dependencies Enumeration Complexity of Conjunctive Queries with Functional Dependencies Nofar Carmeli Technion, Haifa, Israel snofca@cs.technion.ac.il Markus Kröll TU Wien, Vienna, Austria kroell@dbai.tuwien.ac.at

More information

On Resolution Like Proofs of Monotone Self-Dual Functions

On Resolution Like Proofs of Monotone Self-Dual Functions On Resolution Like Proofs of Monotone Self-Dual Functions Daya Ram Gaur Department of Mathematics and Computer Science University of Lethbridge, Lethbridge, AB, Canada, T1K 3M4 gaur@cs.uleth.ca Abstract

More information

Some algorithmic improvements for the containment problem of conjunctive queries with negation

Some algorithmic improvements for the containment problem of conjunctive queries with negation Some algorithmic improvements for the containment problem of conjunctive queries with negation Michel Leclère and Marie-Laure Mugnier LIRMM, Université de Montpellier, 6, rue Ada, F-3439 Montpellier cedex

More information

On Incomplete XML Documents with Integrity Constraints

On Incomplete XML Documents with Integrity Constraints On Incomplete XML Documents with Integrity Constraints Pablo Barceló 1, Leonid Libkin 2, and Juan Reutter 2 1 Department of Computer Science, University of Chile 2 School of Informatics, University of

More information

Proving simple set properties...

Proving simple set properties... Proving simple set properties... Part 1: Some examples of proofs over sets Fall 2013 Proving simple set properties... Fall 2013 1 / 17 Introduction Overview: Learning outcomes In this session we will...

More information

CS632 Notes on Relational Query Languages I

CS632 Notes on Relational Query Languages I CS632 Notes on Relational Query Languages I A. Demers 6 Feb 2003 1 Introduction Here we define relations, and introduce our notational conventions, which are taken almost directly from [AD93]. We begin

More information

Lecture 2: Proof of Switching Lemma

Lecture 2: Proof of Switching Lemma Lecture 2: oof of Switching Lemma Yuan Li June 25, 2014 1 Decision Tree Instead of bounding the probability that f can be written as some s-dnf, we estimate the probability that f can be computed by a

More information

Junta Approximations for Submodular, XOS and Self-Bounding Functions

Junta Approximations for Submodular, XOS and Self-Bounding Functions Junta Approximations for Submodular, XOS and Self-Bounding Functions Vitaly Feldman Jan Vondrák IBM Almaden Research Center Simons Institute, Berkeley, October 2013 Feldman-Vondrák Approximations by Juntas

More information

CS 347 Parallel and Distributed Data Processing

CS 347 Parallel and Distributed Data Processing CS 347 Parallel and Distributed Data Processing Spring 2016 Notes 3: Query Processing Query Processing Decomposition Localization Optimization CS 347 Notes 3 2 Decomposition Same as in centralized system

More information

α-formulas β-formulas

α-formulas β-formulas α-formulas Logic: Compendium http://www.ida.liu.se/ TDDD88/ Andrzej Szalas IDA, University of Linköping October 25, 2017 Rule α α 1 α 2 ( ) A 1 A 1 ( ) A 1 A 2 A 1 A 2 ( ) (A 1 A 2 ) A 1 A 2 ( ) (A 1 A

More information

Maximizing Conjunctive Views in Deletion Propagation

Maximizing Conjunctive Views in Deletion Propagation Maximizing Conjunctive Views in Deletion Propagation Extended Version Benny Kimelfeld Jan Vondrák Ryan Williams IBM Research Almaden San Jose, CA 9510, USA {kimelfeld, jvondrak, ryanwill}@us.ibm.com ABSTRACT

More information

Multi-Tuple Deletion Propagation: Approximations and Complexity

Multi-Tuple Deletion Propagation: Approximations and Complexity Multi-Tuple Deletion Propagation: Approximations and Complexity Benny Kimelfeld IBM Research Almaden San Jose, CA 95120, USA kimelfeld@us.ibm.com Jan Vondrák IBM Research Almaden San Jose, CA 95120, USA

More information

Schema Refinement: Other Dependencies and Higher Normal Forms

Schema Refinement: Other Dependencies and Higher Normal Forms Schema Refinement: Other Dependencies and Higher Normal Forms Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) Higher Normal Forms 1 / 14 Outline 1

More information

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and

Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and Copyright is owned by the Author of the thesis. Permission is given for a copy to be downloaded by an individual for the purpose of research and private study only. The thesis may not be reproduced elsewhere

More information

Disjunctive Databases for Representing Repairs

Disjunctive Databases for Representing Repairs Noname manuscript No (will be inserted by the editor) Disjunctive Databases for Representing Repairs Cristian Molinaro Jan Chomicki Jerzy Marcinkowski Received: date / Accepted: date Abstract This paper

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2011, 28 November 2011 Memoryless Sources Arithmetic Coding Sources with Memory 2 / 19 Summary of last lecture Prefix-free

More information

Lecture 3: Oct 7, 2014

Lecture 3: Oct 7, 2014 Information and Coding Theory Autumn 04 Lecturer: Shi Li Lecture : Oct 7, 04 Scribe: Mrinalkanti Ghosh In last lecture we have seen an use of entropy to give a tight upper bound in number of triangles

More information

Course information. Winter ATFD

Course information. Winter ATFD Course information More information next week This is a challenging course that will put you at the forefront of current data management research Lots of work done by you: extra reading: at least 4 research

More information

Endre Boros a Ondřej Čepekb Alexander Kogan c Petr Kučera d

Endre Boros a Ondřej Čepekb Alexander Kogan c Petr Kučera d R u t c o r Research R e p o r t A subclass of Horn CNFs optimally compressible in polynomial time. Endre Boros a Ondřej Čepekb Alexander Kogan c Petr Kučera d RRR 11-2009, June 2009 RUTCOR Rutgers Center

More information

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation

1 First-order logic. 1 Syntax of first-order logic. 2 Semantics of first-order logic. 3 First-order logic queries. 2 First-order query evaluation Knowledge Bases and Databases Part 1: First-Order Queries Diego Calvanese Faculty of Computer Science Master of Science in Computer Science A.Y. 2007/2008 Overview of Part 1: First-order queries 1 First-order

More information

Lecture 29: Computational Learning Theory

Lecture 29: Computational Learning Theory CS 710: Complexity Theory 5/4/2010 Lecture 29: Computational Learning Theory Instructor: Dieter van Melkebeek Scribe: Dmitri Svetlov and Jake Rosin Today we will provide a brief introduction to computational

More information

Coding of memoryless sources 1/35

Coding of memoryless sources 1/35 Coding of memoryless sources 1/35 Outline 1. Morse coding ; 2. Definitions : encoding, encoding efficiency ; 3. fixed length codes, encoding integers ; 4. prefix condition ; 5. Kraft and Mac Millan theorems

More information

Answer-Set Programming with Bounded Treewidth

Answer-Set Programming with Bounded Treewidth Answer-Set Programming with Bounded Treewidth Michael Jakl, Reinhard Pichler and Stefan Woltran Institute of Information Systems, Vienna University of Technology Favoritenstrasse 9 11; A-1040 Wien; Austria

More information

arxiv: v1 [cs.db] 20 Feb 2016

arxiv: v1 [cs.db] 20 Feb 2016 arxiv:1602.06458v1 [cs.db] 20 Feb 2016 Causes for Query Answers from Databases, Datalog Abduction and View-Updates: The Presence of Integrity Constraints Babak Salimi and Leopoldo Bertossi Carleton University,

More information

Composing Schema Mappings: Second-Order Dependencies to the Rescue

Composing Schema Mappings: Second-Order Dependencies to the Rescue Composing Schema Mappings: Second-Order Dependencies to the Rescue RONALD FAGIN IBM Almaden Research Center PHOKION G. KOLAITIS IBM Almaden Research Center LUCIAN POPA IBM Almaden Research Center WANG-CHIEW

More information

NP-Completeness. Andreas Klappenecker. [based on slides by Prof. Welch]

NP-Completeness. Andreas Klappenecker. [based on slides by Prof. Welch] NP-Completeness Andreas Klappenecker [based on slides by Prof. Welch] 1 Prelude: Informal Discussion (Incidentally, we will never get very formal in this course) 2 Polynomial Time Algorithms Most of the

More information

In-Database Learning with Sparse Tensors

In-Database Learning with Sparse Tensors In-Database Learning with Sparse Tensors Mahmoud Abo Khamis, Hung Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich Toronto, October 2017 RelationalAI Talk Outline Current Landscape for DB+ML

More information

Propositional and First Order Reasoning

Propositional and First Order Reasoning Propositional and First Order Reasoning Terminology Propositional variable: boolean variable (p) Literal: propositional variable or its negation p p Clause: disjunction of literals q \/ p \/ r given by

More information

Topics in Probabilistic and Statistical Databases. Lecture 9: Histograms and Sampling. Dan Suciu University of Washington

Topics in Probabilistic and Statistical Databases. Lecture 9: Histograms and Sampling. Dan Suciu University of Washington Topics in Probabilistic and Statistical Databases Lecture 9: Histograms and Sampling Dan Suciu University of Washington 1 References Fast Algorithms For Hierarchical Range Histogram Construction, Guha,

More information

Algorithmic Meta-Theorems

Algorithmic Meta-Theorems Algorithmic Meta-Theorems Stephan Kreutzer Oxford University Computing Laboratory stephan.kreutzer@comlab.ox.ac.uk Abstract. Algorithmic meta-theorems are general algorithmic results applying to a whole

More information

Reasoning with Deterministic and Probabilistic graphical models Class 2: Inference in Constraint Networks Rina Dechter

Reasoning with Deterministic and Probabilistic graphical models Class 2: Inference in Constraint Networks Rina Dechter Reasoning with Deterministic and Probabilistic graphical models Class 2: Inference in Constraint Networks Rina Dechter Road Map n Graphical models n Constraint networks Model n Inference n Search n Probabilistic

More information

Knowledge base (KB) = set of sentences in a formal language Declarative approach to building an agent (or other system):

Knowledge base (KB) = set of sentences in a formal language Declarative approach to building an agent (or other system): Logic Knowledge-based agents Inference engine Knowledge base Domain-independent algorithms Domain-specific content Knowledge base (KB) = set of sentences in a formal language Declarative approach to building

More information

Generating Maximal Independent Sets for Hypergraphs with Bounded Edge-Intersections

Generating Maximal Independent Sets for Hypergraphs with Bounded Edge-Intersections Generating Maximal Independent Sets for Hypergraphs with Bounded Edge-Intersections Endre Boros 1, Khaled Elbassioni 1, Vladimir Gurvich 1, and Leonid Khachiyan 2 1 RUTCOR, Rutgers University, 640 Bartholomew

More information

On the Succinctness of Query Rewriting for Datalog

On the Succinctness of Query Rewriting for Datalog On the Succinctness of Query Rewriting for Datalog Shqiponja Ahmetaj 1 Andreas Pieris 2 1 Institute of Logic and Computation, TU Wien, Austria 2 School of Informatics, University of Edinburgh, UK Foundational

More information

The Sum-Product Theorem: A Foundation for Learning Tractable Models (Supplementary Material)

The Sum-Product Theorem: A Foundation for Learning Tractable Models (Supplementary Material) The Sum-Product Theorem: A Foundation for Learning Tractable Models (Supplementary Material) Abram L. Friesen AFRIESEN@CS.WASHINGTON.EDU Pedro Domingos PEDROD@CS.WASHINGTON.EDU Department of Computer Science

More information