Join Ordering

IKKBZ First Lemma
Lemma: The cost function C_H has the ASI property.
Proof: The proof can be derived from the definition of C_H:

  C_H(AUVB) = C_H(A)
            + T(A) C_H(U)
            + T(A) T(U) C_H(V)
            + T(A) T(U) T(V) C_H(B)

and, hence,

  C_H(AUVB) - C_H(AVUB) = T(A) [C_H(V)(T(U) - 1) - C_H(U)(T(V) - 1)]
                        = T(A) C_H(U) C_H(V) [rank(U) - rank(V)]

The lemma follows.
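
The identity is easy to check numerically. The following Python sketch (not from the slides; the representation is an assumption) models a sequence only by its pair (T, C_H), concatenates pairs via the recurrence C_H(S1 S2) = C_H(S1) + T(S1) C_H(S2), and compares both sides of the final equation:

  # Sketch: a sequence is modeled as a pair (T, C) with
  # T(S1 S2) = T(S1) * T(S2) and C_H(S1 S2) = C_H(S1) + T(S1) * C_H(S2).
  def concat(s1, s2):
      (t1, c1), (t2, c2) = s1, s2
      return (t1 * t2, c1 + t1 * c2)

  def rank(s):
      t, c = s
      return (t - 1) / c

  def chain(*seqs):
      cur = seqs[0]
      for s in seqs[1:]:
          cur = concat(cur, s)
      return cur

  # arbitrary toy sequences A, U, V, B as (T, C_H) pairs
  A, U, V, B = (2.0, 3.0), (5.0, 4.0), (0.5, 2.0), (3.0, 7.0)
  lhs = chain(A, U, V, B)[1] - chain(A, V, U, B)[1]
  rhs = A[0] * U[1] * V[1] * (rank(U) - rank(V))
  assert abs(lhs - rhs) < 1e-9   # both sides equal 20.0 here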

IKKBZ Module
Let M = {A_1, ..., A_n} be a set of sequences of nodes in a given precedence graph. Then M is called a module if, for all sequences B that do not overlap with the sequences in M, one of the following conditions holds:
- B → A_i for all A_i ∈ M
- A_i → B for all A_i ∈ M
- B ↛ A_i and A_i ↛ B for all A_i ∈ M

IKKBZ Second Lemma
Lemma: Let C be any cost function with the ASI property and {A, B} a module. If A → B and additionally rank(B) ≤ rank(A), then we find an optimal sequence among those in which B directly follows A.
Proof: by contradiction. Every optimal permutation must have the form (U, A, V, B, W), since A → B.
Assumption: V ≠ ε for all optimal solutions.
- if rank(V) ≤ rank(A), we can exchange V and A without increasing the costs
- if rank(A) ≤ rank(V), then rank(B) ≤ rank(V) due to the transitivity of ≤; hence, we can exchange B and V without increasing the costs
Both exchanges produce legal sequences, since {A, B} is a module.

IKKBZ Contradictory Sequences and Compound Relations
- if the precedence graph demands A → B, but rank(B) ≤ rank(A), we speak of contradictory sequences A and B
- by the second lemma, no non-empty subsequence can occur between A and B
- we therefore combine A and B into a new single node replacing A and B
- this node represents a compound relation comprising all relations in A and B
- its cardinality is computed by multiplying the cardinalities of all relations in A and B
- its selectivity is the product of all selectivities s_i of the relations R_i contained in A and B
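
Continuing the (T, C_H) sketch from above (an assumed representation, not the slides' notation), building a compound relation amounts to concatenating the two nodes and recomputing the rank:

  # Sketch: merge contradictory A -> B into a compound node AB.
  def combine(a, b):
      (t_a, c_a), (t_b, c_b) = a, b
      # cardinalities/selectivities multiply into T, costs follow C_H(AB)
      return (t_a * t_b, c_a + t_a * c_b)

  A, B = (5.0, 4.0), (2.0, 4.0)   # rank(A) = 1.0 > rank(B) = 0.25, A -> B
  AB = combine(A, B)              # (10.0, 24.0)
  rank_AB = (AB[0] - 1) / AB[1]   # 0.375, between rank(B) and rank(A)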

IKKBZ Normalization and Denormalization
- the continued process of building compound relations until no more contradictory sequences exist is called normalization
- the opposite step, replacing a compound relation by the sequence of relations it was derived from, is called denormalization

IKKBZ Algorithm

  IKKBZ(G, C_H)
  Input: an acyclic query graph G for relations R = {R_1, ..., R_n}, a cost function C_H
  Output: the optimal left-deep tree
  S = ∅
  for each R_i ∈ R {
    G_i = the precedence graph derived from G rooted at R_i
    S_i = IKKBZ-Sub(G_i, C_H)
    S = S ∪ {S_i}
  }
  return arg min_{S_i ∈ S} C_H(S_i)

- considers each relation as a starting relation
- constructs the precedence graph and starts the main algorithm

IKKBZ Algorithm (2)

  IKKBZ-Sub(G_i, C_H)
  Input: a precedence graph G_i for relations R = {R_1, ..., R_n} rooted at R_i, a cost function C_H
  Output: the optimal left-deep tree under G_i
  while G_i is not a chain {
    r = a subtree of G_i whose subtrees are chains
    IKKBZ-Normalize(r)
    merge the chains under r according to the rank function (ascending)
  }
  IKKBZ-Denormalize(G_i)
  return G_i

- transforms the precedence graph into a chain
- wherever there are multiple choices, they are serialized according to the rank
- normalization is required to preserve the query graph

IKKBZ Algorithm (3)

  IKKBZ-Normalize(R)
  Input: a subtree R of a precedence graph G = (V, E)
  Output: a normalized subtree
  while ∃ r, c ∈ R, (r, c) ∈ E : rank(r) > rank(c) {
    replace r and c by a compound relation r' that represents rc
  }
  return R

- merges relations that would have been reordered if only considering the rank
- guarantees that the rank is ascending in each subchain

IKKBZ Algorithm (4)

  IKKBZ-Denormalize(R)
  Input: a precedence graph R containing relations and compound relations
  Output: a denormalized precedence graph, containing only relations
  while ∃ r ∈ R : r is a compound relation {
    replace r by the sequence of relations it represents
  }
  return R

- unpacks the compound relations
- required to get a real join tree as the final result

IKKBZ Sample Algorithm Execution
[Figure: the input query graph over R_1, ..., R_7 with cardinalities and selectivities, and the precedence graph rooted at R_1, annotated with the rank of each node.]
- Input: query graph
- Step 1: precedence graph for R_1
- the precedence graph includes the ranks

IKKBZ Sample Algorithm Execution (2)
[Figure: the precedence graph for R_1, before and after normalizing R_6 and R_7 into the compound relation R_{6,7}.]
- Step 1: precedence graph for R_1
- Step 2: normalization
- rank(R_6) > rank(R_7), but R_6 → R_7

IKKBZ Sample Algorithm Execution (3)
[Figure: the chains below R_4 merged into one chain by ascending rank.]
- Step 2: normalization
- Step 3: merging subchains
- rank(R_5) < rank(R_{6,7})

IKKBZ Sample Algorithm Execution (4)
[Figure: normalization of R_4 and R_5 into the compound relation R_{4,5}.]
- Step 3: merging subchains
- Step 4: normalization
- rank(R_4) > rank(R_5), but R_4 → R_5

IKKBZ Sample Algorithm Execution (5)
[Figure: normalization of R_{4,5} and R_{6,7} into the compound relation R_{4,5,6,7}.]
- Step 4: normalization
- Step 5: normalization
- rank(R_{4,5}) > rank(R_{6,7}), but R_{4,5} → R_{6,7}

IKKBZ Sample Algorithm Execution (6)
[Figure: the remaining subchains merged into a single chain under R_1.]
- Step 5: normalization
- Step 6: merging subchains
- rank(R_3) < rank(R_2) < rank(R_{4,5,6,7})

IKKBZ Sample Algorithm Execution (7)
[Figure: the final chain R_1, R_3, R_2, R_4, R_5, R_6, R_7 after denormalization.]
- Step 6: merging subchains
- Step 7: denormalization
The algorithm has to continue for all other root relations.

MVP Maximum Value Precedence Algorithm
- greedy heuristics can produce poor results
- IKKBZ supports only acyclic queries and ASI cost functions
- the Maximum Value Precedence (MVP) [4] algorithm is a polynomial-time heuristic with good results
- it treats join ordering as a graph-theoretic problem

MVP Directed Join Graph
Given a conjunctive query with join predicates P:
- for all join predicates p ∈ P, we denote by R(p) the relations whose attributes are mentioned in p
- the directed join graph of the query is a triple G = (V, E_p, E_v), where V is the set of predicates and E_p and E_v are sets of directed edges defined as follows
- for any nodes u, v ∈ V, if R(u) ∩ R(v) ≠ ∅, then (u, v) ∈ E_p and (v, u) ∈ E_p
- if R(u) ∩ R(v) = ∅, then (u, v) ∈ E_v and (v, u) ∈ E_v
- edges in E_p are called physical edges, those in E_v virtual edges
Note: for all nodes u, v there is an edge (u, v) that is either physical or virtual. Hence, G is a clique.
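
A direct transcription of this definition into Python, as a sketch; the predicate-to-relation mapping R is a made-up example input:

  # Sketch: build the directed join graph from predicates.
  R = {"p12": {"R1", "R2"}, "p23": {"R2", "R3"}, "p34": {"R3", "R4"}}

  V = list(R)
  E_p, E_v = set(), set()
  for u in V:
      for v in V:
          if u == v:
              continue
          if R[u] & R[v]:        # shared relation: physical edge
              E_p.add((u, v))
          else:                  # disjoint: virtual edge
              E_v.add((u, v))
  # every ordered pair (u, v) ends up in E_p or E_v, so G is a clique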

MVP Examples: Spanning Tree and Join Tree
Every spanning tree in the directed join graph leads to a join tree.
[Figure: the query graph R_1 - R_2 - R_3 - R_4, its directed join graph over the predicates p_{1,2}, p_{2,3}, p_{3,4}, spanning tree I, and the corresponding join tree I.]

MVP Examples: Spanning Tree and Join Tree (2)
[Figure: the same query graph and directed join graph, with a second spanning tree II and the corresponding join tree II.]

MVP Examples: Spanning Tree and Join Tree (3)
[Figure: a query graph over R_1, ..., R_5 with predicates p_{1,2}, p_{2,3}, p_{3,4}, p_{4,5}, its directed join graph, spanning tree III, and the candidate join tree III (?).]
This spanning tree does not correspond to an (effective) join tree!

MVP Effective Spanning Trees
It can be shown that a spanning tree T = (V, E) is effective if it satisfies the following conditions:
1. T is a binary tree
2. for all inner nodes v and nodes u with (u, v) ∈ E: R(T(u)) ∩ R(v) ≠ ∅
3. for all nodes v, u_1, u_2 with u_1 ≠ u_2, (u_1, v) ∈ E and (u_2, v) ∈ E, one of the following conditions holds:
   3.1 ((R(T(u_1)) ∩ R(v)) ∩ (R(T(u_2)) ∩ R(v))) = ∅, or
   3.2 (R(T(u_1)) = R(v)) ∨ (R(T(u_2)) = R(v))
We denote by T(v) the partial tree rooted at v.

MVP Adding Weights to the Edges
For two nodes v, u ∈ V, we define u ∩ v = R(u) ∩ R(v).
- for simplicity, we assume that every predicate involves exactly two relations
- then, for all u, v ∈ V, u ∩ v contains a single relation (or none)
Let v ∈ V be a node with R(v) = {R_i, R_j}. We abbreviate R_i ⋈_v R_j by ⋈_v.
Using these notations, we can attach weights to the edges to define the weighted directed join graph.

MVP Adding Weights to the Edges (2)
Let G = (V, E_p, E_v) be a directed join graph for a conjunctive query with join predicates P. The weighted directed join graph is derived from G by attaching a weight to each edge as follows:
- let (u, v) ∈ E_p be a physical edge; the weight w_{u,v} of (u, v) is defined as

  w_{u,v} = |⋈_u| / |u ∩ v|

- for virtual edges (u, v) ∈ E_v, we define w_{u,v} = 1
Note that w_{u,v} is not symmetric.
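
As a sketch, with made-up cardinalities and selectivities:

  # Sketch: edge weights in the weighted directed join graph.
  card = {"R1": 10, "R2": 100, "R3": 1000}     # assumed base cardinalities
  f = {"p12": 0.5, "p23": 0.2}                 # assumed selectivities
  R = {"p12": {"R1", "R2"}, "p23": {"R2", "R3"}}

  def join_size(p):                            # |join_p| = f_p |R_i| |R_j|
      r_i, r_j = R[p]
      return f[p] * card[r_i] * card[r_j]

  def weight(u, v):
      common = R[u] & R[v]
      if not common:                           # virtual edge
          return 1.0
      return join_size(u) / card[next(iter(common))]

  weight("p12", "p23")   # (0.5*10*100)/100 = 5.0 = f_{1,2} |R1|, cf. next slide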

MVP Remark on Edge Weights
The weights of physical edges are equal to the s_i used in the IKKBZ algorithm. Assume R(u) = {R_1, R_2}, R(v) = {R_2, R_3}. Then

  w_{u,v} = |⋈_u| / |u ∩ v|
          = |R_1 ⋈_u R_2| / |R_2|
          = f_{1,2} |R_1| |R_2| / |R_2|
          = f_{1,2} |R_1|

Hence, if the join R_1 ⋈_u R_2 is executed before the join R_2 ⋈_v R_3, the input size to the latter join changes by a factor of w_{u,v}.

MVP Adding Weights to the Nodes
- the weight of a node reflects the change in cardinality to be expected when certain other joins have been executed before
- it depends on a (partial) spanning tree S
Given S, we denote by S_{p_{i,j}} the result of the join p_{i,j} if all joins preceding p_{i,j} in S have been executed. Then the weight attached to node p_{i,j} is defined as

  w(p_{i,j}, S) = |S_{p_{i,j}}| / |R_i ⋈_{p_{i,j}} R_j|

For empty sequences, we define w(p_{i,j}, ε) = |R_i ⋈_{p_{i,j}} R_j|.
Similarly, we define the cost of a node p_{i,j} depending on other joins preceding it in some given spanning tree S. We denote this by C(p_{i,j}, S).
- the actual cost function can be chosen arbitrarily
- if we have several join implementations: take the minimum

MVP Algorithm Overview
The algorithm builds an effective spanning tree in two phases:
1. it takes those edges with a weight < 1
2. it adds the remaining edges, keeping track of effectiveness during the process
Rationale: a weight < 1 is good; it decreases the work for later operators and should therefore be handled early, while joins that increase intermediate results should happen as late as possible.

MVP Algorithm

  MVP(G)
  Input: a weighted directed join graph G = (V, E_p, E_v)
  Output: an effective spanning tree
  Q_1 = a priority queue for nodes, smallest w first
  Q_2 = a priority queue for nodes, largest w first
  insert all nodes in V into Q_1
  G' = (V', E') with V' = V and E' = E_p // working graph
  S = (V_S, E_S) with V_S = V and E_S = ∅ // result
  MVP-Phase1(G, G', S, Q_1, Q_2)
  MVP-Phase2(G, G', S, Q_1, Q_2)
  return S

MVP Algorithm (2)

  MVP-Phase1(G, G', S, Q_1, Q_2)
  Input: state from MVP
  Output: modifies the state
  while |Q_1| > 0 ∧ |E_S| < |V| - 1 {
    v = head of Q_1
    U = {u | (u, v) ∈ E' ∧ w_{u,v} < 1 ∧ (V, E_S ∪ {(u, v)}) is acyclic and effective}
    if U = ∅ {
      Q_1 = Q_1 \ {v}
      Q_2 = Q_2 ∪ {v}
    } else {
      u = arg max_{u ∈ U} C(⋈_v, S) - C(⋈_v, (V, E_S ∪ {(u, v)}))
      MVPUpdate(G, G', S, (u, v))
      recompute w for v and its ancestors
    }
  }

MVP Algorithm (3)

  MVP-Phase2(G, G', S, Q_1, Q_2)
  Input: state from MVP
  Output: modifies the state
  while |Q_2| > 0 ∧ |E_S| < |V| - 1 {
    v = head of Q_2
    U = {(x, y) | (x, y) ∈ E' ∧ (x = v ∨ y = v) ∧ (V, E_S ∪ {(x, y)}) is acyclic and effective}
    (x, y) = arg min_{(x, y) ∈ U} C(⋈_v, (V, E_S ∪ {(x, y)})) - C(⋈_v, S)
    MVPUpdate(G, G', S, (x, y))
    recompute w for y and its ancestors
  }

MVP Algorithm (4)

  MVPUpdate(G, G', S, (u, v))
  Input: state from MVP, an edge to be added to S
  Output: modifies the state
  E_S = E_S ∪ {(u, v)}
  E' = E' \ {(u, v), (v, u)}
  E' = E' \ {(u, w) | (u, w) ∈ E'}
  E' = E' ∪ {(v, w) | (u, w) ∈ E_p, (v, w) ∈ E_v}
  if v has two incoming edges in S {
    E' = E' \ {(w, v) | (w, v) ∈ E'}
  }
  if v has one outgoing edge in S {
    E' = E' \ {(v, w) | (v, w) ∈ E'}
  }

- checks that S remains a tree (one parent, at most two children)
- detects transitive physical edges

Dynamic Programming
Basic premise:
- optimality principle
- avoid duplicate work
A very generic class of approaches:
- all cost functions (as long as the optimality principle holds)
- left-deep/bushy, with/without cross products
- finds the optimal solution
Concrete algorithms can be more specialized, of course.

Dynamic Programming Optimality Principle
Consider the two join trees

  (((R_1 ⋈ R_2) ⋈ R_3) ⋈ R_4) ⋈ R_5 and
  (((R_3 ⋈ R_1) ⋈ R_2) ⋈ R_4) ⋈ R_5

- if we know that ((R_1 ⋈ R_2) ⋈ R_3) is cheaper than ((R_3 ⋈ R_1) ⋈ R_2), we know that the first join tree is cheaper than the second one
- hence, we can avoid generating the second alternative and still won't miss the optimal join tree

Dynamic Programming Optimality Principle (2)
More formally, the optimality principle for join ordering: Let T be an optimal join tree for relations R_1, ..., R_n. Then, every subtree S of T must be an optimal join tree for the relations contained in it.
- optimal substructure: the optimal solution for a problem can be constructed from optimal solutions to its subproblems
- not true with physical properties (but can be fixed)

Dynamic Programming Overview
Dynamic programming strategy:
- generate optimal join trees bottom-up
- start from optimal join trees of size one (single relations)
- build larger join trees by (re-)using those of smaller sizes
To keep the algorithms concise, we use a subroutine CreateJoinTree that joins two trees.

Dynamic Programming Creating Join Trees

  CreateJoinTree(T_1, T_2)
  Input: two (optimal) join trees T_1, T_2 (for linear trees: assume that T_2 is a single relation)
  Output: an (optimal) join tree for T_1 ⋈ T_2
  B = ∅
  for each impl ∈ {applicable join implementations} {
    if ¬(right-deep only) {
      B = B ∪ {T_1 ⋈_impl T_2}
    }
    if ¬(left-deep only) {
      B = B ∪ {T_2 ⋈_impl T_1}
    }
  }
  return arg min_{T ∈ B} C(T)
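
A sketch of the subroutine in Python; the cost function C and the tree representation (nested tuples) are assumptions, and only a single join implementation is modeled:

  # Sketch: CreateJoinTree with one join implementation.
  def create_join_tree(t1, t2, C, right_deep_only=False, left_deep_only=False):
      candidates = []
      if not right_deep_only:
          candidates.append((t1, t2))   # T1 join T2
      if not left_deep_only:
          candidates.append((t2, t1))   # T2 join T1
      return min(candidates, key=C)     # cheapest alternative wins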

Dynamic Programming Search Space with Sharing under Optimality Principle

  {R_1, R_2, R_3, R_4}
  {R_1, R_2, R_3}  {R_1, R_2, R_4}  {R_1, R_3, R_4}  {R_2, R_3, R_4}
  {R_1, R_2}  {R_1, R_3}  {R_1, R_4}  {R_2, R_3}  {R_2, R_4}  {R_3, R_4}
  R_1  R_2  R_3  R_4

Dynamic Programming Generating Linear Trees
- a (left-deep) linear tree T with |T| > 1 has the form T' ⋈ R_i, with |T| = |T'| + 1
- if T is optimal, T' must be optimal too
- basic strategy: find the optimal T by joining all optimal T' with T \ T'
- the enumeration order varies between algorithms

Dynamic Programming Generating Linear Trees (2)

  DPsizeLinear(R)
  Input: a set of relations R = {R_1, ..., R_n} to be joined
  Output: an optimal left-deep (right-deep, zig-zag) join tree
  B = an empty DP table 2^R → join tree
  for each R_i ∈ R
    B[{R_i}] = R_i
  for each 1 < s ≤ n ascending {
    for each S ⊆ R, R_i ∈ R : |S| = s - 1 ∧ R_i ∉ S {
      if ¬cross products ∧ ¬(S connected to R_i) continue
      p_1 = B[S], p_2 = B[{R_i}]
      if p_1 = ε continue
      P = CreateJoinTree(p_1, p_2)
      if B[S ∪ {R_i}] = ε ∨ C(B[S ∪ {R_i}]) > C(P)
        B[S ∪ {R_i}] = P
    }
  }
  return B[{R_1, ..., R_n}]
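
A sketch of DPsizeLinear in Python, with sets encoded as bitmasks; `connected` (query-graph connectivity test) and `cost` (plan cost) are assumed helpers:

  # Sketch: DPsizeLinear over bitmasks; B maps a mask to (cost, tree).
  def dp_size_linear(n, connected, cost):
      B = {1 << i: (0, i) for i in range(n)}           # single relations
      for s in range(2, n + 1):
          for S in [m for m in B if bin(m).count("1") == s - 1]:
              for i in range(n):
                  if S & (1 << i):                     # R_i already in S
                      continue
                  if not connected(S, i):              # avoid cross products
                      continue
                  tree = (B[S][1], i)                  # left-deep: S join R_i
                  c = cost(tree)
                  key = S | (1 << i)
                  if key not in B or B[key][0] > c:
                      B[key] = (c, tree)
      return B.get((1 << n) - 1)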

Dynamic Programming Order in which Subtrees are Generated
The order in which subtrees are generated does not matter, as long as the following condition is not violated: Let S be a subset of {R_1, ..., R_n}. Then, before a join tree for S can be generated, the join trees for all relevant subsets of S must already be available.
- relevant means that they are valid subproblems for the algorithm
- usually this means connected (no cross products)

Dynamic Programming Generation in Integer Order

  000 = {}
  001 = {R_1}
  010 = {R_2}
  011 = {R_1, R_2}
  100 = {R_3}
  101 = {R_1, R_3}
  110 = {R_2, R_3}
  111 = {R_1, R_2, R_3}

- can be done very efficiently
- the set representation is just a number
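
The bijection in code: bit j of the integer marks the presence of R_{j+1}.

  # Set <-> integer: bit j set means R_{j+1} is in the set.
  def mask_to_set(mask):
      return {j + 1 for j in range(mask.bit_length()) if (mask >> j) & 1}

  mask_to_set(0b101)   # {1, 3}, i.e. {R_1, R_3}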

Dynamic Programming Generating Linear Trees (3)

  DPsubLinear(R)
  Input: a set of relations R = {R_1, ..., R_n} to be joined
  Output: an optimal left-deep (right-deep, zig-zag) join tree
  B = an empty DP table 2^R → join tree
  for each R_i ∈ R
    B[{R_i}] = R_i
  for each 1 < i ≤ 2^n - 1 ascending {
    S = {R_j ∈ R | (⌊i / 2^(j-1)⌋ mod 2) = 1}
    for each R_j ∈ S {
      if ¬cross products ∧ ¬(S \ {R_j} connected to R_j) continue
      p_1 = B[S \ {R_j}], p_2 = B[{R_j}]
      if p_1 = ε continue
      P = CreateJoinTree(p_1, p_2)
      if B[S] = ε ∨ C(B[S]) > C(P)
        B[S] = P
    }
  }
  return B[{R_1, ..., R_n}]

Dynamic Programming Generating Bushy Trees
- a bushy tree T with |T| > 1 has the form T_1 ⋈ T_2, with |T| = |T_1| + |T_2|
- if T is optimal, both T_1 and T_2 must be optimal too
- basic strategy: find the optimal T by joining all pairs of optimal T_1 and T_2

Dynamic Programming Generating Bushy Trees (2)

  DPsize(R)
  Input: a set of relations R = {R_1, ..., R_n} to be joined
  Output: an optimal bushy join tree
  B = an empty DP table 2^R → join tree
  for each R_i ∈ R
    B[{R_i}] = R_i
  for each 1 < s ≤ n ascending {
    for each S_1, S_2 ⊂ R : |S_1| + |S_2| = s {
      if (¬cross products ∧ ¬(S_1 connected to S_2)) ∨ (S_1 ∩ S_2 ≠ ∅) continue
      p_1 = B[S_1], p_2 = B[S_2]
      if p_1 = ε ∨ p_2 = ε continue
      P = CreateJoinTree(p_1, p_2)
      if B[S_1 ∪ S_2] = ε ∨ C(B[S_1 ∪ S_2]) > C(P)
        B[S_1 ∪ S_2] = P
    }
  }
  return B[{R_1, ..., R_n}]

Dynamic Programming Generating Bushy Trees (3)

  DPsub(R)
  Input: a set of relations R = {R_1, ..., R_n} to be joined
  Output: an optimal bushy join tree
  B = an empty DP table 2^R → join tree
  for each R_i ∈ R
    B[{R_i}] = R_i
  for each 1 < i ≤ 2^n - 1 ascending {
    S = {R_j ∈ R | (⌊i / 2^(j-1)⌋ mod 2) = 1}
    for each S_1 ⊂ S, S_2 = S \ S_1 {
      if ¬cross products ∧ ¬(S_1 connected to S_2) continue
      p_1 = B[S_1], p_2 = B[S_2]
      if p_1 = ε ∨ p_2 = ε continue
      P = CreateJoinTree(p_1, p_2)
      if B[S] = ε ∨ C(B[S]) > C(P)
        B[S] = P
    }
  }
  return B[{R_1, ..., R_n}]
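
A sketch of DPsub in Python; it enumerates the proper subsets S_1 of each S with the bit trick shown on the next slide, and `connected` and `cost` are again assumed helpers:

  # Sketch: DPsub over bitmasks; B maps a mask to (cost, tree).
  def dp_sub(n, connected, cost):
      B = {1 << i: (0, i) for i in range(n)}
      for S in range(1, 1 << n):
          if S & (S - 1) == 0:              # singleton: already in B
              continue
          S1 = S & -S
          while S1 != S:                    # all proper non-empty subsets
              S2 = S ^ S1
              if connected(S1, S2) and S1 in B and S2 in B:
                  tree = (B[S1][1], B[S2][1])
                  c = cost(tree)
                  if S not in B or B[S][0] > c:
                      B[S] = (c, tree)
              S1 = S & (S1 - S)             # next subset of S
      return B.get((1 << n) - 1)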

Dynamic Programming Efficient Subset Generation
If we use integers as the set representation, we can enumerate all subsets of S as follows:

  S_1 = S & (-S)
  do {
    S_2 = S - S_1
    // Do something with S_1 and S_2
    S_1 = S & (S_1 - S)
  } while (S_1 != S)

- enumerates all subsets except ∅ and S itself
- very fast
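
The same loop runs unchanged in Python (its integers are arbitrary-precision, but the two's-complement identities still hold); for S = 0b1011 it visits 0001, 0010, 0011, 1000, 1001, 1010:

  S = 0b1011
  S1 = S & -S                 # lowest bit of S
  while True:
      S2 = S - S1             # the complement of S1 within S
      print(bin(S1), bin(S2))
      S1 = S & (S1 - S)       # next subset
      if S1 == S:
          break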

Dynamic Programming Remarks
- DPsize/DPsizeLinear does not really test for p_1 = ε
  - it keeps a list of plans for a given size, so candidates can be found very fast
  - this ensures polynomial time in some cases (we will look at it again)
- DPsub/DPsubLinear is faster if the problem is not polynomial, though

Dynamic Programming Memoization
- top-down formulation of dynamic programming
- recursive generation of join trees
- memoize already generated join trees to avoid duplicate work
- easier code
- sometimes more efficient (more knowledge, allows for pruning)
- but usually slower than dynamic programming

Dynamic Programming Memoization (2)

  Memoization(R)
  Input: a set of relations R = {R_1, ..., R_n} to be joined
  Output: an optimal bushy join tree
  B = an empty DP table 2^R → join tree
  for each R_i ∈ R
    B[{R_i}] = R_i
  MemoizationRec(B, R)
  return B[{R_1, ..., R_n}]

- initializes the DP table and triggers the recursive search
- the main work is done during the recursion

Dynamic Programming Memoization (3)

  MemoizationRec(B, S)
  Input: a DP table B and a set of relations S to be joined
  Output: an optimal bushy join tree for the subproblem
  if B[S] = ε {
    for each S_1 ⊂ S, S_2 = S \ S_1 {
      p_1 = MemoizationRec(B, S_1), p_2 = MemoizationRec(B, S_2)
      P = CreateJoinTree(p_1, p_2)
      if B[S] = ε ∨ C(B[S]) > C(P)
        B[S] = P
    }
  }
  return B[S]

- checks for connectedness omitted
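
A sketch of the top-down variant in Python; as on the slide, connectedness checks are omitted (cross products allowed), and `cost` is an assumed helper:

  # Sketch: memoized top-down join enumeration over bitmasks.
  def memoization(n, cost):
      B = {1 << i: (0, i) for i in range(n)}   # DP table seeded with relations

      def rec(S):
          if S not in B:
              best = None
              S1 = S & -S
              while S1 != S:                   # all proper non-empty subsets
                  _, t1 = rec(S1)
                  _, t2 = rec(S ^ S1)
                  tree = (t1, t2)
                  c = cost(tree)
                  if best is None or c < best[0]:
                      best = (c, tree)
                  S1 = S & (S1 - S)
              B[S] = best
          return B[S]

      return rec((1 << n) - 1)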