Query Optimization: Exercise

Query Optimization: Exercise Session 6
Bernhard Radke
November 27, 2017

Maximum Value Precedence (MVP) [1]

Weighted Directed Join Graph (WDJG)
Query graph: relations $R_1$ (cardinality 1000), $R_2$ (100), $R_3$ (500), $R_4$ (100); join selectivities 0.005 ($R_1$, $R_2$), 0.05 ($R_1$, $R_4$), 0.001 ($R_2$, $R_3$), 0.02 ($R_3$, $R_4$)
Query graph to WDJG $= (V, E_p, E_v)$:
  nodes $V$ = joins
  physical edges $E_p$ between adjacent joins (joins that share one relation)
  virtual edges $E_v$ everywhere else
$R(v)$: relations participating in join $v$
Observation: every spanning tree in the WDJG leads to a join tree

Weighted Directed Join Graph (WDJG)
WDJG nodes for the query graph: $v_{12} = R_1 \bowtie R_2$, $v_{23} = R_2 \bowtie R_3$, $v_{14} = R_1 \bowtie R_4$, $v_{34} = R_3 \bowtie R_4$

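Weighted Directed Join Graph (WDJG)
Weights and Costs:
edge weights:
$$w_{u,v} = \begin{cases} \dfrac{|\bowtie_u|}{|R(u) \cap R(v)|} & \text{if } (u, v) \in E_p \\ 1 & \text{if } (u, v) \in E_v \end{cases}$$
where $|\bowtie_u|$ is the result size of join $u$ and $|R(u) \cap R(v)|$ is the cardinality of the relation shared by $u$ and $v$
node costs: $C_{out}$ of the join
Node costs in the example: $v_{12}$ = 500, $v_{14}$ = 5000, $v_{23}$ = 50, $v_{34}$ = 1000
Edge weights in the example:
  $v_{12} \to v_{14}$: 0.5, $v_{14} \to v_{12}$: 5
  $v_{12} \to v_{23}$: 5, $v_{23} \to v_{12}$: 0.5
  $v_{14} \to v_{34}$: 50, $v_{34} \to v_{14}$: 10
  $v_{23} \to v_{34}$: 0.1, $v_{34} \to v_{23}$: 2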
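The weights and node costs can be recomputed from the query graph. The following is a minimal sketch, assuming the example's cardinalities (1000, 100, 500, 100) and selectivities; the names `card`, `joins`, `node_cost` and `weight` are illustrative and do not come from the slides.

```python
# Cardinalities of the base relations in the example query graph.
card = {"R1": 1000, "R2": 100, "R3": 500, "R4": 100}

# One WDJG node per join: (participating relations R(v), selectivity).
joins = {
    "v12": ({"R1", "R2"}, 0.005),
    "v14": ({"R1", "R4"}, 0.05),
    "v23": ({"R2", "R3"}, 0.001),
    "v34": ({"R3", "R4"}, 0.02),
}


def node_cost(v):
    """C_out of join v: the size of its result."""
    rels, selectivity = joins[v]
    size = selectivity
    for r in rels:
        size *= card[r]
    return size


def weight(u, v):
    """Weight of the directed edge u -> v."""
    shared = joins[u][0] & joins[v][0]
    if not shared:
        return 1.0  # virtual edge: the joins share no relation
    (r,) = shared  # adjacent joins share exactly one relation
    return node_cost(u) / card[r]


costs = {v: node_cost(v) for v in joins}
weights = {(u, v): weight(u, v) for u in joins for v in joins if u != v}
```

A weight below 1 means that executing `u` first shrinks the shared input of `v`; a weight above 1 means it enlarges that input.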

Effective Spanning Tree (EST)
Three conditions:
1. The EST is binary
2. For every non-leaf node $v_i$ and every edge $v_j \to v_i$, there is a common base relation between $v_i$ and the subtree with the root $v_j$
3. For every node $v_i = R \bowtie S$ with two incoming edges $v_k \to v_i$ and $v_j \to v_i$, $R$ or $S$ can be present in at most one of the subtrees $v_k$ or $v_j$, unless the subtree $v_j$ (or $v_k$) contains both $R$ and $S$

MVP - informally
Construct an EST in two steps:
Step 1 - Choose an edge to reduce the cost of an expensive operation
  Start with the most expensive node
  Find the incoming edge that reduces the cost the most
  Add the edge to the EST and check the conditions
  Update the WDJG
  Repeat until no edges can reduce costs anymore or there are no further nodes to consider
Step 2 - Find edges causing minimum increase to the result of joins
  Similar to Step 1
  Start with the cheapest node
  Find the incoming edge that increases the cost the least
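The edge choice in Step 1 can be sketched as picking the incoming edge with the smallest weight below 1. This is only an illustration of that single choice, assuming the example's initial physical edge weights; it does not check the EST conditions or update the WDJG, and `best_reducing_edge` is a hypothetical helper name.

```python
# Physical edge weights of the example WDJG, keyed by (source, target).
weights = {
    ("v12", "v14"): 0.5, ("v14", "v12"): 5,
    ("v12", "v23"): 5,   ("v23", "v12"): 0.5,
    ("v14", "v34"): 50,  ("v34", "v14"): 10,
    ("v23", "v34"): 0.1, ("v34", "v23"): 2,
}


def best_reducing_edge(node, weights):
    """Return the incoming edge of `node` with the smallest weight < 1, or None."""
    reducing = {e: w for e, w in weights.items() if e[1] == node and w < 1}
    if not reducing:
        return None  # no incoming edge can reduce the cost of this node
    return min(reducing, key=reducing.get)


print(best_reducing_edge("v14", weights))  # ('v12', 'v14')
print(best_reducing_edge("v34", weights))  # ('v23', 'v34')
```

Step 2 would use the mirror image of this choice: among the remaining incoming edges, take the one with the smallest weight, even though it is at least 1.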

Example
We start with a graph without virtual edges.
Two cost lists:
  for Step 1: $Q_1 = v_{14}, v_{34}, v_{12}, v_{23}$
  for Step 2: $Q_2 = \emptyset$
Node costs: $v_{12}$ = 500, $v_{23}$ = 50, $v_{14}$ = 5000, $v_{34}$ = 1000

Example
$Q_1 = v_{14}, v_{34}, v_{12}, v_{23}$; $Q_2 = \emptyset$
Consider $v_{14}$, select the edge $v_{12} \to v_{14}$
After $v_{12}$ is executed, $|R_1 \bowtie R_2| = 500$
Replace $R_1$ by $R_1 \bowtie R_2$ in $v_{14} = R_1 \bowtie R_4$: $v_{14} = (R_1 \bowtie R_2) \bowtie R_4$
$cost(v_{14}) = 500 \cdot 100 \cdot 0.05 + 500 = 3000$
The cost of $v_{14}$ drops from 5000 to 3000
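The recomputation for $v_{14}$ can be checked with a few lines. This sketch assumes that the predecessor's cost equals its result size (true for $v_{12}$ here, since it joins two base relations); `cost_after_edge` is an illustrative name, not part of MVP.

```python
def cost_after_edge(pred_size, other_card, selectivity):
    """Cost of a join once one input is replaced by its predecessor's result.

    pred_size:   result size of the predecessor join (here also its cost)
    other_card:  cardinality of the join's other input
    selectivity: selectivity of the join itself
    """
    result_size = pred_size * other_card * selectivity
    return result_size + pred_size


# v14 = (R1 join R2) join R4, with |R1 join R2| = 500, |R4| = 100, selectivity 0.05
cost_v14 = cost_after_edge(500, 100, 0.05)
```

Here `cost_v14` evaluates to 3000 (up to floating-point rounding), compared with 5000 for joining $R_1$ and $R_4$ directly.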

Example
Move $v_{12} \to v_{14}$ to the EST
Update the WDJG: remove the edges $v_{14} \to v_{12}$ and $v_{12} \to v_{23}$, add the edge $v_{14} \to v_{23}$
$Q_1 = v_{14}, v_{34}, v_{12}, v_{23}$; $Q_2 = \emptyset$
Consider $v_{14}$: no more incoming edges with $w < 1$
$Q_1 = v_{34}, v_{12}, v_{23}$; $Q_2 = v_{14}$
Node costs: $v_{12}$ = 500, $v_{23}$ = 50, $v_{14}$ = 3000, $v_{34}$ = 1000

Example
$Q_1 = v_{34}, v_{12}, v_{23}$; $Q_2 = v_{14}$
Consider $v_{34}$, select the edge $v_{23} \to v_{34}$
Recompute cost: $cost(v_{34}) = 50 \cdot 100 \cdot 0.02 + 50 = 150$
Move the edge to the EST, update the WDJG
$Q_1 = v_{12}, v_{34}, v_{23}$; $Q_2 = v_{14}$

Example
$Q_1 = v_{12}, v_{34}, v_{23}$; $Q_2 = v_{14}$
Consider $v_{12}$: no edges
$v_{34}$, $v_{23}$: no incoming edges with $w < 1$
$Q_1 = \emptyset$; $Q_2 = v_{23}, v_{34}, v_{12}, v_{14}$
End of Step 1
Node costs: $v_{12}$ = 500, $v_{23}$ = 50, $v_{14}$ = 3000, $v_{34}$ = 150

Example
$Q_2 = v_{23}, v_{34}, v_{12}, v_{14}$
Consider $v_{23}$ and the edge $v_{14} \to v_{23}$
Adding the edge would not violate the EST conditions
Add the edge to the EST
Done.

Dynamic Programming

Dynamic Programming: Overview
generate optimal join trees bottom up
start from optimal join trees of size one (relations)
build larger join trees by (re-)using optimal solutions for smaller sizes

Dynamic Programming: Overview
First Approach: DPsizeLinear [2]
  Enumerate subproblems in increasing size
  Generate linear trees by adding a single relation at a time
Modifications/Extensions:
  enumerate in integer order
  generate bushy trees by considering all pairs of subproblems
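The size-driven enumeration of left-deep trees can be sketched as follows. This is a minimal illustration rather than the algorithm as presented in the lecture: it assumes a connected query graph, avoids cross products, and uses $C_{out}$ (the sum of intermediate result sizes) as the cost function. The function name `dp_size_linear` and the input encoding are assumptions.

```python
def dp_size_linear(card, sel):
    """Return (cost, plan) of the cheapest left-deep tree without cross products.

    card: relation name -> cardinality
    sel:  frozenset({a, b}) -> selectivity of the join edge between a and b
    Cost is C_out: the sum of all intermediate result sizes.
    """
    rels = sorted(card)
    # best[S] = (cost, cardinality, plan) for the relation set S
    best = {frozenset([r]): (0.0, float(card[r]), r) for r in rels}
    for size in range(2, len(rels) + 1):
        # Snapshot the subproblems of the previous size before extending them.
        smaller = [(S, e) for S, e in best.items() if len(S) == size - 1]
        for S, (cost, n, plan) in smaller:
            for r in rels:
                edges = [sel[frozenset((r, s))] for s in S
                         if frozenset((r, s)) in sel]
                if r in S or not edges:
                    continue  # skip contained relations and cross products
                out = n * card[r]
                for f in edges:
                    out *= f
                total = cost + out
                T = S | {r}
                if T not in best or total < best[T][0]:
                    best[T] = (total, out, (plan, r))
    cost, _, plan = best[frozenset(rels)]
    return cost, plan


# Input: the query graph of the MVP example (an illustrative choice).
card = {"R1": 1000, "R2": 100, "R3": 500, "R4": 100}
sel = {frozenset(p): s for p, s in {
    ("R1", "R2"): 0.005, ("R1", "R4"): 0.05,
    ("R2", "R3"): 0.001, ("R3", "R4"): 0.02,
}.items()}
cost, plan = dp_size_linear(card, sel)
```

For this input the cheapest left-deep plan joins $R_2$ and $R_3$ first, then $R_4$, then $R_1$, with a $C_{out}$ of about 175. The bushy extension would replace the inner loop over single relations by a loop over pairs of disjoint, connected subproblems.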

Dynamic Programming: Example
[Figure: query graph over $R_1$, $R_2$, $R_3$, $R_4$; each relation is labeled 10, and the edges carry the selectivities 0.5, 0.1, 0.6, and 0.2]

Homework
Enumerate connected-subgraph-complement pairs
Query Simplification
Reordering constraints for non-inner joins

Lecture Evaluation
Remember to bring your laptop for the lecture evaluation next week

Info
Slides and exercises: db.in.tum.de/teaching/ws1718/queryopt
Send any questions, comments, solutions to exercises etc. to radke@in.tum.de
Exercise due: 9 AM, December 4

References
[1] C. Lee, C. Shih, and Y. Chen. Optimizing large join queries using a graph-based approach. IEEE Trans. Knowl. Data Eng., 13(2):298-315, 2001.
[2] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, May 30 - June 1, pages 23-34, 1979.