Mario A. Nascimento. Univ. of Alberta, Canada http: //


1 DATA CACHING IN W Mario A. Nascimento Univ. of Alberta, Canada http: // With R. Alencar and A. Brayner. Work partially supported by NSERC and CBIE (Canada) and CAPES (Brazil)

2 Outline: Motivation; Cache-Aware Query Processing; Cache-Aware Query Optimization; Query Partitioning; Cached Data Selection; Cache Maintenance; Experimental Results

3 (One) Application Scenario. [Figure; labels: User, Satellite, Base Station, W]

4 Using Previous (Cached) Queries. {P_i}: set of previous queries. Q: current query. Clipped query: Q minus {P_i}. [Figure, panels (a) and (b); labels: Q, P1, P2, P3]
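The clipping step on this slide can be illustrated with a minimal sketch. It assumes queries are axis-aligned rectangles on an integer grid and represents each region as a set of unit cells; the names `rect_cells` and `clip_query` are hypothetical. This is a simplification for illustration, not the polygon clipping used in the talk.

```python
from itertools import product

def rect_cells(x0, y0, x1, y1):
    """Unit grid cells covered by the half-open rectangle [x0, x1) x [y0, y1)."""
    return set(product(range(x0, x1), range(y0, y1)))

def clip_query(query, cached):
    """Return the part of the query that no cached query covers (Q minus {P_i})."""
    covered = set().union(*cached) if cached else set()
    return query - covered

Q = rect_cells(0, 0, 4, 4)    # current query: 16 cells
P1 = rect_cells(0, 0, 2, 2)   # cached query overlapping Q in 4 cells
P2 = rect_cells(3, 3, 6, 6)   # cached query overlapping Q in 1 cell

remainder = clip_query(Q, [P1, P2])
print(len(remainder))  # 11 cells still have to be answered by the network
```

Only the remainder needs to be sent to the sensors; the covered part can be answered from the cached results.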

5 Query Partitioning Overhead. [Figure, panels (a) and (b); label: BS] Query processing: the query is forwarded and locally flooded, then the results are collected and shipped back. Query processing cost is estimated through an analytical cost model.

6 Overall Architecture. [Figure; components: User, Cache Manager, Cache Index, Query Optimizer, Query Processor, Base Station, W. The user submits the current query Q and receives the answer D(Q). Annotations: relevant cached queries; subset of relevant queries and sub-queries (min: query cost); non-stale subset of P and its dataset.]
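The "non-stale subset of P" in the architecture can be pictured with a short sketch. The record layout (`CachedQuery`, `issued_at`, `validity`) and the staleness rule are assumptions made for illustration; the slides only state that cached results have a validity time measured in timestamps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CachedQuery:
    name: str
    issued_at: int   # timestamp at which the result was collected (assumed field)
    validity: int    # number of timestamps the result stays usable (assumed field)

def non_stale(cache, now):
    """Keep cached queries whose age does not exceed their validity time."""
    return [q for q in cache if now - q.issued_at <= q.validity]

cache = [CachedQuery("P1", issued_at=0, validity=5),
         CachedQuery("P2", issued_at=3, validity=5)]

fresh = non_stale(cache, now=7)
print([q.name for q in fresh])  # ['P2']
```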

7 Query Plan Problem (QSP). Fewer, larger sub-queries vs. more, smaller sub-queries. To obtain the clipped query (Q minus the cached queries used) we used the General Polygon Clipper library. To partition the clipped query into the set of sub-queries Θ we used an O(v log v) algorithm that finds a sub-optimal solution (minimizing the number of sub-queries).
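To make the partitioning step concrete, here is a minimal sketch that splits a clipped region into rectangular sub-queries. It is not the O(v log v) algorithm mentioned on the slide: it assumes the region is a set of unit grid cells and uses a simple greedy row-merging rule, so the function name and the representation are hypothetical.

```python
def partition_into_rectangles(cells):
    """Greedily split a set of unit cells into axis-aligned rectangles.

    Returns sorted (x0, y0, x1, y1) tuples with half-open extents.
    """
    rects = []
    open_runs = {}   # (x0, x1) -> y0 of the rectangle currently growing
    prev_y = None
    for y in sorted({cy for _, cy in cells}):
        # Maximal horizontal runs of cells in this row.
        xs = sorted(x for x, cy in cells if cy == y)
        runs = set()
        start = prev = xs[0]
        for x in xs[1:]:
            if x != prev + 1:
                runs.add((start, prev + 1))
                start = x
            prev = x
        runs.add((start, prev + 1))
        # Close rectangles whose run does not continue into this row.
        gap = prev_y is not None and y != prev_y + 1
        for run in list(open_runs):
            if gap or run not in runs:
                rects.append((run[0], open_runs.pop(run), run[1], prev_y + 1))
        # Open a rectangle for every run that is not already growing.
        for run in runs:
            open_runs.setdefault(run, y)
        prev_y = y
    for run, y0 in open_runs.items():
        rects.append((run[0], y0, run[1], prev_y + 1))
    return sorted(rects)

# L-shaped remainder: a 4x4 query with its lower-left 2x2 corner served from cache.
region = {(x, y) for x in range(4) for y in range(4)} - \
         {(x, y) for x in range(2) for y in range(2)}
sub_queries = partition_into_rectangles(region)
print(sub_queries)  # [(0, 2, 4, 4), (2, 0, 4, 2)]
```

The trade-off named on the slide shows up directly: each extra cached query used tends to fragment the remainder into more, smaller rectangles.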

8 B+B (Heuristic) Solution to QSP. At each node, Q is clipped using a subset of P, a set of sub-queries is generated, and its cost is obtained. The search stops at a local minimum. [Figure, search tree: the root uses all of P; the next level uses P \ {P1}, P \ {P2}, P \ {P3}, P \ {P4}; the level below uses P \ {P2, P1}, P \ {P2, P3}, P \ {P2, P4}]
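The descent through the search tree can be sketched as follows. This is a simplified local search in the spirit of the slide, not the authors' exact B+B procedure. The cost function is a hypothetical stand-in for the analytical cost model: one unit per query cell not covered by the chosen cached queries, plus a fixed overhead of 3 per cached query used.

```python
def local_search(cached, plan_cost):
    """Start from all cached queries; drop one at a time while the cost improves."""
    current = frozenset(cached)
    current_cost = plan_cost(current)
    while current:
        # Children of the current node: remove exactly one cached query.
        children = [current - {name} for name in sorted(current)]
        best = min(children, key=plan_cost)
        best_cost = plan_cost(best)
        if best_cost >= current_cost:
            break  # no child is cheaper: local minimum
        current, current_cost = best, best_cost
    return current, current_cost

# Toy instance (assumed numbers): Q has 16 cells; each cached query covers
# a disjoint part of Q with the given number of cells.
QUERY_CELLS = 16
COVERED = {"P1": 4, "P2": 1, "P3": 2}
OVERHEAD = 3  # assumed per-cached-query cost, e.g. extra sub-queries

def toy_cost(subset):
    return QUERY_CELLS - sum(COVERED[n] for n in subset) + OVERHEAD * len(subset)

plan, cost = local_search(COVERED, toy_cost)
print(sorted(plan), cost)  # ['P1'] 15
```

In this toy instance the search moves from P (cost 18) to P \ {P2} (cost 16) to P \ {P2, P3} (cost 15) and stops there, because dropping P1 as well would cost 16.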

9 Other Heuristic Solutions to QSP. In addition to B+B we also used two more aggressive greedy heuristics: GrF (GrE) starts with all (no) cached queries, removing (inserting) the smallest (largest) cached query as long as there is some gain. [Figure, search tree with the GrF path marked: the root uses all of P; the next level uses P \ {P1}, P \ {P2}, P \ {P3}, P \ {P4}; the level below uses P \ {P2, P1}, P \ {P2, P3}, P \ {P2, P4}]
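A minimal sketch of the two greedy heuristics, under stated assumptions: "smallest" and "largest" are taken to mean the area of the cached query, each cached query is assumed to lie inside Q (so its size equals the cells it saves), and the cost function is a hypothetical stand-in for the analytical cost model.

```python
def greedy_full(sizes, plan_cost):
    """GrF: start with all cached queries, remove the smallest while it pays off."""
    chosen = set(sizes)
    cost = plan_cost(chosen)
    for name in sorted(sizes, key=lambda n: (sizes[n], n)):   # smallest first
        trial_cost = plan_cost(chosen - {name})
        if trial_cost >= cost:
            break  # no gain: stop
        chosen, cost = chosen - {name}, trial_cost
    return chosen, cost

def greedy_empty(sizes, plan_cost):
    """GrE: start with no cached queries, insert the largest while it pays off."""
    chosen = set()
    cost = plan_cost(chosen)
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):  # largest first
        trial_cost = plan_cost(chosen | {name})
        if trial_cost >= cost:
            break  # no gain: stop
        chosen, cost = chosen | {name}, trial_cost
    return chosen, cost

# Toy instance (assumed numbers): 16 query cells, overhead of 3 per cached query used.
SIZES = {"P1": 4, "P2": 1, "P3": 2}

def toy_cost(subset):
    return 16 - sum(SIZES[n] for n in subset) + 3 * len(subset)

grf = greedy_full(SIZES, toy_cost)
gre = greedy_empty(SIZES, toy_cost)
print(grf, gre)  # ({'P1'}, 15) ({'P1'}, 15)
```

Because GrF only accepts removals that lower the cost, it can never end up worse than using the full cache; likewise GrE can never end up worse than using no cache.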

10 Cache Maintenance. [Figure, Cache Manager internals; components: Cache Reader, Cache Updater, Cache, Cache Index, Query Processor]

11 Cache Maintenance. [Figure, panels (a), (b) and (c); labels: Q, P1 (dropped), P1,1, P1,2, P2 (used), P3; annotation: data that can be used to refresh P1's data]

12 Losses wrt Optimal Solution. [Histogram; x-axis: energy loss (range) wrt OPT [%]; y-axis: frequency [%]; series: B+B, GrF, GrE] B+B is the Branch-and-Bound heuristic. GrF (GrE) is an aggressive greedy heuristic, starting with all (no) cache and removing (inserting) the smallest (largest) cached queries available as long as there is some gain.

13 Gains wrt NOT Using Cache. [Histogram; x-axis: energy savings (range) wrt no cache [%]; y-axis: frequency [%]; series: B+B, GrF, GrE] By design, GrE cannot be any worse than not using any cache.

14 Gains wrt Using ALL Cache. [Histogram; x-axis: energy savings (range) wrt FC [%]; y-axis: frequency [%]; series: B+B, GrF, GrE] By design, GrF cannot be any worse than using all of the cache.

15 Detailed results or skip to main conclusions?

16 Detailed results. We investigate the performance of the proposed approach wrt efficiency (for finding the query plan) and effectiveness (cost of solution) when varying: number of sensors; size of cache (number of cached queries); query size (wrt total area); validity time (of cached results).

17 Varying # of Sensors. [Two plots vs. number of sensors (x 1,000): energy cost loss wrt OPT [%] for B+B, FC, GrF, GrE; number of states explored for GrE, GrF, B+B, OPT]

18 Varying Cache Size. [Two plots vs. cache size [# queries]: energy cost loss wrt OPT [%] for B+B, FC, GrF, GrE; number of states explored for GrE, GrF, B+B, OPT]

19 Varying Query Size. [Two plots vs. query size [% of total area]: energy cost loss wrt OPT [%] for B+B, FC, GrF, GrE; number of states explored for GrE, GrF, B+B, OPT]

20 Varying Query Validity Time. [Two plots vs. validity time [# timestamps]: energy cost loss wrt OPT [%] for B+B, FC, GrF, GrE; number of states explored for GrE, GrF, B+B, OPT]

21 Conclusions. Cached query selection, query clipping and sub-query generation amount to a fairly complex combinatorial problem. Although a query cost model is needed, our proposal is orthogonal to it. If nothing can be done your best shot is to use all of the cache, but

22 Conclusions. The Branch-and-Bound heuristic: finds a query plan orders of magnitude faster than the exhaustive search; yields a plan that is typically less than 2% more expensive than the optimal one; is robust with respect to a number of different parameters. Next stop: aggregation queries.

23 Thanks
