Window-aware Load Shedding for Aggregation Queries over Data Streams


Window-aware Load Shedding for Aggregation Queries over Data Streams (1/26)
Nesime Tatbul, Stan Zdonik

Talk Outline (2/26)
- Background
- Load shedding in Aurora
- Windowed aggregation queries
- Window-aware load shedding
- Experimental results
- Related work
- Conclusions and future directions

Load Shedding in Aurora (3/26)
Aurora runs a query network with QoS specifications at its outputs.
- Problem: when load exceeds capacity, latency QoS degrades.
- Solution: insert drop operators into the query plan.
- Result: deliver approximate answers with low latency.

Subset-based Approximation (4/26)
- For all queries, the delivered tuple set must be a subset of the original query answer set.
  - Maximum subset measure (e.g., [SIGMOD 03, VLDB 04])
- Loss-tolerance QoS of Aurora [VLDB 03]
  - For each query, the number of consecutive result tuples missed (i.e., the gap tolerance) must be below a certain threshold.

Why Subset Results? (5/26)
- The application may expect to get correct values.
- Missing tuples get updated with more recent ones.
- Preserving the subset guarantee anywhere in the query plan helps understand how the error propagates.
- It depends on the application.

Windows (6/26)
Finite excerpts of a data stream, with two parameters: size (ω) and slide (δ).
Example: StockQuote(symbol, time, price), size = 10 min, slide = 5 min
(IBM, 10:00, 20) (INTC, 10:00, 15) (MSFT, 10:00, 22)
(IBM, 10:05, 18) (MSFT, 10:05, 21)
(IBM, 10:10, 18) (MSFT, 10:10, 20)
(IBM, 10:15, 20) (INTC, 10:15, 20) (MSFT, 10:15, 20) ..
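The size/slide pattern above can be sketched in a few lines (a hypothetical helper, not from the talk): it enumerates the time extents of the sliding windows over the StockQuote stream.

```python
def window_extents(start, end, size, slide):
    """Yield [lo, lo + size) extents of sliding windows whose starts lie in [start, end)."""
    lo = start
    while lo < end:
        yield (lo, lo + size)
        lo += slide

# Minutes since 10:00, size = 10, slide = 5, as in the StockQuote example:
print(list(window_extents(0, 20, size=10, slide=5)))
# [(0, 10), (5, 15), (10, 20), (15, 25)]
```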

Windowed Aggregation (7/26)
Apply an aggregate function on the window:
- Average, Sum, Count, Min, Max
- User-defined functions
Can have grouping, nesting, and sharing.
Example:
Filter (symbol = IBM) -> Aggregate (ω = 5 min, δ = 5 min, diff = high - low) -> Filter (diff > 5) -> Aggregate (ω = 60 min, δ = 60 min, count) -> Filter (count > 0)

Dropping from an Aggregation Query: Tuple-based Approach (8/26)
No drop: .. 30 15 30 20 10 30 -> Average (ω = 3, δ = 3) -> .. 25 20
Drop before the aggregate: non-subset result of nearly the same size
.. 30 15 30 20 10 30 -> Drop (p = 0.5) -> Average (ω = 3, δ = 3) -> .. 15 15
Drop after the aggregate: subset result of smaller size
.. 30 15 30 20 10 30 -> Average (ω = 3, δ = 3) -> .. 25 20 -> Drop (p = 0.5) -> a subset of .. 25 20

Dropping from an Aggregation Query: Window-based Approach (9/26)
No drop: .. 30 15 30 20 10 30 -> Average (ω = 3, δ = 3) -> .. 25 20
Drop before the aggregate: subset result of smaller size
.. 30 15 30 20 10 30 -> Window Drop (ω = 3, δ = 3, p = 0.5) -> Average (ω = 3, δ = 3) -> a subset of .. 25 20
This is window-aware load shedding.
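A minimal sketch of the contrast between the two slides above (my own toy code, with a deterministic drop standing in for p = 0.5): dropping tuples before the aggregate re-forms windows from survivors and yields non-subset answers, while dropping whole windows keeps every delivered answer exact.

```python
def averages(stream, size=3, slide=3):
    """Tumbling-window average (omega = size, delta = slide)."""
    return [sum(stream[i:i + size]) / size
            for i in range(0, len(stream) - size + 1, slide)]

stream = [30, 15, 30, 20, 10, 30]
exact = averages(stream)              # [25.0, 20.0]

# Tuple-based drop before the aggregate (every other tuple, a deterministic
# stand-in for p = 0.5): survivors re-form windows, so answers need not be
# a subset of the exact answers.
approx_tuple = averages(stream[::2])  # average of [30, 30, 10]

# Window-based drop (drop window 0, keep window 1): every delivered answer
# is one of the exact answers -- the subset guarantee.
approx_window = [exact[1]]

assert not set(approx_tuple) <= set(exact)
assert set(approx_window) <= set(exact)
print(exact, approx_tuple, approx_window)
```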

Window Drop Operator (10/26)
Functionality:
- Attach window keep/drop specifications to tuples.
- Discard tuples that can be early-dropped.
Key parameters:
- ω: window size
- δ: window slide
- p: drop probability (one per group)
- B: batch size (one per group)
Window specification values:
- -1: Don't care.
- 0: Drop the window.
- T (T > 0): Keep the window; keep all tuples with timestamp < T.

Window Drop for Nested Aggregation (Pipeline Arrangements) (11/26)
Plan: Window Drop (ω, δ, B) -> Aggregate 1 (ω_1, δ_1) -> ... -> Aggregate n (ω_n, δ_n, B)
Combination rule:
ω = (sum over i = 1..n of ω_i) - (n - 1)
δ = δ_n
B = B
Example: ω_1 = 3, δ_1 = 2 and ω_2 = 3, δ_2 = 3 give ω = 5, δ = 3.
(Aggregate 1 windows start at 1, 3, 5, 7, 9, ..; Aggregate 2 and the combined window drop start windows at 1, 4, 7, 10, ..)
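The pipeline rule above can be checked with a small helper (the function name is mine, not from the talk):

```python
def combine_pipeline(aggs):
    """Combine pipelined aggregates (upstream first) into one window-drop spec.
    aggs: list of (omega, delta); rule: omega = sum(omega_i) - (n - 1), delta = delta_n."""
    omega = sum(w for w, _ in aggs) - (len(aggs) - 1)
    delta = aggs[-1][1]
    return omega, delta

# Slide's example: (omega_1, delta_1) = (3, 2) feeding (omega_2, delta_2) = (3, 3):
print(combine_pipeline([(3, 2), (3, 3)]))  # (5, 3)
```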

Window Drop for Shared Aggregation (Fan-out Arrangements) (12/26)
Plan: Window Drop (ω, δ, B) feeding Aggregate 1 (ω_1, δ_1, B_1), ..., Aggregate n (ω_n, δ_n, B_n) in parallel.
Combination rule:
ω = lcm(δ_1, .., δ_n) + max over i = 1..n of extent(A_i), where extent(A_i) = ω_i - δ_i
δ = lcm(δ_1, .., δ_n)
B = min over i = 1..n of ( B_i / (lcm(δ_1, .., δ_n) / δ_i) )
Example: ω_1 = 3, δ_1 = 3 (extent = 0) and ω_2 = 3, δ_2 = 2 (extent = 1) give ω = 7, δ = 6.
(Common window start every 6 units: 1, 7, ..; Aggregate 1 windows start at 1, 4, 7, 10, ..; Aggregate 2 windows start at 1, 3, 5, 7, 9, ..)
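Likewise for the fan-out rule, a sketch with my own helper name (uses math.lcm, available in Python 3.9+):

```python
import math
from functools import reduce

def combine_fanout(aggs):
    """Combine aggregates sharing one input into a single window-drop spec.
    aggs: list of (omega, delta); extent(A_i) = omega_i - delta_i."""
    lcm = reduce(math.lcm, [d for _, d in aggs])  # common window-start period
    omega = lcm + max(w - d for w, d in aggs)
    return omega, lcm

# Slide's example: (3, 3) with extent 0 and (3, 2) with extent 1:
print(combine_fanout([(3, 3), (3, 2)]))  # (7, 6)
```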

Window Drop for Nesting + Sharing (Composite Arrangements) (13/26)
Apply the two basic rules in any order.
Plan: Window Drop (ω, δ, B) -> Aggregate 0 (ω_0 = 4, δ_0 = 1) feeding Aggregate 1 (ω_1 = 3, δ_1 = 3, B_1) and Aggregate 2 (ω_2 = 3, δ_2 = 2, B_2).
Fan-out, then pipeline: fan-out of Aggregates 1 and 2 gives ω = 7, δ = 6; pipelining that with Aggregate 0 gives ω = 10, δ = 6.
Pipeline, then fan-out: pipelining Aggregate 0 with Aggregate 1 gives ω = 6, δ = 3, and with Aggregate 2 gives ω = 6, δ = 2; fan-out of these gives ω = 10, δ = 6.

Window Drop in Action (14/26)
Mark tuples and apply early drops at the Window Drop (marks: drop, keep):
.. 30 15 30 20 10 30 -> Window Drop (ω = 3, δ = 3, p = 0.5) -> .. 30 20 10 30 with window specs .. 0 -1 -1 4
(early-droppable tuples are discarded; one fake tuple remains to carry its window spec)
Decode the marks at the Aggregate (actions: skip, open):
.. 30 20 10 30 with specs .. 0 -1 -1 4 -> Average (ω = 3, δ = 3) -> .. 20 with spec .. 1

Fake Tuples (15/26)
Fake tuples carry no data, only window specs. They arise when:
- the tuple content is not needed, as the tuple only indicates a window drop
- the tuple to mark is originally missing, or was filtered out during query processing
Overhead is low.

Early Drops (16/26)
Sliding windows may overlap. A window drop can discard a tuple if and only if all of its windows are going to be dropped.
At steady state, each tuple belongs to about ω/δ windows (i.e., early drops require B >= ω/δ).
If B < ω/δ, then there is no need to push the window drop further upstream than the first aggregate in the pipeline.
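A quick way to see the ω/δ figure (a hypothetical helper, not from the talk): count the sliding windows, aligned at multiples of δ, whose extent covers a given timestamp.

```python
def windows_containing(t, omega, delta):
    """Start times of windows [s, s + omega), s in {0, delta, 2*delta, ...},
    that contain timestamp t."""
    first = max(0, ((t - omega) // delta + 1) * delta)  # smallest start > t - omega
    return list(range(first, t + 1, delta))

# With omega = 10 and delta = 2, a steady-state tuple falls into omega/delta = 5 windows:
print(len(windows_containing(100, omega=10, delta=2)))  # 5
```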

Talk Outline (17/26)
- Background
- Load shedding in Aurora
- Windowed aggregation queries
- Window-aware load shedding
- Experimental results
- Related work
- Conclusions and future directions

Experiments (18/26)
Setup:
- Single-node Borealis
- Streams: (time, value) pairs
- Aggregation queries with delays
- Basic measure: % total loss
Goals:
- Performance on nested and shared query plans
- Effect of window parameters
- Processing overhead

Performance for Nested Plans (19/26)
Query: Delay 5ms -> Aggregate (10, 10) -> Delay 50ms -> Aggregate (100, 100) -> Delay 500ms
[Plot: % Loss (0-100) vs. % Excess Rate (7-100), comparing Window Drop and Random Drop]

Performance for Shared Plans (20/26)
Query: Delay 5ms feeding two branches: Aggregate (10, 10) -> Delay 50ms, and Aggregate (20, 20) -> Delay 100ms
[Plot: % Total Loss (0-100) vs. % Excess Rate (20-100), comparing Window Drop and Random Drop]

Effect of Window Slide (21/26)
Query: Window Drop (ω = 100, δ) -> Delay 5ms -> Aggregate (ω = 100, δ) -> Delay 5ms
[Plot: % Loss (0-100) vs. window slide δ (1, 2, 10, 100), at excess rates of 25%, 50%, and 66%]

Processing Overhead (22/26)
Query: Window Drop (ω, δ = ω) -> Filter -> Delay 10ms -> Aggregate (ω, δ = ω)
Throughput ratio (Window Drop with p = 0 vs. no Window Drop):

Window size (ω):     25    50    75    100
Selectivity = 1.0:   0.99  0.99  1.0   1.0
Selectivity = 0.5:   0.96  0.98  0.98  1.0

Related Work (23/26)
- Load shedding for aggregation queries: statistical approach of Babcock et al. [ICDE 04]
- Punctuation-based stream processing: Tucker et al. [TKDE 03], Li et al. [SIGMOD 05]
- Other load shedding work (examples):
  - General: Aurora [VLDB 03], Data Triage [ICDE 05]
  - On joins: Das et al. [SIGMOD 03], STREAM [VLDB 04]
  - With optimization: NiagaraCQ [ICDE 03, SIGMOD 04]
- Approximate query processing on aggregates: synopsis-based approaches, online aggregation [SIGMOD 97]

Conclusions (24/26)
We provide a window-aware load shedding technique for a general class of aggregation queries over data streams with:
- sliding windows
- user-defined aggregates
- nesting, sharing, grouping
Window Drop preserves window integrity, enabling:
- easy control of error propagation
- a subset guarantee for query results
- early load reduction that minimizes error

Future Directions (25/26)
- Prediction-based load shedding: can we guess the missing values? Statistical approaches vs. subset-based approaches
- Window-awareness on joins
- Memory-constrained environments
- Distributed environments

Questions? (26/26)

Decoding Window Specifications (27/26)

Window start? | Window Spec | Keep Boundary | To Do
yes           | T (T > 0)   | *             | Open window. Set boundary = T - (ω - 1). Set spec = T - (ω - 1).
yes           | 0 or -1     | within        | Open window. Set boundary = t + ω (if greater).
yes           | 0 or -1     | beyond        | Skip window.
no            | T (T > 0)   | *             | Set boundary = T - (ω - 1). Set spec = T - (ω - 1). Mark as fake tuple.
no            | 0           | *             | Mark as fake tuple.
no            | -1          | *             | Ignore.