Approximate Query Processing Using Wavelets

Approximate Query Processing Using Wavelets
Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, Kyuseok Shim
Presented by Guanghua Yan

Outline
- Approximate query processing: problem and prior solutions
- Another solution: wavelets
- Using wavelets to construct a synopsis
  - 1-D Haar wavelets
  - Multi-D Haar wavelets
  - Construction of the synopsis
- Query processing in the wavelet domain
  - Select
  - Project
  - Join
- Rendering the result
- Experimental evaluation
- Conclusions

Why do we need Approximate Query Processing?
- Characteristics of DSS (decision support system) applications
  - Huge amount of data (GB/TB)
  - High query complexity
  - Stringent response-time requirements
- An EXACT answer is NOT always required
  - DSS applications are exploratory in nature
  - Aggregate queries: precision to the penny? No
  - A fast, approximate answer is preferable
- Exact processing: SQL query -> data warehouse (GB/TB) -> exact answers. Problem: long response time
- Approximate query processing: approximate answers, quick response

How does Approximate Query Processing work?
- Construct compact relations (MB) from the data warehouse (GB/TB), in advance
- Transform the SQL query with a transformation algebra
- Run the transformed SQL query on the compact relations: fast response times, approximate answers

Previous Work
Construct compact relations using:
- Random sampling (AQUA system)
  - accurate for aggregate queries (COUNT, SUM, AVG)
  - not suitable when joins are involved (too few tuples)
  - not suitable for non-aggregate queries
- Histograms (Ioannidis and Poosala)
  - effectiveness at high dimensions is unclear
  - construction is costly (as is storage: the curse of dimensionality)
  - need to be expanded for joins (a join makes the dimensionality even higher)
- Wavelets (Vitter and Wang)
  - effective for aggregate queries even at high dimensions
  - limited in query-processing scope (only range-sum queries)

Overview of the work in this paper
- Construct a compact synopsis of the interesting tables using multiresolution wavelet decomposition (done in advance)
  - fast: takes just a single pass over the relation in the best case, otherwise a logarithmic number of passes
- SQL queries are answered by working only on the compact relations, i.e. entirely in the wavelet (compressed) domain
  - fast response times
  - results are converted back to the relational domain (rendering) at the end
  - all types of queries are supported: aggregate and non-aggregate
- Fast, accurate, general

Overview of the work: the big picture
- Step 1: construct compact relations (MB) from the data warehouse (GB/TB), in advance
- Step 2: transform the SQL query with the transformation algebra and run the transformed query on the compact relations to obtain the result relation (fast response times, approximate answers)
- Step 3: render the query result (if needed)

Step 1: Construct the synopsis with wavelet decomposition
- 1-D Haar wavelets
- Multi-D Haar wavelets
- Construction of the synopsis

What's decomposition? Vector decomposition
- V = (1, 2, 3, 4) = 1 * (1, 0, 0, 0) + 2 * (0, 1, 0, 0) + 3 * (0, 0, 1, 0) + 4 * (0, 0, 0, 1)
- 1, 2, 3, 4 are called coefficients; b1 = (1, 0, 0, 0) and the other unit vectors are called basis vectors
- Each coefficient is a dot product with a basis vector, e.g. 3 = (1, 2, 3, 4) · (0, 0, 1, 0)
- Orthogonal: given two basis vectors $b_i$ and $b_j$, $b_i \cdot b_j = 1$ if $i = j$ and $0$ otherwise
- Orthogonality gives no redundancy, regularity, and easy reconstruction
- Looks useless here (from (1, 2, 3, 4) to (1, 2, 3, 4)), except for the idea of decomposition

What's decomposition? The idea of decomposition
- Fix a set of basis vectors (or basis functions)
- Compute a set of coefficients: multiplying the original data by one basis vector gives one coefficient (dot product for vectors, inner product for functions)
- # of basis vectors = # of coefficients = # of elements in the original data
- The original data (or function) is represented by a set of coefficients in terms of the basis
Motivation
- Find new features of the data (Fourier)
- Compress the data (wavelets, in this paper)
Reconstruction of the original data (easy for an orthogonal basis)
- Multiply each coefficient by the corresponding basis vector
- Sum up all the products
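
The decompose-then-reconstruct idea can be sketched in a few lines of Python. This is only an illustration: the helper names `dot`, `decompose` and `reconstruct` are made up here, and the division by the squared length of each basis vector is an assumption that lets the sketch work with orthogonal basis vectors that are not unit length.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decompose(data, basis):
    """One coefficient per basis vector: <data, b> / <b, b>."""
    return [dot(data, b) / dot(b, b) for b in basis]


def reconstruct(coeffs, basis):
    """Sum of coefficient * basis vector, computed position by position."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]


# Standard basis: the coefficients are just the data itself.
identity = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
same = decompose((1, 2, 3, 4), identity)

# A different orthogonal basis for two values: average and difference.
avg_diff = [(1, 1), (1, -1)]
coeffs = decompose((56, 40), avg_diff)   # average 48, half-difference 8
restored = reconstruct(coeffs, avg_diff)
```

The second basis already hints at the averaging-and-differencing step that Haar wavelets apply repeatedly.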

What's decomposition? Function decomposition
- Fourier transformation and inverse transformation
- Basis functions: cosine and sine functions
- Widely used in engineering
- Problems:
  1. Loses time resolution; good for periodic signals
  2. The basis functions are fixed

What's decomposition? Wavelet decomposition
- Shares the idea of the Fourier transformation, with time resolution added
- Wavelet function (mother wavelet)
- Basis functions: scaled and shifted versions of the mother wavelet
- Properties: orthogonality, vanishing moments, compact support, regularity
- Wavelet decomposition generates compact representations that exploit the local structure of the function
- A wavelet decomposition uses a scaling function and a wavelet function
- Problem: which wavelet decomposition to use? (Haar, CDF(2, X), CDF(3, X), the Daubechies series)

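Background on Wavelets: 1-D Haar Wavelets
Why Haar wavelets?
- Simplest wavelet function
- Fast to compute (averaging & differencing)
- Performs well in practice (image compression)
What do Haar wavelets look like? First example (average = (x + y) / 2, detail = (x - y) / 2 for each pair x, y):
  Original data:                    56  40   8  24  48  48  40  16
  Pairwise (average, detail):       48   8  16  -8  48   0  28  12
  Averages first, then details:     48  16  48  28   8  -8   0  12
  Pairwise on the four averages:    32  16  38  10   8  -8   0  12
  Averages first, then details:     32  38  16  10   8  -8   0  12
  Pairwise on the two averages:     35  -3  16  10   8  -8   0  12
In the slide, original values and average coefficients are shown in blue and detail coefficients in red.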

Haar wavelet functions
- Scaling function ("father wavelet"): $h_0(t) = 1$ for $t \in [0, 1]$, and $0$ otherwise
- Wavelet function ("mother wavelet"): $h_1(t) = 1$ for $t \in [0, \tfrac{1}{2})$, $-1$ for $t \in [\tfrac{1}{2}, 1]$, and $0$ otherwise
- Figures: the scaling function and the wavelet function, each shown together with scaled and scaled-and-shifted versions

1-D Haar basis functions ("daughter wavelets")
- Scaled and shifted versions of the mother wavelet
- Complete set of basis functions for a signal S of length 8; the vector next to each basis function is a sampling of that function:
  h0 : ( 1,  1,  1,  1,  1,  1,  1,  1)   scaling function
  h1 : ( 1,  1,  1,  1, -1, -1, -1, -1)   wavelet function
  h2 : ( 1,  1, -1, -1,  0,  0,  0,  0)
  h3 : ( 0,  0,  0,  0,  1,  1, -1, -1)
  h4 : ( 1, -1,  0,  0,  0,  0,  0,  0)
  h5 : ( 0,  0,  1, -1,  0,  0,  0,  0)
  h6 : ( 0,  0,  0,  0,  1, -1,  0,  0)
  h7 : ( 0,  0,  0,  0,  0,  0,  1, -1)
- Multiplying S by each basis vector (and normalizing by the number of nonzero entries of that vector) gives one coefficient each, 8 coefficients in total
- Connection with the first example: 56 40 8 24 48 48 40 16 -> 35 -3 16 10 8 -8 0 12

Compute the 1-D Haar wavelet decomposition by linear algebra
- Decomposition matrix $M_a$: collect the 8 basis vectors and put each one in as a column

         1   1   1   0   1   0   0   0
         1   1   1   0  -1   0   0   0
         1   1  -1   0   0   1   0   0
  Ma  =  1   1  -1   0   0  -1   0   0
         1  -1   0   1   0   0   1   0
         1  -1   0   1   0   0  -1   0
         1  -1   0  -1   0   0   0   1
         1  -1   0  -1   0   0   0  -1

- The dot product of any two distinct columns is ZERO
- Normalizing each column is easy
- Decomposition (complete): given any signal S of length 8, multiplying S by $M_a$ gives the wavelet decomposition, $Y = S \, M_a$
- Reconstruction: with normalized columns $M_a$ is orthogonal ($M_a^{-1} = M_a^{T}$), so $S = Y \, M_a^{-1} = Y \, M_a^{T}$
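
As a rough illustration of the linear-algebra view, the sketch below builds the Haar basis vectors in pure Python and obtains each coefficient as a dot product. The function names are hypothetical, and dividing by the squared length of each basis vector is an assumption made here so that the output matches the averaging-and-differencing numbers of the first example, rather than the unit-length normalization the slide alludes to.

```python
def haar_basis(n):
    """Unnormalized Haar basis vectors for length n (n must be a power of two)."""
    basis = [[1] * n]                      # h0: the scaling function
    width = n
    while width >= 2:
        half = width // 2
        for start in range(0, n, width):
            vec = [0] * n
            for i in range(start, start + half):
                vec[i] = 1                 # first half of the support: +1
            for i in range(start + half, start + width):
                vec[i] = -1                # second half of the support: -1
            basis.append(vec)
        width = half
    return basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def haar_by_dot_products(signal):
    """Coefficient i = <signal, h_i> / <h_i, h_i>."""
    return [dot(signal, h) / dot(h, h) for h in haar_basis(len(signal))]


basis = haar_basis(8)
signal = [56, 40, 8, 24, 48, 48, 40, 16]
coeffs = haar_by_dot_products(signal)

# Every pair of distinct basis vectors is orthogonal.
orthogonal = all(dot(basis[i], basis[j]) == 0
                 for i in range(8) for j in range(8) if i != j)
```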

Compute the 1-D Haar wavelet decomposition scale by scale
Decomposition
- Pairwise averaging and differencing (one scale of decomposition)
- Redistribute: put the averages (approximation coefficients) together and the differences (detail coefficients) together
- Repeat the above on the averages until only one average is left (recursive, complete decomposition)
- Result: the last average plus all detail coefficients
Reconstruction
- Exactly the inverse of the decomposition

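How does the 1-D Haar wavelet work? Example
  Original data:                    56  40   8  24  48  48  40  16
  Pairwise (average, detail):       48   8  16  -8  48   0  28  12
  Averages first, then details:     48  16  48  28   8  -8   0  12
  Pairwise on the four averages:    32  16  38  10   8  -8   0  12
  Averages first, then details:     32  38  16  10   8  -8   0  12
  Pairwise on the two averages:     35  -3  16  10   8  -8   0  12
In the slide, original values and average coefficients are shown in blue and detail coefficients in red.
- Decomposition: log n steps are needed; here 3 steps complete the decomposition of 8 values
- Reconstruction: the exact inverse of the above process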
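
The scale-by-scale procedure can be sketched as follows. The function names are illustrative, and the convention average = (x + y) / 2, detail = (x - y) / 2 is the one that reproduces the numbers of the example; other texts scale the coefficients differently.

```python
def haar_decompose(signal):
    """Complete 1-D Haar decomposition of a signal whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    averages = list(signal)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        level_details = [(x - y) / 2 for x, y in pairs]
        averages = [(x + y) / 2 for x, y in pairs]
        details = level_details + details   # coarser levels go in front
    return averages + details


def haar_reconstruct(coeffs):
    """Exact inverse of haar_decompose."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(details)
        # Each (average, detail) pair expands to (average + detail, average - detail).
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
    return values


signal = [56, 40, 8, 24, 48, 48, 40, 16]
coeffs = haar_decompose(signal)      # 35, -3, 16, 10, 8, -8, 0, 12
restored = haar_reconstruct(coeffs)  # the original signal again
```

The loop in `haar_decompose` runs log2(n) times, which is the "log n steps" noted on the slide.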

Where are the compression and the approximation? Thresholding
- Set a threshold value C
- Replace every wavelet coefficient whose absolute value is less than C with ZERO
- More zeros among the wavelet coefficients means more compression: store ONLY the nonzero coefficients
- The more similar the data values are, the more compression we get
- How much does this affect the original data?

  No thresholding
    Original data:       56  40   8  24  48  48  40  16
    Coefficients:        35  -3  16  10   8  -8   0  12
    Reconstructed data:  56  40   8  24  48  48  40  16
  Threshold C = 4
    Original data:       56  40   8  24  48  48  40  16
    Coefficients:        35   0  16  10   8  -8   0  12
    Reconstructed data:  59  43  11  27  45  45  37  13
  Threshold C = 9
    Original data:       56  40   8  24  48  48  40  16
    Coefficients:        35   0  16  10   0   0   0  12
    Reconstructed data:  51  51  19  19  45  45  37  13

Haar wavelet compression and approximation (plots)
- Threshold C = 4: original 56 40 8 24 48 48 40 16, reconstructed 59 43 11 27 45 45 37 13
- Threshold C = 9: original 56 40 8 24 48 48 40 16, reconstructed 51 51 19 19 45 45 37 13
- Blue line: original signal; red line: reconstructed signal
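
The thresholding example can be checked with a small sketch. The function names are illustrative; the coefficient layout (overall average first, then details from coarse to fine) and the convention that a pair is rebuilt as (average + detail, average - detail) are assumptions chosen to match the numbers of the example.

```python
def threshold(coeffs, c):
    """Zero out every coefficient whose absolute value is less than c."""
    return [x if abs(x) >= c else 0 for x in coeffs]


def haar_reconstruct(coeffs):
    """Rebuild the signal from [average, coarsest detail, ..., finest details]."""
    values = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(values)]
        pos += len(details)
        values = [x for avg, d in zip(values, details) for x in (avg + d, avg - d)]
    return values


coeffs = [35, -3, 16, 10, 8, -8, 0, 12]

kept_4 = threshold(coeffs, 4)            # only -3 is dropped
approx_4 = haar_reconstruct(kept_4)

kept_9 = threshold(coeffs, 9)            # -3, 8 and -8 are dropped
approx_9 = haar_reconstruct(kept_9)

# Storing only the nonzero coefficients (index -> value) is the compressed form.
sparse_9 = {i: x for i, x in enumerate(kept_9) if x != 0}
```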

Background on Wavelets: Multi-D Haar Wavelets
- The data cube has multiple dimensions (of equal length)
- Two extensions of the 1-D case: standard decomposition and nonstandard decomposition
Standard decomposition
  1. Fix an ordering for the data dimensions, say 1, 2, ..., d
  2. For each dimension k, fix the other (d - 1) dimensions to obtain a 1-D row vector
  3. Perform a complete 1-D Haar wavelet decomposition on that 1-D vector
  4. Repeat the last two steps in the order fixed in step 1
Nonstandard decomposition
  1. Fix an ordering for the data dimensions, say 1, 2, ..., d
  2. In this order, perform one scale of 1-D Haar decomposition for each dimension
  3. Collect the averages together and repeat the last step on the averages
  Conceptually: slide a hyper-box of size 2 x 2 x ... x 2 (= 2^d cells) over the data

Multi-D Haar Wavelets (nonstandard)
- A 2 x 2 block with a, b in the bottom row and c, d in the top row
- One step along Dim 1 (x axis): (a + b) / 2 and (a - b) / 2 for the bottom row, (c + d) / 2 and (c - d) / 2 for the top row
- One step along Dim 2 (y axis): S = ((a + b) / 2 + (c + d) / 2) / 2 = (a + b + c + d) / 4
- Wavelet coefficients:
  S  = (a + b + c + d) / 4
  d1 = (a + c - b - d) / 4
  d2 = (a + b - c - d) / 4
  d3 = (a + d - b - c) / 4
- Rebuilding:
  a = S + d1 + d2 + d3
  b = S + d2 - d1 - d3
  c = S + d1 - d2 - d3
  d = S + d3 - d1 - d2
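
One nonstandard step on a single 2 x 2 block can be sketched as below. The sign pattern used (d1 as the difference along Dim 1, d2 along Dim 2, d3 as the diagonal difference) is an assumption made for this sketch, chosen so that decomposition and rebuilding are exact inverses; the function names are illustrative.

```python
def decompose_2x2(a, b, c, d):
    """One nonstandard Haar step on the block [[c, d], [a, b]] (a, b = bottom row)."""
    s = (a + b + c + d) / 4     # overall average
    d1 = (a + c - b - d) / 4    # left column minus right column
    d2 = (a + b - c - d) / 4    # bottom row minus top row
    d3 = (a + d - b - c) / 4    # one diagonal minus the other
    return s, d1, d2, d3


def rebuild_2x2(s, d1, d2, d3):
    """Inverse of decompose_2x2."""
    a = s + d1 + d2 + d3
    b = s - d1 + d2 - d3
    c = s + d1 - d2 - d3
    d = s - d1 - d2 + d3
    return a, b, c, d


block = (3, 1, 6, 2)
coeffs = decompose_2x2(*block)    # (3.0, 1.5, -1.0, -0.5)
restored = rebuild_2x2(*coeffs)   # (3.0, 1.0, 6.0, 2.0)
```

In the full nonstandard decomposition this step is applied to every 2 x 2 block, and the averages S are then gathered and processed again.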

Multi-D Haar Wavelets: Example (figure)

Multi-D Haar Coefficients: Semantics and Representation
- Question: what is the contribution of each coefficient W in rebuilding the data array? How should a coefficient be stored?
- Answer: W = <R, S, v>
  - R: d-dimensional support hyper-rectangle of W
  - S: sign information for all d-dimensional cells of W.R
  - v: magnitude of the coefficient W
- R and S depend only on the Haar basis function; v depends on the original data

Multi-D Haar Coefficients: Semantics and Representation (example)
- A: 2-D data array with indices 0 to 3 on each axis; Wa: its wavelet coefficients
- W = Wa[1, 2], W.v = 2
- W.R.bound[1].lo = 2, W.R.bound[1].hi = 3
- W.R.bound[2].lo = 0, W.R.bound[2].hi = 1
- W.S.sign[1].lo, W.S.sign[1].hi, W.S.sign[2].lo, W.S.sign[2].hi: sign values as shown in the figure
- W.S.schg[1] = 2, W.S.schg[2] = 1
- A[0,1] is rebuilt from Wa[0,0], Wa[0,1], Wa[1,0], Wa[1,1], Wa[0,2], Wa[2,0] and Wa[2,2], each taken with its sign, giving A[0,1] = 3

Notation used in the paper

Construction of Compact Relations: Wavelet decomposition of the JFD matrix
- Relation (numeric attributes) -> joint frequency distribution (JFD) matrix -> wavelet decomposition
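
A joint frequency distribution simply counts how many tuples carry each combination of attribute values. The sketch below is a minimal illustration with a hypothetical six-tuple relation over two attributes; `joint_frequency` and `dense_matrix` are names invented for this example.

```python
from collections import Counter


def joint_frequency(tuples):
    """Map each distinct attribute-value combination to its tuple count."""
    return Counter(tuples)


def dense_matrix(jfd, shape):
    """Expand a 2-D JFD into a dense rows x cols matrix (zeros for empty cells)."""
    rows, cols = shape
    return [[jfd.get((r, c), 0) for c in range(cols)] for r in range(rows)]


# Hypothetical relation with attributes (D1, D2), each with domain 0..7.
relation = [(6, 1), (6, 1), (6, 1), (6, 2), (6, 2), (6, 5)]
jfd = joint_frequency(relation)
matrix = dense_matrix(jfd, (8, 8))   # the array that would be wavelet-decomposed
```

The dense matrix is what the multi-dimensional Haar decomposition operates on; in practice it is sparse, which is why only nonzero cells and coefficients are stored.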

Thresholding
- Retain the k coefficients with the largest absolute value after normalization
- This minimizes the overall mean squared error
- The set of coefficients retained after thresholding is the wavelet-coefficient synopsis
- All SQL queries run on the synopsis
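
A possible 1-D version of "keep the k largest normalized coefficients" is sketched below. The normalization used, multiplying each unnormalized coefficient by the square root of the size of its support, is an assumption of this sketch: with the averaging-and-differencing convention, dropping a coefficient c whose basis vector has n nonzero entries adds c * c * n to the squared reconstruction error. The function names and the index layout (overall average first, then details from coarse to fine) are illustrative.

```python
import math


def support_size(index, n):
    """Number of data values a coefficient influences, for a signal of length n."""
    if index == 0:
        return n                       # overall average covers everything
    level = index.bit_length() - 1     # floor(log2(index))
    return n >> level


def top_k_synopsis(coeffs, k):
    """Keep the k coefficients with the largest normalized magnitude."""
    n = len(coeffs)
    ranked = sorted(
        range(n),
        key=lambda i: abs(coeffs[i]) * math.sqrt(support_size(i, n)),
        reverse=True,
    )
    return {i: coeffs[i] for i in sorted(ranked[:k])}


coeffs = [35, -3, 16, 10, 8, -8, 0, 12]
synopsis = top_k_synopsis(coeffs, 4)   # index -> retained coefficient
```

Coefficients near the root have large supports, so a small value there can still outrank a larger value at the finest level.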

Summary of Step 1: Wavelet decomposition & construction of the synopsis
- 1-D Haar wavelet decomposition
  - Simple and fast to compute
  - Pairwise averaging and differencing
  - Applied recursively
- Multi-D Haar wavelet decomposition
  - Nonstandard extension
  - Alternates between dimensions
- Thresholding
  - Drop the smallest coefficients
  - Lossy data compression, hence approximation
- How to store coefficients
  - Semantics of the notation W = <R, S, v>
  - SQL queries run on the coefficients

Query Processing (Step 2)
- The entire processing happens in the compressed (wavelet) domain
- Compressed domain (FAST): wavelet synopses -> querying in the wavelet domain -> query results in the wavelet domain -> render -> final approximate results
- Relation domain (SLOW): wavelet synopses -> render -> approximate relations -> querying in the relation domain -> final approximate results

Query Processing
- Each operator (e.g., select, project, join, aggregates)
  - input: set of coefficients
  - output: set of coefficients
- Finally, the rendering step
  - input: set of coefficients
  - output: (multi)set of tuples
- Questions
  - How to map the query algebra?
  - Can we maintain the semantics of the coefficients?
- Figure: an operator tree (select, select, project, join) passing sets of coefficients upward, with a final render step producing a set of tuples

Query algebra mapping: Selection (definition)
- Select_pred(W_T); T is a d-dimensional relation and W_T is T's wavelet synopsis
- $\mathrm{Pred} = (l_{i_1} \le D_{i_1} \le h_{i_1}) \wedge \dots \wedge (l_{i_k} \le D_{i_k} \le h_{i_k})$
- A k-dimensional range selection: ranges are defined for the k dimensions $D = \{D_{i_1}, D_{i_2}, \dots, D_{i_k}\}$
- The range is unspecified for the remaining (d - k) dimensions, i.e. each of them ranges over its whole domain

Query algebra mapping: Selection (example)
- JFD matrix over Dim. D1 and Dim. D2, with domains D1: (0, 7) and D2: (0, 7); its nonzero cells:

  Dim D1 (Attr1)   Dim D2 (Attr2)   Count
        0                6             6
        1                2             3
        1                3             4
        1                5             6
        1                6             8
        2                6             7
        3                0             1
        4                2             3
        5                2             2
        6                1             3
        6                2             2
        6                5             1
        6                6             3

- Query range: $\mathrm{Pred} = (1 \le D_1 \le 4) \wedge (2 \le D_2 \le 6)$, $D = \{D_1, D_2\}$
- In the relation domain, we are interested only in the cells inside the query range
- In the wavelet domain, we are interested only in the coefficients that contribute to those cells

Query algebra mapping: Selection (mapping)
1. For each W in W_T do
2.   If for every D_ij in D   /* check overlap */
       (l_ij <= W.R.bound[i_j].lo <= h_ij) or (W.R.bound[i_j].lo <= l_ij <= W.R.bound[i_j].hi)
     then goto 3 else goto 5
3.   For all D_ij in D do   /* the overlapping area is the new hyper-rectangle */
       W.R.bound[i_j].lo := max{l_ij, W.R.bound[i_j].lo}
       W.R.bound[i_j].hi := min{h_ij, W.R.bound[i_j].hi}
       if W.R.bound[i_j].hi < W.S.schg[i_j] then   /* no sign change any more */
         W.S.schg[i_j] := W.R.bound[i_j].lo
         W.S.sign[i_j] := [W.S.sign[i_j].lo, W.S.sign[i_j].lo]
       elseif W.R.bound[i_j].lo >= W.S.schg[i_j] then   /* no sign change any more */
         W.S.schg[i_j] := W.R.bound[i_j].lo
         W.S.sign[i_j] := [W.S.sign[i_j].hi, W.S.sign[i_j].hi]
4.   Output the updated W: W_S := W_S ∪ {W}
5.   Goto 1, select the next W
Figure: coefficients W1 to W4 and the query range in the (D1, D2) plane.
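
The selection mapping can be sketched in Python as follows. The `Dim` and `Coeff` classes are a hypothetical encoding of W = <R, S, v>, not the paper's data structure: per dimension they hold the bounds `lo` and `hi`, the sign-change position `schg` (assumed here to be the first cell that carries `sign_hi`, so cells `lo..schg-1` carry `sign_lo`), and the two signs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dim:
    lo: int        # W.R.bound[i].lo
    hi: int        # W.R.bound[i].hi
    schg: int      # W.S.schg[i]: first cell carrying sign_hi
    sign_lo: int   # W.S.sign[i].lo (+1 or -1)
    sign_hi: int   # W.S.sign[i].hi (+1 or -1)


@dataclass(frozen=True)
class Coeff:
    dims: Tuple[Dim, ...]
    v: float


def select_dim(d: Dim, l: int, h: int) -> Optional[Dim]:
    """Clip one dimension of a coefficient to the range [l, h]."""
    if d.hi < l or d.lo > h:
        return None                                 # no overlap with the range
    lo, hi = max(l, d.lo), min(h, d.hi)
    if hi < d.schg:                                 # only the sign_lo part survives
        return Dim(lo, hi, lo, d.sign_lo, d.sign_lo)
    if lo >= d.schg:                                # only the sign_hi part survives
        return Dim(lo, hi, lo, d.sign_hi, d.sign_hi)
    return Dim(lo, hi, d.schg, d.sign_lo, d.sign_hi)  # sign change is kept


def select(synopsis: List[Coeff], ranges: Dict[int, Tuple[int, int]]) -> List[Coeff]:
    """Keep and clip the coefficients that overlap every given range.

    ranges maps a dimension index to its (l, h) bounds.
    """
    result = []
    for w in synopsis:
        dims = list(w.dims)
        for i, (l, h) in ranges.items():
            clipped = select_dim(dims[i], l, h)
            if clipped is None:
                break                               # W does not overlap: drop it
            dims[i] = clipped
        else:
            result.append(Coeff(tuple(dims), w.v))
    return result


# A 1-D detail coefficient over cells 0..7: +1 on 0..3, -1 on 4..7.
w = Coeff((Dim(0, 7, 4, 1, -1),), 2.0)
inside_plus = select([w], {0: (1, 3)})
straddling = select([w], {0: (2, 5)})
```

Note that the magnitude v never changes under selection; only the hyper-rectangle and the sign information are updated.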

Query algebra mapping: Projection (definition)
- Project_{X_i1, X_i2, ..., X_ik}(W_T); T is a d-dimensional relation and W_T is T's wavelet synopsis
- X_i1, X_i2, ..., X_ik are the attributes we are interested in
- The remaining (d - k) dimensions are projected out
- Project out the (d - k) dimensions one by one

Query algebra mapping: Projection (example)
- In the JFD matrix, D1 is retained and D2 is projected out (eliminated)
- Tuples with D1 = 6 before projection:

  Dim D1 (Attr1)   Dim D2 (Attr2)   Count
        6                1             3
        6                2             2
        6                5             1
        6                6             3

- After projection: D1 = 6 with Count = 9
- Result of the projection for D1 = 0, 1, ..., 6: counts 6, 21, 7, 1, 3, 2, 9
- In the relation domain, sum the elements in each row along the eliminated dimension
- In the wavelet domain, sum the contribution of each coefficient along the eliminated dimension

Query algebra mapping: Projection (mapping)
1. For each D_j in D (the dimensions to be projected out)
2.   For every W in W_T do
     2.1 Set W.v := W.v * P_j, where
         P_j = (W.R.bound[j].hi - W.S.schg[j] + 1) * W.S.sign[j].hi + (W.S.schg[j] - W.R.bound[j].lo) * W.S.sign[j].lo
     2.2 Discard dimension D_j (hyper-rectangle and sign information) from W
3. Goto 1, select the next D_j
- In step 2, summing up the contributions of W along D_j is what projects D_j out
- In short, for each W: set W.v := W.v * (product of P_j over all dimensions D_j to be projected out), then discard those dimensions
- Figure (projecting on D1): W1 has no sign change and spans X cells along D2, so W1.v := X * W1.v; W2 changes sign, with X1 and X2 cells on the two sides, so W2.v := (X2 - X1) * W2.v
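
A minimal sketch of projecting one dimension out of a coefficient is shown below. The `Dim` and `Coeff` classes are a hypothetical encoding of W = <R, S, v>, and the second term of P_j is assumed to use the sign W.S.sign[j].lo of the cells below the sign change, which is what makes P_j the signed number of cells along dimension j.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dim:
    lo: int        # W.R.bound[j].lo
    hi: int        # W.R.bound[j].hi
    schg: int      # W.S.schg[j]: first cell carrying sign_hi
    sign_lo: int   # W.S.sign[j].lo (+1 or -1)
    sign_hi: int   # W.S.sign[j].hi (+1 or -1)


@dataclass(frozen=True)
class Coeff:
    dims: Tuple[Dim, ...]
    v: float


def signed_extent(d: Dim) -> int:
    """P_j: cells with sign_hi minus/plus cells with sign_lo, each with its sign."""
    return (d.hi - d.schg + 1) * d.sign_hi + (d.schg - d.lo) * d.sign_lo


def project_out(w: Coeff, j: int) -> Coeff:
    """Sum the contribution of w along dimension j and drop that dimension."""
    p = signed_extent(w.dims[j])
    return Coeff(w.dims[:j] + w.dims[j + 1:], w.v * p)


# Dimension 0 changes sign at cell 2; dimension 1 has no sign change.
w = Coeff((Dim(0, 3, 2, 1, -1), Dim(0, 3, 0, 1, 1)), 5.0)
without_dim1 = project_out(w, 1)   # magnitude scaled by 4 cells
without_dim0 = project_out(w, 0)   # +2 cells and -2 cells cancel: magnitude 0
```

A coefficient whose positive and negative halves are both fully inside the projected range contributes nothing after projection, as the last line shows.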

Query algebra mapping: Equi-join (definition)
- Join_pred(W_T1, W_T2)
- Dim(T1) = d1, Dim(T2) = d2
- W_T1 is the wavelet synopsis of T1, W_T2 the wavelet synopsis of T2
- $\mathrm{Pred} = (X_{11} = X_{21}) \wedge \dots \wedge (X_{1k} = X_{2k})$
- Pred is k-dimensional, with k <= d1 and k <= d2
- WLOG, assume the join attributes are the first k dimensions of both T1 and T2; let D = (D1, D2, ..., Dk)
- The result has (d1 + d2 - k) dimensions

Query algebra mapping: Equi-join (example)
- JFD matrix of Relation1 over (D1, D2) and JFD matrix of Relation2 over (D1, D3); D1 is the join dimension
- Relation1 (D1, D2, Count) includes (6, 2, 7); Relation2 (D1, D3, Count) includes (6, 3, 3)
- These two cells have the same value on D1, so they join along D1
- Join result (D1, D2, D3, Count): (6, 2, 3, 21)
- In the relation domain, join count = 7 * 3 = 21
- In the wavelet domain, consider all pairs of coefficients, check joinability, and compute new coefficients

Query algebra mapping: Equi-join (example in the wavelet domain)
- Case 1: no overlap along the join dimension D1 -> output NOTHING
- Case 2: overlap
  - Cell A(X1, X2) and cell B(X1, X3)
  - W11 and W12 cover A (W12 not shown); W21 and W22 cover B (W22 not shown)
  - Join result for (X1, X2, X3): (W11.v + W12.v) * (W21.v + W22.v) = W11.v * W21.v + W11.v * W22.v + W12.v * W21.v + W12.v * W22.v
  - So consider each coefficient pair; the pair (W11, W21) yields a new coefficient with W.v = W11.v * W21.v
- The join range along any dimension can contain at most one true sign change, due to the complete-containment property of the Haar wavelet decomposition

Equi-join: Mapping
1. For each pair (W1, W2), W1 in W_T1 and W2 in W_T2, do
2.   If for every D_i in D   /* check overlap in the k join dimensions */
       (W1.R.bound[i].lo <= W2.R.bound[i].lo <= W1.R.bound[i].hi) or (W2.R.bound[i].lo <= W1.R.bound[i].lo <= W2.R.bound[i].hi)
     then goto 3 else goto 7
3.   For each join dimension D_i in D do   /* steps 3 to 6 build a new coefficient W on the join range */
     3.1 W.R.bound[i].lo := max{W1.R.bound[i].lo, W2.R.bound[i].lo}   /* set the join boundary */
         W.R.bound[i].hi := min{W1.R.bound[i].hi, W2.R.bound[i].hi}
     3.2 For j = 1, 2   /* S_j is a temporary sign-vector variable; compute sign info */
           if W.R.bound[i].hi < Wj.S.schg[i] then S_j := [Wj.S.sign[i].lo, Wj.S.sign[i].lo]
           elseif W.R.bound[i].lo >= Wj.S.schg[i] then S_j := [Wj.S.sign[i].hi, Wj.S.sign[i].hi]
           else S_j := Wj.S.sign[i]
     3.3 W.S.sign[i] := [S_1.lo * S_2.lo, S_1.hi * S_2.hi]
     3.4 If W.S.sign[i].lo == W.S.sign[i].hi then W.S.schg[i] := W.R.bound[i].lo
     3.5 else W.S.schg[i] := max over j = 1, 2 of {Wj.S.schg[i] : Wj.S.schg[i] in [W.R.bound[i].lo, W.R.bound[i].hi]}
4.   For each non-join dimension D_i, i = k + 1, ..., d1 do   /* steps 4 and 5 inherit the non-join dimensions */
       W.R.bound[i] := W1.R.bound[i], W.S.sign[i] := W1.S.sign[i], W.S.schg[i] := W1.S.schg[i]
5.   For each non-join dimension D_i, i = d1 + 1, ..., d1 + d2 - k do
       W.R.bound[i] := W2.R.bound[i - d1 + k], W.S.sign[i] := W2.S.sign[i - d1 + k], W.S.schg[i] := W2.S.schg[i - d1 + k]
6.   Set W.v := W1.v * W2.v and output W: W_S := W_S ∪ {W}
7.   Goto 1, select another pair

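Query algebra mapping: Equi-join (example figures)
- Coefficients that do not overlap along the join dimension D1 produce NOTHING
- Coefficients that overlap along D1 produce a new coefficient over (D1, D2, D3) with val = val1 * val2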

Query algebra mapping: Equi-join (example figures, continued)
- Two further overlapping cases along D1, each producing a new coefficient over (D1, D2, D3) with val = val1 * val2

Summary of Step 2: Query algebra mapping (non-aggregate operators only)
- Selection
  - Update the wavelet coefficients whose hyper-rectangle overlaps the selection range
- Projection
  - Sum up each wavelet coefficient along all dimensions to be projected out
- Join
  - Create new wavelet coefficients
  - The hyper-rectangle equals the join range plus the non-join dimensions
  - Compute the sign information
- Results need to be rendered
  - The output of the above operators is a set of wavelet coefficients
  - It needs to be converted to a database relation

Rendering (Step 3)
- Go back from the wavelet domain to database relations
- The semantics of the wavelet coefficients are unchanged: range, sign, sign change, magnitude
- The inverse wavelet decomposition is easy: sum up the contributions of all coefficients to each cell
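
A deliberately naive rendering sketch follows: it visits every cell of every coefficient's hyper-rectangle and adds the signed magnitude. This is only meant to illustrate "sum up the contributions"; it is not claimed to be the rendering algorithm of the paper, and `Dim`, `Coeff` and `render` are hypothetical names (with `schg` taken to be the first cell carrying `sign_hi`).

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Dim:
    lo: int
    hi: int
    schg: int      # first cell carrying sign_hi
    sign_lo: int
    sign_hi: int


@dataclass(frozen=True)
class Coeff:
    dims: Tuple[Dim, ...]
    v: float


def sign_at(d: Dim, x: int) -> int:
    """Sign of the coefficient at position x along one dimension."""
    return d.sign_lo if x < d.schg else d.sign_hi


def render(synopsis: List[Coeff]) -> Dict[Tuple[int, ...], float]:
    """Approximate cell values: signed sum of all covering coefficients."""
    cells: Dict[Tuple[int, ...], float] = {}
    for w in synopsis:
        ranges = [range(d.lo, d.hi + 1) for d in w.dims]
        for cell in product(*ranges):
            sign = 1
            for d, x in zip(w.dims, cell):
                sign *= sign_at(d, x)
            cells[cell] = cells.get(cell, 0) + sign * w.v
    return cells


# The pair (56, 40) as an average of 48 and a detail of 8 over cells 0..1.
average = Coeff((Dim(0, 1, 0, 1, 1),), 48)
detail = Coeff((Dim(0, 1, 1, 1, -1),), 8)
cells = render([average, detail])
```

Each rendered cell value is an approximate count, so turning the result into tuples amounts to emitting each cell's attribute values that many times.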

Experimental Results
- Compare the wavelet-based technique with sampling and histograms, in terms of efficiency and accuracy
- Measuring accuracy (error metrics)
  - Aggregate queries: absolute relative error
  - Non-aggregate queries: EMD error
- Query types: SELECT, SELECT-SUM, SELECT-JOIN, SELECT-JOIN-SUM

Datasets and Queries
- Synthetic data set
- Real data set: CENSUS Population Survey (www.census.gov), 1992 & 1994
  - 4-D data: age (0-17), education level (0-46), income (0-41), hrs/week (0-13)
  - JFD matrix size: 2 million cells (32 * 64 * 64 * 16)
  - Relation sizes (2 relations): ~16,000
  - Density: ~0.001
- Queries
  - Selects: 5 <= age < 10 ^ 10 <= income < 15, selectivity ~6%
  - Joins: join on age between the 1992 and 1994 data
  - Sum: sum on age

Query Execution Time
- A two-dimensional synthetic data set is used
- Running time on the base relation is 3.6 seconds (with enough memory)
- Sampling is not included here: it yields too few join tuples
- Wavelets run faster than histograms, by more than two orders of magnitude
  - Histograms are expanded to generate the tuple-value distribution
  - Wavelets are expanded only at the very end

Query Execution Accuracy (plots)

Conclusion
Wavelets are an effective tool for general-purpose approximate query answering:
- fast query processing (entirely in the wavelet, i.e. compressed, domain)
- low synopsis construction cost
- high accuracy even at high dimensions
- can handle all types of queries