Partial-Sum Queries in OLAP Data Cubes Using Covering Codes


IEEE TRANSACTIONS ON COMPUTERS, VOL. 47, NO. 12, DECEMBER 1998

Ching-Tien Ho, Member, IEEE, Jehoshua Bruck, Senior Member, IEEE, and Rakesh Agrawal, Senior Member, IEEE

C.-T. Ho and R. Agrawal are with IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120. E-mail: {ho, ragrawal}@almaden.ibm.com.
J. Bruck is with the California Institute of Technology, Mail Stop 136-93, Pasadena, CA 91125. E-mail: bruck@paradise.caltech.edu.
Manuscript received 29 Sept. 1997. For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number 05727.

Abstract. A partial-sum query obtains the summation over a set of specified cells of a data cube. We establish a connection between the covering problem in the theory of error-correcting codes and the partial-sum problem and use this connection to devise algorithms for the partial-sum problem with efficient space-time trade-offs. For example, using our algorithms, with 44 percent additional storage, the query response time can be improved by about 12 percent; by roughly doubling the storage requirement, the query response time can be improved by about 34 percent.

Index Terms. Partial-sum query, covering code, error-correcting code, on-line analytical processing, data cube, multidimensional database, precomputation, query algorithm.

1 INTRODUCTION

On-Line Analytical Processing (OLAP) [7] allows companies to analyze aggregate databases built from their data warehouses. An increasingly popular data model for OLAP applications is the multidimensional database (MDDB) [8], also known as data cube [9]. To build an MDDB from a data warehouse, a certain number of attributes are selected. Thus, each data record contains a value for each of these attributes. Some of these attributes are chosen as metrics of interest and are referred to as the measure attributes. The remaining attributes, say d of them, are referred to as dimensions or the functional attributes. The measure attributes of all records with the same combination of functional attributes are combined (e.g., summed up) into an aggregate value. Thus, an MDDB can be viewed as a d-dimensional array, indexed by the values of the d functional attributes, whose cells contain the values of the measure attributes for the corresponding combination of functional attributes.

Consider a data cube from an insurance company as an example. Assume the data cube has four functional attributes (dimensions): age, time, state, and (insurance) type. Further assume that the domain of age is 1 to 100, of time is 1Qtr87 to 4Qtr96 (four quarters per year and over 10 years), of state is the 50 states in the U.S., and of type is {health, home, auto, life}. The data cube will have 100 × 40 × 50 × 4 cells, with each cell containing the total revenue (the measure attribute) for the corresponding combination of age, time, state, and type, e.g., (35, 1Qtr96, California, auto).

We consider a class of queries, which we shall call partial-sum queries, that sum over all selected cells of a data cube, where selection is specified by providing a subset of values for some of the functional attributes. Partial-sum queries are frequent with respect to categorical attributes whose values do not have a natural ordering, although they can arise with respect to numeric attributes as well. Using the same example of an insurance data cube, a partial-sum query may obtain the total revenue from the states of California, Florida, Texas, and Arizona, for life and health insurances, and for 1Qtr94, 1Qtr95, and 1Qtr96. In an interactive exploration of a data cube, which is the predominant OLAP application area, it is imperative to have a system with fast response time.

1.1 Partial-Sum Problem

The one-dimensional partial-sum problem can be formally stated as follows. (The d-dimensional partial-sum problem will be defined in Section 7.) Let A be an array of size m, indexed from 0 through m − 1, whose value is known in advance. Let $M = \{0, 1, \ldots, m-1\}$ be the index domain of A. Given a subset $I \subseteq M$ of A's index domain at query time, we are interested in getting the partial sum of A specified by I:

$\mathit{Psum}(A, I) = \sum_{i \in I} A[i].$

EXAMPLE 1. Consider the following array A with six elements: A = (259, 401, 680, 937, 452, 63).
Let I = {0, 1, 5}, then Psum(A, I) = 259 + 401 + 63 = 723. Let I = {0, 3, 4}, then Psum(A, I) = 259 + 937 + 452 = 1,648.

We will use two metrics to measure the cost of solving the partial-sum problem: time overhead T and space overhead S. The partial-sum computation requires an access to an element of A followed by an addition of its value to an existing value (the cumulative partial sum). Thus, a time step can be modeled as the average time for accessing one array element and one arithmetic operation. We define T of an algorithm as the maximum number of time steps required by the algorithm (over all possible input I). We define S as the number of storage cells required for the execution of the partial-sum operation. The storage may be used for the original array A and for precomputed data that will help in achieving better response time. Clearly, a lower bound on S is m since at least the entire array A, or some encoded form of it, has to be stored.

0018-9340/98/$10.00 © 1998 IEEE

Without any precomputation, i.e., S = m, the worst-case time complexity is T = m (which occurs when I = M). On the other hand, if one precomputes and stores all possible combinations of partial sums ($S = 2^m$), which is clearly infeasible for large m, only one data access is needed (T = 1).

A straightforward observation is that if we precompute only the total sum of A, say $A[*] = \sum_{i=0}^{m-1} A[i]$, then the worst-case time complexity for any partial sum can be reduced from m to $\lceil m/2 \rceil$. This is because a partial sum can also be derived from $A[*] - \mathit{Psum}(A, I')$, where $I' = M - I$. For example, considering Example 1, we can store the sum of the elements A[*] = 2,792. Assume I = {0, 1, 2, 4, 5}, then Psum(A, I) = A[*] − A[3] = 2,792 − 937 = 1,855.

We will consider the normalized measures for time and space, namely, s = S/m and t = T/m. Clearly, using A[*] we can get (s, t) ≈ (1, 0.5).

1.2 Contributions

The goal of the paper is to derive a suite of (s, t) pairs better than (s, t) ≈ (1, 0.5). In particular, we will focus on finding (s, t) for t < 0.5 and s being a small constant (say, less than five or so). The best (s, t)-pairs obtained in this paper are summarized in Fig. 1. (More detailed (s, t) values are listed in Table 9 later.) For example, the entry (s, t) = (1.44, 0.44) implies that, with 44 percent additional storage, one can improve the query response time by about 12 percent (i.e., from t = 0.5 to t = 0.44). Another entry (s, t) = (2.17, 0.33) means that if we roughly double the storage requirement, the query response time can be improved by about 34 percent.

Fig. 1. The best (s, t) data points for computing partial sum.

The main contributions of the paper are as follows: First, we establish the connection between covering codes [2], [3] and the partial-sum problem. Second, we apply four known covering codes from [2], [5], and [4] to the partial-sum problem to obtain algorithms with various space-time trade-offs. Third, we modify the requirements on covering codes to better reflect the partial-sum problem and devise new covering codes with respect to the new requirements. As a result, we further improve many of the (s, t) points and give better space-time trade-offs.

Although we explicitly discuss only the SUM aggregation operation, the techniques presented apply to the other common OLAP aggregation operations of COUNT and AVERAGE: COUNT is a special case of SUM, and AVERAGE can be obtained by keeping the 2-tuple (sum, count). In general, these techniques can be applied to any binary operation op for which there exists an inverse binary operation iop such that a op b iop b = a, for any a and b in the domain.

1.3 Related Work

Following the introduction of the data cube model in [9], there has been considerable research in developing algorithms for computing the data cube [], for deciding what subset of a data cube to precompute [4], [], [2], for estimating the size of multidimensional aggregates [9], and for indexing precomputed summaries [20], [5]. Related work also includes work done in the context of statistical databases [6] on indexing precomputed aggregates [2] and incrementally maintaining them [7]. Also relevant is the work on maintenance of materialized views [6] and processing of aggregation queries [8], [0], [22]. However, these works do not directly address efficient precomputation techniques for partial-sum queries.

Closest to the work presented in this paper is the accompanying paper [3], in which we consider range-sum queries over data cubes and give fast algorithms for them. A range-sum query obtains the sum over all selected cells of a data cube where the selection is specified by providing contiguous ranges of values for numeric dimensions. An example of a range-sum query over an insurance data cube is to find the revenue from customers with an age between 37 and 52, in a time from 1Qtr88 to 4Qtr96, in all of the U.S., and with auto insurance. Although a range-sum query can be viewed as a special case of the partial-sum query (thus, the general techniques proposed here can also be applied to the range-sum query), the techniques specialized for range-sum queries take advantage of the contiguous ranges of selection and should be preferred for better performance.

1.4 Organization of the Paper

The rest of the paper is organized as follows: In Section 2, we give a brief background on the covering codes that is pertinent to the partial-sum problem. In Section 3, we give main theorems that relate the properties of covering codes to the space and time complexities in solving the partial-sum problem. In Section 4, we apply the known covering codes to the partial-sum problem. In Section 5, we modify the definition of the covering code by assuming all the weight-1 vectors are included as codewords, in order to derive faster algorithms. In Section 6, we further modify the definition of the covering code based on a composition function. This results in further improvement in space and time overheads in solving the partial-sum problem. Section 7 discusses partial-sum queries over multidimensional cubes. We conclude with a summary in Section 8.

2 COVERING CODES

In this section, we briefly review some concepts from the theory of error-correcting codes [2], [3] that are pertinent to the partial-sum problem. A code is a set of codewords where each codeword defines a valid string of digits. For the purposes of this paper, we are only interested in binary codes of fixed length. We will represent a binary vector in a bit string format and use the terms vector and bit string interchangeably depending on the context. The bit position of a length-m bit string (or vector) is labeled from 0 through m − 1 from left (the most significant bit) to right (the least significant bit).
Also, $\mathcal{R}^*(V)$ denotes any bit-rotation of vector V and | denotes concatenation of two bit strings (vectors).

The Hamming weight of a length-m binary vector $V = (b_0 b_1 \cdots b_{m-1})$ is $\sum_{i=0}^{m-1} b_i$, i.e., the number of 1-bits in this vector. The Hamming distance of two binary vectors V and V′, denoted Hamming(V, V′), is the Hamming weight of $V \oplus V'$, where $\oplus$ is the bit-wise exclusive-or operator. For instance, a vector containing exactly three 1-bits has Hamming weight three, and two vectors that differ in exactly three bit positions are at Hamming distance three, which is the Hamming weight of their exclusive-or. Throughout the paper, the weight of a codeword or a vector always means the Hamming weight.

The covering radius R of a binary code is the maximal Hamming distance of any vector of the same length from a codeword (a vector in the code). A binary code C is an (m, K, R)-covering code if
1) each codeword is of length m;
2) there are K (legal) codewords in C (out of all $2^m$ possible combinations in the vector space); and
3) the covering radius of the code is R.

EXAMPLE 2. The code C = {(00000), (11111)} is a (5, 2, 2)-covering code because m = 5, K = 2, and R = 2. For this code, R = 2 because every binary vector of length five is within distance two from either (00000) or (11111). As another example, the code C = {(00000), (00111), (10000), (01000), (11011), (11101), (11110)} can be verified from Table 1 as a (5, 7, 1)-covering code because all 32 vectors are within distance one from one of the seven codewords.

Table 1. The (5, 7, 1)-covering code {(00000), (00111), (10000), (01000), (11011), (11101), (11110)}.

3 RELATING THE COVERING RADIUS OF CODES TO PARTIAL SUMS

3.1 A Motivating Example

We first give a motivating example based on the (5, 7, 1)-covering code. Suppose the array A is of size m = 5 and the initial values of A[0] through A[4] are known. We first precompute the partial sums corresponding to all seven codewords of the (5, 7, 1)-covering code. For instance, corresponding to the codeword (00111), the precomputed partial sum is A[2] + A[3] + A[4]. Note that the corresponding partial sum for (00000) is zero and need not be computed. Also, the corresponding partial sums for (10000) and (01000) are already known as part of the original array elements.
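The covering-radius claims of Example 2 can be checked by brute force. The following is a minimal sketch, not part of the original paper: the function names `hamming` and `covering_radius` are illustrative choices, and vectors are written as Python bit strings with position 0 on the left, as in Section 2.

```python
from itertools import product


def hamming(u, v):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))


def covering_radius(code, length):
    """Largest distance from any length-bit vector to its nearest codeword."""
    vectors = ("".join(bits) for bits in product("01", repeat=length))
    return max(min(hamming(v, c) for c in code) for v in vectors)


# The two codes of Example 2.
REPETITION = ["00000", "11111"]
CODE_5_7_1 = ["00000", "00111", "10000", "01000", "11011", "11101", "11110"]

print(covering_radius(REPETITION, 5))  # 2
print(covering_radius(CODE_5_7_1, 5))  # 1
```

The exhaustive search visits all $2^m$ vectors, so it is only practical for the short block lengths considered in this paper.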

Now suppose the partial sum query is Psum(A, I), where I = {0, 2, 3, 4}, i.e., corresponding to the vector (10111). We can derive its partial sum as the sum of the precomputed partial sum corresponding to codeword (00111) and the value of A[0]. In fact, any partial sum Psum(A, I) for this example can be derived as some precomputed partial sum plus or minus some array value. This is because the radius of the (5, 7, 1)-covering code is 1. We are now ready to relate covering codes to the partial-sum problem formally.

3.2 Using Covering Codes to Solve Partial Sums

Given a length-m covering code C and any m-bit vector V, we use $f_t(m)$ and $f_s(m)$ to denote the time and associated space overheads, respectively, in deriving the index to the codeword in C that is closest to V. Note that $f_t(m)$ and $f_s(m)$ may depend on certain properties of the code, in addition to the length of the codeword. However, for notational simplicity, we omit the parameter C in $f_t$ and $f_s$.

For convenience, we define an m-bit mask of I as $\mathit{mask}(I) = (b_0 b_1 \cdots b_{m-1})$, where $b_i = 1$ if $i \in I$, and $b_i = 0$ otherwise. Also, if V = mask(I), then the set I will be called the support of vector V, denoted support(V) = I. (Support and mask are inverse functions.) For instance, if m = 5 and I = {0, 1, 3}, then mask(I) = (11010). Also, support((11010)) = {0, 1, 3}.

LEMMA 1. Given an (m, K, R)-covering code with c codewords of Hamming weight 1 or 0 in the code, we can construct an algorithm to derive the partial sum Psum(A, I) in time $T = R + f_t(m) + 1$ and in space $S = m + K - c + f_s(m)$.

PROOF. Denote the K codewords (vectors) by $V_1, V_2, \ldots, V_K$. Let $I_i = \mathit{support}(V_i)$. Without loss of generality, assume that the c codewords with weight 1 or 0 are the first c on the list. (Thus, the partial sum for each of $I_1, I_2, \ldots, I_c$ is already known as they correspond to entries in array A.) We will precompute and store the partial sums for the K − c different subsets specified by $I_{c+1}, I_{c+2}, \ldots, I_K$, respectively. This requires a space overhead of K − c.

Given an index subset parameter I at run time, let V = mask(I). We first find an index i such that $V_i$ is the closest codeword from V. This requires a time overhead of $f_t(m)$ and a space overhead of $f_s(m)$. Then, we access the precomputed $\mathit{Psum}(A, I_i)$ in one step. Since V is at most distance R away from $V_i$ (due to the property of an (m, K, R)-covering code), the partial sum Psum(A, I) can be obtained from $\mathit{Psum}(A, I_i)$ by accessing and adding or subtracting up to R elements of A, which correspond to the 1-bit positions of $V \oplus V_i$. Thus, the time overhead for this modification is at most R. Overall, we have $T = R + f_t(m) + 1$ and $S = m + K - c + f_s(m)$.

3.3 Reducing Space Overhead

Recall that array A is of size m. The above lemma applies any covering code of length m to the entire array. However, many covering codes have small R and large K relative to m [2], [5], [4]. Applying these covering codes directly to the entire array typically yields an unreasonable space overhead, even though the time is much improved. Furthermore, the space overhead depends on the array size m. In the following theorem, we will partition the array into blocks of size n and apply length-n covering codes to each block.

THEOREM 2. Given an (n, K, R)-covering code with c codewords of Hamming weight 1 or 0 in the code, we can construct an algorithm to derive the partial sum Psum(A, I) in time

$T \approx \bigl(R + f_t(n) + 1\bigr)\,\frac{m}{n}$

and in space

$S \approx (n + K - c)\,\frac{m}{n} + f_s(n).$

PROOF. Assume first that m is a multiple of n. Logically partition the array A into m/n blocks of size n each. Let x = m/n. Denote the blocks as $A_0, \ldots, A_{x-1}$. Also partition I into $I_0, \ldots, I_{x-1}$. Then,

$\mathit{Psum}(A, I) = \sum_{i=0}^{x-1} \mathit{Psum}(A_i, I_i).$

To derive $\mathit{Psum}(A_i, I_i)$ for each $0 \le i < x$, we apply the algorithm constructed in Lemma 1, which incurs overhead $T_i = R + f_t(n) + 1$ in time and $S_i = n + K - c + f_s(n)$ in space. The space overhead $f_s(n)$ is the same for all i because the same covering code is applied. Thus, the overall time complexity is

$T = \sum_{i=0}^{x-1} T_i = \bigl(R + f_t(n) + 1\bigr)\,\frac{m}{n}$

and the overall space overhead is

$S = f_s(n) + \sum_{i=0}^{x-1} \bigl(S_i - f_s(n)\bigr) = (n + K - c)\,\frac{m}{n} + f_s(n).$

When m is not a multiple of n, we can extend the array A to a size $m' = \lceil m/n \rceil\, n$ by padding $m' - m$ elements of value 0.
This introduces the approximation sign in the complexities of T and S.

By comparing the time and space complexities of this theorem to those of Lemma 1, it may appear that both time and space complexities are worse in this theorem. Note, however, that R is a function of the vector length (m or n) for a fixed K.

3.4 Implementation Using Look-up Tables

In this subsection, we give a concrete example of implementation based on Theorem 2 and give a general estimate of the time and space overhead ($f_t(n)$ and $f_s(n)$) through the use of look-up tables. We assume m is a multiple of n. (If not, we can extend the size of A to $\lceil m/n \rceil\, n$ by padding zero elements to A.)

First, we will restructure A as a two-dimensional array A[i, j], where i indexes a block, $0 \le i < m/n$, and j indexes an element of A within the block, $0 \le j < n$. Thus, the new A[i, j] is the same as the old A[ni + j]. Then, for each block i, we precompute the K − c partial sums and store their values in A[i, j] for $n \le j < n + K - c$ in some arbitrary order (though the order is the same for all blocks). The augmented two-dimensional array A is a partial-sum look-up table including the original elements of A (i.e., all codewords with a Hamming weight 1 for each block) and selected precomputed partial sums for each block of A. Table 2 shows an example of the partial-sum look-up table for the ith block of A, based on the (5, 7, 1)-covering code described in Table 1. The codewords of the (5, 7, 1)-covering code are marked with * in the table. Also note that codeword (00000) is not needed in the table because the corresponding partial sum is 0, which can be omitted. The second column in the table is included for clarity only and is not needed in the look-up table. There are m/n such tables, one for each block and each of size n + K − c. Thus, a total of size $(n + K - c)\, m/n$ is needed for the partial-sum look-up table.

Table 2. The partial-sum look-up table for the ith block of A based on the (5, 7, 1)-covering code. The codewords of the (5, 7, 1)-covering code are marked with *. Also, (00000) is not needed.

Second, we will create an index look-up table with $2^n - 1$ entries, indexed from 1 to $2^n - 1$. For each entry, we store a list of (index, sign)-pairs, denoted $(j_1, s_1), (j_2, s_2), \ldots$, so that the partial sum of the ith block with vector V can be derived as $\sum_x s_x \cdot A[i, j_x]$ over all $(j_x, s_x)$-pairs defined in the list. Note that the list has at most R + 1 pairs. Following the same example, Table 3 gives an example of the index look-up table. In the table, an index of −1 marks the end of the list and a question mark "?" implies a don't-care value. As before, the vector column is included here for clarity only and is not needed in the look-up table. Also, it is possible to build the table so that the sign for the first index is always positive (such as the example given) and can be omitted.

Table 3. The index look-up table.

As an example, assume the ith block of I is (00011). We use the value of (00011), which is three, to index this table. According to the table, the partial sum corresponding to (00011) in the ith block can be derived by A[i, 3] + A[i, 4]. Then, from Table 2, A[i, 3] and A[i, 4] are prestored with values A[5i + 3] and A[5i + 4], respectively. As another example, assume the ith block of I is (01011). According to Table 3, the partial sum is A[i, 6] − A[i, 0], which, according to Table 2, yields (A[5i] + A[5i + 1] + A[5i + 3] + A[5i + 4]) − A[5i] = A[5i + 1] + A[5i + 3] + A[5i + 4].

The size of the index look-up table is bounded by $f_s(n) = O(2^n R)$ from above. With the implementation of the index look-up table, the time overhead for finding the closest codeword of an n-bit vector, $f_t(n)$, becomes the time to index an array of $2^n$ entries. Since the same covering code is used for all blocks, the same index look-up table will be used for indexing for all blocks.
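The two look-up tables of this subsection can be sketched in a few lines for the (5, 7, 1)-covering code. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the names `STORED`, `build_block`, `build_index_table`, `preprocess`, and `psum` are hypothetical, the index table is a Python dictionary keyed by the block mask instead of a flat array with end-of-list markers, and the storage order of the four precomputed sums (slots 5 to 8) is one arbitrary choice among the orders the text allows.

```python
N = 5  # block length n
# Codewords of the (5, 7, 1)-covering code with weight >= 2 (assumed slot
# order 5..8). The all-0 and weight-1 codewords need no extra storage.
STORED = ["00111", "11011", "11101", "11110"]


def build_block(block):
    """Partial-sum look-up table for one block: originals, then codeword sums."""
    sums = [sum(x for x, b in zip(block, cw) if b == "1") for cw in STORED]
    return list(block) + sums


def build_index_table():
    """Map every n-bit mask to a shortest list of (slot, sign) pairs."""
    bases = [("0" * N, None)] + [(cw, N + k) for k, cw in enumerate(STORED)]
    table = {}
    for value in range(2 ** N):
        mask = format(value, f"0{N}b")  # position 0 is the leftmost bit
        best = None
        for cw, slot in bases:
            pairs = [] if slot is None else [(slot, +1)]
            for j in range(N):
                if mask[j] != cw[j]:  # correct each differing position
                    pairs.append((j, +1 if mask[j] == "1" else -1))
            if best is None or len(pairs) < len(best):
                best = pairs
        table[value] = best
    return table


INDEX = build_index_table()  # shared by all blocks


def preprocess(a):
    """Pad A with zeros to a multiple of n and build one table per block."""
    padded = list(a) + [0] * (-len(a) % N)
    return [build_block(padded[i:i + N]) for i in range(0, len(padded), N)]


def psum(tables, index_set):
    """Partial sum over index_set using at most R + 1 = 2 accesses per block."""
    masks = [0] * len(tables)
    for i in index_set:
        masks[i // N] |= 1 << (N - 1 - i % N)
    return sum(
        sign * tables[b][slot]
        for b, m in enumerate(masks)
        for slot, sign in INDEX[m]
    )


tables = preprocess([259, 401, 680, 937, 452, 63])
print(psum(tables, {0, 1, 5}))  # 723
print(psum(tables, {0, 3, 4}))  # 1648
```

With this slot order, `INDEX[0b00011]` is `[(3, 1), (4, 1)]` and `INDEX[0b01011]` is `[(6, 1), (0, -1)]`, matching the two worked examples above.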
Sice the same cverig cde is used fr all blcks, the same idex lk-up table will be used fr idexig fr all blcks. 4 APPLYING KNOWN COVERING CODES I this secti, we will apply sme kw cverig cdes t the partial-sum prblem, based Therem 2. Differet cverig cdes lead t differet lk-up tables ad, hece, differet space-time trade-ffs. We have chse (, K, R)- cverig cdes with cmbiatis f miimum radius R ad miimum umber f cdewrds K, give the legth f cdewrds. Specifically, we csider fur classes f cdes: tw classes fr tw differet geeralizatis f Hammig cde (7, 6, ), e class fr the geeralizati f (5, 7, ) cde, ad e class fr the geeralizati f (6, 2, ) cde. These are the ly cdes that yielded useful (s, t)-pairs amg all the cdes icluded i [2], [5], ad [4]. 4. The (7 + 2i, 6, i + )-Cverig Cdes It was shw i [2] that the (7, 6, ) Hammig cde ca be geeralized t (7 + 2i, 6, i + )-cverig cdes, fr all i 0. Fr example, (9, 6, 2) ad (, 6, 3) are i this family f cdes. 4.2 The ( + i, 2 i K, R)-Cverig Cdes A (, K, R)-cverig cde ca als be exteded t a ( + i, 2 i K, R)-cverig cde simply by replicatig the same set f cdewrds 2 i times, each i a cpy f the 2 vectrs. Thus, (7, 6, ) Hammig cde als geeralizes t (7 + i, 2 i+4, )- cverig cdes fr all i 0. Hwever, fr may 9, better (, K, )-cverig cdes tha the aive extesi frm (7, 6, ) are kw [5], [4]. I particular, (9, 62, ) is such a cde icluded i [4]. 4.3 Piecewise Cstat Cdes A family f cdes, called piecewise cstat cdes, was itrduced i [5]. We iclude its defiiti ad give a example here fr easy readig.

Table 4. A (5, 7, 1) piecewise constant code as a covering code.

Table 5. A (6, 12, 1) piecewise constant code as a covering code.

First, the length n of a codeword is partitioned into t parts: $n = n_1 + n_2 + \cdots + n_t$. Each codeword c is partitioned in the same way, as $c = (c^{(1)}, c^{(2)}, \ldots, c^{(t)})$, where $\mathit{length}(c^{(i)}) = n_i$. Then, C is a piecewise constant code if it has the property that if C contains one word with weights $\mathit{wt}(c^{(1)}) = w_1, \ldots, \mathit{wt}(c^{(t)}) = w_t$, then it contains all such words. For example, Table 4 shows a piecewise constant code of length n = 5 corresponding to the partition $n = n_1 + n_2$, where $n_1 = 2$ and $n_2 = 3$. There are seven codewords, corresponding to the weights
$w_1 = 0$, $w_2 = 0$: 1 word,
$w_1 = 0$, $w_2 = 3$: 1 word,
$w_1 = 1$, $w_2 = 0$: 2 words,
$w_1 = 2$, $w_2 = 2$: 3 words.

Any piecewise constant code of length five partitioned as $5 = n_1 + n_2 = 2 + 3$ can be represented by a subset of the two-dimensional array of cells shown in Fig. 2. The cell at position $(w_1, w_2)$ represents the set of vectors $c = (c^{(1)}, c^{(2)})$ with $\mathit{wt}(c^{(1)}) = w_1$ and $\mathit{wt}(c^{(2)}) = w_2$. There are $\binom{n_1}{w_1}\binom{n_2}{w_2} = \binom{2}{w_1}\binom{3}{w_2}$ such vectors, and this number is written in the cell. A piecewise constant code is then specified by circling some of the cells in the array, and the number of codewords is the sum of the circled numbers. The four circled cells in Fig. 2 represent the code of Table 4, and there are a total of seven codewords.

Fig. 2. Two-dimensional array representing the (5, 7, 1) covering code of Table 4.

Piecewise constant codes have the desirable property that the covering radius R is easy to calculate from this array of cells. This is because the radius R is simply the maximal distance of any cell from the code (i.e., from the nearest circled cell), when the distance between two cells is measured in the Manhattan metric. In Fig. 2, the Manhattan distance between two cells is the number of horizontal and vertical steps needed to move from one to the other. It is clear that, in Fig. 2, every cell is within Manhattan distance 1 of a circled cell, so the covering radius R is 1. Thus, we have an (n, K, R) = (5, 7, 1) covering code.

A second example of a piecewise constant code is given in Table 5 and Fig. 3. This corresponds to the partition 6 = 3 + 3 and contains 12 codewords. Fig. 3 shows the spheres of Manhattan radius 1 around the codewords, proving that R = 1. Thus, we have an (n, K, R) = (6, 12, 1) covering code.

Fig. 3. Two-dimensional array representing the (6, 12, 1) covering code of Table 5 and showing the Manhattan spheres of covering radius 1 around the circled cells.
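The cell-array view of a piecewise constant code can be reproduced with a short script. The sketch below assumes the four circled cells of the (5, 7, 1) code listed above for the partition 5 = 2 + 3; the helper names `piecewise_constant_code` and `manhattan_radius` are illustrative and do not come from [5].

```python
from itertools import product
from math import comb


def piecewise_constant_code(parts, cells):
    """All words whose per-part weights match one of the circled cells."""
    code = []
    for bits in product((0, 1), repeat=sum(parts)):
        weights, start = [], 0
        for size in parts:
            weights.append(sum(bits[start:start + size]))
            start += size
        if tuple(weights) in cells:
            code.append("".join(map(str, bits)))
    return code


def manhattan_radius(parts, cells):
    """Covering radius read off the weight array using the Manhattan metric."""
    grid = product(*(range(size + 1) for size in parts))
    return max(
        min(sum(abs(a - b) for a, b in zip(cell, c)) for c in cells)
        for cell in grid
    )


PARTS = (2, 3)                             # n = n1 + n2 = 2 + 3
CELLS = {(0, 0), (0, 3), (1, 0), (2, 2)}   # circled (w1, w2) cells

code = piecewise_constant_code(PARTS, CELLS)
print(len(code))                                            # 7
print(sum(comb(2, w1) * comb(3, w2) for w1, w2 in CELLS))   # 7
print(manhattan_radius(PARTS, CELLS))                       # 1
```

The generated words are exactly the seven codewords of the (5, 7, 1)-covering code, and the radius is obtained from a 3 × 4 array of cells rather than from all 32 vectors.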

IEEE TRANSACTIONS ON COMPUTERS, VOL. 47, NO. 12, DECEMBER 1998
HO ET AL.: PARTIAL-SUM QUERIES IN OLAP DATA CUBES USING COVERING CODES

4.4 The (2R + 3, 7, R)-Covering Codes

Fig. 4 shows a family of piecewise constant codes, given in [5], which are (2R + 3, 7, R)-covering codes. The code is partitioned into three parts: n = (2R − 1) + 3 + 1 = 2R + 3. The figure shows certain key boundaries of the Manhattan spheres of radius R. Each region is marked by the codeword(s) covering it. Recall that the number of codewords, seven, is the sum of the circled numbers. In fact, the family of (2R + 3, 7, R)-covering codes can be viewed as a generalization of the (5, 7, 1) code (Table 4) through an amalgamated direct sum technique described in [12] and [5].

Fig. 4. Three-dimensional array showing a family of piecewise constant codes as the (2R + 3, 7, R)-covering codes.

4.5 The (2R + 4, 12, R)-Covering Codes

Fig. 5 shows another family of piecewise constant codes, which are (2R + 4, 12, R)-covering codes. The code is partitioned into three parts: n = (2R − 2) + 3 + 3 = 2R + 4. As before, the figure shows certain key boundaries of the Manhattan spheres of radius R and each region is marked by the codeword(s) covering it. Formally, the family of (2R + 4, 12, R)-covering codes can be viewed as a result of applying the amalgamated direct sum of the (6, 12, 1) code with the (3, 2, 1) code iteratively [12], [5].

Fig. 5. Three-dimensional array showing a family of piecewise constant codes as the (2R + 4, 12, R)-covering codes.

4.6 Results

The results of applying the above codes to the partial-sum problem are summarized in Table 6. The results show a spectrum of space-time trade-offs and one can choose an operating point depending upon the objective. Recall that we defined the total space required, including the original array of size m, as sm. (That is, s is the multiplicative overhead.) There is, however, an additive overhead of f_s(n) = O(2^n R) not included in this and subsequent tables with an s-column.

5 SINGLE-WEIGHT-EXTENDED COVERING CODES

In this section, we will modify the property of covering codes to better reflect the partial-sum problem. We will first define a new type of covering codes, which we shall call the single-weight-extended covering codes. Then, we present a general theorem relating this type of covering codes to the partial-sum problem. Finally, we will devise a class of covering codes of this type.

5.1 Specialized Covering Codes for Partial Sums

In applying existing (n, K, R)-covering codes to the partial-sum problem in the previous section, we chose codes with combinations of minimum radius R and minimum number of codewords K, given the length n of codewords. Minimizing the time for the partial-sum problem is different from minimizing the covering radius R, given length n and K codewords of an (n, K, R)-covering code, in two ways. First, the all-0 vector (00···0) need not be covered (since the corresponding partial sum is always 0). Second, the n weight-1 vectors can be included in the covering code without space cost since they are present in array A, which may reduce R. We, therefore, define the single-weight-extended covering code. To derive efficient algorithms for partial sums, our new objective is to derive (n, K', R)+-covering codes with combinations of minimum R and K', for various given small n.

DEFINITION 1. A binary code C is an (n, K', R) single-weight-extended covering code, denoted (n, K', R)+-covering code, if
1) each codeword is of length n;
2) there are K' codewords in C; and
3) letting C' be C extended with all n weight-1 vectors, the covering radius of the code C' is R.

Since the all-0 vector is always distance one from any weight-1 vector and R ≥ 1 for all our cases, covering the all-0 vector (to be consistent with the definition of covering codes) does not increase the complexities of K' and R of the code. Clearly, an (n, K, R)-covering code is also an (n, K − c, R)+-covering code (with c the number of its codewords of weight at most one). We will use K' throughout this section to denote the number of codewords excluding the all-0 vector and all weight-1 vectors.

THEOREM 3. Given an (n, K', R)+-covering code, we can construct an algorithm to derive the partial sum Psum(A, I) in time

$$T \approx \frac{m}{n}\bigl(R + f_t(n) + 1\bigr)$$

and in space

$$S \approx \left(\frac{K'}{n} + 1\right) m + f_s(n).$$

PROOF. Follows from Theorem 2 and Definition 1.

5.2 The (2R + 3, 4, R)+-Covering Codes

We now give a construction of a (2R + 3, 4, R)+-covering code C for all R ≥ 1 and prove its correctness. The construction can be defined by Fig. 6, which is modified from Fig. 4 by taking into account that all weight-1 codewords will be included. In Fig. 6, the 2R + 3 weight-1 codewords are represented by the three dashed circles ((2R − 1) + 3 + 1 = 2R + 3), and denoted by c_5, c_6, and c_7. The K' = 4 codewords are denoted as c_1, ..., c_4, respectively. As before, each region is marked by the codeword(s) covering it.

We now give a formal definition and proof of a (2R + 3, 4, R)+-covering code for any positive integer R. Recall that each codeword has 2R + 3 bits. We will use Y to denote the all-1 vector (11···1) of length 2R − 1 and use Z to denote the all-0 vector (00···0) of length 2R − 1. Then, the four codewords in the (2R + 3, 4, R)+-covering code, consistent with Fig. 6, can be denoted as

C = {c_1 = (Z 1111), c_2 = (Y 1111), c_3 = (Y 1110), c_4 = (Y 0001)}.

THEOREM 4. The code C defined above is a (2R + 3, 4, R)+-covering code.

PROOF. Consider any vector V of length 2R + 3. Partition the vector V into three subvectors, from left to right: V_1 of length 2R − 1, V_2 of length three, and V_3 of length one. Let w_1, w_2, and w_3 be the Hamming weights of V_1, V_2, and V_3, respectively. Let W be the set of all length-(2R + 3) weight-1 vectors, i.e., W includes c_5, c_6, c_7 of the figure. Recall from Definition 1 that the covering radius of a single-weight-extended covering code is defined with respect to C ∪ W. Consider the following three cases that cover all combinations of V:

Case 1: w_3 = 0. If w_1 + w_2 ≥ R + 2 (the lower left region of the figure), then the Hamming distance of V and c_3 = (Y 1110) is at most (2R + 2) − (R + 2) = R. Otherwise (the upper left region), w_1 + w_2 ≤ R + 1 and there exists a vector in W whose Hamming distance is at most R from V.

Case 2: w_3 = 1 and w_1 ≤ R − 1 (the upper right region). If w_2 ≤ 1, then the Hamming distance between V and c_7 = (Z 0001) ∈ W is Hamming(V_1, Z) + w_2 = w_1 + w_2 ≤ (R − 1) + 1 = R. Otherwise, w_2 ≥ 2 and the Hamming distance between V and c_1 = (Z 1111) is Hamming(V_1, Z) + (3 − w_2) ≤ (R − 1) + 1 = R.

Case 3: w_3 = 1 and w_1 ≥ R (the lower right region). If w_2 ≤ 1, then the Hamming distance between V and c_4 = (Y 0001) is Hamming(V_1, Y) + w_2 ≤ ((2R − 1) − R) + 1 = R. Otherwise, w_2 ≥ 2 and the Hamming distance between V and c_2 = (Y 1111) is Hamming(V_1, Y) + (3 − w_2) ≤ ((2R − 1) − R) + 1 = R.
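Theorem 4 can also be checked exhaustively for small R. The following is a minimal Python sketch, assuming the four codewords read as c_1 = (Z 1111), c_2 = (Y 1111), c_3 = (Y 1110), c_4 = (Y 0001) with Y and Z of length 2R − 1, which is the reading the three cases of the proof require. The helper names are hypothetical.

```python
from itertools import product

def swe_codewords(R):
    """The four codewords c_1..c_4 of the (2R+3, 4, R)+ code as bit tuples."""
    Y = (1,) * (2 * R - 1)   # all-1 prefix
    Z = (0,) * (2 * R - 1)   # all-0 prefix
    return [Z + (1, 1, 1, 1), Y + (1, 1, 1, 1),
            Y + (1, 1, 1, 0), Y + (0, 0, 0, 1)]

def weight_one_vectors(n):
    """The set W of all length-n vectors of Hamming weight one."""
    return [tuple(int(i == j) for j in range(n)) for i in range(n)]

def extended_covering_radius(R):
    """Covering radius of C united with W, computed over all 2^(2R+3) vectors."""
    n = 2 * R + 3
    extended = swe_codewords(R) + weight_one_vectors(n)

    def dist(u, v):
        return sum(a != b for a, b in zip(u, v))

    return max(min(dist(v, c) for c in extended)
               for v in product((0, 1), repeat=n))

for R in (1, 2, 3):
    # The theorem predicts a radius of at most R in each case.
    print(R, extended_covering_radius(R))
```

The check is exponential in n and is only meant as a sanity test for small R; the case analysis in the proof is what establishes the result for every R.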

5.3 Results

Table 7 summarizes the best (s, t)-pairs obtained based on the previous Table 6 and the class of new codes devised in this section. Note that the (14, 12, 5)-covering code from Table 6 is removed from the new table because the new (7, 4, 2)+-covering code has a better (s, t)-pair.

TABLE 6. BEST CHOICES OF S AND T BASED ON EXISTING COVERING CODES

6 FURTHER IMPROVEMENTS

We now further modify the definition of the covering code by adding a composition function, resulting in a new class of codes, which we shall call composition-extended covering codes. The main result (space and time overheads) for the partial-sum problem implied by the new class of covering codes is described in Theorem 6. The key to the new class of codes is that a partial sum may be written as a sum or difference of two other partial sums. Thus, an efficient coding scheme can be implemented using this.

6.1 Covering Codes with Composition Function

Let ∨ be the bit-wise or operator, ∧ the bit-wise and operator, and ⊕ the bit-wise exclusive-or operator. Let ⊥ denote an undefined value.

DEFINITION 2. Define a composition function of two binary vectors V and V' as follows:

$$\mathrm{comp}(V, V') = V \odot V' = \begin{cases} V \vee V', & \text{if } V \wedge V' = 0;\\ V \oplus V', & \text{if } V \wedge V' = V \text{ or } V \wedge V' = V';\\ \bot, & \text{otherwise.} \end{cases}$$

For example, comp((1100), (0011)) = (1111), comp((1110), (0110)) = (1000), and comp((1100), (0110)) = ⊥. The intuition behind this function lies in the following lemma:

LEMMA 5. Let V, V' be two n-bit vectors where V'' = comp(V, V'). Also let I, I', and I'' be support(V), support(V'), and support(V''), respectively. Then, given Psum(A, I) and Psum(A, I'), one can derive Psum(A, I'') in one addition or subtraction operation.

PROOF. By Definition 2, it can be shown that

$$\mathrm{Psum}(A, I'') = \begin{cases} \mathrm{Psum}(A, I) + \mathrm{Psum}(A, I'), & \text{if } V \wedge V' = 0;\\ \mathrm{Psum}(A, I') - \mathrm{Psum}(A, I), & \text{if } V \wedge V' = V;\\ \mathrm{Psum}(A, I) - \mathrm{Psum}(A, I'), & \text{if } V \wedge V' = V'. \end{cases}$$

For consistency, we will let comp(V, V') = ⊥ if either V = ⊥ or V' = ⊥. (All other rules still follow Definition 2.) We assume the operator ⊙ associates from left to right, i.e., V ⊙ V' ⊙ V'' = (V ⊙ V') ⊙ V''. Note that ⊙ is commutative, but not associative. For instance, (1100) ⊙ (1110) ⊙ (0101) = (0111), while (1100) ⊙ ((1110) ⊙ (0101)) = ⊥.
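The composition function and Lemma 5 translate directly into a few lines of code. The sketch below is an illustration under stated assumptions: bit vectors are held as Python integers, bit i of the integer stands for array index i, `None` plays the role of the undefined value, and all function names are hypothetical.

```python
def comp(v, w):
    """Composition of two bit vectors (ints); None is the undefined value."""
    if v is None or w is None:
        return None
    if v & w == 0:               # disjoint supports: union
        return v | w
    if v & w == v or v & w == w:  # one support contains the other: difference
        return v ^ w
    return None                   # partial overlap: undefined

def support(v):
    """Indices of the 1-bits of v (bit i corresponds to array index i)."""
    return [i for i in range(v.bit_length()) if (v >> i) & 1]

def psum(A, v):
    """Partial sum of A over support(v), computed directly."""
    return sum(A[i] for i in support(v))

def psum_from_parts(A, v, w):
    """Lemma 5: partial sum for comp(v, w) from the two given partial sums."""
    pv, pw = psum(A, v), psum(A, w)
    if v & w == 0:
        return pv + pw
    if v & w == w:
        return pv - pw
    if v & w == v:
        return pw - pv
    raise ValueError("comp(v, w) is undefined")

A = [5, 1, 4, 2]
v, w = 0b1110, 0b0110
print(bin(comp(v, w)), psum_from_parts(A, v, w), psum(A, comp(v, w)))
```

Left-to-right evaluation matters: `comp(comp(0b1100, 0b1110), 0b0101)` is defined, while `comp(0b1100, comp(0b1110, 0b0101))` is not, which mirrors the remark that the operator is commutative but not associative.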
Fig. 6. Three-dimensional array showing a family of piecewise constant codes as the (2R + 3, 4, R)+-covering codes.

TABLE 7. BEST CHOICES OF S AND T BASED ON EXISTING AND SINGLE-WEIGHT-EXTENDED COVERING CODES

TABLE 8. THE (4, 6, 1)*-COVERING CODE

DEFINITION 3. A binary code C is an (n, K'', R) composition-extended covering code, denoted (n, K'', R)*-covering code, if
1) each codeword is of length n,
2) there are K'' codewords in C, and
3) every length-n noncodeword vector V ∉ C can be derived by up to R compositions of R + 1 codewords, i.e.,

$$V = C_1 \odot C_2 \odot \cdots \odot C_{i+1}, \quad \text{for } 1 \le i \le R,\ C_i \in C.$$

For example, consider the six-codeword code C = {C_1, C_2, ..., C_6} with C_1 = (1111) whose derivations are listed in Table 8. It can be verified from Table 8 that this code is a (4, 6, 1)*-covering code.

Clearly, an (n, K', R)+-covering code is also an (n, K' + n, R)*-covering code, but not vice versa. We will use K'' throughout this section to denote the total number of codewords. Note that the code may not contain all weight-1 vectors as codewords. However, in our computer search, we minimize K'' first, given n and R, then maximize the total number of weight-1 vectors among all minimum-K'' solutions. We were able to find a minimum-K'' solution with all weight-1 vectors included as codewords for all cases listed below.

Given an (n, K'', R)*-composition-extended covering code C and any n-bit vector V, we will redefine f_t(n) and f_s(n) as the time and associated space overheads, respectively, to find the set of codewords C_1, ..., C_{i+1} and its precomputed corresponding partial sums such that V = C_1 ⊙ C_2 ⊙ ··· ⊙ C_{i+1} where 0 ≤ i ≤ R.

THEOREM 6. Given an (n, K'', R)*-covering code, we can construct an algorithm to derive the partial sum Psum(A, I) in time

$$T \approx \frac{m}{n}\bigl(R + f_t(n) + 1\bigr)$$

and in space

$$S \approx \frac{K''}{n}\, m + f_s(n).$$

PROOF. We first show that given an (m, K'', R)*-covering code C, we can construct an algorithm to derive the partial sum Psum(A, I) in time T = R + f_t(m) + 1 and in space S = K'' + f_s(m). We will precompute and store the K'' partial sums of A that correspond to the K'' codewords. Given an index subset I at run time, let V = mask(I). By Definition 3, we can assume V = C_1 ⊙ C_2 ⊙ ··· ⊙ C_{x+1} where 0 ≤ x ≤ R and C_i ∈ C. Let I_i = support(C_i) for all 1 ≤ i ≤ x + 1. By Lemma 5, we can derive Psum(A, I) by combining the Psum(A, I_i)s through addition or subtraction for all 1 ≤ i ≤ x + 1. This requires an overhead of f_t(m) + R + 1 in time and f_s(m) + K'' in space. The rest of the proof is similar to that of Theorem 2 by applying the time and space overhead to each block of A of size n.

6.2 Lower Bounds on K''

LEMMA 7. Let S_i ∈ {+1, −1}, 1 ≤ i ≤ x. If C_1 ⊙ C_2 ⊙ ··· ⊙ C_x = V ≠ ⊥, then there exists a set of S_i's such that S_1 C_1 + S_2 C_2 + ··· + S_x C_x = V, where the addition is bit-wise.

PROOF. By Definition 3 and the fact that V ≠ ⊥, we have C_1 ⊙ C_2 ∈ {C_1 + C_2, −C_1 + C_2, C_1 − C_2}. By applying the same argument to the sequence C_1 ⊙ C_2 ⊙ ··· ⊙ C_x, the proof follows.

LEMMA 8. Let π be a permutation function of {1, 2, ..., x}. If C_1 ⊙ C_2 ⊙ ··· ⊙ C_x = V ≠ ⊥ and C_π(1) ⊙ C_π(2) ⊙ ··· ⊙ C_π(x) = V' ≠ ⊥, then V = V'.

PROOF. Let S_i ∈ {+1, −1} be the sign associated with C_i in order to derive V, as in Lemma 7. That is,

$$\sum_{i=1}^{x} S_i C_i = V.$$

Let S be the ordered set {S_1, S_2, ..., S_x}. Assume that V ≠ V'. Then, there exists a new ordered set S' = {S'_1, S'_2, ..., S'_x} such that

$$\sum_{i=1}^{x} S'_i C_i = V'$$

and S ≠ S' (i.e., S_i ≠ S'_i for some i ∈ {1, 2, ..., x}). The set S' can be derived from the set S by changing all different (S_i, S'_i)-pairs. Note, however, that every change of sign from S_i to S'_i will result in a distance-2 or distance-0 move of all digits in V. More specifically, the jth digit with value v will be changed to one of {v + 2, v, v − 2}, depending on the jth bit of C_i. Thus, a digit which is even (positive, zero, or negative) remains even under the changes of signs. Similarly, a digit which is odd (positive or negative) remains odd. For instance, a 0-digit in V will be changed to one in {−2, 0, 2} due to one sign change, while a 1-digit will be changed to one in {−1, 1, 3}. Since 0 is the only valid even digit of any defined vector and 1 is the only valid odd digit of any defined vector, V = V'.

In the above proof, it is possible that V = V' while S ≠ S'. In this case, there must be some number of codewords which compose to an all-0 vector.
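Definition 3 and the counting argument behind Lemma 8 are both easy to exercise by brute force. The sketch below is an illustration under stated assumptions: codewords are integers whose binary expansion is the bit vector, only distinct codewords are composed, the all-0 vector is not required to be derivable, and the 13 integers used as test data are an assumed reading of the code listed in Section 6.3.1. Since Lemma 8 says any set of codewords composes to at most one defined vector regardless of order, K codewords reach at most the sum of C(K, i) for i = 1, ..., R + 1 vectors, and `min_codewords` returns the smallest K for which that count reaches the 2^n − 1 nonzero vectors. All function names are hypothetical.

```python
from itertools import combinations, permutations
from math import comb

def comp(v, w):
    """Composition of Definition 2 on int bitmasks; None is the undefined value."""
    if v is None or w is None:
        return None
    if v & w == 0:
        return v | w
    if v & w in (v, w):
        return v ^ w
    return None

def compose(seq):
    """Left-to-right composition of a sequence of codewords."""
    acc = seq[0]
    for c in seq[1:]:
        acc = comp(acc, c)
    return acc

def derivable(code, R):
    """Vectors obtainable from at most R compositions of distinct codewords."""
    found = set()
    for k in range(1, R + 2):
        for seq in permutations(code, k):
            v = compose(seq)
            if v is not None:
                found.add(v)
    return found

def is_composition_cover(n, code, R):
    """True if every nonzero n-bit vector is a codeword or derivable."""
    return set(range(1, 2 ** n)) <= derivable(code, R)

def order_independent(code, k):
    """Lemma 8 check: all defined orderings of the same k codewords agree."""
    for subset in combinations(code, k):
        results = {compose(p) for p in permutations(subset)} - {None}
        if len(results) > 1:
            return False
    return True

def min_codewords(n, R):
    """Smallest K with sum over i = 1..R+1 of C(K, i) at least 2^n - 1."""
    K = 1
    while sum(comb(K, i) for i in range(1, R + 2)) < 2 ** n - 1:
        K += 1
    return K

# Assumed reading of the 13 codewords of the code in Section 6.3.1.
CODE_6_13_1 = [1, 2, 4, 6, 8, 16, 25, 32, 34, 36, 47, 55, 62]
print(is_composition_cover(6, CODE_6_13_1, 1))
print(order_independent(CODE_6_13_1, 3))
print([min_codewords(n, 1) for n in (6, 7, 8, 9)], min_codewords(8, 2))
```

With these assumptions the 13 integers do cover all 63 nonzero 6-bit vectors with at most one composition, and the counting bound evaluates to 11, 16, 23, and 32 for R = 1 and n = 6, ..., 9, and to 12 for n = 8 and R = 2.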

TABLE 9. BEST OBTAINED CHOICES OF S AND T BASED ON ALL TECHNIQUES

THEOREM 9. Any (n, K'', R)*-covering code must have

$$\sum_{i=1}^{R+1} \binom{K''}{i} \ge 2^n - 1.$$

PROOF. Follows from Lemma 8.

COROLLARY 10. Any (n, K'', 1)*-covering code must have

$$\frac{K''(K'' + 1)}{2} \ge 2^n - 1.$$

COROLLARY 11. Any (n, K'', 2)*-covering code must have

$$\frac{K''\bigl(K''^2 + 5\bigr)}{6} \ge 2^n - 1.$$

6.3 Some Useful Composition-Extended Covering Codes

To find good composition-extended covering codes, we implemented a computer search program based on various heuristics to search in a selected subspace rather than an exhaustive one. In the following, we list the best composition-extended covering codes that we found so far; each is a result of a run of at least one day on a typical workstation. It may be possible to improve these codes by having longer runs.

6.3.1 The (6, 13, 1)*-Covering Code

C = {1, 2, 4, 6, 8, 16, 25, 32, 34, 36, 47, 55, 62}. This code improves from previous K'' = K − c + n = 15 (due to the (6, 12, 1)-covering code in Section 4.5) to 13. The number of weight-1 codewords is 6. The lower bound on K'' is 11, by Corollary 10.

6.3.2 The (7, 21, 1)*-Covering Code

C = {1, 2, 4, 8, 16, 24, 32, 33, 38, 39, 64, 72, 80, 91, 93, 94, 95, 122, 123, 124, 125}. This code improves from previous K'' = 22 (due to the (7, 16, 1) Hamming code in Section 4.1) to 21. The number of weight-1 codewords is 7. The lower bound on K'' is 16, by Corollary 10.

6.3.3 The (8, 29, 1)*-Covering Code

C = {1, 2, 3, 4, 8, 16, 17, 18, 19, 32, 64, 76, 100, 108, 128, 129, 130, 131, 144, 145, 146, 159, 183, 187, 191, 215, 219, 243, 251}. This code improves from previous K'' = 39 (due to the (8, 32, 1)-covering code in Section 4.2) to 29. The number of weight-1 codewords is eight. The lower bound on K'' is 23, by Corollary 10.

6.3.4 The (9, 45, 1)*-Covering Code

C = {1, 2, 3, 4, 8, 16, 17, 18, 19, 32, 36, 40, 44, 64, 68, 96, 100, 104, 128, 132, 136, 140, 160, 232, 236, 256, 257, 258, 259, 272, 273, 274, 287, 347, 351, 383, 439, 443, 447, 467, 471, 475, 479, 499, 503}. This code improves from previous K'' = 70 (due to the (9, 62, 1)-covering code in Section 4.2) to 45. The number of weight-1 codewords is nine. The lower bound on K'' is 32, by Corollary 10.

6.3.5 The (8, 15, 2)*-Covering Code

C = {1, 2, 3, 4, 8, 16, 32, 33, 34, 64, 115, 128, 191, 204, 255}.
This code improves from previous K'' = 17 (due to the (8, 12, 2)-covering code in Section 4.5) to 15. The number of weight-1 vectors is eight. The lower bound on K'' is 12, by Corollary 11.

6.4 Results

Table 9 summarizes the best (s, t)-pairs obtained based on the previous Table 7 and the new codes given in this section. Fig. 7 shows three sets of data points corresponding to the (s, t)-pairs derived from the existing covering codes, new single-weight-extended covering codes, and new composition-extended covering codes. Fig. 1 shows the best (s, t)-pairs combining results from all three types of covering codes, i.e., corresponding to Table 9. Note that in Fig. 7, the data points for covering codes and those for single-weight-extended covering codes do not overlap. For the composition-extended covering codes, the curve stops at s = 5 because the next (s, t) point requires searching for a good (10, K'', 1)*-covering code, a complicated search for little gain in time, from 0.22 for n = 9 to 0.2 for n = 10.

Fig. 7. Three types of (s, t) data points for computing partial sum.

7 PARTIAL SUMS FOR MULTIDIMENSIONAL ARRAYS

In this section, we will generalize the one-dimensional partial-sum algorithm to the d-dimensional case. Assume A is a d-dimensional array of form m_1 × ··· × m_d and let m = ∏_{i=1}^{d} m_i be the total size of A. Let M be the index domain of A. Let D = {1, ..., d} be the set of dimensions. For each i ∈ D, let I_i be an arbitrary subset of {0, ..., m_i − 1} specified by the user at query time. Also let I = {(x_1, ..., x_d) | (∀ i ∈ D)(x_i ∈ I_i)}. That is, I = I_1 × ··· × I_d and I ⊆ M. Given A in advance and I during the query time, we are interested in getting the partial sum of A, specified by I as:

$$\mathrm{Psum}(A, I) = \sum_{(x_1, \ldots, x_d) \in I} A[x_1, \ldots, x_d].$$

7.1 A Motivating Example

Before giving the general d-dimensional algorithm and theorem, we first give a motivating two-dimensional example. Assume A is a two-dimensional array of form 5 × 5. Also assume that we are applying the (5, 7, 1)-covering code, which is also a (5, 9, 1)+-single-weight-extended covering code, to each dimension. Denote the nine codewords by C_0 through C_8, consistent with the order in Table 2. The index look-up table, denoted by X, is still the same as that for the one-dimensional case, Table 3. On the other hand, the partial-sum look-up table will be extended from Table 2 (which has nine entries) to a two-dimensional table, denoted by P, of 9 × 9 entries. Then, we will let P[i, j] contain the precomputed partial sum Psum(A, support(C_i) × support(C_j)). For convenience, we will view each entry of X as a set of (sign, index) pairs.

Assume given I_1 = {3, 4} and I_2 = {1, 3, 4} at query time. We use mask(I_1), which is (00011) = 3, as an index to the index look-up table X and obtain X[mask(I_1)] = {(+1, 3), (+1, 4)}. Also, we use mask(I_2), which is (01011) = 11, as an index to the same index look-up table X and obtain X[mask(I_2)] = {(+1, 6), (−1, 0)}. We will show later that Psum(A, I) can be computed as follows:

$$\mathrm{Psum}(A, I) = \sum_{\substack{(s_i, x_i) \in X[\mathrm{mask}(I_i)] \\ \forall i \in D}} \left( \prod_{i=1}^{d} s_i \right) P[x_1, \ldots, x_d].$$

Following this, we have Psum(A, I) = P[3, 6] + P[4, 6] − P[3, 0] − P[4, 0] for our example. Intuitively, the final partial sum Psum(A, I) is derived from a combination of additions and subtractions of all relevant entries in P, where the relevant entries are Cartesian products of the different entries indexed by X[mask(I_i)]. Table 10 shows the precomputed partial sums corresponding to the four terms on the right-hand side of the formula. Fig. 8 gives a pictorial view corresponding to the formula. In the figure, each marked cell is a selected value.

TABLE 10. EXAMPLES OF INDEXED PARTIAL SUMS IN THE PARTIAL-SUM LOOK-UP TABLE

7.2 The Main Theorem

We are now ready to prove a lemma for the general case of the above example.

LEMMA 12. Let B be a d-dimensional array of form n × ··· × n, and let Psum(B, I) be the partial-sum query. Then, given an (n, K'', R)*-covering code, we can construct an algorithm to derive Psum(B, I) for any I in time T = (R + 1)^d + f_t(n) d and in space S = K''^d + f_s(n).
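The way the two-dimensional example combines look-up table entries can be sketched in code. This is an illustration under stated assumptions, not the paper's implementation: the codeword supports below are hypothetical stand-ins chosen so that indices 0, 3, 4, and 6 behave as in the example (Table 2 is not reproduced here), the per-dimension (sign, index) lists are supplied by hand rather than read from an index look-up table, and all names are invented for the sketch.

```python
from itertools import product
from math import prod

def psum_direct(A, index_sets):
    """Reference answer: sum A over the Cartesian product of the index sets."""
    total = 0
    for cell in product(*index_sets):
        x = A
        for i in cell:
            x = x[i]
        total += x
    return total

def build_table(A, supports, d):
    """P[x1, ..., xd] = Psum(A, support(C_x1) x ... x support(C_xd))."""
    return {idx: psum_direct(A, [supports[x] for x in idx])
            for idx in product(range(len(supports)), repeat=d)}

def psum_lookup(P, terms_per_dim):
    """Signed sum over the Cartesian product of per-dimension (sign, index) pairs."""
    total = 0
    for combo in product(*terms_per_dim):
        sign = prod(s for s, _ in combo)
        total += sign * P[tuple(x for _, x in combo)]
    return total

# Hypothetical supports: only entries 0, 3, 4 and 6 matter for this query.
supports = [[2], [0], [1], [3], [4], [0, 1], [1, 2, 3, 4]]
A = [[5 * r + c for c in range(5)] for r in range(5)]
P = build_table(A, supports, d=2)

terms = [[(+1, 3), (+1, 4)],    # I_1 = {3, 4}
         [(+1, 6), (-1, 0)]]    # I_2 = {1, 3, 4} = {1, 2, 3, 4} minus {2}
print(psum_lookup(P, terms), psum_direct(A, [[3, 4], [1, 3, 4]]))
```

The query touches four table entries, P[3, 6] + P[4, 6] − P[3, 0] − P[4, 0], instead of the six array cells it selects; in general the number of terms is the product of the per-dimension term counts.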

Fig. 8. A pictorial view of Psum(A, I) = P[3, 6] + P[4, 6] − P[3, 0] − P[4, 0].

Fig. 9. The best (s, t) data points for computing two-dimensional partial sum.

PROOF. Denote the set of K'' codewords by C = {C_0, C_1, ..., C_{K''−1}}. Let J_i = support(C_i). We first construct a d-dimensional partial-sum look-up table, of form K'' × ··· × K''. An entry indexed by (x_1, ..., x_d) in the table will contain the precomputed result for Psum(B, J), where J = J_{x_1} × ··· × J_{x_d}. Given I at query time, let I = I_1 × ··· × I_d. Note that, in the one-dimensional domain, each I_i can be derived by combining up to R + 1 existing partial sums. Through an inductive proof, one can show that I can be derived by combining up to (R + 1)^d existing partial sums from the partial-sum look-up table. For each dimension, a time overhead of f_t(n) is needed to derive the index of that dimension to the partial-sum look-up table. Thus, the overall time is T = (R + 1)^d + f_t(n) d. For the space overhead, the partial-sum look-up table is of size K''^d and the index look-up table is of size f_s(n). Since we apply the same covering code to all d dimensions, there is only one index look-up table needed. Thus, the overall space overhead is S = K''^d + f_s(n).

As in the one-dimensional case, we will now partition array A into blocks of form n × ··· × n and apply covering codes to each block (using the above lemma) in order to derive better space overheads. The proof of the following theorem is straightforward:

THEOREM 13. Given an (n, K'', R)*-covering code, we can construct an algorithm to derive the d-dimensional partial sum Psum(A, I) in time

$$T \approx \left(\frac{R + 1}{n}\right)^{d} m + d\, f_t(n)\, \frac{m}{n^{d}}$$

and in space

$$S \approx \left(\frac{K''}{n}\right)^{d} m + f_s(n).$$

The above theorem assumes that the same covering code is applied to all dimensions of each block and, thus, each block is of form n × ··· × n. In general, one can apply different covering codes to different dimensions and obtain a wider range of space-time trade-offs. In this case, the length of each side of the block will be tailored to the length of each covering code applied.

COROLLARY 14.
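Given an (n, K'', R)*-covering code, we can construct an algorithm to derive the d-dimensional partial sum Psum(A, I) in time

$$T \approx \left(\frac{R + 1}{n}\right)^{\alpha} \left(\frac{1}{2}\right)^{d - \alpha} m + d\, f_t(n)\, \frac{m}{n^{\alpha}}$$

and in space

$$S \approx \left(\frac{K''}{n}\right)^{\alpha} m + f_s(n).$$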
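The leading terms of these bounds multiply across dimensions, so multidimensional (s, t) points can be tabulated from one-dimensional ones. The sketch below is a rough illustration under stated assumptions: the additive overheads f_s(n) and f_t(n) are ignored, s and t are taken as S/m and T/m, the one-dimensional points are computed from the leading terms of the one-dimensional theorems for codes mentioned in the text, and the function names are hypothetical.

```python
from itertools import product
from math import prod

def st_plus(n, k_prime, R):
    """(s, t) of an (n, K', R)+ code: the array itself plus K'/n extra space."""
    return (1 + k_prime / n, (R + 1) / n)

def st_star(n, k_total, R):
    """(s, t) of an (n, K'', R)* code, where K'' counts all codewords."""
    return (k_total / n, (R + 1) / n)

def combine(points):
    """Multiply per-dimension overheads to get the d-dimensional (s, t)."""
    return (prod(s for s, _ in points), prod(t for _, t in points))

def pareto(points):
    """Keep the points that no other point beats in both s and t."""
    pts = set(points)
    return sorted(p for p in pts
                  if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                             for q in pts))

one_dim = [(1.0, 0.5),            # baseline: no extra storage
           st_plus(7, 4, 2),      # the (7, 4, 2)+ code
           st_star(6, 13, 1),     # the (6, 13, 1)* code
           st_star(9, 45, 1)]     # the (9, 45, 1)* code
two_dim = [combine(pair) for pair in product(one_dim, repeat=2)]
for s, t in pareto(two_dim):
    print(f"s = {s:.2f}, t = {t:.3f}")
```

Applying the baseline point to both dimensions gives (1, 0.25), and mixing different one-dimensional points across dimensions fills in the trade-off curve between the extremes.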

PROOF. Apply an (n, K'', R)*-composition-extended covering code to α dimensions and the (m_i, m_i + 1, m_i/2)+-single-weight-extended covering code to the remaining d − α dimensions. The proof completes by noticing that the latter code has (s, t) ≈ (1, 0.5).

7.3 Results

Fig. 9 shows various (s, t) data points for computing two-dimensional partial sum based on combinations of one-dimensional (s, t) data points from Table 9. The best (s, t) data points are joined together by a curve. Note the leftmost (s, t) data point has been changed from (1, 0.5) in Fig. 1 to (1, 0.25) in this figure.

8 SUMMARY

Partial-sum queries obtain the summation over specified cells of a data cube. In this paper, we established the connection between the covering problem [12] in the theory of error-correcting codes and the partial-sum problem. We use this connection to apply four known covering codes from [12], [5], and [4] to the partial-sum problem to obtain algorithms with various space-time trade-offs. We then modified the requirements on covering codes to better reflect the partial-sum problem and devised new covering codes with respect to the new requirements. As a result, we develop new algorithms with better space-time trade-offs. For example, using these algorithms, with 44 percent additional storage, the query response time can be improved by about 12 percent; by roughly doubling the storage requirement, the query response time can be improved by about 34 percent.

ACKNOWLEDGMENTS

This research was supported in part by U.S. National Science Foundation Young Investigator Award CCR-9457811 and by the Sloan Research Fellowship.

REFERENCES

[1] S. Agrawal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi, "On the Computation of Multidimensional Aggregates," Proc. 22nd Int'l Conf. Very Large Databases, pp. 506-521, Mumbai (Bombay), India, Sept. 1996.
[2] L.S. Colby, R.L. Cole, E. Haslam, N. Jazayeri, G. Johnson, W.J. McKenna, L. Schumacher, and D. Wilhite, "Red Brick Vista: Aggregate Computation and Management," Proc. 14th Int'l Conf. Data Eng., pp. 174-177, 1998.
[3] G.D. Cohen, I. Honkala, S. Litsyn, and A.C. Lobstein, Covering Codes. Elsevier, 1997.
[4] G.D. Cohen, S. Litsyn, A.C. Lobstein, and H.F. Mattson Jr., "Covering Radius 1985-1994," J. Applicable Algebra in Eng., Comm., and Computing, special issue, vol. 8, no. 3, 1997.
[5] G.D. Cohen, A.C. Lobstein, and N.J.A. Sloane, "Further Results on the Covering Radius of Codes," IEEE Trans. Information Theory, vol. 32, no. 5, pp. 680-694, Sept. 1986.
[6] M.C. Chen and L.P. McNamee, "The Data Model and Access Method of Summary Data Management," IEEE Trans. Knowledge and Data Eng., vol. 1, no. 4, pp. 519-529, 1989.
[7] E.F. Codd, "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate," technical report, E.F. Codd and Assoc., 1993.
[8] S. Chaudhuri and K. Shim, "Including Group-By in Query Optimization," Proc. 20th Int'l Conf. Very Large Databases, pp. 354-366, Santiago, Chile, Sept. 1994.
[9] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tabs, and Sub-Totals," Proc. 12th Int'l Conf. Data Eng., pp. 152-159, 1996.
[10] A. Gupta, V. Harinarayan, and D. Quass, "Aggregate-Query Processing in Data Warehousing Environments," Proc. Eighth Int'l Conf. Very Large Databases (VLDB), pp. 358-369, Zurich, Switzerland, Sept. 1995.
[11] H. Gupta, V. Harinarayan, A. Rajaraman, and J.D. Ullman, "Index Selection for OLAP," Proc. 13th Int'l Conf. Data Eng., Birmingham, U.K., Apr. 1997.
[12] R.L. Graham and N.J.A. Sloane, "On the Covering Radius of Codes," IEEE Trans. Information Theory, vol. 31, no. 3, pp. 385-401, May 1985.
[13] C.-T. Ho, R. Agrawal, N. Megiddo, and R. Srikant, "Range Queries in OLAP Data Cubes," Proc. ACM SIGMOD Conf. Management of Data, Tucson, Ariz., May 1997.
[14] V. Harinarayan, A. Rajaraman, and J.D. Ullman, "Implementing Data Cubes Efficiently," Proc. ACM SIGMOD Conf. Management of Data, June 1996.
[15] T. Johnson and D. Shasha, "Hierarchically Split Cube Forests for Decision Support: Description and Tuned Design," working paper, 1996.
[16] Special issue on materialized views and data warehousing, D. Lomet, ed., IEEE Data Eng. Bulletin, vol. 18, no. 2, June 1995.
[17] Z. Michalewicz, Statistical and Scientific Databases. Ellis Horwood, 1992.
[18] The OLAP Council, MD-API the OLAP Application Program Interface Version 0.5 Specification, Sept. 1996.
[19] A. Shukla, P.M. Deshpande, J.F. Naughton, and K. Ramasamy, "Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies," Proc. 22nd Int'l Conf. Very Large Databases, pp. 522-531, Mumbai (Bombay), India, Sept. 1996.
[20] B. Salzberg and A. Reuter, "Indexing for Aggregation," working paper, 1996.
[21] J. Srivastava, J.S.E. Tan, and V.Y. Lum, "TBSAM: An Access Method for Efficient Processing of Statistical Queries," IEEE Trans. Knowledge and Data Eng., vol. 1, no. 4, 1989.
[22] W.P. Yan and P. Larson, "Eager Aggregation and Lazy Aggregation," Proc. Eighth Int'l Conf. Very Large Databases (VLDB), pp. 345-357, Zurich, Switzerland, Sept. 1995.

Ching-Tien Ho received a BS degree in electrical engineering from the National Taiwan University in 1979 and MS, MPhil, and PhD degrees in computer science from Yale University in 1985, 1986, and 1990, respectively. He joined the IBM Almaden Research Center as a research staff member in 1989. He was manager of the Foundations of Massively Parallel Computing Group from 1994 to 1996, where he led the development of collective communication, as part of IBM MPL and MPI, for IBM SP-1 and SP-2 parallel systems. His primary research interests include communication issues for interconnection networks, algorithms for collective communications, graph embeddings, fault tolerance, and parallel algorithms and architectures. His current interests are data mining and on-line analytical processing. He has published more than 80 journal and conference papers in these areas. Dr. Ho is a co-recipient of the 1986 Outstanding Paper Award of the International Conference on Parallel Processing. He has received an IBM Outstanding Innovation Award, two IBM Outstanding Technical Achievement Awards, and four IBM Plateau Invention Achievement Awards. He has patents granted or pending. He is on the Editorial Board of the IEEE Transactions on Parallel and Distributed Systems. He is one of the program vice-chairs for the 1998 International Conference on Parallel Processing. He has worked on program committees of many parallel processing conferences and workshops. He is a member of the ACM, the IEEE, and the IEEE Computer Society.

Jehoshua Bruck received the BSc and MSc degrees in electrical engineering from the Technion, Israel Institute of Technology, in 1982 and 1985, respectively, and the PhD degree in electrical engineering from Stanford University in 1989. He is a professor of computation and neural systems and electrical engineering at the California Institute of Technology. His research interests include parallel and distributed computing, fault-tolerant computing, error-correcting codes, computation theory, and biological systems. Dr. Bruck has extensive industrial experience, including serving as manager of the Foundations of Massively Parallel Computing Group at the IBM Almaden Research Center from 1990 to 1994, a research staff member at the IBM Almaden Research Center from 1989 to 1990, and a researcher at the IBM Haifa Science Center from 1982 to 1985. Dr. Bruck is the recipient of a 1997 IBM Partnership Award, a 1995 Sloan Research Fellowship, a 1994 U.S. National Science Foundation Young Investigator Award, six IBM Plateau Invention Achievement Awards, a 1992 IBM Outstanding Innovation Award for his work on harmonic analysis of neural networks, and a 1994 IBM Outstanding Technical Achievement Award for his contributions to the design and implementation of the SP-1, the first IBM scalable parallel computer. He has published more than 30 journal and conference papers in his areas of interest and he holds 2 patents. Dr. Bruck is a senior member of the IEEE and a member of the Editorial Board of the IEEE Transactions on Parallel and Distributed Systems.

Rakesh Agrawal received the MS and PhD degrees in computer science from the University of Wisconsin, Madison in 1983. He also has a BE degree in electronics and communication engineering from the University of Roorkee, Roorkee, India, and a two-year postgraduate diploma in industrial engineering from NITIE, Bombay, India. He is the project leader and manager of the Quest project on Data Mining and Decision Support Technologies at the IBM Almaden Research Center, San Jose, California. From 1983 to 1989, he was with AT&T Bell Laboratories, Murray Hill, New Jersey, where he was a member of the technical staff in the Computing Systems Research Laboratory. He has been with the IBM Almaden Research Center since January 1990. Dr. Agrawal is currently a program chair for the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98) and a trustee of the VLDB Endowment. He is also an editor of the ACM Transactions on Database Systems, VLDB Journal, and the Data Mining and Knowledge Discovery Journal. He has been the chair of the IEEE Technical Committee on Data Engineering, a program chair for the 19th International Conference on Very Large Databases (VLDB-93), a program chair for the Second International Symposium on Databases in Parallel and Distributed Systems (DPDS-90), an editor of the IEEE Transactions on Knowledge and Data Engineering, an editor of the IEEE Transactions on Parallel and Distributed Systems, and an associate editor of the IEEE Data Engineering Bulletin. He has published extensively in technical journals and conferences. He has also authored a book on Programming in ANSI C. Dr. Agrawal is a member of the IBM Academy of Technology and a Research Division Master Inventor. He has received IBM's Outstanding Innovation Award for his work on data mining and a Research Division Award for his work on object-oriented databases. His current research interests include data mining, text and web mining, OLAP, and electronic commerce. He is a senior member of IEEE.