Low-Contention Data Structures

Size: px

Start display at page:

Download "Low-Contention Data Structures"

Kory Mason
5 years ago
Views:

1 Low-Contenton Data Structures [Extended Abstract] James Aspnes Department of Computer Scence Yale Unversty New Haven, CT Davd Esenstat Ytong Yn State Key Laboratory for Novel Software Technology Nanjng Unversty Nanjng, Chna ABSTRACT We consder the problem of mnmzng contenton n statc dctonary data structures, where the contenton on each cell s measured by the expected number of probes to that cell gven an nput that s chosen from a dstrbuton that s not known to the query algorthm (but that may be known when the data structure s bult). When all postve queres are equally probable, and smlarly all negatve queres are equally probable, we show that t s possble to construct a data structure usng lnear space s, a constant number of queres, and wth contenton O(1/s) on each cell, correspondng to a nearly-flat load dstrbuton. All of these quanttes are asymptotcally optmal. For arbtrary query dstrbutons, the lack of knowledge of the query dstrbuton by the query algorthm prevents perfect load levelng n ths case: we present a lower bound, based on VC-dmenson, that shows that for a wde range of data structure problems, achevng contenton even wthn a polylogarthmc factor of optmal requres a cell-probe complexty of Ω(log log n). Categores and Subject Descrptors E.1 [Data Sstructures]; F.1. [Computaton by Abstract Devces]: Modes of Computaton Parallelsm and concurrency General Terms Algorthms, Performance, Theory Supported n part by NSF grant CNS Part of the work was done whle Davd Esenstat was a graduate student at Brown Unversty. Supported by the Natonal Scence Foundaton of Chna under Grant No Part of the work was done whle Ytong Yn was a graduate student at Yale Unversty. Permsson to make dgtal or hard copes of all or part of ths work for personal or classroom use s granted wthout fee provded that copes are not made or dstrbuted for proft or commercal advantage and that copes bear ths notce and the full ctaton on the frst page. To copy otherwse, to republsh, to post on servers or to redstrbute to lsts, requres pror specfc permsson and/or a fee. SPAA 10, June 13 15, 010, Thra, Santorn, Greece. Copyrght 010 ACM /10/06...$ Keywords Memory contenton, data structure, cell-probe model 1. INTRODUCTION For shared-memory multprocessors, memory contenton measures the extent to whch processors mght access the same memory locaton at the same tme, and s one of the man ssues for realstc systems [9, 13] and theoretcal models [,7]. In [6], a theoretcal model s ntroduced by Dwork et al. to formally address the contenton costs of algorthmc problems. In ths paper, we propose to study the contenton cost of a data structure, whch measures how many queres to the data structure mght smultaneously access the same memory cell. To avod the queston of how many queres to the data structure are runnng at the same tme, we measure contenton ndrectly, by countng the expected number of probes to a gven cell for each ndvdual query. The expected number of probes to the cell for some fxed number m of smultaneous queres can then be bounded usng lnearty of expectaton. Wth bnary search, for example, the entry n the mddle of the table s accessed on every query, as s the cell storng the hash functon or root-level ndex wth perfect hashng and smlar ndex-based data structures. Dependng on the query dstrbuton, the remanng load may be balanced almost as badly over the other cells. We consder how to avod ths problem n case of statc data structures, where the data structure s bult n advance by a constructon algorthm that may know the query dstrbuton, but queres are performed by a unform algorthm that does not (although t may use randomzaton tself to spread the query load more evenly). The assumpton that the query algorthm does not know the query dstrbuton s natural. Often the query dstrbuton wll be hghly correlated wth the contents of the data structure, as n our smplest case where we consder a unform dstrbuton on successful queres to a statc dctonary. Provdng the query dstrbuton to the query algorthm n such a case would, n effect, gve t sgnfcant nformaton about the contents of the data structure. 1.1 Model Formally, a data structure problem s a functon f : Q D {0, 1}, such that for every query x Q and every data set S D, f(x, S) specfes the answer to the query x to data set S. A classc problem s the membershp

2 problem, where Q = [N] and D = `[N] n for some N n, and f(x, S) = 1 f and only f x S. We assume that the query x Q follows some probablty dstrbuton q over Q. For any data set S D and any query dstrbuton q over Q, a table T S,q : [s] {0, 1} b of s cells, each of whch contans b bts, s prepared. Gven a query x, a probablstc cellprobng algorthm A computes the value of f(x, S) by makng t randomzed adaptve cell-probes I x (1), I x (),..., I x (t) [s]. The algorthm A may depend on f, but not on S or q (except to the extent that later probes may depend on the outcome of earler probes, whose results mght encode nformaton about S and q). The contenton of a cell s the expected number of probes to the cell durng one executon of A. Ths wll be equal to the probablty that the cell s probed at all, provded A s sensble enough not to probe the same cell twce, but t s easer to work wth expectatons. In more detal: Defnton 1. For a fxed table T S,q, for a query chosen randomly from Q accordng to the dstrbuton q, the sequence of cell-probes s I (1), I(),..., I(t). Let Y (t) (x, j) be the 0-1 valued random varable ndcatng whether I x (t) = j. The contenton of cell j at step t s defned by h Φ t(j) := E Y (t) (, j), where the expectaton s taken over both and the random. The total contenton of cell j s Φ(j) := P t Φt(j). I (t) x It s obvous that P j Φt(j) = 1, therefore 1 maxj Φt(j) s 1. Ideally, we want max j Φ t(j) to approach 1. s A balanced cell-probng scheme s defned as follows: Defnton. An (s, b, t, φ)-balanced-cell-probng scheme for problem f : Q D {0, 1} s a cell-probng scheme such that for any S D and any probablty dstrbuton q over Q, a table T S,q : [s] {0, 1} b s constructed, such that for any query x Q, the algorthm returns f(x, S) by probng at most t cells, and for a query x Q generated accordng to the dstrbuton q, the contenton Φ k (j) s bounded by φ for any 1 k t and any j [s]. Such schemes have the very strong property that not only s contenton bounded across an executon of the query algorthm, but each ndvdual step gves low contenton. Gven a fxed table T S,q, we can summarze the contenton succnctly usng lnear algebra. Let P t be a Q s matrx wth P t(x, j) := Pr[I (t) = j] = E[Y (t) (, j)]. The contenton on all cells can be computed by Φ t = qp t, specfcally, h Φ t(j) = E Y (t) (, j) = x Q Pr [ = x] E = x Q q x P t(x, j). h Y (t) (x, j) Fnally, for our lower bound, t wll be helpful to consder data structure problems from the perspectve of communcaton complexty. In ths vew, a data structure s a communcaton protocol between an adaptve player Alce for the cell-probng algorthm and an oblvous player Bob for the table. The nput to Bob s a par (S, q), and the nput to Alce s a query x Q, whch s generated accordng to the dstrbuton q. Together they compute f(x, S) va communcaton. The contenton then counts the probablty of each type of message sent by Alce. 1. Our contrbutons Ths paper makes followng contrbutons: We formalze a natural and nterestng problem: memory contenton caused by concurrent data structure queres. We ntroduce contenton to the classc cellprobe model. In our model, contenton s measured by the chance that a memory cell s probed durng the executon of the cell-probe algorthm. Ths level of abstracton allows us to study the trade-off between the contenton and the complexty of data structures wthout regard to specfc contenton resoluton schemes. We note an especally nterestng class of query dstrbutons: dstrbutons that are unform over both the set of postve queres and the set of negatve queres (but not necessarly unform over all queres). We ntroduce a lnear-sze, constant-tme cell-probng scheme for the membershp problem, wth maxmum contenton O(1/n). It s easy to see that ths data structure s asymptotcally optmal n all three parameters. We study data structures wth arbtrary query dstrbuton. For ths general case, we prove a lower bound on any balanced cell-probng scheme satsfyng a certan natural techncal restrcton. The lower bound s a tme-contenton trade-off: for any problem whch has a non-degenerate subproblem of sze n, f the contenton s wthn a Polylog(n) factor of optmal, the tme complexty s Ω(log log n). Ths drectly mples the same lower bound for the membershp problem. 1.3 Related work Our frst upper bound s based on the well-known FKS constructon of Fredman et al. [8] and subsequent work by Detzfelbnger and Meyer auf der Heyde extendng these results to the dynamc case [3 5]; we wll refer to ths latter constructon, as descrbed n [4], by DM. The FKS constructon s a statc data structure for the membershp problem, based on a two-level tree of hash tables, wth lnear space and constant lookup tme. In DM, the hash functons used n FKS are replaced wth a new famly that gves a more even dstrbuton of load across the second layer of the tree, whch s used to get bounded worst-case update costs for the dynamc case. In [3] and [5], DM s mplemented n the PRAM model and the model of a complete synchronzed network of processors respectvely. Both mplementatons optmze the contenton on ndvdual processors, but do not consder the contenton on ndvdual memory locatons. Membershp can also be solved wth optmal tme and space complexty usng cuckoo hashng [1]; as wth FKS and DM, the contenton of the standard mplementaton s hgh, mostly because all queres read the hash functon parameters from the same locatons. For FKS, DM, and cuckoo hashng, contenton can be decreased by storng the hash functon redundantly. Under the assumpton that the query s dstrbuted unformly

3 wthn both the postve set and the negatve set, ths gves a maxmum contenton of Θ( n) tmes optmal for FKS, and Θ(ln n/ ln ln n) tmes optmal for DM and cuckoo hashng; whle for arbtrary query dstrbutons, the contentons can be arbtrarly bad. Ths s not surprsng, gven that none of these data structures are desgned wth memory contenton n mnd; nonetheless, we show that t s possble to do substantally better.. LOW-CONTENTION UNIFORM MEMBERSHIP QUERIES Let N = U be the sze of the unverse. We assume that N n, and each cell n the table contans a b-bt word, where b = log N. Theorem 3. For the membershp problem of n elements, wth the assumpton that the query s unformly dstrbuted wthn both postve queres and negatve queres, there exsts an (O(n), b, O(1), O(1/n))-balanced-cell-probng scheme. Gven a data set S `U n, the data structure can be constructed n expected O(n) tme on a unt-cost RAM. To see how our data structure works, t may help to start by consderng the query procedure for FKS hashng. FKS hashng works by takng a standard hash table and usng a secondary perfect hashng scheme wthn each of O(n) buckets to resolve collsons between elements hashed to the same bucket. Even though the largest bucket may contan O( n) elements, and the sze of the -th bucket s proportonal to the square n of the number of tems n n that bucket, because most buckets are small, the sum of these squares s lkely to be lnear n n. FKS guarantees that all queres fnsh n exactly three probes: the frst probe reads the parameters of the hash functon; the second reads a ponter to the bucket n whch the target tem wll be found, as well as nformaton about the sze of the bucket and the perfect hash functon used wthn the bucket; and the thrd reads the actual element. Ths produces contenton 1 on the cell for the frst probe and contenton Θ(n /n) on the cell for the second probe; both are much worse than our goal of O(1/n). We can reduce the contenton for the frst probe by replcaton; nstead of probng a sngle cell, we probe one of n dentcal copes. The second probe s trcker; we would lke to replcate the nformaton for large buckets, but the query algorthm does not know whch buckets are large. Our approach s to organze the buckets nto Θ(n/ log n) groups of Θ(log n) buckets each. Whle ndvdual buckets may vary sgnfcantly n sze, we can show that when usng the hash functons of DM [4], the total sze of each group wll be O(log n) wth reasonably hgh probablty. A btvector encodng allows us to ndcate the sze of all buckets n a group n a sngle O(log n)-bt cell, whch s replcated O(log n) tmes to reduce contenton to O(1/n). Knowng the sze of each bucket n the group, the query algorthm can deduce the storage range for the replcated headers of the target bucket, read the relevant header nformaton (ncludng both a ponter to the actual locaton of the bucket and the parameters of ts secondary hash functon) from a randomly-dstrbuted probe, and fnally use the bucket s perfect hash functon to fnd the target element. Ths fourphase procedure requres a constant number of probes and stll uses only O(n) space wth O(1/n) contenton, for ether unform postve or unform negatve queres..1 Hash famles In [1], unversal hash classes were ntroduced. For d, a famly of functons from U to [m] s d-wse ndependent (or d-unversal) f for any d dstnct elements x 1, x,..., x d from U, the hash values h(x 1), h(x ),..., h(x d ) are unformly and ndependently dstrbuted over [m]. Let H d m denote a d-wse ndependent hash famly of hash functons from U to [m]. It s well known that f d and m n, for any S `U, n wth at least 1 probablty a unformly random h H d m maps each element n S to a dstnct value;.e., t s a perfect hash functon. We use the followng hash famly, frst ntroduced n [4]. Defnton 4 (DM [4]). For f Hm, d g Hr, d and z [m] r the hash functon h f,g,z : U [m] s defned by The hash famly R d r,m s h f,g,z (x) := (f(x) + z g(x) ) mod m. R d r,m := {h f,g,z f H d m, g H d r, z [m] r }. Gven a hash functon and a set of elements, we defne the buckets and loads as follows. Defnton 5. For h : U [m], S U, and [m], the -th bucket B(S, h, ) := {x S h(x) = }, and the load of the -th bucket s l(s, h, ) := B(S, h, ). The followng theorem s from [11]. It bounds the devaton of the sum of a 0-1 valued d-wse ndependent sequence. Theorem 6 (Corollary 4.0, [11]). Let 1,..., n be 0-1 valued, d-wse ndependent, equdstrbuted random varables. Let = P n. If d E[], then «(E[]) d/ Pr [ E[] > t] O. The followng s a specal case of the Hoeffdng s theorem [10]. Theorem 7 (Hoeffdng). Let Y 1,..., Y r be ndependent random varables wth range of values n [0, d]. Let Y = P r Y, and c > e be some constant. If ce[y ] rd, then e c d E[Y ] Pr [Y ce [Y ]]. c For d-unversal hash famles, the followng theorem holds. Theorem 8 (Fact., [4]). Let S be a fxed set of n elements. Let f be chosen from Hm d unformly at random, where d > s a constant and m n/d. Then Pr [ [m], l(s, f, ) d] 1 n (n/m) d. The followng lemma characterzes the load dstrbuton of functons from varous famles. Lemma 9 (extended from [4, 8, 11]). Fx an S `U n. Let c > e and d > be constants. The followng holds: 1. For r = n 1 δ where < δ < 1 1, and g from d+ d Hd r, Pr [ [r], l(s, g, ) cn/r] 1 o (1).. For m = n where α > d, and α ln n c(ln c 1) h from R d r,m, Pr [ [m], l(s, h, ) cn/m] 1 o(1) t d

4 3. (FKS condton) h For s = βn where β, and h from P R d r,s, Pr [s] (l(s, h, )) s 1. Proof. Let S = {x 1, x,..., x n}. 1. For a fxed j [r], let be a 0-1 valued random varable P that ndcates whether g(x ) = j, and let = n. It s obvous that E[] = n = r nδ. Due to Theorem 6, h Pr n δ (c 1)n δ O n δd/ /n δd = O n δd/. Therefore, h Pr j [r], l(s, g, j) > cn r r Pr hl(s, g, j) > cn δ h n δ > (c 1)n δ n 1 δ Pr O n 1 δ δd/ = o(1).. We assume that h s defned by (f, g, z) where f and g are randomly drawn from H d m and H d r respectvely, and z s chosen unformly from [m] r. We defne the followng two events E 1 : [r], l(s, g, ) cn/r; E : [r], j [m], l(b(s, g, ), f, j) d. Due to the frst part, E 1 holds wth probablty 1 o(1). Condtonng on E 1, accordng to Theorem 8, wth probablty 1 O(n δ(d+1) /m d ), t holds that j [m], l(b(s, g, ), f, j) d. By unon bound, event E holds wth probablty 1 O(n 1 δ n δ(d+1) /m d ) = 1 O(n 1 d(1 δ) (ln n) d ) Therefore, = 1 o(1). Pr[E 1 E ] = Pr[E 1] Pr[E E 1] = (1 o(1))(1 o(1)) = 1 o(1). Condtonng on E 1 E, for any fxed j [m], for = 1,,..., r, defne random varable Y as Y := {x S g(x) = and Let Y = P r Y. Note that f(x) + z g(x) j Y = l(b(s, g, ), h, j) (mod m)}. = l(b(s, g, ), f, (j z + m) mod m), and Y = l(s, h, j). Because E holds, Y d for all [r], and Y are ndependent because z are ndependent. E[Y f, g] = 1 l(b(s, g, ), f, (j z + m) mod m) m = 1 m z [m] k [m] = 1 l(s, g, ). m Therefore l(b(s, g, ), f, k) E[Y ] = E [E [Y f, g]] 3 = 1 m E 4 l(s, g, ) 5 [r] = S m = n m. Accordng to Theorem 7, t holds that Pr[Y cn/m] (e/c) cα ln n/d = o(n 1 ). By unon bound, Pr[ j [m], l(s, h, j) cn/m] = 1 m Pr[Y cn/m] = 1 o(1). Recall that the above holds when condtonng on E 1 E. Snce Pr[E 1 E ] = 1 o(1), the event j [m], l(s, h, j) cn/m holds uncondtonally wth probablty at least (1 o(1))(1 o(1)) = 1 o(1). 3. For every, j [n] where j, let j be 0-1 valued random varable that ndcates whether h(x ) = h(x j). Let = P j j be the total number of ordered collson pars. It s easy to see that =! l(s, h, ) = h, )) [s] [s](l(s, n. Note that h s at least -wse ndependent, thus for any j, E[ j] = Pr[h(x ) = h(x j)] = 1/s, thus E[] = n(n 1)/s. Due to Markov s nequalty, 3 Pr 4 [s](l(s, h, )) > s5 = Pr[ > s n] E[] (s n) 1 β(β 1) 1/.. Data structure constructon Let c = e. For d >, choose approprate constants α and β as stated n Lemma 9, and let r = n 1 δ and m = n. α ln n In addton, choose an approprate constant β to make s = βn dvsble by m.

5 Gven any data set S `U n, unformly choose f H d s, g H d r, and z [s] r, and construct a unformly random h R d r,s by lettng h(x) := (f(x) + z g(x) ) mod s. Defne a new hash functon h : U [m] by h (x) = h(x) mod m. Note that h s a unformly random functon from the famly R d r,m because m dvdes s. Specfcally, h (x) = `f(x) + z g(x) mod s mod m = `f(x) mod m + z g(x) mod m mod m. For unform f Hs d and unform z [s] r, (f mod m) and (z mod m) are unform over Hm d and [m] r respectvely. Therefore, h s unform over R d r,m. We would lke our hash functon to satsfy the property: P(S) := j`g, h, h Hr d R d r,m R d r,s [r], l(s, g, ) cn/r, and [m], l(s, h, ) cn/m, ) and [s] l (S, h, ) s Due to Lemma 9, and by applyng the unon bound to all unwanted events, for the above g, h and h, t holds that (g, h, h) P(S) wth probablty at least 1/ o(1). Therefore by repeatedly generatng (g, h, h), we satsfy P(S) wthn expected O(1) trals. Note that P(S) can be verfed n O(n) tme n a unt-cost machne, thus a good hash functon can be found wthn expected O(n) tme. The data structure s organzed n rows of cells where each row contans s cells. Let T (, j) represent the j-th cell n the -th row n the data structure. T s constructed as follows: Let a 0, a 1,..., a d 1 denote the d words that represent the two d-unversal functons f and g. Let T (, j) = a for every [d] and every j [s]. Let T (d, j) = z[j mod r] for every j [s]. We say that h assgns the n elements n S nto s buckets, and h arranges the buckets nto m groups accordng to the congruence classes of h modulo m. For group [m], we defne the group-base-address GBA S() as GBA S(0) = 0 and GBA S() = GBA S( 1) + l (S, h, km + 1). k [s/m] The vector GBA S can be computed n O(n) tme n a unt-cost machne. Due to the property P(S), GBA S() s for any [m]. Let T (d+1, j) = GBA S(j mod m),.e. each bucket stores the group-base-address of the group that the bucket belongs to. Let a group-hstogram be a bnary strng where the load of each bucket n the group s represented consecutvely n unary code separated by zeros. Each group contans s/m = αβ ln n buckets, and due to property P(S), each group contans at most cn/m = cα ln n elements from S. Therefore the group-hstogram α(β+c) ln n uses at most α(β + c) ln n bts. Let ρ :=. b Observe that because b = Θ(log n), ρ = O(1). Let a 0j, a 1j..., a ρ 1,j denote the ρ words that store the group-hstogram of group j. Let T (d++, j) = a,(j mod m), for = 0, 1,..., ρ 1, and for all j [s]. The last two rows are used to perfectly hash each bucket. Each bucket [s] owns l (S, h, ) cells n each row. Due to P(S), the total space s at most s. The spaces owned by the buckets are organzed n groups. If bucket s the k-th bucket n group j, then the spaces owned by the buckets are sorted lexcographcally. Ths can be done n a total O(n) tme n a unt-cost machne. In the (d + ρ + 1)th row, for each ndvdual bucket, the perfect hash functon h s stored repeatedly n the space owned by the bucket. In the (d + ρ + )th row, the actual data n each bucket s stored accordng to the hash functon h. The table T has (d + ρ + ) = O(1) rows, each of whch contans s = O(n) words, for a total of O(n) words. Each step of the constructon costs O(n) tme, for a total of O(n) tme..3 Queres and contenton We query whether x s n S wth the followng algorthm. Each random choce s assumed to be ndependent and unform wthn ts range. 1. For each [d], choose j [s], and read T (, j); ths gves f and g. Next choose k [s/r] and read T (d, kr + g(x)), whch stores z g(x). We can now compute h = (f + z g) mod s and h = h mod m.. Choose k n [s/m], and read T (d, km + h (x)), whch stores GBA S(h (x)). For each [ρ] where ρ = α(β+c) ln n, choose some j [s/m], and read T (d + b 1 +, jm + h (x)); we thus obtan the group-hstogram group h (x). Wth the group-base-address and the group-hstogram, the exact range of the address owned by bucket h(x) can be determned: t runs from h(x) to h(x) nclusvely, where h(x) := GBA S(h (x)) h(x)/m 1 + k=0 h(x) := h(x) + l (S, h, h(x)) 1. l `S, h, km + h (x), All the values l (S, h, km + h (x)) for k [s/m] are stored n the group-hstogram of group h (x). 3. If h(x) < h(x), the bucket h(x) s empty: return 0. Otherwse, choose j [ h(x), h(x)] and read T (d+ρ+ 1, j) to get the perfect hash functon h. If T (d + ρ +, h(x) + h (x)) = x, return 1, else return 0. The correctness of the algorthm s guaranteed by the exstence of hash functons wth property P(S) and the exstence of the perfect hashng scheme for each bucket, whch s guaranteed. The query algorthm makes at most one probe to each row of T, thus the cell-probe complexty s O(1). For contenton, we frst consder the contrbuton of the postve queres. All events below are condtoned on the target element beng n S. At each step before the last probe, an 1 expected 1, l(s, g, 1), 1 l(s, n n h, ), or 1 l(s, h, 3) probes n

6 are balanced over a range of sze s, s/r, s/m, or l (S, h, 3) respectvely, therefore due to property P(S), the maxmum contenton s O(1/n). For the last probe, the perfect hash functon sends each query to a dstnct cell so the contenton s obvously O(1/n). Therefore, the total contenton contrbuted by postve queres s at most O(1/n). Lemma 10. Let S denote U \ S, and N = U = ω(n). For any hash functon h : U [k] whch s unform over the doman, for suffcently large n, [k], l( S, h, ) (N n)/k. Proof. Because h s unform over the doman, l(u, h, ) = N/k for any [k]. For N = ω(n), t holds that l( S, h, ) = l(u, h, ) l(s, h, ) N/k (N n)/k. Note that g, h, and h are all unform over the doman. Ths s because any d-unversal functon must be 1-unversal. Due to Lemma 10, the loads of negatve queres to all types of buckets are asymptotcally even. The same argument as above can be appled to bound the contenton caused by negatve queres to O(1/n). 3. A LOWER BOUND FOR ARBITRARY QUERY DISTRIBUTIONS In ths secton, we prove a cell-probe lower bound for lowcontenton data structures wth arbtrary query dstrbuton. The lower bound s on the cost of queres by an algorthm that does not know the dstrbuton; however, the algorthm that constructs the data structure may known the dstrbuton, and may optmze the data structure to mnmze contenton by encodng the nformaton of query dstrbuton n the data structure to gude the query algorthm. The lower bound tself s proved based on the followng ntuton: the more unform a random probe s, the less specfc nformaton t retreves; but non-unform probes wll only result n low contenton f the query algorthm already has some knowledge about the query dstrbuton. So to obtan nformaton about a specfc query, the query algorthm must steadly ncrease ts knowledge of the query dstrbuton through ncreasngly non-unform probes. By trackng how much nformaton the query algorthm has about the query dstrbuton (and thus how evenly spread out t must keep ts probes), we can bound the query tme subject to bounds on the contenton. We formalze ths argument usng VC-dmenson [14], a measure of complexty of classfcaton problems, by treatng a data structure problem f : Q D {0, 1} as a class of D many classfcatons of Q. Defnton 11. The VC-dmenson of a data structure problem f : Q D {0, 1}, denoted by VC-dm(f), s the maxmum n such that there exsts a set {x 1, x,..., x n} `Q n such that for any assgnment y {0, 1} n, there exsts some S D, wth f(x, S) = y for all. It s easy to see that the VC-dmenson of the membershp problem s n, where n s the cardnalty of the data set. Ths allows us to translate our results for problems of arbtrary VC-dmenson nto specfc results for membershp. We consder a specal class of cell-probng schemes (T, A) whose cell-probng algorthm A satsfes a natural restrcton descrbed as follows. Defnton 1. A table structure s a mappng T whch for any data set S D and any query dstrbuton q over Q, specfes a table T S,q : [s] {0, 1} b of s cells, each of whch contans b bts. Gven any query x Q, a cell-probng algorthm A returns f(x, S) by makng at most t randomzed adaptve probes I x (1), I x (),..., I (t ) x [s] to the table T S,q, such that the maxmum contenton satsfes Φ t φ for any t t, and for any fxed query x and any fxed table T S,q, the random varables I x (t) for all t t are jontly ndependent. The ndependence of cell-probes of A does not make A non-adaptve, because the ndependence holds only when the query and the table are both fxed. Note that n ths sense all determnstc cell-probng algorthms (both adaptve or non-adaptve) satsfy ths property, because once the table and the query are both fxed, the sequence of the cellprobes of a determnstc cell-probng algorthm are fxed, hence they are jontly ndependent. Informally speakng, for A, the randomness s used only for balancng the cellprobes, but s not nvolved n the process of decson makng. The upper bound presented n the prevous secton and any upper bounds constructed by the technque of dstrbutng probes across multple copes of crtcal cells are all ncluded n ths defnton. We prove the followng lower bound theorem. Theorem 13. For any data structure problem f wth a VC-dmenson VC-dm(f) = n, f there exsts a cell-probng scheme (T, A) as defned n Defnton 1, wth sze of cell b Polylog(n) and contenton φ Polylog(n), then the cellprobe complexty t = Ω(log log s n). The theorem s proved by frst runnng n dfferent nstances of queres n parallel and then boundng the speed wth whch these parallel nstances of the cell-probng algorthm gather nformaton. Lemma 14. If there exsts a cell-probng scheme (T, A) for the data structure problem f where (T, A) and f are as n Theorem 13, then there exsts a communcaton protocol between an algorthm A and a black-box B whch s specfed as follows. The nput to B s an arbtrary stochastc vector q [0, 1] n that P n =1 q 1, whch s ntally unknown to A. The communcaton between A and B occurs n rounds. 1. At round t, A specfes an n s matrx P t, called a probe specfcaton, and sends t to B, where P t s adaptve to the nformaton receved prevously by A, and for any 1 n, t holds that s P t(, j) 1 ; (1) j=1 φ max Pt(, j). () 1 j s q. Upon recevng a P t, B sends C t bts to A, where C t s a random varable satsfyng s E [C t] b max Pt(, j), (3) 1 n j=1 where the expectaton s condtoned on P t, thus condtoned on all prevous communcaton between A and B.

7 3. After t rounds, the expected number of bts receved by A s at least n t bts from B. Proof. The dea of the proof s to run n nstances of the cell-probng algorthm n parallel; together these nstances form A. We observe that the cell-probes of each ndvdual cell-probng algorthm can be specfed by a probablty dstrbuton of probes over the table, and the jont dstrbuton of the cell-probes of all n nstances can be arbtrarly chosen by us as long as the margnal dstrbuton of the cell-probe of each ndvdual nstance s the same as before, therefore (by our choce of the jont dstrbuton) the total nformaton obtaned by A n each round s bounded. The detals of the proof are gven n Appendx A. The followng two combnatoral lemmas are needed for the proof of Theorem 13. Lemma 15. Let M be an N n nonnegatve matrx. Let r = 5ɛ 1 δn ln N. Assume that for every row 1 u N, there exsts a set R u `{1,,...,n} r of r entres such that P R u M(u, ) δ. Then there exsts q [0, 1] n that P q = ɛ, such that for all 1 u N, there exsts 1 n, such that M(u, ) < q. Proof. For each 1 u N, sort {M(u, ) R u} by non-decreasng order and let R u {1,,..., n} be the ndces of the smallest r entres. It holds that R u, M(u, ) δ, as otherwse t contradcts the assumpton r that P R u M(u, ) δ. It holds that for any choce of such {R u} 1 u N, there n ln N exsts a T {1,,..., n}, such that T = and T r ntersects all R u. We prove ths by the probablstc method: let T be a unformly random subset of {1,,..., n} of sze n ln N, thus each R r u s mssed by T wth probablty less than (1 r/n) n ln N/r < 1/N, thus by the unon bound, T ntersects all R u wth postve probablty. Fx such a T, defne q [0, 1] n as q = ɛ T 1 = rɛ n ln N f T, and q = 0 f otherwse. Therefore, P q = ɛ, and for any 1 u N, for such R u T, t holds that M(u, ) δ < rɛ = q. r n ln N P Lemma 16. For any nonnegatve n s matrx P that j P (, j) 1 for every, let R be the largest subset of {1,,..., n} that P R s. Then t holds that R 1 max j P (,j) s max P (, j). 1 n j=1 Proof. The sum P j max P (, j) chooses exactly s entres to sum up. Let A be the set of chosen columns n row. Let x := P j A P (, j). Note that x P j P (, j) 1. By the pgeonhole prncple, for any 1 n, P j A A P (, j) max j P (, j) = x max j P (, j). Note that P A = s, thus P x P max j P (,j) A = s. Therefore the sum P j max P (, j) can be wrtten as max P (, j) = P (, j) = x, j j A x max j P (,j) subject to the constrants that P s and x 1. It s easy to see that the value of P x s maxmzed when P lettng x = 1 for R and x = 0 for R, therefore j max P (, j) = P x R. Proof of Theorem 13: Gven the algorthm A as descrbed n Lemma 14, we wll bound the speed that A gathers nformaton. Due to Lemma 14, A s a decson tree n whch the current node of depth (t 1) has N t := C t 1 chldren, each of whch corresponds to a next probe specfcaton P t. We number these P t by u [N t] and denote each as P (u) t, where u can be nterpreted as the bt strng receved by A at round t 1. We then nductvely bound the next C t. Let M (t) be an N t n matrx defned as follows: M (t) (u, ) := φ max j P (u) t (, j). Each row of the matrx M (t) corresponds to a possble next probe specfcaton. We say that the stochastc vector q volates row u of M (t) f there exsts 1 n, such that M (t) (u, ) < q. Note that f row u of M (t) s volated by q, then accordng to (), the next probe specfcaton cannot be P (u) t. Let r t := 5t φ sn ln N t. We say that a row u of M (t) s good f there exsts R {1,,..., n} such that R = r t and P R M (t) (u, ) φ s. We clam that f a row u s not good, then for the correspondng P (u), t holds that s max t (, j) r t. (4) P (u) 1 n j=1 The proof s as follows: If a row u of M (t) s not good, then by defnton, for any R of sze r P t, R M (t) (u, ) > φ s, thus for any R that P 1 R s, t must hold max j P (u) t (,j) that R < r t, therefore due to Lemma 16, t holds that P s j=1 max 1 n P (u) t (x, j) r t. Due to (4) and (3), the amount of nformaton brought by a set of probes P (u) t where u s a bad row n M (t), s bounded by br t bts. We show by an adversary argument that there exsts a q that A always choose probes correspondng to bad rows. At each round t, the adversary always chooses some q that volates all the good rows n M (t). Accordng to Lemma 15, the adversary can do so as long as t t. Settng ɛ = 1 and δ = φ s n Lemma 15, n each round, the t adversary can ncrease the value of some q so that P n =1 q s ncreased by at most 1, thereby volatng all good rows t n the current M (t). Thus before round t, the vector q s always stochastc. Note that ncreasng the value of q wll never make a volated row non-volated, so t wll not make the adversary nconsstent. Aganst such an adversary, at each round t, A can only choose a probe specfcaton P (u) t where u s a bad row n M (t), accordng to Clam (4), whch mples that s max Pt(, j) 1 n rt = p 5t φ sn ln N t j=1 = p 5t φ snc t 1 ln. Due to (3), t holds that E[C t] b max P t(, j) p (5 ln )b t φ snc t 1, j where the expectaton s condtoned on all prevous rounds of communcaton. Therefore the followng recurson holds

8 for the sequence of random varables C 1, C,..., C t: E[C t C t 1] p (5 ln )b t φ snc t 1. The square root functon s concave, thus by Jensen s nequalty, t holds for the uncondtonal expectaton that E[C t] = E[E[C t C t 1]] hp E (5 ln )b t φ snc t 1 p (5 ln )b t φ sn E[C t 1]. Before the frst probe, q s unknown to A, thus due to (), for any, j, P 1(x, j) φ, therefore E[C 1] b max P 1(x, j) bφ s. j Let a 1 := bφ s and a := (5 ln )b t φ sn. The followng recurson holds for E[C t] that E[C 1] a 1; E[C t] (a E[C t 1]) 1. By nducton, E[C t] a 1 t 1 a 1 1 t. After t rounds, the expected total number of bts receved by A s at least n t, therefore n t t t E[C t] t t a 1 t 1 a 1 1 t a 1a 1 t. Polylog (n) s, Wth the assumpton that b Polylog(n) and φ t holds that a 1 Polylog(n) and a n Polylog(n). Solvng the above nequalty, we have that t log log n o(log log n) = Ω(log log n). Theorem 13 s proved. 4. CONCLUSION In ths paper, we propose to study the memory contenton caused by concurrent data structure queres. To study the problem, we ntroduce a measure of contenton to the classc cell-probe model of statc data structures. We show that f all postve queres are equally probable and smlarly all negatvely are equally probable, then there exsts a statc dctonary whch answers membershp queres wth asymptotcally optmal performance of tme, space and contenton. For the general case that the query dstrbuton s arbtrary, we show that for all data structure problems wth non-degeneratng VC-dmensons, f the randomness s used only for balancng the probes, then even wth unbounded space, the tme and contenton cannot be both optmal. A possble future drecton s to remove the assumpton of ndependent cell-probes n the lower bound. Note that we only rely on ths assumpton to make sure that the contenton constrant of () holds condtonng on any partcular sequence of prevous cell-probes, whch s requred by the adversary argument. We suspect that wth a more careful analyss, ths assumpton can be removed, whch would mply that the lower bound holds not only for the randomzed data structures that use the randomness only for balancng probes, but also for the true randomzed data structures where the randomness s also nvolved n the computaton of queres. Another nterestng and perhaps more realstc future drecton s to study the contenton caused by the updates n dynamc data structures. 5. REFERENCES [1] J. Carter and M. Wegman. Unversal classes of hash functons. Journal of Computer and System Scences (JCSS), 18(): , [] D. Culler, R. Karp, D. Patterson, A. Sahay, K. Schauser, E. Santos, R. Subramonan, and T. Von Ecken. LogP: Towards a realstc model of parallel computaton. ACM SIGPLAN Notces, 8(7):1 1, [3] M. Detzfelbnger and F. Meyer auf der Hede. An optmal parallel dctonary. In Proceedngs of the frst annual ACM Symposum on Parallel Algorthms and Archtectures (SPAA), pages ACM New York, NY, USA, [4] M. Detzfelbnger and F. Meyer auf der Hede. A new unversal class of hash functons and dynamc hashng n real tme. In Proceedngs of the 17th Internatonal Colloquum on Automata, Languages and Programmng (ICALP), volume 443, pages Sprnger, [5] M. Detzfelbnger and F. Meyer auf der Hede. How to dstrbute a dctonary n a complete network. In Proceedngs of the twenty-second annual ACM Symposum on Theory of Computng (STOC), pages ACM New York, NY, USA, [6] C. Dwork, M. Herlhy, and O. Waarts. Contenton n shared memory algorthms. Journal of the ACM (JACM), 44(6): , [7] S. Fortune and J. Wylle. Parallelsm n random access machnes. In Proceedngs of the tenth annual ACM Symposum on Theory of Computng (STOC), pages ACM New York, NY, USA, [8] M. Fredman, J. Komlós, and E. Szemeréd. Storng a Sparse Table wth O(1) Worst Case Access Tme. Journal of the ACM (JACM), 31(3): , [9] M. Herlhy, B. Lm, and N. Shavt. Low contenton load balancng on large-scale multprocessors. In Proceedngs of the fourth annual ACM Symposum on Parallel Algorthms and Archtectures (SPAA), pages ACM New York, NY, USA, 199. [10] W. Hoeffdng. Probablty nequaltes for sums of bounded random varables. Journal of the Amercan Statstcal Assocaton, 58(301):13 30, [11] C. P. Kruskal, L. Rudolph, and M. Snr. A complexty theory of effcent parallel algorthms. Theor. Comput. Sc., 71(1):95 13, [1] R. Pagh and F. Rodler. Cuckoo hashng. Journal of Algorthms, 51():1 144, 004. [13] P. Tzeng and D. Lawre. Dstrbutng hot-spot addressng n large-scale multprocessors. IEEE Transactons on Computers, 100(36): , [14] V. Vapnk and A. Y. Chervonenks. On the unform convergence of relatve frequences of events to ther probabltes. Theory of Probablty and ts Applcatons, 16():64 80, APPENDI A. PROOF OF LEMMA 14 The lemma s proved by frst smulatng the cell-probng algorthm A by a modfed cell-probng algorthm A whch ndependently probes all cells n the table n every step, and

9 then runnng n nstances of A n parallel as a new cellprobng algorthm A. Frst, we observe that a randomzed cell-probe can be smulated wth bounded error by ndependently probng all cells. Defnton 17. A product-space cell-probe to a table of s cells s a random set J [s] such that the probablty space of J s a product probablty space. For the rest of the proof, we assume the assumpton of Theorem 13. Assumpton 18. Let f be a data structure problem wth VC-dm(f) = n. There exsts a cell-probng scheme (T, A) as descrbed n Defnton 1, such that (T, A) solves f wth performance parameters (s, b, t, φ ), where the sze of cell b Polylog(n) and the maxmum contenton φ Polylog(n) s. Lemma 19. If Assumpton 18 s true, then there exsts a product-space cell-probng algorthm A, such that on any vald table T S,q, for any query x Q, the followng propertes hold for the sequence of product-space cell-probes J (1) x, J () x,..., J (t ) x. 1. At any step t t, A fals (returns a specal symbol ) ndependently wth probablty at most 3. Condtoned on that there s no falure after t steps, whch 4 s an event wth probablty at least t, t holds that J x (1), J x (),..., J (t ) x are jontly ndependent and A returns what A returns.. For any t t, the total probablty of each productspace cell-probe of A h Pr j J x (t) 1; (5) j [s] 3. For any t t, the contenton of any cell j [s] h q(x ) Pr j J x (t) φ. (6) Proof. A cell-probe of A can be represented as a random varable I [s], where I denotes the probed cell. Let p := Pr[I = ]. Let J [s] represent a product-space cell-probe. Gven a probablty vector p, a cell-probe I s smulated by a product-space cell-probe as follows: Independently probe each cell [s] wth probablty p := mn{p, 1 }. The resultng set s J. If J 1, then fals; f J = {}, then fals wth a probablty ɛ := mn{p, 1 p }. Let I = f not fal. Case 1: p 1 for all [s]. Then for all [s], p = p and ɛ = p. Let ρ = Q j [s] (1 pj). Snce p 1 for all [s], t holds that ρ 1. 4 The probablty Pr[I = ] = (1 ɛ ) Pr[J = {}] Y = (1 p ) p (1 p j) = p ρ, whch s proportonal to p. wth probablty ρ 1. 4 j The procedure succeeds Case : Let p 0 > 1 and all other p < 1. Then p 0 = 1 and ɛ 0 = 1 p 0, and for all > 0, t holds that p = p and ɛ = p. Let ρ = Q j>0 (1 pj). It holds that ρ > 1 snce P j>0 pj = 1 p0 < 1. For 0, Pr[I = ] = (1 p ) Pr[J = {}] = 1 ρ p ; and for cell 0, Pr[I = 0] = p 0 Pr[J = {}] = 1 ρ p 0. The procedure succeeds wth probablty 1 ρ > 1 4. For both cases, a cell-probe of A s smulated by a productspace cell-probe wth a probablty at least 1. The event of 4 a falure occurs ndependently wth probablty at most 3. 4 Wth a probablty at least t, no falure occurs at all, condtoned on whch ts s obvous that A can smulate A, and the product-space cell-probes are jontly ndependent snce the cell-probes of A are jontly ndependent. The total probablty of a product-space cell-probe s Pr[ J] = p 1. The probablty of a probe to each cell s no greater than before, therefore the maxmum contenton φ not ncreased. We observe that by runnng n nstances of the productspace cell-probng algorthm n parallel, the behavor of each ndvdual nstance depend only on the margnal dstrbuton of cell-probes of the nstance, but does not depend on the jont dstrbuton of cell-probes of all n nstances. Thus, the jont dstrbuton of cell-probes can be arbtrarly chosen by us as long as the margnal dstrbuton of cell-probes of each ndvdual nstance s the same as before. Lemma 0. Let A be an algorthm that on a vald table T S,q, for a set of n queres {x 1, x,..., x n} `Q n, at step t, A randomly probes n sets of cells (L (t) x 1, L (t) x,..., L (t) x n ). If for every x where 1 n and every t t, the margnal dstrbuton of L (t) x s dentcal to the dstrbuton of J x (t), where J x (t) s the t s product-space cell-probe of the algorthm A on the same table T S,q, then A returns f(x, S) for expected n t number of x. Proof. Let A run an nstance of A wth nput x n parallel for every x, denoted as A x. Let the set of cells probed by each ndvdual A x at tme t be L (t) x. On a fxed table T S,q and for a fxed x, snce L (t) x s dentcally dstrbuted as J x (t), every ndvdual nstance of A x smulates a runnng nstance of A wth nput query x. By Lemma 19, each A x termnates n t steps wthout falure wth probablty at least t, thus by the lnearty of expectaton, after t tme, the expected total number of termnated nstances s at least n t. In the next lemma, we construct a jont dstrbuton of cell-probes whch mnmzes the expected total number of probed cells. Lemma 1. For any probablty dstrbuton of J [s] where 1 n and each J s chosen from a product probablty space, there exsts a jont dstrbuton (L 1, L,..., L n),

10 such that for every 1 n, L s dentcally dstrbuted as J, and t holds that 3 [ E 4 L 5 max Pr [j J]. 1 n 1 n j [s] Proof. We construct the jont dstrbuton of (L ) ( [s] ) n as follow. Let p j = max 1 n Pr[j J ]. Choose each j [s] ndependently wth probablty p j. Let B denote the set of chosen elements of [s]. For every 1 n, let each j B jon L ndependently wth probablty Pr[j J ] p j. Note that p j Pr[j J ], therefore the probablty s well-defned. For each 1 n, and for every j [s], j jons L ndependently wth probablty p j Pr[j s chosen to L j B] = Pr[j J ], thus each L s dentcally dstrbuted as J. Note that for every L, all of ts elements are chosen from set B. It holds that 3 [ E 4 L 5 E [ B ] = p j = max Pr[j J]. 1 n 1 n j [s] j [s] Proof of Lemma 14: Let {x 1, x,..., x n} `Q n be a set of queres whch acheves the VC-dmenson VC-dm(f) = n,.e. any Boolean assgnment of f(x, S) s possble. Let such {x 1, x,..., x n} be the nput query set to A, where A s as descrbed n Lemma 0. By an nformaton theoretcal argument, n the worst case, A has to collect expected n t bts nformaton after t steps. Let the jont dstrbuton of (L (t) x 1, L x (t),..., L x (t) n ) of A be constructed as n Lemma 1, the total number of cells probed by A n step t wth (L (t) x 1, L (t) x,..., L x (t) n ) s bounded. Let the n s matrx P t defned as P t(, j) := Pr[j L x (t) ], and let q := q(x ). Snce each L (t) x s dentcally dstrbuted as J x (t) of A whch s descrbed n Lemma 19, due to () and (3), t holds for the P t that P j [s] Pt(, j) 1 and max j [s] P t(, j) φ q. Due to Lemma 1, the expected number of bts collected by A n step t s bounded by b Pj [s] max 1 n P t(, j). By seeng the runnng nstance of the algorthm A wth the nput {x 1, x,..., x n} as the player A of the communcaton game, and the table T S,q as the black-box B wth prvate nput q, Lemma 14 s proved.

Notes on Frequency Estimation in Data Streams

Notes on Frequency Estimation in Data Streams Notes on Frequency Estmaton n Data Streams In (one of) the data streamng model(s), the data s a sequence of arrvals a 1, a 2,..., a m of the form a j = (, v) where s the dentty of the tem and belongs to