Tight Bounds for Distributed Functional Monitoring


David P. Woodruff (IBM Almaden) and Qin Zhang (IBM Almaden)

Abstract

We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008), and receiving recent attention. In this model there are $k$ sites each tracking their input streams and communicating with a central coordinator. The coordinator's task is to continuously maintain an approximate output to a function computed over the union of the $k$ streams. The goal is to minimize the number of bits communicated.

Let the $p$-th frequency moment be defined as $F_p = \sum_i f_i^p$, where $f_i$ is the frequency of element $i$. We show the randomized communication complexity of estimating the number of distinct elements (that is, $F_0$) up to a $1 + \varepsilon$ factor is $\Omega(k/\varepsilon^2)$, improving upon the previous $\Omega(k + 1/\varepsilon^2)$ bound and matching known upper bounds up to a logarithmic factor. For $F_p$, $p > 1$, we improve the previous $\Omega(k + 1/\varepsilon^2)$ bits communication bound to $\Omega(k^{p-1}/\varepsilon^2)$. We obtain similar improvements for heavy hitters, empirical entropy, and other problems. Our lower bounds are the first of any kind in distributed functional monitoring to depend on the product of $k$ and $1/\varepsilon^2$. Moreover, the lower bounds are for the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all $k$ input streams end; surprisingly, they almost match what is achievable in the (dynamic version of the) distributed functional monitoring model, where the coordinator needs to keep track of the function continuously at any time step.

We also show that we can estimate $F_p$, for any $p > 1$, using $\tilde{O}(k^{p-1} \mathrm{poly}(\varepsilon^{-1}))$ bits of communication. This drastically improves upon the previous $\tilde{O}(k^{2p+1} N^{1-2/p} \mathrm{poly}(\varepsilon^{-1}))$ bits bound of Cormode, Muthukrishnan, and Yi for general $p$, and their $\tilde{O}(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ bits bound for $p = 2$. For $p = 2$, our bound resolves their main open question.
Our lower bounds are based on new direct sum theorems for approximate majority, and yield improvements to classical problems in the standard data stream model. First, we improve the known lower bound for estimating $F_p$, $p > 2$, in $t$ passes from $\Omega(n^{1-2/p}/(\varepsilon^{2/p} t))$ to $\Omega(n^{1-2/p}/(\varepsilon^{4/p} t))$, giving the first bound that matches what we expect when $p = 2$ for any constant number of passes. Second, we give the first lower bound for estimating $F_0$ in $t$ passes with $\Omega(1/(\varepsilon^2 t))$ bits of space that does not use the hardness of the gap-hamming problem.

Footnote: Most of this work was done while Qin Zhang was a postdoc in MADALGO (Center for Massive Data Algorithmics, a Center of the Danish National Research Foundation), Aarhus University.

1 Introduction

Recent applications in sensor networks and distributed systems have motivated the distributed functional monitoring model, initiated by Cormode, Muthukrishnan, and Yi [20]. In this model there are $k$ sites and a single central coordinator. Each site $S_i$ ($i \in [k]$) receives a stream of data $A_i(t)$ for timesteps $t = 1, 2, \ldots$, and the coordinator wants to keep track of a function $f$ that is defined over the multiset union of the $k$ data streams at each time $t$. For example, the function $f$ could be the number of distinct elements in the union
of the $k$ streams. We assume that there is a two-way communication channel between each site and the coordinator so that the sites can communicate with the coordinator. The goal is to minimize the total amount of communication between the sites and the coordinator so that the coordinator can approximately maintain $f(A_1(t), \ldots, A_k(t))$ at any time $t$. Minimizing the total communication is motivated by power constraints in sensor networks, since communication typically uses a power-hungry radio [25], and also by network bandwidth constraints in distributed systems. There is a large body of work on monitoring problems in this model, including maintaining a random sample [21, 50], estimating frequency moments [18, 20], finding the heavy hitters [5, 42, 45, 54], approximating the quantiles [19, 35, 54], and estimating the entropy [4].

We can think of the distributed functional monitoring model as follows. Each of the $k$ sites holds an $N$-dimensional vector, where $N$ is the size of the universe. An update to a coordinate $j$ on site $S_i$ causes $v^i_j$ to increase by 1. The goal is to estimate a statistic of $v = \sum_{i=1}^{k} v^i$, such as the $p$-th frequency moment $F_p = \|v\|_p^p$, the number of distinct elements $F_0 = |\mathrm{support}(v)|$, and the empirical entropy $H = \sum_i \frac{v_i}{\|v\|_1} \log \frac{\|v\|_1}{v_i}$. This is the standard insertion-only model. For many of these problems, with the exception of the empirical entropy, there are strong lower bounds (e.g., $\Omega(N)$) if we allow updates to coordinates that cause $v^i_j$ to decrease [4]. The latter is called the update model. Thus, except for entropy, we follow previous work and consider the insertion-only model.

To prove lower bounds, we consider the static version of the distributed functional monitoring model, where the coordinator only needs to compute the function at the time when all $k$ input streams end. It is clear that a lower bound for the static case is also a lower bound for the dynamic case, in which the coordinator has to keep track of the function at any point in time.
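To make these statistics concrete, the following is a minimal Python sketch that merges per-site insertion streams into the global frequency vector $v$ and evaluates $F_p$, $F_0$, and the empirical entropy exactly. It is a centralized illustration only: the helper names (`merge_sites`, `frequency_moment`, `empirical_entropy`) and the toy streams are assumptions made for this example, and nothing here attempts to be communication-efficient.

```python
import math
from collections import Counter


def merge_sites(site_streams):
    """Sum the per-site frequency vectors v^i into the global vector v.

    Each stream is a list of items; every occurrence is one insertion (+1).
    """
    v = Counter()
    for stream in site_streams:
        v.update(stream)
    return v


def frequency_moment(v, p):
    """F_p = sum_j v_j^p; for p = 0 this is the number of distinct elements."""
    if p == 0:
        return sum(1 for c in v.values() if c > 0)
    return sum(c ** p for c in v.values())


def empirical_entropy(v):
    """H = sum_j (v_j / ||v||_1) * log2(||v||_1 / v_j)."""
    total = sum(v.values())
    return sum((c / total) * math.log2(total / c) for c in v.values() if c > 0)


# Hypothetical toy input: k = 3 sites over the universe {a, b, c}.
streams = [["a", "b", "a"], ["b", "c"], ["a"]]
v = merge_sites(streams)
f0 = frequency_moment(v, 0)
f2 = frequency_moment(v, 2)
h = empirical_entropy(v)
```

In the monitoring model the coordinator never sees `v` directly; the question studied in this paper is how many bits the sites must send so that the coordinator can approximate these quantities.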
The static version of the distributed functional monitoring model is closely related to the multiparty number-in-hand communication model, where we again have $k$ sites each holding an $N$-dimensional vector $v^i$, and they want to jointly compute a function defined on the $k$ input vectors. It is easy to see that these two models are essentially the same since in the former, if site $S_i$ would like to send a message to $S_j$, it can always send the message first to the coordinator, and then the coordinator can forward the message to $S_j$. Doing this will only increase the total amount of communication by a factor of two. Therefore, we do not distinguish between these two models in this paper. There are two variants of the multiparty number-in-hand communication model we will consider: the blackboard model, in which each message a site sends is received by all other sites, i.e., it is broadcast, and the message-passing model, in which each message is between the coordinator and a specific site.

Despite the large body of work in the distributed functional monitoring model, the complexity of basic problems is not well understood. For example, for estimating $F_0$ up to a $(1 + \varepsilon)$-factor, the best upper bound is $\tilde{O}(k/\varepsilon^2)$ (see footnote 1) [20] (all communication and information bounds in this paper, if not otherwise stated, are in terms of bits), while the only known lower bound is $\Omega(k + 1/\varepsilon^2)$. The dependence on $\varepsilon$ in the lower bound is not very insightful, as the $\Omega(1/\varepsilon^2)$ bound follows just by considering two sites [4, 16]. The real question is whether the $k$ and $1/\varepsilon^2$ factors should multiply. Even more embarrassingly, for the frequency moments $F_p$, $p > 2$, the known algorithms use communication $\tilde{O}(k^{2p+1} N^{1-2/p} \mathrm{poly}(1/\varepsilon))$, while the only known lower bound is $\Omega(k + 1/\varepsilon^2)$ [4, 16]. Even for $p = 2$, the best known upper bound is $\tilde{O}(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ [20], and the authors' main open question in their paper is: "It remains to close the gap in the $F_2$ case: can a better lower bound than $\Omega(k)$ be shown, or do there exist $\tilde{O}(k \cdot \mathrm{poly}(1/\varepsilon))$ solutions?"
Footnote 1: We use $\tilde{O}(f)$ to denote a function of the form $f \cdot \log^{O(1)}(Nk/\varepsilon)$.

Our Results: We significantly improve the previous communication bounds for approximating the frequency moments, entropy, heavy hitters, and quantiles in the distributed functional monitoring model. In many cases our bounds are optimal. Our results are summarized in Table 1, where they are compared with previous bounds. We have three main results, each introducing a new technique:

1. We show that for estimating $F_0$ in the message-passing model, $\Omega(k/\varepsilon^2)$ communication is required, matching an upper bound of [20] up to a polylogarithmic factor. Our lower bound holds in the static model, in which the $k$ sites just need to approximate $F_0$ once on their inputs.

2. We show that we can estimate $F_p$, for any $p > 1$, using $\tilde{O}(k^{p-1} \mathrm{poly}(\varepsilon^{-1}))$ communication in the message-passing model (see footnote 2). This drastically improves upon the previous bound $\tilde{O}(k^{2p+1} N^{1-2/p} \mathrm{poly}(\varepsilon^{-1}))$ of [20]. In particular, setting $p = 2$, we resolve the main open question of [20].

3. We show $\Omega(k^{p-1}/\varepsilon^2)$ communication is necessary for approximating $F_p$ ($p > 1$) in the blackboard model, significantly improving the prior $\Omega(k + 1/\varepsilon^2)$ bound. As with our lower bound for $F_0$, these are the first lower bounds which depend on the product of $k$ and $1/\varepsilon$. As with $F_0$, our lower bound holds in the static model, in which the sites just approximate $F_p$ once.

Our other results in Table 1 are explained in the body of the paper, and use similar techniques.

| Problem | Previous work: LB | This paper: LB (all static) | Previous work: UB | This paper: UB |
| --- | --- | --- | --- | --- |
| $F_0$ | $\Omega(k)$ [20] | $\Omega(k/\varepsilon^2)$ | $\tilde{O}(k/\varepsilon^2)$ [20] | |
| $F_2$ | $\Omega(k)$ [20] | $\Omega(k/\varepsilon^2)$ (BB) | $\tilde{O}(k^2/\varepsilon + k^{1.5}/\varepsilon^3)$ [20] | $\tilde{O}(k/\mathrm{poly}(\varepsilon))$ |
| $F_p$ ($p > 1$) | $\Omega(k + 1/\varepsilon^2)$ [4, 16] | $\Omega(k^{p-1}/\varepsilon^2)$ (BB) | $\tilde{O}(k^{2p+1} N^{1-2/p}/\varepsilon^{1+2/p})$ [20] | $\tilde{O}(k^{p-1}/\mathrm{poly}(\varepsilon))$ |
| All-quantile | $\Omega(\min\{\sqrt{k}/\varepsilon, 1/\varepsilon^2\})$ [35] | $\Omega(\min\{\sqrt{k}/\varepsilon, 1/\varepsilon^2\})$ (BB) | $\tilde{O}(\min\{\sqrt{k}/\varepsilon, 1/\varepsilon^2\})$ [35] | |
| Heavy Hitters | $\Omega(\min\{\sqrt{k}/\varepsilon, 1/\varepsilon^2\})$ [35] | $\Omega(\min\{\sqrt{k}/\varepsilon, 1/\varepsilon^2\})$ (BB) | $\tilde{O}(\min\{\sqrt{k}/\varepsilon, 1/\varepsilon^2\})$ [35] | |
| Entropy | $\Omega(1/\sqrt{\varepsilon})$ [4] | $\Omega(k/\varepsilon^2)$ (BB) | $\tilde{O}(k/\varepsilon^3)$ [4], $\tilde{O}(k/\varepsilon^2)$ (static) [33] | |
| $\ell_p$ ($p \in (0, 2]$) | | $\Omega(k/\varepsilon^2)$ (BB) | $\tilde{O}(k/\varepsilon^2)$ (static) [40] | |

Table 1: UB denotes upper bound; LB denotes lower bound; BB denotes blackboard model. $N$ denotes the universe size. All bounds are for randomized algorithms. We assume all bounds hold in the dynamic setting by default, and will state explicitly if they hold in the static setting. For lower bounds we assume the message-passing model by default, and state explicitly if they also hold in the blackboard model.
We would like to mention that after the conference version of our paper, our results found applications in proving a space lower bound at each site for tracking heavy hitters in the functional monitoring model [36], and a communication complexity lower bound for computing $\varepsilon$-approximations of range spaces in $\mathbb{R}^2$ in the message-passing model [34].

Footnote 2: We assume the total number of updates is $\mathrm{poly}(N)$.

Footnote 3: In the conference version of this paper we introduced a problem called k-GAP-MAJ, in which sites need to decide if at least $1/(2\varepsilon^2) + 1/\varepsilon$ of the bits are 1, or at most $1/(2\varepsilon^2) - 1/\varepsilon$ of the bits are 1. We instead use k-APPROX-SUM here since we feel it is easier to work with: this problem is stronger than k-GAP-MAJ, thus is easier to lower bound, and it suffices for our purpose. k-GAP-MAJ will be introduced and used in Section 6.1 for heavy hitters and quantiles.

Our Techniques: Lower Bound for $F_0$: For illustration, suppose $k = 1/\varepsilon^2$. There are $1/\varepsilon^2$ sites each holding a random independent bit. Their task is to approximate the sum of the $k$ bits up to an additive error $1/\varepsilon$. Call this problem k-APPROX-SUM (see footnote 3). We show any correct protocol must reveal $\Omega(1/\varepsilon^2)$ bits of information about the sites' inputs. We compose this with 2-party disjointness (2-DISJ) [48], in which each party has a bitstring of length $1/\varepsilon^2$ and either the strings have disjoint support (the solution is 0) or there is a single coordinate which is 1 in both strings (the solution is 1). Let $\tau$ be the hard distribution for 2-DISJ, shown to require $\Omega(1/\varepsilon^2)$ bits of communication to solve [48]. Suppose the coordinator and each site share an instance of 2-DISJ in which the solution to 2-DISJ is a random bit, which is the site's effective input to k-APPROX-SUM. The coordinator has the same input for each of the $1/\varepsilon^2$ instances,
while the sites have an independent input drawn from $\tau$ conditioned on the coordinator's input and output bit determined by k-APPROX-SUM. The inputs are chosen so that if the output of 2-DISJ is 1, then $F_0$ increases by 1; otherwise it remains the same. This is not entirely accurate, but it illustrates the main idea. Now, the key is that by the rectangle property of $k$-party communication protocols, the $1/\varepsilon^2$ different output bits are independent conditioned on the transcript. Thus if a protocol does not reveal $\Omega(1/\varepsilon^2)$ bits of information about these output bits, by an anti-concentration theorem we can show that the protocol cannot succeed with large probability. Finally, since a $(1 + \varepsilon)$-approximation to $F_0$ can decide k-APPROX-SUM, and since any correct protocol for k-APPROX-SUM must reveal $\Omega(1/\varepsilon^2)$ bits of information, the protocol must solve $\Omega(1/\varepsilon^2)$ instances of 2-DISJ, each requiring $\Omega(1/\varepsilon^2)$ bits of communication (otherwise the coordinator could simulate $k - 1$ of the sites and obtain an $o(1/\varepsilon^2)$-communication protocol for 2-DISJ with the remaining site, contradicting the communication lower bound for 2-DISJ on this distribution). We obtain an $\Omega(k/\varepsilon^2)$ bound for $k \ge 1/\varepsilon^2$ by using similar arguments. One cannot show this in the blackboard model since there is an $\tilde{O}(k + 1/\varepsilon^2)$ bound for $F_0$ (see footnote 4).

Lower Bound for $F_p$: Our $\Omega(k^{p-1}/\varepsilon^2)$ bound for $F_p$ cannot use the above reduction since we do not know how to turn a protocol for approximating $F_p$ into a protocol for solving the composition of k-APPROX-SUM and 2-DISJ. Instead, our starting point is a recent $\Omega(1/\varepsilon^2)$ lower bound for the 2-party gap-hamming distance problem GHD [16]. The parties have length-$1/\varepsilon^2$ bitstrings $x$ and $y$, respectively, and they must decide if the Hamming distance $\Delta(x, y) > 1/(2\varepsilon^2) + 1/\varepsilon$ or $\Delta(x, y) < 1/(2\varepsilon^2) - 1/\varepsilon$. A simplification by Sherstov [49] shows a related problem called 2-GAP-ORT also has communication complexity of $\Omega(1/\varepsilon^2)$ bits.
Here there are two parties, each with $1/\varepsilon^2$-length bitstrings $x$ and $y$, and they must decide if $|\Delta(x, y) - 1/(2\varepsilon^2)| > 2/\varepsilon$ or $|\Delta(x, y) - 1/(2\varepsilon^2)| < 1/\varepsilon$. Chakrabarti et al. [15] showed that any correct protocol for 2-GAP-ORT must reveal $\Omega(1/\varepsilon^2)$ bits of information about $(x, y)$. By independence and the chain rule, this means for $\Omega(1/\varepsilon^2)$ indices $i$, $\Omega(1)$ bits of information is revealed about $(x_i, y_i)$ conditioned on values $(x_j, y_j)$ for $j < i$.

We now embed an independent copy of a variant of $k$-party disjointness, the k-XOR problem, on each of the $1/\varepsilon^2$ coordinates of 2-GAP-ORT. In this variant, there are $k$ parties each holding a bitstring of length $k^p$. On all but one special randomly chosen coordinate, there is a single site assigned to the coordinate, and that site uses private randomness to choose whether the value on the coordinate is 0 or 1 (with equal probability), while the remaining $k - 1$ sites have 0 on this coordinate. On the special coordinate, with probability 1/4 all sites have a 0 on this coordinate (a "00" instance), with probability 1/4 the first $k/2$ parties have a 1 on this coordinate and the remaining $k/2$ parties have a 0 (a "10" instance), with probability 1/4 the second $k/2$ parties have a 1 on this coordinate and the remaining $k/2$ parties have a 0 (a "01" instance), and with the remaining probability 1/4 all $k$ parties have a 1 on this coordinate (a "11" instance). We show, via a direct sum for distributional communication complexity, that any deterministic protocol that decides which case the special coordinate is in with probability $1/4 + \Omega(1)$ has conditional information cost $\Omega(k^{p-1})$. This implies that any protocol that can decide whether the output is in the set {10, 01} (the XOR of the output bits) with probability $1/2 + \Omega(1)$ has conditional information cost $\Omega(k^{p-1})$.
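The k-XOR input distribution described above can be sketched as a small sampler. This is an illustration under stated assumptions, not the paper's formal definition: the function names are hypothetical, $p$ is taken to be an integer so that $k^p$ is a valid length, $k$ is taken to be even, and the site assigned to each non-special coordinate is drawn uniformly at random (the text only says a single site is assigned).

```python
import random


def sample_k_xor(k, p, rng):
    """Sample one k-XOR instance.

    Returns (inputs, special, case), where inputs[i] is site i's bitstring
    of length k**p, special is the index of the special coordinate, and
    case = (a, b) gives the bit held by the first and second half of sites
    on that coordinate.
    """
    assert k % 2 == 0, "sketch assumes an even number of sites"
    n = k ** p
    inputs = [[0] * n for _ in range(k)]
    special = rng.randrange(n)

    for j in range(n):
        if j == special:
            continue
        # Assumption: the owning site is chosen uniformly at random.
        owner = rng.randrange(k)
        # The owner picks 0 or 1 with equal probability; all others hold 0.
        inputs[owner][j] = rng.randrange(2)

    # Special coordinate: 00, 10, 01, 11 each with probability 1/4.
    a, b = rng.randrange(2), rng.randrange(2)
    for i in range(k):
        inputs[i][special] = a if i < k // 2 else b
    return inputs, special, (a, b)


def xor_output(case):
    """1 exactly when the special coordinate is a '10' or '01' instance."""
    return case[0] ^ case[1]
```

A call such as `sample_k_xor(4, 2, random.Random(0))` returns four bitstrings of length 16 in which every non-special coordinate has at most one site holding a 1.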
We do the direct sum argument by conditioning the mutual information on low-entropy random variables which allow us to fill in inputs on remaining coordinates without any communication between the parties and without asymptotically affecting our $\Omega(k^{p-1})$ lower bound. We design a reduction so that on the $i$-th coordinate of 2-GAP-ORT, the input of the first $k/2$ players of k-XOR is determined by the public coin (which we condition on) and the first party's input bit to 2-GAP-ORT, and the input of the second $k/2$ players of k-XOR is determined by the public coin and the second party's input bit to 2-GAP-ORT. We show that any protocol that solves the composition of 2-GAP-ORT with $1/\varepsilon^2$ copies of k-XOR, a problem that we call k-BTX, must reveal $\Omega(1)$ bits of information about the two output bits of an $\Omega(1)$ fraction of the $1/\varepsilon^2$ copies, and from our $\Omega(k^{p-1})$ information cost lower bound for a single copy, we can obtain an overall $\Omega(k^{p-1}/\varepsilon^2)$ bound. Finally, one can show that a $(1 + \varepsilon)$-approximation algorithm for $F_p$ can be used to solve k-BTX.

Footnote 4: The idea is to first obtain a 2-approximation. Then, sub-sample so that there are $\Theta(1/\varepsilon^2)$ distinct elements. Then the first party broadcasts his distinct elements, the second party broadcasts the distinct elements he has that the first party does not, etc.

Upper Bound for $F_p$: We illustrate the algorithm for $p = 2$ and constant $\varepsilon$. Unlike [20], we do not use AMS sketches [3]. A nice property of our protocol is that it is the first 1-way protocol (the protocol of [20] is not), in the sense that only the sites send messages to the coordinator (the coordinator does not send any messages). Moreover, all messages are simple: if a site receives an update to the $j$-th coordinate, provided the frequency of coordinate $j$ in its stream exceeds a threshold, it decides with a certain probability to send $j$ to the coordinator. Unfortunately, one can show that this probability cannot be the same for all coordinates $j$, as otherwise the communication would be too large.

To determine the threshold and probability to send an update to a coordinate $j$, the sites use the public coin to randomly group all coordinates $j$ into buckets $S_\ell$, where $S_\ell$ contains a $1/2^\ell$ fraction of the input coordinates. For $j \in S_\ell$, the threshold and probability are only a function of $\ell$. Inspired by work on subsampling [37], we try to estimate the number of coordinates $j$ of magnitude in the range $[2^h, 2^{h+1})$, for each $h$. Call this class of coordinates $C_h$. If the contribution to $F_2$ from $C_h$ is significant, then $|C_h| \approx 2^{-2h} \cdot F_2$, and to estimate $|C_h|$ we only consider those $j \in C_h$ that are in $S_\ell$ for a value $\ell$ which satisfies $|C_h| \cdot 2^{-\ell} \approx 2^{-2h} \cdot F_2 \cdot 2^{-\ell} \approx 1$. We do not know $F_2$ and so we also do not know $\ell$, but we can make a logarithmic number of guesses.
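The subsampling idea behind the buckets and magnitude classes can be illustrated with a short centralized sketch. Everything below is an assumption made for illustration: the buckets are realized as nested levels derived from a shared seed (standing in for the public coin), the helper names are hypothetical, and the per-level thresholds and sending probabilities that make the actual protocol communication-efficient are omitted entirely.

```python
import random


def level_of(j, seed, max_level):
    """Shared-randomness level of coordinate j.

    Coordinate j belongs to bucket `level` for every level <= level_of(j),
    so bucket `level` holds roughly a 1/2**level fraction of coordinates.
    Seeding with a string is deterministic, so every site computes the
    same level for the same (seed, j).
    """
    rng = random.Random(f"{seed}:{j}")
    level = 0
    while level < max_level and rng.random() < 0.5:
        level += 1
    return level


def class_of(freq):
    """Index h of the magnitude class with 2**h <= freq < 2**(h + 1)."""
    return freq.bit_length() - 1


def estimate_class_sizes(freqs, seed, level, max_level=30):
    """Estimate each class size from the coordinates that survive to `level`.

    Counts class members inside the bucket and rescales by 2**level.
    """
    counts = {}
    for j, f in freqs.items():
        if f > 0 and level_of(j, seed, max_level) >= level:
            h = class_of(f)
            counts[h] = counts.get(h, 0) + 1
    return {h: c * 2 ** level for h, c in counts.items()}
```

At `level=0` every coordinate survives and the class sizes are exact; at higher levels only a sample is inspected and the rescaled counts are estimates, which is why the appropriate level depends on the unknown $F_2$ and must be guessed.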
We note that the work [37] was available to the authors of [20] for several years, but adapting it to the distributed framework here is tricky in the sense that the heavy hitters algorithm used in [37] for finding elements in different $C_h$ needs to be implemented in a $k$-party communication-efficient way. When choosing the threshold and probability we have two competing constraints: on the one hand, these values must be chosen so that we can accurately estimate the values $|C_h|$ from the samples. On the other hand, these values need to be chosen so that the communication is not excessive. Balancing these two constraints forces us to use a threshold instead of just the same probability for all coordinates in $S_\ell$. By choosing the thresholds and probabilities to be appropriate functions of $\ell$, we can satisfy both constraints. Other minor issues in the analysis arise from the fact that different classes contribute at different times, and that the coordinator must be correct at all times. These issues can be resolved by conditioning on a quantity related to the protocol's correctness being accurate at a small number of selected times in the stream, and then arguing that the quantity is non-decreasing and that this implies that it is correct at all times.

Implications for the Data Stream Model: In 2003, Indyk and Woodruff introduced the GHD problem [38], where a 1-round lower bound shortly followed [52]. Ever since, it seemed the space complexity of estimating $F_0$ in a data stream with $t > 1$ passes hinged on whether GHD required $\Omega(1/\varepsilon^2)$ communication for $t$ rounds; see, e.g., Question 10 in [2]. A flurry [9, 10, 16, 49, 51] of recent work finally resolved the complexity of GHD. What our lower bound shows for $F_0$ is that this is not the only way to prove the $\Omega(1/\varepsilon^2)$ space bound for multiple passes for $F_0$. Indeed, we just needed to look at $\Theta(1/\varepsilon^2)$ parties instead of 2 parties.
Since we have an $\Omega(1/\varepsilon^4)$ communication lower bound for $F_0$ with $\Theta(1/\varepsilon^2)$ parties, this implies an $\Omega((1/\varepsilon^4)/(t/\varepsilon^2)) = \Omega(1/(t\varepsilon^2))$ bound for $t$-pass algorithms for approximating $F_0$. Arguably our proof is simpler than the recent GHD lower bounds.

Our $\Omega(k^{p-1}/\varepsilon^2)$ bound for $F_p$ also improves a long line of work on the space complexity of estimating $F_p$ for $p > 2$ in a data stream. The current best upper bound is $\tilde{O}(N^{1-2/p} \varepsilon^{-2})$ bits of space [28]. See Figure 1 of [28] for a list of papers which make progress on the $\varepsilon$ and logarithmic factors. The previous best lower bound is $\Omega(N^{1-2/p} \varepsilon^{-2/p}/t)$ for $t$ passes [7]. By setting $k^p = \varepsilon^2 N$, we obtain that the total communication is at least $\Omega(\varepsilon^{2-2/p} N^{1-1/p}/\varepsilon^2)$, and so the implied space lower bound for $t$-pass algorithms for $F_p$ in a
data stream is $\Omega(\varepsilon^{-2/p} N^{1-1/p}/(tk)) = \Omega(N^{1-2/p}/(\varepsilon^{4/p} t))$. This gives the first bound that agrees with the tight $\Theta(1/\varepsilon^2)$ bound when $p = 2$ for any constant $t$. After our work, Ganguly [29] improved this for the special case $t = 1$. That is, for 1-pass algorithms for estimating $F_p$, $p > 2$, he shows a space lower bound of $\Omega(N^{1-2/p}/(\varepsilon^2 \log n))$.

Other Related Work: There are quite a few papers on multiparty number-in-hand communication complexity, though they are not directly relevant for the problems studied in this paper. Alon et al. [3] and Bar-Yossef et al. [7] studied lower bounds for multiparty set-disjointness, which has applications to $p$-th frequency moment estimation for $p > 2$ in the streaming model. Their results were further improved in [14, 31, 39]. Chakrabarti et al. [12] studied random-partition communication lower bounds for multiparty set-disjointness and pointer jumping, which have a number of applications in the random-order data stream model. Other work includes Chakrabarti et al. [13] for median selection, and Magniez et al. [44] and Chakrabarti et al. [11] for streaming language recognition. Very few studies have been conducted in the message-passing model. Duris and Rolim [23] proved several lower bounds in the message-passing model, but only for some simple boolean functions. Three related but more restrictive private-message models were studied by Gal and Gopalan [27], Ergün and Jowhari [24], and Guha and Huang [32]. The first two only investigated deterministic protocols and the third was tailored for the random-order data stream model.

Recently Phillips et al. [47] introduced a technique called symmetrization for the number-in-hand communication model. The idea is to try to find a symmetric hard distribution for the $k$ players. Then one reduces the $k$-player problem to a 2-player problem by assigning Alice the input of a random player and Bob the inputs of the remaining $k - 1$ players. The answer to the $k$-player problem gives the answer to the 2-player problem.
By symmetrization one can argue that if the communication lower bound for the resulting 2-player problem is $L$, then the lower bound for the $k$-player problem is $\Omega(kL)$. While symmetrization as developed in [47] can be used to solve some problems for which other techniques are not known, such as bitwise AND/OR and graph connectivity, it has several limitations. First, symmetrization requires a symmetric hard distribution, and for many problems (e.g., $F_p$ ($p > 1$) in this paper) this is not known or unlikely to exist. Second, for many problems (e.g., $F_0$ in this paper), we need a direct-sum type of argument with certain combining functions (e.g., the majority (MAJ)), while in [47], only outputting all copies or with the combining function OR is considered. Third, the symmetrization technique in [47] does not give information cost bounds, and so it is difficult to use when composing problems as is done in this paper. In this paper, we have further developed symmetrization to make it work with the combining function MAJ and the information cost.

Paper Outline: In Section 3 and Section 4 we prove our lower bounds for $F_0$ and $F_p$, $p > 1$. The lower bounds apply to functional monitoring, but hold even in the static model. In Section 5 we show improved upper bounds for $F_p$, $p > 1$, for functional monitoring. Finally, in Section 6 we prove lower bounds for all-quantile, heavy hitters, entropy and $\ell_p$ for any $p \ge 1$ in the blackboard model.

2 Preliminaries

In this section we review some basics on communication complexity and information theory.

Information Theory

We refer the reader to [22] for a comprehensive introduction to information theory. Here we review a few concepts and notations. Let $H(X)$ denote the Shannon entropy of the random variable $X$, and let $H_b(p)$ denote the binary entropy function when $p \in [0, 1]$. Let $H(X \mid Y)$ denote the conditional entropy of $X$ given $Y$. Let $I(X; Y)$ denote the mutual information between two random variables $X, Y$. Let $I(X; Y \mid Z)$ denote the mutual
information between two random variables $X, Y$ conditioned on $Z$. The following is a summary of the basic properties of entropy and mutual information that we need.

Proposition 1 Let $X, Y, Z, W$ be random variables.

1. If $X$ takes value in $\{1, 2, \ldots, m\}$, then $H(X) \in [0, \log m]$.

2. $H(X) \ge H(X \mid Y)$ and $I(X; Y) = H(X) - H(X \mid Y) \ge 0$.

3. If $X$ and $Z$ are independent, then we have $I(X; Y \mid Z) \ge I(X; Y)$. Similarly, if $X, Z$ are independent given $W$, then $I(X; Y \mid Z, W) \ge I(X; Y \mid W)$.

4. (Chain rule of mutual information) $I(X, Y; Z) = I(X; Z) + I(Y; Z \mid X)$. And in general, for any random variables $X_1, X_2, \ldots, X_n, Y$, $I(X_1, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_1, \ldots, X_{i-1})$. Thus, $I(X, Y; Z \mid W) \ge I(X; Z \mid W)$.

5. (Data processing inequality) If $X$ and $Z$ are conditionally independent given $Y$, then $I(X; Y \mid Z, W) \le I(X; Y \mid W)$.

6. (Fano's inequality) Let $X$ be a random variable chosen from domain $\mathcal{X}$ according to distribution $\mu_X$, and $Y$ be a random variable chosen from domain $\mathcal{Y}$ according to distribution $\mu_Y$. For any reconstruction function $g : \mathcal{Y} \to \mathcal{X}$ with error $\delta_g$, $H_b(\delta_g) + \delta_g \log(|\mathcal{X}| - 1) \ge H(X \mid Y)$.

7. (The Maximum Likelihood Estimation principle) With the notations as in Fano's inequality, if the (deterministic) reconstruction function is $g(y) = x$ for the $x$ that maximizes the conditional probability $\mu_X(x \mid Y = y)$, then $\delta_g \le 1 - 2^{-H(X \mid Y)}$. Call this $g$ the maximum likelihood function.

Communication complexity

In the two-party randomized communication complexity model (see e.g., [43]), we have two players Alice and Bob. Alice is given $x \in \mathcal{X}$ and Bob is given $y \in \mathcal{Y}$, and they want to jointly compute a function $f(x, y)$ by exchanging messages according to a protocol $\Pi$. Let $\Pi(x, y)$ denote the message transcript when Alice and Bob run protocol $\Pi$ on input pair $(x, y)$. We sometimes abuse notation by identifying the protocol and the corresponding random transcript, as long as there is no confusion. The communication complexity of a protocol is defined as the maximum number of bits exchanged among all pairs of inputs.
We say a protocol $\Pi$ computes $f$ with error probability $\delta$ ($0 \le \delta \le 1$) if there exists a function $g$ such that for all input pairs $(x, y)$, $\Pr[g(\Pi(x, y)) \ne f(x, y)] \le \delta$. The $\delta$-error randomized communication complexity, denoted by $R^{\delta}(f)$, is the cost of the minimum-communication randomized protocol that computes $f$ with error probability $\delta$. The $(\mu, \delta)$-distributional communication complexity of $f$, denoted by $D^{\delta}_{\mu}(f)$, is the cost of the minimum-communication deterministic protocol that gives the correct answer for $f$ on at least a $1 - \delta$ fraction of all input pairs, weighted by distribution $\mu$. Yao [53] showed that

Lemma 1 (Yao's Lemma) $R^{\delta}(f) \ge \max_{\mu} D^{\delta}_{\mu}(f)$.

Thus, one way to prove a lower bound for randomized protocols is to find a hard distribution $\mu$ and lower bound $D^{\delta}_{\mu}(f)$. This is called Yao's Minimax Principle.

We will use the notion of expected distributional communication complexity $ED^{\delta}_{\mu}(f)$, which was introduced in [47] (where it was written as $E[D^{\delta}_{\mu}(f)]$, with a slight abuse of notation) and is defined to be the expected cost (rather than the worst-case cost) of the deterministic protocol that gives the correct answer for $f$ on at least a $1 - \delta$ fraction of all inputs, where the expectation is taken over distribution $\mu$.

The definitions for two-party protocols can be easily extended to the multiparty setting, where we have $k$ players and the $i$-th player is given an input $x_i \in \mathcal{X}_i$. Again the $k$ players want to jointly compute a function $f(x_1, x_2, \ldots, x_k)$ by exchanging messages according to a protocol $\Pi$.

Information complexity

Information complexity was introduced in a series of papers including [7, 17]. We refer the reader to Bar-Yossef's Thesis [6]; see Chapter 6 for a detailed introduction. Here we briefly review the concepts of information cost and conditional information cost for $k$-player communication problems. All of them are defined in the blackboard number-in-hand model.

Let $\mu$ be an input distribution on $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_k$ and let $X$ be a random input chosen from $\mu$. Let $\Pi$ be a randomized protocol running on inputs in $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_k$. The information cost of $\Pi$ with respect to $\mu$ is $I(X; \Pi)$. The information complexity of a problem $f$ with respect to a distribution $\mu$ and error parameter $\delta$ ($0 \le \delta \le 1$), denoted $IC^{\delta}_{\mu}(f)$, is the minimum information cost of a $\delta$-error protocol for $f$ with respect to $\mu$. We will work in the public coin model, in which all parties also share a common source of randomness.

We say a distribution $\lambda$ partitions $\mu$ if conditioned on $\lambda$, $\mu$ is a product distribution. Let $X$ be a random input chosen from $\mu$ and $D$ be a random variable chosen from $\lambda$. For a randomized protocol $\Pi$ on $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_k$, the conditional information cost of $\Pi$ with respect to the distribution $\mu$ on $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_k$ and a distribution $\lambda$ partitioning $\mu$ is defined as $I(X; \Pi \mid D)$. The conditional information complexity of a problem $f$ with respect to a distribution $\mu$, a distribution $\lambda$ partitioning $\mu$, and error parameter $\delta$ ($0 \le \delta \le 1$), denoted $IC^{\delta}_{\mu}(f \mid \lambda)$, is the minimum conditional information cost of a $\delta$-error protocol for $f$ with respect to $\mu$ and $\lambda$. The following proposition can be found in [7].

Proposition 2 For any distribution $\mu$, distribution $\lambda$ partitioning $\mu$, and error parameter $\delta$ ($0 \le \delta \le 1$), $R^{\delta}(f) \ge IC^{\delta}_{\mu}(f) \ge IC^{\delta}_{\mu}(f \mid \lambda)$.

Statistical distance measures

Given two probability distributions $\mu$ and $\nu$ over the same space $\mathcal{X}$, the following statistical distance measures will be used in this paper:

1. Total variation distance: $\mathrm{TV}(\mu, \nu) \stackrel{\mathrm{def}}{=} \max_{A \subseteq \mathcal{X}} |\mu(A) - \nu(A)|$.

2. Hellinger distance: $h(\mu, \nu) \stackrel{\mathrm{def}}{=} \sqrt{\frac{1}{2} \sum_{x \in \mathcal{X}} \left( \sqrt{\mu(x)} - \sqrt{\nu(x)} \right)^2}$.

We have the following relation between total variation distance and Hellinger distance (cf. [6], Chapter 2).

Proposition 3 $h^2(\mu, \nu) \le \mathrm{TV}(\mu, \nu) \le h(\mu, \nu) \sqrt{2 - h^2(\mu, \nu)}$.

The total variation distance of transcripts on a pair of inputs is closely related to the error of a randomized protocol. The following proposition can be found in [6], Proposition 6.22 (the original proposition is for the 2-party case, and generalizing it to the multiparty case is straightforward).

Proposition 4 Let $0 < \delta < 1/2$, and $\Pi$ be a $\delta$-error randomized protocol for a function $f : \mathcal{X}_1 \times \cdots \times \mathcal{X}_k \to \mathcal{Z}$. Then, for every two inputs $(x_1, \ldots, x_k), (x'_1, \ldots, x'_k) \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_k$ for which $f(x_1, \ldots, x_k) \ne f(x'_1, \ldots, x'_k)$, it holds that $\mathrm{TV}(\Pi_{x_1, \ldots, x_k}, \Pi_{x'_1, \ldots, x'_k}) > 1 - 2\delta$.

Conventions. In the rest of the paper we call a player a site, to be consistent with the distributed functional monitoring model. We denote $[n] = \{1, \ldots, n\}$. Let $\oplus$ be the XOR function. All logarithms are base-2 unless noted otherwise. We say $\tilde{W}$ is a $(1 + \varepsilon)$-approximation of $W$, $0 < \varepsilon < 1$, if $W \le \tilde{W} \le (1 + \varepsilon) W$.

3 A Lower Bound for $F_0$

We introduce a problem called k-APPROX-SUM, and then compose it with 2-DISJ (studied, e.g., in [48]) to prove a lower bound for $F_0$. In this section we work in the message-passing model.

3.1 The k-APPROX-SUM Problem

In the k-APPROX-SUM$_{f,\tau}$ problem, we have $k$ sites $S_1, S_2, \ldots, S_k$ and the coordinator. Let $f : \mathcal{X} \times \mathcal{Y} \to \{0, 1\}$ be an arbitrary function, and let $\tau$ be an arbitrary distribution on $\mathcal{X} \times \mathcal{Y}$ such that for $(X, Y) \sim \tau$, $f(X, Y) = 1$ with probability $\beta$, and 0 with probability $1 - \beta$, where $\beta$ ($c_\beta/k \le \beta \le 1/c_\beta$ for a sufficiently large constant $c_\beta$) is a parameter. We define the input distribution $\mu$ for k-APPROX-SUM$_{f,\tau}$ on $\{X_1, \ldots, X_k, Y\} \in \mathcal{X}^k \times \mathcal{Y}$ as follows: We first sample $(X_1, Y) \sim \tau$, and then independently sample $X_2, \ldots, X_k \sim \tau \mid Y$. Note that each pair $(X_i, Y)$ is distributed according to $\tau$. Let $Z_i = f(X_i, Y)$. Thus the $Z_i$'s are i.i.d. Bernoulli($\beta$). Let $Z = \{Z_1, Z_2, \ldots, Z_k\}$. We assign $X_i$ to site $S_i$ for each $i \in [k]$, and assign $Y$ to the coordinator.

In the k-APPROX-SUM$_{f,\tau}$ problem, the $k$ sites want to approximate $\sum_{i \in [k]} Z_i$ up to an additive factor of $\sqrt{\beta k}$. In the rest of this section, for convenience, we omit the subscripts $f, \tau$ in k-APPROX-SUM$_{f,\tau}$, since our results will hold for all $f, \tau$ having the properties mentioned above.

For a fixed transcript $\Pi = \pi$, let $q^{\pi}_i = \Pr[Z_i = 1 \mid \Pi = \pi]$. Thus $\sum_{i \in [k]} q^{\pi}_i = E[\sum_{i \in [k]} Z_i \mid \Pi = \pi]$. Let $c_0$ be a sufficiently large constant.
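As an illustration of the input distribution $\mu$, here is a small Python sketch with one concrete toy choice of $f$ and $\tau$. The choice is an assumption made only for this example: $Y$ is a uniform bit, each $X_i$ equals $Y$ XOR an independent Bernoulli($\beta$) bit, and $f(x, y) = x \oplus y$, so that each $Z_i$ is Bernoulli($\beta$) regardless of $Y$. The additive error tolerance is passed in as a parameter rather than fixed by the code.

```python
import math
import random


def f(x, y):
    """Toy choice of f: the XOR of the site's bit and the coordinator's bit."""
    return x ^ y


def sample_instance(k, beta, rng):
    """Sample (X_1, ..., X_k, Y) from the toy distribution.

    Y is a uniform bit; given Y, the X_i are independent and
    f(X_i, Y) = 1 with probability beta.
    """
    y = rng.randrange(2)
    xs = [y ^ (1 if rng.random() < beta else 0) for _ in range(k)]
    return xs, y


def is_acceptable(estimate, xs, y, tolerance):
    """True if the estimate is within the additive tolerance of sum_i Z_i."""
    z_sum = sum(f(x, y) for x in xs)
    return abs(estimate - z_sum) <= tolerance
```

For example, with `k = 10000` and `beta = 0.25` the sum of the `Z_i` concentrates around 2500 with standard deviation about 43, which is of the same order as a tolerance of `math.sqrt(beta * k) = 50`; this is what makes the approximation task non-trivial.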
Definition 1. Given an input $(x_1, \ldots, x_k, y)$ and a transcript $\Pi = \pi$, let $z_i = f(x_i, y)$ and $z = \{z_1, \ldots, z_k\}$. For convenience, we define $\Pi(z) \triangleq \Pi(x_1, \ldots, x_k, y)$. We say

1. $\pi$ is bad$_1$ for $z$ (denoted by $z \perp_1 \pi$) if $\Pi(z) = \pi$, and for at least a 0.1 fraction of $\{i \in [k] \mid z_i = 1\}$, it holds that $q_i^\pi \le \beta / c_0$, and
2. $\pi$ is bad$_0$ for $z$ (denoted by $z \perp_0 \pi$) if $\Pi(z) = \pi$, and for at least a 0.1 fraction of $\{i \in [k] \mid z_i = 0\}$, it holds that $q_i^\pi \ge \beta / c_0$.

And $\pi$ is good for $z$ otherwise.

In this section, we will prove the following theorem. Unless stated explicitly, all probabilities, expectations and variances are taken with respect to the input distribution $\mu$.

Theorem 1. Let $\Pi$ be the transcript of any deterministic protocol for k-approx-sum on input distribution $\mu$ with error probability $\delta$ for some sufficiently small constant $\delta$. Then $\Pr[\Pi \text{ is good}] \ge 0.96$.

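The following observation, which easily follows from the rectangle property of communication protocols, is crucial to our proof. We have included a proof in Appendix A.

Observation 1. Conditioned on $\Pi$, $Z_1, Z_2, \ldots, Z_k$ are independent.

Definition 2. We say a transcript $\pi$ is rare$^+$ if $\sum_{i \in [k]} q_i^\pi \ge 4\beta k$ and rare$^-$ if $\sum_{i \in [k]} q_i^\pi \le \beta k / 4$. In both cases we say $\pi$ is rare. Otherwise we say it is normal.

Definition 3. We say $Z = \{Z_1, Z_2, \ldots, Z_k\}$ is a joker$^+$ if $\sum_{i \in [k]} Z_i \ge 2\beta k$, and a joker$^-$ if $\sum_{i \in [k]} Z_i \le \beta k / 2$. In both cases we say $Z$ is a joker.

Lemma 2. Under the assumption of Theorem 1, $\Pr[\Pi \text{ is normal}] \ge 0.99$.

Proof: First, we can apply a Chernoff bound on the random variables $Z_1, \ldots, Z_k$, and get

$$\Pr[Z \text{ is a joker}^+] = \Pr\Big[\sum_{i \in [k]} Z_i \ge 2\beta k\Big] \le e^{-\beta k / 3}.$$

Second, by Observation 1, we can apply a Chernoff bound on the random variables $Z_1, \ldots, Z_k$ conditioned on $\Pi$ being rare$^+$:

$$\begin{aligned}
\Pr[Z \text{ is a joker}^+ \mid \Pi \text{ is rare}^+]
&= \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is rare}^+] \cdot \Pr[Z \text{ is a joker}^+ \mid \Pi = \pi, \Pi \text{ is rare}^+] \\
&= \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is rare}^+] \cdot \Pr\Big[\sum_{i \in [k]} Z_i \ge 2\beta k \,\Big|\, \sum_{i \in [k]} q_i^\pi \ge 4\beta k, \Pi = \pi\Big] \\
&\ge \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is rare}^+] \cdot \left(1 - e^{-\beta k / 2}\right) \\
&\ge 1 - e^{-\beta k / 2}.
\end{aligned}$$

Finally, by Bayes' theorem, we have that

$$\Pr[\Pi \text{ is rare}^+] = \frac{\Pr[Z \text{ is a joker}^+] \cdot \Pr[\Pi \text{ is rare}^+ \mid Z \text{ is a joker}^+]}{\Pr[Z \text{ is a joker}^+ \mid \Pi \text{ is rare}^+]} \le \frac{e^{-\beta k / 3}}{1 - e^{-\beta k / 2}} \le 2 e^{-\beta k / 3}.$$

Similarly, we can also show that $\Pr[\Pi \text{ is rare}^-] \le 2 e^{-\beta k / 8}$. Therefore $\Pr[\Pi \text{ is rare}] \le 4 e^{-\beta k / 8} \le 0.01$ (recall that by our assumption $\beta k \ge c_\beta$ for a sufficiently large constant $c_\beta$).

Definition 4. Let $c_\ell = 40 c_0$. We say a transcript $\pi$ is weak if $\sum_{i \in [k]} q_i^\pi (1 - q_i^\pi) \ge \beta k / c_\ell$, and strong otherwise.

Lemma 3. Under the assumption of Theorem 1, $\Pr[\Pi \text{ is normal and strong}] \ge 0.98$.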

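Proof: We first show that for a normal and weak transcript $\pi$, there exists a constant $\delta_\ell = \delta_\ell(c_\ell)$ such that

$$\Pr\Big[\sum_{i \in [k]} Z_i \le \sum_{i \in [k]} q_i^\pi + 2\sqrt{\beta k} \,\Big|\, \Pi = \pi\Big] \ge \delta_\ell, \qquad (1)$$

and

$$\Pr\Big[\sum_{i \in [k]} Z_i \ge \sum_{i \in [k]} q_i^\pi + 4\sqrt{\beta k} \,\Big|\, \Pi = \pi\Big] \ge \delta_\ell. \qquad (2)$$

The first inequality is a simple application of the Chernoff-Hoeffding bound. Recall that for a normal $\pi$, $\sum_{i \in [k]} q_i^\pi \le 4\beta k$. We have

$$\begin{aligned}
\Pr\Big[\sum_{i \in [k]} Z_i \le \sum_{i \in [k]} q_i^\pi + 2\sqrt{\beta k} \,\Big|\, \Pi = \pi, \Pi \text{ is normal}\Big]
&\ge 1 - \Pr\Big[\sum_{i \in [k]} Z_i \ge \sum_{i \in [k]} q_i^\pi + 2\sqrt{\beta k} \,\Big|\, \Pi = \pi, \Pi \text{ is normal}\Big] \\
&\ge 1 - e^{-8 (\sqrt{\beta k})^2 / \sum_{i \in [k]} q_i^\pi} \\
&\ge 1 - e^{-2} \\
&\ge \delta_\ell \quad \text{(for a sufficiently small constant } \delta_\ell\text{)}.
\end{aligned}$$

Now we prove the second inequality. We will need the following anti-concentration result, which is an easy consequence of Feller [26] (cf. [46]).

Fact 1 ([46]). Let $Y$ be a sum of independent random variables, each attaining values in $[0, 1]$, and let $\sigma = \sqrt{\mathbf{Var}[Y]} \ge 200$. Then for all $t \in [0, \sigma^2 / 100]$, we have

$$\Pr[Y \ge \mathbf{E}[Y] + t] \ge c \cdot e^{-t^2 / (3\sigma^2)}$$

for a universal constant $c > 0$.

For a normal and weak $\Pi = \pi$, it holds that

$$\begin{aligned}
\mathbf{Var}\Big[\sum_{i \in [k]} Z_i \,\Big|\, \Pi = \pi\Big]
&= \sum_{i \in [k]} \mathbf{Var}[Z_i \mid \Pi = \pi] \quad \text{(by Observation 1)} \\
&= \sum_{i \in [k]} q_i^\pi (1 - q_i^\pi) \\
&\ge \beta k / c_\ell \quad \text{(by definition of a weak } \pi\text{)}.
\end{aligned}$$

Recall that by our assumption, $\beta k \ge c_\beta$ for a sufficiently large constant $c_\beta$, thus $4\sqrt{\beta k} \le \beta k / (100 c_\ell)$ and $\sqrt{\beta k / c_\ell} \ge 200$, so the conditions of Fact 1 are met with $t = 4\sqrt{\beta k}$. Using Fact 1, we have for a universal constant $c$,

$$\begin{aligned}
\Pr\Big[\sum_{i \in [k]} Z_i \ge \sum_{i \in [k]} q_i^\pi + 4\sqrt{\beta k} \,\Big|\, \Pi = \pi, \Pi \text{ is weak}\Big]
&\ge c \cdot e^{-(4\sqrt{\beta k})^2 / (3 \beta k / c_\ell)} \\
&\ge c \cdot e^{-16 c_\ell / 3} \\
&\ge \delta_\ell \quad \text{(for a sufficiently small constant } \delta_\ell\text{)}.
\end{aligned}$$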

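By (1) and (2), it is easy to see that given that $\Pi$ is normal, it cannot be weak with probability more than 0.01, since otherwise by Lemma 2 and the analysis above, the error probability of the protocol will be at least $0.99 \cdot 0.01 \cdot \delta_\ell > \delta$, for an arbitrarily small constant error $\delta$, violating the success guarantee of the lemma. Therefore,

$$\Pr[\Pi \text{ is normal and strong}] \ge \Pr[\Pi \text{ is normal}] \cdot \Pr[\Pi \text{ is strong} \mid \Pi \text{ is normal}] \ge 0.99 \cdot 0.99 \ge 0.98.$$

Now we analyze the probability of $\Pi$ being good. For a $Z = z$, let $H_0(z) = \{i \mid z_i = 0\}$ and $H_1(z) = \{i \mid z_i = 1\}$. We have the following two lemmas.

Lemma 4. Under the assumption of Theorem 1, $\Pr[\Pi \text{ is bad}_0 \mid \Pi \text{ is normal and strong}] \le 0.01$.

Proof: Consider any $Z = z$. First, by the definition of a normal $\pi$, we have $\sum_{i : z_i = 0} q_i^\pi \le \sum_{i \in [k]} q_i^\pi \le 4\beta k$. Therefore the number of $i$'s such that $z_i = 0$ and $q_i^\pi > (1 - \beta / c_0)$ is at most $4\beta k / (1 - \beta / c_0) \le 8\beta k$. Second, by the definition of a strong $\pi$, we have $\sum_{i : z_i = 0} q_i^\pi (1 - q_i^\pi) \le \sum_{i \in [k]} q_i^\pi (1 - q_i^\pi) < \beta k / c_\ell$. Therefore the number of $i$'s such that $z_i = 0$ and $\beta / c_0 \le q_i^\pi \le (1 - \beta / c_0)$ is at most

$$\frac{\beta k / c_\ell}{(\beta / c_0)(1 - \beta / c_0)} \le 0.05 k \quad (c_\ell = 40 c_0).$$

Also note that if $z$ is not a joker, then $|H_0(z)| \ge (k - 2\beta k)$. Thus conditioned on a normal and strong $\pi$, as well as $z$ not being a joker, the number of $i$'s such that $z_i = 0$ and $q_i^\pi < \beta / c_0$ is at least

$$(k - 2\beta k) - 8\beta k - 0.05 k > 0.9 k \ge 0.9 |H_0(z)|,$$

where we have used our assumption that $\beta \le 1 / c_\beta$ for a sufficiently large constant $c_\beta$. We conclude that

$$\Pr[\Pi \text{ is bad}_0 \mid \Pi \text{ is normal and strong}] \le \Pr[Z \text{ is a joker}] \le 2 e^{-\beta k / 8} \le 0.01.$$

Lemma 5. Under the assumption of Theorem 1, $\Pr[\Pi \text{ is bad}_1 \mid \Pi \text{ is normal}] \le 0.01$.

Proof: Call a $\pi$ bad$_1$ for a set $T \subseteq [k]$ (denoted by $T \perp_1 \pi$) if for more than a 0.1 fraction of $i \in T$, we have $q_i^\pi \le \beta / c_0$. Let $\chi(E) = 1$ if $E$ holds and $\chi(E) = 0$ otherwise. We have

$$\begin{aligned}
&\Pr[\Pi \text{ is bad}_1 \mid \Pi \text{ is normal}] \\
&\quad = \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is normal}] \sum_{z} \Pr[Z = z \mid \Pi = \pi, \Pi \text{ is normal}] \, \chi(z \perp_1 \pi) \\
&\quad \le \Pr[Z \text{ is a joker}] + \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is normal}] \sum_{\ell \in [\beta k / 2, 2\beta k]} \; \sum_{T \subseteq [k] : |T| = \ell} \; \sum_{z} \Pr[Z = z \mid \Pi = \pi, \Pi \text{ is normal}] \, \chi(H_1(z) = T) \, \chi(T \perp_1 \pi) \qquad (3) \\
&\quad \le \Pr[Z \text{ is a joker}] + \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is normal}] \sum_{\ell \in [\beta k / 2, 2\beta k]} \; \sum_{\substack{T \subseteq [k] : |T| = \ell \\ T \perp_1 \pi}} \; \prod_{i \in T} q_i^\pi. \qquad (4)
\end{aligned}$$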

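The last inequality holds since in (4), in the last term, we count, for each possible set $T$ of size $\ell$ that is $\perp_1$ to $\pi$, the probability that its elements are all 1, which upper bounds the corresponding summation in (3).

Now for a fixed $\ell$, conditioned on a normal $\pi$, we consider the term

$$\sum_{\substack{T \subseteq [k] : |T| = \ell \\ T \perp_1 \pi}} \; \prod_{i \in T} q_i^\pi. \qquad (5)$$

W.l.o.g., we can assume that $q_1^\pi \ge \cdots \ge q_s^\pi > \beta / c_0 \ge q_{s+1}^\pi \ge \cdots \ge q_k^\pi$ for an $s = \kappa_s k$ ($0 < \kappa_s \le 1$). We consider a pair $(q_u^\pi, q_v^\pi)$ ($u, v \in [k]$). Terms in the summation (5) that include either $q_u^\pi$ or $q_v^\pi$ can be written as

$$q_u^\pi \sum_{\substack{T \subseteq [k] : |T| = \ell \\ T \perp_1 \pi \\ u \in T, v \notin T}} \; \prod_{i \in T \setminus u} q_i^\pi
\; + \; q_v^\pi \sum_{\substack{T \subseteq [k] : |T| = \ell \\ T \perp_1 \pi \\ v \in T, u \notin T}} \; \prod_{i \in T \setminus v} q_i^\pi
\; + \; q_u^\pi q_v^\pi \sum_{\substack{T \subseteq [k] : |T| = \ell \\ T \perp_1 \pi \\ v \in T, u \in T}} \; \prod_{i \in T \setminus \{v, u\}} q_i^\pi.$$

By the symmetry of $q_u^\pi, q_v^\pi$, the sets $\{T \setminus u \mid T \subseteq [k], |T| = \ell, T \perp_1 \pi, u \in T, v \notin T\}$ and $\{T \setminus v \mid T \subseteq [k], |T| = \ell, T \perp_1 \pi, v \in T, u \notin T\}$ are the same. Using this fact and the AM-GM inequality, it is easy to see that the sum will not decrease if we set $(q_u^\pi)' = (q_v^\pi)' = (q_u^\pi + q_v^\pi) / 2$. Call such an operation an equalization.

We repeatedly apply such equalizations to any pair $(q_u^\pi, q_v^\pi)$, with the constraint that if $u \in [1, s]$ and $v \in [s + 1, k]$, then we only average them to the extent that $(q_u^\pi)' = \beta / c_0$, $(q_v^\pi)' = q_u^\pi + q_v^\pi - \beta / c_0$ if $q_u^\pi + q_v^\pi \le 2\beta / c_0$, and $(q_v^\pi)' = \beta / c_0$, $(q_u^\pi)' = q_u^\pi + q_v^\pi - \beta / c_0$ otherwise. We introduce this constraint because we do not want to change $|\{i \mid (q_i^\pi)' \le \beta / c_0\}|$, since otherwise a set $T$ which was originally $\perp_1 \pi$ may no longer be $\perp_1 \pi$ after these equalizations. We cannot further apply equalizations when one of the following happens.

$$(q_1^\pi)' = \cdots = (q_s^\pi)' > \beta / c_0 = (q_{s+1}^\pi)' = \cdots = (q_k^\pi)'. \qquad (6)$$

$$(q_1^\pi)' = \cdots = (q_s^\pi)' = \beta / c_0 \ge (q_{s+1}^\pi)' = \cdots = (q_k^\pi)'. \qquad (7)$$

We note that actually (7) cannot happen, since $\sum_{i \in [k]} (q_i^\pi)' = \sum_{i \in [k]} q_i^\pi$ is preserved during equalizations, and conditioned on a normal $\pi$, we have $\sum_{i \in [k]} q_i^\pi \ge \beta k / 4 > \beta k / c_0$.

Let $q = (q_1^\pi)' = \cdots = (q_s^\pi)'$. For a normal $\pi$, it holds that $\sum_{i \in [k]} (q_i^\pi)' = s \cdot q + (k - s) \cdot \beta / c_0 = r \in [\beta k / 4, 4\beta k]$. Let $\alpha \in (0.1, 1]$. Recall that $\ell \in [\beta k / 2, 2\beta k]$, and we have set $s = \kappa_s k$. We try to upper bound (5) using (6).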
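$$\begin{aligned}
\sum_{\substack{T \subseteq [k] : |T| = \ell \\ T \perp_1 \pi}} \; \prod_{i \in T} q_i^\pi
&\le \left( \binom{k - s}{\alpha \ell} \binom{s}{(1 - \alpha) \ell} \right) \cdot \left( \left( \frac{\beta}{c_0} \right)^{\alpha \ell} \left( \frac{r - (k - s) \beta / c_0}{s} \right)^{(1 - \alpha) \ell} \right) \qquad (8) \\
&\le \left( \left( \frac{e (1 - \kappa_s) k}{\alpha \ell} \right)^{\alpha \ell} \left( \frac{e \kappa_s k}{(1 - \alpha) \ell} \right)^{(1 - \alpha) \ell} \right) \cdot \left( \left( \frac{\beta}{c_0} \right)^{\alpha \ell} \left( \frac{r}{\kappa_s k} \right)^{(1 - \alpha) \ell} \right) \\
&\le \left( \frac{e \beta k}{\alpha c_0 \ell} \right)^{\alpha \ell} \left( \frac{e r}{(1 - \alpha) \ell} \right)^{(1 - \alpha) \ell} \\
&\le \left( \frac{8e}{(c_0)^{\alpha} \, \alpha^{\alpha} (1 - \alpha)^{1 - \alpha}} \right)^{\ell} \\
&\le \left( \frac{8e}{(c_0)^{0.1} \, (1/e)^{2/e}} \right)^{\beta k / 2}. \qquad (9)
\end{aligned}$$

In (8), the first term is the number of possible choices of the set $T$ ($|T| = \ell$) with an $\alpha$ fraction of items in $[s + 1, k]$, and the rest in $[1, s]$. And the second term upper bounds $\prod_{i \in T} q_i^\pi$ according to the discussion above. Here we have assumed $\alpha < 1$; otherwise, if $\alpha = 1$, then (8) $\le \binom{k}{\ell} (\beta / c_0)^{\ell} \le (2e / c_0)^{\beta k / 2}$, which is smaller than (9).

Now, (4) can be upper bounded by

$$\begin{aligned}
& 2 e^{-\beta k / 8} + \sum_{\pi} \Pr[\Pi = \pi \mid \Pi \text{ is normal}] \cdot 2\beta k \cdot \left( \frac{8e}{(c_0)^{0.1} \, (1/e)^{2/e}} \right)^{\beta k / 2} \\
&\quad = 2 e^{-\beta k / 8} + 2\beta k \cdot \left( \frac{8e}{(c_0)^{0.1} \, (1/e)^{2/e}} \right)^{\beta k / 2} \\
&\quad \le 0.01 \quad \text{(for a sufficiently large constant } c_0\text{)}.
\end{aligned}$$

Finally, combining Lemma 3, Lemma 4 and Lemma 5, we get

$$\begin{aligned}
\Pr[\Pi \text{ is good}]
&\ge \Pr[\Pi \text{ is good, normal and strong}] \\
&\ge \Pr[\Pi \text{ is normal and strong}] \cdot \big(1 - \Pr[\Pi \text{ is bad}_0 \mid \Pi \text{ is normal and strong}] - \Pr[\Pi \text{ is bad}_1 \mid \Pi \text{ is normal and strong}]\big) \\
&\ge \Pr[\Pi \text{ is normal and strong}] \cdot \big(1 - \Pr[\Pi \text{ is bad}_0 \mid \Pi \text{ is normal and strong}]\big) - \Pr[\Pi \text{ is normal}] \cdot \Pr[\Pi \text{ is bad}_1 \mid \Pi \text{ is normal}] \\
&\ge 0.98 \cdot (1 - 0.01) - 0.01 \\
&\ge 0.96.
\end{aligned}$$

3.2 The 2-DISJ Problem

In the 2-DISJ problem, Alice has a set $x \subseteq [n]$ and Bob has a set $y \subseteq [n]$. Their goal is to output 1 if $x \cap y \ne \emptyset$, and 0 otherwise.

We define the input distribution $\tau_\beta$ as follows. Let $\ell = (n + 1) / 4$. With probability $\beta$, $x$ and $y$ are random subsets of $[n]$ such that $|x| = |y| = \ell$ and $|x \cap y| = 1$. And with probability $1 - \beta$, $x$ and $y$ are random subsets of $[n]$ such that $|x| = |y| = \ell$ and $x \cap y = \emptyset$. Razborov [48] proved that for $\beta = 1/4$, $D^{1/(400)}_{\tau_{1/4}}(\text{2-DISJ}) = \Omega(n)$. It is easy to extend this result to general $\beta$ and the average-case complexity.

Theorem 2 ([47], Lemma 2.2). For any $\beta \le 1/4$, it holds that $\mathrm{ED}^{\beta / 100}_{\tau_\beta}(\text{2-DISJ}) = \Omega(n)$, where the expectation is taken over the input distribution $\tau_\beta$.

In the rest of the section, we simply write $\tau_\beta$ as $\tau$.

3.3 The Complexity of $F_0$

3.3.1 Connecting $F_0$ and k-approx-sum$_{\text{2-DISJ}, \tau}$

Set $\beta = 1 / (k \varepsilon^2)$, $B = 20000 / \delta$, where $\delta$ is the small constant error parameter for k-approx-sum in Theorem 1.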
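The hard distribution $\tau_\beta$ of Section 3.2 is easy to sample, which may help in experimenting with the construction. The following is a minimal sketch, assuming $n + 1$ is divisible by 4 so that $\ell = (n + 1)/4$ is an integer; the function names `sample_tau` and `disj` are illustrative and do not come from the paper.

```python
import random

def sample_tau(n, beta, rng):
    """Sample (x, y) from tau_beta over the universe {1, ..., n}.

    Both sets have size ell = (n + 1) // 4. With probability beta they
    share exactly one element; otherwise they are disjoint.
    """
    ell = (n + 1) // 4
    if rng.random() < beta:
        # One shared element, then ell - 1 private elements for each side.
        items = rng.sample(range(1, n + 1), 2 * ell - 1)
        shared = items[0]
        x = {shared, *items[1:ell]}
        y = {shared, *items[ell:]}
    else:
        # 2 * ell distinct elements split into two disjoint halves.
        items = rng.sample(range(1, n + 1), 2 * ell)
        x = set(items[:ell])
        y = set(items[ell:])
    return x, y

def disj(x, y):
    """2-DISJ as defined in the text: 1 if the sets intersect, 0 otherwise."""
    return 1 if x & y else 0

rng = random.Random(1)
x_hit, y_hit = sample_tau(11, 1.0, rng)    # beta = 1 forces |x ∩ y| = 1
x_miss, y_miss = sample_tau(11, 0.0, rng)  # beta = 0 forces x ∩ y = ∅
```

Because `random.sample` returns distinct elements in random order, taking the first element as the shared one and splitting the rest gives a uniformly random pair with the required intersection size.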

We choose $f$ to be 2-DISJ with universe size $n = B / \varepsilon^2$, set its input distribution to be $\tau$, and work on k-approx-sum$_{\text{2-DISJ}, \tau}$. Let $\mu$ be the input distribution of k-approx-sum$_{\text{2-DISJ}, \tau}$, which is a function of $\tau$ (see Section 3.1 for the detailed construction of $\mu$ from $\tau$). Let $\{X_1, \ldots, X_k, Y\} \sim \mu$. Let $Z_i = \text{2-DISJ}(X_i, Y)$. Let $\zeta$ be the induced distribution of $\mu$ on $\{X_1, \ldots, X_k\}$, which we choose to be the input distribution for $F_0$. In the rest of this section, for convenience, we will omit the subscripts 2-DISJ and $\tau$ in k-approx-sum$_{\text{2-DISJ}, \tau}$ when there is no confusion.

Let $N = \sum_{i \in [k]} Z_i = \sum_{i \in [k]} \text{2-DISJ}(X_i, Y)$. Let $R = F_0(\bigcup_{i \in [k]} X_i \cap Y)$. The following lemma shows that $R$ will concentrate around its expectation $\mathbf{E}[R]$, which can be calculated exactly.

Lemma 6. With probability at least $(1 - 6500 / B)$, we have $|R - \mathbf{E}[R]| \le 1 / (10\varepsilon)$, where $\mathbf{E}[R] = (1 - \lambda) N$ for some fixed constant $0 \le \lambda \le 4 / B$.

Proof: We can think of our problem as a bin-ball game: think of each pair $(X_i, Y)$ such that $\text{2-DISJ}(X_i, Y) = 1$ as a ball (thus we have $N$ balls), and of the elements in the set $Y$ as bins. Let $\ell = |Y|$. We throw each of the $N$ balls into one of the $\ell$ bins uniformly at random. Our goal is to estimate the number of non-empty bins at the end of the process.

By a Chernoff bound, with probability $(1 - e^{-\beta k / 3}) \ge (1 - 100 / B)$, $N \le 2\beta k = 2 / \varepsilon^2$. By Fact 1 and Lemma 1 in [41], we have $\mathbf{E}[R] = \ell \left( 1 - (1 - 1/\ell)^N \right)$ and $\mathbf{Var}[R] < 4 N^2 / \ell$. Thus by Chebyshev's inequality we have

$$\Pr[|R - \mathbf{E}[R]| > 1 / (10\varepsilon)] \le \frac{\mathbf{Var}[R]}{1 / (100 \varepsilon^2)} \le \frac{6400}{B}.$$

Let $\theta = N / \ell \le 8 / B$. We can write

$$\mathbf{E}[R] = \ell \left( 1 - e^{-\theta} \right) + O(1) = \theta \ell \left( 1 - \frac{\theta}{2!} + \frac{\theta^2}{3!} - \frac{\theta^3}{4!} + \cdots \right) + O(1).$$

This series converges and thus we can write $\mathbf{E}[R] = (1 - \lambda) \theta \ell = (1 - \lambda) N$ for some fixed constant $0 \le \lambda \le \theta / 2 \le 4 / B$.

The next lemma shows that we can use a protocol for $F_0$ to solve k-approx-sum with good properties.

Lemma 7. Any protocol $P$ that computes a $(1 + \gamma \varepsilon)$-approximation to $F_0$ (for a sufficiently small constant $\gamma$) on input distribution $\zeta$ with error probability $\delta / 2$ can be used to compute k-approx-sum$_{\text{2-DISJ}, \tau}$ on input distribution $\mu$ with error probability $\delta$.
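The balls-in-bins quantities in the proof of Lemma 6 can be evaluated directly. The sketch below computes $\mathbf{E}[R] = \ell (1 - (1 - 1/\ell)^N)$ and the resulting $\lambda = 1 - \mathbf{E}[R] / N$ for one set of illustrative parameters; the concrete values ($1/\varepsilon^2 = 100$, $B = 1000$) and the helper names are assumptions made only for this example.

```python
def expected_nonempty_bins(num_balls, num_bins):
    """E[R] = ell * (1 - (1 - 1/ell)^N) for N balls thrown into ell bins."""
    return num_bins * (1.0 - (1.0 - 1.0 / num_bins) ** num_balls)

def collision_loss(num_balls, num_bins):
    """The lambda for which E[R] = (1 - lambda) * N."""
    return 1.0 - expected_nonempty_bins(num_balls, num_bins) / num_balls

# Illustrative parameters (assumptions): 1/eps^2 = 100 and B = 1000.
inv_eps_sq, B = 100, 1000
n = B * inv_eps_sq       # universe size n = B / eps^2
ell = (n + 1) // 4       # number of bins, ell = |Y|
N = 2 * inv_eps_sq       # largest N allowed by the Chernoff step, N <= 2 / eps^2

theta = N / ell          # theta = N / ell, bounded by 8 / B
lam = collision_loss(N, ell)
```

For these values $\theta = 8/B$ and the computed $\lambda$ lies in $[0, \theta/2]$, in line with the bound $0 \le \lambda \le \theta/2 \le 4/B$ stated in the lemma.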
Proof: Given an input $\{X_1, \ldots, X_k, Y\} \sim \mu$ for k-approx-sum, the $k$ sites and the coordinator use $P$ to compute $\tilde{W}$, which is a $(1 + \gamma \varepsilon)$-approximation to $F_0(X_1, \ldots, X_k)$, and then determine the answer to k-approx-sum to be

$$\frac{\tilde{W} - (n - \ell)}{1 - \lambda}.$$

Recall that $0 \le \lambda \le 4 / B$ is some fixed constant, $n = B / \varepsilon^2$ and $\ell = (n + 1) / 4$.

Correctness. Given a random input $(X_1, \ldots, X_k, Y) \sim \zeta$, the exact value of $W = F_0(X_1, \ldots, X_k)$ can be written as the sum of two components,

$$W = Q + R, \qquad (10)$$

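where $Q$ counts $F_0(\bigcup_{i \in [k]} X_i \setminus Y)$, and $R$ counts $F_0(\bigcup_{i \in [k]} X_i \cap Y)$. First, from our construction it is easy to see by a Chernoff bound and the union bound that with probability $(1 - 1/\varepsilon^2 \cdot e^{-\Omega(k)}) \ge 1 - 100 / B$, we have $Q = |[n] \setminus Y| = n - \ell$, since each element in $[n] \setminus Y$ will be chosen by every $X_i$ ($i = 1, 2, \ldots, k$) with probability at least $1/4$. Second, by Lemma 6 we know that with probability $(1 - 6500 / B)$, $R$ is within $1 / (10\varepsilon)$ of its mean $(1 - \lambda) N$ for some fixed constant $0 \le \lambda \le 4 / B$. Thus with probability $(1 - 6600 / B)$, we can write Equation (10) as

$$W = (n - \ell) + (1 - \lambda) N + \kappa_1, \qquad (11)$$

for a value $|\kappa_1| \le 1 / (10\varepsilon)$ and $N \le 2 / \varepsilon^2$.

Set $\gamma = 1 / (20 B)$. Since $P$ computes a value $\tilde{W}$ which is a $(1 + \gamma \varepsilon)$-approximation of $W = F_0(X_1, X_2, \ldots, X_k)$, we can substitute $W$ with $\tilde{W}$ in Equation (11), resulting in the following:

$$\tilde{W} = (n - \ell) + (1 - \lambda) N + \kappa_1 + \kappa_2, \qquad (12)$$

where $|\kappa_1| \le 1 / (10\varepsilon)$, $N \le 2 / \varepsilon^2$, and

$$\kappa_2 \le \gamma \varepsilon \cdot W = \gamma \varepsilon \cdot \big( (n - \ell) + (1 - \lambda) N + \kappa_1 \big) \le \gamma \varepsilon \cdot \big( B / \varepsilon^2 + 2 / \varepsilon^2 + 1 / (10\varepsilon) \big) \le 1 / (10\varepsilon).$$

Now we have

$$N = \big( \tilde{W} - (n - \ell) - \kappa_1 - \kappa_2 \big) / (1 - \lambda) = \big( \tilde{W} - (n - \ell) \big) / (1 - \lambda) + \kappa_3,$$

where $|\kappa_3| \le \big( 1 / (10\varepsilon) + 1 / (10\varepsilon) \big) / (1 - 4 / B) \le 1 / (4\varepsilon)$. Therefore $(\tilde{W} - (n - \ell)) / (1 - \lambda)$ approximates $N = \sum_{i \in [k]} Z_i$ correctly up to an additive error $1 / (4\varepsilon) < \sqrt{\beta k} = 1 / \varepsilon$, and thus computes k-approx-sum correctly.

The total error probability of this simulation is at most $(\delta / 2 + 6600 / B)$, where the first term counts the error probability of $P$ and the second term counts the error probability introduced by the reduction. This is less than $\delta$ if we choose $B = 20000 / \delta$.

3.3.2 An Embedding Argument

Lemma 8. Suppose that there exists a deterministic protocol $P'$ which computes a $(1 + \gamma \varepsilon)$-approximation of $F_0$ (for a sufficiently small constant $\gamma$) on input distribution $\zeta$ with error probability $\delta / 2$ (for a sufficiently small constant $\delta$) and communication $o(C)$. Then there exists a deterministic protocol $P$ that computes 2-DISJ on input distribution $\tau$ with error probability $\beta / 100$ and expected communication complexity $o(\log(1/\beta) \cdot C / k)$, where the expectation is taken over the input distribution $\tau$.

Proof: In 2-DISJ, Alice holds $X$ and Bob holds $Y$ such that $(X, Y) \sim \tau$.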
We show that Alice and Bob can use the deterministic protocol $P'$ to construct a deterministic protocol $P$ for $\text{2-DISJ}(X, Y)$ with the desired error probability and communication complexity. Alice and Bob first use $P'$ to construct a protocol $P''$. During the construction they will use public and private randomness, which will be fixed at the end. $P''$ consists of two phases.

Input reduction phase. Alice and Bob construct an input for $F_0$ using $X$ and $Y$ as follows: They pick a random site $S_I$ ($I \in [k]$) using public randomness. Alice assigns $S_I$ the input $X_I = X$, and Bob constructs inputs for the remaining $(k - 1)$ sites using $Y$. For each $i \in [k] \setminus I$, Bob samples an $X_i$ according to $\tau \mid Y$ using independent private randomness and assigns it to $S_i$. Let $Z_i = \text{2-DISJ}(X_i, Y)$. Note that $\{X_1, \ldots, X_k, Y\} \sim \mu$ and $\{X_1, \ldots, X_k\} \sim \zeta$.

Simulation phase. Alice simulates $S_I$ and Bob simulates the remaining $(k - 1)$ sites, and they run protocol $P'$ on $\{X_1, \ldots, X_k\} \sim \zeta$ to compute $F_0(X_1, \ldots, X_k)$ up to a $(1 + \gamma \varepsilon)$-approximation for a sufficiently small constant $\gamma$ and error probability $\delta / 2$. Let $\pi$ be the protocol transcript, and let $\tilde{W}$ be the output. By Lemma 7, we can use $\tilde{W}$ to compute k-approx-sum with error probability $\delta$. And then by Theorem 1, for a 0.96 fraction of $Z = z$ over the input distribution $\mu$ and $\pi = \Pi(z)$, it holds that for a 0.9 fraction of $\{i \in [k] \mid z_i = 0\}$, $q_i^\pi < \beta / c_0$, and for a 0.9 fraction of $\{i \in [k] \mid z_i = 1\}$, $q_i^\pi > \beta / c_0$. Now $P''$ outputs 1 if $q_I^\pi > \beta / c_0$, and 0 otherwise. Since $S_I$ is chosen randomly among the $k$ sites, and the inputs for the $k$ sites are identically distributed, $P''$ computes $Z_I = \text{2-DISJ}(X, Y)$ on input distribution $\tau$ correctly with probability $0.96 \cdot 0.9 \ge 0.8$.

We now describe the final protocol $P$: Alice and Bob repeat $P''$ independently $c_R \log(1/\beta)$ times for a large enough constant $c_R$. At the $j$-th repetition, in the input reduction phase, they choose a random permutation $\sigma_j$ of $[n]$ using public randomness, and apply it to each element in $X_1, \ldots, X_k$ before assigning them to the $k$ sites. After running $P''$ $c_R \log(1/\beta)$ times, $P$ outputs the majority of the outcomes.

Since $Z_I = \text{2-DISJ}(X, Y)$ is fixed at each repetition, the inputs $\{X_1, \ldots, X_k\}$ at each repetition have a small dependence, but conditioned on $Z_I$, they are all independent. Let $\mu'$ be the input distribution of $\{X_1, \ldots, X_k, Y\}$ conditioned on $Z_I = b$. Let $\zeta'$ be the induced distribution of $\mu'$ on $\{X_1, \ldots, X_k\}$.
The success probability of a run of $P''$ on $\zeta'$ is at least $0.8 - \mathrm{TV}(\zeta, \zeta')$, where $\mathrm{TV}(\zeta, \zeta')$ is the total variation distance between the distributions $\zeta, \zeta'$, which is at most

$$\max\{\mathrm{TV}(\mathrm{Binomial}(k, \beta), \mathrm{Binomial}(k - 1, \beta)), \; \mathrm{TV}(\mathrm{Binomial}(k, \beta), \mathrm{Binomial}(k - 1, \beta) + 1)\},$$

and can be bounded by $O(1 / \sqrt{\beta k}) = O(\varepsilon)$ (see, e.g., Fact 2.4 of [30]). Since conditioned on $Z_I$, the inputs at each repetition are independent, and the success probability of each run of $P''$ is at least 0.7, by a Chernoff bound over the $c_R \log(1/\beta)$ repetitions for a sufficiently large $c_R$, we conclude that $P$ succeeds with error probability $\beta / 1600$.

We next consider the communication complexity. At each run of $P''$, let $\mathrm{CC}(S_I, S_{-I})$ be the expected communication cost between the site $S_I$ and the rest of the players (more precisely, between $S_I$ and the coordinator, since in the coordinator model all sites only talk to the coordinator, whose initial input is $\emptyset$), where the expectation is taken over the input distribution $\zeta$ and the choice of the random $I \in [k]$. Since conditioned on $Y$, all $X_i$ ($i \in [k]$) are independent and identically distributed, if we take a random site $S_I$, the expected communication between $S_I$ and the coordinator should be equal to the total communication divided by a factor of $k$. Thus we have $\mathrm{CC}(S_I, S_{-I}) = o(C / k)$. Finally, by the linearity of expectation, the expected total communication cost of the $O(\log(1/\beta))$ runs of $P''$ is $o(\log(1/\beta) \cdot C / k)$.

At the end we fix all the randomness used in the construction of protocol $P$. We first use two Markov inequalities to fix all public randomness such that $P$ succeeds with error probability $\beta / 400$, and the expected total communication cost is $o(\log(1/\beta) \cdot C / k)$, where both the error probability and the cost expectation are taken over the input distribution $\mu$ and Bob's private randomness.
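The amplification step, where a single-run success probability of 0.7 is boosted by a majority vote over $c_R \log(1/\beta)$ repetitions, can be illustrated with an exact binomial computation. This is a sketch under the assumption that the runs are independent (which the text obtains by conditioning on $Z_I$); the function names are illustrative, and the value of $c_R$ is not derived here.

```python
import math

def majority_error(p_success, runs):
    """Probability that at most half of `runs` independent trials succeed,
    when each trial succeeds with probability p_success."""
    return sum(
        math.comb(runs, j) * p_success**j * (1 - p_success) ** (runs - j)
        for j in range(runs // 2 + 1)
    )

def runs_needed(p_success, target_error):
    """Smallest odd number of runs whose majority vote errs with
    probability at most target_error."""
    runs = 1
    while majority_error(p_success, runs) > target_error:
        runs += 2
    return runs

# Example: repetitions needed to reach error beta / 1600 for a hypothetical beta.
beta = 0.01
needed = runs_needed(0.7, beta / 1600)
```

Since the majority error decays exponentially in the number of runs, the number of repetitions needed to reach error $\beta / 1600$ grows only logarithmically in $1/\beta$.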
We next use another two Markov inequalities to fix Bob's private randomness such that $P$ succeeds with error probability $\beta / 100$, and the expected total communication cost is $o(\log(1/\beta) \cdot C / k)$, where both the error probability and the cost expectation are taken over the input distribution $\mu$.

The following theorem is a direct consequence of Lemma 8, Theorem 2 for 2-DISJ and Lemma 1 (Yao's
